Slight modification for a multi-core machine
I've a multi-core machine and will be running several instances in parallel in the same directory. The command given in the first post doesn't work too well in that environment and I changed it to:
[code]#!/bin/sh
../gnfs-lasieve4I16e -v -r t.poly -f 100000000 -c 125000 -o t.poly.lasieve-1.100000000-100125000 &
../gnfs-lasieve4I16e -v -r t.poly -f 100125000 -c 125000 -o t.poly.lasieve-1.100125000-100250000 &
../gnfs-lasieve4I16e -v -r t.poly -f 100250000 -c 125000 -o t.poly.lasieve-1.100250000-100375000 &
../gnfs-lasieve4I16e -v -r t.poly -f 100375000 -c 125000 -o t.poly.lasieve-1.100375000-100500000 &
../gnfs-lasieve4I16e -v -r t.poly -f 100500000 -c 125000 -o t.poly.lasieve-1.100500000-100625000 &
../gnfs-lasieve4I16e -v -r t.poly -f 100625000 -c 125000 -o t.poly.lasieve-1.100625000-100750000 &
[/code]The other 250K special-q from my initial 1M block will be run in a similar fashion on a dual-core laptop.
Paul
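The six command lines above differ only in their special-q range, so they can be generated instead of typed by hand. A minimal POSIX sh sketch, assuming the same siever path, `t.poly` and output-file naming as the script above; the function name `sieve_cmds` is hypothetical:

```shell
# Print one background siever command per job for a contiguous block of
# special-q, in the same form as the hand-written script, then a final "wait".
#   $1 = first special-q, $2 = special-q per job (the -c value), $3 = number of jobs
sieve_cmds() {
    q0=$1
    n=0
    while [ "$n" -lt "$3" ]; do
        q1=$((q0 + $2))   # end of this job's range = start of the next
        echo "../gnfs-lasieve4I16e -v -r t.poly -f $q0 -c $2 -o t.poly.lasieve-1.$q0-$q1 &"
        q0=$q1
        n=$((n + 1))
    done
    echo wait             # the shell running these lines waits for every job
}
```

`sieve_cmds 100000000 125000 6` prints the six lines of the script above followed by `wait`, and piping that into `sh` starts the jobs. The trailing `wait` keeps that shell alive until all jobs finish; without it the shell returns at once, as the original script does. Starting at 100750000 with two jobs would cover the remaining 250K special-q of the 1M block.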
Up and running on a[code]vendor_id : AuthenticAMD
cpu family : 16
model : 10
model name : AMD Phenom(tm) II X6 1090T Processor
stepping : 0
cpu MHz : 3780.456
cache size : 512 KB[/code]Seem to be getting about 0.78 sec/rel on each processor. Curiously enough, I'm getting almost the same performance from a 2.13GHz Core2 Duo P7540 laptop running Win7-64. Perhaps these are just small-number statistics, or perhaps I need to see what needs tweaking on the AMD Linux box.
Paul
[QUOTE=xilman;252332]Up and running on a[code]vendor_id : AuthenticAMD
cpu family : 16
model : 10
model name : AMD Phenom(tm) II X6 1090T Processor
stepping : 0
cpu MHz : 3780.456
cache size : 512 KB[/code]Seem to be getting about 0.78 sec/rel on each processor. Curiously enough, I'm getting almost the same performance from a 2.13GHz Core2 Duo P7540 laptop running Win7-64. Perhaps these are just small-number statistics, or perhaps I need to see what needs tweaking on the AMD Linux box. Paul[/QUOTE]After seven hours, which ought to be long enough to get credible numbers, the AMD is averaging 0.771 ± 0.005 sec/rel and the Intel 0.813 ± 0.02 sec/rel. The ratio of these rates is 1.05, but the ratio of the clock frequencies is 1.77, so per clock cycle the AMD is significantly less efficient here. Perhaps I should check the compilation options on the Linux siever.
Paul
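The per-clock comparison can be made explicit by converting each time per relation into clock cycles per relation (seconds × MHz gives millions of cycles). A small sketch that uses awk for the floating-point arithmetic; the helper names `ratio` and `mcycles` are hypothetical, and the Core2's clock is taken as 2130 MHz from its nominal 2.13GHz:

```shell
# ratio A B: print A/B rounded to two decimal places
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# mcycles SEC_PER_REL MHZ: millions of clock cycles spent per relation
mcycles() {
    awk -v t="$1" -v f="$2" 'BEGIN { printf "%.0f\n", t * f }'
}

ratio 0.813 0.771         # Intel time / AMD time: prints 1.05
ratio 3780.456 2130       # AMD clock / Intel clock: prints 1.77
mcycles 0.771 3780.456    # AMD: prints 2915
mcycles 0.813 2130        # Intel: prints 1732
```

On these figures the AMD spends about 2915 million cycles per relation against about 1732 for the Core2, roughly 1.68 times as many.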
[QUOTE=xilman;252368]After seven hours, which ought to be long enough to get credible numbers, the AMD is averaging 0.771 ± 0.005 sec/rel and the Intel 0.813 ± 0.02 sec/rel.
The ratio of these rates is 1.05 but the ratio of the clock frequencies is 1.77 so the AMD is significantly less efficient here. Perhaps I should check compilation options on the Linux siever. Paul[/QUOTE]Your binary may have been built with L1_BITS (a compile-time setting) set to 15, which is optimal for the Core2 but not for the AMD. I don't know whether that would be enough to explain the entire difference, though.
I have posted my own L1_bits=16 binary in the top message; it may be better for the AMD. Paul, your binary seems to be a bit slow (maybe non-asm?). Give this one a try. I get 0.30-0.31 s/rel on a similar 1090T.
When building from source, use the src/experimental/lasieve4_64/ directory (well, you know that).
[QUOTE=Batalov;252377]I have posted my own L1_bits=16 binary in the top message - it may be better for AMD. Paul, your binary seems to be a bit slow (maybe non-asm?). Give this one a try. I have 0.30-0.31s/rel on a similar 1090T.
When building from source, use the src/experimental/lasieve4_64/ (well you know that)[/QUOTE]Yes, that is markedly better, thank you. Even after a few seconds the rate is around 0.36 s/rel, and that is still influenced by the set-up time, including the creation of the factor bases. I'll kill off the currently running sievers and continue from where they finished. (Any chance of you providing comparable builds of gnfs-lasieve4I1[1-5]e, please? I'm currently fighting my way through the oft-times depressing difficulties of building anything from the Franke/Kleinjung sources. If it helps, I can give you sftp and/or ssh access to this machine so that you can build on it.) Many thanks!
Paul
Will do (as long as the first one runs on your system; the usual showstopper is glibc compatibility). If you can find Tom's binary in the forum, note that that one is L1_bits=15.
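A prebuilt Linux binary will not start if it references glibc symbol versions newer than the installed glibc provides (the loader reports a "version `GLIBC_...' not found" error). One way to check beforehand, sketched below under the assumption that binutils' `objdump` is available: list the binary's dynamic symbols and take the highest `GLIBC_` version they reference, then compare it with the system's glibc as reported by `ldd --version`. The helper name `max_glibc` is hypothetical.

```shell
# Read `objdump -T BINARY` output on stdin and print the newest GLIBC_x.y[.z]
# symbol version it mentions. Splitting on '.', field 2 is the minor version
# and field 3 the patch level; this assumes the major version is always 2.
max_glibc() {
    grep -o 'GLIBC_[0-9][0-9.]*' | sort -t . -k 2,2n -k 3,3n | tail -n 1
}
```

For example, `objdump -T ./gnfs-lasieve4I16e | max_glibc` run on the target machine, next to `ldd --version`, shows whether the binary asks for a newer glibc than is installed.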
[QUOTE=Batalov;252318]Yes, just allow for time in mail.
Thanks.[/QUOTE] I have another 5 million relations to send. Let me know when you want me to send you my data. I am gathering about 5M relations/week.
It is hard to predict yet, but there are most probably two weeks (or more) to go here, so let's come back to this question in a week?
[QUOTE=Batalov;252482]It is hard to predict yet, but there's most probably two weeks to go here (or more); so let's get back to this question after one week?[/QUOTE]If it aids the ETA calculation, something over 1.35M relations have already turned up here in around one day of effective computation ("effective" because, although I started 32 hours ago, I switched to a much more efficient siever on the faster machine only 21 hours ago).
Paul
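For the ETA, a per-core time per relation converts into a daily yield as cores × 86400 / (sec/rel), and the time remaining is (target - collected) / (combined daily rate). A back-of-envelope sketch under those assumptions; the helper names are hypothetical, the target relation count is not given in this thread and has to be supplied, and duplicate relations are ignored:

```shell
# rels_per_day CORES SEC_PER_REL: relations per day from CORES jobs,
# each producing one relation every SEC_PER_REL seconds
rels_per_day() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.0f\n", n * 86400 / t }'
}

# days_left TARGET HAVE RATE: days until TARGET relations are reached,
# given HAVE relations so far and a combined RATE in relations/day
days_left() {
    awk -v t="$1" -v h="$2" -v r="$3" 'BEGIN { printf "%.1f\n", (t - h) / r }'
}

rels_per_day 6 0.36    # six cores at the early 0.36 s/rel figure: prints 1440000
rels_per_day 2 0.813   # dual-core laptop at 0.813 s/rel: prints 212546
```

Rates from several contributors (for instance 5M relations/week, about 714000/day) can be added together before being passed to `days_left`.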