mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Cunningham Tables (https://www.mersenneforum.org/forumdisplay.php?f=51)
-   -   Distributed finishing for 2,1870L (https://www.mersenneforum.org/showthread.php?t=15249)

xilman 2011-02-13 10:52

Slight modification for a multi-core machine
 
I've a multi-core machine and will be running several instances in parallel in the same directory. The command given in the first post doesn't work too well in that environment and I changed it to:
[code]#!/bin/sh
../gnfs-lasieve4I16e -v -r t.poly -f 100000000 -c 125000 -o t.poly.lasieve-1.100000000-100125000 &
../gnfs-lasieve4I16e -v -r t.poly -f 100125000 -c 125000 -o t.poly.lasieve-1.100125000-100250000 &
../gnfs-lasieve4I16e -v -r t.poly -f 100250000 -c 125000 -o t.poly.lasieve-1.100250000-100375000 &
../gnfs-lasieve4I16e -v -r t.poly -f 100375000 -c 125000 -o t.poly.lasieve-1.100375000-100500000 &
../gnfs-lasieve4I16e -v -r t.poly -f 100500000 -c 125000 -o t.poly.lasieve-1.100500000-100625000 &
../gnfs-lasieve4I16e -v -r t.poly -f 100625000 -c 125000 -o t.poly.lasieve-1.100625000-100750000 &
[/code]The other 250K special-q from my initial 1M block will be run in a similar fashion on a dual-core laptop.


Paul
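
The six hand-written lines in the script follow a fixed pattern, so they can be generated rather than typed. A minimal sketch, assuming the same binary path, polynomial file and output naming as in the script; the function name gen_sieve_cmds is hypothetical, and it only prints the commands instead of running them:

```shell
#!/bin/sh
# Print one siever command per sub-range of special-q.
# $1: first special-q, $2: chunk size per instance, $3: number of instances.
# Binary path, t.poly and the output file naming are copied from the script
# in the post; nothing is executed here, the commands are only printed.
gen_sieve_cmds() {
    start=$1
    chunk=$2
    n=$3
    i=0
    while [ "$i" -lt "$n" ]; do
        q0=$((start + i * chunk))   # first special-q of this instance
        q1=$((q0 + chunk))          # end of its range, used in the file name
        echo "../gnfs-lasieve4I16e -v -r t.poly -f $q0 -c $chunk -o t.poly.lasieve-1.$q0-$q1 &"
        i=$((i + 1))
    done
}

# Reproduces the six lines of the original script.
gen_sieve_cmds 100000000 125000 6
```

Piping the output into sh would launch the instances. For example, `gen_sieve_cmds 100750000 125000 2 | sh` would cover the remaining 250K of the 1M block on a dual-core machine; those arguments are only an illustration.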

xilman 2011-02-13 11:00

Up and running on this machine:[code]vendor_id : AuthenticAMD
cpu family : 16
model : 10
model name : AMD Phenom(tm) II X6 1090T Processor
stepping : 0
cpu MHz : 3780.456
cache size : 512 KB[/code]Seem to be getting about 0.78 sec/rel on each processor.

Curiously enough, I'm getting almost the same performance from a 2.13GHz Core2 Duo P7540 laptop running Win7-64. Perhaps these are just small-number statistics or perhaps I need to see what needs tweaking on the AMD Linux box.

Paul

xilman 2011-02-13 19:17

[QUOTE=xilman;252332]Up and running on a[code]vendor_id : AuthenticAMD
cpu family : 16
model : 10
model name : AMD Phenom(tm) II X6 1090T Processor
stepping : 0
cpu MHz : 3780.456
cache size : 512 KB[/code]Seem to be getting about 0.78 sec/rel on each processor.

Curiously enough, I'm getting almost the same performance from a 2.13GHz Core2 Duo P7540 laptop running Win7-64. Perhaps these are just small-number statistics or perhaps I need to see what needs tweaking on the AMD Linux box.

Paul[/QUOTE]After seven hours, which ought to be long enough to get credible numbers, the AMD is averaging 0.771 ± 0.005 sec/rel and the Intel 0.813 ± 0.02 sec/rel.

The ratio of these rates is 1.05 but the ratio of the clock frequencies is 1.77 so the AMD is significantly less efficient here. Perhaps I should check compilation options on the Linux siever.

Paul
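
The comparison in this post can be made explicit as clock cycles spent per relation. A small sketch using the figures quoted in the post; compare_sievers is a hypothetical helper, and treating 2130 MHz as the laptop's actual running clock is an assumption, since the post gives only the nominal 2.13GHz rating:

```shell
#!/bin/sh
# Compare two sievers by time per relation and by clock cycles per relation.
# $1, $2: sec/rel and clock in MHz of machine A (here the AMD)
# $3, $4: sec/rel and clock in MHz of machine B (here the Intel laptop)
compare_sievers() {
    awk -v sa="$1" -v fa="$2" -v sb="$3" -v fb="$4" 'BEGIN {
        # How much longer B takes per relation than A.
        printf "sec/rel ratio (B/A): %.2f\n", sb / sa
        # How much faster A is clocked than B.
        printf "clock ratio (A/B): %.2f\n", fa / fb
        # Cycles per relation is sec/rel times clock; the MHz factor cancels.
        printf "cycles/rel ratio (A/B): %.2f\n", (sa * fa) / (sb * fb)
    }'
}

# AMD: 0.771 sec/rel at 3780.456 MHz; Intel: 0.813 sec/rel at an assumed 2130 MHz.
compare_sievers 0.771 3780.456 0.813 2130
```

With these inputs the first two lines reproduce the 1.05 and 1.77 ratios from the post. The third line, about 1.68, is the ratio of cycles spent per relation, which is the sense in which the AMD is less efficient here.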

bsquared 2011-02-13 19:51

[QUOTE=xilman;252368]After seven hours, which ought to be long enough to get credible numbers, the AMD is averaging 0.771 ± 0.005 sec/rel and the Intel 0.813 ± 0.02 sec/rel.

The ratio of these rates is 1.05 but the ratio of the clock frequencies is 1.77 so the AMD is significantly less efficient here. Perhaps I should check compilation options on the Linux siever.

Paul[/QUOTE]

L1_BITS (settable at compile time) may be set to 15, which is optimal for the Core2 but not for the AMD. I don't know whether that alone would be enough to explain the entire difference, though.

Batalov 2011-02-13 20:00

I have posted my own L1_bits=16 binary in the top message - it may be better for AMD. Paul, your binary seems to be a bit slow (maybe non-asm?). Give this one a try. I have 0.30-0.31s/rel on a similar 1090T.

When building from source, use the src/experimental/lasieve4_64/ (well you know that)

xilman 2011-02-13 21:36

[QUOTE=Batalov;252377]I have posted my own L1_bits=16 binary in the top message - it may be better for AMD. Paul, your binary seems to be a bit slow (maybe non-asm?). Give this one a try. I have 0.30-0.31s/rel on a similar 1090T.

When building from source, use the src/experimental/lasieve4_64/ (well you know that)[/QUOTE]Yes, that is markedly better, thank you. Even after a few seconds the rate is around 0.36 s/r, and that is still influenced by the set-up time, including the creation of the factor bases.

I'll kill off the currently running sievers and continue from where they finished.

(Any chance of you providing comparable builds of gnfs-lasieve4I1[1-5]e please? I'm currently fighting my way through the oft-times depressing difficulties of building anything from the Franke/Kleinjung sources. If it helps, I can provide sftp to my machine and/or ssh access to you for building on this system.)

Many thanks!

Paul

Batalov 2011-02-13 21:48

Will do (as long as the first one runs on your system; the usual showstopper is the glibc compatibility). If you can find Tom's binary in the forum - that one is L1_bits=15.

R.D. Silverman 2011-02-14 14:24

[QUOTE=Batalov;252318]Yes, just allow for time in mail.
Thanks.[/QUOTE]

I have another 5 million relations to send. Let me know when you want me to send you my data. I am gathering about 5M relations/week.

Batalov 2011-02-14 18:38

It is hard to predict yet, but there's most probably two weeks to go here (or more); so let's get back to this question after one week?

xilman 2011-02-14 18:54

[QUOTE=Batalov;252482]It is hard to predict yet, but there's most probably two weeks to go here (or more); so let's get back to this question after one week?[/QUOTE]If it aids the ETA calculation, something over 1.35M relations have already turned up here in around 1 day effective computation (effective because I changed to a much more efficient siever on the faster machine 21 hours ago despite having started 32 hours ago).

Paul
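
For ETA purposes, the per-instance rates quoted in this thread (in sec/rel) can be converted to a daily yield. A rough sketch, assuming every instance sustains the same rate around the clock and ignoring set-up time; rels_per_day is a hypothetical helper, 0.31 s/rel is the figure Batalov reports for a 1090T, and six instances matches the script earlier in the thread:

```shell
#!/bin/sh
# Estimate relations per day from a per-instance sieving rate.
# $1: seconds per relation for one instance
# $2: number of instances running in parallel
rels_per_day() {
    # 86400 seconds per day, divided by sec/rel, times the instance count.
    awk -v spr="$1" -v n="$2" 'BEGIN { printf "%.0f\n", 86400 * n / spr }'
}

# Six instances at the faster binary's rate, then at the original rate.
rels_per_day 0.31 6
rels_per_day 0.771 6
```

This is only an upper-bound style estimate: it does not account for the laptop's contribution, for restarts, or for the rate varying across special-q ranges.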


All times are UTC.