mersenneforum.org

R.D. Silverman 2007-04-05 14:47

Open Discussion
 
I would like to open a discussion concerning how best to utilize multi-core
processors to run NFS.

It seems as if cache contention would be a major obstacle to simply
running multiple instances (i.e. giving a separate special-q to each core and
letting it proceed independently). It might be the case that, e.g., running 4
separate instances on a quad-core processor actually gives lower total throughput
than running on a single core (or 2 cores), owing to cache and bus contention.
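To make "lower total throughput" concrete, assuming each instance works on its own special-q range: if one instance alone needs t_1 seconds per special-q and each of n simultaneous instances needs t_n seconds, the aggregate rate is n/t_n against 1/t_1. Running n copies therefore loses only if contention stretches each q by more than a factor of n. A minimal Python sketch of that ratio (the function name is illustrative, not part of any siever):

```python
def scaling_gain(t_single, t_each, n_instances):
    """Aggregate throughput of n concurrent instances relative to one instance alone.

    t_single: seconds per special-q with one instance running.
    t_each:   seconds per special-q for each instance while n run simultaneously.
    Returns n * t_single / t_each; a value below 1.0 means the n instances
    together process fewer special-q per second than one instance alone.
    """
    if t_single <= 0 or t_each <= 0 or n_instances < 1:
        raise ValueError("times must be positive and n_instances >= 1")
    return n_instances * t_single / t_each
```

With made-up timings, scaling_gain(20.0, 25.0, 2) gives 1.6 (two copies still win even though each slows down by 25%), while scaling_gain(20.0, 100.0, 4) gives 0.8, the case where four copies together do less work than one.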

An alternative approach would be to give separate parts of the computation
to different cores: while one core is sieving, another could be computing
the sieve start points and sieve boundary for the next special-q to be
processed. Also, once sieving is finished, separate cores could
do the trial division on separate candidate smooth values, and so on. Would this
approach be better than running multiple copies?
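The pipelined alternative can be sketched as a producer/consumer pair: one thread prepares upcoming special-q while the other sieves the current one, with a one-slot queue bounding how far preparation runs ahead. This is a structural sketch only: compute_start_points and sieve are hypothetical placeholders, and because CPython's global interpreter lock serializes pure-Python code, the two stages would not actually overlap here. A real siever would use native threads (e.g. pthreads) for the same layout.

```python
import queue
import threading

def compute_start_points(q):
    # Hypothetical placeholder: a real siever would reduce the special-q lattice
    # and compute per-prime sieve start points and the sieve boundary here.
    return {"q": q}

def sieve(prepared):
    # Hypothetical placeholder: a real siever would sieve the region and
    # collect candidate smooth values; here it just reports which q it handled.
    return prepared["q"]

def pipelined_run(special_qs):
    """Prepare later special-q on a worker thread while the current one is sieved."""
    handoff = queue.Queue(maxsize=1)  # bounds how far preparation can run ahead
    done = object()                   # sentinel marking the end of the q range

    def prepare_all():
        # Error handling omitted: an exception here would leave the consumer waiting.
        for q in special_qs:
            handoff.put(compute_start_points(q))  # blocks while the slot is full
        handoff.put(done)

    worker = threading.Thread(target=prepare_all)
    worker.start()
    processed = []
    while (item := handoff.get()) is not done:
        processed.append(sieve(item))
    worker.join()
    return processed
```

Since there is a single producer and a FIFO queue, special-q come out in the order they were supplied, so results can be written in sequence without extra bookkeeping.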

If someone has access to a Windows-based multi-core system, I can provide
code and data to perform benchmarks with respect to running multiple
instances. It would be nice to get a feel for how these processors will
perform.

R.D. Silverman 2007-04-05 16:10


I would also be interested in seeing how fast a single NFS thread is on the
new processors compared with (say) a P4 @ 3.6 GHz. What effect does a much
larger L2 cache have on the efficiency of a *single* instance, even when
running at lower clock rates?

jasonp 2007-04-06 13:02

[QUOTE=R.D. Silverman;103062]
If someone has access to a Windows based multi-core system, I can provide
code and data to perform benchmarks with respect to running multiple
instances. It would be nice to get a feel for how these processors will
perform.[/QUOTE]
I've moved Windows development to a dual-core 1.86 GHz system with 2 MB of shared cache; for most tasks it's slightly slower than a dual 2 GHz Opteron system. It should be fairly typical of modern systems, and I'd be happy to run any benchmarks you're thinking of. Maybe we should continue this via email?

em99010pepe 2007-04-07 18:11

I have access to a few Windows-based multi-core systems: one dual P4 2.8 GHz, one AMD X2 3800+, one Intel Core Duo T5500 and two P4 3.0 GHz HT machines.

Carlos

R.D. Silverman 2007-04-09 12:40

[QUOTE=em99010pepe;103224]I have access to a few Windows-based multi-core systems: one dual P4 2.8 GHz, one AMD X2 3800+, one Intel Core Duo T5500 and two P4 3.0 GHz HT machines.

Carlos[/QUOTE]

If you will send me your email address, I will bundle up my lattice
siever, along with input data and instructions. Send it to my
private message box.

R.D. Silverman 2007-04-09 13:58

[QUOTE=R.D. Silverman;103313]If you will send me your email address, I will bundle up my lattice
siever, along with input data and instructions. Send it to my
private message box.[/QUOTE]

I am sending the code and data. I have timings for a P4 and a P4 with HT.
I would like timings for a single thread on a dual core and for 2 threads
run simultaneously on a dual core. The output from the siever gives verbose
timing info. Please post those files.

My 1.6 GHz P4 laptop takes just under 20 seconds to process a single
special q.
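The one-thread versus two-simultaneous-threads comparison can be scripted by launching the siever processes together and timing until all of them exit. A minimal Python sketch; the command lines are placeholders (the siever's actual executable name and arguments are not given here), and in a real run each copy should get its own special-q range and output file so the runs do not overwrite each other.

```python
import subprocess
import sys
import time

def time_concurrent_runs(commands):
    """Launch every command at once; return wall-clock seconds until all have exited."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for cmd in commands]
    codes = [p.wait() for p in procs]  # block until each process finishes
    elapsed = time.perf_counter() - start
    failed = [cmd for cmd, code in zip(commands, codes) if code != 0]
    if failed:
        raise RuntimeError(f"non-zero exit status from: {failed}")
    return elapsed

if __name__ == "__main__":
    # Placeholder workload: replace with the siever invocation, one command per copy.
    job = [sys.executable, "-c", "sum(i * i for i in range(1_000_000))"]
    t1 = time_concurrent_runs([job])
    t2 = time_concurrent_runs([job, job])
    # Same work per copy, so 2 * t1 / t2 > 1 means two copies beat one.
    print(f"1 copy: {t1:.2f} s, 2 copies: {t2:.2f} s, gain {2 * t1 / t2:.2f}x")
```

Wall-clock time is used deliberately: the question is total throughput of the machine, which per-process CPU time would hide when cache and bus contention stall the cores.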

R.D. Silverman 2007-04-09 14:01

[QUOTE=jasonp;103122]I've moved Windows development to a dual-core 1.86 GHz system with 2 MB of shared cache; for most tasks it's slightly slower than a dual 2 GHz Opteron system. It should be fairly typical of modern systems, and I'd be happy to run any benchmarks you're thinking of. Maybe we should continue this via email?[/QUOTE]


Send me your email address by private message. I will send the code
and data.

em99010pepe 2007-04-09 14:11

[quote=R.D. Silverman;103315]I am sending the code and data. I have timings for a P4 and a P4 with HT.
I would like timings for a single thread on a dual core and for 2 threads
run simultaneously on a dual core. The output from the siever gives verbose
timing info. Please post those files.

My 1.6 GHz P4 laptop takes just under 20 seconds to process a single
special q.[/quote]

I will test it as soon as I get home.

Carlos

R.D. Silverman 2007-04-09 18:19

[QUOTE=em99010pepe;103317]I will test it as soon as I get home.

Carlos[/QUOTE]

Here's the output from my laptop:

Siever built on Apr 9 2007 09:39:45
Clock rate = 1594930.693333
We try to factor: 790041893431046581209233185025824311175827003282978140813822078685169030310463219677902163905473613516505865387434570494416331465957580100723618236415905252281207423
======================================================
Total time to process 243542 : 19.396596
Total sieve time = 9.274764
Int line: 0.457527
Alg line: 0.420746
Int vector: 4.171477
Alg vector: 4.225013
Total resieve time = 1.742181
Int resieve: 1.096560
Alg resieve: 0.645621
Trial Int time: 0.118909
Trial Alg time: 2.399764
Find startpts time: 3.148531
Alg scan time: 1.095609
Lattice reduce time: 1.431896
QS/Squfof time: 1.072377
Prepare regions time: 1.088817
Inverse time = 1.117249
Prime Test time = 0.431831
======================================================
======================
Time for Q = 19.396670
======================
======================================================
Total time to process 243543 : 19.351597
Total sieve time = 9.285043
Int line: 0.458326
Alg line: 0.473001
Int vector: 4.174365
Alg vector: 4.179350
Total resieve time = 1.769205
Int resieve: 1.095318
Alg resieve: 0.673887
Trial Int time: 0.119059
Trial Alg time: 2.474075
Find startpts time: 3.128221
Alg scan time: 1.099960
Lattice reduce time: 1.445799
QS/Squfof time: 1.126384
Prepare regions time: 1.131905
Inverse time = 1.151887
Prime Test time = 0.456115
======================================================
======================
Time for Q = 19.351665
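When comparing machines it helps to express each timer as a share of the per-q total. A small Python sketch that parses one block of output like the above; it assumes only the "label : value" / "label = value" layout shown. The timers are not disjoint: in the first block the four line/vector times add up (within rounding) to the Total sieve time, and the top-level entries together exceed the 19.4 s total, so the percentages should not be summed.

```python
import re

# "label : 1.23" or "label = 1.23"; integer-only values (e.g. the number being factored) are skipped.
LINE_RE = re.compile(r"^\s*(.+?)\s*[:=]\s*(\d+\.\d+)\s*$")

def parse_timings(block):
    """Map each timer label in one siever output block to its value in seconds."""
    timings = {}
    for line in block.splitlines():
        m = LINE_RE.match(line)
        if m:
            timings[m.group(1)] = float(m.group(2))
    return timings

def percent_of_total(timings):
    """Express every timer as a percentage of the 'Total time to process <q>' entry."""
    total_key = next(k for k in timings if k.startswith("Total time to process"))
    total = timings[total_key]
    skip = {total_key, "Clock rate"}  # the total itself and a non-time value
    return {k: 100.0 * v / total for k, v in timings.items()
            if k not in skip and not k.startswith("Time for Q")}
```

Applied to the first block, Total sieve time comes out at roughly 48% of the per-q total and Find startpts time at roughly 16%.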

em99010pepe 2007-04-09 18:55

I didn't get your email... please check your PM. Thanks.

R.D. Silverman 2007-04-10 14:53

[QUOTE=em99010pepe;103334]I didn't get your email... please check your PM. Thanks.[/QUOTE]

Sent to your alternate address. The zip file contains the .exe files.


All times are UTC.