mersenneforum.org

R.D. Silverman 2007-04-05 14:47

Open Discussion
 
I would like to open a discussion concerning how best to utilize multi-core
processors to run NFS.

It seems that cache contention would be a major constraint against simply
running multiple instances (i.e. giving a separate special-q to each core and
letting it proceed independently). It may even be that running 4 separate
instances on a quad core gives lower total throughput than running on a
single core (or 2 cores), owing to cache and bus contention.

An alternative approach would be to give separate parts of the computation
to different cores: while one core is sieving, another core could be computing
the sieve start points and sieve boundary for the next special-q to be
processed. Also, once sieving is finished, we could let separate cores
do the trial division on separate candidate smooth values, etc. Would this
approach be better than running multiple copies?
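
The split described above can be pictured as a two-stage pipeline: one thread prepares the data for the next special-q while another sieves the current one. The sketch below is illustrative only. `prepare_special_q` and `sieve` are hypothetical stand-ins, not functions from any real siever, and in CPython the global interpreter lock means this shows the structure of the overlap rather than a real speedup.

```python
import queue
import threading

def prepare_special_q(q):
    # Hypothetical stand-in for computing sieve start points and boundaries.
    return {"q": q, "start_points": [q % p for p in (3, 5, 7)]}

def sieve(prepared):
    # Hypothetical stand-in for the sieving stage; returns a dummy result.
    return prepared["q"], sum(prepared["start_points"])

def run_pipeline(special_qs, depth=1):
    """Overlap preparation of upcoming special-q values with sieving."""
    ready = queue.Queue(maxsize=depth)  # bounded so the preparer cannot run far ahead
    done = object()                     # sentinel marking the end of the work list

    def preparer():
        for q in special_qs:
            ready.put(prepare_special_q(q))
        ready.put(done)

    worker = threading.Thread(target=preparer)
    worker.start()

    results = []
    while True:
        item = ready.get()
        if item is done:
            break
        results.append(sieve(item))  # sieving runs while the preparer works ahead
    worker.join()
    return results

if __name__ == "__main__":
    print(run_pipeline([243542, 243543, 243544]))
```

The bounded queue is the design choice that matters here: it keeps the preparing stage only slightly ahead of the sieving stage, so the prepared data for later special-q values does not pile up in memory and compete for cache.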

If someone has access to a Windows-based multi-core system, I can provide
code and data to perform benchmarks with respect to running multiple
instances. It would be nice to get a feel for how these processors will
perform.

R.D. Silverman 2007-04-05 16:10

I would also be interested in seeing how fast a single NFS thread is on the
new processors when compared with (say) a P4 @3.6GHz. What is the
effect of having a much larger L2 cache on the efficiency of a *single*
instance (even one running at a lower clock rate)?

jasonp 2007-04-06 13:02

[QUOTE=R.D. Silverman;103062]
If someone has access to a Windows based multi-core system, I can provide
code and data to perform benchmarks with respect to running multiple
instances. It would be nice to get a feel for how these processors will
perform.[/QUOTE]
I've moved Windows development to a dual-core 1.86GHz system with 2MB of shared cache; for most tasks it's slightly slower than a dual 2GHz Opteron system. It should be fairly typical of modern systems, and I'd be happy to run any benchmarks you're thinking of. Maybe we should continue this via email?

em99010pepe 2007-04-07 18:11

I have access to a few Windows-based multi-core systems: one dual P4 2.8GHz, one AMD X2 3800+, one Intel Core Duo T5500 and two P4 3.0GHz HT.

Carlos

R.D. Silverman 2007-04-09 12:40

[QUOTE=em99010pepe;103224]I have access to a few Windows-based multi-core systems: one dual P4 2.8GHz, one AMD X2 3800+, one Intel Core Duo T5500 and two P4 3.0GHz HT.

Carlos[/QUOTE]

If you will send me your email address, I will bundle up my lattice
siever, along with input data and instructions. Send it to my
private message box.

R.D. Silverman 2007-04-09 13:58

[QUOTE=R.D. Silverman;103313]If you will send me your email address, I will bundle up my lattice
siever, along with input data and instructions. Send it to my
private message box.[/QUOTE]

I am sending the code and data. I have timings for a P4 and P4 HT.
I would like timings for a single thread on a dual core and for 2 threads
on a dual core run simultaneously. The output from the siever gives verbose
timing info. Please post those files.

My 1.6 GHz P4 laptop takes just under 20 seconds to process a single
special q.
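
For the "run simultaneously" case, a small harness can start the copies together and time them from the outside. This is a rough sketch: the siever's actual command line is not given in this thread, so the command in the example is a placeholder that just runs a short Python loop, and `time_instances` is a name made up here.

```python
import subprocess
import sys
import time

def time_instances(command, n):
    """Launch n copies of `command` together; return wall-clock seconds until all exit."""
    start = time.perf_counter()
    procs = [subprocess.Popen(command) for _ in range(n)]
    for p in procs:
        p.wait()
    elapsed = time.perf_counter() - start
    # Fail loudly if any copy exited with an error (e.g. a crash).
    failed = [p.returncode for p in procs if p.returncode != 0]
    if failed:
        raise RuntimeError(f"{len(failed)} instance(s) exited with an error: {failed}")
    return elapsed

if __name__ == "__main__":
    # Placeholder workload; substitute the real siever command line here.
    placeholder = [sys.executable, "-c", "sum(range(10**6))"]
    for n in (1, 2):
        print(f"{n} instance(s): {time_instances(placeholder, n):.3f} s")
```

The wall-clock figure only bounds the run from the outside; the per-special-q numbers that the siever prints itself remain the more detailed measurement.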

R.D. Silverman 2007-04-09 14:01

[QUOTE=jasonp;103122]I've moved Windows development to a dual-core 1.86GHz system with 2MB of shared cache; for most tasks it's slightly slower than a dual 2GHz Opteron system. It should be fairly typical of modern systems, and I'd be happy to run any benchmarks you're thinking of. Maybe we should continue this via email?[/QUOTE]


Send me your email in my private messages. I will send code and
data.

em99010pepe 2007-04-09 14:11

[quote=R.D. Silverman;103315]I am sending the code and data. I have timings for a P4 and P4 HT.
I would like timings for a single thread on a dual core and for 2 threads
on a dual core run simultaneously. The output from the siever gives verbose
timing info. Please post those files.

My 1.6 GHz P4 laptop takes just under 20 seconds to process a single
special q.[/quote]

I will test it as soon as I get home.

Carlos

R.D. Silverman 2007-04-09 18:19

[QUOTE=em99010pepe;103317]I will test it as soon as I get home.

Carlos[/QUOTE]

Here's the output from my laptop:

[code]Siever built on Apr 9 2007 09:39:45
Clock rate = 1594930.693333
We try to factor: 790041893431046581209233185025824311175827003282978140813822078685169030310463219677902163905473613516505865387434570494416331465957580100723618236415905252281207423
======================================================
Total time to process 243542 : 19.396596
Total sieve time = 9.274764
Int line: 0.457527
Alg line: 0.420746
Int vector: 4.171477
Alg vector: 4.225013
Total resieve time = 1.742181
Int resieve: 1.096560
Alg resieve: 0.645621
Trial Int time: 0.118909
Trial Alg time: 2.399764
Find startpts time: 3.148531
Alg scan time: 1.095609
Lattice reduce time: 1.431896
QS/Squfof time: 1.072377
Prepare regions time: 1.088817
Inverse time = 1.117249
Prime Test time = 0.431831
======================================================
======================
Time for Q = 19.396670
======================
======================================================
Total time to process 243543 : 19.351597
Total sieve time = 9.285043
Int line: 0.458326
Alg line: 0.473001
Int vector: 4.174365
Alg vector: 4.179350
Total resieve time = 1.769205
Int resieve: 1.095318
Alg resieve: 0.673887
Trial Int time: 0.119059
Trial Alg time: 2.474075
Find startpts time: 3.128221
Alg scan time: 1.099960
Lattice reduce time: 1.445799
QS/Squfof time: 1.126384
Prepare regions time: 1.131905
Inverse time = 1.151887
Prime Test time = 0.456115
======================================================
======================
Time for Q = 19.351665
[/code]
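
To compare runs without reading the numbers by eye, output in this format can be parsed into one record per special-q. The sketch below is inferred only from the lines shown in this thread; the function name `parse_siever_log` and the regular expressions are assumptions and may need adjusting for other builds of the siever.

```python
import re

# "Total time to process <q> : <seconds>" opens the record for one special-q.
Q_LINE = re.compile(r"^Total time to process (\d+) : (\d+\.\d+)$")
# Other timing lines look like "<label>: <seconds>" or "<label> = <seconds>".
FIELD = re.compile(r"^([A-Za-z/ ]+?)\s*[=:]\s*(\d+\.\d+)$")

def parse_siever_log(text):
    """Return a list of (special_q, {label: seconds}) in order of appearance."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = Q_LINE.match(line)
        if m:
            current = {"Total time": float(m.group(2))}
            records.append((int(m.group(1)), current))
            continue
        m = FIELD.match(line)
        if m and current is not None:  # header lines before the first q are skipped
            current[m.group(1)] = float(m.group(2))
    return records

SAMPLE = """\
Clock rate = 1594930.693333
======================================================
Total time to process 243542 : 19.396596
Total sieve time = 9.274764
Int line: 0.457527
Trial Alg time: 2.399764
QS/Squfof time: 1.072377
======================
Time for Q = 19.396670
"""

if __name__ == "__main__":
    for q, times in parse_siever_log(SAMPLE):
        share = times["Total sieve time"] / times["Total time"]
        print(q, f"sieve share of total: {share:.1%}")
```

Keeping each special-q as its own dictionary makes it easy to average a single phase, such as the sieve time, across many special-q values or across machines.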

em99010pepe 2007-04-09 18:55

I didn't get your email... Please check your PM. Thanks.

R.D. Silverman 2007-04-10 14:53

[QUOTE=em99010pepe;103334]I didn't get your email... Please check your PM. Thanks.[/QUOTE]

Sent to your alternate address. The zip file contains exe's.

em99010pepe 2007-04-10 15:24

Got the email. Thanks.

em99010pepe 2007-04-10 15:56

Here's the output from my AMD 64 3000+ (2GHz)

[code]Siever built on Apr 9 2007 09:39:45
Clock rate = 2008790.026667
We try to factor: 790041893431046581209233185025824311175827003282978140813822078685169030310463219677902163905473613516505865387434570494416331465957580100723618236415905252281207423
======================================================
Total time to process 243051 : 14.957819
Total sieve time = 6.347350
Int line: 0.327693
Alg line: 0.331017
Int vector: 2.830311
Alg vector: 2.858329
Total resieve time = 1.723709
Int resieve: 1.131123
Alg resieve: 0.592586
Trial Int time: 0.143164
Trial Alg time: 2.409095
Find startpts time: 2.147755
Alg scan time: 1.057686
Lattice reduce time: 1.025221
QS/Squfof time: 0.939109
Prepare regions time: 0.484812
Inverse time = 0.741717
Prime Test time = 0.329211
======================================================
======================
Time for Q = 14.957869
======================
======================================================
Total time to process 243052 : 14.974745
Total sieve time = 6.304219
Int line: 0.339790
Alg line: 0.331826
Int vector: 2.812848
Alg vector: 2.819755
Total resieve time = 1.740309
Int resieve: 1.143335
Alg resieve: 0.596974
Trial Int time: 0.129974
Trial Alg time: 2.516539
Find startpts time: 2.114074
Alg scan time: 1.051407
Lattice reduce time: 1.008271
QS/Squfof time: 0.955024
Prepare regions time: 0.484765
Inverse time = 0.726874
Prime Test time = 0.304210
======================================================
======================
Time for Q = 14.974796
======================
======================================================
Total time to process 243053 : 14.900223
Total sieve time = 6.343369
Int line: 0.334369
Alg line: 0.331037
Int vector: 2.847171
Alg vector: 2.830793
Total resieve time = 1.732617
Int resieve: 1.129759
Alg resieve: 0.602859
Trial Int time: 0.113038
Trial Alg time: 2.382724
Find startpts time: 2.167068
Alg scan time: 1.057641
Lattice reduce time: 1.030443
QS/Squfof time: 0.869075
Prepare regions time: 0.508135
Inverse time = 0.723305
Prime Test time = 0.309061
======================================================
======================
Time for Q = 14.900274
======================
======================================================
Total time to process 243054 : 14.955055
Total sieve time = 6.301058
Int line: 0.335241
Alg line: 0.328434
Int vector: 2.811675
Alg vector: 2.825708
Total resieve time = 1.739072
Int resieve: 1.126458
Alg resieve: 0.612614
Trial Int time: 0.122711
Trial Alg time: 2.513721
Find startpts time: 2.114314
Alg scan time: 1.053397
Lattice reduce time: 1.010336
QS/Squfof time: 0.974110
Prepare regions time: 0.487998
Inverse time = 0.729075
Prime Test time = 0.324837
======================================================
======================
Time for Q = 14.955106
======================

[/code]

Here's the output from my T5500 (1.66GHz) - 1 Instance

[code]Siever built on Apr 9 2007 09:39:45
Clock rate = 1662264.786667
We try to factor: 790041893431046581209233185025824311175827003282978140813822078685169030310463219677902163905473613516505865387434570494416331465957580100723618236415905252281207423
======================================================
Total time to process 243000 : 13.835894
Total sieve time = 6.221507
Int line: 0.357100
Alg line: 0.353817
Int vector: 2.755602
Alg vector: 2.754988
Total resieve time = 1.092979
Int resieve: 0.671530
Alg resieve: 0.421449
Trial Int time: 0.091012
Trial Alg time: 1.901134
Find startpts time: 2.790766
Alg scan time: 0.670023
Lattice reduce time: 1.240321
QS/Squfof time: 0.853117
Prepare regions time: 0.939043
Inverse time = 1.049021
Prime Test time = 0.351571
======================================================
======================
Time for Q = 13.835957
======================
======================================================
Total time to process 243001 : 13.705041
Total sieve time = 6.170074
Int line: 0.355732
Alg line: 0.353821
Int vector: 2.757821
Alg vector: 2.702701
Total resieve time = 1.093736
Int resieve: 0.672198
Alg resieve: 0.421538
Trial Int time: 0.093397
Trial Alg time: 1.895056
Find startpts time: 2.769108
Alg scan time: 0.663930
Lattice reduce time: 1.222624
QS/Squfof time: 0.835098
Prepare regions time: 0.940400
Inverse time = 1.042576
Prime Test time = 0.341527
======================================================
======================
Time for Q = 13.705102
======================
======================================================
Total time to process 243002 : 13.605536
Total sieve time = 6.161381
Int line: 0.360552
Alg line: 0.353247
Int vector: 2.752164
Alg vector: 2.695419
Total resieve time = 1.086112
Int resieve: 0.670768
Alg resieve: 0.415344
Trial Int time: 0.087008
Trial Alg time: 1.837913
Find startpts time: 2.766950
Alg scan time: 0.663756
Lattice reduce time: 1.225755
QS/Squfof time: 0.849267
Prepare regions time: 0.938710
Inverse time = 1.045291
Prime Test time = 0.361549
======================================================
======================
Time for Q = 13.605598
======================
======================================================
Total time to process 243003 : 13.703781
Total sieve time = 6.160060
Int line: 0.357130
Alg line: 0.355403
Int vector: 2.749761
Alg vector: 2.697765
Total resieve time = 1.098564
Int resieve: 0.676726
Alg resieve: 0.421838
Trial Int time: 0.070406
Trial Alg time: 1.879977
Find startpts time: 2.775045
Alg scan time: 0.660769
Lattice reduce time: 1.227441
QS/Squfof time: 0.709288
Prepare regions time: 0.936485
Inverse time = 1.031073
Prime Test time = 0.268086
======================================================
======================
Time for Q = 13.703842
======================
======================================================
Total time to process 243004 : 13.868451
Total sieve time = 6.157564
Int line: 0.356334
Alg line: 0.354715
Int vector: 2.750806
Alg vector: 2.695708
Total resieve time = 1.097898
Int resieve: 0.674672
Alg resieve: 0.423226
Trial Int time: 0.083782
Trial Alg time: 2.053950
Find startpts time: 2.774038
Alg scan time: 0.663943
Lattice reduce time: 1.223030
QS/Squfof time: 0.907044
Prepare regions time: 0.937371
Inverse time = 1.053620
Prime Test time = 0.345697
======================================================
======================
Time for Q = 13.868514
======================
======================================================
Total time to process 243005 : 13.541794
Total sieve time = 6.163748
Int line: 0.356556
Alg line: 0.354484
Int vector: 2.756578
Alg vector: 2.696130
Total resieve time = 1.090137
Int resieve: 0.671432
Alg resieve: 0.418705
Trial Int time: 0.080781
Trial Alg time: 1.751009
Find startpts time: 2.770760
Alg scan time: 0.660466
Lattice reduce time: 1.225651
QS/Squfof time: 0.740213
Prepare regions time: 0.937966
Inverse time = 1.031939
Prime Test time = 0.286757
======================================================
======================
Time for Q = 13.541856
======================
======================================================
Total time to process 243006 : 13.731324
Total sieve time = 6.162438
Int line: 0.357100
Alg line: 0.354651
Int vector: 2.754762
Alg vector: 2.695924
Total resieve time = 1.096276
Int resieve: 0.674671
Alg resieve: 0.421605
Trial Int time: 0.091823
Trial Alg time: 1.908861
Find startpts time: 2.775741
Alg scan time: 0.667201
Lattice reduce time: 1.228524
QS/Squfof time: 0.838577
Prepare regions time: 0.937290
Inverse time = 1.044161
Prime Test time = 0.344188
======================================================
======================
Time for Q = 13.731386
======================
======================================================
Total time to process 243007 : 13.629225
Total sieve time = 6.164690
Int line: 0.356743
Alg line: 0.355134
Int vector: 2.755292
Alg vector: 2.697521
Total resieve time = 1.088752
Int resieve: 0.672286
Alg resieve: 0.416466
Trial Int time: 0.103814
Trial Alg time: 1.829308
Find startpts time: 2.768182
Alg scan time: 0.661569
Lattice reduce time: 1.224861
QS/Squfof time: 0.825621
Prepare regions time: 0.940132
Inverse time = 1.041511
Prime Test time = 0.361711
======================================================
======================
Time for Q = 13.629288
======================
[/code]

Here's the output from my T5500 (1.66GHz) - 2 Instances - Started at the same time

[code]
[B]Instance 1[/B]

Siever built on Apr 9 2007 09:39:45
Clock rate = 1662064.833333
We try to factor: 790041893431046581209233185025824311175827003282978140813822078685169030310463219677902163905473613516505865387434570494416331465957580100723618236415905252281207423
======================================================
Total time to process 243051 : 14.580573
Total sieve time = 6.600088
Int line: 0.359703
Alg line: 0.357588
Int vector: 2.945099
Alg vector: 2.937698
Total resieve time = 1.329593
Int resieve: 0.833927
Alg resieve: 0.495666
Trial Int time: 0.109872
Trial Alg time: 1.893963
Find startpts time: 2.771878
Alg scan time: 0.740067
Lattice reduce time: 1.238865
QS/Squfof time: 0.876138
Prepare regions time: 0.940204
Inverse time = 1.040855
Prime Test time = 0.375727
======================================================
======================
Time for Q = 14.580636
======================
======================================================
Total time to process 243052 : 14.292252
Total sieve time = 6.381648
Int line: 0.356710
Alg line: 0.354272
Int vector: 2.895975
Alg vector: 2.774692
Total resieve time = 1.310392
Int resieve: 0.835661
Alg resieve: 0.474730
Trial Int time: 0.100813
Trial Alg time: 1.970652
Find startpts time: 2.754841
Alg scan time: 0.725335
Lattice reduce time: 1.216864
QS/Squfof time: 0.899784
Prepare regions time: 0.943710
Inverse time = 1.043538
Prime Test time = 0.347551
======================================================
======================
Time for Q = 14.292316
======================
======================================================
Total time to process 243053 : 14.125554
Total sieve time = 6.467821
Int line: 0.355420
Alg line: 0.352732
Int vector: 2.971669
Alg vector: 2.787999
Total resieve time = 1.240207
Int resieve: 0.701692
Alg resieve: 0.538515
Trial Int time: 0.084089
Trial Alg time: 1.872890
Find startpts time: 2.762798
Alg scan time: 0.672852
Lattice reduce time: 1.226863
QS/Squfof time: 0.818615
Prepare regions time: 0.941759
Inverse time = 1.036199
Prime Test time = 0.355633
======================================================
======================
Time for Q = 14.125622
======================
======================================================
Total time to process 243054 : 14.320856
Total sieve time = 6.406322
Int line: 0.357683
Alg line: 0.357351
Int vector: 2.863983
Alg vector: 2.827305
Total resieve time = 1.138968
Int resieve: 0.711662
Alg resieve: 0.427306
Trial Int time: 0.092486
Trial Alg time: 1.998459
Find startpts time: 2.760300
Alg scan time: 0.679006
Lattice reduce time: 1.217441
QS/Squfof time: 0.924893
Prepare regions time: 0.940851
Inverse time = 1.053341
Prime Test time = 0.373247
======================================================
======================
Time for Q = 14.320921
======================


[B]Instance 2[/B]

Siever built on Apr 9 2007 09:39:45
Clock rate = 1662220.280000
We try to factor: 790041893431046581209233185025824311175827003282978140813822078685169030310463219677902163905473613516505865387434570494416331465957580100723618236415905252281207423
======================================================
Total time to process 243026 : 15.264312
Total sieve time = 6.784279
Int line: 0.367059
Alg line: 0.371216
Int vector: 3.019164
Alg vector: 3.026841
Total resieve time = 1.371629
Int resieve: 0.895815
Alg resieve: 0.475813
Trial Int time: 0.099601
Trial Alg time: 2.134747
Find startpts time: 2.872866
Alg scan time: 0.802379
Lattice reduce time: 1.287763
QS/Squfof time: 0.949184
Prepare regions time: 0.965290
Inverse time = 1.075985
Prime Test time = 0.352865
======================================================
======================
Time for Q = 15.264375
======================
======================================================
Total time to process 243027 : 14.595826
Total sieve time = 6.687776
Int line: 0.371986
Alg line: 0.370803
Int vector: 3.054791
Alg vector: 2.890196
Total resieve time = 1.145648
Int resieve: 0.710936
Alg resieve: 0.434712
Trial Int time: 0.083563
Trial Alg time: 2.019955
Find startpts time: 2.849884
Alg scan time: 0.715398
Lattice reduce time: 1.259328
QS/Squfof time: 0.911535
Prepare regions time: 0.969085
Inverse time = 1.073501
Prime Test time = 0.366919
======================================================
======================
Time for Q = 14.595890
======================
======================================================
Total time to process 243028 : 14.565658
Total sieve time = 6.823503
Int line: 0.367511
Alg line: 0.367869
Int vector: 3.055582
Alg vector: 3.032541
Total resieve time = 1.133835
Int resieve: 0.700398
Alg resieve: 0.433436
Trial Int time: 0.076535
Trial Alg time: 1.882279
Find startpts time: 2.876771
Alg scan time: 0.696136
Lattice reduce time: 1.267809
QS/Squfof time: 0.761307
Prepare regions time: 0.974920
Inverse time = 1.072995
Prime Test time = 0.306361
======================================================
======================
Time for Q = 14.565720
======================


[/code]

em99010pepe 2007-04-10 16:00

By the way, the client crashes when running at normal priority. Later I will test it again at low priority and on another dual-core machine, the AMD64 X2 3800+. Tomorrow I can test it on the Intel 2.8GHz dual-core machine and on the P4 3.0GHz HT.

Carlos

R.D. Silverman 2007-04-11 12:41

[QUOTE=em99010pepe;103378]By the way, the client crashes when running at normal priority. Later I will test it again at low priority and on another dual-core machine, the AMD64 X2 3800+. Tomorrow I can test it on the Intel 2.8GHz dual-core machine and on the P4 3.0GHz HT.

Carlos[/QUOTE]

I was twiddling with some of the internal array parameters. I may have
set an array size too small. This might be causing your crash.

R.D. Silverman 2007-04-11 12:50

[QUOTE=em99010pepe;103377]Here's the output from my AMD 64 3000+ (2GHz)

<snip>

[/QUOTE]

I think it is amazing (and terrific) that two instances running show only
about a 10% slowdown per process over 1 instance running. I expected
a lot worse.

Note that the 2GHz Athlon showed about a 30% improvement over my
laptop. This is in good agreement with their relative clock rates.
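
These estimates can be checked against the "Time for Q" lines posted earlier in the thread. The sketch below is a rough comparison only: the single-instance and two-instance runs processed different special-q values, so the ratios are approximate, and the helper names are made up for this example.

```python
from statistics import mean

# "Time for Q" values (seconds) copied from the outputs posted in this thread.
t5500_single = [13.835957, 13.705102, 13.605598, 13.703842,
                13.868514, 13.541856, 13.731386, 13.629288]
t5500_dual_1 = [14.580636, 14.292316, 14.125622, 14.320921]
t5500_dual_2 = [15.264375, 14.595890, 14.565720]
laptop_p4 = [19.396670, 19.351665]
athlon_64 = [14.957869, 14.974796, 14.900274, 14.955106]

def slowdown(concurrent, alone):
    """Fractional increase in mean time per special-q when run concurrently."""
    return mean(concurrent) / mean(alone) - 1.0

def throughput(times):
    """Special-q values processed per second by one instance."""
    return 1.0 / mean(times)

slow_1 = slowdown(t5500_dual_1, t5500_single)
slow_2 = slowdown(t5500_dual_2, t5500_single)
# Combined rate of the two concurrent instances relative to one instance alone.
gain = (throughput(t5500_dual_1) + throughput(t5500_dual_2)) / throughput(t5500_single)
# Laptop-to-Athlon ratio of mean time per special-q, and of the reported clock rates.
speedup = mean(laptop_p4) / mean(athlon_64)
clock_ratio = 2008790.026667 / 1594930.693333

print(f"slowdown per instance: {slow_1:.1%} and {slow_2:.1%}")
print(f"two-instance throughput vs one: {gain:.2f}x")
print(f"Athlon vs laptop: {speedup:.2f}x, clock ratio {clock_ratio:.2f}")
```

With the figures above, the slowdowns come out at roughly 4.6% and 8.1%, the combined throughput at about 1.88 times that of one instance, and the Athlon-over-laptop ratio at about 1.30 against a clock ratio of about 1.26.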


All times are UTC.