#1 |
Nov 2003
2²×5×373 Posts |
I would like to open a discussion concerning how best to utilize multi-core processors to run NFS.

It seems that cache contention would be a major constraint against simply running multiple instances (i.e. giving a separate special-q to each core and letting each proceed independently). Running 4 separate instances on a quad core might actually yield lower total throughput than a single core (or 2 cores), owing to cache and bus contention.

An alternative approach would be to give separate parts of the computation to different cores: while one core is sieving, another could be computing the sieve start points and sieve boundary for the next special-q to be processed. Also, once sieving is finished, separate cores could do the trial division on separate candidate smooth values, and so on. Would this approach be better than running multiple copies?

If someone has access to a Windows-based multi-core system, I can provide code and data to perform benchmarks with respect to running multiple instances. It would be nice to get a feel for how these processors will perform. |
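To make the alternative in the post above concrete, here is a minimal structural sketch of a two-stage pipeline: one thread prepares the next special-q while another sieves the current one. It is not the siever's actual code. `prepare_special_q`, `sieve`, and `run_pipeline` are hypothetical names, and their bodies are placeholders for the real start-point computation and sieve. Python threads also do not run CPU-bound work in parallel, so this only illustrates the hand-off between stages, not the performance.

```python
import queue
import threading


def prepare_special_q(q):
    """Hypothetical stand-in for computing sieve start points and boundaries."""
    return {"q": q, "start_points": [q % p for p in (2, 3, 5, 7)]}


def sieve(prepared):
    """Hypothetical stand-in for the sieve; returns a dummy (q, value) pair."""
    return (prepared["q"], sum(prepared["start_points"]))


def run_pipeline(special_qs, depth=1):
    """Prepare special-q number k+1 while special-q number k is being sieved."""
    ready = queue.Queue(maxsize=depth)  # bounded, so preparation stays only `depth` ahead
    results = []

    def producer():
        for q in special_qs:
            ready.put(prepare_special_q(q))
        ready.put(None)  # sentinel: no more work

    def consumer():
        while True:
            item = ready.get()
            if item is None:
                break
            results.append(sieve(item))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_pipeline([243542, 243543])
```

The bounded queue is the design point: the preparing core never gets more than `depth` special-q values ahead, so the prepared data for the next value is small and fresh when the sieving core picks it up.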
#2 | |
Nov 2003
2²·5·373 Posts |
![]() Quote:
I would also be interested in seeing how fast a single NFS thread is on the new processors when compared with (say) a P4 @3.6GHz. What is the effect of having a much larger L2 cache on the efficiency of a *single* instance, even when running at lower clock rates? |
|
#3 |
Tribal Bullet
Oct 2004
2×3×19×31 Posts |
I've moved Windows development to a dual-core 1.86GHz system with 2MB of shared cache; for most tasks it's slightly slower than a dual 2GHz Opteron system. It should be fairly typical of modern systems, and I'd be happy to run any benchmarks you're thinking of. Maybe we should continue this via email?
Last fiddled with by jasonp on 2007-04-06 at 13:10 |
#4 |
Sep 2004
101100001110₂ Posts |
I have access to a few Windows-based multi-core systems: one dual P4 2.8GHz, one AMD X2 3800+, one Intel Core Duo T5500, and two P4 3.0GHz HT.
Carlos |
#5 | |
Nov 2003
1D24₁₆ Posts |
![]() Quote:
siever, along with input data and instructions. Send it to my private message box. |
|
#6 | |
Nov 2003
1D24₁₆ Posts |
![]() Quote:
I would like timings for a single thread on a dual core, and for 2 threads run simultaneously on a dual core. The siever's output gives verbose timing info; please post those files. My 1.6 GHz P4 laptop takes just under 20 seconds to process a single special-q. |
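One way to compare the two requested measurements is by total throughput (special-q values completed per second across all instances) rather than by per-instance time. The sketch below shows that arithmetic. The function names are hypothetical, and the numbers in the example call are made up for illustration; they are not measurements from this thread.

```python
def throughput(instances, seconds_per_q):
    """Special-q values completed per second across all running instances."""
    return instances / seconds_per_q


def scaling_efficiency(t_single, t_multi, instances):
    """Fraction of ideal scaling achieved.

    t_single: seconds per special-q with one instance running alone.
    t_multi:  seconds per special-q for each instance when `instances` run together.
    A value of 1.0 means no slowdown from cache or bus contention.
    """
    return throughput(instances, t_multi) / (instances * throughput(1, t_single))


# Illustrative numbers only (NOT measurements): 20 s alone, 25 s each when 2 run together.
single = throughput(1, 20.0)
dual = throughput(2, 25.0)
efficiency = scaling_efficiency(20.0, 25.0, 2)
```

Running two instances is a net win whenever `dual > single`, i.e. whenever each instance slows down by less than a factor of 2.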
|
#7 | |
Nov 2003
2²·5·373 Posts |
![]() Quote:
Send me your email in my private messages. I will send code and data. |
|
#8 | |
Sep 2004
2×5×283 Posts |
![]() Quote:
Carlos |
|
#9 |
Nov 2003
2²·5·373 Posts |
Here's the output from my laptop:
    Siever built on Apr 9 2007 09:39:45
    Clock rate = 1594930.693333
    We try to factor: 790041893431046581209233185025824311175827003282978140813822078685169030310463219677902163905473613516505865387434570494416331465957580100723618236415905252281207423
    ======================================================
    Total time to process 243542 : 19.396596
    Total sieve time = 9.274764
    Int line: 0.457527
    Alg line: 0.420746
    Int vector: 4.171477
    Alg vector: 4.225013
    Total resieve time = 1.742181
    Int resieve: 1.096560
    Alg resieve: 0.645621
    Trial Int time: 0.118909
    Trial Alg time: 2.399764
    Find startpts time: 3.148531
    Alg scan time: 1.095609
    Lattice reduce time: 1.431896
    QS/Squfof time: 1.072377
    Prepare regions time: 1.088817
    Inverse time = 1.117249
    Prime Test time = 0.431831
    ======================================================
    ====================== Time for Q = 19.396670 ======================
    ======================================================
    Total time to process 243543 : 19.351597
    Total sieve time = 9.285043
    Int line: 0.458326
    Alg line: 0.473001
    Int vector: 4.174365
    Alg vector: 4.179350
    Total resieve time = 1.769205
    Int resieve: 1.095318
    Alg resieve: 0.673887
    Trial Int time: 0.119059
    Trial Alg time: 2.474075
    Find startpts time: 3.128221
    Alg scan time: 1.099960
    Lattice reduce time: 1.445799
    QS/Squfof time: 1.126384
    Prepare regions time: 1.131905
    Inverse time = 1.151887
    Prime Test time = 0.456115
    ======================================================
    ====================== Time for Q = 19.351665
|
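For comparing runs across machines, it may help to turn each timing entry into a share of the total time per special-q. The sketch below parses a few entries copied from the first special-q in the output above. It assumes the siever writes one `label: value` or `label = value` entry per line, which is a guess about the file layout; `parse_timings` and `PATTERN` are hypothetical names, not part of the siever.

```python
import re

# A few entries copied from the first special-q in the posted output.
LOG = """\
Total time to process 243542 : 19.396596
Total sieve time = 9.274764
Total resieve time = 1.742181
Trial Alg time: 2.399764
Find startpts time: 3.148531
"""

# Label, then ':' or '=', then a decimal number ending the line.
PATTERN = re.compile(r"^(.*?)\s*[:=]\s*([0-9]+\.[0-9]+)\s*$")


def parse_timings(text):
    """Map each timing label to its value in seconds; skip non-matching lines."""
    timings = {}
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match:
            timings[match.group(1).strip()] = float(match.group(2))
    return timings


timings = parse_timings(LOG)
total = timings["Total time to process 243542"]
shares = {label: value / total for label, value in timings.items() if value != total}

for label, share in shares.items():
    print(f"{label}: {share:.1%} of total")
```

Separator lines made only of `=` characters do not match the pattern and are skipped, so the same function can be pointed at a full output file if the one-entry-per-line assumption holds.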
#10 |
Sep 2004
2·5·283 Posts |
I didn't get your email... please check your PM. Thanks.
Last fiddled with by em99010pepe on 2007-04-09 at 19:01 |
#11 |
Nov 2003
1110100100100₂ Posts |