mersenneforum.org  

2007-04-05, 14:47   #1
R.D. Silverman
Open Discussion

I would like to open a discussion concerning how best to utilize multi-core
processors to run NFS.

It seems that cache contention would be a major constraint against simply
running multiple instances (i.e. giving a separate special-q to each core and
letting it proceed independently). Running 4 separate instances on a quad core
might actually yield lower total throughput than running on a single core (or
2 cores), owing to cache and bus contention.

An alternative approach would be to give separate parts of the computation
to different cores: while one core is sieving, another core could be computing
the sieve start points and sieve boundary for the next special-q to be
processed. Also, once sieving is finished, separate cores could do the trial
division on separate candidate smooth values, and so on. Would this
approach be better than running multiple copies?

If someone has access to a Windows based multi-core system, I can provide
code and data to perform benchmarks with respect to running multiple
instances. It would be nice to get a feel for how these processors will
perform.
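
For illustration, the pipelined alternative described above could be structured roughly as follows. This is only a sketch of the control flow, not the actual siever: `find_start_points`, `sieve` and `trial_divide` are hypothetical placeholders with toy bodies, and `run_pipeline` is a made-up name. Note also that CPython threads do not run CPU-bound work in parallel, so a real implementation would use native threads; the point here is only how the stages hand work to each other.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def find_start_points(q):
    """Hypothetical stand-in: compute sieve start points for one special-q."""
    return [q % p for p in (2, 3, 5, 7)]


def sieve(q, start_points):
    """Hypothetical stand-in: sieve one special-q and return candidate values."""
    return [q + s for s in start_points]


def trial_divide(candidate):
    """Hypothetical stand-in: trial-divide one candidate (toy smoothness test)."""
    return candidate % 2 == 0


def run_pipeline(special_qs, workers=2):
    # Bounded queue: the producer prepares at most one special-q ahead
    # of the one currently being sieved.
    prepared = queue.Queue(maxsize=1)

    def producer():
        for q in special_qs:
            prepared.put((q, find_start_points(q)))
        prepared.put(None)  # sentinel: no more work

    threading.Thread(target=producer, daemon=True).start()

    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            item = prepared.get()
            if item is None:
                break
            q, start_points = item
            candidates = sieve(q, start_points)
            # Candidates from one special-q are trial-divided by separate workers.
            results[q] = list(pool.map(trial_divide, candidates))
    return results
```

Whether this arrangement beats running independent copies depends on how much the stages share cache and on how evenly the work divides, which is exactly what would need to be measured.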

2007-04-05, 16:10   #2
R.D. Silverman

I would also be interested in seeing how fast a single NFS thread is on the
new processors when compared with (say) a P4 @3.6GHz. What is the
effect of having a much larger L2 cache on the efficiency of a *single*
instance (even when running at a lower clock rate)?

2007-04-06, 13:02   #3
jasonp

Quote:
Originally Posted by R.D. Silverman View Post
If someone has access to a Windows based multi-core system, I can provide
code and data to perform benchmarks with respect to running multiple
instances. It would be nice to get a feel for how these processors will
perform.
I've moved Windows development to a dual-core 1.86GHz system with 2MB of shared cache; for most tasks it's slightly slower than a dual 2GHz Opteron system. It should be fairly typical of modern systems, and I'd be happy to run any benchmarks you're thinking of. Maybe we should continue this via email?

Last fiddled with by jasonp on 2007-04-06 at 13:10

2007-04-07, 18:11   #4
em99010pepe

I have access to a few Windows-based multi-core systems: one dual P4 2.8GHz, one AMD X2 3800+, one Intel Core Duo T5500 and two P4 3.0GHz HT.

Carlos

2007-04-09, 12:40   #5
R.D. Silverman

Quote:
Originally Posted by em99010pepe View Post
I have access to a few Windows-based multi-core systems: one dual P4 2.8GHz, one AMD X2 3800+, one Intel Core Duo T5500 and two P4 3.0GHz HT.

Carlos
If you will send me your email address, I will bundle up my lattice
siever, along with input data and instructions. Send it to my
private message box.

2007-04-09, 13:58   #6
R.D. Silverman

Quote:
Originally Posted by R.D. Silverman View Post
If you will send me your email address, I will bundle up my lattice
siever, along with input data and instructions. Send it to my
private message box.
I am sending the code and data. I have timings for a P4 and P4 HT.
I would like timings for a single thread on a dual core and for 2 threads
on a dual core run simultaneously. The output from the siever gives verbose
timing info. Please post those files.

My 1.6 GHz P4 laptop takes just under 20 seconds to process a single
special q.
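
A comparison of one instance against several simultaneous instances could be scripted along these lines. This is a minimal sketch under stated assumptions: `time_instances` and `throughput` are made-up helper names, and the workload shown is a stand-in Python one-liner, not the siever itself (its real command line is not given in this thread). The arithmetic behind the comparison is simple: if one instance needs about 19.4 seconds per special q, then two simultaneous instances only win if each of them finishes a special q in under about 38.8 seconds.

```python
import subprocess
import sys
import time


def time_instances(command, n):
    """Launch n copies of `command` at once; return seconds until all have exited."""
    start = time.perf_counter()
    procs = [subprocess.Popen(command) for _ in range(n)]
    for p in procs:
        p.wait()
    return time.perf_counter() - start


def throughput(jobs_per_instance, n, elapsed):
    """Jobs (e.g. special q's) completed per second across all n instances."""
    return jobs_per_instance * n / elapsed


if __name__ == "__main__":
    # Stand-in CPU-bound workload; replace with the real siever command line.
    command = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]
    for n in (1, 2):
        elapsed = time_instances(command, n)
        print(f"{n} instance(s): {elapsed:.2f} s, "
              f"{throughput(1, n, elapsed):.3f} jobs/s")
```

If the throughput figure for 2 instances is well under twice the figure for 1 instance, that would point to contention for cache or memory bandwidth.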

2007-04-09, 14:01   #7
R.D. Silverman

Quote:
Originally Posted by jasonp View Post
I've moved Windows development to a dual-core 1.86GHz system with 2MB of shared cache; for most tasks it's slightly slower than a dual 2GHz Opteron system. It should be fairly typical of modern systems, and I'd be happy to run any benchmarks you're thinking of. Maybe we should continue this via email?

Send me your email in my private messages. I will send code and
data.

2007-04-09, 14:11   #8
em99010pepe

Quote:
Originally Posted by R.D. Silverman View Post
I am sending the code and data. I have timings for a P4 and P4 HT.
I would like timings for a single thread on a dual core and for 2 threads
on a dual core run simultaneously. The output from the siever gives verbose
timing info. Please post those files.

My 1.6 GHz P4 laptop takes just under 20 seconds to process a single
special q.
I will test it as soon as I get home.

Carlos

2007-04-09, 18:19   #9
R.D. Silverman

Quote:
Originally Posted by em99010pepe View Post
I will test it as soon as I get home.

Carlos
Here's the output from my laptop:

Siever built on Apr 9 2007 09:39:45
Clock rate = 1594930.693333
We try to factor: 790041893431046581209233185025824311175827003282978140813822078685169030310463219677902163905473613516505865387434570494416331465957580100723618236415905252281207423
======================================================
Total time to process 243542 : 19.396596
Total sieve time = 9.274764
Int line: 0.457527
Alg line: 0.420746
Int vector: 4.171477
Alg vector: 4.225013
Total resieve time = 1.742181
Int resieve: 1.096560
Alg resieve: 0.645621
Trial Int time: 0.118909
Trial Alg time: 2.399764
Find startpts time: 3.148531
Alg scan time: 1.095609
Lattice reduce time: 1.431896
QS/Squfof time: 1.072377
Prepare regions time: 1.088817
Inverse time = 1.117249
Prime Test time = 0.431831
======================================================
======================
Time for Q = 19.396670
======================
======================================================
Total time to process 243543 : 19.351597
Total sieve time = 9.285043
Int line: 0.458326
Alg line: 0.473001
Int vector: 4.174365
Alg vector: 4.179350
Total resieve time = 1.769205
Int resieve: 1.095318
Alg resieve: 0.673887
Trial Int time: 0.119059
Trial Alg time: 2.474075
Find startpts time: 3.128221
Alg scan time: 1.099960
Lattice reduce time: 1.445799
QS/Squfof time: 1.126384
Prepare regions time: 1.131905
Inverse time = 1.151887
Prime Test time = 0.456115
======================================================
======================
Time for Q = 19.351665
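
To compare timing files from different machines, output in the format above could be parsed with something like the following. This is a sketch that assumes the layout seen in this post ("label: seconds" or "label = seconds", with each special q introduced by a "Total time to process" line); `parse_timings` and `share` are made-up names. In the first block above, for example, "Total sieve time" is 9.274764 of 19.396596 seconds, about 48%, and "Find startpts time" is about 16%. The output does not say whether some timings are nested inside others, so the shares should not be added together.

```python
import re

# Matches "label: number" or "label = number" with nothing after the number.
LINE = re.compile(r"^\s*(.+?)\s*[:=]\s*([0-9]+(?:\.[0-9]+)?)\s*$")


def parse_timings(text):
    """Return one dict of {label: seconds} per special q found in the log."""
    blocks = []
    current = None
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # separator lines and anything unrecognised
        label, value = m.group(1), float(m.group(2))
        if label.startswith("Total time to process"):
            # Start of a new special-q block; the q value ends the label.
            current = {"q": int(label.split()[-1]), "total": value}
            blocks.append(current)
        elif current is not None and label != "Time for Q":
            current[label] = value
    return blocks


def share(block, label):
    """Fraction of the block's total time spent under `label`."""
    return block[label] / block["total"]
```

Lines before the first "Total time to process" line (build date, clock rate, the number being factored) are ignored by this parser.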

2007-04-09, 18:55   #10
em99010pepe

I didn't get your email. Please check your PM. Thanks.

Last fiddled with by em99010pepe on 2007-04-09 at 19:01

2007-04-10, 14:53   #11
R.D. Silverman

Quote:
Originally Posted by em99010pepe View Post
I didn't get your email. Please check your PM. Thanks.
Sent to your alternate address. The zip file contains executables.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.