mersenneforum.org  

Old 2013-04-26, 21:20   #12
NBtarheel_33
 
 
"Nathan"
Jul 2008
Maryland, USA

1115₁₀ Posts

OK, what I have is an 8-core system with no GPU, plus up to 32 cores and 4 GPUs (2 K20s and 2 K10s) available. The former has 16GB RAM and the latter has 32GB.

Uncwilly, I can P-1 your number if 32GB is enough RAM. Will it need any TF first?

I will also try out CUDA P-1...is it better to DC a known P-1 result or is it ready for production work?

The rep at the NVIDIA dealer providing the trial says that it *may* be possible to arrange future testing on machines with bigger RAM. They have systems with 1TB RAM that retail for $100,000. I think we need three of them...
Old 2013-04-27, 03:51   #13
Uncwilly
6809 > 6502
 
 
"""""""""""""""""""
Aug 2003
101×103 Posts

265A₁₆ Posts

Quote:
Originally Posted by NBtarheel_33 View Post
Uncwilly, I can P-1 your number if 32GB is enough RAM. Will it need any TF first?
It is at 79, which is the GPU level for factoring (CPU level is 77). If you bump it up one or two levels, that is fine, either way.
Old 2013-04-27, 10:13   #14
pinhodecarlos
 
 
"Carlos Pinho"
Oct 2011
Milton Keynes, UK

3×17×97 Posts

NBtarheel_33,

Would you be willing to run a benchmark test on your machine with msieve?
Please look at this post:

http://www.mersenneforum.org/showpos...ostcount=7%29?

If you are on Windows, you can replace the msieve.exe binary with the one at http://gilchrist.ca/jeff/factoring/; if you are on Linux, the binary is here:

http://www.sendspace.com/file/aih4dw

Run under windows

start /low /min msieve.exe -v -nc -t 32

or under linux

./msieve -v -nc -t 32

Thank you in advance,

Carlos

PS: Benchmarks for comparison are here: http://www.mersenneforum.org/showthread.php?t=16348

Last fiddled with by pinhodecarlos on 2013-04-27 at 10:17
Old 2013-04-28, 10:53   #15
NBtarheel_33
 
 
"Nathan"
Jul 2008
Maryland, USA

5×223 Posts
Linux...where 16=32 or is it 16=8? Oh no...

So, either I'm dumb or Linux is, but I'm not sure why it is so difficult to spit out how many physical cores are on a system, and whether or not hyperthreading is on, and indeed, whether or not "32 cores available" really just means 16 hyperthreaded physical cores... And so it goes...

I think I have finally figured out the available resources once and for all. I have three systems. The first has 16 non-hyperthreaded cores, 16 GB RAM, and no GPUs. The second has 16 hyperthreaded cores, 32 GB RAM, and two K20 GPUs. And the third has 16 hyperthreaded cores, 32 GB RAM, and two K10 GPUs.
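For anyone untangling the same question: on x86 Linux, /proc/cpuinfo prints one block per logical CPU, and the `physical id` and `core id` fields identify the socket and the physical core within it. Below is a rough sketch, assuming those two fields are present (they can be missing on some architectures and inside some virtual machines). The names `count_cpus` and `SAMPLE` are made up for illustration and the sample text is not from the systems described here.

```python
def count_cpus(cpuinfo_text):
    """Return (logical CPUs, physical cores, sockets) from /proc/cpuinfo-style text."""
    logical = 0
    cores = set()    # unique (physical id, core id) pairs are physical cores
    sockets = set()  # unique physical id values are CPU packages
    # /proc/cpuinfo separates the per-CPU blocks with blank lines.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        if "physical id" in fields and "core id" in fields:
            sockets.add(fields["physical id"])
            cores.add((fields["physical id"], fields["core id"]))
    return logical, len(cores), len(sockets)


# Hypothetical sample: one socket, two cores, hyperthreading on.
SAMPLE = """\
processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 0
core id     : 1

processor   : 2
physical id : 0
core id     : 0

processor   : 3
physical id : 0
core id     : 1
"""

logical, physical, sockets = count_cpus(SAMPLE)
print(logical, physical, sockets)  # 4 2 1
print("hyperthreading on" if logical > physical else "hyperthreading off")
```

To try it on a real machine, pass the contents of /proc/cpuinfo to `count_cpus` instead of `SAMPLE`. If the logical count is twice the physical core count, "32 cores" really means 16 hyperthreaded cores.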

After several days of playing around with settings in mprime and trying to figure out just how in the hell /proc/cpuinfo counts and maps logical vs. physical CPUs, I believe that the best performance for running LLs comes by giving an entire CPU (i.e. 8 cores or 16 hyperthreads) to each LL. In the 50M range, this is yielding iteration times of 4.5ms or so, or 225,000 seconds = 2.6 days for the whole test. 30M doublechecks will complete in just under 24 hours. Suddenly what was a backlog of assignments is turning into not having enough to keep these beasts fed!
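The run-time figures in the paragraph above are iteration count times time per iteration. Here is a minimal sketch of that arithmetic, assuming a Lucas-Lehmer test of exponent p needs p - 2 squarings and that the iteration time stays constant for the whole run. The function names are illustrative and are not part of mprime.

```python
def ll_test_days(exponent, ms_per_iteration):
    """Estimated wall-clock days for one LL test at a fixed iteration time."""
    iterations = exponent - 2  # Lucas-Lehmer performs p - 2 squarings
    seconds = iterations * ms_per_iteration / 1000.0
    return seconds / 86400.0


def max_ms_for_deadline(exponent, hours):
    """Slowest iteration time (ms) that still finishes within the given hours."""
    iterations = exponent - 2
    return hours * 3600.0 * 1000.0 / iterations


# A 50M exponent at 4.5 ms/iteration, as quoted in the post.
print(round(ll_test_days(50_000_000, 4.5), 1))          # 2.6
# Iteration time a 30M doublecheck needs to finish inside 24 hours.
print(round(max_ms_for_deadline(30_000_000, 24), 2))    # 2.88
```

So the "just under 24 hours" figure for 30M doublechecks implies an iteration time a little below 2.88 ms, which is plausible given the smaller FFT size at that range.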

On the GPUs - running CUDALucas - I have a 54M and a 56M running on the K20s @ 4.1ms/iteration...which isn't really too much better than an 8-threaded CPU LL test. On the K10s, I am running a 49M @ 9.8ms/iteration, and an 82M @ 14.3ms/iteration. Seems as though if you have big CPU power, it's not as efficient to LL on the GPU...maybe the GPUs are better used for TF or P-1.

By the way, I think I grabbed up an old version of Prime95, because I have run into the dreaded huge roundoff bug. It doesn't actually damage the result, AFAICT, correct?
Old 2013-04-28, 11:00   #16
NBtarheel_33
 
 
"Nathan"
Jul 2008
Maryland, USA

1115₁₀ Posts

Quote:
Originally Posted by Uncwilly View Post
It is at 79, which is the GPU level for factoring (CPU level is 77). If you bump it up one or two levels, that is fine, either way.
K20s will be free in about 40 hours. I will take it at least to 80, maybe 81, then try P-1 with 32GB RAM on at least 8 cores, if not 16. (Of course, I have no idea how long P-1 will take, but I have login privileges until May 18th...)
Old 2013-04-28, 11:01   #17
NBtarheel_33
 
 
"Nathan"
Jul 2008
Maryland, USA

1115₁₀ Posts

Quote:
Originally Posted by pinhodecarlos View Post
NBtarheel_33,

Would you be willing to run a benchmark test on your machine with msieve?
Please look at this post:

http://www.mersenneforum.org/showpos...ostcount=7%29?

If you are on Windows, you can replace the msieve.exe binary with the one at http://gilchrist.ca/jeff/factoring/; if you are on Linux, the binary is here:

http://www.sendspace.com/file/aih4dw

Run under windows

start /low /min msieve.exe -v -nc -t 32

or under linux

./msieve -v -nc -t 32

Thank you in advance,

Carlos

PS: Benchmarks for comparison are here: http://www.mersenneforum.org/showthread.php?t=16348
I'd be happy to...does this just run on CPUs, or GPUs as well?
Old 2013-04-28, 11:03   #18
NBtarheel_33
 
 
"Nathan"
Jul 2008
Maryland, USA

45B₁₆ Posts

Quote:
Originally Posted by frmky View Post
CUDAPm1 on the K20 would be helpful.
Yes, I'm interested in trying this. I have some P-1's up in the 80-90M range that I'm thinking of trying.
Old 2013-04-28, 11:06   #19
NBtarheel_33
 
 
"Nathan"
Jul 2008
Maryland, USA

5×223 Posts

Quote:
Originally Posted by Puzzle-Peter View Post
Does it behave like one big machine, i.e. can software be run parallelized over several nodes? If so, try to prove 2^73845+14717 from the Five Or Bust project using PRIMO on 32 cores. It will still take many weeks though.
Unfortunately, there seems to only be the one master login node and two 16-core nodes that I can access. There doesn't seem to be a good way to put the two systems together to get a virtual 32-core system (this would be very nice, but I don't think they are going to go to all that trouble for a test cluster).
Old 2013-04-28, 11:09   #20
NBtarheel_33
 
 
"Nathan"
Jul 2008
Maryland, USA

1115₁₀ Posts

LOL, I wonder if they realized what kind of "Test Drive" we'd devise to put their systems through. I keep waiting for the "Hmm, you're going to have to scale back your demands on the system" e-mail...
Old 2013-04-28, 13:08   #21
bcp19
 
 
Oct 2011

7×97 Posts

Quote:
Originally Posted by NBtarheel_33 View Post
So, either I'm dumb or Linux is, but I'm not sure why it is so difficult to spit out how many physical cores are on a system, and whether or not hyperthreading is on, and indeed, whether or not "32 cores available" really just means 16 hyperthreaded physical cores... And so it goes...

I think I have finally figured out the available resources once and for all. I have three systems. The first has 16 non-hyperthreaded cores, 16 GB RAM, and no GPUs. The second has 16 hyperthreaded cores, 32 GB RAM, and two K20 GPUs. And the third has 16 hyperthreaded cores, 32 GB RAM, and two K10 GPUs.

After several days of playing around with settings in mprime and trying to figure out just how in the hell /proc/cpuinfo counts and maps logical vs. physical CPUs, I believe that the best performance for running LLs comes by giving an entire CPU (i.e. 8 cores or 16 hyperthreads) to each LL. In the 50M range, this is yielding iteration times of 4.5ms or so, or 225,000 seconds = 2.6 days for the whole test. 30M doublechecks will complete in just under 24 hours. Suddenly what was a backlog of assignments is turning into not having enough to keep these beasts fed!

On the GPUs - running CUDALucas - I have a 54M and a 56M running on the K20s @ 4.1ms/iteration...which isn't really too much better than an 8-threaded CPU LL test. On the K10s, I am running a 49M @ 9.8ms/iteration, and an 82M @ 14.3ms/iteration. Seems as though if you have big CPU power, it's not as efficient to LL on the GPU...maybe the GPUs are better used for TF or P-1.

By the way, I think I grabbed up an old version of Prime95, because I have run into the dreaded huge roundoff bug. It doesn't actually damage the result, AFAICT, correct?
I don't know that much about mprime, but I doubt your best performance comes from using the entire CPU on a single exponent. Run a benchmark and look at the iteration times for 1 through 8 cores. On mine at 3072K FFT (which is around the 50M exponent range) I get 16.680 for 1, 8.987 for 2, 6.541 for 3 and 5.842 for all 4. This means that 2 cores sharing one exponent give me only 92% of the throughput of those 2 running separately, and it drops to 85% with 3 and 71% with all 4.
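The percentages in this post can be reproduced from the benchmark figures: the efficiency of n cores on one exponent is the single-core time divided by n times the n-core time. Here is a small sketch using only the four timings quoted above. The function name is illustrative, and the timings are assumed to be per-iteration times in the same unit.

```python
def scaling_efficiency(times):
    """times[i] is the per-iteration time with i + 1 cores on one exponent.

    Returns, for each core count, the throughput of the shared run as a
    fraction of running that many independent single-core tests.
    """
    single = times[0]
    return [single / (cores * t) for cores, t in enumerate(times, start=1)]


# Benchmark figures quoted in the post (3072K FFT, 1 to 4 cores).
times = [16.680, 8.987, 6.541, 5.842]
for cores, eff in enumerate(scaling_efficiency(times), start=1):
    print(f"{cores} core(s): {eff:.1%} of separate single-core throughput")
```

This gives roughly 93%, 85% and 71% for 2, 3 and 4 cores, in line with the figures in the post. Running the same calculation on a 1 through 8 core benchmark from the 16-core nodes would show whether one exponent per CPU is really the best split there.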
Old 2013-04-28, 15:17   #22
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

7,537 Posts

Quote:
Originally Posted by NBtarheel_33 View Post
By the way, I think I grabbed up an old version of Prime95, because I have run into the dreaded huge roundoff bug. It doesn't actually damage the result, AFAICT, correct?
Fixed in 27.9. All earlier v27s had the bug. You are correct in that it does not damage the result.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.