mersenneforum.org  

2017-01-07, 12:05   #1
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England

6419₁₀ Posts
Second-hand CPU vs brand new GPU

A bargainhardware.com dual E5-2650v1 machine costs £300, uses 220 watts, and does 7.2ms per iteration on four 44M exponents in parallel

A GTX1080 costs £665, uses about 150 watts, and does 2.4ms per iteration on a single 45M exponent.

So the machine designed five years ago for no-holds-barred double-precision wins quite handily even without the benefits of AVX2, and I should probably stop running cudalucas.
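A quick worked check of this comparison, using only the figures quoted in the post. It assumes (not stated explicitly) that 7.2 ms is the per-iteration time of each of the four concurrent 44M tests, and it ignores the small 44M vs 45M exponent difference. Throughput is LL iterations per second summed across concurrent tests.

```python
# Figures as quoted in the post; the exponents differ slightly (44M vs 45M).
MACHINES = {
    "dual E5-2650 v1": {"price_gbp": 300, "watts": 220, "ms_per_iter": 7.2, "parallel": 4},
    "GTX 1080": {"price_gbp": 665, "watts": 150, "ms_per_iter": 2.4, "parallel": 1},
}


def summarise(m):
    """Aggregate LL iterations/s, capital cost per unit throughput, and energy per iteration."""
    iters_per_sec = m["parallel"] * 1000.0 / m["ms_per_iter"]
    return {
        "iters_per_sec": iters_per_sec,
        "gbp_per_iter_per_sec": m["price_gbp"] / iters_per_sec,
        # watts are joules per second, so dividing by iterations per second gives J/iteration
        "joules_per_iter": m["watts"] / iters_per_sec,
    }


results = {name: summarise(m) for name, m in MACHINES.items()}
for name, r in results.items():
    print(f"{name}: {r['iters_per_sec']:.0f} iter/s, "
          f"£{r['gbp_per_iter_per_sec']:.2f} per iter/s, {r['joules_per_iter']:.3f} J/iter")
```

On these numbers the CPU box gives about a third more iterations per second for less than half the price, while the GPU is marginally ahead on energy per iteration (about 0.36 J vs 0.40 J). If the 150 W figure is for the card alone, the host machine's draw would have to be added to the GPU side.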

Running the equivalent comparison for ECM now.

Last fiddled with by fivemack on 2017-01-07 at 12:06
2017-01-07, 12:28   #2
Gordon
Nov 2008

765₈ Posts

Quote:
Originally Posted by fivemack
A bargainhardware.com dual E5-2650v1 machine costs £300, uses 220 watts, and does 7.2ms per iteration on four 44M exponents in parallel

A GTX1080 costs £665, uses about 150 watts, and does 2.4ms per iteration on a single 45M exponent.

So the machine designed five years ago for no-holds-barred double-precision wins quite handily even without the benefits of AVX2, and I should probably stop running cudalucas.

Running equivalent comparison for ECM now
Isn't the real question why on earth you would run LL tests on a GPU? The 1080 is a beast at running TF.
2017-01-07, 12:38   #3
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England

7²·131 Posts

Quote:
Originally Posted by Gordon
Isn't the real question, why on earth would you run LL testing on a GPU? The 1080 is a beast at running TF.
I get the impression there is more than enough GPU TF effort already in place. Mostly I got the GPU for factorisation, running ECM and polynomial selection, which it does pretty well.
2017-01-07, 15:10   #4
ATH
Einyen
Dec 2003
Denmark

3⁵×13 Posts

The GTX 1080 is one of the newer cards with DP at 1/32 of SP performance (257 GFLOPS vs 8228 GFLOPS), so it is not the best choice for LL tests. There is a real need for a new consumer card with better DP performance.
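The quoted peaks follow from the usual shaders × clock × 2 FLOPs (one fused multiply-add per clock) formula. A minimal sketch, assuming the published GTX 1080 figures of 2560 CUDA cores at a 1607 MHz base clock and, for contrast, a dual E5-2650 v1 with 2 × 8 cores at 2.0 GHz doing 8 DP FLOPs per clock per core with AVX; none of these specs are stated in the thread.

```python
def peak_gflops(cores, clock_mhz, flops_per_clock=2):
    """Theoretical peak: cores x clock x FLOPs per core per clock (2 for one FMA)."""
    return cores * clock_mhz * flops_per_clock / 1000.0


gtx1080_sp = peak_gflops(2560, 1607)  # assumed: 2560 CUDA cores at 1607 MHz base clock
gtx1080_dp = gtx1080_sp / 32          # FP64 runs at 1/32 of the FP32 rate
# Assumed: 2 sockets x 8 cores at 2.0 GHz, AVX giving 8 DP FLOPs per clock per core
dual_e5_dp = peak_gflops(2 * 8, 2000, flops_per_clock=8)

print(f"GTX 1080: SP {gtx1080_sp:.0f}, DP {gtx1080_dp:.0f} GFLOPS")
print(f"Dual E5-2650 v1: DP {dual_e5_dp:.0f} GFLOPS")
```

On paper the two land at almost the same peak DP (about 257 vs 256 GFLOPS), which is in line with the old CPU holding its own in fivemack's LL timings. Real LL throughput also depends on memory bandwidth, so treat these as upper bounds.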
2017-01-07, 18:09   #5
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

2²×733 Posts

Apparently AMD's Vega, due this year, will have 1/16-rate DP.

That dual processor machine looks like a good deal for LL.
2017-01-07, 18:47   #6
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

3⁶×13 Posts

Quote:
Originally Posted by fivemack
.., I should probably stop running cudalucas.
Amen to that!

In contrast to cudalucas (which calls some vanilla FFT and hits all the bottlenecks NVIDIA artificially imposes), one should really run a DWT or NTT algorithm (like GeneFer or Cyclo) to make GPUs shine.
Attached file: geneferMath.pdf (PDF, 252.9 KB)
2017-01-07, 19:11   #7
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

1D66₁₆ Posts

Quote:
Originally Posted by Batalov
In contrast to cudalucas (which makes calls to some vanilla FFT and hits all the artificially NVIDIA-imposed bottlenecks), one should really want to run some DWT or NTT algorithm.
CUDALucas does use a DWT algorithm.

I briefly explored an all-integer solution (not an NTT, though). My conclusion was that I was unlikely to beat the current CUDALucas timings significantly; IIRC, the difference would have been roughly ±20%. I think this was on a 6xx-series GPU.
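For readers unfamiliar with the DWT that CUDALucas uses: the usual approach for LL tests is the Crandall-Fagin irrational-base discrete weighted transform (IBDWT). The toy pure-Python sketch below is not CUDALucas's code. The helper names, the recursive FFT and the cap of about 16 bits per digit are illustrative assumptions. It splits s into variable-width digits, scales them by weights in [1, 2) so that a plain cyclic FFT convolution yields s² mod 2^p - 1 directly with no zero-padding, and then rounds, subtracts 2 and propagates carries with wrap-around.

```python
import cmath
import math


def fft(a, invert=False):
    """Recursive radix-2 complex FFT (unscaled); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], invert), fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def offsets(p, n):
    """Start bit of each variable-width digit: ceil(p*j/n) for j = 0..n."""
    return [-(-p * j // n) for j in range(n + 1)]


def to_digits(x, p, n):
    off = offsets(p, n)
    return [(x >> off[j]) & ((1 << (off[j + 1] - off[j])) - 1) for j in range(n)]


def from_digits(d, p, n):
    off = offsets(p, n)
    return sum(v << off[j] for j, v in enumerate(d)) % ((1 << p) - 1)


def square_minus_2(d, p, n):
    """One LL step, s -> s^2 - 2 mod 2^p - 1, via an irrational-base weighted FFT."""
    off = offsets(p, n)
    w = [2.0 ** (off[j] - p * j / n) for j in range(n)]  # DWT weights, all in [1, 2)
    spec = fft([d[j] * w[j] for j in range(n)])
    conv = fft([z * z for z in spec], invert=True)  # n times the weighted cyclic convolution
    z = [round(conv[j].real / (n * w[j])) for j in range(n)]  # unweight and round to integers
    z[0] -= 2
    carry = 0
    while True:  # a carry out of the top digit has weight 2^p = 1 mod M, so it wraps to digit 0
        for j in range(n):
            width = off[j + 1] - off[j]
            v = z[j] + carry
            z[j], carry = v & ((1 << width) - 1), v >> width
        if carry == 0:
            return z


def lucas_lehmer(p):
    """True iff 2^p - 1 is prime, for p = 2 or an odd prime p."""
    if p == 2:
        return True
    n = 1
    while p / n > 16:  # keep digits at most ~16 bits so double-precision rounding stays safe
        n *= 2
    d = to_digits(4, p, n)
    for _ in range(p - 2):
        d = square_minus_2(d, p, n)
    return from_digits(d, p, n) == 0


print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 61, 89, 107, 127) if lucas_lehmer(p)])
```

Production codes use balanced digits, much larger transforms and careful round-off checks; this sketch only shows why the weighting lets the FFT length equal the number of digits.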
2017-01-07, 19:35   #8
mackerel
Feb 2016
UK

660₈ Posts

Quote:
Originally Posted by fivemack
A bargainhardware.com dual E5-2650v1 machine costs £300, uses 220 watts, and does 7.2ms per iteration on four 44M exponents in parallel
They're taking offers on eBay at a nominal £250 for the 32GB model. They seem to use 4GB dual-rank (2R) modules, so you shouldn't run into RAM bandwidth problems. When I bought the 64GB model I offered a bit below their asking price and they accepted.

It would also be fairer to compare against a used previous-generation card. Taking the 980 Ti as an example, it has approximately 2/3 of the 1080's rated boost SP FLOPS at a target cost under half that of a 1080, although its TDP is higher at 250W.

If you really want DP, what about the R9 280X? It was possibly the last fast consumer card before they started to cripple DP. A quick look on eBay shows them under £150, which gets you a ballpark 1 TFLOPS of DP. Still a 250W TDP, though. If anyone can give me idiot-proof instructions for benchmarking it, I can run it on mine. I've lowered the voltage with a BIOS mod, so in practice it only draws around 200W now.
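The "about 2/3" and "ballpark 1 DP TFLOP" figures can be reproduced from reference specs. A sketch assuming boost clocks of 1733 MHz for the GTX 1080 (2560 cores) and 1075 MHz for the 980 Ti (2816 cores), plus 2048 stream processors at 1000 MHz with a 1/4 DP rate for the R9 280X. These specs are assumptions taken from published reference figures, not benchmarks.

```python
def peak_gflops(shaders, clock_mhz):
    """Theoretical single-precision peak: one FMA (2 FLOPs) per shader per clock."""
    return shaders * clock_mhz * 2 / 1000.0


# Assumed reference specs at boost clock; real cards vary with cooling and BIOS.
gtx_1080_sp = peak_gflops(2560, 1733)
gtx_980ti_sp = peak_gflops(2816, 1075)
r9_280x_sp = peak_gflops(2048, 1000)
r9_280x_dp = r9_280x_sp / 4  # assumed 1/4-rate FP64 on this card

print(f"980 Ti / 1080 SP ratio: {gtx_980ti_sp / gtx_1080_sp:.2f}")  # prints 0.68
print(f"R9 280X DP peak: {r9_280x_dp:.0f} GFLOPS")  # prints 1024
```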

Quote:
Originally Posted by ATH
The GTX 1080 is one of the newer cards with DP at 1/32 of SP performance (257 GFLOPS vs 8228 GFLOPS), so it is not the best choice for LL tests. There is a real need for a new consumer card with better DP performance.
Unfortunately probably not going to happen, unless you can find a compelling consumer DP requirement. If anything, the trend seems to be going the other way, with ever more FLOPS at lower precision.
2017-01-07, 19:38   #9
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

3⁶·13 Posts
Re: DWT

Right! (I must have been thinking of llrCUDA, which simply calls FFTW.)

NTT gave GeneFer a new life: the b ranges were extended, and now that it is implemented in OpenCL it is free of NVIDIA's shackles.
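To show what an NTT buys over a floating-point FFT, here is a generic sketch of exact big-integer squaring with a number-theoretic transform. It is not GeneFer's implementation: the modulus 998244353 = 119·2^23 + 1 (primitive root 3), the 8-bit digits and the helper names are illustrative assumptions. All arithmetic is exact modulo a prime, so there is no round-off to control. The trade-off is that every convolution coefficient must stay below the modulus.

```python
MOD = 998244353  # 119 * 2**23 + 1: supports power-of-two transforms up to length 2**23
ROOT = 3         # a primitive root modulo MOD


def ntt(a, invert=False):
    """Iterative radix-2 number-theoretic transform over Z/MOD; len(a) must be a power of two."""
    n = len(a)
    a = list(a)
    j = 0
    for i in range(1, n):  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)  # primitive length-th root of unity
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)  # its inverse, by Fermat's little theorem
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u, v = a[k], a[k + half] * w % MOD
                a[k], a[k + half] = (u + v) % MOD, (u - v) % MOD
                w = w * w_len % MOD
        length *= 2
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


def ntt_square(x, base_bits=8):
    """Exact x*x for a non-negative int, via NTT convolution of base-2**base_bits digits."""
    mask = (1 << base_bits) - 1
    digits = []
    while x:
        digits.append(x & mask)
        x >>= base_bits
    if not digits:
        return 0
    # Each coefficient of the square is at most len(digits) * mask**2 and must stay below MOD.
    assert len(digits) * mask * mask < MOD, "too many digits for an exact single-prime NTT"
    n = 1
    while n < 2 * len(digits):  # room for the full linear (not cyclic) convolution
        n *= 2
    spec = ntt(digits + [0] * (n - len(digits)))
    coeffs = ntt([v * v % MOD for v in spec], invert=True)
    return sum(c << (base_bits * i) for i, c in enumerate(coeffs))


print(ntt_square(12345678901234567890) == 12345678901234567890 ** 2)  # True
```

A single 30-bit prime limits the transform to small digits. Longer or denser transforms need several primes, or a different modulus, which this sketch does not attempt.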
2017-01-07, 22:57   #10
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

2²×733 Posts

Quote:
Originally Posted by mackerel
Unfortunately probably not going to happen, unless you can find a compelling consumer DP requirement. If anything, the trend seems to be going the other way, with ever more FLOPS at lower precision.
Rumour has it that Vega 20 will have 1/2-rate DP in 2018.
2017-01-07, 23:06   #11
pepi37
Dec 2011
After milion nines:)

5·17² Posts

Can llrCUDA be rewritten as llrocl?

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.