mersenneforum.org  

Old 2011-07-26, 23:39   #23
davieddy
 
"Lucan"
Dec 2006
England

1100101001010₂ Posts

Quote:
Originally Posted by Christenson
I'll concur, as long as Davieddy remembers to take the log base 2 of 128 and gets 7. I hadn't even begun to think in terms of the population of available machines.
I said GPU 64x faster, you said 65x faster and said 8 extra bits.
I would have thought a programmer might know that 2^8=256,
even if he ate quiche.
You tried to wriggle by changing it to 128x faster, and conveniently
ignoring the fact that 70-78 will take twice as long as 77-78.
I say again, 'fess up: you were "thinking" 64=8^2 instead of 2^6.
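
The bit arithmetic in dispute is easy to check. A minimal sketch, assuming (as the posts do) that each extra bit level of trial factoring costs twice the previous one, so a device N times faster buys log2(N) extra bits in the same time. The function names are illustrative only:

```python
import math

def extra_bits(speedup):
    """Extra TF bit levels an N-times-faster device covers in the same time,
    assuming each additional bit level costs twice the previous one."""
    return math.log2(speedup)

def tf_cost(from_bit, to_bit):
    """Relative cost of trial factoring from 2^from_bit to 2^to_bit,
    in units where the first level (from_bit -> from_bit + 1) costs 1."""
    return sum(2 ** k for k in range(to_bit - from_bit))

print(extra_bits(64))    # 6.0
print(extra_bits(128))   # 7.0
print(extra_bits(256))   # 8.0

# Whole range 70 -> 78 compared with the last level 77 -> 78 alone,
# which costs 2**7 in the same units.
print(tf_cost(70, 78) / 2 ** 7)
```

The last ratio comes out just under 2, which is the "twice as long" point made above.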

I worked out that a hot GPU could TF 3 bits over the CPU limit
for 500 exponents in the same time it took a CPU to do an LL test,
eliminating ~20 exponents.
Well worth it surely.
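
The "~20 exponents" figure is roughly what the commonly cited GIMPS rule of thumb gives: the chance of a factor between 2^x and 2^(x+1) is about 1/x. A rough sketch under that heuristic; the starting level of 70 bits is an assumption made for illustration, not a number stated in this post, and summing the per-level chances slightly overcounts:

```python
def expected_factors(n_exponents, start_bit, extra_bits):
    """Expected number of exponents eliminated by trial factoring
    n_exponents candidates from 2^start_bit to 2^(start_bit + extra_bits),
    using the heuristic P(factor between 2^x and 2^(x+1)) ~ 1/x."""
    p = sum(1.0 / x for x in range(start_bit, start_bit + extra_bits))
    return n_exponents * p

# start_bit=70 is an assumed CPU limit, not a figure from the post
print(round(expected_factors(500, 70, 3), 1))
```

With these inputs the estimate lands at about 21 exponents.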

I asked you to agree that this meant using a GPU for
LL testing was crazy. You declined to answer without reason.
Quote:
There's a *lot* of operations in an LL test. Suppose our exponent is 40M. We have
40M iterations of
2*40M log2(40M) operations in the FFTs

Letting log2(40M) be 25, we have:
N = 40M * 40M * 50 arithmetic ops to do.
With a hot GPU, with 700M ops/sec * 500 CPUs, this will take
40M * 40M * 50 /700M/500 =~ 4 * 40M /700 =160M/700 =~ 200K secs
=~3 days

So this isn't TOO far off....
Too far off what exactly?
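
The quoted estimate can be reproduced in a few lines. This is only a sketch of the arithmetic as quoted (log2(40M) rounded to 25, a rate of 700M ops/sec times 500); it says nothing about how a real LL implementation counts operations:

```python
import math

p = 40_000_000                 # exponent, as in the quoted post
ops_per_iter = 2 * p * 25      # 2*p*log2(p), with log2(40M) rounded to 25 as quoted
total_ops = p * ops_per_iter   # about p iterations, one FFT squaring each
rate = 700e6 * 500             # quoted throughput: 700M ops/sec times 500

seconds = total_ops / rate
print(f"{total_ops:.1e} ops, {seconds:,.0f} s, {seconds / 86400:.1f} days")
print(round(math.log2(p), 1))  # the unrounded log2(40M), for comparison
```

That gives roughly 229K seconds, a little over two and a half days, so the quoted "200K secs =~ 3 days" is the right order of magnitude.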

David

P.S.
We have been saying a hot GPU is 100x faster at TF than "a CPU".
What CPU are we all talking about here?

Last fiddled with by davieddy on 2011-07-27 at 00:00
Old 2011-07-27, 03:31   #24
Christenson
 
Dec 2010
Monticello

3403₈ Posts

Davie:
You'll have to trust me, I was thinking 100 =~127 = 2^7.....and smoking something to turn 7 into 8. It has a lot to do with a 20C gradient across a heatsink where there's not supposed to be one at work right now, and a million dollars or more (including my salary) hanging on the answer, and management amazingly uninterested in the data. I really need this vacation!

Now, working out that your hot GPU can eliminate 20 exponents with 3 extra bits in the time it takes a CPU to do 1 LL test (half the total work) says 3 extra bits is too few....theorem: We are at optimum in a continuous-parameter problem when we are indifferent to small adjustments of the parameters.
The hot GPU can do a 4th bit and eliminate approximately 7 exponents while the CPU does the second LL test, and so on....
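
The marginal argument above can be tabulated. A rough sketch, assuming the 1/x factor heuristic commonly cited for GIMPS, a hypothetical starting level of 70 bits, and that the first three extra bits together cost one CPU LL test (davieddy's estimate), with each further bit doubling in cost:

```python
def marginal_table(n_exponents=500, start_bit=70, max_extra=6):
    """For each extra bit level, return (extra_bit, cost, eliminated):
    cost is GPU time in units of one CPU LL test, eliminated is the
    expected number of exponents removed at that level alone."""
    unit = 1.0 / 7.0   # levels 1..3 cost 1+2+4 = 7 units = one LL test (assumed)
    rows = []
    for k in range(1, max_extra + 1):
        cost = unit * 2 ** (k - 1)
        eliminated = n_exponents / (start_bit + k - 1)
        rows.append((k, cost, eliminated))
    return rows

for k, cost, eliminated in marginal_table():
    print(f"bit +{k}: {cost:5.2f} LL-test times, ~{eliminated:.1f} exponents")
```

Under these assumptions the 4th bit costs a little more than one LL-test time and removes about 7 exponents, in line with the figure above, and the yield per unit of time roughly halves with every further level.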

The actual numbers are a bit more complicated because mfaktc uses 1-2 cores to accomplish its work, and those cores aren't doing LL.

Do I think LL on a GPU is crazy? No way...I am calculating roughly how many arithmetic operations are involved in an attempt to show why LLs are going to be slow....even on GPUs. We have a *lot* of LL tests to do, and one approach is to take all the help we can get. Especially if the GPU firepower gets 5,6,7 bits ahead of the original CPU limit, it makes sense to set those GPUs doing LL tests.

In fact, I intend to be setting half or more of my GT480 card to doing LL tests when I return from vacation.

It is an accident involving the purchase of a GT210 for work that won't run CUDALucas that I am working on mfaktc. I feel a strong need to finish that project before moving on; talk is cheap, working, correct code has real value. I want to be remembered for working code.

P.S. The CPU is the "mythical average" CPU, with 2.2 children, born in the last year or two.... 4GHz days credit per day per core is a good number....
Old 2011-07-27, 05:09   #25
davieddy
 
"Lucan"
Dec 2006
England

194A₁₆ Posts

Quote:
Originally Posted by Christenson
Davie:
You'll have to trust me, I was thinking 100 =~127 = 2^7.....and smoking something to turn 7 into 8....
I really need this vacation!
After a few Special Brews, all now becomes clear:

127=2^7 therefore 128=2^8

Quote:
Now, working out that your hot GPU can eliminate 20 exponents with 3 extra bits in the time it takes a CPU to do 1 LL test (half the total work) says 3 extra bits is too few....

Especially if the GPU firepower gets 5,6,7 bits ahead of the original CPU limit, it makes sense to set those GPUs doing LL tests.

P.S. The CPU is the "mythical average" CPU, with 2.2 children, born in the last year or two.... 4GHz days credit per day per core is a good number....
I know it's a bit too soon to judge, but
http://www.mersenne.info/trial_facto...ta/2/50000000/
suggests that we haven't even reached the CPU limit in some cases,
and 1 extra bit is a rarity.
How many GPUs are actually working on this?

Time for some realistic numbers.
200 LL tests completed per day
Our hot GPU can TF 500 exponents to 3 extra bits in 50 days.
10 per day.
We need 20 hot GPUs on the job 24/7 to cope.
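
The head count follows directly from the three figures above. A small sketch of that arithmetic, assuming every exponent that gets an LL test should first be taken 3 bits deeper by a GPU:

```python
import math

ll_tests_per_day = 200        # project-wide LL completions per day (from the post)
exponents_per_batch = 500     # exponents one GPU takes 3 bits deeper
days_per_batch = 50           # time for that batch on one "hot" GPU

per_gpu_per_day = exponents_per_batch / days_per_batch
gpus_needed = math.ceil(ll_tests_per_day / per_gpu_per_day)
print(per_gpu_per_day, gpus_needed)   # 10.0 20
```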

Is this too much to hope for?
We'll see in a month or two!

David
Old 2011-07-27, 15:48   #26
Rodrigo
 
Jun 2010
Pennsylvania

947 Posts

Quote:
Originally Posted by Christenson
Indeed the LL on the GPU leaves the CPU core(or cores) largely available for whatever you want to point them at....thus my *dreaming* in the previous post.
This is kinda exciting! In effect the GPU (at ~1.6x, for the GT 430) serves as nearly two additional cores.

You know what I'll be buying this week...

Rodrigo
Old 2011-07-27, 16:06   #27
davieddy
 
"Lucan"
Dec 2006
England

2·3·13·83 Posts

Quote:
Originally Posted by Rodrigo
This is kinda exciting! In effect the GPU (at ~1.6x, for the GT 430) serves as nearly two additional cores.

You know what I'll be buying this week...

Rodrigo
Not sure what you mean here, but have you checked that 1.6 isn't
"off by one" (or three)?


David
Old 2011-07-27, 17:29   #28
Rodrigo
 
Jun 2010
Pennsylvania

1663₈ Posts

Quote:
Originally Posted by davieddy
Not sure what you mean here, but have you checked that 1.6 isn't
"off by one" (or three)?


David
David,

Check out post #18 in this thread. Also see the first post, where the CPU and GPU specs are given. What is kjaget's cited ~1.6x improvement relative to? I've been taking it to mean, GHz-days/day for the given GPU in CUDALucas relative to the given CPU's Prime95 LL productivity.

Rodrigo
Old 2011-07-29, 15:00   #29
kjaget
 
Jun 2005

81₁₆ Posts

Quote:
Originally Posted by Rodrigo
David,

Check out post #18 in this thread. Also see the first post, where the CPU and GPU specs are given. What is kjaget's cited ~1.6x improvement relative to? I've been taking it to mean, GHz-days/day for the given GPU in CUDALucas relative to the given CPU's Prime95 LL productivity.
It was current GPU LL performance on first time LL tests vs. the theoretical improvement if we had code which efficiently used non-power of two FFTs in the GPU code. See e.g. http://www.ece.neu.edu/groups/nucar/...iles/thall.pdf, table 8.

For actual performance of the current implementation, first time tests take ~5 days on a top of the line GPU. That's what, like 4x faster than a single core on an average recent CPU? I'm not sure how other cards scale.
Old 2011-07-29, 15:41   #30
Rodrigo
 
Jun 2010
Pennsylvania

947₁₀ Posts

Quote:
Originally Posted by kjaget
It was current GPU LL performance on first time LL tests vs. the theoretical improvement if we had code which efficiently used non-power of two FFTs in the GPU code. See e.g. http://www.ece.neu.edu/groups/nucar/...iles/thall.pdf, table 8.

For actual performance of the current implementation, first time tests take ~5 days on a top of the line GPU. That's what, like 4x faster than a single core on an average recent CPU? I'm not sure how other cards scale.
Ahh, thanks very much.

A very informative paper, BTW, even if the math is above my pay grade. Table 3 (that must be the one you meant) was particularly interesting.

Rodrigo
Old 2011-07-29, 16:31   #31
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

1001100000001₂ Posts

Quote:
Originally Posted by kjaget
It was current GPU LL performance on first time LL tests vs. the theoretical improvement if we had code which efficiently used non-power of two FFTs in the GPU code. See e.g. http://www.ece.neu.edu/groups/nucar/...iles/thall.pdf, table 8.

For actual performance of the current implementation, first time tests take ~5 days on a top of the line GPU. That's what, like 4x faster than a single core on an average recent CPU? I'm not sure how other cards scale.
Any news about the source code?

Luigi
Old 2011-08-01, 19:44   #32
kjaget
 
Jun 2005

3×43 Posts

Quote:
Originally Posted by ET_
Any news about the source code?

Luigi
Nothing so far, but there's no reason I'd know any quicker than the rest of you. Come to think of it, probably lots of reasons I'd know after many of you. I'm always a bit behind on these things...

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
