![]() |
|
|
#12 | |
|
"Lucan"
Dec 2006
England
194A16 Posts |
Quote:
![]() I said 64x faster. It could TF 75-76 in same time as the CPU takes for 69-70 It could TF 70-75 in same time as the CPU takes for 69-70 "Off by 3" more like. Just admit it: you "thought" (if that word is appropriate here) that the sqrt of 64 was 8. David Edit: If you are suggesting that the GPU is going to spend as long TFing one exponent as the CPU takes for an LL test (and it sounds like you are) it could TF from 70-81. Off by 3 again Last fiddled with by davieddy on 2011-07-26 at 03:45 |
|
|
|
|
|
|
#13 | |
|
"Lucan"
Dec 2006
England
2·3·13·83 Posts |
...I'm going to anyway.
Note the first line: Quote:
more than using it for LL? Last fiddled with by davieddy on 2011-07-26 at 06:09 |
|
|
|
|
|
|
#14 | |
|
"Lucan"
Dec 2006
England
2·3·13·83 Posts |
Quote:
Testing one potential factor is a separate operation from testing another. David |
|
|
|
|
|
|
#15 | |
|
Jun 2010
Pennsylvania
947 Posts |
Quote:
![]() Would you be so kind as to elaborate? I'm a words guy rather than a numbers guy. Imagine you were writing "The Complete Idiot's Guide to GPU Computing." ![]() I'll be more specific. What you said about testing one potential factor vs. testing another sounds self-evident, but I don't see what in particular it has to do with the GHz-day credits one gets for doing TF vs. LL, on a graphics card. Simply trying to understand this process (GPU computing). Normally I prefer to have a working idea of how a new (to me) process works before I start on it, but it's entirely possible I won't really "get" it till I actually do it. Rodrigo |
|
|
|
|
|
|
#16 |
|
"Lucan"
Dec 2006
England
11001010010102 Posts |
I am conjecturing in the dark, and would also like elucidation.
This is simply a reasonable guess: Suppose GPUs have lots of rudimentary processors which can test lots of factors simultaneously. LL iterations have to be performed in order (series), so can't take advantage of this. Up to a point, I think chunks of the FFT can be done simultaneously (in parallel) giving GPUs some edge over CPUs. That's enough wild guessing, and I am more than happy to be shot down! (You were right to direct that at Christenson, but there are many here who could give a good simple answer) David |
|
|
|
|
|
#17 |
|
Jun 2010
Pennsylvania
947 Posts |
Ahh, that makes a LOT of sense, thank you.
Let's see if anybody else confirms it. Rodrigo |
|
|
|
|
|
#18 |
|
Jun 2005
3·43 Posts |
Right now there's also other details slowing down CUDALucas GPU LL tests. Unlike CPU tests, the GPU libraries are really slow doing non-power of 2 FFTs. So where a CPU can use, say, a 2560K FFT length for current 1st time LL tests, a GPU is most efficient doing the same test with a 4096K FFT length.
I'm not sure if this is an inherent problem with GPUs or just a limitation of the current CUDA libraries. Assuming you can get the same FFT length for both CPU and GPU to run efficiently and those FFT run times scale similarly on both platforms, you can get a ~ 1.6x improvement in GPU LL throughput. Also remember that TF on a GPU currently uses not just the GPU, but the GPU plus around 2 CPU cores. LL testing on a GPU uses basically no CPU time. Neither of these make the case for LL testing on a GPU even in theory, but they do move the ratio a bit closer. The latter real-life issue may change the calculations used to figure out how deep to efficiently factor using GPUs - or maybe not, since you can't factor to a fraction of a bit level and that may be all it changes the end result. Last fiddled with by kjaget on 2011-07-26 at 20:01 |
|
|
|
|
|
#19 |
|
Jun 2010
Pennsylvania
947 Posts |
kjaget,
Very informative, thanks! The ~1.6x improvement jibes with the figures I've seen. But learning that LL testing on a GPU uses little CPU time is intriguing -- supposing that you have a dual-core PC, does that mean that you could do LL on the GPU and still have both CPU cores available for Prime95? Rodrigo |
|
|
|
|
|
#20 | |
|
Dec 2010
Monticello
5×359 Posts |
Quote:
OK, the TF algorithm goes like this: Sieve, on the CPU, for reasonable, probably-prime factor candidates. (I *dream* of having this happen on the GPU instead). Take each of those reasonable candidates, hand it to a GPU core, and have it compute result= 2^Exponent mod candidate if (result = 1), then report a factor found. The GPU can handle hundreds of these candidates in parallel, someone said "TF is embarrasingly parallel". The LL test goes something like this: Start with 4 For (exponent-2) number of times { Square your number (Take its FFT, multiply it term-wise by itself, take its inverse FFT) subtract 2 Take it modulo 2^exponent-1 } If the result is zero, then M^exponent -1 is prime Remark 1: The LL test for a given exponent is inherently a serial process. Remark 2: The fourier transform itself can be made fairly block parallel, so the GPU CPUs are probably kept busy pretty constantly. Specifically, in the classical power of 2 FFT, at, say, the 4th level, two adjecent blocks of 16 numbers are combined to make one block of 32 transformed numbers. Such a process shouldn't be difficult to get multiple threads working on together. Remark 3: There's a *lot* of operations in an LL test. Suppose our exponent is 40M. We have 40M iterations of 2*40M log2(40M) operations in the FFTs Letting log2(40M) be 25, we have: N = 40M * 40M * 50 arithmetic ops to do. With a hot GPU, with 700M ops/sec * 500 CPUs, this will take 40M * 40M * 50 /700M/500 =~ 4 * 40M * /700 =160M/700 =~ 200K secs =~3 days So this isn't TOO far off.... |
|
|
|
|
|
|
#21 | |
|
Dec 2010
Monticello
70316 Posts |
Quote:
|
|
|
|
|
|
|
#22 |
|
Jun 2011
Henlopen Acres, Delaware
7×19 Posts |
They happen often enough in the programming world, they get their own nickname: Fencepost errors
http://en.wikipedia.org/wiki/Off-by-one_error |
|
|
|
![]() |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Looking at new GPU card for less than $450 | section31 | GPU Computing | 4 | 2016-01-19 17:04 |
| Card Tricks | davar55 | Hobbies | 11 | 2013-05-27 14:28 |
| card probability | TimSorbet | Math | 8 | 2007-09-25 20:00 |
| Physics Card | JCoveiro | Software | 4 | 2006-11-30 04:46 |
| Card Games | Uncwilly | Hobbies | 1 | 2006-06-03 12:45 |