#232 |
"Rob Gahan"
Aug 2013
Ireland
2²×3² Posts |
Iteration 289390000 M( 332233123 )C, 0xffffffff80000000, n = 2097152, CUDALucas v2.03 err = 0.0000 (0:45 real, 4.4768 ms/iter, ETA 53:16:24)
Iteration 289400000 M( 332233123 )C, 0xffffffff80000000, n = 2097152, CUDALucas v2.03 err = 0.0000 (0:45 real, 4.4787 ms/iter, ETA 53:17:03)
Iteration 289410000 M( 332233123 )C, 0xffffffff80000000, n = 2097152, CUDALucas v2.03 err = 0.0000 (0:44 real, 4.4792 ms/iter, ETA 53:16:38)

Does this not look OK? So err = 0.0000 is not good, then? In that case 20971520 was probably the correct FFT length after all? I wouldn't mind, but I was a fifth of the way through when I changed. Maybe I'll continue where I left off and report back in 152 days :-( I'm new to this... I'll get there... eventually ;-) |
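As an aside (not from the thread itself): progress lines like the ones above can be sanity-checked mechanically. The sketch below is my own illustration; the function name, regex, and thresholds are assumptions, not anything CUDALucas provides. It extracts the reported roundoff error and flags the two bad cases: an error of exactly zero, and an error creeping up toward 0.5.

```python
import re

# A progress line like the ones quoted above (copied from the post).
LINE = ("Iteration 289390000 M( 332233123 )C, 0xffffffff80000000, "
        "n = 2097152, CUDALucas v2.03 err = 0.0000")

def check_err(line, high=0.4):
    """Classify the roundoff error reported on a CUDALucas progress line.

    A healthy run reports a small but non-zero err; exactly 0.0000 or a
    value near 0.5 both suggest something is wrong with the FFT setup.
    (These thresholds are illustrative, not official.)
    """
    m = re.search(r"err = ([0-9.]+)", line)
    if not m:
        return "no err field"
    err = float(m.group(1))
    if err == 0.0:
        return "suspicious: err is exactly zero"
    if err > high:
        return "dangerous: err is getting close to 0.5"
    return "ok"

print(check_err(LINE))
```

Run against the line quoted above, this flags the exactly-zero error; against the later logs in this thread (err around 0.03) it would report "ok".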
#233 |
|
If I May
"Chris Halsall"
Sep 2002
Barbados
10011000100111₂ Posts |
#234 |
|
"Rob Gahan"
Aug 2013
Ireland
2²·3² Posts |
#235 |
|
"Kieren"
Jul 2011
In My Own Galaxy!
2×3×1,693 Posts |
#236 |
|
Romulan Interpreter
Jun 2011
Thailand
7²·197 Posts |
@Robish: FFT is used to multiply big numbers very fast. Imagine you have to multiply two 1000-digit numbers by pencil: you will have to do one million digit multiplications. The last digit of the second number times every digit of the first, then the next-to-last digit of the second times every digit of the first, and so on. This could take most of your life. Now imagine numbers of one million digits: multiplying them needs a million times a million, i.e. one trillion, digit operations. None of your children, or their children, or any of your descendants would ever finish...

And for an LL test we are talking about numbers of a hundred million digits, which have to be multiplied a hundred million times. To make things much faster, the integers are transformed into "some real/complex numbers", then "multiplied fast" using some tricks, then transformed back to integers. Because we cannot store real numbers exactly in a computer (think of an infinite number of decimals), only approximations of them, this approximation error propagates through the transforms, so every time we transform back to integers we have to check how big the error is. Thinking like an engineer, as someone said here a few days ago, we do an "approximate multiplication" which is much faster (well, not really, but to draw a parallel...).

Both a "too big" error and a "too small" error are bad. With a too-big error you risk getting a different integer when you round the numbers back. Imagine you use analog circuitry to multiply 5 and 3, much faster than a CPU reading its multiplication table: you put 5V into one input of an opamp, 3V into the other, and you instantly get 15.2 (or 14.8) at the output. That is no problem: you know the result must be an integer, so you round it to 15 in your head. But if you get something like 15.8, or even 19.5V, you need a more accurate opamp... On the other hand, imagine your opamps only work for input voltages above 2.7V. Then you can never multiply 2*2, because you put 2V on each input and get only zero at the output. The rounding error to the closest integer is zero, but the result is still useless, and opamps with a lower input range should be used. FFT is the same: we transform "digital" "integer" stuff into "analog" "real signals", "multiply" them, then transform back.

For more mathematical detail, read the Math page on the GIMPS server, or the wiki (follow the links on the Math page), or better, look into D. Knuth's book mentioned there (Vol. 2; take a push cart with you when you go to the library or bookshop, it is a brick of a book, large format, a thousand pages...) for fast FFT multiplication algorithms. What you have to keep in mind: the "approximation" is dimensioned in such a way that an error IS expected (it MUST appear, and it should not be zero, if the opamps really work!), and it is expected to be between, say, 0.001 and 0.49; otherwise the result is wrong. The fact that clLucas accepts that is WRONG. Keep in mind that this tool is under development. Msft and a few others here are currently investing a lot of time in it.

Last fiddled with by LaurV on 2013-10-09 at 07:22 Reason: typos, lots of them... |
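To make the idea concrete, here is a minimal sketch (my own illustration, not GIMPS or CUDALucas code; the function names are mine) of multiplying two integers by treating their digit sequences as signals: transform with an FFT, multiply pointwise, transform back, then round each entry to the nearest integer. The largest distance from an integer is exactly the kind of roundoff check that shows up as "err" in the logs above.

```python
import cmath

def _fft(a, sign):
    # Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    # sign = -1 gives the forward transform, +1 the (unscaled) inverse.
    n = len(a)
    if n == 1:
        return a[:]
    even = _fft(a[0::2], sign)
    odd = _fft(a[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def fft_multiply(a, b):
    """Multiply two non-negative integers via an FFT convolution.

    Returns (product, max_roundoff_error). The error should be small
    but typically non-zero, like the "err" value in an LL test log.
    """
    da = [int(c) for c in str(a)][::-1]   # least-significant digit first
    db = [int(c) for c in str(b)][::-1]
    n = 1
    while n < len(da) + len(db):          # pad to a power of two big
        n *= 2                            # enough to avoid wraparound
    fa = _fft([complex(d) for d in da] + [0j] * (n - len(da)), -1)
    fb = _fft([complex(d) for d in db] + [0j] * (n - len(db)), -1)
    conv = _fft([x * y for x, y in zip(fa, fb)], +1)
    conv = [c.real / n for c in conv]     # inverse FFT needs the 1/n scale
    err = max(abs(c - round(c)) for c in conv)   # the roundoff check
    return sum(round(c) * 10 ** i for i, c in enumerate(conv)), err

prod, err = fft_multiply(123456789, 987654321)
print(prod, "err =", err)   # err is tiny here, as it should be
```

For real LL-test sizes the "digits" are large machine words rather than 0-9, which is exactly why the error can approach 0.5 and must be watched, but the shape of the computation is the same.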
#237 |
|
"Rob Gahan"
Aug 2013
Ireland
2²×3² Posts |
Quote:
Thanks a million, LaurV, for your detailed and clear explanation. Very helpful, and it looks like I've some research to do ;-) I'll continue the proper tests and show the output here, but at 190 days each I think they'll be the last (100 million) I do. Well, until Maxwell, maybe... ;-) Anyway, I appreciate you taking the time to demystify some of this "Black Art". With more help like yours, maybe a lot more newbies (like me) will come on board and speed up the search. Thanks again, Rob. |
#238 |
|
"Rob Gahan"
Aug 2013
Ireland
100100₂ Posts |
Quote:
Continuing work from a partial result of M332233123 fft length = 20971520 iteration = 42622168
Iteration 42630000 M( 332233123 )C, 0xbd8dc41b383353ed, n = 20971520, CUDALucas v2.03 err = 0.0336 (6:21 real, 38.0896 ms/iter, ETA 3064:05:40)
Iteration 42640000 M( 332233123 )C, 0x9a85a50ac0f45352, n = 20971520, CUDALucas v2.03 err = 0.0346 (8:14 real, 49.3313 ms/iter, ETA 3968:17:25)
Iteration 42650000 M( 332233123 )C, 0x86ab474ca4d1eb20, n = 20971520, CUDALucas v2.03 err = 0.0354 (8:13 real, 49.2523 ms/iter, ETA 3961:47:47)

Looks better ;-) but only 165 days to go :-( Did I read somewhere that dropping -t speeds it up a bit? Thinking: must learn patience, must learn patience, must learn patience. Cheers, Rob. |
#239 |
|
Romulan Interpreter
Jun 2011
Thailand
7²×197 Posts |
Well, man, you are serious about it!

If you want to use CUDALucas, then you can do better. With a GTX Titan, even better... Do you have one? (If yes, send it to me!)

Start P95 (or switch to it if it is already running) and use the "Advanced -> Time..." option to time your exponent. Fill the exponent into the box, select 100 iterations, click OK. See what FFT length is used and write it down somewhere.

Play with the "cudalucas -cufftbench x y z" option, where x is a bit smaller than the value chosen by P95, y is about 20% higher, and z is a multiple of 1024 (see the discussion in the CUDALucas thread). With that command you have "tuned" the right FFT size for your card. Select the FFT length which is around the one chosen by P95 (it may be a bit smaller, or up to 10-15-20 percent larger) AND which is the fastest for your card.

Run a few iterations with the command you used above, but with the right FFT, the one you found with cufftbench. Try with -t and without. Try with "-polite 0" and without. (Edit: use -k, then you can turn "polite" on/off interactively from the keyboard, just press "p" and Enter, and see the speed difference. Watch the GPU occupancy at the same time; it should go down a bit when polite is active.)

My estimate (from a superficial mental calculation and from the error your test shows) is that an FFT close to 18M would be much faster and would keep the error under 0.21 (I may be wrong). 2097152*3^2 is better than 2097152*2*5 when splitting the "butterflies" of the FFT multiplication, therefore if the error at the ~18M FFT size is not too big, that would be much faster for your card. If my estimate is right, you may finish a few weeks sooner compared with the time you have now. Keep us informed.

Edit: if you can save the residues along the way, that would be even better, in case someone wants to DC later.

Last fiddled with by LaurV on 2013-10-09 at 10:41 |
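A small illustration of the "butterflies" remark (the helper name below is mine, not from CUDALucas): FFT implementations decompose the transform length into stages of small radix, so a length like 2097152*3^2 factors entirely into 2s and 3s, while 2097152*2*5 brings in a radix-5 stage. Which combination is actually faster depends on the library and the card, as the post says; this sketch only shows the factorizations being compared.

```python
def radix_factorization(n, radices=(2, 3, 5, 7)):
    """Factor an FFT length over the small radices a typical FFT
    library supports. Returns {radix: exponent}, or None if some
    factor of n is outside the supported radix set."""
    fac = {}
    for p in radices:
        while n % p == 0:
            fac[p] = fac.get(p, 0) + 1
            n //= p
    return fac if n == 1 else None

# The two candidate lengths from the post: ~18M vs ~20M.
for length in (2097152 * 3**2, 2097152 * 2 * 5):
    print(length, radix_factorization(length))
```

Both lengths are "smooth" over small radices; the post's claim is that the 2- and 3-only decomposition tends to map better onto the butterfly stages than the one with a radix-5 stage, which is a per-card, per-library question that cufftbench settles empirically.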
#240 |
|
Mar 2010
Jyvaskyla, Finland
2²·3² Posts |
Quote:
#242 |
|
Romulan Interpreter
Jun 2011
Thailand
10010110110101₂ Posts |
Quote:
(Maybe mistakenly, in that he forgot to modify the FFT when he switched from 33M to 55M exponents, or maybe intentionally.) Now, because we know the last two (masked) digits of the residue, I will go and report doublechecks... I get my credit, the exponents are cleared... Screw gimps! Grrr....

Last fiddled with by LaurV on 2013-10-09 at 16:21 |
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| mfakto: an OpenCL program for Mersenne prefactoring | Bdot | GPU Computing | 1676 | 2021-06-30 21:23 |
| Can't get OpenCL to work on HD7950 Ubuntu 14.04.5 LTS | VictordeHolland | Linux | 4 | 2018-04-11 13:44 |
| OpenCL accellerated lattice siever | pstach | Factoring | 1 | 2014-05-23 01:03 |
| OpenCL for FPGAs | TObject | GPU Computing | 2 | 2013-10-12 21:09 |
| AMD's Graphics Core Next- a reason to accelerate towards OpenCL? | Belteshazzar | GPU Computing | 19 | 2012-03-07 18:58 |