mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

flashjh 2012-12-02 00:07

For anyone working with CUDALucas in Windows: swl551 is working on a program to track CUDALucas workers [URL="http://www.mersenneforum.org/showthread.php?p=320183#post320183"]here[/URL]. It's still in initial testing, but I have it working quite well right now. There is no automatic GPU72 support yet, but you can get assignments from GPU72 and add them manually; assignments from PrimeNet can be fetched through the program itself. The TF version of this program is quite good, so I expect this one to mature quickly as well. Apart from getting assignments, it makes the process almost completely automatic, including submitting results.

ckdo 2012-12-02 06:40

[QUOTE=Dubslow;320194]It loops over each bit level.[/QUOTE]

Obviously. And it's silly to do so, which was the point of "Why?". :razz:

swl551 2012-12-02 13:26

[QUOTE=ckdo;320211]Obviously. And it's silly to do so, which was the point of "Why?". :razz:[/QUOTE]

We are always looking for ways to improve. Feel free to give us an updated version of the function without the loop.

Antonio 2012-12-02 14:39

[QUOTE=swl551;320226]We are always looking for ways to improve. Feel free to give us an updated version of the function without the loop.[/QUOTE]

The following expression calculates GHz days without the use of loops:

GHzD=28.50624*(POWER(2;(to-from+1))-2)*POWER(2;(from-48))/exp

Checked using a spreadsheet against PrimeNet credits for work I've submitted.
The constant is 0.00707 * 2.4 * 1680 = 28.50624 (just rearranging your equation).
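To see that the loop-free expression gives the same total as looping over bit levels, here is a small Python sketch that computes both. The per-bit-level loop is an assumption (the original function isn't shown in the thread): it credits each bit level b, i.e. factoring from 2^(b-1) to 2^b, with 28.50624 * 2^(b-48) / exp GHz-days. The function names are hypothetical.

```python
# Sketch: trial-factoring credit with and without a per-bit-level loop.
# Assumption (not shown in the thread): the loop credits each bit level b
# (factoring from 2^(b-1) to 2^b) with GHZD_CONSTANT * 2^(b-48) / exponent.

GHZD_CONSTANT = 0.00707 * 2.4 * 1680  # = 28.50624, as in the post


def ghzd_loop(exponent: int, from_bits: int, to_bits: int) -> float:
    """Hypothetical loop version: sum the credit of each bit level."""
    total = 0.0
    for b in range(from_bits + 1, to_bits + 1):
        total += GHZD_CONSTANT * 2.0 ** (b - 48) / exponent
    return total


def ghzd_closed_form(exponent: int, from_bits: int, to_bits: int) -> float:
    """The loop-free spreadsheet expression ('exp' there is the exponent)."""
    return (GHZD_CONSTANT
            * (2.0 ** (to_bits - from_bits + 1) - 2)
            * 2.0 ** (from_bits - 48)
            / exponent)


if __name__ == "__main__":
    # Example: take a 60M exponent from 71 to 73 bits; both values agree.
    print(ghzd_loop(60_000_000, 71, 73), ghzd_closed_form(60_000_000, 71, 73))
```

The closed form is just the geometric series of the loop summed in one step, so it stays O(1) however many bit levels are covered.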

swl551 2012-12-02 15:02

[QUOTE=Antonio;320228]The following expression calculates GHz days without the use of loops:

GHzD=28.50624*(POWER(2;(to-from+1))-2)*POWER(2;(from-48))/exp

Checked using a spreadsheet against PrimeNet credits for work I've submitted.
The constant is 0.00707 * 2.4 * 1680 = 28.50624 (just rearranging your equation).[/QUOTE]
Nicely done!

swl551 2012-12-03 20:08

[QUOTE=Dubslow;320189]Chalsall got his info from Mersenne.ca, or more specifically its owner; I have absolutely no idea how PrimeNet calculates the credit for CUDALucas tests.[/QUOTE]

I worked with James H. and translated his PHP calculation functions to C#. It was not the thrill of my day! Crazy stuff.

Antonio 2012-12-06 11:29

[QUOTE=swl551;320229]Nicely done![/QUOTE]

Thanks. An equivalent but neater expression is:

GHzD=28.50624 * (POWER(2;to-47) - POWER(2;from-47)) / exp
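Expanding the first expression term by term shows why the two agree. The last line rewrites the difference as a geometric series over bit levels b, which matches a per-bit-level loop only under the assumption that each level contributes 28.50624 * 2^(b-48) / exp GHz-days (the original loop isn't shown in this thread).

```latex
\begin{aligned}
\text{GHzD}
  &= \frac{28.50624}{\text{exp}}\left(2^{\text{to}-\text{from}+1}-2\right)2^{\text{from}-48} \\
  &= \frac{28.50624}{\text{exp}}\left(2^{\text{to}-47}-2\cdot 2^{\text{from}-48}\right) \\
  &= \frac{28.50624}{\text{exp}}\left(2^{\text{to}-47}-2^{\text{from}-47}\right) \\
  &= \frac{28.50624}{\text{exp}}\sum_{b=\text{from}+1}^{\text{to}} 2^{b-48}
\end{aligned}
```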

dbaugh 2013-01-22 09:06

To run two instances of CuLu on a GTX 590, do I just use "-d 1" as with mfaktc? FYI, on a 3970X a 60M LL test is twice as fast with P95 as on half a 590 (one of its two GPUs): 3.886 ms vs 7.765 ms.

flashjh 2013-01-22 15:42

[QUOTE=dbaugh;325464]To run two instances of CuLu on a 590 do I just do the "-d 1" like with mfaktc? FYI on a 3970x a 60M LL is twice as fast with P95 as half a 590, 3.886ms vs 7.765ms.[/QUOTE]

Yes, that will work.
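For a dual-GPU card such as the 590, one instance per GPU can be started from a small Python launcher like the sketch below. Only the `-d` device switch comes from this thread; the binary name `CUDALucas` and the per-device directories `gpu0`, `gpu1`, ... are assumptions, chosen so each instance keeps its own work and checkpoint files.

```python
import subprocess

# Hypothetical binary name/path; adjust for your install.
CUDALUCAS = "CUDALucas"


def build_commands(num_gpus: int) -> list[list[str]]:
    """One command line per GPU; -d selects the device, as confirmed above."""
    return [[CUDALUCAS, "-d", str(dev)] for dev in range(num_gpus)]


def launch(num_gpus: int, workdir_prefix: str = "gpu") -> list[subprocess.Popen]:
    """Start one instance per GPU, each in its own directory (gpu0, gpu1, ...).

    Separate directories are an assumption meant to keep each instance's
    work and checkpoint files apart; the directories must already exist.
    """
    return [subprocess.Popen(cmd, cwd=f"{workdir_prefix}{dev}")
            for dev, cmd in enumerate(build_commands(num_gpus))]
```

For a 590, `launch(2)` would run the equivalent of `CUDALucas -d 0` in `gpu0` and `CUDALucas -d 1` in `gpu1`.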

owftheevil 2013-02-03 21:55

In response to a post by LaurV from September last year, here are the FFT timings I get on a GTX 570 in a Linux box. I know it's been a while, but I assume there is still interest in this data. The first column is the FFT length in multiples of 1024 (K), the second is the time in milliseconds per iteration. Lengths not listed were slower than some longer FFT in the table.


[CODE]1 0.007
2 0.011
8 0.019
9 0.020
14 0.022
18 0.023
20 0.028
22 0.028
26 0.030
32 0.030
36 0.037
40 0.039
48 0.040
56 0.043
64 0.054
70 0.064
80 0.064
84 0.070
96 0.072
112 0.075
120 0.092
128 0.095
144 0.099
160 0.110
180 0.128
192 0.135
224 0.141
256 0.168
288 0.174
320 0.204
336 0.229
360 0.246
384 0.256
392 0.267
400 0.269
448 0.270
512 0.309
576 0.342
640 0.405
648 0.418
672 0.457
720 0.474
768 0.513
784 0.522
896 0.522
1024 0.645
1152 0.722
1176 0.849
1280 0.855
1296 0.868
1344 0.928
1440 0.956
1568 1.020
1600 1.069
1728 1.110
1792 1.169
2048 1.263
2304 1.503
2560 1.731
2592 1.734
2688 1.953
2880 1.954
3136 2.101
3200 2.288
3456 2.377
3584 2.412
3600 2.651
4096 2.696
4608 3.088
4704 3.553
5120 3.639[/CODE]
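The pruning rule used for the table above (drop any length that is slower than some longer one) is easy to automate. Below is a minimal Python sketch; the helper names are hypothetical, and `min_k`, the smallest FFT length (in K) a given exponent needs, has to come from elsewhere.

```python
from typing import Optional


def parse_table(text: str) -> dict[int, float]:
    """Parse lines of 'FFT length in K' and 'ms per iteration'."""
    table = {}
    for line in text.strip().splitlines():
        length_k, ms = line.split()
        table[int(length_k)] = float(ms)
    return table


def prune(table: dict[int, float]) -> dict[int, float]:
    """Drop every length that is slower than some longer length."""
    kept = {}
    best_longer = float("inf")
    for k in sorted(table, reverse=True):  # walk from the longest FFT down
        if table[k] <= best_longer:
            kept[k] = table[k]
            best_longer = table[k]
    return dict(sorted(kept.items()))


def pick_fft(table: dict[int, float], min_k: int) -> Optional[int]:
    """Fastest usable length: the smallest surviving length >= min_k."""
    candidates = [k for k in prune(table) if k >= min_k]
    return min(candidates) if candidates else None
```

After pruning, times never decrease with length, so the smallest surviving length that is long enough is also the fastest. For example, with the rows from 2048 to 2880 above, `pick_fft(table, 2600)` returns 2688.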

LaurV 2013-02-04 16:49

The conclusion was that everybody needs to tune it for his/her own system (card, CPU, etc.). For me, for example, 2688 is very slow. I tune it in small exponent ranges (roughly every 1M). Here is a snapshot from my tables; the difference is that I gray out the slower lengths instead of deleting them. The tables are updated periodically by averaging the real test times with the times already in the table, so they become very accurate over time. Also note that the values are real LL-test iteration times, not the values reported by the -cufftbench option, which are about 2.66 times smaller: the benchmark does a single FFT, but each test iteration also does the multiplication and the inverse FFT, subtracts 2, and checks the round-off error.
[ATTACH]9250[/ATTACH]
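The table maintenance described above can be sketched in Python as follows. The 2.66 ratio is LaurV's rough figure for his card, and the exact averaging rule (equal weight for the stored value and the new measurement) is an assumption, since the post only says the times are averaged. The function names are hypothetical.

```python
# LaurV's rough ratio between a real LL iteration and a -cufftbench timing;
# it is card-dependent, so treat it as an approximation.
BENCH_TO_ITERATION = 2.66


def estimate_iteration_ms(bench_ms: float) -> float:
    """Rough LL iteration time from a single-FFT -cufftbench timing."""
    return bench_ms * BENCH_TO_ITERATION


def record_test_time(table: dict[int, float], fft_k: int, measured_ms: float) -> float:
    """Average a measured iteration time into the table entry for fft_k.

    Each update gives the new measurement and the stored value equal weight,
    so older measurements fade out geometrically. New lengths are stored as-is.
    """
    old = table.get(fft_k)
    table[fft_k] = measured_ms if old is None else (old + measured_ms) / 2
    return table[fft_k]
```

Seeding the table from benchmark numbers via `estimate_iteration_ms` and then folding in real test times with `record_test_time` mirrors the idea that the entries become more accurate the longer the card runs.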

Also, please note that not all FFT lengths are "usable". They have to be a multiple of 16K, 32K or 64K, depending on your card (see msft's posts). For example, 2160 is faster, but it is not a multiple of 32K, so if you have a GTX 580 and want to use 512 threads, which would be a bit faster, you have to live with 2304. Also, 2646 may be faster, but it is not even a multiple of 16K, so you would need 128 threads for it, which does not max out the card. You must use either 2800 with 256 threads, or 2880 with 512 (as 2800 is a multiple of 16K, but not of 32K).
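The length/thread constraint in the examples above can be checked mechanically. The Python sketch below assumes the rule read off LaurV's examples (512 threads needs a multiple of 32K, 256 threads a multiple of 16K) and deliberately leaves out other thread counts and the 64K case, for which the post gives no explicit rule. The names are hypothetical.

```python
# Assumed rule, read off the examples in the post: 512 threads needs an FFT
# length (in K) that is a multiple of 32, 256 threads a multiple of 16.
THREAD_MULTIPLE_K = {512: 32, 256: 16}


def usable_threads(fft_k: int) -> list[int]:
    """Thread counts (largest first) whose length constraint fft_k satisfies."""
    return [threads
            for threads, mult in sorted(THREAD_MULTIPLE_K.items(), reverse=True)
            if fft_k % mult == 0]
```

With this rule, `usable_threads(2160)` gives `[256]`, `usable_threads(2880)` gives `[512, 256]`, and `usable_threads(2646)` gives an empty list, matching the point that 2646 is not a multiple of 16K.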


All times are UTC.