For anyone working with CUDALucas on Windows: swl551 is working on a program to track CUDALucas workers [URL="http://www.mersenneforum.org/showthread.php?p=320183#post320183"]here[/URL]. It's still in initial testing, but I have it working quite well right now. There is no automatic GPU72 support for now, but you can get assignments from GPU72 and add them manually. You can get assignments from PrimeNet through the program. The version of this program for TFing is quite good, so I expect this one to mature quickly as well. Except for getting assignments, it makes the process all but completely automatic, including submitting results.
|
[QUOTE=Dubslow;320194]It loops over each bit level.[/QUOTE]
Obviously. And it's silly to do so, which was the point of "Why?". :razz: |
[QUOTE=ckdo;320211]Obviously. And it's silly to do so, which was the point of "Why?". :razz:[/QUOTE]
We are always looking for ways to improve. Feel free to give us an updated version of the function without the loop. |
[QUOTE=swl551;320226]We are always looking for ways to improve. Feel free to give us an updated version of the function without the loop.[/QUOTE]
The following expression calculates GHz days without the use of loops:
GHzD=28.50624*(POWER(2;(to-from+1))-2)*POWER(2;(from-48))/exp
Checked using a spreadsheet against PrimeNet credits for work I've submitted. The constant is 0.00707 * 2.4 * 1680 (just rearranging your equation). |
[QUOTE=Antonio;320228]The following expression calculates GHz days without the use of loops:
GHzD=28.50624*(POWER(2;(to-from+1))-2)*POWER(2;(from-48))/exp
Checked using a spreadsheet against PrimeNet credits for work I've submitted. The constant is 0.00707 * 2.4 * 1680 (just rearranging your equation).[/QUOTE]
Nicely done! |
[QUOTE=Dubslow;320189]Chalsall got his info from Mersenne.ca, or more specifically its owner; I have absolutely no idea how PrimeNet calculates the credit for CUDALucas tests.[/QUOTE]
I worked with James H. and translated his PHP calc functions over to C#. It was not the thrill of my day! Crazy stuff. |
[QUOTE=swl551;320229]Nicely done![/QUOTE]
Thanks. An equivalent but neater form is:
GHzD=28.50624 * (POWER(2;to-47) - POWER(2;from-47)) / exp |
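The two closed forms can be checked against each other, and against a per-bit-level loop, with a few lines of Python. This is only a sketch: the per-level term 28.50624 * 2^(b-48) / exp is inferred from the closed form rather than taken from the original looping function, and the function names are made up for illustration.

```python
import math

# Constant quoted in the post: 0.00707 * 2.4 * 1680 = 28.50624
K = 0.00707 * 2.4 * 1680


def ghzd_loop(exp, frm, to):
    """Sum an ASSUMED per-bit-level credit over levels frm+1 .. to."""
    return sum(K * 2.0 ** (b - 48) / exp for b in range(frm + 1, to + 1))


def ghzd_first(exp, frm, to):
    """First loop-free expression from the thread."""
    return K * (2.0 ** (to - frm + 1) - 2) * 2.0 ** (frm - 48) / exp


def ghzd_neat(exp, frm, to):
    """Neater equivalent: difference of two powers of two."""
    return K * (2.0 ** (to - 47) - 2.0 ** (frm - 47)) / exp


if __name__ == "__main__":
    # Hypothetical assignment: exponent 60,000,000 factored from 70 to 73 bits.
    for f in (ghzd_loop, ghzd_first, ghzd_neat):
        print(f.__name__, f(60_000_000, 70, 73))
```

All three should agree to floating-point precision, because the sum of 2^(b-48) for b from from+1 to to is 2^(to-47) - 2^(from-47).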
To run two instances of CuLu on a 590 do I just do the "-d 1" like with mfaktc? FYI on a 3970x a 60M LL is twice as fast with P95 as half a 590, 3.886ms vs 7.765ms.
|
[QUOTE=dbaugh;325464]To run two instances of CuLu on a 590 do I just do the "-d 1" like with mfaktc? FYI on a 3970x a 60M LL is twice as fast with P95 as half a 590, 3.886ms vs 7.765ms.[/QUOTE]
Yes, that will work. |
In response to a post by LaurV from September last year, here are the FFT timings I get on a 570 running on a Linux box. I know it's been a while, but I assume the interest in this data still exists. The first column is the FFT length in multiples of 1024, the second is the timing in milliseconds per iteration. Missing lengths were slower than longer FFTs in the table.
[CODE]
1 0.007
2 0.011
8 0.019
9 0.020
14 0.022
18 0.023
20 0.028
22 0.028
26 0.030
32 0.030
36 0.037
40 0.039
48 0.040
56 0.043
64 0.054
70 0.064
80 0.064
84 0.070
96 0.072
112 0.075
120 0.092
128 0.095
144 0.099
160 0.110
180 0.128
192 0.135
224 0.141
256 0.168
288 0.174
320 0.204
336 0.229
360 0.246
384 0.256
392 0.267
400 0.269
448 0.270
512 0.309
576 0.342
640 0.405
648 0.418
672 0.457
720 0.474
768 0.513
784 0.522
896 0.522
1024 0.645
1152 0.722
1176 0.849
1280 0.855
1296 0.868
1344 0.928
1440 0.956
1568 1.020
1600 1.069
1728 1.110
1792 1.169
2048 1.263
2304 1.503
2560 1.731
2592 1.734
2688 1.953
2880 1.954
3136 2.101
3200 2.288
3456 2.377
3584 2.412
3600 2.651
4096 2.696
4608 3.088
4704 3.553
5120 3.639
[/CODE] |
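The rule used to thin the table above (drop any length that is slower than some longer length) is easy to apply to raw benchmark output. A minimal Python sketch, assuming the timings are available as (length in K, ms per iteration) pairs; the function name is illustrative, and the 1200 entry in the sample is made up to show a length being dropped:

```python
def prune_slower_lengths(timings):
    """Keep only lengths that are not slower than any longer length.

    timings: iterable of (fft_length_k, ms_per_iteration) pairs.
    Ties are kept, as in the table (e.g. 784 and 896 both at 0.522).
    """
    kept = []
    best_longer = float("inf")  # fastest time seen among longer lengths
    for length, ms in sorted(timings, reverse=True):
        if ms <= best_longer:
            kept.append((length, ms))
            best_longer = ms
    kept.reverse()  # back to ascending length order
    return kept


# Four rows from the table plus one HYPOTHETICAL slow length (1200).
sample = [(1024, 0.645), (1152, 0.722), (1176, 0.849), (1200, 0.900), (1280, 0.855)]

if __name__ == "__main__":
    for length, ms in prune_slower_lengths(sample):
        print(length, ms)
```

Walking from the longest length down means each entry only has to be compared with one running minimum instead of with every longer entry.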
The conclusion was that everybody needs to tune it for his/her own system (card, CPU, etc.). For me, for example, 2688 is very slow. I tune it for every exponent range, in small steps (every meg or so). Here is a snapshot of my tables, with the difference that I gray out the slower lengths rather than deleting them. They are updated periodically by averaging the real test times with the times in the table, so they become very accurate over time. Also note that the values are the real iteration times for the LL test, not the values given by the -cufftbench parameter (which are about 2.66 times smaller, as only a single FFT is done for the bench, but the test also does the multiplication and the reverse FFT, in order to subtract 2 and check the errors).
[ATTACH]9250[/ATTACH] Also, please note that not all FFT lengths are "usable". They have to be a multiple of 16k, 32k, or 64k, depending on your card (see msft's posts). For example, 2160 is faster, but it is not a multiple of 32k, so you have to live with 2304 if you have a GTX 580 and want to use 512 threads, which would be a bit faster. Also, 2646 may be faster, but it is not even a multiple of 16k, so you would need 128 threads for it, which does not max out the card. You must use either 2800 with 256 threads or 2880 with 512 (as 2800 is a multiple of 16k, but not of 32k). |
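The divisibility rule in the examples above can be written down as a small helper. This is a rough Python sketch that encodes only what the post states (multiple of 32k for 512 threads, multiple of 16k for 256 threads, otherwise 128); the function name is made up, the 64k case is left out because the post does not say which thread count it goes with, and the real limits depend on the card.

```python
def max_threads(fft_k):
    """Largest thread count for an FFT length given in multiples of 1024.

    ASSUMPTION: thresholds are taken from the examples in the post only;
    the 128-thread fallback comes from the 2646 example.
    """
    if fft_k % 32 == 0:
        return 512
    if fft_k % 16 == 0:
        return 256
    return 128


if __name__ == "__main__":
    for k in (2160, 2304, 2646, 2800, 2880):
        print(k, max_threads(k))
```

Run over the lengths mentioned in the post, this gives 256 for 2160 and 2800, 512 for 2304 and 2880, and 128 for 2646, which matches the choices described there.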