mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

ATH 2016-10-01 11:18

The one I'm using that works is CUDA6.5. But they claim now that the bugs in CUDA7 and 7.5 should be fixed in CUDA8. That's why I wanted to try it, but I guess it is not that important.

flashjh 2016-10-02 04:55

[QUOTE=ATH;443986]The one I'm using that works is CUDA6.5. But they claim now that the bugs in CUDA7 and 7.5 should be fixed in CUDA8. That's why I wanted to try it, but I guess it is not that important.[/QUOTE]

Hello everyone!

I uploaded a new Windows file to sourceforge with the CUDA 8.0 version of CUDALucas. It's [URL="https://sourceforge.net/projects/cudalucas/files/?source=navbar"]here[/URL].

Remember to download the CUDA 8.0 Libs from [URL="https://sourceforge.net/projects/cudalucas/files/CUDA%20Libs/"]here[/URL].

Please note that I couldn't locate the cufft32_80.dll file, so I didn't compile a 32 bit version for CUDA 8.0. As far as I can tell nVidia didn't include it with this version of CUDA. If someone has it, get it to me and I'll compile and upload the other files.

I ran several tests on a GTX 750ti, but I don't have any of the newer cards to see if this works or not. Can someone test this and let me know if it's producing errors or zero results similar to the other problems? Thanks!

Karl M Johnson 2016-10-02 12:09

Hello, Jerry!

Thanks for the new binary, I can confirm that on a GTX 1080 it's a bit faster than the old v8.0 RC binary & the exponents match. :smile:
I did a quick test on M1000003, and the runtime was 76s for the old binary and 66s for the new one.


[QUOTE=flashjh;444034]Hello everyone!

I uploaded a new Windows file to sourceforge with the CUDA 8.0 version of CUDALucas. It's [URL="https://sourceforge.net/projects/cudalucas/files/?source=navbar"]here[/URL].

Remember to download the CUDA 8.0 Libs from [URL="https://sourceforge.net/projects/cudalucas/files/CUDA%20Libs/"]here[/URL].

Please note that I couldn't locate the cufft32_80.dll file, so I didn't compile a 32 bit version for CUDA 8.0. As far as I can tell nVidia didn't include it with this version of CUDA. If someone has it, get it to me and I'll compile and upload the other files.

I ran several tests on a GTX 750ti, but I don't have any of the newer cards to see if this works or not. Can someone test this and let me know if it's producing errors or zero results similar to the other problems? Thanks![/QUOTE]

mognuts 2016-10-09 09:58

CUDA binary benchmarks
 
Just ran a benchmark using my new card (GTX 780ti) and the various CUDA binaries.
Driver = 373.06. Test number = M2976221. Card not driving monitor.

CUDA 4.2 32bit = 13min 11sec
CUDA 4.2 64bit = 13min 01sec
CUDA 5.0 32bit = 12min 18sec
CUDA 5.0 64bit = 12min 24sec
CUDA 5.5 32bit = 12min 24sec
CUDA 5.5 64bit = 12min 36sec
CUDA 6.0 32bit = 12min 13sec
CUDA 6.0 64bit = 12min 19sec
CUDA 6.5 32bit = 12min 19sec
CUDA 6.5 64bit = 11min 38sec
CUDA 8.0 64bit = 13min 56sec

I'm not that surprised at CUDA 8.0, but CUDA 6.5 64bit was a bit out of character compared to how my GTX580 behaves.
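
For easier comparison, the timings listed above can be converted to seconds and ranked against the fastest build. This is only a small sketch over the numbers as posted; the dictionary and helper names are made up for illustration and are not part of CUDALucas.

```python
# Run times as posted above for M2976221 on a GTX 780ti, as (minutes, seconds).
timings = {
    "CUDA 4.2 32bit": (13, 11),
    "CUDA 4.2 64bit": (13, 1),
    "CUDA 5.0 32bit": (12, 18),
    "CUDA 5.0 64bit": (12, 24),
    "CUDA 5.5 32bit": (12, 24),
    "CUDA 5.5 64bit": (12, 36),
    "CUDA 6.0 32bit": (12, 13),
    "CUDA 6.0 64bit": (12, 19),
    "CUDA 6.5 32bit": (12, 19),
    "CUDA 6.5 64bit": (11, 38),
    "CUDA 8.0 64bit": (13, 56),
}

# Convert every entry to whole seconds.
seconds_by_build = {name: 60 * m + s for name, (m, s) in timings.items()}

fastest = min(seconds_by_build, key=seconds_by_build.get)
slowest = max(seconds_by_build, key=seconds_by_build.get)


def slowdown_percent(name):
    """Extra run time of a build relative to the fastest one, in percent."""
    return 100.0 * (seconds_by_build[name] / seconds_by_build[fastest] - 1.0)


# Print the builds from fastest to slowest.
for name, secs in sorted(seconds_by_build.items(), key=lambda kv: kv[1]):
    print(f"{name}: {secs} s (+{slowdown_percent(name):.1f}%)")
```

On these figures the CUDA 6.5 64bit run (698 s) is the fastest and the CUDA 8.0 64bit run (836 s) the slowest, roughly 20% apart. Each is a single run, so small differences between the middle entries may be noise.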

ATH 2016-10-09 10:47

I did a -cufftbench 2592 8192 20 on the different versions (only the 64 bit versions) so it does 20x 50 iterations on each FFT and takes the average.

In most cases 6.5 is fastest but a few of them have 8.0 as the fastest (on a Titan Black, but this is probably GPU dependent).

CUDA 4.2 was quite a bit slower on all of them, so I left it out.


[CODE] 8.0 6.5 6.0 5.5 5.0
2592 48471289 1.6135 1.6683 1.6897 1.6145 1.6166
2744 51250889 2.0056 1.8606 1.8682 1.9980 1.8714
3136 58404433 2.0937 2.0480 2.0710 2.0201 2.0337
3200 59570449 2.4195 2.3907 2.4056 2.4150 2.4175
3240 60298969 2.4266 2.4388 2.4404 2.4435
3375 62756279 2.5147 2.5301
3888 72075517 2.5348 2.5584 2.4558
4000 74106457 2.4631 2.5498 2.5821 2.4590 2.4639
4096 75846319 2.5115 2.5800 2.5976 2.5200 2.5375
4320 79902611 3.2614 3.2573 3.2760 3.2685
4374 80879779 3.3753 3.2784 3.2924 3.2946 3.3003
4500 83158811 3.3535 3.3845
4536 83809729 3.3810 3.4006
5184 95507747 3.4279 3.3836 3.4208 3.2994 3.3273
5292 97454309 3.9568
5488 100984691 3.8843 3.8345 3.8336 3.9979 3.7484
5600 103000823 4.1630 4.3082 4.3430 4.1882 4.1943
5832 107174381 4.5283 4.3644 4.3842 4.3847 4.3903
6000 110194363 4.5328
6048 111056879 4.5338 4.5138 4.5275
6075 111541967 4.5586
6125 112440191 4.5530 4.5645 4.5780 4.5867
6144 112781477 4.6858
6250 114685037 4.8079 4.6418 4.6620 4.6714 4.6787
6272 115080019 4.6678 4.6820 4.6835 4.6908
6400 117377567 4.9088 4.7531 4.7721 4.7719 4.7809
6480 118813021 4.9438 4.8401 4.8670 4.8669
6561 120266023 5.1164 4.8818 4.9012 4.8968
6750 123654943 5.1792 5.0356 5.0432
6912 126558077 5.1957
7776 142017539 5.2343 5.0001 5.0441 4.8219
8000 146019329 5.2537 5.1762 5.2350 5.0527 5.0997
8192 149447533 5.3838 5.2219 5.2593 5.1617 5.2106[/CODE]
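
One way to read a table like this is to pick the fastest version per FFT length. The sketch below does that for a handful of rows copied from the table. It assumes the first field is the FFT length, the second is carried along unused, and the remaining five are per-version timings in the header's column order (lower is better). Rows with fewer than five timings are skipped, because the paste does not show which version columns are blank. The function name is hypothetical.

```python
# Column order taken from the table header.
VERSIONS = ["8.0", "6.5", "6.0", "5.5", "5.0"]

# A few rows copied verbatim from the table above (one of them incomplete).
ROWS = """\
2592 48471289 1.6135 1.6683 1.6897 1.6145 1.6166
2744 51250889 2.0056 1.8606 1.8682 1.9980 1.8714
3240 60298969 2.4266 2.4388 2.4404 2.4435
4000 74106457 2.4631 2.5498 2.5821 2.4590 2.4639
4096 75846319 2.5115 2.5800 2.5976 2.5200 2.5375
5600 103000823 4.1630 4.3082 4.3430 4.1882 4.1943
8192 149447533 5.3838 5.2219 5.2593 5.1617 5.2106
"""


def fastest_per_fft(text):
    """Map FFT length -> (fastest version, its timing) for complete rows only."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip rows with missing timings: their columns cannot be identified.
        if len(fields) != 2 + len(VERSIONS):
            continue
        fft = int(fields[0])
        times = [float(x) for x in fields[2:]]
        best_index = min(range(len(times)), key=times.__getitem__)
        result[fft] = (VERSIONS[best_index], times[best_index])
    return result


best = fastest_per_fft(ROWS)
for fft, (version, timing) in best.items():
    print(f"{fft}: CUDA {version} ({timing:.4f})")
```

The margins in these rows are often a percent or less, so the winner for a given FFT length may well change from card to card.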

kladner 2016-10-09 11:47

Many thanks for the info above. I had forgotten what version of CuLu I was running, and I'm still not sure. However, I switched in the CUDA 6.5, 64 bit version, and the GTX460 went from 7.4304 to 7.3026 ms/it. :smile:

airsquirrels 2016-10-09 12:33

I upgraded a couple of systems from 6.5 to 8.0 last night. My Titan/Black based systems saw a 4.6% gain after letting everything burn up to normal temps. I'm not sure how effective any one-off benchmark will be given the amount of active thermal/power management on most GPUs. Titan X from 7.5->8.0 saw no improvement.

storm5510 2016-11-06 04:06

[QUOTE=flashjh;444034]I uploaded a new Windows file to sourceforge with the CUDA 8.0 version of CUDALucas. It's [URL="https://sourceforge.net/projects/cudalucas/files/?source=navbar"]here[/URL].

Remember to download the CUDA 8.0 Libs from [URL="https://sourceforge.net/projects/cudalucas/files/CUDA%20Libs/"]here[/URL].[/QUOTE]

Thank you! :smile: I had been looking for the libraries for a day or so. I'm only lacking an INI file. It replies "using defaults for non-specified options." then goes on. Where can I find the INI file?

Thanks,
Dwayne.

storm5510 2016-11-06 05:30

Disregard the request above. I found it with a bit more searching.

I changed the screen output options so I could see what was going on. I reserved one doublecheck from PrimeNet. CUDALucas reports it can complete it in a little under six days.

Now for my quandary: Prime95 indicates it can do the test in the same amount of time. Six days for a LL test is nothing to sneeze at. CUDALucas did not seem to be utilizing my GPU as much as I thought it might. It's a GTX-750Ti. I could tell by observing the core temperature. mfaktc runs it in the upper 50's on the C scale. CUDALucas only made it into the low 40's.

Just in case anyone wonders about my setup, it all runs with CUDA 8.

kladner 2016-11-06 06:54

[QUOTE=storm5510;446579]Disregard the request above. I found it with a bit more searching.

I changed the screen output options so I could see what was going on. I reserved one doublecheck from PrimeNet. CUDALucas reports it can complete it in a little under six days.

Now for my quandary: Prime95 indicates it can do the test in the same amount of time. Six days for a LL test is nothing to sneeze at. CUDALucas did not seem to be utilizing my GPU as much as I thought it might. It's a GTX-750Ti. I could tell by observing the core temperature. mfaktc runs it in the upper 50's on the C scale. CUDALucas only made it into the low 40's.

Just in case anyone wonders about my setup, it all runs with CUDA 8.[/QUOTE]
That card should do better. I run CuLu on a (very overclocked) GTX 460 with the 6.5 libraries. You would probably do better with 6.5, as well. CUDA 8.0 mainly seems to benefit GTX 10-series architecture. The 750ti has 640 CUDA cores, at a base clock of 1020 MHz. The 460 has 336 CUDA cores, and mine is running at 848 MHz. I did have to slow the memory from 1900 to 1700 MHz to get reliable results with CuLu.

Temperature is not a good guide comparing mfaktc and CUDALucas. In my experience, mfaktc runs a card hotter. This may have to do with greater (throttled) floating point usage under CuLu. (I could be wrong on this point.)

The 460 mentioned above, with all conditions equal (voltage, clock, no competition from P95, same ambient), stabilizes at 64 C with CuLu, at 7.2521 ms/it. It does a 40.8M LLDC in 3-4 days. This is about twice as fast as an FX-8350 worker, when running 2 workers with 4 threads each. Just about any i5 or i7 chip from Sandy Bridge on would beat the snot out of the AMD CPU.
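
As a rough cross-check of the "3-4 days" figure: a Lucas-Lehmer test of M(p) takes p - 2 squaring iterations, so the run time is about (p - 2) times the per-iteration time. The sketch below assumes a round exponent of 40,800,000 standing in for "40.8M" and a constant 7.2521 ms/it with no restarts or other overhead. The function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def ll_test_days(exponent, ms_per_iteration):
    """Estimated days for a Lucas-Lehmer test at a constant iteration time."""
    iterations = exponent - 2  # an LL test of M(p) needs p - 2 squarings
    total_seconds = iterations * ms_per_iteration / 1000.0
    return total_seconds / SECONDS_PER_DAY


# Illustrative round exponent for a "40.8M" double check (an assumption).
days = ll_test_days(40_800_000, 7.2521)
print(f"about {days:.2f} days")
```

That works out to a little under three and a half days, in line with the figure quoted above.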

The 460 holds at 67 C running mfaktc, where it delivers ~206 GHz-d/d. In both cases, usage was 100%, according to MSI Afterburner. This is a secondary card. It is not driving the display.

If you are running Windows, I really recommend Afterburner. Even if you don't use it to OC, it has nice, configurable monitoring functions. Finding out what the actual usage is would be a good start to analyzing your performance.

Here is a question: have you run CUFFTbench and threadbench on this card?

henryzz 2016-11-06 13:48

Doesn't the 460 have a much better single precision/double precision ratio than the 750ti?


All times are UTC.
