|
|
#2531 |
|
Einyen
Dec 2003
Denmark
C56₁₆ Posts |
The one I'm using that works is CUDA 6.5. But they now claim that the bugs in CUDA 7 and 7.5 should be fixed in CUDA 8. That's why I wanted to try it, but I guess it is not that important.
|
|
|
|
|
|
#2532 | |
|
"Jerry"
Nov 2011
Vancouver, WA
463₁₆ Posts |
Quote:
I uploaded a new Windows file to sourceforge with the CUDA 8.0 version of CUDALucas. It's here. Remember to download the CUDA 8.0 Libs from here.
Please note that I couldn't locate the cufft32_80.dll file, so I didn't compile a 32-bit version for CUDA 8.0. As far as I can tell, nVidia didn't include it with this version of CUDA. If someone has it, get it to me and I'll compile and upload the other files.
I ran several tests on a GTX 750ti, but I don't have any of the newer cards to see if this works or not. Can someone test this and let me know if it's producing errors or zero results similar to the other problems?
Thanks!
Last fiddled with by flashjh on 2016-10-02 at 04:59 Reason: Can't spell |
|
|
|
|
|
|
#2533 | |
|
Mar 2010
3×137 Posts |
Hello, Jerry!
Thanks for the new binary. I can confirm that on a GTX 1080 it's a bit faster than the old v8.0 RC binary, and the exponents match.
I did a quick test on M1000003, and the runtime was 76s for the old binary and 66s for the new one.
|
|
|
|
|
|
|
#2534 |
|
Sep 2008
Bromley, England
43 Posts |
Just ran a benchmark using my new card (GTX 780ti) and the various CUDA binaries.
Driver = 373.06. Test number = M2976221. Card not driving monitor.
CUDA 4.2 32bit = 13min 11sec
CUDA 4.2 64bit = 13min 01sec
CUDA 5.0 32bit = 12min 18sec
CUDA 5.0 64bit = 12min 24sec
CUDA 5.5 32bit = 12min 24sec
CUDA 5.5 64bit = 12min 36sec
CUDA 6.0 32bit = 12min 13sec
CUDA 6.0 64bit = 12min 19sec
CUDA 6.5 32bit = 12min 19sec
CUDA 6.5 64bit = 11min 38sec
CUDA 8.0 64bit = 13min 56sec
I'm not that surprised at CUDA 8.0, but CUDA 6.5 64bit was a bit out of character compared to how my GTX580 behaves.
Last fiddled with by mognuts on 2016-10-09 at 10:32 Reason: Rounded off values for clarity |
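To make the list above easier to compare, here is a minimal Python sketch that converts each timing to seconds and expresses it relative to the fastest run. The timings are copied from the post; the function names (`to_seconds`, `relative_to_fastest`) are made up for illustration and are not part of CUDALucas.

```python
import re

# Timings copied from the post above (GTX 780ti, M2976221, driver 373.06).
TIMINGS = {
    "CUDA 4.2 32bit": "13min 11sec",
    "CUDA 4.2 64bit": "13min 01sec",
    "CUDA 5.0 32bit": "12min 18sec",
    "CUDA 5.0 64bit": "12min 24sec",
    "CUDA 5.5 32bit": "12min 24sec",
    "CUDA 5.5 64bit": "12min 36sec",
    "CUDA 6.0 32bit": "12min 13sec",
    "CUDA 6.0 64bit": "12min 19sec",
    "CUDA 6.5 32bit": "12min 19sec",
    "CUDA 6.5 64bit": "11min 38sec",
    "CUDA 8.0 64bit": "13min 56sec",
}


def to_seconds(text):
    """Convert a string such as '13min 11sec' (spaces optional) to seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*min\s*(\d+)\s*sec\s*", text)
    if match is None:
        raise ValueError(f"unrecognised timing: {text!r}")
    minutes, seconds = map(int, match.groups())
    return 60 * minutes + seconds


def relative_to_fastest(timings):
    """Map each binary to its runtime divided by the shortest runtime."""
    secs = {name: to_seconds(value) for name, value in timings.items()}
    best = min(secs.values())
    return {name: s / best for name, s in secs.items()}


if __name__ == "__main__":
    ratios = relative_to_fastest(TIMINGS)
    # Print the binaries from fastest to slowest.
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{name:<16} {ratio:.3f}x")
```

On these numbers the CUDA 8.0 64bit run takes roughly 20% longer than the CUDA 6.5 64bit run (836 s versus 698 s). This is a single run per binary on one card, so small differences between the other versions may just be noise.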
|
|
|
|
|
#2535 |
|
Einyen
Dec 2003
Denmark
2·1,579 Posts |
I did a -cufftbench 2592 8192 20 on the different versions (only the 64-bit versions), so it does 20 × 50 iterations on each FFT and takes the average.
In most cases 6.5 is fastest, but a few of them have 8.0 as the fastest (on a Titan Black, but this is probably GPU dependent). CUDA 4.2 was quite a bit slower on all of them, so I left it out. Code:
8.0 6.5 6.0 5.5 5.0
2592 48471289 1.6135 1.6683 1.6897 1.6145 1.6166
2744 51250889 2.0056 1.8606 1.8682 1.9980 1.8714
3136 58404433 2.0937 2.0480 2.0710 2.0201 2.0337
3200 59570449 2.4195 2.3907 2.4056 2.4150 2.4175
3240 60298969 2.4266 2.4388 2.4404 2.4435
3375 62756279 2.5147 2.5301
3888 72075517 2.5348 2.5584 2.4558
4000 74106457 2.4631 2.5498 2.5821 2.4590 2.4639
4096 75846319 2.5115 2.5800 2.5976 2.5200 2.5375
4320 79902611 3.2614 3.2573 3.2760 3.2685
4374 80879779 3.3753 3.2784 3.2924 3.2946 3.3003
4500 83158811 3.3535 3.3845
4536 83809729 3.3810 3.4006
5184 95507747 3.4279 3.3836 3.4208 3.2994 3.3273
5292 97454309 3.9568
5488 100984691 3.8843 3.8345 3.8336 3.9979 3.7484
5600 103000823 4.1630 4.3082 4.3430 4.1882 4.1943
5832 107174381 4.5283 4.3644 4.3842 4.3847 4.3903
6000 110194363 4.5328
6048 111056879 4.5338 4.5138 4.5275
6075 111541967 4.5586
6125 112440191 4.5530 4.5645 4.5780 4.5867
6144 112781477 4.6858
6250 114685037 4.8079 4.6418 4.6620 4.6714 4.6787
6272 115080019 4.6678 4.6820 4.6835 4.6908
6400 117377567 4.9088 4.7531 4.7721 4.7719 4.7809
6480 118813021 4.9438 4.8401 4.8670 4.8669
6561 120266023 5.1164 4.8818 4.9012 4.8968
6750 123654943 5.1792 5.0356 5.0432
6912 126558077 5.1957
7776 142017539 5.2343 5.0001 5.0441 4.8219
8000 146019329 5.2537 5.1762 5.2350 5.0527 5.0997
8192 149447533 5.3838 5.2219 5.2593 5.1617 5.2106
Last fiddled with by ATH on 2016-10-09 at 10:49 |
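As a rough way to read the table above, the Python sketch below picks the fastest CUDA version for a few FFT lengths. It rests on two assumptions: the five values in a row follow the header order (8.0, 6.5, 6.0, 5.5, 5.0), and they are per-iteration times, so lower is better. Only rows that list all five values are used, because the column a missing value belongs to cannot be recovered from the post. The name `fastest_version` is hypothetical.

```python
# Column order assumed to match the header line of the table.
VERSIONS = ("8.0", "6.5", "6.0", "5.5", "5.0")

# A few complete rows copied from the table, keyed by FFT length.
ROWS = {
    2592: (1.6135, 1.6683, 1.6897, 1.6145, 1.6166),
    2744: (2.0056, 1.8606, 1.8682, 1.9980, 1.8714),
    4096: (2.5115, 2.5800, 2.5976, 2.5200, 2.5375),
    8192: (5.3838, 5.2219, 5.2593, 5.1617, 5.2106),
}


def fastest_version(timings):
    """Return (version, time) for the smallest time in a complete row."""
    if len(timings) != len(VERSIONS):
        raise ValueError("expected one timing per CUDA version")
    index = min(range(len(timings)), key=timings.__getitem__)
    return VERSIONS[index], timings[index]


if __name__ == "__main__":
    for fft, timings in ROWS.items():
        version, best = fastest_version(timings)
        print(f"FFT {fft}: CUDA {version} ({best})")
```

Even in this small sample the winner changes from row to row, and the margins are small.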
|
|
|
|
|
#2536 |
|
"Kieren"
Jul 2011
In My Own Galaxy!
2×3×1,693 Posts |
Many thanks for the info above. I had forgotten what version of CuLu I was running, and I'm still not sure. However, I switched in the CUDA 6.5, 64 bit version, and the GTX460 went from 7.4304 to 7.3026 ms/it.
|
|
|
|
|
|
#2537 |
|
"David"
Jul 2015
Ohio
11·47 Posts |
I upgraded a couple of systems from 6.5 to 8.0 last night. My Titan/Black based systems saw a 4.6% gain after letting everything burn up to normal temps. I'm not sure how effective any one-off benchmark will be, given the amount of active thermal/power management on most GPUs. The Titan X, going from 7.5 to 8.0, saw no improvement.
|
|
|
|
|
|
#2538 | |
|
Random Account
Aug 2009
2²×3×163 Posts |
Quote:
I had been looking for the libraries for a day or so. I'm only lacking an INI file. It replies "using defaults for non-specified options." then goes on. Where can I find the INI file?
Thanks, Dwayne. |
|
|
|
|
|
|
#2539 |
|
Random Account
Aug 2009
11110100100₂ Posts |
Disregard the request above. I found it with a bit more searching.
I changed the screen output options so I could see what was going on. I reserved one doublecheck from PrimeNet. CUDALucas reports it can complete it in a little under six days.
Now for my quandary: Prime95 indicates it can do the test in the same amount of time. Six days for an LL test is nothing to sneeze at. CUDALucas did not seem to be utilizing my GPU as much as I thought it might. It's a GTX-750Ti. I could tell by observing the core temperature. mfaktc runs it in the upper 50's on the C scale. CUDALucas only made it into the low 40's.
Just in case anyone wonders about my setup, it all runs with CUDA 8. |
|
|
|
|
|
#2540 | |
|
"Kieren"
Jul 2011
In My Own Galaxy!
2·3·1,693 Posts |
Quote:
Temperature is not a good guide for comparing mfaktc and CUDALucas. In my experience, mfaktc runs a card hotter. This may have to do with greater (throttled) floating point usage under CuLu. (I could be wrong on this point.)
The 460 mentioned above, with all conditions held the same (voltage, clock, no competition from P95, same ambient), stabilizes at 64 C with CuLu, at 7.2521 ms/it. It does a 40.8M LLDC in 3-4 days. This is about twice as fast as an FX-8350 worker, when running 2 workers with 4 threads each. Just about any i5 or i7 chip from Sandy Bridge on would beat the snot out of the AMD CPU.
The 460 holds at 67 C running mfaktc, where it delivers ~206 GHz-d/d. In both cases, usage was 100%, according to MSI Afterburner. This is a secondary card. It is not driving the display.
If you are running Windows, I really recommend Afterburner. Even if you don't use it to OC, it has nice, configurable monitoring functions. Finding out what the actual usage is would be a good start to analyzing your performance.
Here is a question: have you run CUFFTbench and threadbench on this card?
Last fiddled with by kladner on 2016-11-06 at 07:31 |
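The "40.8M LLDC in 3-4 days" figure can be sanity-checked from the per-iteration time. A Lucas-Lehmer test of 2^p - 1 needs p - 2 squarings, so the runtime is roughly (p - 2) times the ms/it value. The Python sketch below does that arithmetic. The exponent 40,800,000 is a round stand-in for "40.8M" rather than a real prime exponent, and the estimate ignores checkpointing and any other overhead. The function names are hypothetical.

```python
def ll_iterations(exponent):
    """Number of squarings in a Lucas-Lehmer test of 2**exponent - 1."""
    if exponent < 3:
        raise ValueError("exponent must be at least 3")
    return exponent - 2


def estimated_days(exponent, ms_per_iteration):
    """Rough wall-clock estimate in days from a steady ms/it figure."""
    seconds = ll_iterations(exponent) * ms_per_iteration / 1000.0
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Round stand-in for the "40.8M" exponent; 7.2521 ms/it is the GTX 460 figure.
    print(f"{estimated_days(40_800_000, 7.2521):.2f} days")
```

With those inputs the estimate comes out at a little over 3.4 days, which agrees with the 3-4 days quoted for the GTX 460.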
|
|
|
|
|
|
#2541 |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2×3³×109 Posts |
Doesn't the 460 have a much better single precision/double precision ratio than the 750ti?
|
|
|
|