[QUOTE=ET_;396552]meanwhile, here are the benchmarks for James[/QUOTE]Thanks, much appreciated -- I didn't have any Compute 5.2 benchmarks. It's within 3% of predicted so I'm happy.
To everyone else: I still don't have any benchmarks for Compute 5.0, 3.5 or 2.1 under CuLu 2.05, so more benchmarks are welcome, either here or [email]james@mersenne.ca[/email] |
[QUOTE=James Heinrich;396554]Thanks, much appreciated -- I didn't have any Compute 5.2 benchmarks. It's within 3% of predicted so I'm happy.
To everyone else: I still don't have any benchmarks for Compute 5.0, 3.5 or 2.1 under CuLu 2.05, so more benchmarks are welcome, either here or [email]james@mersenne.ca[/email][/QUOTE] It's compiled as cc 5.0 with CUDA 6.5, not yet cc 5.2 for CUDA 7.0.

Luigi |
Thanks, but it didn't fix my problem. I'm starting to suspect a problem in the PATH setting. The original:
[CODE]
# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
    PATH="$HOME/bin:$PATH"
fi
[/CODE]Should I change it to: [CODE]PATH="$HOME/SHN/CUDALucas:$HOME/SHN:$HOME/bin:$PATH"[/CODE](where SHN is my account, w/admin privileges)? Would I need to run some update or reboot after making that change? I've tried changing all this in /etc/ld.so.conf.d, but it doesn't work, even after running sudo ldconfig: [CODE]
shn@Core2Duo ~/Desktop/CUDALucas $ ./CUDALucas -v
bash: ./CUDALucas: No such file or directory
shn@Core2Duo ~/Desktop/CUDALucas $ ./CUDALucas-2.05.1-CUDA4.2-linux-x86_64 -v
bash: ./CUDALucas-2.05.1-CUDA4.2-linux-x86_64: No such file or directory
[/CODE]Here are the files in that directory: [CODE]
shn@Core2Duo ~/Desktop/CUDALucas $ ls -lt
total 888
-rwxr-xr-x 1 shn shn 425256 Feb 21 21:57 CUDALucas
-rwxr--r-- 1 shn shn   9092 Feb 21 21:43 CUDALucas.ini
drwxr-xr-x 2 shn shn   4096 Feb 21 14:38 CUDALucas-2.05.1-CUDA4.2-CUDA6.5-linux-x86_64
-rw-r--r-- 1 shn shn      0 Feb 21 11:48 output
-rwxr--r-- 1 shn shn     25 Feb 20 21:58 worktodo.txt
-rwxr-xr-x 1 shn shn 425256 Feb 11 17:27 CUDALucas-2.05.1-CUDA4.2-linux-x86_64
-rw-r--r-- 1 shn shn  35316 Jul 20  2014 CUDALucas README
[/CODE] |
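A note on the PATH question above (an editorial sketch, not from the original thread): invoking a program as ./CUDALucas bypasses PATH lookup entirely, so edits to ~/.profile can neither cause nor cure this particular error; and no reboot is needed for PATH changes anyway, re-sourcing ~/.profile or logging in again suffices. A minimal demonstration with throwaway paths:

```shell
# Show that "./name" does not consult PATH: a script in the current
# directory runs even when PATH is pointed elsewhere.
mkdir -p /tmp/pathdemo
cd /tmp/pathdemo
printf '#!/bin/sh\necho hello from ./demo\n' > demo
chmod +x demo
env PATH=/usr/bin ./demo   # prints: hello from ./demo
```

So a "No such file or directory" from ./CUDALucas means the kernel could not exec that file itself (or a loader/library it depends on), which is a different problem from PATH.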
[QUOTE=James Heinrich;396554]Thanks, much appreciated -- I didn't have any Compute 5.2 benchmarks. It's within 3% of predicted so I'm happy.
To everyone else: I still don't have any benchmarks for Compute 5.0, 3.5 or 2.1 under CuLu 2.05, so more benchmarks are welcome, either here or [email]james@mersenne.ca[/email][/QUOTE] I'll post benchmarks on two 2.1 cards shortly, a GT520 and a GT430, once the benchmarks have finished running. |
[QUOTE=MacFactor;396570]Thanks, but it didn't fix my problem.[/quote]
I had to compile it from source as I was running into the same issue. On Ubuntu 14.04, I installed nvidia-cuda-toolkit, changed the cuda install location at the top of Makefile to /usr, ran `make`, and it worked. |
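As a rough sketch of those steps (the package name is the Ubuntu one mentioned above; the Makefile variable name "CUDA" is an assumption, so check what the real Makefile calls it; the edit is therefore demonstrated on a stand-in file):

```shell
# 1) Install the toolkit (Ubuntu/Mint), which puts nvcc and the CUDA libs under /usr:
#      sudo apt-get install nvidia-cuda-toolkit
# 2) Point the Makefile's CUDA install location at /usr. The variable name
#    "CUDA" is a stand-in; shown on a dummy Makefile so the edit itself can
#    be verified:
printf 'CUDA = /usr/local/cuda\nOptLevel = 1\n' > Makefile.demo
sed -i 's|^CUDA = .*|CUDA = /usr|' Makefile.demo
grep '^CUDA' Makefile.demo   # prints: CUDA = /usr
# 3) Build:
#      make
```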
[QUOTE=James Heinrich;396554]Thanks, much appreciated -- I didn't have any Compute 5.2 benchmarks. It's within 3% of predicted so I'm happy.
To everyone else: I still don't have any benchmarks for Compute 5.0, 3.5 or 2.1 under CuLu 2.05, so more benchmarks are welcome, either here or [email]james@mersenne.ca[/email][/QUOTE] Here are two benchmarks for low-end 2.1 cards. Attached are the command outputs.

[CODE]
Device GeForce GT 430
Compatibility       2.1
clockRate (MHz)     1400
memClockRate (MHz)  600

 fft    max exp   ms/iter
1024   19535569   13.1073
1152   21921901   14.9833
1225   23280269   16.5440
1280   24302527   17.4606
1296   24599717   17.5964
1323   25101101   18.1876
1372   26010389   20.1547
1568   29640913   20.1924
1600   30232693   20.5022
1728   32597297   23.1662
1792   33778141   24.0094
2048   38492887   26.3347
2304   43194913   31.8484
2401   44973503   33.1631
2592   48471289   36.1269
2744   51250889   40.7279
3200   59570449   40.8349
3456   64229677   45.8811
3888   72075517   55.4194
4096   75846319   55.6313
4320   79902611   73.0512
4375   80897867   73.7962
4480   82797151   74.6121
5184   95507747   75.5244
5488  100984691   82.0007
6272  115080019   83.8615
6400  117377567   85.8051
6912  126558077   98.1325
7776  142017539  110.4415
8192  149447533  115.0217
[/CODE]

[CODE]
Device GeForce GT 520
Compatibility       2.1
clockRate (MHz)     1620
memClockRate (MHz)  600

 fft    max exp   ms/iter
1024   19535569   19.1582
1152   21921901   21.1080
1260   23930909   25.1102
1296   24599717   25.1694
1344   25490893   26.9664
1440   27271147   27.6734
1568   29640913   29.6729
1600   30232693   31.0359
1728   32597297   32.7081
1792   33778141   36.0988
2048   38492887   37.7887
2240   42020509   45.4196
2304   43194913   46.2915
2592   48471289   50.0936
2880   53735041   58.4163
3136   58404433   62.8308
3200   59570449   68.3727
3360   62483353   71.3736
3456   64229677   72.2613
3584   66556463   74.7343
3600   66847171   78.6148
4096   75846319   80.5501
4320   79902611   93.3572
4608   85111207   96.0147
5040   92911087  104.0299
5184   95507747  106.3927
5600  103000823  124.7254
5760  105879517  126.8726
6048  111056879  131.7726
6144  112781477  135.7366
6272  115080019  137.5057
6300  115582697  143.4483
6480  118813021  147.0057
6720  123117161  151.9291
6912  126558077  157.1928
7168  131142761  157.2835
7200  131715607  159.5422
7776  142017539  161.8901
7840  143161159  175.8101
8064  147162241  180.3350
8192  149447533  180.9630
[/CODE] |
[QUOTE=Mark Rose;396581]Here are two benchmarks for low end 2.1 cards.[/QUOTE]Thanks. Those numbers are a fair bit higher (~25%) than expected. If someone has a higher-end Compute 2.1 card (e.g. GTX 460 or 560) that matches those numbers I'll feel more confident in updating the chart.
|
[QUOTE=James Heinrich;396582]Thanks. Those numbers are a fair bit higher (~25%) than expected. If someone has a higher-end Compute 2.1 card (e.g. GTX 460 or 560) that matches those numbers I'll feel more confident in updating the chart.[/QUOTE]
They were done with CUDA 5.5 if it matters. |
I also have results for a 570, 780, Titan, and Titan black if you want them.
[CODE]
Device GeForce GTX 560 Ti
Compatibility       2.1
clockRate (MHz)     1940
memClockRate (MHz)  2080

 fft    max exp   ms/iter
1024   19535569    2.3434
1152   21921901    2.6485
1260   23930909    3.0699
1296   24599717    3.0931
1344   25490893    3.3730
1440   27271147    3.4673
1568   29640913    3.7194
1600   30232693    3.9279
1728   32597297    4.0755
1792   33778141    4.3713
2048   38492887    4.6964
2240   42020509    5.5626
2304   43194913    5.5691
2592   48471289    6.2081
2688   50227213    7.0311
2880   53735041    7.1812
3136   58404433    7.7523
3200   59570449    8.2700
3360   62483353    8.5920
3456   64229677    8.6056
3584   66556463    8.9110
3600   66847171    9.6526
3780   70115887    9.8984
4096   75846319    9.9148
4320   79902611   11.2540
4608   85111207   11.4486
5040   92911087   12.6529
5184   95507747   12.9797
5292   97454309   14.6345
5376   98967641   14.8351
5760  105879517   15.0610
6048  111056879   15.8011
6144  112781477   16.1751
6272  115080019   16.2582
6300  115582697   17.0884
6480  118813021   17.3343
6720  123117161   17.9061
6912  126558077   18.2475
7168  131142761   18.5232
7200  131715607   19.6253
7776  142017539   19.8418
7840  143161159   21.2217
8192  149447533   21.2728
[/CODE] |
[QUOTE=owftheevil;396608]I also have results for a 570, 780, Titan, and Titan black if you want them.[/QUOTE]Your 560 Ti result was more in line with what I'd seen before. These numbers are from v2.05, correct?
I already have 570/580 results, but more is merrier if you have them handy. I would also be very interested in your 780/Titan/Black results please. |
I can bench on three more 580's and a 760 if it would be useful.
|
[CODE]
Device GeForce GTX 780
Compatibility       3.5
clockRate (MHz)     1032
memClockRate (MHz)  3004

 fft    max exp   ms/iter
1024   19535569    1.2052
1080   20580341    1.4758
1152   21921901    1.4977
1280   24302527    1.6422
1296   24599717    1.6571
1323   25101101    1.8665
1344   25490893    1.8907
1440   27271147    1.9198
1568   29640913    1.9577
1600   30232693    2.0038
1728   32597297    2.2062
1792   33778141    2.2945
2048   38492887    2.4171
2304   43194913    2.8495
2560   47885689    3.2959
2592   48471289    3.3462
2646   49459153    3.8456
2700   50446621    3.8838
2800   52274087    3.9217
2880   53735041    3.9759
2916   54392209    3.9872
3136   58404433    4.0511
3200   59570449    4.2620
3240   60298969    4.5078
3584   66556463    4.5897
4096   75846319    4.9067
4608   85111207    5.9424
5120   94353877    6.7233
5184   95507747    6.9934
5292   97454309    7.5551
5600  103000823    7.5775
5832  107174381    8.0734
6272  115080019    8.3218
6400  117377567    8.9122
6912  126558077    9.1317
7168  131142761    9.4188
7200  131715607    9.7199
8192  149447533   10.2932
[/CODE] |
All with CUDAlucas 2.05.1 and CUDA-5.5
[CODE] Device GeForce GTX TITAN Compatibility 3.5 clockRate (MHz) 980 memClockRate (MHz) 3004 fft max exp ms/iter 1024 19535569 0.6874 1080 20580341 0.8469 1296 24599717 0.8590 1568 29640913 1.0307 1600 30232693 1.1269 1728 32597297 1.2268 2000 37609879 1.2813 2048 38492887 1.3269 2592 48471289 1.6516 2646 49459153 2.0153 2700 50446621 2.0498 3136 58404433 2.1236 3200 59570449 2.4147 3240 60298969 2.4481 4000 74106457 2.5415 4096 75846319 2.5937 4320 79902611 3.2794 4374 80879779 3.3026 4500 83158811 3.3899 4536 83809729 3.4085 5184 95507747 3.4727 5292 97454309 3.9633 5400 99399967 4.0524 5600 103000823 4.2176 5832 107174381 4.3864 6000 110194363 4.5187 6048 111056879 4.5813 6125 112440191 4.6094 6272 115080019 4.6856 6400 117377567 4.7771 6480 118813021 4.8461 6561 120266023 4.8945 6750 123654943 5.0736 6912 126558077 5.1885 7000 128134459 5.2399 7056 129137381 5.3293 8000 146019329 5.3398 8192 149447533 5.4728 [/CODE] [CODE] Device GeForce GTX TITAN Black Compatibility 3.5 clockRate (MHz) 1110 memClockRate (MHz) 3500 fft max exp ms/iter 1024 19535569 0.6642 1080 20580341 0.8309 1296 24599717 0.8310 1568 29640913 0.9913 1600 30232693 1.0648 1728 32597297 1.1501 2000 37609879 1.2579 2048 38492887 1.2876 2592 48471289 1.6101 2744 51250889 1.9976 3136 58404433 2.0265 3200 59570449 2.4074 3240 60298969 2.4360 4000 74106457 2.4614 4096 75846319 2.5284 4320 79902611 3.2676 4374 80879779 3.2938 5184 95507747 3.3084 5292 97454309 3.9584 5488 100984691 4.0006 5600 103000823 4.1916 5832 107174381 4.3808 6048 111056879 4.5168 6125 112440191 4.5787 6250 114685037 4.6657 6272 115080019 4.6844 6400 117377567 4.7705 6480 118813021 4.8617 6561 120266023 4.8914 6750 123654943 5.0513 8000 146019329 5.1017 8192 149447533 5.1987 [/CODE] [CODE] Device GeForce GTX 570 Compatibility 2.0 clockRate (MHz) 1464 memClockRate (MHz) 1900 fft max exp ms/iter 1024 19535569 1.7232 1080 20580341 2.0214 1120 21325891 2.0459 1152 21921901 2.0549 1176 22368691 2.2586 1296 24599717 2.3083 1440 
27271147 2.5121 1568 29640913 2.7261 1600 30232693 2.8704 1728 32597297 3.0130 2048 38492887 3.2886 2160 40551479 4.0701 2304 43194913 4.1211 2352 44075249 4.4544 2592 48471289 4.4614 2880 53735041 5.1680 3072 57237889 5.4704 3136 58404433 5.6023 3200 59570449 6.0137 3360 62483353 6.0838 3456 64229677 6.3130 3584 66556463 6.3526 4096 75846319 6.8811 4320 79902611 7.7969 4608 85111207 7.9405 5040 92911087 8.9266 5184 95507747 9.2909 5400 99399967 10.4533 5600 103000823 10.6724 5670 104260469 10.8861 5760 105879517 10.9340 6144 112781477 10.9955 6272 115080019 11.9533 6400 117377567 12.3915 6480 118813021 12.5910 7056 129137381 12.9406 7168 131142761 13.2404 7200 131715607 13.3282 7776 142017539 13.8598 7840 143161159 14.8712 8192 149447533 15.0081 [/CODE] |
Your Titan/Titan Black are about 1-2% faster than mine :tantrum:.
Were they done with P95 running? (Mine, yes; I try to go for "as close to real conditions as possible" when I build them.) Or do you keep them cooler? (How? Mine are air cooled, and on hot Thai days they lose a lot of productivity when the aircond is off: the TDP goes down, and the iterations go from 3 to 4 ms, or even 5, etc.) Just curious. |
Please see this post:
[url]http://www.mersenneforum.org/showthread.php?p=396570#post396570[/url] |
[QUOTE=owftheevil;396624]I also have results for a 570, 780, Titan, and Titan black if you want them.[/QUOTE]Thanks. More benchmarks are required, I think. Between the three 3.5 cards the output deviates +/- 25% from what I expected: the Titan Black is right on the mark, the 780 is 30% slower, and the Titan is 20% faster than expected.
@LaurV (and anyone else who has one), if you can send me your 780/Titan* benchmarks (2.05) sometime that would be great. |
[QUOTE=LaurV;396626]Your Titan/Titan Black are about 1-2% faster than mine :tantrum:.
Were they done with P95 running? (mine yes, I try to go for "as close to real conditions as possible" when I build them) Or you keep them cooler? (how? mine are air cooled and with hot Thai days they lose a lot of productivity when the aircond is off, for example, the TDP go down, and the iterations go from 3 to 4 ms, or even 5 etc.) Just curious.[/QUOTE] These were all done with mprime running. Mine are water cooled and go from 37--43 C depending on room temperature. That's probably the difference. The Titan Black settles in at 1019 MHz and the regular Titan at 923--947 MHz. |
[QUOTE=MacFactor;396630]Please see this post:
[URL]http://www.mersenneforum.org/showthread.php?p=396570#post396570[/URL][/QUOTE] Two others who had this same problem solved it by building their own binaries locally. It's easy, but involves a large download (most of which is unnecessary, but unavoidable) if you don't have the full CUDA toolkit already.

Has anyone running Linux been able to get the binaries provided on SourceForge working? |
[QUOTE=owftheevil;396653]These were all done with mprime running. Mine are water cooled and go from 37--43 C depending on room temperature. That's probably the difference. The titan black settles in at 1019 Mhz and the regular titan at 923--947 Mhz.[/QUOTE]
:tu: :tu: Mine stabilizes at ~900 during the day and ~970 during the night, with the TDP between, say, 60 and 75%. This is the "worst" period of the year, because it is getting hot during the day, close to 30C, but we have around 16C during the night, sometimes higher. So, with windows open in the morning/night, the temperature in the rooms stays under 25C during the day and it is not hot enough to run the aircond. So, it is "not so hot, not so cold".

When it is hot outside, from the end of March to June or so (before the rainy season starts), we run the aircond in the house, so it is somehow better for the air-cooled cards during the day. Well, it is not really better, but the day/night difference is not so big. Also, for the "water cooled" part, this period is bad because the sun is low and it shines under the roof of the cabinet, heating a corner of one of the radiators, so there are bigger temperature differences between day and night. For example, one water loop with one 2600K CPU and two 580s can reach as high as 60C during the day and as low as 28C during the night. In April, the hottest period in this area (45C during the day and 30C in the night), the temperature of the water plays around 45-50 during the day, in spite of the higher atmospheric temperature, because the sun is high in the sky and the whole water cabinet is in the shadow.

I have to move my ass one of these days and mount those Titan water cooling blocks I was talking about last year... they're getting rusty in the drawers. (I did not change it as long as the air cooler worked, in spite of the fact that I bought the water block after I got the first Titan, from Xyzzy, years ago. That one is still running, on air; see my posts in my cooling thread here around.) |
[QUOTE=LaurV;396725]For example, one water loop with one 2600k CPU and two 580's can reach as high as 60C during the day and as low as 28C during the night.[/quote]
What do you have in the way of radiators on that loop? I am curious. |
offtopic [URL="http://www.mersenneforum.org/showthread.php?p=396793"]moved here[/URL] :razz:
|
Does anyone have a GTX 780 Ti and is willing to benchmark it with CUDALucas v2.05.1(CUDA 6.5) ?
I was shocked to learn that it might be faster than the GTX Titan (while NV took away the option to enable 1/3 FP64 performance vs the 1/8 default, according to [URL="http://www.hardwareluxx.com/index.php/reviews/hardware/vgacards/28513-test-nvidia-geforce-gtx-780-ti-.html?start=1"]one source[/URL]). The big question is: does the GTX 780 Ti beat the GTX Titan, even if the latter has 1/3 FP64 performance enabled?

For comparison, it takes CUDALucas 2.12 ms to perform 1 LL iteration (last two digits cut off) on M57885161, using a GTX Titan. Disabling double precision in the NV CP slows CL to 3.62 ms/iteration. This suggests that the 780 Ti should be slower than the GTX Titan, even if it has more shaders and a higher vRAM frequency. Or not?

Anyways, if someone's up for the task, make sure to run threadbench and FFTbench for the best parameters: [code]
CUDALucas2.05.1-CUDA6.5-Windows-x64 -cufftbench 1024 8192 3
CUDALucas2.05.1-CUDA6.5-Windows-x64 -threadbench 1024 8192 3 1
[/code] Over and out. |
[QUOTE=Karl M Johnson;398271]For comparison, it takes CUDALucas 2.12 ms to perform 1 LL iteration(last two digits cut off) on M57885161, using GTX Titan.
Disabling double precision in NV CP slows CL to 3.62 ms / iteration.[/QUOTE] Mersenne.ca gives 3.0ms/it for a 55M expo for [B]Titan[/B]! That suggests that perhaps it is the Titan data that needs to be updated at the site. |
[QUOTE=axn;398277]That suggests that perhaps it is the Titan data that needs to updated at the site.[/QUOTE]
GTX Titan is 2 years old, CUDALucas got faster during that time. I selected CUDA 6.5 binary since it had the best timings during the FFT bench (wondering if it's the same for all GPUs with shader model 3.5?) |
I've twiddled how the performance is shown for the (original) Titan, based on how a recently submitted benchmark lines up with the other Titan benchmark I have. What I need now is more benchmarks from both the 780 and the Titan Black to figure that part out too. Benchmarks from the Titan Z and Titan X would be great too, but I'm not sure how many people have those available.
|
Got some fresh info about GTX 780 Ti.
[CODE]
69M exp
1045 MHz GPU clock, 3196 MHz mem clock: 4.22 ms/iter
1176 MHz GPU clock, 3570 MHz mem clock: 3.7 ms/iter
[/CODE] With my GTX Titan, I get 2.76ms/iteration on the same exponent. Promising data :smile: |
[QUOTE=Karl M Johnson;398407]Got some fresh info about GTX 780 Ti.
[CODE]
69M exp
1045 MHz GPU clock, 3196 MHz mem clock: 4.22 ms/iter
1176 MHz GPU clock, 3570 MHz mem clock: 3.7 ms/iter
[/CODE] With my GTX Titan, I get 2.76ms/iteration on the same exponent. Promising data :smile:[/QUOTE] Hmmm... Mersenne.ca has been updated with new numbers, and now things have swung the other way. The 780 Ti has apparently suffered a reduction in performance (how?) and the Titan is now second. I think the Titan Black and Titan Z also need to be rebenchmarked. |
[QUOTE=Karl M Johnson;398407]Got some fresh info about GTX 780 Ti.[/QUOTE]If you have access to that 780 Ti I would much appreciate a benchmark.
[QUOTE=axn;398408]I think Titan Black and Titan Z also needs to be rebenchmarked.[/QUOTE][QUOTE=James Heinrich;398294]What I need now is more benchmarks from both 780 and Titan Black to figure that part out too. Benchmarks from Titan Z and Titan X would great too[/QUOTE]I would tend to agree with you :smile: |
[CODE]
Device GeForce GTX 780 Ti
Compatibility       3.5
clockRate (MHz)     1019
memClockRate (MHz)  3574

 fft    max exp   ms/iter
3136   58404433    3.0522
3200   59570449    3.2748
3240   60298969    3.5230
3584   66556463    3.6101
4096   75846319    3.6664
4608   85111207    4.5590
4800   88579669    5.1143
4860   89662967    5.5272
4900   90384989    5.7338
5000   92189509    5.8190

Threads
3500   256    32   3.77875
3528   256   512   3.77450
3584   256   512   3.60728
3600   256   256   3.85989
3645   256   128   4.17807
3675   128   128   4.60588
3750   256    64   4.63702
3780   256    64   4.21302
3840   256   512   4.23805
3888   256   128   4.03913
3920   256    64   4.29968
3969   256    64   4.51075
4000   256   128   3.93409
4032   256   512   4.17210
4050   256    64   4.53155
4096   256    64   3.66282
4116   256    32   5.14177
4200   256   128   4.98391
4320   128   512   4.65157
4374   128   256   5.18404
4375   128   128   5.32028
4410   256    32   5.00846
4480   256   256   4.77655
4500   256   128   4.82445
[/CODE] |
New numbers for the Titan X have appeared at mersenne.ca: 30% more throughput than the 980.
The TDP for Titan Z is wrong at the site. It should be 375w, not 500w. |
[QUOTE=James Heinrich;398410]If you have access to that 780 Ti I would much appreciate a benchmark.[/QUOTE]And by 780 Ti what I really meant was GTX 980.
(but thanks [i]stars10250[/i]). [QUOTE=axn;398463]The TDP for Titan Z is wrong at the site. It should be 375w, not 500w.[/QUOTE]Fixed, thanks. |
32 bit Binary
Is there a 32-bit recent binary for CUDA4.2 please?
|
[QUOTE=vsuite;399243]Is there a 32-bit recent binary for CUDA4.2 please?[/QUOTE]
[URL="http://downloads.sourceforge.net/project/cudalucas/CUDALucas.2.05.1-CUDA4.2-CUDA6.5-Windows-32.64.7z?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fcudalucas%2F&ts=1428018767&use_mirror=iweb"]Here[/URL] you go. |
Thanks flashjh.
Now back in GIMPS, and 2.03 appeared to only have 64-bit bins, so I did not download 2.05. :ermm::whistle: GT 640 on 7/64 (X2 processor) is slow, so mfaktc only. GTX 460 on XP/32 (Core 2 Quad) will run both CUDALucas and mfaktc. |
Compiler optimisation
Hi guys,
My computer has been running mprime for some time, and for fun I started CUDALucas beside it. It runs great on my GTX 970! I compiled the CUDALucas source on my computer, and after some inspection of the Makefile I see:

[CODE]
NAME = CUDALucas
VERSION = 2.05.1
OptLevel = 1
[/CODE]

I did some tests with OptLevel 3 and here are the results (tiny test; btw, the production CUDALucas is running in the background).

With OptLevel = 1:

[FONT=Courier New][SIZE=1]| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Apr 06 11:29:32 | M110503  10000 0xacb29fc05973d0a8 | 8K 0.00001 0.2062 2.06s | 0:20  9.04% |
| Apr 06 11:29:34 | M110503  20000 0x9cd7ca8aa594b33c | 8K 0.00001 0.2060 2.06s | 0:18 18.09% |
| Apr 06 11:29:36 | M110503  30000 0xba1ef4f09a7c955a | 8K 0.00001 0.2062 2.06s | 0:16 27.14% |
| Apr 06 11:29:38 | M110503  40000 0x827b27dad4e98554 | 8K 0.00001 0.2060 2.06s | 0:14 36.19% |
| Apr 06 11:29:40 | M110503  50000 0x9e6c039053cc2c17 | 8K 0.00001 0.2061 2.06s | 0:12 45.24% |
| Apr 06 11:29:42 | M110503  60000 0xdb48afced9ebd397 | 8K 0.00001 0.2060 2.06s | 0:10 54.29% |
| Apr 06 11:29:44 | M110503  70000 0xd650094b406761ed | 8K 0.00001 0.2061 2.06s | 0:08 63.34% |
| Apr 06 11:29:46 | M110503  80000 0xa4d69c031cb0caa2 | 8K 0.00001 0.2060 2.06s | 0:06 72.39% |
| Apr 06 11:29:48 | M110503  90000 0xf1427358e52c1458 | 8K 0.00001 0.2060 2.06s | 0:04 81.44% |
| Apr 06 11:29:50 | M110503 100000 0x0f4385fec05eb193 | 8K 0.00001 0.2060 2.06s | 0:02 90.49% |
| Apr 06 11:29:52 | M110503 110000 0xc5bb3186236db9db | 8K 0.00001 0.2061 2.06s | 0:00 99.54% |[/SIZE][/FONT]

But with OptLevel = 3:

[FONT=Courier New][SIZE=1]| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Apr 06 11:30:19 | M110503  10000 0xacb29fc05973d0a8 | 8K 0.00001 0.2058 2.05s | 0:20  9.04% |
| Apr 06 11:30:21 | M110503  20000 0x9cd7ca8aa594b33c | 8K 0.00001 0.2058 2.05s | 0:18 18.09% |
| Apr 06 11:30:23 | M110503  30000 0xba1ef4f09a7c955a | 8K 0.00001 0.2058 2.05s | 0:16 27.14% |
| Apr 06 11:30:25 | M110503  40000 0x827b27dad4e98554 | 8K 0.00001 0.2057 2.05s | 0:14 36.19% |
| Apr 06 11:30:27 | M110503  50000 0x9e6c039053cc2c17 | 8K 0.00001 0.2059 2.05s | 0:12 45.24% |
| Apr 06 11:30:29 | M110503  60000 0xdb48afced9ebd397 | 8K 0.00001 0.2057 2.05s | 0:10 54.29% |
| Apr 06 11:30:31 | M110503  70000 0xd650094b406761ed | 8K 0.00001 0.2046 2.04s | 0:08 63.34% |
| Apr 06 11:30:34 | M110503  80000 0xa4d69c031cb0caa2 | 8K 0.00001 0.2057 2.05s | 0:06 72.39% |
| Apr 06 11:30:36 | M110503  90000 0xf1427358e52c1458 | 8K 0.00001 0.2059 2.05s | 0:04 81.44% |
| Apr 06 11:30:38 | M110503 100000 0x0f4385fec05eb193 | 8K 0.00001 0.2058 2.05s | 0:02 90.49% |
| Apr 06 11:30:40 | M110503 110000 0xc5bb3186236db9db | 8K 0.00001 0.2059 2.05s | 0:00 99.54% |[/SIZE][/FONT]

Why isn't the setting 3 by default? See the man pages for gcc!

PS: For now I leave my (production) CUDALucas on OptLevel=1.
PS2: I have some experience with C programming. I was one of the Hercules-390 (mainframe emulator) developers for 12 years. My expertise was performance; maybe I can help?

Kind regards,
Bernard van der Helm |
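One plausible reason the OptLevel makes so little difference here (an editorial aside, not from the post above): the hot loops of CUDALucas run in GPU kernels compiled by nvcc, so the host-side gcc -O level only touches glue code. The effect of -O1 vs -O3 on a purely CPU-bound program can be checked with a toy like this (illustrative only, not CUDALucas code):

```shell
# Build the same small CPU-bound program at -O1 and -O3 and confirm the
# numerical results agree (gcc keeps IEEE semantics at -O3 unless
# -ffast-math is given); only the run time may differ.
cat > opt_demo.c <<'EOF'
#include <stdio.h>
int main(void) {
    int i;
    double s = 0.0;
    for (i = 1; i <= 1000000; i++)
        s += 1.0 / i;            /* partial harmonic sum */
    printf("%.6f\n", s);
    return 0;
}
EOF
gcc -O1 -o opt_demo_o1 opt_demo.c
gcc -O3 -o opt_demo_o3 opt_demo.c
./opt_demo_o1
./opt_demo_o3
```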
No help here? Is ANYONE running CUDALucas under Linux Mint?
I haven't done a compile outside of an IDE, but if someone will give me a clue what a 'make' statement (which switches, options) would look like, I'll give it a try -- I hate to have a couple of GPUs underutilized just because the OS can't find a file which is sitting right there.
MF |
Could you be more specific about what the error is and what you've already tried?
|
Sorry, thought I was responding to my own post, so it would be obvious ...
There's something about this forum that's not organized the way I'm used to from other forums.
Makes it hard to keep related posts together.

"I've downloaded an executable (CUDALucas) that I know other people are running under Linux. Typing ./filename gives me the error message "No such file or directory". ls -l clearly shows the file is there, belongs to me, and I have permission to read, write, and execute. I get the same error in Lubuntu! How can a file which is so obviously "there" not be there??"

Note that I've already got mfaktc up and running on two different systems, so I've had some experience with NVidia drivers and CUDA libs at this point, and I'm starting to doubt whether it has anything to do with finding the linked files. Here's a summary: [url]http://www.mersenneforum.org/showthread.php?p=396570#post396570[/url]

I've tried running in sh, bsh, bash, csh, tcsh, none of which helps. file(1) returns: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, ..., not stripped. Hope the ... isn't important stuff.

I'm running it on another computer, using this one for browser only. It's not like I'm getting any usable error messages at all. And the instructions at SourceForge are pretty minimalist -- dl the files, change one env variable. Didn't work, now what? |
Numerous people have run into that issue, myself included. It seems to happen with Ubuntu and derivatives. If you're already on Linux, it should be simple enough to recompile it. Note that TheJudger has found a bug in CUDA 7.x that is not fixed, so you should compile it using an earlier version.
|
UNIX (including Linux) can produce rather misleading error messages. The program you are trying to run is probably calling some other file which it can't find, putting out that message and terminating. Of course it [B]should[/B] also say what it's trying to call, but the programmer writing the relevant code has to make it do that.
Try "strace ./CUDALucas" which should trace all the system calls CUDALucas makes with parameters passed to them. You may get quite a lot of output, but hopefully one line will tell you what file it's trying to call. Read the man page for strace to help understand the output. Chris |
[QUOTE=Mark Rose;405795]Numerous people have run into that issue, myself included. It seems to happen with Ubuntu and derivatives. If you're already on Linux, it should be simple enough to recompile it. Note that TheJudger has found a bug in CUDA 7.x that is not fixed, so you should compile it using an earlier version.[/QUOTE]
Thanks, I sure wish I had known that earlier! You'd think they'd update the instructions ... I've dl'ed the source, will try compiling later. I was reluctant to try that because I've never done a compilation outside of an IDE -- now it sounds like it would have been easier to go straight for the compile, instead of wasting so much time trying to figure out how to get the binaries to work.

SHN |
Not sure that helps much ...
[CODE]
shn@C2D ~/Desktop/CUDALucas $ strace ./CUDALucas
execve("./CUDALucas", ["./CUDALucas"], [/* 46 vars */]) = -1 ENOENT (No such file or directory)
write(2, "strace: exec: No such file or di"..., 40strace: exec: No such file or directory
) = 40
exit_group(1) = ?
+++ exited with 1 +++
[/CODE]

[CODE]
shn@C2D ~/Desktop/CUDALucas/other_versions $ strace CUDALucas-2.05.1-CUDA4.2-linux-x86_64
strace: Can't stat 'CUDALucas-2.05.1-CUDA4.2-linux-x86_64': No such file or directory
[/CODE][QUOTE=chris2be8;405799]UNIX (including Linux) can produce rather misleading error messages. The program you are trying to run is probably calling some other file which it can't find, putting out that message and terminating. Of course it [B]should[/B] also say what it's trying to call, but the programmer writing the relevant code has to make it do that.

Try "strace ./CUDALucas" which should trace all the system calls CUDALucas makes with parameters passed to them. You may get quite a lot of output, but hopefully one line will tell you what file it's trying to call. Read the man page for strace to help understand the output.

Chris[/QUOTE] |
This may be part of the problem ...
[CODE]
~/Desktop/CUDALucas $ ldd runll
    linux-vdso.so.1 =>  (0x00007fffd8d9c000)
    libcufft.so.4 => /home/shn/CUDAlibs/libcufft.so.4 (0x00007fc31dca7000)
    libcudart.so.4 => /home/shn/CUDAlibs/libcudart.so.4 (0x00007fc31da49000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc31d742000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc31d37d000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc31d179000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc31cf5a000)
[B]    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc31cc56000)[/B]
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc31ca40000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc31c837000)
    /lib/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007fc31fced000)
[/CODE]The file in bold cannot be found. |
[QUOTE=Mark Rose;405795]Note that TheJudger has found a bug in CUDA 7.x that is not fixed, so you should compile it using an earlier version.[/QUOTE]
That doesn't affect CUDALucas. Compiling with CUDA 7.x is fine. Just don't compile mfaktc with CUDA 7.x. |
[QUOTE=frmky;405816]That doesn't affect CUDALucas. Compiling with CUDA 7.x is fine. Just don't compile mfaktc with CUDA 7.x.[/QUOTE]
Good to know. :) |
[QUOTE=MacFactor;405806][CODE]
shn@C2D ~/Desktop/CUDALucas $ strace ./CUDALucas
execve("./CUDALucas", ["./CUDALucas"], [/* 46 vars */]) = -1 ENOENT (No such file or directory)
write(2, "strace: exec: No such file or di"..., 40strace: exec: No such file or directory
) = 40
exit_group(1) = ?
+++ exited with 1 +++
[/CODE][/QUOTE] That does give a (slightly cryptic) clue. From the man page for execve: [code]
ENOENT The file filename or a script or ELF interpreter does not exist,
       or a shared library needed for file or interpreter cannot be found.
[/code] Which matches up with the ldd output in your next post, where one of the libraries is missing. Obviously the system should have said which file it could not find. If you can install a version of libstdc++.so.6 on your system, the program might work.

As I said, UNIX can produce rather cryptic error messages. In my previous job as an MVS systems programmer, I went to some effort to ensure that error messages put out by programs I wrote or updated had enough information in them to point to the real problem.

Chris |
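The ldd check a few posts up can be scripted into a quick diagnostic. A sketch (/bin/sh is used only so the example runs anywhere; point bin at ./CUDALucas, or the full versioned filename, on the affected machine):

```shell
# Report any shared libraries the dynamic linker cannot resolve for a binary.
bin=/bin/sh    # stand-in; replace with the binary being diagnosed
if ldd "$bin" | grep -q 'not found'; then
    echo "unresolved libraries for $bin:"
    ldd "$bin" | grep 'not found'
else
    echo "all shared libraries resolved for $bin"
fi
```

Note this only helps once the binary can be read at all; the ENOENT from execve can also mean the ELF interpreter itself (e.g. /lib64/ld-linux-x86-64.so.2) is missing, as when running a 64-bit binary on a 32-bit userland.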
Thanks, I've been told it's fairly distro-specific ...
I've been told by someone with more knowledge than myself in this area that that particular C++ linked file is probably not found in most Linux distros, so I'm going to try compiling a local version. "make" informs me that I don't have the CUDA toolkit installed on that particular platform, so that's next. I had dl'ed the toolkits on a WinXP computer but that was a while ago -- trying to do everything in Linux now, and don't even have that WinXP box available anymore.
I appreciate the analysis you've provided, it really helped me make some progress. Sure wish I had known some of this months ago. MF |
I've been running CUDALucas v2.05.1 under Windows 10. The bug with CUDA drivers >=310.70 and cc 2.0 cards still exists, but its behaviour is slightly different than before. Now it causes round off errors which are unrecoverable. Increasing FFT doesn't help. Only a restart of CUDALucas will overcome this. I've managed to match a residue using a run with similar behaviour, so no harm done, I just find it interesting.
[CODE]
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Jul 25 11:22:57 | M39823489  100000 0xfbb2e64a7a7f1a9c | 2160K 0.15766 3.4682 346.82s | 1:14:16:10 0.25% |
| Jul 25 11:28:44 | M39823489  200000 0x2f5728b5a730a4f2 | 2160K 0.15464 3.4692 346.92s | 1:14:10:42 0.50% |
| Jul 25 11:34:31 | M39823489  300000 0x0ba36acaa24b9bed | 2160K 0.17899 3.4686 346.86s | 1:14:04:55 0.75% |
| Jul 25 11:40:18 | M39823489  400000 0xee75195443fb9609 | 2160K 0.16523 3.4706 347.06s | 1:13:59:27 1.00% |
| Jul 25 11:46:05 | M39823489  500000 0x5fd2260c0b578ccc | 2160K 0.16088 3.4706 347.06s | 1:13:53:52 1.25% |

CUDALucas.cu(1878) : cudaSafeCall() Runtime API error 30: unknown error.
Resetting device and restarting from last checkpoint.
Using threads: square 32, splice 64.
Continuing M39823489 @ iteration 500001 with fft length 2160K, 1.26% done
Round off error at iteration = 500100, err = 0.49866 > 0.35, fft = 2160K.
Restarting from last checkpoint to see if the error is repeatable.
Using threads: square 32, splice 64.
Continuing M39823489 @ iteration 500001 with fft length 2160K, 1.26% done
Round off error at iteration = 500100, err = 0.49866 > 0.35, fft = 2160K.
The error persists. Trying a larger fft until the next checkpoint.
Using threads: square 32, splice 64.
Continuing M39823489 @ iteration 500001 with fft length 2240K, 1.26% done
Round off error at iteration = 500400, err = 0.42053 > 0.35, fft = 2240K.
The error won't go away. I give up.
Waiting for 0 seconds, press a key to continue ...

------- DEVICE 1 -------
name                GeForce GTX 580
Compatibility       2.0
clockRate (MHz)     1544
memClockRate (MHz)  2004
totalGlobalMem      3221225472
totalConstMem       65536
l2CacheSize         786432
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsPerMP     1536
multiProcessorCount 16
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
textureAlignment    512
deviceOverlap       1

Using threads: square 32, splice 64.
Continuing M39823489 @ iteration 500001 with fft length 2160K, 1.26% done
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Jul 25 11:55:59 | M39823489  600000 0x2bfa199a2cf44a3c | 2160K 0.16825 3.4688 346.87s | 1:13:48:00 1.50% |
| Jul 25 12:01:46 | M39823489  700000 0x4bc8273a3c703323 | 2160K 0.16035 3.4704 347.04s | 1:13:42:19 1.75% |
| Jul 25 12:07:33 | M39823489  800000 0xdacbc4353a989884 | 2160K 0.15374 3.4703 347.03s | 1:13:36:36 2.00% |
| Jul 25 12:13:20 | M39823489  900000 0x02e00bcef8ecf468 | 2160K 0.15009 3.4705 347.05s | 1:13:30:53 2.25% |
| Jul 25 12:19:07 | M39823489 1000000 0x0fbd344fdae4c273 | 2160K 0.15717 3.4703 347.03s | 1:13:25:09 2.51% |
[/CODE] |
I am getting this error on Arch Linux:
[code]device_number >= device_count ... exiting (This is probably a driver problem) [/code] I have two Titan Xs with no SLI bridge, and the SLI option in xorg.conf is disabled. I tried both the precompiled CUDA 6.5 version and one compiled from svn with CUDA 7.0, to no avail. |
Sorry if this has been addressed, but I have a puzzling response from CuLu 2.05.1. On my first DC run of a 36M assignment, the FFT selected was 2048K. This produced errors in the 0.05 range. I now have a 34M exponent to DC with the GTX580. Thinking I could get better times with a smaller FFT, I tried inserting 1728K and 1792K both in the worktodo.txt and on the command line. These were either ignored, or the program stated that they are too small and put 2048K back in. I vaguely remember a command "FFT2=" rather than just ",1792K,". Is this what I need to use to get CuLu to test a smaller FFT than 2048?
EDIT: I have deleted the checkpoint files between tests. |
[QUOTE=kladner;409447]Sorry if this has been addressed, but I have a puzzling response from CuLu 2.05.1. On my first DC run of a 36M assignment, the FFT selected was 2048K. This produced errors in the 0.05 range. I now have a 34M exponent to DC with the GTX580. Thinking I could get better times with a smaller FFT, I tried inserting 1728K and 1792K both in the worktodo.txt and on the command line. These were either ignored, or the program stated that they are too small and put 2048K back in. I vaguely remember a command "FFT2=" rather than just ",1792K,". Is this what I need to use to get CuLu to test a smaller FFT than 2048?.[/QUOTE]
For CUDALucas, you just add ",1792K" to the end of the line in worktodo.txt. However, looking back at the ones I've done, nothing in the 34M range used an FFT that small. Most were done with 1890K or 2000K, depending on the GPU and version of cufft used. Check the timing for all three of 1890K, 2000K, and 2048K to see which is faster for you. |
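For illustration, a worktodo.txt line with the FFT length appended might look like the following. The assignment-ID and trial-factoring fields here are placeholders, not real values; only the trailing ",1792K" is the point:

```
DoubleCheck=<assignment-id>,34733759,74,1,1792K
```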
[QUOTE=UBR47K;408850]I am getting this error in arch linux:
[code]device_number >= device_count ... exiting (This is probably a driver problem) [/code] I have 2 titan X with no SLI bridge and SLI option in xorg.conf is disabled. Tried using precompiled 6.5CUDA version and compiled from svn with CUDA7.0 with no avail.[/QUOTE] Are /dev/nvidia0 and /dev/nvidiactl present? Does the command nvidia-smi show your card present? If no to either, then the driver isn't installed and active. |
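The device-node check frmky suggests can be scripted. A minimal sketch of my own (not part of CUDALucas or the NVIDIA tools), which just reports which nodes are absent:

```python
import os

def missing_device_nodes(paths=("/dev/nvidia0", "/dev/nvidiactl")):
    """Return the subset of device paths that do not exist on this system."""
    return [p for p in paths if not os.path.exists(p)]

if __name__ == "__main__":
    missing = missing_device_nodes()
    if missing:
        print("Driver not active, missing:", ", ".join(missing))
    else:
        print("NVIDIA device nodes present.")
```

If it reports missing nodes (or `nvidia-smi` shows no card), the driver isn't installed and active, as frmky says.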
[QUOTE=frmky;409465]For CUDALucas, you just add ",1792K" to the end of the line in worktodo.txt. However, looking back at the ones I've done nothing in the 34M range used an FFT that small. Most were done with 1890K or 2000K depending on the GPU and version of cufft used. Check the timing for all three of 1890K, 2000K, and 2048K to see which is faster for you.[/QUOTE]
1890K and 2000K don't show up in 'GeForce GTX 580 fft.txt', but I'll plug them in and see what comes out. I think I ran CUFFTbench such that it limited the output in some way. It might be good to rerun it unfiltered. Here's that immediate region of the current fft.txt: [CODE]1600 30232693 2.6039
1728 32597297 2.7957
1792 33778141 2.8987
2048 38492887 2.9924
2304 43194913 3.7102
2592 48471289 3.9732
2880 53735041 4.7980
[/CODE]Thanks for the suggestions! :smile:

EDIT: Here are the results for the 3 FFTs: [QUOTE]M347xxxxx  5000 0xfd897327f15981c1 | 1890K 0.14111 3.5447 17.72s | 1:10:11:45 0.01%
M347xxxxx 10000 0xe66eeb94b6e3a4e9 | 1890K 0.14258 3.5466 17.73s | 1:10:12:00 0.02%
M347xxxxx  5000 0xfd897327f15981c1 | 2000K 0.03125 3.4163 17.08s | 1:08:57:26 0.01%
M347xxxxx 10000 0xe66eeb94b6e3a4e9 | 2000K 0.03027 3.4354 17.17s | 1:09:02:41 0.02%
M34733759  5000 0xfd897327f15981c1 | 2048K 0.02051 3.0706 15.35s | 1:05:37:19 0.01%
M34733759 10000 0xe66eeb94b6e3a4e9 | 2048K 0.02100 3.0900 15.45s | 1:05:42:40 0.02%[/QUOTE]These are pretty much in keeping with FFT.txt. 1792K is the next lowest FFT with shorter times. I guess the [STRIKE]0.05[/STRIKE] 0.02 error rate is normal in this range. |
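As a sanity check on numbers like these, the ETA column follows directly from the ms/It figure: an LL test of M(p) needs p-2 squarings. A small sketch of that arithmetic (my own helper, not CUDALucas code):

```python
def project_eta(p, iters_done, ms_per_iter):
    """Project remaining run time of an LL test of M(p), given the
    current iteration count and the ms/iteration CUDALucas reports."""
    remaining = (p - 2) - iters_done          # LL needs p-2 squarings total
    secs = remaining * ms_per_iter / 1000.0
    days, secs = divmod(int(secs), 86400)
    hours, secs = divmod(secs, 3600)
    mins, secs = divmod(secs, 60)
    return days, hours, mins, secs
```

For the 2000K row above, `project_eta(34733759, 10000, 3.4163)` gives 1 day, 8 h, 57 min, matching the reported 1:08:57:26 up to rounding of the printed ms/It.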
[QUOTE=kladner;409473]I think I ran CUFFTbench such that it limited the output in some way.[/QUOTE]
"some way" means that all sizes of the FFT which were eliminated are [B][U]slower[/U][/B] (for your card) than a higher FFT which remained. So, between 33M77 and 38M49 exponents, you will be faster using the 2M FFT (i.e 2048K in your file). The 1792K is too small, and any intermediary values will be slower than 2048K. When you do the "tuning", a list with all is created, then is parsed from the end, and any line which has a longer time than a line already parsed (i.e. a shorter FFT which takes longer than a longer FFT) is eliminated. There is no mystery in it. Remark that a smaller FFT does not always mean a faster iteration time. This depends of how "smooth" the FFT value is, i.e. how [B][U]your card*[/U][/B] can split it in small pieces and combine those [URL="https://en.wikipedia.org/wiki/Butterfly_diagram"]butterflies[/URL] ([URL="https://www.google.co.th/search?q=fft+butterfly&tbm=isch"]yaaarrr[/URL]!). That is why "power of two" is usually faster than the neighbors, because fast multiplication is kinda "[URL="https://en.wikipedia.org/wiki/Divide_and_conquer_algorithms"]divide et impera[/URL]", it splits the stuff in two, solve the halves, put the results together (well, kind of...) so when you split in two something which is not a multiple of (power of, when multiple splits) two, then a chunk is bigger than the other, and you have to do more splits and more work to put it together at the end. ---- * it depends of how many threads can the card do in the same time, memory for each, if it multiplies on 24 or on 32 bits, etc. - that is why you have to TUNE the card when you start using cudaLucas on it. ------------ Edit: yes, an error between 0.0006 and 0.24 is perfect. As higher as better (you will have a faster iteration time if you select a lower FFT, from that file, but the error is bigger). Some go to "as high as 0.35" but this is risky if you do not check the rounding error at every iteration. |
Thanks, LaurV. It has been a while since I messed with this stuff. I actually pulled a full list, and saw exactly what you are saying: the ones that don't show aren't worth looking at.
|
[QUOTE=frmky;409466]Are /dev/nvidia0 and /dev/nvidiactl present? Does the command nvidia-smi show your card present? If no to either, then the driver isn't installed and active.[/QUOTE]
I found out that it works after I rebooted my computer. Apparently I had installed a new driver update and I wasn't aware of it (piled updates for 1 month). |
[QUOTE=LaurV;409482]"some way" means that all sizes of the FFT which were eliminated are [B][U]slower[/U][/B] (for your card)[/QUOTE]
And version of CUDA/cufft used to compile the binary. If you switch to a binary compiled with a different version of CUDA, rerun the benchmark. |
[QUOTE=frmky;409524]And version of CUDA/cufft used to compile the binary. If you switch to a binary compiled with a different version of CUDA, rerun the benchmark.[/QUOTE]right! :tu:
|
[QUOTE=Oddball;223390]Fighting off a pit bull is quite hard, but you'll probably make it out alive. Now if you had to fight off a whole pack of wolves instead...[/QUOTE][URL]http://www.cnn.com/2015/09/12/us/new-york-pit-bull-attacks/[/URL]
|
[QUOTE=Xyzzy;410265][URL]http://www.cnn.com/2015/09/12/us/new-york-pit-bull-attacks/[/URL][/QUOTE]
[quote]Boves, who own a pit bull named Gina[/quote] Now ye know the reason... obviously, dogs can smell better than we do, and who the hell knows what was in their heads. This reminds me of something from some time ago, when my boss had two huge dogs. I don't know the breed, but they were huge and very nice dogs, and they both died of old age. Anyhow, that day my boss was walking the two dogs in the park and a couple of stray dogs attacked them. It was a short fight; his dogs were big and calm and the strays didn't stand much of a chance against them, but you know how it is for the owner: my boss was worried and he jumped in between the dogs to separate them. Although there was no attack on a person, i.e. the dogs continued to fight each other and, luckily, didn't turn against the human in the middle, he got caught in the leashes and fell down, breaking one of his front teeth in half. He had to have an implant and he came to work for a while with a "front hole". Then later he was very funny for weeks, because the doctor told him to unscrew and screw the "new toy" from time to time until the place got used to it; he used to do that with his tongue, and suddenly, when you least expected it, he was smiling at you nicked. |
[url]http://www.huffingtonpost.com/2014/03/06/dog-aggression-study-applied-animal-behaviour-science_n_4911861.html[/url]
[url]http://www.politifact.com/georgia/statements/2011/aug/03/elaine-boyer/are-pit-bulls-more-aggressive-other-dogs/[/url] [url]http://atts.org/breed-statistics/statistics-page1/[/url] [CODE]Breed Name                       Tested Passed Failed Percent
[B]AMERICAN PIT BULL TERRIER        870    755    115    86.8%
AMERICAN STAFFORDSHIRE TERRIER   657    555    102    84.5%[/B]
AUSTRALIAN SHEPHERD              680    559    121    82.2%
GOLDEN RETRIEVER                 785    669    116    85.2%[/CODE] Just sayin' ;) |
To bring things back on topic, I'm still having issues with my GTX 570. Does anybody else reading this thread have one, and if so, 1) have you successfully completed an exponent without any crashes? 2) if so, how?
|
[QUOTE=wombatman;410307]To bring things back on topic, I'm still have issues with my GTX 570. Does anybody else reading this thread have one, and if so, 1) have you successfully completed an exponent without any crashes? 2) if 1, how?[/QUOTE]
Let me get back to you on that. I have a GTX 570 which I really love, but can't run right now because of messed up power connectors. It has been a while since I could run it reliably. However, there are others here who have run those. I just don't have current operations to draw from. |
[QUOTE=wombatman;410307]To bring things back on topic, I'm still have issues with my GTX 570. Does anybody else reading this thread have one, and if so, 1) have you successfully completed an exponent without any crashes? 2) if 1, how?[/QUOTE]
What is it doing? |
It hits the TDRdelay limit, Windows kills the driver, and the driver is restarted. So far, I just deal with it via a looping shell file as advised by someone on here. That way the program is repeatedly run until the exponent is completed.
I also can't really identify a rhyme or reason for the crashes (i.e., it's not heat or anything like that). |
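For reference, the TDR delay being discussed lives in the Windows registry under the GraphicsDrivers key. A .reg fragment setting it to 15 seconds (decimal 15 = hex 0f) might look like this; note that a reboot (or at least a driver restart) is needed for it to take effect, and that raising it only delays the watchdog rather than fixing the underlying hang:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000000f
```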
If you go back to [URL="http://www.mersenneforum.org/showthread.php?t=12576&page=205"]page 205[/URL] you'll see our discussion on this (with a link to some more info). Unfortunately, nothing has changed with the driver and the compute 2.0 cards. You could downgrade your driver or continue to use the batch file you're using. As far as my testing showed your results should still be good. Are you having any problems getting matches on your checksums?
|
[QUOTE=flashjh;410996]If you go back to [URL="http://www.mersenneforum.org/showthread.php?t=12576&page=205"]page 205[/URL] you'll see our discussion on this (with a link to some more info). Unfortunately, nothing has changed with the driver and the compute 2.0 cards. You could downgrade your driver or continue to use the batch file you're using. As far as my testing showed your results should still be good. Are you having any problems getting matches on your checksums?[/QUOTE]
Nah, all my checksums are right, so I'm assuming it's not a memory/computational problem either. For the meantime, I'll hold off on doing any CudaLucas work and stick to mfaktc. I'll come back to the problem and try to get it resolved later on. |
The only way to fix it is to downgrade the driver or get a new card. The 5xx series is too old to expect a driver fix at this point.
|
[QUOTE=flashjh;411004]The only way to fix it is to downgrade the driver or get a new card. The 5xx series is too old to expect a driver fix at this point.[/QUOTE]
Well, I'm hoping to get a new card around Christmas time, so I'll just add this reason to my list. ;) |
There is a known problem with 570 cards: they are not very stable for LL. We had a discussion here in the past, but I don't remember the brand or the reason, nor how (or whether) it was fixed. I can look for that topic if I have some time today during the lunch break. I am more into 580s and have only tested a few 570s (which I didn't own) in the past; luckily I never hit the instability problem.
|
I would appreciate it if you do find it, but don't waste too much time on it! :smile:
|
[QUOTE=wombatman;410995]It hits the TDRdelay limit, Windows kills the driver, and the driver is restarted. So far, I just deal with it via a looping shell file as advised by someone on here. That way the program is repeatedly run until the exponent is completed.
I also can't really identify a rhyme or reason for the crashes (i.e., it's not heat or anything like that).[/QUOTE] Typically, on my MSI 580, over an 8-hour run (like when I am at work) it might restart once. Today, I left it running on a DC, and was pleasantly surprised that it was still at Count 0 (sorry, Mr Gibson) when I came home from work.

When I first started testing this card with CUDALucas, it would not complete the short test at anything like the clocks it could run with MFAKTC. Not only that, but it failed at the "stock" OC of 833 MHz, default voltage of 1.006, even with the memory clocked at 1600 MHz. The short of it is that I now run it at the core speed that is in the BIOS: 833. I turn the Vcore up to 1.025. The memory is running at 1800 MHz, with a 10 mV increase over whatever the base voltage is. (There are advantages to running an MSI card with Afterburner. :smile: I don't have the Memory and Auxiliary voltage settings on my EVGA 580. I am clueless about the Aux, but it was also set for +10 mV when I got the no-restart-in-8-hours result.)

An advantage, beyond possible stability, to running at the default frequency is that if the driver restarts, the clock stays the same; however, voltage and memory settings are not affected. That is a very good thing, since CuLu would almost certainly give bad results with the RAM at over 2 GHz. The self-tests also gave bad results with stock voltage, so I'm glad those settings stick.

In spite of the increased voltages, the card runs cooler than it would with MFAKTC at that speed/voltage combination. Perhaps DP calculations require more voltage for stability, while the DP throttling results in lower temps. Try setting the GPU core at or below stock frequency, with higher than normal voltage. Set memory low: maybe 1500 MHz.

The range of voltages I consider using comes from the wildly different "stock" voltages of the two 580s. For the MSI, it is 1.006 V; for the EVGA, it is 1.088 V. (And this insane voltage is to run at an 'OC' of 797 MHz!) I have never touched the higher voltage in real operation, even when running the EVGA at 882 MHz, as it is now, being fed 1.050 V. Meanwhile, the MSI card will run at its stock OC of 833 MHz on 1.000 V when running MFAKTC. It is right now running MFAKTC at 911 MHz on 1.056 V.

I never got the EVGA card to pass self-testing, but I never gave it the same amount of adjustment and testing as I did the MSI. My feeling is that the MSI is just a better card. If nothing else, it has twice as many voltage and frequency steps as the EVGA, which makes fine tuning much more effective. It also runs faster and cooler 'out of the box.' I suspect higher-binned parts.

EDIT: .....and a better cooler than the Zalman aftermarket on the EVGA. |
[QUOTE=kladner;409447]Sorry if this has been addressed, but I have a puzzling response from CuLu 2.05.1. On my first DC run of a 36M assignment, the FFT selected was 2048K. This produced errors in the 0.05 range. I now have a 34M exponent to DC with the GTX580. Thinking I could get better times with a smaller FFT, I tried inserting 1728K and 1792K both in the worktodo.txt and on the command line. These were either ignored, or the program stated that they are too small and put 2048K back in. I vaguely remember a command "FFT2=" rather than just ",1792K,". Is this what I need to use to get CuLu to test a smaller FFT than 2048?
EDIT: I have deleted the checkpoint files between tests.[/QUOTE] I am motivated to revive this question. I currently have two 34.7M DCs running on P95. Both are using 1792K FFT, and reporting 0.375 roundoff error. Are error tolerances set more strictly in CuLu? EDIT: Would the 'I' (capital I) (I/i -- increase/decrease error threshold) interactive command allow a smaller FFT? |
[QUOTE=kladner;411025]Typically, on my MSI 580, over an 8 hour run (like when I am at work) it might restart once. Today, I left it running on a DC, and was pleasantly surprised that it was still at Count 0 (sorry, Mr Gibson) when I came home from work. When I first started testing this card with CUDALucas, it would not complete the short test at anything like the clocks it could run with MFAKTC. Not only that, but it failed at "stock" OC of 833 MHz, default voltage of 1.006, even with the memory clocked at 1600 MHz.
The short of it is that I now run it at the core speed that is in the BIOS: 833. I turn the Vcore up to 1.025. The memory is running at 1800 MHz, with a 10 mv increase over whatever the base voltage is. (There are advantages to running an MSI card with Afterburner. :smile: I don't have the Memory and Auxiliary voltage settings on my EVGA 580. I am clueless about the Aux, but it was also set for +10 mv when I got the no-restart in 8 hours result.) An advantage, beyond possible stability, to running at the default frequency, is that if it restarts, the clock stays the same. However, voltage and memory settings are not affected. That is a very good thing, since CuLu would almost certainly give bad results with the RAM at over 2 GHz. The self-tests also gave bad results with stock voltage, so I'm glad those setting stick. In spite of the increased voltages, the card runs cooler than it would with MFAKTC at that speed/voltage combination. Perhaps DP calculation require more voltage for stability, while the DP throttling results in lower temps. Try setting the GPU core at or below stock frequency, with higher than normal voltage. Set memory low: maybe like 1500 MHz. The range of voltages I consider using comes from the wildly different "stock" voltages of the two 580s. For the MSI, it is 1.006 v; for the EVGA, it is 1.088 v. (And this insane voltage is to run at an 'oc' of 797 MHz!) I have never touched the higher voltage in real operation, even when running the EVGA at 882 MHz, as it is now, being fed 1.050 v. Meanwhile, the MSI card will run at its stock OC of 833 MHz, on 1.000 v, when running MFAKTC. It is right now running MFAKTC at 911 MHz, on 1.056 v. I never got the EVGA card to pass self-testing, but I never gave it the same amount of adjustment and testing as I did the MSI. My feeling is that the MSI is just a better card. If nothing else, it has twice as many voltage and frequency steps as the EVGA, which makes fine tuning much more effective. 
It also runs faster and cooler 'out of the box.' I suspect higher-binned parts. EDIT: .....and a better cooler than the Zalman aftermarket on the EVGA.[/QUOTE] This is good info. I have an EVGA, so that may be it, but I'll try your suggested tweaks and see if it does anything. Thanks! |
I see the difference between the cards as marketing related. MSI went for a reasonably nice overclock, at a reasonable voltage out of the box. It still has [U]lots[/U] of headroom, but one can only go up a step or two on stock voltage. The finer resolutions for voltage and frequency are great for someone with a feel for tweaking, but might confuse a beginning overclocker.
EVGA, on the other hand, set the voltage really high in BIOS, so that a beginner could just crank the clock up and say, "I got an amazing OC on STOCK VOLTAGE!" Of course, it would be running hot as hell, though maybe not so much in a game setting if the load is intermittent. I have not tried to see how high the EVGA will go at 1.088 v, because the system could not handle the heat. I start backing off the settings when temps reach around 78 C. |
Keep in mind that modern CPUs/GPUs/whatever have individual voltage and clock settings.
While e.g. all GTX 970s are programmed with the same base clock, each specific GPU has an individual voltage for that clock rate, an individual power consumption, an individual max clock and, of course, an individual clock rate under a specific load. These are 12 ASUS GTX 970s (STRIX-GTX970-DC2OC-4GD5), factory OCed. All GPUs are running the same load (mfaktc in this case); GPU temperature is well below the temperature target, and the power limit is not (yet) reached. [CODE]GPU #1
    GPU Current Temp : 71 C
    Power Draw       : 178.41 W
    Graphics         : 1290 MHz
GPU #2
    GPU Current Temp : 72 C
    Power Draw       : 175.58 W
    Graphics         : 1290 MHz
GPU #3
    GPU Current Temp : 73 C
    Power Draw       : 185.42 W
    Graphics         : 1303 MHz
GPU #4
    GPU Current Temp : 73 C
    Power Draw       : 177.85 W
    Graphics         : 1303 MHz
GPU #5
    GPU Current Temp : 73 C
    Power Draw       : 193.13 W
    Graphics         : 1328 MHz
GPU #6
    GPU Current Temp : 70 C
    Power Draw       : 178.00 W
    Graphics         : 1315 MHz
GPU #7
    GPU Current Temp : 73 C
    Power Draw       : 179.40 W
    Graphics         : 1328 MHz
GPU #8
    GPU Current Temp : 70 C
    Power Draw       : 174.76 W
    Graphics         : 1290 MHz
GPU #9
    GPU Current Temp : 71 C
    Power Draw       : 177.80 W
    Graphics         : 1303 MHz
GPU #10
    GPU Current Temp : 71 C
    Power Draw       : 177.25 W
    Graphics         : 1278 MHz
GPU #11
    GPU Current Temp : 72 C
    Power Draw       : 181.74 W
    Graphics         : 1303 MHz
GPU #12
    GPU Current Temp : 71 C
    Power Draw       : 184.82 W
    Graphics         : 1316 MHz
[/CODE] Oliver |
How do you get power draw?
|
just 'nvidia-smi -a' (Linux) with a recent driver. But this depends on the GPU(-BIOS) itself.
Teslas are fully featured by 'nvidia-smi' all the time, most Quadros, too. Recently(?) they added full support for Geforce TITAN (all flavours). From time to time they enable or disable features for "regular" Geforce cards. And it depends on the GPU-BIOS/card, too, I *guess* not all Geforce 970 will report this stuff. Oliver |
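Since `nvidia-smi -a` emits plain "Key : Value" text, the power figures are easy to pull out programmatically. A small sketch of my own (not an NVIDIA tool) that collects the "Power Draw" values, treating "N/A" as missing:

```python
def power_draws(smi_output):
    """Extract Power Draw values (in watts) from `nvidia-smi -a` style text.
    Returns a list with one entry per 'Power Draw' line; None where 'N/A'."""
    draws = []
    for line in smi_output.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        if key.strip() == "Power Draw":
            value = value.strip()
            draws.append(None if value == "N/A" else float(value.split()[0]))
    return draws
```

Feeding it the captured output above would give the twelve wattage figures directly, with None for any card whose BIOS/driver combination reports N/A.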
Must be the newer cards. On my oldish 590's, almost all entries show N/A.
Or could it be the driver? (currently 343.36) |
[QUOTE=blip;411136]Must be the newer cards. On my oldish 590's, almost all entries show N/A.
Or could it be the driver? (currently 343.36)[/QUOTE] 500 series and earlier did not do that fancy stuff. |
I saw a lot of those "N/A" going away after installing 352 series driver (in my case 352.41).
This is a cheap GTX 750 (non-Ti) with 352.41: [CODE]# nvidia-smi -a | grep -c " : "
105
# nvidia-smi -a | grep -c "N/A"
42
[/CODE] Oliver |
OK, 352.07 has lots of N/A for me while 352.41 has less N/A (more information).
|
I updated to 355.11, still the same N/A's for my GTX590.
|
[QUOTE=blip;411459]I updated to 355.11, still the same N/A's for my GTX590.[/QUOTE]
355.82 is no different on a 580. |
OK, highly depends on the GPU itself, on my GT 630 (GK208) the number of "N/A" remains the same, too.
I saw less "N/A" so far on[LIST][*]cheap 750 (non-Ti)[*]ASUS 970 Strix OC[*]ref. GTX 980[*]Palit 980Ti OC[/LIST] Oliver |
[QUOTE=TheJudger;411491]OK, highly depends on the GPU itself, on my GT 630 (GK208) the number of "N/A" remains the same, too.
I saw less "N/A" so far on[LIST][*]cheap 750 (non-Ti)[*]ASUS 970 Strix OC[*]ref. GTX 980[*]Palit 980Ti OC[/LIST] Oliver[/QUOTE] All of which are Maxwell v1 or v2 |
2 Attachment(s)
[QUOTE=wombatman;411047]This is good info. I have an EVGA, so that may be it, but I'll try your suggested tweaks and see if it does anything. Thanks![/QUOTE]
Well, so much for uninterrupted running of CuLu. I saw the display blink, and when I looked at CL, the restart count was at 5. I restarted the whole system, and CL restarted again within 20 minutes. This is with TDRDelay set at 128, which I saw someone mention somewhere on the forum. (I am also aware of your recommended setting of 10, WombatMan.:smile:) The voltages and clock settings were the same as on a previous DC which got through 12+ hours without resetting. I am starting to think that it is a waste of processing time to set TDRD that high.

On the Afterburner monitor the card showed declining temperature for a few minutes. This suggests that it has stopped doing real work and is unlikely to resume without a reset. I will set TDRDelay at 15, and see if I can screen-grab the pattern in the plot.

EDIT: It sneaked one past me. TDRD was still 128, but there was no apparent long decline. All I can guess at being connected to the reset is the tiny blip attached. Notice that CL runs the card at 99% usage. This is true regardless of what else is running on the system. On the other hand, with mfaktc, usage drops from 99% to 98% when P95 gets going. This does not happen with the Small FFT stress test, but it does with Large and Blend tests.

EDIT2: Reset again, just now. Still just the momentary dip in usage. I have seen, but failed to capture, ravine-like plots in temp preceding a reset. I just made a simulation of such a plot using mfaktc. See below. |
Ooops! Wrong thread.
EDIT: Regarding the plots shown above, I have now concluded that the momentary dips are not associated with CuLu timing out. I have seen such when CL was running smoothly. |
I have a system with a Titan Z, a 590, and a 690 in it.
Previously all three were running mfaktc on both GPUs without issue. I switched the Z GPUs over to LL and that has been very successful, however the 590 and 690 both return all 0x000000000000 interim residues and 0.0 error rates. No errors that I can see. Any idea what is causing this? |
When testing my first Titan I couldn't run CL without getting wrong residues. Another user found out that downclocking the memory solves the problem. I've been running successfully for several years now at 2600 MHz instead of 3000 MHz. Give it a try.
|
Unfortunately my issue isn't with incorrect residues, rather no residues at all. It is as if something is failing moving the initial data to the card.
|
[QUOTE=airsquirrels;425008]Unfortunately my issue isn't with incorrect residues, rather no residues at all. It is as if something is failing moving the initial data to the card.[/QUOTE]
Me too. Every iteration says residue = 0. This is my first time running CUDALucas, so I did not know that was wrong. I compiled CUDALucas to get it to work, so I could have easily done something wrong. Maybe it could check for all residues being 0 and quit, saying something is broken. |
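The sanity check bgbeuning suggests is simple to sketch. This is hypothetical, not code that exists in CUDALucas: a healthy LL test prints a different non-zero 64-bit residue at every checkpoint, so a run of consecutive zero residues is a reliable sign that the computation is dead.

```python
def residues_look_dead(residues, window=3):
    """True if the last `window` reported 64-bit residues are all zero.
    `residues` is the list of interim residues printed so far, as ints."""
    recent = residues[-window:]
    return len(recent) == window and all(r == 0 for r in recent)
```

A driver loop could call this after each checkpoint and abort with a clear error instead of grinding through millions of useless iterations (or, worse, reporting a false "prime").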
The residue should definitely be a different non-zero hex string each iteration.
What's interesting is this same binary works fine with a different GPU in the system. I compiled against Cuda 6.5 |
[QUOTE=airsquirrels;424924]I have a system with a Titan Z, a 590, and a 690 in it.
Previously all three were running mfaktc on both GPUs without issue. I switched the Z GPUs over to LL and that has been very successful, however the 590 and 690 both return all 0x000000000000 interim residues and 0.0 error rates. No errors that I can see. Any idea what is causing this?[/QUOTE] Nothing is wrong, you just discovered a series of mersenne superprimes*. ---------------- * if a prime is the one that makes the last residue zero, then a superprime is the one which makes [U]all[/U] residues zero... |
[QUOTE=bgbeuning;425020]Me too. Every iteration says residue = 0.
This is my first time running CUDALucas so I did not know that was wrong. I compiled CUDALucas to get it to work, so I could have easily done something wrong. Maybe it could check for all residue 0 and quit saying something is broken.[/QUOTE] Are you the user who tried to manually submit two different "is prime!" results today? I knew they weren't right, since the time between assignment and result was mere hours and there's no way it could have run a test in that time. At least it gave us a chance to test out the email feature for when someone tries to submit a new prime using the manual forms. Three times, in fact (one LL test was submitted twice, and a DC "is prime" submitted once). I know they're not right, but I'm running my own tests just on the one in a billion million gazillion chance they happened to accidentally and coincidentally be prime, though I'm sure they won't be. They were done with CUDALucas v2.05.1. If anyone can think of reasons why CUDALucas would report a prime result after only running for a little bit (in one case it was only a couple of hours after the exponent, a 37M double-check, was assigned), I'd like to hear them. Meanwhile, if you're doing a test and it magically reports that it's prime after an improbably short period of time, don't try to submit it to the server. Fix the software issue, run a real test, and then we'll talk. LOL |
I can read code.
What can I do? :smile: |
[QUOTE=msft;425049]I can read code.
What can i do ?:smile:[/QUOTE] Any way to turn on additional debugging/logging ? Or a debug build? |