mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

James Heinrich 2015-02-27 17:11

[QUOTE=ET_;396552]meanwhile, here are the benchmarks for James[/QUOTE]Thanks, much appreciated -- I didn't have any Compute 5.2 benchmarks. It's within 3% of predicted so I'm happy.

To everyone else: I still don't have any benchmarks for Compute 5.0, 3.5 or 2.1 under CuLu 2.05, so more benchmarks are welcome, either here or [email]james@mersenne.ca[/email]

ET_ 2015-02-27 18:08

[QUOTE=James Heinrich;396554]Thanks, much appreciated -- I didn't have any Compute 5.2 benchmarks. It's within 3% of predicted so I'm happy.

To everyone else: I still don't have any benchmarks for Compute 5.0, 3.5 or 2.1 under CuLu 2.05, so more benchmarks are welcome, either here or [email]james@mersenne.ca[/email][/QUOTE]

It's compiled as cc5.0 with CUDA 6.5, not yet as cc5.2 for CUDA 7.0.

Luigi

MacFactor 2015-02-27 18:11

Thanks, but it didn't fix my problem. I'm starting to suspect a problem in the PATH setting. The original:
[CODE]
# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
    PATH="$HOME/bin:$PATH"
fi[/CODE]Should I change to:
[CODE]
PATH="$HOME/SHN/CUDALucas:$HOME/SHN:$HOME/bin:$PATH"[/CODE](where SHN is my account, w/admin privileges)? Would I need to run some update or reboot after making that change?

I've tried changing all this in /etc/ld.so.conf.d, but it doesn't work, even after running sudo ldconfig:
[CODE]
shn@Core2Duo ~/Desktop/CUDALucas $ ./CUDALucas -v
bash: ./CUDALucas: No such file or directory
shn@Core2Duo ~/Desktop/CUDALucas $ ./CUDALucas-2.05.1-CUDA4.2-linux-x86_64 -v
bash: ./CUDALucas-2.05.1-CUDA4.2-linux-x86_64: No such file or directory[/CODE]Here are the files in that directory:
[CODE]
shn@Core2Duo ~/Desktop/CUDALucas $ ls -lt
total 888
-rwxr-xr-x 1 shn shn 425256 Feb 21 21:57 CUDALucas
-rwxr--r-- 1 shn shn 9092 Feb 21 21:43 CUDALucas.ini
drwxr-xr-x 2 shn shn 4096 Feb 21 14:38 CUDALucas-2.05.1-CUDA4.2-CUDA6.5-linux-x86_64
-rw-r--r-- 1 shn shn 0 Feb 21 11:48 output
-rwxr--r-- 1 shn shn 25 Feb 20 21:58 worktodo.txt
-rwxr-xr-x 1 shn shn 425256 Feb 11 17:27 CUDALucas-2.05.1-CUDA4.2-linux-x86_64
-rw-r--r-- 1 shn shn 35316 Jul 20 2014 CUDALucas README[/CODE]
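As a side note on the PATH question: here is a minimal sketch (using throwaway paths, not the actual CUDALucas install) showing that a `./name` invocation bypasses PATH entirely, so a PATH change can neither cause nor cure a "No such file or directory" from `./CUDALucas`:

```shell
# Throwaway demo paths; PATH only affects lookup of bare command names.
mkdir -p /tmp/pathdemo
printf '#!/bin/sh\necho hello\n' > /tmp/pathdemo/hi
chmod +x /tmp/pathdemo/hi

/tmp/pathdemo/hi                  # explicit path: runs regardless of PATH
PATH="/tmp/pathdemo:$PATH" hi     # bare name: found only via PATH
# Edits to ~/.profile need no reboot; re-source it (. ~/.profile) or re-login.
```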

Mark Rose 2015-02-27 18:15

[QUOTE=James Heinrich;396554]Thanks, much appreciated -- I didn't have any Compute 5.2 benchmarks. It's within 3% of predicted so I'm happy.

To everyone else: I still don't have any benchmarks for Compute 5.0, 3.5 or 2.1 under CuLu 2.05, so more benchmarks are welcome, either here or [email]james@mersenne.ca[/email][/QUOTE]

I'll post benchmarks on two 2.1 cards shortly, a GT520 and a GT430, once the benchmarks have finished running.

Mark Rose 2015-02-27 18:43

[QUOTE=MacFactor;396570]Thanks, but it didn't fix my problem.[/quote]

I had to compile it from source as I was running into the same issue. On Ubuntu 14.04, I installed nvidia-cuda-toolkit, changed the CUDA install location at the top of the Makefile to /usr, ran `make`, and it worked.
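A sketch of those steps, assuming the Makefile names its toolkit-location variable `CUDA` (check the actual variable name at the top of your Makefile); the one-line edit itself is demonstrated on a throwaway copy:

```shell
# On Ubuntu: sudo apt-get install nvidia-cuda-toolkit
# Then point the Makefile's CUDA location at /usr and run make.
# Demonstrating the edit on a throwaway file (variable name is an assumption):
printf 'CUDA = /usr/local/cuda\n' > /tmp/Makefile.demo
sed -i 's|^CUDA = .*|CUDA = /usr|' /tmp/Makefile.demo
cat /tmp/Makefile.demo
# In the real source tree the next steps would be: make && ./CUDALucas -v
```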

Mark Rose 2015-02-27 19:16

2 Attachment(s)
[QUOTE=James Heinrich;396554]Thanks, much appreciated -- I didn't have any Compute 5.2 benchmarks. It's within 3% of predicted so I'm happy.

To everyone else: I still don't have any benchmarks for Compute 5.0, 3.5 or 2.1 under CuLu 2.05, so more benchmarks are welcome, either here or [email]james@mersenne.ca[/email][/QUOTE]

Here are two benchmarks for low end 2.1 cards. Attached are the command outputs.

Device GeForce GT 430
Compatibility 2.1
clockRate (MHz) 1400
memClockRate (MHz) 600

fft max exp ms/iter
1024 19535569 13.1073
1152 21921901 14.9833
1225 23280269 16.5440
1280 24302527 17.4606
1296 24599717 17.5964
1323 25101101 18.1876
1372 26010389 20.1547
1568 29640913 20.1924
1600 30232693 20.5022
1728 32597297 23.1662
1792 33778141 24.0094
2048 38492887 26.3347
2304 43194913 31.8484
2401 44973503 33.1631
2592 48471289 36.1269
2744 51250889 40.7279
3200 59570449 40.8349
3456 64229677 45.8811
3888 72075517 55.4194
4096 75846319 55.6313
4320 79902611 73.0512
4375 80897867 73.7962
4480 82797151 74.6121
5184 95507747 75.5244
5488 100984691 82.0007
6272 115080019 83.8615
6400 117377567 85.8051
6912 126558077 98.1325
7776 142017539 110.4415
8192 149447533 115.0217

Device GeForce GT 520
Compatibility 2.1
clockRate (MHz) 1620
memClockRate (MHz) 600

fft max exp ms/iter
1024 19535569 19.1582
1152 21921901 21.1080
1260 23930909 25.1102
1296 24599717 25.1694
1344 25490893 26.9664
1440 27271147 27.6734
1568 29640913 29.6729
1600 30232693 31.0359
1728 32597297 32.7081
1792 33778141 36.0988
2048 38492887 37.7887
2240 42020509 45.4196
2304 43194913 46.2915
2592 48471289 50.0936
2880 53735041 58.4163
3136 58404433 62.8308
3200 59570449 68.3727
3360 62483353 71.3736
3456 64229677 72.2613
3584 66556463 74.7343
3600 66847171 78.6148
4096 75846319 80.5501
4320 79902611 93.3572
4608 85111207 96.0147
5040 92911087 104.0299
5184 95507747 106.3927
5600 103000823 124.7254
5760 105879517 126.8726
6048 111056879 131.7726
6144 112781477 135.7366
6272 115080019 137.5057
6300 115582697 143.4483
6480 118813021 147.0057
6720 123117161 151.9291
6912 126558077 157.1928
7168 131142761 157.2835
7200 131715607 159.5422
7776 142017539 161.8901
7840 143161159 175.8101
8064 147162241 180.3350
8192 149447533 180.9630

James Heinrich 2015-02-27 19:39

[QUOTE=Mark Rose;396581]Here are two benchmarks for low end 2.1 cards.[/QUOTE]Thanks. Those numbers are a fair bit higher (~25%) than expected. If someone has a higher-end Compute 2.1 card (e.g. GTX 460 or 560) that matches those numbers I'll feel more confident in updating the chart.

Mark Rose 2015-02-27 21:50

[QUOTE=James Heinrich;396582]Thanks. Those numbers are a fair bit higher (~25%) than expected. If someone has a higher-end Compute 2.1 card (e.g. GTX 460 or 560) that matches those numbers I'll feel more confident in updating the chart.[/QUOTE]

They were done with CUDA 5.5 if it matters.

owftheevil 2015-02-27 23:20

I also have results for a 570, 780, Titan, and Titan Black if you want them.


[CODE]
Device GeForce GTX 560 Ti
Compatibility 2.1
clockRate (MHz) 1940
memClockRate (MHz) 2080


fft max exp ms/iter
1024 19535569 2.3434
1152 21921901 2.6485
1260 23930909 3.0699
1296 24599717 3.0931
1344 25490893 3.3730
1440 27271147 3.4673
1568 29640913 3.7194
1600 30232693 3.9279
1728 32597297 4.0755
1792 33778141 4.3713
2048 38492887 4.6964
2240 42020509 5.5626
2304 43194913 5.5691
2592 48471289 6.2081
2688 50227213 7.0311
2880 53735041 7.1812
3136 58404433 7.7523
3200 59570449 8.2700
3360 62483353 8.5920
3456 64229677 8.6056
3584 66556463 8.9110
3600 66847171 9.6526
3780 70115887 9.8984
4096 75846319 9.9148
4320 79902611 11.2540
4608 85111207 11.4486
5040 92911087 12.6529
5184 95507747 12.9797
5292 97454309 14.6345
5376 98967641 14.8351
5760 105879517 15.0610
6048 111056879 15.8011
6144 112781477 16.1751
6272 115080019 16.2582
6300 115582697 17.0884
6480 118813021 17.3343
6720 123117161 17.9061
6912 126558077 18.2475
7168 131142761 18.5232
7200 131715607 19.6253
7776 142017539 19.8418
7840 143161159 21.2217
8192 149447533 21.2728
[/CODE]

James Heinrich 2015-02-28 00:11

[QUOTE=owftheevil;396608]I also have results for a 570, 780, Titan, and Titan Black if you want them.[/QUOTE]Your 560 Ti result was more in line with what I'd seen before. These numbers are from v2.05, correct?

I already have 570/580 results, but the more the merrier if you have them handy. I would also be very interested in your 780/Titan/Titan Black results, please.

Mark Rose 2015-02-28 01:54

I can bench on three more 580's and a 760 if it would be useful.

owftheevil 2015-02-28 02:14

[CODE]
Device GeForce GTX 780
Compatibility 3.5
clockRate (MHz) 1032
memClockRate (MHz) 3004


fft max exp ms/iter
1024 19535569 1.2052
1080 20580341 1.4758
1152 21921901 1.4977
1280 24302527 1.6422
1296 24599717 1.6571
1323 25101101 1.8665
1344 25490893 1.8907
1440 27271147 1.9198
1568 29640913 1.9577
1600 30232693 2.0038
1728 32597297 2.2062
1792 33778141 2.2945
2048 38492887 2.4171
2304 43194913 2.8495
2560 47885689 3.2959
2592 48471289 3.3462
2646 49459153 3.8456
2700 50446621 3.8838
2800 52274087 3.9217
2880 53735041 3.9759
2916 54392209 3.9872
3136 58404433 4.0511
3200 59570449 4.2620
3240 60298969 4.5078
3584 66556463 4.5897
4096 75846319 4.9067
4608 85111207 5.9424
5120 94353877 6.7233
5184 95507747 6.9934
5292 97454309 7.5551
5600 103000823 7.5775
5832 107174381 8.0734
6272 115080019 8.3218
6400 117377567 8.9122
6912 126558077 9.1317
7168 131142761 9.4188
7200 131715607 9.7199
8192 149447533 10.2932
[/CODE]

owftheevil 2015-02-28 02:16

All with CUDALucas 2.05.1 and CUDA 5.5.

[CODE]
Device GeForce GTX TITAN
Compatibility 3.5
clockRate (MHz) 980
memClockRate (MHz) 3004


fft max exp ms/iter
1024 19535569 0.6874
1080 20580341 0.8469
1296 24599717 0.8590
1568 29640913 1.0307
1600 30232693 1.1269
1728 32597297 1.2268
2000 37609879 1.2813
2048 38492887 1.3269
2592 48471289 1.6516
2646 49459153 2.0153
2700 50446621 2.0498
3136 58404433 2.1236
3200 59570449 2.4147
3240 60298969 2.4481
4000 74106457 2.5415
4096 75846319 2.5937
4320 79902611 3.2794
4374 80879779 3.3026
4500 83158811 3.3899
4536 83809729 3.4085
5184 95507747 3.4727
5292 97454309 3.9633
5400 99399967 4.0524
5600 103000823 4.2176
5832 107174381 4.3864
6000 110194363 4.5187
6048 111056879 4.5813
6125 112440191 4.6094
6272 115080019 4.6856
6400 117377567 4.7771
6480 118813021 4.8461
6561 120266023 4.8945
6750 123654943 5.0736
6912 126558077 5.1885
7000 128134459 5.2399
7056 129137381 5.3293
8000 146019329 5.3398
8192 149447533 5.4728
[/CODE]

[CODE]
Device GeForce GTX TITAN Black
Compatibility 3.5
clockRate (MHz) 1110
memClockRate (MHz) 3500


fft max exp ms/iter
1024 19535569 0.6642
1080 20580341 0.8309
1296 24599717 0.8310
1568 29640913 0.9913
1600 30232693 1.0648
1728 32597297 1.1501
2000 37609879 1.2579
2048 38492887 1.2876
2592 48471289 1.6101
2744 51250889 1.9976
3136 58404433 2.0265
3200 59570449 2.4074
3240 60298969 2.4360
4000 74106457 2.4614
4096 75846319 2.5284
4320 79902611 3.2676
4374 80879779 3.2938
5184 95507747 3.3084
5292 97454309 3.9584
5488 100984691 4.0006
5600 103000823 4.1916
5832 107174381 4.3808
6048 111056879 4.5168
6125 112440191 4.5787
6250 114685037 4.6657
6272 115080019 4.6844
6400 117377567 4.7705
6480 118813021 4.8617
6561 120266023 4.8914
6750 123654943 5.0513
8000 146019329 5.1017
8192 149447533 5.1987

[/CODE]


[CODE]
Device GeForce GTX 570
Compatibility 2.0
clockRate (MHz) 1464
memClockRate (MHz) 1900


fft max exp ms/iter
1024 19535569 1.7232
1080 20580341 2.0214
1120 21325891 2.0459
1152 21921901 2.0549
1176 22368691 2.2586
1296 24599717 2.3083
1440 27271147 2.5121
1568 29640913 2.7261
1600 30232693 2.8704
1728 32597297 3.0130
2048 38492887 3.2886
2160 40551479 4.0701
2304 43194913 4.1211
2352 44075249 4.4544
2592 48471289 4.4614
2880 53735041 5.1680
3072 57237889 5.4704
3136 58404433 5.6023
3200 59570449 6.0137
3360 62483353 6.0838
3456 64229677 6.3130
3584 66556463 6.3526
4096 75846319 6.8811
4320 79902611 7.7969
4608 85111207 7.9405
5040 92911087 8.9266
5184 95507747 9.2909
5400 99399967 10.4533
5600 103000823 10.6724
5670 104260469 10.8861
5760 105879517 10.9340
6144 112781477 10.9955
6272 115080019 11.9533
6400 117377567 12.3915
6480 118813021 12.5910
7056 129137381 12.9406
7168 131142761 13.2404
7200 131715607 13.3282
7776 142017539 13.8598
7840 143161159 14.8712
8192 149447533 15.0081

[/CODE]

LaurV 2015-02-28 02:36

Your Titan/Titan Black are about 1-2% faster than mine :tantrum:.
Were they done with P95 running? (Mine were; I try to get as close to real conditions as possible when I build them.)
Or do you keep them cooler? (How? Mine are air cooled, and on hot Thai days they lose a lot of productivity when the aircon is off: the TDP goes down and iterations go from 3 to 4 ms, or even 5.)
Just curious.

MacFactor 2015-02-28 03:56

Please see this post:
[url]http://www.mersenneforum.org/showthread.php?p=396570#post396570[/url]

James Heinrich 2015-02-28 04:21

[QUOTE=owftheevil;396624]I also have results for a 570, 780, Titan, and Titan Black if you want them.[/QUOTE]Thanks. More benchmarks are required, I think. Between the three 3.5 cards the output deviates +/- 25% from my expected values. The Titan Black is right on the mark, the 780 is 30% slower, and the Titan is 20% faster than expected.

@LaurV (and anyone else who has one), if you can send me your 780/Titan* benchmarks (2.05) sometime that would be great.

owftheevil 2015-02-28 11:01

[QUOTE=LaurV;396626]Your Titan/Titan Black are about 1-2% faster than mine :tantrum:.
Were they done with P95 running? (Mine were; I try to get as close to real conditions as possible when I build them.)
Or do you keep them cooler? (How? Mine are air cooled, and on hot Thai days they lose a lot of productivity when the aircon is off: the TDP goes down and iterations go from 3 to 4 ms, or even 5.)
Just curious.[/QUOTE]

These were all done with mprime running. Mine are water cooled and run at 37--43 C depending on room temperature. That's probably the difference. The Titan Black settles in at 1019 MHz and the regular Titan at 923--947 MHz.

owftheevil 2015-02-28 11:11

[QUOTE=MacFactor;396630]Please see this post:
[URL]http://www.mersenneforum.org/showthread.php?p=396570#post396570[/URL][/QUOTE]

Two others who had this same problem solved it by building their own binaries locally. It's easy, but involves a large download (most of which is unnecessary, but unavoidable) if you don't already have the full CUDA toolkit.


Has anyone running Linux been able to get the binaries provided on Sourceforge working?

LaurV 2015-03-01 09:57

[QUOTE=owftheevil;396653]These were all done with mprime running. Mine are water cooled and run at 37--43 C depending on room temperature. That's probably the difference. The Titan Black settles in at 1019 MHz and the regular Titan at 923--947 MHz.[/QUOTE]
:tu: :tu:
Mine stabilizes at ~900 during the day and ~970 during the night, with the TDP between, say, 60 and 75%. This is the "worst" period of the year, because it gets hot during the day, close to 30C, but we have around 16C during the night, sometimes higher. So, with windows open in the morning and at night, the temperature in the rooms stays under 25C during the day, and it is not hot enough to run the aircon. So it is "not so hot, not so cold". When it is hot outside, from the end of March to June or so (before the rainy season starts), we run the aircon in the house, so it is somewhat better for the air-cooled cards during the day. Well, it is not really better, but the day/night difference is not so big.

Also, for the "water cooled" part, this period is bad because the sun is low and shines under the roof of the cabinet, heating a corner of one of the radiators, so there are bigger temperature differences between day and night. For example, one water loop with one 2600K CPU and two 580s can reach as high as 60C during the day and as low as 28C during the night. In April, the hottest period in this area (45C during the day and 30C at night), the water temperature stays around 45-50C during the day, in spite of the higher atmospheric temperature, because the sun is high in the sky and the whole water cabinet is in the shadow.

I have to move my ass one of these days and mount those Titan water cooling blocks I was talking about last year... they're getting rusty in the drawers. (I did not change it as long as the air cooler worked, in spite of the fact that I bought the water block after I got the first Titan, from Xyzzy, years ago. That one is still running, and on air; see my posts in my cooling thread hereabouts.)

Mark Rose 2015-03-01 18:25

[QUOTE=LaurV;396725]For example, one water loop with one 2600K CPU and two 580s can reach as high as 60C during the day and as low as 28C during the night.[/quote]

What do you have in the way of radiators on that loop? I am curious.

LaurV 2015-03-02 13:44

offtopic [URL="http://www.mersenneforum.org/showthread.php?p=396793"]moved here[/URL] :razz:

Karl M Johnson 2015-03-21 09:16

Does anyone have a GTX 780 Ti and is willing to benchmark it with CUDALucas v2.05.1 (CUDA 6.5)?
I was shocked to learn that it might be faster than the GTX Titan (even though NV took away the option to enable 1/3 FP64 performance vs the 1/8 default, according to [URL="http://www.hardwareluxx.com/index.php/reviews/hardware/vgacards/28513-test-nvidia-geforce-gtx-780-ti-.html?start=1"]one source[/URL]).
The big question is: does the GTX 780 Ti beat the GTX Titan, even if the latter has 1/3 FP64 performance enabled?

For comparison, it takes CUDALucas 2.12 ms to perform one LL iteration (last two digits cut off) on M57885161 using a GTX Titan.
Disabling double precision in the NV control panel slows CL to 3.62 ms/iteration.
This suggests that the 780 Ti should be slower than the GTX Titan, even though it has more shaders and a higher vRAM frequency.
Or not?

Anyways, if someone's up for the task, make sure to run threadbench and FFTbench, for best parameters:
[code]
CUDALucas2.05.1-CUDA6.5-Windows-x64 -cufftbench 1024 8192 3
CUDALucas2.05.1-CUDA6.5-Windows-x64 -threadbench 1024 8192 3 1[/code]

Over and out.

axn 2015-03-21 10:09

[QUOTE=Karl M Johnson;398271]For comparison, it takes CUDALucas 2.12 ms to perform 1 LL iteration(last two digits cut off) on M57885161, using GTX Titan.
Disabling double precision in NV CP slows CL to 3.62 ms / iteration.[/QUOTE]

Mersenne.ca gives 3.0 ms/it for a 55M exponent on the [B]Titan[/B]! That suggests that perhaps it is the Titan data that needs to be updated at the site.

Karl M Johnson 2015-03-21 14:30

[QUOTE=axn;398277]That suggests that perhaps it is the Titan data that needs to be updated at the site.[/QUOTE]
The GTX Titan is two years old; CUDALucas got faster during that time.
I selected the CUDA 6.5 binary since it had the best timings during the FFT bench (wondering if it's the same for all GPUs with shader model 3.5?).

James Heinrich 2015-03-21 17:24

I've twiddled how the performance is shown for the (original) Titan based on how a recently submitted benchmark lines up with the other Titan benchmark I have. What I need now is more benchmarks from both the 780 and the Titan Black to figure that part out too. Benchmarks from the Titan Z and Titan X would be great too, but I'm not sure how many people have those available.

Karl M Johnson 2015-03-23 14:12

Got some fresh info about GTX 780 Ti.
[CODE]
69M exp
1045 MHZ GPU clock, 3196 MHz mem clock, iteration time: 4.22 ms/iter
1176 MHz, 3570 MHz mem clock, iteration time: 3.7 ms/iter
[/CODE]

With my GTX Titan, I get 2.76ms/iteration on the same exponent.

Promising data :smile:

axn 2015-03-23 14:25

[QUOTE=Karl M Johnson;398407]Got some fresh info about GTX 780 Ti.
[CODE]
69M exp
1045 MHZ GPU clock, 3196 MHz mem clock, iteration time: 4.22 ms/iter
1176 MHz, 3570 MHz mem clock, iteration time: 3.7 ms/iter
[/CODE]

With my GTX Titan, I get 2.76ms/iteration on the same exponent.

Promising data :smile:[/QUOTE]

Hmmm... Mersenne.ca has been updated with new numbers, and now things have swung the other way. The 780 Ti has apparently suffered a reduction in performance (how?) and the Titan is now second. I think the Titan Black and Titan Z also need to be re-benchmarked.

James Heinrich 2015-03-23 14:42

[QUOTE=Karl M Johnson;398407]Got some fresh info about GTX 780 Ti.[/QUOTE]If you have access to that 780 Ti I would much appreciate a benchmark.

[QUOTE=axn;398408]I think the Titan Black and Titan Z also need to be re-benchmarked.[/QUOTE][QUOTE=James Heinrich;398294]What I need now is more benchmarks from both the 780 and the Titan Black to figure that part out too. Benchmarks from the Titan Z and Titan X would be great too[/QUOTE]I would tend to agree with you :smile:

stars10250 2015-03-24 04:00

Device GeForce GTX 780 Ti
Compatibility 3.5
clockRate (MHz) 1019
memClockRate (MHz) 3574

fft max exp ms/iter
3136 58404433 3.0522
3200 59570449 3.2748
3240 60298969 3.5230
3584 66556463 3.6101
4096 75846319 3.6664
4608 85111207 4.5590
4800 88579669 5.1143
4860 89662967 5.5272
4900 90384989 5.7338
5000 92189509 5.8190

Threads
3500 256 32 3.77875
3528 256 512 3.77450
3584 256 512 3.60728
3600 256 256 3.85989
3645 256 128 4.17807
3675 128 128 4.60588
3750 256 64 4.63702
3780 256 64 4.21302
3840 256 512 4.23805
3888 256 128 4.03913
3920 256 64 4.29968
3969 256 64 4.51075
4000 256 128 3.93409
4032 256 512 4.17210
4050 256 64 4.53155
4096 256 64 3.66282
4116 256 32 5.14177
4200 256 128 4.98391
4320 128 512 4.65157
4374 128 256 5.18404
4375 128 128 5.32028
4410 256 32 5.00846
4480 256 256 4.77655
4500 256 128 4.82445

axn 2015-03-24 04:21

New numbers for the Titan X have appeared at mersenne.ca: 30% more throughput compared to the 980.

The TDP for Titan Z is wrong at the site. It should be 375w, not 500w.

James Heinrich 2015-03-24 11:35

[QUOTE=James Heinrich;398410]If you have access to that 780 Ti I would much appreciate a benchmark.[/QUOTE]And by 780 Ti what I really meant was GTX 980.
(but thanks [i]stars10250[/i]).

[QUOTE=axn;398463]The TDP for Titan Z is wrong at the site. It should be 375w, not 500w.[/QUOTE]Fixed, thanks.

vsuite 2015-04-02 23:29

32 bit Binary
 
Is there a 32-bit recent binary for CUDA4.2 please?

flashjh 2015-04-02 23:53

[QUOTE=vsuite;399243]Is there a 32-bit recent binary for CUDA4.2 please?[/QUOTE]

[URL="http://downloads.sourceforge.net/project/cudalucas/CUDALucas.2.05.1-CUDA4.2-CUDA6.5-Windows-32.64.7z?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fcudalucas%2F&ts=1428018767&use_mirror=iweb"]Here[/URL] you go.

vsuite 2015-04-03 02:37

Thanks flashjh.

Now I'm back in GIMPS, and 2.03 appeared to only have 64-bit binaries, so I did not download 2.05. :ermm::whistle:

The GT 640 on Win7/64 (X2 processor) is slow, so mfaktc only.
The GTX 460 on XP/32 (Core 2 Quad) will run both CUDALucas and mfaktc.

bpcvdhelm 2015-04-06 11:00

Compiler optimisation
 
Hi guys,

My computer has been running mprime for some time, and for fun I started CUDALucas alongside it. It runs great on my GTX 970!

I compiled the CUDALucas source on my computer. After some inspection of the Makefile I see:
NAME = CUDALucas
VERSION = 2.05.1
OptLevel = 1

I did some tests with OptLevel = 3 and here are the results (a tiny test; btw, the production CUDALucas is running in the background):

With OptLevel = 1:
[FONT=Courier New][SIZE=1]| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Apr 06 11:29:32 | M110503 10000 0xacb29fc05973d0a8 | 8K 0.00001 0.2062 2.06s | 0:20 9.04% |
| Apr 06 11:29:34 | M110503 20000 0x9cd7ca8aa594b33c | 8K 0.00001 0.2060 2.06s | 0:18 18.09% |
| Apr 06 11:29:36 | M110503 30000 0xba1ef4f09a7c955a | 8K 0.00001 0.2062 2.06s | 0:16 27.14% |
| Apr 06 11:29:38 | M110503 40000 0x827b27dad4e98554 | 8K 0.00001 0.2060 2.06s | 0:14 36.19% |
| Apr 06 11:29:40 | M110503 50000 0x9e6c039053cc2c17 | 8K 0.00001 0.2061 2.06s | 0:12 45.24% |
| Apr 06 11:29:42 | M110503 60000 0xdb48afced9ebd397 | 8K 0.00001 0.2060 2.06s | 0:10 54.29% |
| Apr 06 11:29:44 | M110503 70000 0xd650094b406761ed | 8K 0.00001 0.2061 2.06s | 0:08 63.34% |
| Apr 06 11:29:46 | M110503 80000 0xa4d69c031cb0caa2 | 8K 0.00001 0.2060 2.06s | 0:06 72.39% |
| Apr 06 11:29:48 | M110503 90000 0xf1427358e52c1458 | 8K 0.00001 0.2060 2.06s | 0:04 81.44% |
| Apr 06 11:29:50 | M110503 100000 0x0f4385fec05eb193 | 8K 0.00001 0.2060 2.06s | 0:02 90.49% |
| Apr 06 11:29:52 | M110503 110000 0xc5bb3186236db9db | 8K 0.00001 0.2061 2.06s | 0:00 99.54% |[/SIZE][/FONT]

But with OptLevel = 3:
[FONT=Courier New][SIZE=1]| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Apr 06 11:30:19 | M110503 10000 0xacb29fc05973d0a8 | 8K 0.00001 0.2058 2.05s | 0:20 9.04% |
| Apr 06 11:30:21 | M110503 20000 0x9cd7ca8aa594b33c | 8K 0.00001 0.2058 2.05s | 0:18 18.09% |
| Apr 06 11:30:23 | M110503 30000 0xba1ef4f09a7c955a | 8K 0.00001 0.2058 2.05s | 0:16 27.14% |
| Apr 06 11:30:25 | M110503 40000 0x827b27dad4e98554 | 8K 0.00001 0.2057 2.05s | 0:14 36.19% |
| Apr 06 11:30:27 | M110503 50000 0x9e6c039053cc2c17 | 8K 0.00001 0.2059 2.05s | 0:12 45.24% |
| Apr 06 11:30:29 | M110503 60000 0xdb48afced9ebd397 | 8K 0.00001 0.2057 2.05s | 0:10 54.29% |
| Apr 06 11:30:31 | M110503 70000 0xd650094b406761ed | 8K 0.00001 0.2046 2.04s | 0:08 63.34% |
| Apr 06 11:30:34 | M110503 80000 0xa4d69c031cb0caa2 | 8K 0.00001 0.2057 2.05s | 0:06 72.39% |
| Apr 06 11:30:36 | M110503 90000 0xf1427358e52c1458 | 8K 0.00001 0.2059 2.05s | 0:04 81.44% |
| Apr 06 11:30:38 | M110503 100000 0x0f4385fec05eb193 | 8K 0.00001 0.2058 2.05s | 0:02 90.49% |
| Apr 06 11:30:40 | M110503 110000 0xc5bb3186236db9db | 8K 0.00001 0.2059 2.05s | 0:00 99.54% |[/SIZE][/FONT]

Why isn't the default setting 3? See the gcc man pages!

PS: For now I'm leaving my (production) CUDALucas on OptLevel = 1.
PS2: I have some experience with C programming. I was one of the hercules-390 (mainframe emulator) developers for 12 years. My expertise was performance; maybe I can help?

Kind regards,

Bernard van der Helm

MacFactor 2015-07-13 02:29

No help here? Is ANYONE running CUDALucas under Linux Mint?
 
I haven't done a compile outside of an IDE, but if someone will give me a clue what the 'make' invocation (which switches and options) would look like, I'll give it a try -- I hate to have a couple of GPUs underutilized just because the OS can't find a file that's sitting right there.

MF

Dubslow 2015-07-13 04:38

Could you be more specific about what the error is and what you've already tried?

MacFactor 2015-07-13 14:03

Sorry, I thought I was responding to my own post, so it would be obvious ...
 
There's something about this forum that's not organized the way I'm used to from other forums.
Makes it hard to keep related posts together.

"I've downloaded an executable (CUDALucas) that I know other people are running under Linux. Typing ./filename gives me the error message "No such file or directory". ls -l clearly shows the file is there, belongs to me, and that I have permission to read, write, and execute it. I get the same error in Lubuntu! How can a file which is so obviously "there" not be there??"

Note that I've already got mfaktc up and running on two different systems, so I've had some experience with NVidia drivers and CUDA libs at this point, and I'm starting to doubt whether it has anything to do with finding the linked files. Here's a summary:
[url]http://www.mersenneforum.org/showthread.php?p=396570#post396570[/url]

I've tried running in sh, bsh, bash, csh, tcsh, none of which helps.

file(1) returns ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, .... not stripped. Hope the ... isn't important stuff. I'm running it on another computer, using this one for browser only.

It's not like I'm getting any usable error messages at all. And the instructions at Source Forge are pretty minimalist -- dl the files, change one env variable. Didn't work, now what ?

Mark Rose 2015-07-13 14:12

Numerous people have run into that issue, myself included. It seems to happen with Ubuntu and derivatives. If you're already on Linux, it should be simple enough to recompile it. Note that TheJudger has found a bug in CUDA 7.x that is not yet fixed, so you should compile it using an earlier version.

chris2be8 2015-07-13 15:30

UNIX (including Linux) can produce rather misleading error messages. The program you are trying to run is probably calling some other file which it can't find, putting out that message and terminating. Of course it [B]should[/B] also say what it's trying to call, but the programmer writing the relevant code has to make it do that.

Try "strace ./CUDALucas" which should trace all the system calls CUDALucas makes with parameters passed to them. You may get quite a lot of output, but hopefully one line will tell you what file it's trying to call.

Read the man page for strace to help understand the output.

Chris

MacFactor 2015-07-13 15:52

[QUOTE=Mark Rose;405795]Numerous people have run into that issue, myself included. It seems to happy with Ubuntu and derivatives. If you're already on Linux, it should be simple enough to recompile it. Note that TheJudger has found a bug in CUDA 7.x that is not fixed, so you should compiling it using an earlier version.[/QUOTE]
Thanks, I sure wish I had known that earlier! You'd think they'd update the instructions ...

I've dl'ed the source, will try compiling later. I was reluctant to try that because I've never done a compilation outside of an IDE -- now it sounds like it would have been easier to go straight for the compile, instead of wasting so much time trying to figure out how to get the binaries to work.

SHN

MacFactor 2015-07-13 16:18

Not sure that helps much ...
 
[CODE]
shn@C2D ~/Desktop/CUDALucas $ strace ./CUDALucas
execve("./CUDALucas", ["./CUDALucas"], [/* 46 vars */]) = -1 ENOENT (No such file or directory)
write(2, "strace: exec: No such file or di"..., 40strace: exec: No such file or directory
) = 40
exit_group(1) = ?
+++ exited with 1 +++
[/CODE]
[CODE]shn@C2D ~/Desktop/CUDALucas/other_versions $ strace CUDALucas-2.05.1-CUDA4.2-linux-x86_64
strace: Can't stat 'CUDALucas-2.05.1-CUDA4.2-linux-x86_64': No such file or directory
[/CODE][QUOTE=chris2be8;405799]UNIX (including Linux) can produce rather misleading error messages. The program you are trying to run is probably calling some other file which it can't find, putting out that message and terminating. Of course it [B]should[/B] also say what it's trying to call, but the programmer writing the relevant code has to make it do that.

Try "strace ./CUDALucas" which should trace all the system calls CUDALucas makes with parameters passed to them. You may get quite a lot of output, but hopefully one line will tell you what file it's trying to call.

Read the man page for strace to help understand the output.

Chris[/QUOTE]

MacFactor 2015-07-13 17:19

This may be part of the problem ...


[CODE]
~/Desktop/CUDALucas $ ldd runll
linux-vdso.so.1 => (0x00007fffd8d9c000)
libcufft.so.4 => /home/shn/CUDAlibs/libcufft.so.4 (0x00007fc31dca7000)
libcudart.so.4 => /home/shn/CUDAlibs/libcudart.so.4 (0x00007fc31da49000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc31d742000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc31d37d000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc31d179000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc31cf5a000)
[B] libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc31cc56000)[/B]
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc31ca40000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc31c837000)
/lib/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007fc31fced000)
[/CODE]The file in bold cannot be found.

frmky 2015-07-13 19:29

[QUOTE=Mark Rose;405795]Note that TheJudger has found a bug in CUDA 7.x that is not fixed, so you should compiling it using an earlier version.[/QUOTE]

That doesn't affect CUDALucas. Compiling with CUDA 7.x is fine. Just don't compile mfaktc with CUDA 7.x.

Mark Rose 2015-07-13 19:35

[QUOTE=frmky;405816]That doesn't affect CUDALucas. Compiling with CUDA 7.x is fine. Just don't compile mfaktc with CUDA 7.x.[/QUOTE]

Good to know. :)

chris2be8 2015-07-14 15:42

[QUOTE=MacFactor;405806][CODE]
shn@C2D ~/Desktop/CUDALucas $ strace ./CUDALucas
execve("./CUDALucas", ["./CUDALucas"], [/* 46 vars */]) = -1 ENOENT (No such file or directory)
write(2, "strace: exec: No such file or di"..., 40strace: exec: No such file or directory
) = 40
exit_group(1) = ?
+++ exited with 1 +++
[/CODE][/QUOTE]

That does give a (slightly cryptic) clue. From the man page for execve: [code] ENOENT The file filename or a script or ELF interpreter does not exist, or a shared library needed for file or interpreter cannot be found. [/code] Which matches up with the ldd output in your next post where one of the libraries is missing. Obviously the system should have said which file it could not find.

If you can install a version of libstdc++.so.6 on your system the program might work.
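A quick way to script this check: the sketch below (illustrative only, not part of CUDALucas; `/bin/sh` stands in for the CUDALucas binary) runs ldd and reports any shared libraries the dynamic linker cannot resolve, which is exactly the condition that makes execve fail with ENOENT.

```python
import subprocess

def missing_libs(binary):
    """Return the shared libraries of `binary` that the dynamic linker
    cannot find, or None if ldd itself is not installed."""
    try:
        out = subprocess.run(["ldd", binary],
                             capture_output=True, text=True).stdout
    except FileNotFoundError:
        return None
    # ldd marks unresolved dependencies with the literal text "not found"
    return [line.split()[0] for line in out.splitlines() if "not found" in line]

# e.g. missing_libs("./CUDALucas") would have returned ['libstdc++.so.6'] here
print(missing_libs("/bin/sh"))
```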

As I said, UNIX can produce rather cryptic error messages. In my previous job as an MVS systems programmer I went to some effort to ensure that error messages put out by programs I wrote or updated had enough information in them to point to the real problem.

Chris

MacFactor 2015-07-14 16:29

Thanks, I've been told it's fairly distro-specific...

I've been told by someone with more knowledge than myself in this area that that particular linked C++ library is probably not found in most Linux distros, so I'm going to try compiling a local version. "make" informs me that I don't have the CUDA toolkit installed on that particular platform, so that's next. I had downloaded the toolkits on a WinXP computer, but that was a while ago; I'm trying to do everything in Linux now, and don't even have that WinXP box available anymore.

I appreciate the analysis you've provided, it really helped me make some progress. Sure wish I had known some of this months ago.

MF

mognuts 2015-07-25 11:34

I've been running CUDALucas v2.05.1 under Windows 10. The bug with CUDA drivers >= 310.70 and cc 2.0 cards still exists, but its behaviour is slightly different from before. Now it causes round-off errors which are unrecoverable; increasing the FFT doesn't help, and only a restart of CUDALucas will overcome this. I've managed to match a residue using a run with similar behaviour, so no harm done. I just find it interesting.

[CODE]| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Jul 25 11:22:57 | M39823489 100000 0xfbb2e64a7a7f1a9c | 2160K 0.15766 3.4682 346.82s | 1:14:16:10 0.25% |
| Jul 25 11:28:44 | M39823489 200000 0x2f5728b5a730a4f2 | 2160K 0.15464 3.4692 346.92s | 1:14:10:42 0.50% |
| Jul 25 11:34:31 | M39823489 300000 0x0ba36acaa24b9bed | 2160K 0.17899 3.4686 346.86s | 1:14:04:55 0.75% |
| Jul 25 11:40:18 | M39823489 400000 0xee75195443fb9609 | 2160K 0.16523 3.4706 347.06s | 1:13:59:27 1.00% |
| Jul 25 11:46:05 | M39823489 500000 0x5fd2260c0b578ccc | 2160K 0.16088 3.4706 347.06s | 1:13:53:52 1.25% |
CUDALucas.cu(1878) : cudaSafeCall() Runtime API error 30: unknown error.
Resetting device and restarting from last checkpoint.
Using threads: square 32, splice 64.
Continuing M39823489 @ iteration 500001 with fft length 2160K, 1.26% done
Round off error at iteration = 500100, err = 0.49866 > 0.35, fft = 2160K.
Restarting from last checkpoint to see if the error is repeatable.
Using threads: square 32, splice 64.
Continuing M39823489 @ iteration 500001 with fft length 2160K, 1.26% done
Round off error at iteration = 500100, err = 0.49866 > 0.35, fft = 2160K.
The error persists.
Trying a larger fft until the next checkpoint.
Using threads: square 32, splice 64.
Continuing M39823489 @ iteration 500001 with fft length 2240K, 1.26% done
Round off error at iteration = 500400, err = 0.42053 > 0.35, fft = 2240K.
The error won't go away. I give up.

Waiting for 0 seconds, press a key to continue ...
------- DEVICE 1 -------
name GeForce GTX 580
Compatibility 2.0
clockRate (MHz) 1544
memClockRate (MHz) 2004
totalGlobalMem 3221225472
totalConstMem 65536
l2CacheSize 786432
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 1536
multiProcessorCount 16
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
textureAlignment 512
deviceOverlap 1
Using threads: square 32, splice 64.
Continuing M39823489 @ iteration 500001 with fft length 2160K, 1.26% done
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Jul 25 11:55:59 | M39823489 600000 0x2bfa199a2cf44a3c | 2160K 0.16825 3.4688 346.87s | 1:13:48:00 1.50% |
| Jul 25 12:01:46 | M39823489 700000 0x4bc8273a3c703323 | 2160K 0.16035 3.4704 347.04s | 1:13:42:19 1.75% |
| Jul 25 12:07:33 | M39823489 800000 0xdacbc4353a989884 | 2160K 0.15374 3.4703 347.03s | 1:13:36:36 2.00% |
| Jul 25 12:13:20 | M39823489 900000 0x02e00bcef8ecf468 | 2160K 0.15009 3.4705 347.05s | 1:13:30:53 2.25% |
| Jul 25 12:19:07 | M39823489 1000000 0x0fbd344fdae4c273 | 2160K 0.15717 3.4703 347.03s | 1:13:25:09 2.51% |[/CODE]
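As a sanity check on the log above, the ETA column is just arithmetic on the remaining iterations: an LL test of M39823489 runs p - 2 squarings, so at iteration 100000, averaging 3.4682 ms per iteration, the remaining time comes out within a couple of seconds of the printed 1:14:16:10 (days:hours:minutes:seconds). A quick recomputation:

```python
# Recompute the ETA from the first log line above: M39823489 at
# iteration 100000, averaging 3.4682 ms per iteration. An LL test
# needs p - 2 iterations in total.
p = 39823489
done = 100000
ms_per_iter = 3.4682

remaining_s = (p - 2 - done) * ms_per_iter / 1000
days, rest = divmod(remaining_s, 86400)
hours, rest = divmod(rest, 3600)
minutes, seconds = divmod(rest, 60)
# prints 1:14:16:08, a second or two off the logged ETA of 1:14:16:10
print(f"{int(days)}:{int(hours):02d}:{int(minutes):02d}:{int(seconds):02d}")
```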

UBR47K 2015-08-26 12:07

I am getting this error in arch linux:
[code]device_number >= device_count ... exiting
(This is probably a driver problem)
[/code]

I have two Titan X cards with no SLI bridge, and the SLI option in xorg.conf is disabled.
I tried the precompiled CUDA 6.5 version and one compiled from svn with CUDA 7.0, to no avail.

kladner 2015-09-02 18:44

Sorry if this has been addressed, but I have a puzzling response from CuLu 2.05.1. On my first DC run of a 36M assignment, the FFT selected was 2048K. This produced errors in the 0.05 range. I now have a 34M exponent to DC with the GTX580. Thinking I could get better times with a smaller FFT, I tried inserting 1728K and 1792K both in the worktodo.txt and on the command line. These were either ignored, or the program stated that they are too small and put 2048K back in. I vaguely remember a command "FFT2=" rather than just ",1792K,". Is this what I need to use to get CuLu to test a smaller FFT than 2048?

EDIT: I have deleted the checkpoint files between tests.

frmky 2015-09-02 23:14

[QUOTE=kladner;409447]Sorry if this has been addressed, but I have a puzzling response from CuLu 2.05.1. On my first DC run of a 36M assignment, the FFT selected was 2048K. This produced errors in the 0.05 range. I now have a 34M exponent to DC with the GTX580. Thinking I could get better times with a smaller FFT, I tried inserting 1728K and 1792K both in the worktodo.txt and on the command line. These were either ignored, or the program stated that they are too small and put 2048K back in. I vaguely remember a command "FFT2=" rather than just ",1792K,". Is this what I need to use to get CuLu to test a smaller FFT than 2048?.[/QUOTE]

For CUDALucas, you just add ",1792K" to the end of the line in worktodo.txt. However, looking back at the ones I've done, nothing in the 34M range used an FFT that small. Most were done with 1890K or 2000K, depending on the GPU and the version of cufft used. Check the timing for all three of 1890K, 2000K, and 2048K to see which is fastest for you.
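For illustration only, assuming the simplest bare-exponent worktodo line (real assignment lines carry more fields, but the FFT size still goes last; the exponent here is the 34.7M one from later in the thread), the edited line would look like:

```
Test=34733759,1792K
```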

frmky 2015-09-02 23:22

[QUOTE=UBR47K;408850]I am getting this error in arch linux:
[code]device_number >= device_count ... exiting
(This is probably a driver problem)
[/code]

I have 2 titan X with no SLI bridge and SLI option in xorg.conf is disabled.
Tried using precompiled 6.5CUDA version and compiled from svn with CUDA7.0 with no avail.[/QUOTE]

Are /dev/nvidia0 and /dev/nvidiactl present? Does the command nvidia-smi show your card present? If no to either, then the driver isn't installed and active.

kladner 2015-09-03 03:55

[QUOTE=frmky;409465]For CUDALucas, you just add ",1792K" to the end of the line in worktodo.txt. However, looking back at the ones I've done nothing in the 34M range used an FFT that small. Most were done with 1890K or 2000K depending on the GPU and version of cufft used. Check the timing for all three of 1890K, 2000K, and 2048K to see which is faster for you.[/QUOTE]

1890K and 2000K don't show up in 'GeForce GTX 580 fft.txt', but I'll plug them in and see what comes out.

I think I ran CUFFTbench such that it limited the output in some way. It might be good to rerun it unfiltered.
Here's that immediate region of the current fft.txt:
[CODE] 1600 30232693 2.6039
1728 32597297 2.7957
1792 33778141 2.8987
2048 38492887 2.9924
2304 43194913 3.7102
2592 48471289 3.9732
2880 53735041 4.7980
[/CODE]Thanks for the suggestions! :smile:

EDIT: Here are the results for the 3 FFTs:
[QUOTE]M347xxxxx 5000 0xfd897327f15981c1 | 1890K 0.14111 3.5447 17.72s | 1:10:11:45 0.01%
M347xxxxx 10000 0xe66eeb94b6e3a4e9 | 1890K 0.14258 3.5466 17.73s | 1:10:12:00 0.02%

M347xxxxx 5000 0xfd897327f15981c1 | 2000K 0.03125 3.4163 17.08s | 1:08:57:26 0.01%
M347xxxxx 10000 0xe66eeb94b6e3a4e9 | 2000K 0.03027 3.4354 17.17s | 1:09:02:41 0.02%

M34733759 5000 0xfd897327f15981c1 | 2048K 0.02051 3.0706 15.35s | 1:05:37:19 0.01%
M34733759 10000 0xe66eeb94b6e3a4e9 | 2048K 0.02100 3.0900 15.45s | 1:05:42:40 0.02%[/QUOTE]These are pretty much in keeping with FFT.txt. 1792K is the next lowest FFT with shorter times. I guess the [STRIKE]0.05[/STRIKE] 0.02 error rate is normal in this range.

LaurV 2015-09-03 07:30

[QUOTE=kladner;409473]I think I ran CUFFTbench such that it limited the output in some way.[/QUOTE]
"Some way" means that all the FFT sizes which were eliminated are [B][U]slower[/U][/B] (for your card) than a larger FFT which remained. So, for exponents between 33.77M and 38.49M, you will be faster using the 2M FFT (i.e. 2048K in your file). The 1792K is too small, and any intermediate values will be slower than 2048K. When you do the "tuning", a list with all sizes is created, then it is parsed from the end, and any line with a longer time than a line already parsed (i.e. a shorter FFT which takes longer than a longer FFT) is eliminated. There is no mystery in it.
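That end-to-start pruning is easy to sketch (a paraphrase of the idea, not CUDALucas's actual tuning code; the row layout of fft size, max exponent, ms per iteration follows the fft.txt excerpt above, and the 1920K entry is made up for the demonstration):

```python
def prune_fft_table(rows):
    """Keep only FFT sizes that are strictly faster than every larger
    FFT already kept; rows are (fft_kb, max_exponent, ms_per_iter),
    sorted by ascending FFT size."""
    kept = []
    best = float("inf")
    for size, max_exp, ms in reversed(rows):  # parse from the end
        if ms < best:                         # faster than all larger FFTs kept so far
            kept.append((size, max_exp, ms))
            best = ms
    return list(reversed(kept))

# A shorter FFT that is slower than a longer one (the fake 1920K entry)
# gets eliminated, exactly as described above.
rows = [(1792, 33778141, 2.8987), (1920, 35000000, 3.1), (2048, 38492887, 2.9924)]
print(prune_fft_table(rows))
# → [(1792, 33778141, 2.8987), (2048, 38492887, 2.9924)]
```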

Note that a smaller FFT does not always mean a faster iteration time. This depends on how "smooth" the FFT value is, i.e. how [B][U]your card*[/U][/B] can split it into small pieces and combine those [URL="https://en.wikipedia.org/wiki/Butterfly_diagram"]butterflies[/URL] ([URL="https://www.google.co.th/search?q=fft+butterfly&tbm=isch"]yaaarrr[/URL]!). That is why a power of two is usually faster than its neighbors: fast multiplication is a kind of "[URL="https://en.wikipedia.org/wiki/Divide_and_conquer_algorithms"]divide et impera[/URL]". It splits the work in two, solves the halves, and puts the results together (well, kind of...), so when you split something that is not a multiple (or, across repeated splits, a power) of two, one chunk is bigger than the other, and you need more splits and more work to put everything together at the end.

----
* it depends on how many threads the card can run at the same time, the memory available to each, whether it multiplies on 24 or on 32 bits, etc. - that is why you have to TUNE the card when you start using CUDALucas on it.

------------
Edit: yes, an error between 0.0006 and 0.24 is perfect. Higher is better in one sense (you will get a faster iteration time if you select a lower FFT from that file, but the error will be bigger). Some go as high as 0.35, but this is risky if you do not check the rounding error at every iteration.
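LaurV's smoothness point is easy to check: cufft's fast code paths handle radices 2, 3, 5 and 7, so an FFT length that factors completely over those primes takes the cheap path. A small illustrative sketch (not CUDALucas code) factoring the sizes discussed in this thread:

```python
def radix_factor(n, radices=(7, 5, 3, 2)):
    """Split n over the small radices an FFT library handles cheaply.
    Returns (factors, leftover); leftover == 1 means n is 'smooth'."""
    factors = []
    for r in radices:
        while n % r == 0:
            factors.append(r)
            n //= r
    return factors, n

# All of the sizes discussed here are 2/3/5/7-smooth, which is why
# they appear in the benchmark table at all.
for k in (1792, 1890, 2000, 2048):
    factors, leftover = radix_factor(k * 1024)
    print(k, factors, "smooth" if leftover == 1 else f"leftover {leftover}")
```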

kladner 2015-09-03 08:01

Thanks, LaurV. It has been a while since I messed with this stuff. I actually pulled a full list, and saw exactly what you are saying: the ones that don't show aren't worth looking at.

UBR47K 2015-09-03 12:37

[QUOTE=frmky;409466]Are /dev/nvidia0 and /dev/nvidiactl present? Does the command nvidia-smi show your card present? If no to either, then the driver isn't installed and active.[/QUOTE]

I found out that it works after I rebooted my computer. Apparently I had installed a new driver update and wasn't aware of it (updates had piled up for a month).

frmky 2015-09-03 21:02

[QUOTE=LaurV;409482]"some way" means that all sizes of the FFT which were eliminated are [B][U]slower[/U][/B] (for your card)[/QUOTE]
And version of CUDA/cufft used to compile the binary. If you switch to a binary compiled with a different version of CUDA, rerun the benchmark.

LaurV 2015-09-04 12:59

[QUOTE=frmky;409524]And version of CUDA/cufft used to compile the binary. If you switch to a binary compiled with a different version of CUDA, rerun the benchmark.[/QUOTE]right! :tu:

Xyzzy 2015-09-14 13:46

[QUOTE=Oddball;223390]Fighting off a pit bull is quite hard, but you'll probably make it out alive. Now if you had to fight off a whole pack of wolves instead...[/QUOTE][URL]http://www.cnn.com/2015/09/12/us/new-york-pit-bull-attacks/[/URL]

LaurV 2015-09-15 01:40

[QUOTE=Xyzzy;410265][URL]http://www.cnn.com/2015/09/12/us/new-york-pit-bull-attacks/[/URL][/QUOTE]
[quote]Boves, who own a pit bull named Gina[/quote]
Now ye know the reason... obviously, dogs can smell far better than we do, and who the hell knows what was in their heads.
This reminds me of some time ago when my boss had two huge dogs. I don't know the breed, but they were huge and very nice dogs, and they both eventually died of old age. Anyhow, one day my boss was walking the two dogs in the park when a couple of stray dogs attacked them. It was a short fight; his dogs were big and calm, and the strays didn't stand much of a chance against them, but you know how it is for the owner: my boss got worried and jumped in between the dogs to separate them. There was no attack on a person, i.e. the dogs kept fighting each other and, luckily, never turned against the human in the middle, but he got caught in the leashes and fell down, breaking one of his front teeth in half. He had to have an implant and came to work for a while with a "front hole". Then he was very funny for weeks, because the doctor told him to unscrew and re-screw the "new toy" from time to time until the site got used to it. He used to do that with his tongue, and suddenly, when you least expected it, he would smile at you with the gap showing.

wombatman 2015-09-15 03:29

[url]http://www.huffingtonpost.com/2014/03/06/dog-aggression-study-applied-animal-behaviour-science_n_4911861.html[/url]

[url]http://www.politifact.com/georgia/statements/2011/aug/03/elaine-boyer/are-pit-bulls-more-aggressive-other-dogs/[/url]

[url]http://atts.org/breed-statistics/statistics-page1/[/url]
[CODE]Breed Name Tested Passed Failed Percent
[B]AMERICAN PIT BULL TERRIER 870 755 115 86.8%
AMERICAN STAFFORDSHIRE TERRIER 657 555 102 84.5%[/B]
AUSTRALIAN SHEPHERD 680 559 121 82.2%
GOLDEN RETRIEVER 785 669 116 85.2%[/CODE]

Just sayin' ;)

wombatman 2015-09-15 04:23

To bring things back on topic, I'm still having issues with my GTX 570. Does anybody else reading this thread have one, and if so, 1) have you successfully completed an exponent without any crashes? 2) if so, how?

kladner 2015-09-15 06:06

[QUOTE=wombatman;410307]To bring things back on topic, I'm still having issues with my GTX 570. Does anybody else reading this thread have one, and if so, 1) have you successfully completed an exponent without any crashes? 2) if so, how?[/QUOTE]

Let me get back to you on that. I have a GTX 570 which I really love, but can't run right now because of messed up power connectors. It has been a while since I could run it reliably. However, there are others here who have run those. I just don't have current operations to draw from.

flashjh 2015-09-21 19:03

[QUOTE=wombatman;410307]To bring things back on topic, I'm still having issues with my GTX 570. Does anybody else reading this thread have one, and if so, 1) have you successfully completed an exponent without any crashes? 2) if so, how?[/QUOTE]

What is it doing?

wombatman 2015-09-21 21:01

It hits the TDRdelay limit, Windows kills the driver, and the driver is restarted. So far, I just deal with it via a looping shell file as advised by someone on here. That way the program is repeatedly run until the exponent is completed.

I also can't really identify a rhyme or reason for the crashes (i.e., it's not heat or anything like that).
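For reference, the TDR delay wombatman mentions is a documented Windows setting: the display driver is reset when a GPU kernel runs longer than TdrDelay seconds (default 2), so raising it gives long CUDALucas kernels more headroom. A .reg fragment setting a 15-second delay (reboot required; 0x0f = 15) looks like this:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000000f
```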

flashjh 2015-09-21 21:27

If you go back to [URL="http://www.mersenneforum.org/showthread.php?t=12576&page=205"]page 205[/URL] you'll see our discussion on this (with a link to some more info). Unfortunately, nothing has changed with the driver and the compute 2.0 cards. You could downgrade your driver or continue to use the batch file you're using. As far as my testing showed, your results should still be good. Are you having any problems getting matches on your checksums?

wombatman 2015-09-21 23:24

[QUOTE=flashjh;410996]If you go back to [URL="http://www.mersenneforum.org/showthread.php?t=12576&page=205"]page 205[/URL] you'll see our discussion on this (with a link to some more info). Unfortunately, nothing has changed with the driver and the compute 2.0 cards. You could downgrade your driver or continue to use the batch file you're using. As far as my testing showed your results should still be good. Are you having any problems getting matches on your checksums?[/QUOTE]

Nah, all my checksums are right, so I'm assuming it's not a memory/computational problem either. For the meantime, I'll hold off on doing any CudaLucas work and stick to mfaktc. I'll come back to the problem and try to get it resolved later on.

flashjh 2015-09-21 23:30

The only way to fix it is to downgrade the driver or get a new card. The 5xx series is too old to expect a driver fix at this point.

wombatman 2015-09-21 23:46

[QUOTE=flashjh;411004]The only way to fix it is to downgrade the driver or get a new card. The 5xx series is too old to expect a driver fix at this point.[/QUOTE]

Well, I'm hoping to get a new card around Christmas time, so I'll just add this reason to my list. ;)

LaurV 2015-09-22 03:37

There is a known problem with 570 cards: they are not very stable for LL. We had a discussion here in the past, but I don't remember the brand or the reason, nor how (or whether) it was fixed. I can look for that topic if I have some time today during the lunch break. I am more into 580s and have only tested a few 570s (which I didn't own) in the past; luckily, I never ran into the instability problem.

wombatman 2015-09-22 03:38

I would appreciate it if you do find it, but don't waste too much time on it! :smile:

kladner 2015-09-22 05:36

[QUOTE=wombatman;410995]It hits the TDRdelay limit, Windows kills the driver, and the driver is restarted. So far, I just deal with it via a looping shell file as advised by someone on here. That way the program is repeatedly run until the exponent is completed.

I also can't really identify a rhyme or reason for the crashes (i.e., it's not heat or anything like that).[/QUOTE]

Typically, on my MSI 580, over an 8 hour run (like when I am at work) it might restart once. Today, I left it running on a DC, and was pleasantly surprised that it was still at Count 0 (sorry, Mr Gibson) when I came home from work. When I first started testing this card with CUDALucas, it would not complete the short test at anything like the clocks it could run with MFAKTC. Not only that, but it failed at "stock" OC of 833 MHz, default voltage of 1.006, even with the memory clocked at 1600 MHz.

The short of it is that I now run it at the core speed that is in the BIOS: 833. I turn the Vcore up to 1.025. The memory is running at 1800 MHz, with a 10 mv increase over whatever the base voltage is. (There are advantages to running an MSI card with Afterburner. :smile: I don't have the Memory and Auxiliary voltage settings on my EVGA 580. I am clueless about the Aux, but it was also set for +10 mv when I got the no-restart in 8 hours result.)

An advantage of running at the default frequency, beyond possible stability, is that if the card restarts, the clock stays the same. However, voltage and memory settings are not affected by a restart. That is a very good thing, since CuLu would almost certainly give bad results with the RAM at over 2 GHz. The self-tests also gave bad results at stock voltage, so I'm glad those settings stick.

In spite of the increased voltages, the card runs cooler than it would with MFAKTC at that speed/voltage combination. Perhaps DP calculations require more voltage for stability, while the DP throttling results in lower temps.

Try setting the GPU core at or below stock frequency, with higher than normal voltage. Set memory low: maybe like 1500 MHz.

The range of voltages I consider using comes from the wildly different "stock" voltages of the two 580s. For the MSI, it is 1.006 v; for the EVGA, it is 1.088 v. (And this insane voltage is to run at an 'oc' of 797 MHz!) I have never touched the higher voltage in real operation, even when running the EVGA at 882 MHz, as it is now, being fed 1.050 v. Meanwhile, the MSI card will run at its stock OC of 833 MHz, on 1.000 v, when running MFAKTC. It is right now running MFAKTC at 911 MHz, on 1.056 v.

I never got the EVGA card to pass self-testing, but I never gave it the same amount of adjustment and testing as I did the MSI. My feeling is that the MSI is just a better card. If nothing else, it has twice as many voltage and frequency steps as the EVGA, which makes fine tuning much more effective. It also runs faster and cooler 'out of the box.' I suspect higher-binned parts. EDIT: .....and a better cooler than the Zalman aftermarket on the EVGA.

kladner 2015-09-22 06:11

[QUOTE=kladner;409447]Sorry if this has been addressed, but I have a puzzling response from CuLu 2.05.1. On my first DC run of a 36M assignment, the FFT selected was 2048K. This produced errors in the 0.05 range. I now have a 34M exponent to DC with the GTX580. Thinking I could get better times with a smaller FFT, I tried inserting 1728K and 1792K both in the worktodo.txt and on the command line. These were either ignored, or the program stated that they are too small and put 2048K back in. I vaguely remember a command "FFT2=" rather than just ",1792K,". Is this what I need to use to get CuLu to test a smaller FFT than 2048?

EDIT: I have deleted the checkpoint files between tests.[/QUOTE]

I am motivated to revive this question. I currently have two 34.7M DCs running on P95. Both are using 1792K FFT, and reporting 0.375 roundoff error. Are error tolerances set more strictly in CuLu?

EDIT: Would the 'I' (capital I) (I/i -- increase/decrease error threshold) interactive command allow a smaller FFT?

wombatman 2015-09-22 14:33

[QUOTE=kladner;411025]Typically, on my MSI 580, over an 8 hour run (like when I am at work) it might restart once. Today, I left it running on a DC, and was pleasantly surprised that it was still at Count 0 (sorry, Mr Gibson) when I came home from work. When I first started testing this card with CUDALucas, it would not complete the short test at anything like the clocks it could run with MFAKTC. Not only that, but it failed at "stock" OC of 833 MHz, default voltage of 1.006, even with the memory clocked at 1600 MHz.

The short of it is that I now run it at the core speed that is in the BIOS: 833. I turn the Vcore up to 1.025. The memory is running at 1800 MHz, with a 10 mv increase over whatever the base voltage is. (There are advantages to running an MSI card with Afterburner. :smile: I don't have the Memory and Auxiliary voltage settings on my EVGA 580. I am clueless about the Aux, but it was also set for +10 mv when I got the no-restart in 8 hours result.)

An advantage, beyond possible stability, to running at the default frequency, is that if it restarts, the clock stays the same. However, voltage and memory settings are not affected. That is a very good thing, since CuLu would almost certainly give bad results with the RAM at over 2 GHz. The self-tests also gave bad results with stock voltage, so I'm glad those setting stick.

In spite of the increased voltages, the card runs cooler than it would with MFAKTC at that speed/voltage combination. Perhaps DP calculation require more voltage for stability, while the DP throttling results in lower temps.

Try setting the GPU core at or below stock frequency, with higher than normal voltage. Set memory low: maybe like 1500 MHz.

The range of voltages I consider using comes from the wildly different "stock" voltages of the two 580s. For the MSI, it is 1.006 v; for the EVGA, it is 1.088 v. (And this insane voltage is to run at an 'oc' of 797 MHz!) I have never touched the higher voltage in real operation, even when running the EVGA at 882 MHz, as it is now, being fed 1.050 v. Meanwhile, the MSI card will run at its stock OC of 833 MHz, on 1.000 v, when running MFAKTC. It is right now running MFAKTC at 911 MHz, on 1.056 v.

I never got the EVGA card to pass self-testing, but I never gave it the same amount of adjustment and testing as I did the MSI. My feeling is that the MSI is just a better card. If nothing else, it has twice as many voltage and frequency steps as the EVGA, which makes fine tuning much more effective. It also runs faster and cooler 'out of the box.' I suspect higher-binned parts. EDIT: .....and a better cooler than the Zalman aftermarket on the EVGA.[/QUOTE]

This is good info. I have an EVGA, so that may be it, but I'll try your suggested tweaks and see if it does anything. Thanks!

kladner 2015-09-22 17:34

I see the difference between the cards as marketing related. MSI went for a reasonably nice overclock, at a reasonable voltage out of the box. It still has [U]lots[/U] of headroom, but one can only go up a step or two on stock voltage. The finer resolutions for voltage and frequency are great for someone with a feel for tweaking, but might confuse a beginning overclocker.

EVGA, on the other hand, set the voltage really high in BIOS, so that a beginner could just crank the clock up and say, "I got an amazing OC on STOCK VOLTAGE!" Of course, it would be running hot as hell, though maybe not so much in a game setting if the load is intermittent. I have not tried to see how high the EVGA will go at 1.088 v, because the system could not handle the heat. I start backing off the settings when temps reach around 78 C.

TheJudger 2015-09-22 18:12

Keep in mind that modern CPUs/GPUs/whatever have individual voltage and clock settings.
While e.g. all GTX 970s are programmed with the same base clock, each specific GPU has an individual voltage for that clock rate, an individual power consumption, an individual max clock and, of course, an individual clock rate under a specific load.

These are 12 ASUS GTX 970s (STRIX-GTX970-DC2OC-4GD5), factory OCed. All GPUs are running the same load (mfaktc in this case); GPU temperature is well below the temperature target, and the power limit is not (yet) reached.
[CODE]
GPU #1
GPU Current Temp : 71 C
Power Draw : 178.41 W
Graphics : 1290 MHz
GPU #2
GPU Current Temp : 72 C
Power Draw : 175.58 W
Graphics : 1290 MHz
GPU #3
GPU Current Temp : 73 C
Power Draw : 185.42 W
Graphics : 1303 MHz
GPU #4
GPU Current Temp : 73 C
Power Draw : 177.85 W
Graphics : 1303 MHz
GPU #5
GPU Current Temp : 73 C
Power Draw : 193.13 W
Graphics : 1328 MHz
GPU #6
GPU Current Temp : 70 C
Power Draw : 178.00 W
Graphics : 1315 MHz
GPU #7
GPU Current Temp : 73 C
Power Draw : 179.40 W
Graphics : 1328 MHz
GPU #8
GPU Current Temp : 70 C
Power Draw : 174.76 W
Graphics : 1290 MHz
GPU #9
GPU Current Temp : 71 C
Power Draw : 177.80 W
Graphics : 1303 MHz
GPU #10
GPU Current Temp : 71 C
Power Draw : 177.25 W
Graphics : 1278 MHz
GPU #11
GPU Current Temp : 72 C
Power Draw : 181.74 W
Graphics : 1303 MHz
GPU #12
GPU Current Temp : 71 C
Power Draw : 184.82 W
Graphics : 1316 MHz
[/CODE]

Oliver

blip 2015-09-23 08:43

How do you get power draw?

TheJudger 2015-09-23 09:56

Just 'nvidia-smi -a' (Linux) with a recent driver. But this depends on the GPU (and its BIOS) itself.

Teslas are fully featured by 'nvidia-smi' all the time, and most Quadros too. Recently(?) they added full support for the GeForce TITAN (all flavours). From time to time they enable or disable features for "regular" GeForce cards. It also depends on the GPU BIOS/card; I *guess* not all GeForce 970s will report this stuff.

Oliver
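If you want the numbers programmatically rather than grepping 'nvidia-smi -a', nvidia-smi's CSV query interface is handy. This is a hedged sketch: it returns None when nvidia-smi is absent or errors out, and maps unsupported readings (as on older GeForces) to None.

```python
import subprocess

def gpu_power_draw():
    """Query per-GPU power draw (watts) and graphics clock (MHz) via
    nvidia-smi's CSV interface. Returns a list of (index, watts, mhz)
    tuples, or None if nvidia-smi is missing or reports an error."""
    cmd = ["nvidia-smi",
           "--query-gpu=index,power.draw,clocks.current.graphics",
           "--format=csv,noheader,nounits"]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        return None
    if out.returncode != 0:
        return None

    def to_float(field):
        try:
            return float(field)
        except ValueError:  # "N/A" / "[Not Supported]" on older boards
            return None

    rows = []
    for line in out.stdout.splitlines():
        idx, watts, mhz = [f.strip() for f in line.split(",")]
        rows.append((int(idx), to_float(watts), to_float(mhz)))
    return rows

print(gpu_power_draw())
```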

blip 2015-09-23 21:56

Must be the newer cards. On my oldish 590's, almost all entries show N/A.
Or could it be the driver? (currently 343.36)

kladner 2015-09-24 06:28

[QUOTE=blip;411136]Must be the newer cards. On my oldish 590's, almost all entries show N/A.
Or could it be the driver? (currently 343.36)[/QUOTE]

500 series and earlier did not do that fancy stuff.

TheJudger 2015-09-24 08:44

I saw a lot of those "N/A" going away after installing 352 series driver (in my case 352.41).

This is a cheap GTX 750 (non-Ti) with 352.41:
[CODE]# nvidia-smi -a | grep -c " : "
105
# nvidia-smi -a | grep -c "N/A"
42
[/CODE]

Oliver

TheJudger 2015-09-26 10:58

OK, 352.07 has lots of N/A entries for me, while 352.41 has fewer (more information).

blip 2015-09-28 09:25

I updated to 355.11, still the same N/A's for my GTX590.

kladner 2015-09-28 11:39

[QUOTE=blip;411459]I updated to 355.11, still the same N/A's for my GTX590.[/QUOTE]

355.82 is no different on a 580.

TheJudger 2015-09-28 17:16

OK, it highly depends on the GPU itself; on my GT 630 (GK208) the number of "N/A" entries remains the same, too.
So far I have seen fewer "N/A" entries on[LIST][*]a cheap 750 (non-Ti)[*]ASUS 970 Strix OC[*]ref. GTX 980[*]Palit 980Ti OC[/LIST]

Oliver

henryzz 2015-09-30 17:25

[QUOTE=TheJudger;411491]OK, highly depends on the GPU itself, on my GT 630 (GK208) the number of "N/A" remains the same, too.
I saw less "N/A" so far on[LIST][*]cheap 750 (non-Ti)[*]ASUS 970 Strix OC[*]ref. GTX 980[*]Palit 980Ti OC[/LIST]
Oliver[/QUOTE]

All of which are Maxwell v1 or v2

kladner 2015-09-30 17:45

2 Attachment(s)
[QUOTE=wombatman;411047]This is good info. I have an EVGA, so that may be it, but I'll try your suggested tweaks and see if it does anything. Thanks![/QUOTE]

Well, so much for uninterrupted running of CuLu. I saw the display blink, and when I looked at CL, the restart count was at 5. I restarted the whole system, and CL restarted again within 20 minutes. This is with TDRDelay set at 128, which I saw someone mention somewhere on the forum. (I am also aware of your recommended setting of 10, WombatMan.:smile:)

The voltages and clock settings were the same as on a previous DC which got through 12+ hours without resetting.

I am starting to think that it is a waste of processing time to set TDRDelay that high. On the Afterburner monitor the card showed declining temperature for a few minutes, which suggests that it had stopped doing real work and was unlikely to resume without a reset. I will set TDRDelay to 15, and see if I can screen-grab the pattern in the plot.

EDIT: It sneaked one past me. TDRD was still 128, but there was no apparent long decline. All I could guess at being connected to the reset is the tiny blip attached.

Notice that CL runs the card at 99% usage. This is true regardless of what else is running on the system. On the other hand, with mfaktc, usage drops from 99% to 98% when P95 gets going. This does not happen with the Small FFT stress test, but it does with Large and Blend tests.

EDIT2: Reset again, just now. Still just the momentary dip in usage. I have seen, but failed to capture, ravine-like plots in temperature preceding a reset.

I just made a simulation of such a plot using mfaktc. See below.

kladner 2015-09-30 22:32

Ooops! Wrong thread.

EDIT: Regarding the plots shown above, I have now concluded that the momentary dips are not associated with CuLu timing out. I have seen such when CL was running smoothly.

airsquirrels 2016-02-02 01:04

I have a system with a Titan Z, a 590, and a 690 in it.

Previously all three were running mfaktc on both GPUs without issue. I switched the Z GPUs over to LL and that has been very successful, however the 590 and 690 both return all 0x000000000000 interim residues and 0.0 error rates. No errors that I can see.

Any idea what is causing this?

Brain 2016-02-02 20:45

When testing my first Titan, I couldn't run CL without getting wrong residues. Another user found out that downclocking the memory solves the problem. I've been running successfully for several years now at 2600 MHz instead of 3000 MHz. Give it a try.

airsquirrels 2016-02-02 21:05

Unfortunately my issue isn't incorrect residues, but rather no residues at all. It is as if something fails when moving the initial data to the card.

bgbeuning 2016-02-03 01:08

[QUOTE=airsquirrels;425008]Unfortunately my issue isn't with incorrect residues, rather no residues at all. It is as if something is failing moving the initial data to the card.[/QUOTE]

Me too. Every iteration says residue = 0.
This is my first time running CUDALucas, so I did not know that was wrong.

I compiled CUDALucas myself to get it to work, so I could have easily done something wrong. Maybe it could check for all residues being 0 and quit, saying something is broken.
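bgbeuning's suggestion is easy to express; here is a sketch of such a check (illustrative only, not actual CUDALucas code): a run of interim residues that are all zero is almost certainly broken.

```python
def all_zero_residues(residues):
    """True when every interim residue in the list is zero. A healthy LL
    test produces varying 64-bit hex residues, so an all-zero run means
    the computation (or the host-to-GPU transfer) is broken."""
    return bool(residues) and all(int(r, 16) == 0 for r in residues)

print(all_zero_residues(["0x0000000000000000"] * 3))                # True
print(all_zero_residues(["0x2f5728b5a730a4f2", "0x0ba36acaa24b9bed"]))  # False
```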

airsquirrels 2016-02-03 01:10

The residue should definitely be a different non-zero hex string each iteration.

What's interesting is this same binary works fine with a different GPU in the system.

I compiled against Cuda 6.5

LaurV 2016-02-03 01:49

[QUOTE=airsquirrels;424924]I have a system with a Titan Z, a 590, and a 690 in it.

Previously all three were running mfaktc on both GPUs without issue. I switched the Z GPUs over to LL and that has been very successful, however the 590 and 690 both return all 0x000000000000 interim residues and 0.0 error rates. No errors that I can see.

Any idea what is causing this?[/QUOTE]
Nothing is wrong, you just discovered a series of mersenne superprimes*.

----------------
* if a prime is the one that makes the last residue zero, then a superprime is the one which makes [U]all[/U] residues zero...

Madpoo 2016-02-03 02:31

[QUOTE=bgbeuning;425020]Me too. Every iteration says residue = 0.
This is my first time running CUDALucas so I did not know that was wrong.

I compiled CUDALucas to get it to work, so I could have easily done something
wrong. Maybe it could check for all residue 0 and quit saying something is broke.[/QUOTE]

Are you the user who tried to manually submit two different "is prime!" results today?

I knew they weren't right since the time between assignment and result was mere hours and there's no way it could have run a test in that time.

At least it gave us a chance to test out the email feature when someone tries to submit a new prime using the manual forms.

It happened 3 times (one LL test was submitted twice, and a DC "is prime" was submitted once).

I know they're not right but I'm running my own tests just in the one in a billion million gazillion chance they happened to accidentally and coincidentally be prime, but I'm sure they won't be.

They were done with:
CUDALucas v2.05.1

If anyone can think of reasons why CUDALucas would report a prime result after only running for a little while, let me know (in one case it was only a couple of hours after the exponent, a 37M double-check, was assigned).

Meanwhile, if you're doing a test and it magically reports that it's prime after an improbably short period of time, don't try to submit it to the server. Fix the software issue, run a real test, and then we'll talk. LOL

msft 2016-02-03 06:45

I can read code.
What can I do? :smile:

airsquirrels 2016-02-03 07:17

[QUOTE=msft;425049]I can read code.
What can i do ?:smile:[/QUOTE]

Any way to turn on additional debugging/logging? Or a debug build?

