mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

ECPilot 2018-09-03 20:39

tdulcet, the install script works beautifully for CUDALucas on Ubuntu laptops. Thank you for this.

Lorenzo 2018-09-27 11:15

Hello! Is this result accurate: [URL="http://www.mersenne.ca/cudalucas.php?model=745"]http://www.mersenne.ca/cudalucas.php?model=745[/URL]? To me it looks doubtful, because it puts this card's performance near the Titan V, which has a 1/2 DP rate (unlike the 2080 Ti, which has only 1/32).

Can someone confirm this?

tServo 2018-09-27 14:24

[QUOTE=Lorenzo;496894]Hello! Is this result accurate: [URL="http://www.mersenne.ca/cudalucas.php?model=745"]http://www.mersenne.ca/cudalucas.php?model=745[/URL]? To me it looks doubtful, because it puts this card's performance near the Titan V, which has a 1/2 DP rate (unlike the 2080 Ti, which has only 1/32).

Can someone confirm this?[/QUOTE]

It doesn't make sense to me either. The best predictor of performance should be the Gflops(DP) column. Thus, the entry for the GTX 1080 Ti looks suspect as well.

xx005fs 2018-09-27 14:24

[QUOTE=Lorenzo;496894]Hello! Is this result accurate: [URL="http://www.mersenne.ca/cudalucas.php?model=745"]http://www.mersenne.ca/cudalucas.php?model=745[/URL]? To me it looks doubtful, because it puts this card's performance near the Titan V, which has a 1/2 DP rate (unlike the 2080 Ti, which has only 1/32).

Can someone confirm this?[/QUOTE]

I noticed that too, and I definitely think this is a placeholder: with such a large DP deficit compared to the Titan V, there is no way the 2080 Ti is within 20% of it. We will have to see.

xx005fs 2018-09-27 14:27

[QUOTE=tServo;496911]It doesn't make sense to me either. The best predictor of performance should be the Gflops(DP) column. Thus, the entry for the GTX 1080 Ti looks suspect as well.[/QUOTE]

The AMD cards on the site seem really slow. Is that because they are running clLucas rather than gpuowl? I think the AMD entries should show the gpuowl speed, which would reflect real-world performance better because gpuowl is significantly faster. For example, for 85M exponents the site says the Vega 64 Liquid gets 3.5 ms/it, whereas my Vega 56 undervolted to 1480/1080 runs at 2.05 ms/it on gpuowl. That is about 41% less time per iteration, or roughly 1.7x the throughput.

TheJudger 2018-10-05 22:25

CUDA 10.0.130, CUDA driver 410.57, CUDALucas 2.05.1 (SVN rev. 99)

Benchmark FFT sizes './CUDALucas -cufftbench 2048 32768 20'
[CODE]Device              GeForce RTX 2080 Ti
Compatibility       7.5
clockRate (MHz)     1635
memClockRate (MHz)  7000

  fft    max exp  ms/iter
 2048   38492887   1.0627
 2304   43194913   1.2857
 2592   48471289   1.5457
 2700   50446621   1.7518
 2744   51250889   1.8380
 2880   53735041   1.8585
 3200   59570449   1.8635
 3456   64229677   1.9027
 4096   75846319   2.0373
 4608   85111207   2.6288
 5184   95507747   3.0856
 5400   99399967   3.5222
 5760  105879517   3.5851
 5832  107174381   3.6607
 6400  117377567   3.8294
 6912  126558077   3.9194
 7168  131142761   4.5456
 8192  149447533   4.8361
 8748  159365399   5.5607
 9216  167703023   5.5871
10368  188188471   5.8853
10584  192023851   7.2167
11520  208624903   7.3206
11664  211176269   7.6460
12544  226753511   7.8723
12800  231280639   7.9574
13824  249369863   8.1618
16384  294471259   9.3144
17496  314013451  11.1805
18432  330441847  11.7295
20736  370806323  12.0643
21952  392070229  14.8799
22400  399897793  15.1170
23040  411074273  15.1771
24192  431175197  15.5346
25088  446794913  16.3743
26244  466929581  17.0977
27648  491358173  17.8123
32768  580225813  18.1072[/CODE]

And benchmark for [URL="https://www.mersenne.ca/cudalucas.php"]mersenne.ca[/URL]: './CUDALucas 57885161'
[CODE]
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Oct 05 23:53:14 | M57885161 10000 0x76c27556683cd84d | 3200K 0.10156 1.8599 18.59s | 1:05:54:02 0.01% |
| Oct 05 23:53:33 | M57885161 20000 0xfd8e311d20ffe6ab | 3200K 0.10156 1.8648 18.64s | 1:05:56:06 0.03% |
| Oct 05 23:53:52 | M57885161 30000 0xce0d85ab0065a232 | 3200K 0.10156 1.8695 18.69s | 1:05:58:05 0.05% |
[...]
| Oct 05 23:57:39 | M57885161 150000 0x8e9733fee4029132 | 3200K 0.09375 1.8939 18.93s | 1:06:17:10 0.25% |
| Oct 05 23:57:58 | M57885161 160000 0x0b5dadf12ed96a4d | 3200K 0.10156 1.8932 18.93s | 1:06:17:09 0.27% |
| Oct 05 23:58:17 | M57885161 170000 0x69754eac9cc190a5 | 3200K 0.10938 1.8932 18.93s | 1:06:17:05 0.29% |
[/CODE]
~220 W and 1860-1875 MHz on average once the GPU has warmed up.

And 100M digits (manually set FFT size): './CUDALucas -f 20736K 332192879'
[CODE]
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Oct 06 00:16:45 | M332192879 10000 0xa19043095e213f4c | 20736K 0.01953 12.1000 121.00s | 46:12:30:39 0.00% |
| Oct 06 00:18:47 | M332192879 20000 0xcb7bc66ac81b24be | 20736K 0.01758 12.1699 121.69s | 46:15:42:00 0.00% |
| Oct 06 00:20:48 | M332192879 30000 0x38e4cc517de8fda3 | 20736K 0.01758 12.1660 121.66s | 46:16:37:11 0.00% |
[/CODE]

Oliver

xx005fs 2018-10-06 00:47

Wrong Result for Volta Architecture?
 
Hi Guys. I noticed that I am consistently getting 0x0000000000000000 for the residue on every iteration output on Nvidia Volta hardware. The error output for CUDALucas also says 0.00. This is reproducible on both a Tesla V100 instance and a Titan V GPU. Is there a problem with the settings I use, or do I have to do something else to fix it?

James Heinrich 2018-10-06 01:16

[QUOTE=TheJudger;497450]And benchmark for [URL="https://www.mersenne.ca/cudalucas.php"]mersenne.ca[/URL]: './CUDALucas 57885161'[/QUOTE]Thanks, benchmark page is updated now with the single result. Does this look about right, with 2080 [i]significantly[/i] slower than V100?
[url]https://www.mersenne.ca/cudalucas.php?filter=V100|2080[/url]

Lorenzo 2018-10-06 08:48

Hello, Oliver!

Thank you very much for the benchmark!!!

Ehhh. Performance is lower than the GTX 1080 Ti. A really bad choice today (for LL).

kriesel 2018-10-06 12:54

[QUOTE=xx005fs;497460]Hi Guys. I noticed that I am consistently getting 0x0000000000000000 for the residue on every iteration output on Nvidia Volta hardware. The error output for CUDALucas also says 0.00. This is reproducible on both a Tesla V100 instance and a Titan V GPU. Is there a problem with the settings I use, or do I have to do something else to fix it?[/QUOTE]"Is there a problem with the settings I use, or do I have to do something else to fix it?"
It's hard to say, without knowing the CUDALucas version used, CUDA level used, exponent(s), fft length(s), or any settings you use when you see this behavior, whether you get any correct results on Volta, etc. Please provide some specifics of when you see this. Also when you don't. Also whether replication on other models is by continuation or restart from scratch or whatever.

Yes, 0x0 for any printed residue before the last (or the iteration before that when exponent p>127) is a problem.
You could look through [URL]https://www.mersenneforum.org/showpost.php?p=488524&postcount=3[/URL] for 0x0 cases.
It could be a bug for which there's a workaround patch available, a known bug with no known fix, a previously undocumented (newly found) bug, or a setting issue.

TheJudger 2018-10-06 17:46

[QUOTE=xx005fs;497460]Hi Guys. I noticed that I am consistently getting 0x0000000000000000 for the residue on every iteration output on Nvidia Volta hardware. The error output for CUDALucas also says 0.00. This is reproducible on both a Tesla V100 instance and a Titan V GPU. Is there a problem with the settings I use, or do I have to do something else to fix it?[/QUOTE]

Had the same issue yesterday when running benchmarks on the RTX 2080 Ti. In my case it was a combination of user error and lack of error checking. I had compiled CUDALucas for a quick & dirty benchmark only for sm75 (Turing). [U]Then I decided to check performance of Volta with CUDA 10.0, too.[/U] The binary ran without any warnings/error messages, and the benchmarks showed an improvement of nearly 50% over my previous benchmark. But during the first 30000 iterations of M57885161 for James I noticed those 0x0000000000000000 residues and knew something was wrong. Recompiling CUDALucas for sm70 solved this issue, and performance was back on the same level as with CUDA 9.1/9.2.

I know it is not perfect but in mfaktc I do[CODE]
/* ask the runtime for the most recent error from a preceding CUDA call */
cudaError = cudaGetLastError();
if(cudaError != cudaSuccess)
  printf("ERROR: cudaGetLastError() returned %d: %s\n", cudaError, cudaGetErrorString(cudaError));[/CODE]

every now and then. From the host those CUDA calls are asynchronous, so there is no return value and you have to ask for errors [I]later[/I]. A check like this would catch those types of errors easily; they are the main (but not only) source of the famous[CODE]
[B]ERROR: cudaGetLastError() returned 8: invalid device function[/B][/CODE] in mfaktc, for example [URL="https://mersenneforum.org/showpost.php?p=496755&postcount=2883"]here[/URL] when running old binaries on an RTX 2080.
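
For anyone who wants to try the pattern outside of mfaktc, here is a minimal standalone sketch. It is not taken from mfaktc or CUDALucas: the check() helper and the fill() kernel are made-up names for illustration, and only the CUDA runtime calls themselves (cudaGetLastError, cudaDeviceSynchronize, cudaGetErrorString, cudaMalloc, cudaMemcpy, cudaFree) are real API. The idea is to query the error state once right after the launch and once more after synchronizing, since a launch returns to the host before the kernel has run.[CODE]
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print the CUDA error in the style shown above and abort.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "ERROR: %s returned %d: %s\n",
                what, (int)err, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Trivial stand-in kernel: writes v into every element of x.
__global__ void fill(double *x, int n, double v)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] = v;
}

int main(void)
{
    const int n = 1 << 20;
    double *d_x = NULL;
    check(cudaMalloc((void **)&d_x, n * sizeof(double)), "cudaMalloc");

    // A kernel launch has no return value, so errors must be queried afterwards.
    fill<<<(n + 255) / 256, 256>>>(d_x, n, 1.0);

    // Reports launch failures, e.g. a binary with no code for this GPU.
    check(cudaGetLastError(), "kernel launch");

    // Waits for the kernel and reports errors raised while it was running.
    check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

    // Copy one element back so a silently skipped kernel would be visible.
    double first = 0.0;
    check(cudaMemcpy(&first, d_x, sizeof(double), cudaMemcpyDeviceToHost), "cudaMemcpy");
    printf("x[0] = %f\n", first);

    check(cudaFree(d_x), "cudaFree");
    return 0;
}[/CODE]
If this is built for one architecture only (for example with nvcc -arch=sm_75) and then run on a Volta card, I would expect the check after the launch to report an error such as "invalid device function" instead of the program carrying on with garbage. Building for both targets (-gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75) should avoid the problem in the first place. Synchronizing after every launch costs performance, so in a real LL loop a check every N iterations is probably the better trade-off.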

Oliver


All times are UTC.