mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

msft 2012-04-07 21:03

[QUOTE=Brain;295705]@msft/others: What does this error message exactly mean? Formerly, I got "allocation errors" so I'm a bit surprised...
[/QUOTE]
[code]
$ ./CUDALucas -threads 512 332220523
DEVICE:0------------------------
name GeForce GTX 550 Ti
totalGlobalMem 1072889856
...
start M332220523 fft length = 18874368
err = 0.35937, increasing n from 18874368

start M332220523 fft length = 18874368
err = 0.35937, increasing n from 18874368

start M332220523 fft length = 20971520
Iteration 10000 M( 332220523 )C, 0x1a313d709bfa6663, n = 20971520, CUDALucas v1.66 err = 0.03358 (22:30 real, 134.9292 ms/iter, ETA 12451:20:29)
Iteration 20000 M( 332220523 )C, 0x73dc7a5c8b839081, n = 20971520, CUDALucas v1.66 err = 0.03358 (22:26 real, 134.5456 ms/iter, ETA 12415:34:17)
[/code]

Dubslow 2012-04-07 21:09

[QUOTE=msft;295750][code]
$ ./CUDALucas -threads 512 332220523
DEVICE:0------------------------
name GeForce GTX 550 Ti
totalGlobalMem 1072889856
...
start M332220523 fft length = 18874368
err = 0.35937, increasing n from 18874368

start M332220523 fft length = 18874368
err = 0.35937, increasing n from 18874368

start M332220523 fft length = 20971520
Iteration 10000 M( 332220523 )C, 0x1a313d709bfa6663, n = 20971520, CUDALucas v1.66 err = 0.03358 (22:30 real, 134.9292 ms/iter, ETA 12451:20:29)
Iteration 20000 M( 332220523 )C, 0x73dc7a5c8b839081, n = 20971520, CUDALucas v1.66 err = 0.03358 (22:26 real, 134.5456 ms/iter, ETA 12415:34:17)
[/code][/QUOTE]

Why is that 1.66 instead of 2.00?

James Heinrich 2012-04-07 21:13

[QUOTE=msft;295750]CUDALucas [b]-threads 512[/b] 332220523[/QUOTE]That also works fine here on v2.00 Windows. VRAM usage is 893MB (minus the 126MB idle at desktop = 767MB used).

What is the default value of "-threads"? Is 512 larger or smaller than default?

Dubslow 2012-04-07 21:21

[QUOTE=msft;292776]multiples 32768(threads=256)
multiples 65536(threads=512)
multiples 131072(threads=1024)[/QUOTE]

[QUOTE=aaronhaviland;295742]Absolutely (from the CUFFT documentation):

I've been testing CUFFT timings for other lengths than just multiples of 32768. I've excluded the timings because they're not run exactly as CUDALucas would run them, but the fact that they are "optimal lengths" should still apply.

Eff% is calculated similarly to the prior examples here, but scaled so that the results are all within the range 0-100. Very few lengths have Eff% between 15% and 75%; the majority of inefficient lengths ran around 9-10%. These have all been excluded. Some of the 70-80% efficient run-lengths have also been excluded because they are smaller than a larger+faster length. [COLOR=Blue]Note the exponents in blue which would be skipped over if only looking at multiples of 32768[/COLOR]:

[CODE]
FFT Exponent
Size Eff% 2 3 5 7
======================
1048576 97.23 20 0 0 0
[COLOR=Blue]1105920 88.82 13 3 1 0[/COLOR]
1179648 91.20 17 2 0 0[COLOR=Blue]
1204224 82.49 13 1 0 2[/COLOR]
1310720 89.06 18 0 1 0[COLOR=Blue]
1327104 90.86 14 4 0 0[/COLOR]
1376256 85.13 16 1 0 1
1474560 89.14 15 2 1 0[COLOR=Blue]
1548288 89.05 13 3 0 1[/COLOR]
1572864 89.23 19 1 0 0
1605632 88.84 15 0 0 2
1769472 92.58 16 3 0 0
1835008 89.17 18 0 0 1
2097152 95.87 21 0 0 0[COLOR=Blue]
2211840 87.81 14 3 1 0[/COLOR]
2359296 89.84 18 2 0 0[COLOR=Blue]
2370816 80.62 8 3 0 3[/COLOR][COLOR=Blue]
2408448 81.08 14 1 0 2[/COLOR]
2621440 87.60 19 0 1 0
2654208 85.52 15 4 0 0
[COLOR=Blue]2709504 82.21 11 3 0 2
2809856 82.38 13 0 0 3
2985984 87.28 12 6 0 0
3096576 85.87 14 3 0 1
[/COLOR]3145728 85.74 20 1 0 0
3211264 82.12 16 0 0 2
[COLOR=Blue]3317760 82.69 13 4 1 0
3359232 74.71 9 8 0 0
3386880 71.31 9 3 1 2
[/COLOR]3932160 80.93 18 1 1 0
[COLOR=Blue]4014080 80.65 14 0 1 2
[/COLOR]4096000 73.66 15 0 3 0
4194304 95.87 22 0 0 0
4423680 87.81 15 3 1 0
4718592 89.84 19 2 0 0
[COLOR=Blue]4741632 80.62 9 3 0 3
[/COLOR]4816896 81.08 15 1 0 2
5242880 87.60 20 0 1 0
5308416 85.52 16 4 0 0
[COLOR=Blue]5419008 82.21 12 3 0 2
5619712 82.38 14 0 0 3
5971968 87.28 13 6 0 0
[/COLOR]6193152 85.87 15 3 0 1
6291456 85.74 21 1 0 0
6422528 82.12 17 0 0 2
[COLOR=Blue]6635520 82.69 14 4 1 0
6718464 74.71 10 8 0 0
6773760 71.31 10 3 1 2
[/COLOR]7864320 80.93 19 1 1 0
8028160 80.65 15 0 1 2
8192000 73.66 16 0 3 0[/CODE][/QUOTE]
[url]http://www.mersenneforum.org/showpost.php?p=292776&postcount=959[/url]
So with threads >= 256, you must have multiples of 32K. (Threads lower than that would significantly impact performance, I would think.)

aaronhaviland 2012-04-07 23:35

[QUOTE=Dubslow;295754][URL]http://www.mersenneforum.org/showpost.php?p=292776&postcount=959[/URL]

So with threads >= 256, you must have multiples of 32K. (Threads lower than that would significantly impact performance, I would think.)[/QUOTE]
I believe that's a recommendation, not a requirement. There *is* a requirement that the length is a multiple of threads.

e.g. I just ran a quick test, and got a valid result for:
./CUDALucas -threads 512 -f 175616 2700067 -t
M( 2700067 )C, 0x787c1272dc144ba2, n = 175616, CUDALucas v2.00

Granted, 175616 isn't one of the faster lengths, but it is (2^9)*(7^3), i.e. a multiple of 512 threads but not a multiple of 32768.
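
For anyone who wants to check other lengths the same way, here is a minimal Python sketch. It only does the arithmetic (factoring a length over 2, 3, 5, 7 and testing divisibility); the function names are made up for illustration and it is not how CUDALucas itself validates `-f`.

```python
def smooth_exponents(n, primes=(2, 3, 5, 7)):
    """Return {prime: exponent} if n factors completely over `primes`, else None."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    exps = {}
    for p in primes:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exps[p] = e
    # Anything left over means n has a prime factor outside `primes`.
    return exps if n == 1 else None


def check_length(fft_len, threads):
    """Summarise how an FFT length relates to the thread count and to 32768."""
    return {
        "exponents": smooth_exponents(fft_len),
        "multiple_of_threads": fft_len % threads == 0,
        "multiple_of_32768": fft_len % 32768 == 0,
    }


if __name__ == "__main__":
    # The length and thread count from the test run above.
    print(check_length(175616, 512))
```

For 175616 with 512 threads this reports exponents 9 and 3 for the primes 2 and 7, a multiple of the thread count, and not a multiple of 32768.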

flashjh 2012-04-07 23:37

[QUOTE=apsen;295165]I've been assigned triple check and got mismatch with the first two checks for 28982959.

I've run the check twice with different FFT lengths (and -t both times) and got all residues match.

Could someone run it through P95?

Thanks,
Andriy[/QUOTE]

@apsen

My P95 run is complete:
[CODE]
UID: flashjh/TF2, M28982959 is not prime. Res64: 5B3274500F7D17__. We4: 858095B2,16603096,00000000
[/CODE]

If it matches yours, let me know so we can submit the results together. If it doesn't match, let me know what you'd like to do.

zs6nw 2012-04-08 00:40

Aaron, thanks for that great insight, and, would you believe, it is available for reading in a manual. :smile:

This immediately interested me in FFT_size = 4194304, which is 95.87% efficient, but unfortunately the program terminated because the error was too large.

[QUOTE=aaronhaviland;295742]Absolutely (from the CUFFT documentation):

I've been testing CUFFT timings for other lengths than just multiples of 32768. I've excluded the timings because they're not run exactly as CUDALucas would run them, but the fact that they are "optimal lengths" should still apply.

Eff% is calculated similarly to the prior examples here, but scaled so that the results are all within the range 0-100. Very few lengths have Eff% between 15% and 75%; the majority of inefficient lengths ran around 9-10%. These have all been excluded. Some of the 70-80% efficient run-lengths have also been excluded because they are smaller than a larger+faster length. [COLOR=Blue]Note the exponents in blue which would be skipped over if only looking at multiples of 32768[/COLOR]:

[CODE]
FFT Exponent
Size Eff% 2 3 5 7
======================
1048576 97.23 20 0 0 0
[COLOR=Blue]1105920 88.82 13 3 1 0[/COLOR]
1179648 91.20 17 2 0 0[COLOR=Blue]
1204224 82.49 13 1 0 2[/COLOR]
1310720 89.06 18 0 1 0[COLOR=Blue]
1327104 90.86 14 4 0 0[/COLOR]
1376256 85.13 16 1 0 1
1474560 89.14 15 2 1 0[COLOR=Blue]
1548288 89.05 13 3 0 1[/COLOR]
1572864 89.23 19 1 0 0
1605632 88.84 15 0 0 2
1769472 92.58 16 3 0 0
1835008 89.17 18 0 0 1
2097152 95.87 21 0 0 0[COLOR=Blue]
2211840 87.81 14 3 1 0[/COLOR]
2359296 89.84 18 2 0 0[COLOR=Blue]
2370816 80.62 8 3 0 3[/COLOR][COLOR=Blue]
2408448 81.08 14 1 0 2[/COLOR]
2621440 87.60 19 0 1 0
2654208 85.52 15 4 0 0
[COLOR=Blue]2709504 82.21 11 3 0 2
2809856 82.38 13 0 0 3
2985984 87.28 12 6 0 0
3096576 85.87 14 3 0 1
[/COLOR]3145728 85.74 20 1 0 0
3211264 82.12 16 0 0 2
[COLOR=Blue]3317760 82.69 13 4 1 0
3359232 74.71 9 8 0 0
3386880 71.31 9 3 1 2
[/COLOR]3932160 80.93 18 1 1 0
[COLOR=Blue]4014080 80.65 14 0 1 2
[/COLOR]4096000 73.66 15 0 3 0
4194304 95.87 22 0 0 0
4423680 87.81 15 3 1 0
4718592 89.84 19 2 0 0
[COLOR=Blue]4741632 80.62 9 3 0 3
[/COLOR]4816896 81.08 15 1 0 2
5242880 87.60 20 0 1 0
5308416 85.52 16 4 0 0
[COLOR=Blue]5419008 82.21 12 3 0 2
5619712 82.38 14 0 0 3
5971968 87.28 13 6 0 0
[/COLOR]6193152 85.87 15 3 0 1
6291456 85.74 21 1 0 0
6422528 82.12 17 0 0 2
[COLOR=Blue]6635520 82.69 14 4 1 0
6718464 74.71 10 8 0 0
6773760 71.31 10 3 1 2
[/COLOR]7864320 80.93 19 1 1 0
8028160 80.65 15 0 1 2
8192000 73.66 16 0 3 0[/CODE][/QUOTE]
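
Every size in the quoted table has the form 2^a * 3^b * 5^c * 7^d. A rough Python sketch for listing all such lengths in a range, and flagging the ones a "multiples of 32768 only" search would skip, could look like the following. It says nothing about Eff%, which still has to be measured on the card, and the function name is only illustrative.

```python
def smooth_lengths(lo, hi, primes=(2, 3, 5, 7)):
    """Return sorted integers in [lo, hi] whose only prime factors are in `primes`."""
    found = {1}
    frontier = [1]
    # Grow the set by multiplying known smooth numbers by each prime,
    # stopping as soon as a product would exceed `hi`.
    while frontier:
        n = frontier.pop()
        for p in primes:
            m = n * p
            if m <= hi and m not in found:
                found.add(m)
                frontier.append(m)
    return sorted(n for n in found if n >= lo)


if __name__ == "__main__":
    lengths = smooth_lengths(1_048_576, 1_400_000)
    for n in lengths:
        note = "" if n % 32768 == 0 else "  (not a multiple of 32768)"
        print(f"{n}{note}")
```

The output includes many lengths that are absent from the table, because the table keeps only the ones that timed well.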

zs6nw 2012-04-08 01:34

So I picked FFT_size = 2985984 (2^12 * 3^6).

Does 45ms/iteration look OK for a GT 430?

[code]>cudalucas.2.00]$ ./cul -d 0 -f 2985984 -t 49845883

DEVICE:0------------------------
name GeForce GT 430
clockRate 1400000

start M49845883 fft length = 2985984

Iteration 10000 M( 49845883 )C, 0xbb8661cd90463e94, n = 2985984, CUDALucas v2.00 err = 0.03711 (7:38 real, 45.7782 ms/iter, ETA 633:38:49)
Iteration 20000 M( 49845883 )C, 0xf1d53981f966befa, n = 2985984, CUDALucas v2.00 err = 0.03711 (7:39 real, 45.8859 ms/iter, ETA 635:00:37)

.[/code]
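
As a sanity check on the ETA column, here is a back-of-the-envelope Python sketch. It assumes the remaining work is (p - 2) Lucas-Lehmer iterations minus the iterations already done, at a constant ms/iter; how CUDALucas computes its own ETA is not shown in the log, so treat this as an approximation only.

```python
def eta_seconds(exponent, iteration, ms_per_iter):
    """Estimated seconds left, assuming p - 2 total iterations at a constant rate."""
    remaining = (exponent - 2) - iteration
    return remaining * ms_per_iter / 1000.0


def format_hms(seconds):
    """Format a duration in seconds as H:MM:SS."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Values taken from the "Iteration 10000" line in the log above.
    print(format_hms(eta_seconds(49845883, 10000, 45.7782)))
```

With the numbers from the first log line this comes out at a little over 633 hours, in line with the ETA the program printed.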

aaronhaviland 2012-04-08 01:56

[QUOTE=zs6nw;295768]Aaron, thanks for that great insight, and, would you believe, it is available for reading in a manual. :smile:

This immediately interested me in FFT_size = 4194304, which is 95.87% efficient, but unfortunately the program terminated because the error was too large.[/QUOTE]
The eff% numbers are artificially scaled so that the numbers are all in the range of 0-100%. It's better to assume they represent the efficiency as compared to the theoretical maximum efficiency at that given size, although I am not sure what the actual theoretical maximum would be...

I don't know why that FFT size is "too large" as it doesn't take much memory, and seems to work just fine for me. What's the specific error message?

Dubslow 2012-04-08 01:58

[QUOTE=aaronhaviland;295773]
I don't know why that FFT size is "too large" as it doesn't take much memory, and seems to work just fine for me. What's the specific error message?[/QUOTE]I think he just meant round-off error, not an FFT/mem error.

zs6nw 2012-04-08 02:47

[QUOTE=Dubslow;295774]I think he just meant round-off error, not an FFT/mem error.[/QUOTE]

Pardon my imprecision... It was indeed a round-off error.
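
To spell out what I mean by round-off error: after the floating-point FFT multiply, every value should be very close to an integer, and "err" is how far the worst one strays. The Python sketch below illustrates that idea and the "increasing n" step seen in msft's log (0.35937 rejected at 18874368, 0.03358 accepted at 20971520). The 0.35 limit and the function names are my own assumptions for illustration, not values taken from the CUDALucas source.

```python
ERR_LIMIT = 0.35  # hypothetical threshold, chosen only for this illustration


def roundoff_error(values):
    """Largest distance from any value to its nearest integer (never more than 0.5)."""
    return max(abs(v - round(v)) for v in values)


def next_length(current, candidates):
    """Smallest candidate strictly larger than `current`, or None if there is none."""
    larger = [n for n in sorted(candidates) if n > current]
    return larger[0] if larger else None


if __name__ == "__main__":
    sample = [3.02, 7.96, 11.0]  # made-up post-FFT values
    print(roundoff_error(sample))

    n = 18874368
    if 0.35937 > ERR_LIMIT:  # error reported in the log at this length
        n = next_length(n, [18874368, 20971520])
    print(n)
```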

If you do get a round-off error "deep" in, is all your previous work wasted?

