mersenneforum.org  

Old 2012-04-07, 21:03   #1189
msft
 
 
Jul 2009
Tokyo

2·5·61 Posts
Default

Quote:
Originally Posted by Brain View Post
@msft/others: What does this error message exactly mean? Formerly, I got "allocation errors" so I'm a bit surprised...
Code:
$ ./CUDALucas -threads 512 332220523
DEVICE:0------------------------
name                GeForce GTX 550 Ti
totalGlobalMem      1072889856
...
start M332220523 fft length = 18874368
err = 0.35937, increasing n from 18874368

start M332220523 fft length = 18874368
err = 0.35937, increasing n from 18874368

start M332220523 fft length = 20971520
Iteration 10000 M( 332220523 )C, 0x1a313d709bfa6663, n = 20971520, CUDALucas v1.66 err = 0.03358 (22:30 real, 134.9292 ms/iter, ETA 12451:20:29)
Iteration 20000 M( 332220523 )C, 0x73dc7a5c8b839081, n = 20971520, CUDALucas v1.66 err = 0.03358 (22:26 real, 134.5456 ms/iter, ETA 12415:34:17)
Old 2012-04-07, 21:09   #1190
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts
Default

Quote:
Originally Posted by msft View Post
Code:
$ ./CUDALucas -threads 512 332220523
DEVICE:0------------------------
name                GeForce GTX 550 Ti
totalGlobalMem      1072889856
...
start M332220523 fft length = 18874368
err = 0.35937, increasing n from 18874368

start M332220523 fft length = 18874368
err = 0.35937, increasing n from 18874368

start M332220523 fft length = 20971520
Iteration 10000 M( 332220523 )C, 0x1a313d709bfa6663, n = 20971520, CUDALucas v1.66 err = 0.03358 (22:30 real, 134.9292 ms/iter, ETA 12451:20:29)
Iteration 20000 M( 332220523 )C, 0x73dc7a5c8b839081, n = 20971520, CUDALucas v1.66 err = 0.03358 (22:26 real, 134.5456 ms/iter, ETA 12415:34:17)
Why is that 1.66 instead of 2.00?
Old 2012-04-07, 21:13   #1191
James Heinrich
 
 
"James Heinrich"
May 2004
ex-Northern Ontario

3,407 Posts
Default

Quote:
Originally Posted by msft View Post
CUDALucas -threads 512 332220523
That also works fine here on v2.00 Windows. VRAM usage is 893MB (minus the 126MB idle at desktop = 767MB used).

What is the default value of "-threads"? Is 512 larger or smaller than default?
Old 2012-04-07, 21:21   #1192
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts
Default

Quote:
Originally Posted by msft View Post
multiples 32768(threads=256)
multiples 65536(threads=512)
multiples 131072(threads=1024)
Quote:
Originally Posted by aaronhaviland View Post
Absolutely (from the CUFFT documentation):

I've been testing CUFFT timings for other lengths than just multiples of 32768. I've excluded the timings because they're not run exactly as CUDALucas would run them, but the fact that they are "optimal lengths" should still apply.

Eff% is calculated similarly to the prior examples here, but scaled so that the results are all within the range 0-100. Very few lengths have Eff% between 15% and 75%; the majority of inefficient lengths ran around 9-10%. These have all been excluded. Some of the 70-80% efficient run-lengths have also been excluded because they are smaller than a larger+faster length. Note the exponents in blue which would be skipped over if only looking at multiples of 32768:

Code:
FFT           Exponent
Size    Eff%   2 3 5 7
======================
1048576 97.23 20 0 0 0
1105920 88.82 13 3 1 0
1179648 91.20 17 2 0 0
1204224 82.49 13 1 0 2
1310720 89.06 18 0 1 0
1327104 90.86 14 4 0 0
1376256 85.13 16 1 0 1
1474560 89.14 15 2 1 0
1548288 89.05 13 3 0 1
1572864 89.23 19 1 0 0
1605632 88.84 15 0 0 2
1769472 92.58 16 3 0 0
1835008 89.17 18 0 0 1
2097152 95.87 21 0 0 0
2211840 87.81 14 3 1 0
2359296 89.84 18 2 0 0
2370816 80.62 8  3 0 3
2408448 81.08 14 1 0 2
2621440 87.60 19 0 1 0
2654208 85.52 15 4 0 0
2709504 82.21 11 3 0 2
2809856 82.38 13 0 0 3
2985984 87.28 12 6 0 0
3096576 85.87 14 3 0 1
3145728 85.74 20 1 0 0
3211264 82.12 16 0 0 2
3317760 82.69 13 4 1 0
3359232 74.71 9  8 0 0
3386880 71.31 9  3 1 2
3932160 80.93 18 1 1 0
4014080 80.65 14 0 1 2
4096000 73.66 15 0 3 0
4194304 95.87 22 0 0 0
4423680 87.81 15 3 1 0
4718592 89.84 19 2 0 0
4741632 80.62 9  3 0 3
4816896 81.08 15 1 0 2
5242880 87.60 20 0 1 0
5308416 85.52 16 4 0 0
5419008 82.21 12 3 0 2
5619712 82.38 14 0 0 3
5971968 87.28 13 6 0 0
6193152 85.87 15 3 0 1
6291456 85.74 21 1 0 0
6422528 82.12 17 0 0 2
6635520 82.69 14 4 1 0
6718464 74.71 10 8 0 0
6773760 71.31 10 3 1 2
7864320 80.93 19 1 1 0
8028160 80.65 15 0 1 2
8192000 73.66 16 0 3 0
http://www.mersenneforum.org/showpos...&postcount=959
Quote:
Originally Posted by msft View Post
multiples 32768(threads=256)
multiples 65536(threads=512)
multiples 131072(threads=1024)
So with threads >= 256, you must have multiples of 32K. (Threads lower than that would significantly impact performance, I would think.)
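
The factor columns in the quoted table show directly which lengths a multiples-of-32768 restriction would skip: since 32768 = 2^15, a length is a multiple of 32768 exactly when its exponent of 2 is at least 15. A minimal Python sketch of that check follows; the rows are copied from the table, and the helper names are hypothetical, not part of CUDALucas.

```python
# A few (size, (e2, e3, e5, e7)) rows copied from the table above.
ROWS = [
    (1048576, (20, 0, 0, 0)),
    (1105920, (13, 3, 1, 0)),
    (1179648, (17, 2, 0, 0)),
    (1204224, (13, 1, 0, 2)),
    (2985984, (12, 6, 0, 0)),
    (4194304, (22, 0, 0, 0)),
]

def size_from_exponents(e2, e3, e5, e7):
    """Rebuild the FFT length from its exponents of 2, 3, 5 and 7."""
    return 2**e2 * 3**e3 * 5**e5 * 7**e7

def is_multiple_of_32k(size):
    """True when size is a multiple of 32768 (2**15)."""
    return size % 32768 == 0

# Sanity check: every row's exponents reproduce the listed size.
for size, exps in ROWS:
    assert size_from_exponents(*exps) == size

# Lengths that a "multiples of 32768 only" search would never try.
skipped = [size for size, _ in ROWS if not is_multiple_of_32k(size)]
print(skipped)
```

Of the sampled rows, only those with an exponent of 2 below 15 end up in `skipped`.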
Old 2012-04-07, 23:35   #1193
aaronhaviland
 
Jan 2011
Dudley, MA, USA

73 Posts
Default

Quote:
Originally Posted by Dubslow View Post
http://www.mersenneforum.org/showpos...&postcount=959

So with threads >= 256, you must have multiples of 32K. (Threads lower than that would significantly impact performance, I would think.)
I believe that's a recommendation, not a requirement. There *is* a requirement that the length is a multiple of threads.

e.g. I just ran a quick test, and got a valid result for:
./CUDALucas -threads 512 -f 175616 2700067 -t
M( 2700067 )C, 0x787c1272dc144ba2, n = 175616, CUDALucas v2.00

Granted, 175616 isn't one of the faster lengths, but it is (2^9)*(7^3) i.e. not a multiple of 32768
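
The arithmetic in this post can be checked with a short Python sketch: factor the length over the primes 2, 3, 5 and 7, then test divisibility by the thread count and by 32768. The function name `smooth_factor` is made up for illustration, and this says nothing about how CUDALucas itself validates `-f`.

```python
def smooth_factor(n, primes=(2, 3, 5, 7)):
    """Return {prime: exponent} if n factors completely over primes, else None."""
    exps = {}
    for p in primes:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exps[p] = e
    return exps if n == 1 else None

length, threads = 175616, 512

exps = smooth_factor(length)
print(exps)                      # exponents of 2, 3, 5, 7
print(length % threads == 0)     # is the length a multiple of the thread count?
print(length % 32768 == 0)       # is it a multiple of 32768?
```

For 175616 the factorization is 2^9 * 7^3, so the length is a multiple of 512 threads but not of 32768.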
Old 2012-04-07, 23:37   #1194
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

2143₈ Posts
Default

Quote:
Originally Posted by apsen View Post
I've been assigned triple check and got mismatch with the first two checks for 28982959.

I've run the check twice with different FFT lengths (and -t both times) and got all residues match.

Could someone run it through P95?

Thanks,
Andriy
@apsen

My P95 run is complete:
Code:
 
UID: flashjh/TF2, M28982959 is not prime. Res64: 5B3274500F7D17__. We4: 858095B2,16603096,00000000
If it matches yours, let me know so we can submit the results together. If it doesn't match, let me know what you'd like to do.
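
Because the Res64 above is posted with its last two hex digits masked as `__`, while CUDALucas prints residues in lowercase with a `0x` prefix, a comparison has to normalize both forms and ignore the masked digits. A rough Python sketch under those assumptions; the helper name is hypothetical and the "full" residue below is a placeholder whose last two digits are invented purely for illustration.

```python
def residues_match(masked, full):
    """Compare a Res64 containing '_' placeholders against a full 64-bit residue."""
    def normalize(text):
        text = text.strip().lower()
        return text[2:] if text.startswith("0x") else text

    masked, full = normalize(masked), normalize(full)
    if len(masked) != len(full):
        return False
    # A '_' matches any digit; every other position must agree exactly.
    return all(m == "_" or m == f for m, f in zip(masked, full))

p95 = "5B3274500F7D17__"        # from the Prime95 line above
mine = "0x5b3274500f7d17ab"     # HYPOTHETICAL: the trailing "ab" is made up
print(residues_match(p95, mine))
```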

Last fiddled with by flashjh on 2012-04-07 at 23:57
Old 2012-04-08, 00:40   #1195
zs6nw
 
Dec 2007

11100₂ Posts
Default

Aaron, thanks for that great insight, and would you believe, available for reading in a manual.

This immediately interested me in FFT_size = 4194304 which is 95.87% efficient, but unfortunately the program terminated with error too large.

Old 2012-04-08, 01:34   #1196
zs6nw
 
Dec 2007

2²·7 Posts
Default

So I picked FFT_size = 2985984 (2^12 * 3^6).

Does 45ms/iteration look OK for a GT 430?

Code:
>cudalucas.2.00]$ ./cul -d 0 -f 2985984 -t 49845883

DEVICE:0------------------------
name                GeForce GT 430
clockRate           1400000

start M49845883 fft length = 2985984

Iteration 10000 M( 49845883 )C, 0xbb8661cd90463e94, n = 2985984, CUDALucas v2.00 err = 0.03711 (7:38 real, 45.7782 ms/iter, ETA 633:38:49)
Iteration 20000 M( 49845883 )C, 0xf1d53981f966befa, n = 2985984, CUDALucas v2.00 err = 0.03711 (7:39 real, 45.8859 ms/iter, ETA 635:00:37)

.
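
The ETA column in the log above can be roughly reproduced from the per-iteration time. This is a Python sketch under the assumption that the test needs about as many iterations as the exponent and that the latest ms/iter figure holds for the rest of the run; how CUDALucas actually computes its ETA is not shown in this thread, and the function names are hypothetical.

```python
def eta_seconds(exponent, iterations_done, ms_per_iter):
    """Estimate remaining wall time, assuming ~exponent iterations in total."""
    remaining = exponent - iterations_done
    return remaining * ms_per_iter / 1000.0

def format_hms(seconds):
    """Format a duration as H:MM:SS, matching the style of the log."""
    seconds = int(round(seconds))
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

# Values taken from the "Iteration 10000" line above.
secs = eta_seconds(49845883, 10000, 45.7782)
print(format_hms(secs))
```

The estimate comes out at roughly 633.7 hours, within a few minutes of the logged ETA of 633:38:49.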
Old 2012-04-08, 01:56   #1197
aaronhaviland
 
Jan 2011
Dudley, MA, USA

49₁₆ Posts
Default

Quote:
Originally Posted by zs6nw View Post
Aaron, thanks for that great insight, and would you believe, available for reading in a manual.

This immediately interested me in FFT_size = 4194304 which is 95.87% efficient, but unfortunately the program terminated with error too large.
The eff% numbers are artificially scaled so that the numbers are all in the range of 0-100%. It's better to assume they represent the efficiency as compared to the theoretical maximum efficiency at that given size, although I am not sure what the actual theoretical maximum would be...

I don't know why that FFT size is "too large" as it doesn't take much memory, and seems to work just fine for me. What's the specific error message?
Old 2012-04-08, 01:58   #1198
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts
Default

Quote:
Originally Posted by aaronhaviland View Post
I don't know why that FFT size is "too large" as it doesn't take much memory, and seems to work just fine for me. What's the specific error message?
I think he just meant round-off error, not an FFT/mem error.
Old 2012-04-08, 02:47   #1199
zs6nw
 
Dec 2007

2²·7 Posts
Default

Quote:
Originally Posted by Dubslow View Post
I think he just meant round-off error, not an FFT/mem error.
Pardon my inconcise-ness... It was indeed a round-off error.

If you do get a round-off error "deep" in, is all your previous work wasted?

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.