mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

Brain 2012-01-07 13:10

Win64 SM 2.1 compile
 
1.45 Win64 SM 2.1 compile, untested.

aaronhaviland 2012-01-09 02:17

patch
 
Three things in this patch:
1) reduce the memory usage of g_maxerr from sizeof(BIG_DOUBLE)*n/STRIDE to sizeof(BIG_DOUBLE) (no speed improvement)
2) template normalize_kernel and normalize_kernel_zz on g_err_flag (very minor speed improvement)
3) manually unroll the g_err_flag == 0 case of the same normalize kernels (the biggest speed improvement, from making fuller use of registers and reusing part of the IDX calculation)

This works fine on sm_20 and sm_21; however, ***I am unable to test this on older hardware***, so it might not work as expected on sm_13.
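A rough host-side C++ analogue of items 1 and 2, offered as a sketch and not the actual CUDALucas kernels (which are CUDA and also do carry propagation and related bookkeeping). The error-tracking code is selected by a compile-time template parameter, and the maximum round-off error goes into one shared value instead of one slot per n/STRIDE block. The patch does not say how it collapses g_maxerr to a single value; the compare-and-swap loop on `std::atomic<double>` here is only one assumed way to do it, and `normalize` is a hypothetical name.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical host-side analogue of a normalize pass (not CUDALucas code).
// ErrFlag is a compile-time constant, so when it is false the compiler can
// drop the error-tracking branches instead of testing a flag at run time.
template <bool ErrFlag>
void normalize(std::vector<double>& x, std::atomic<double>& maxerr) {
    double local_max = 0.0;
    for (double& v : x) {
        const double r = std::rint(v);  // round to the nearest integer
        if (ErrFlag) {
            // round-off error of this element
            local_max = std::max(local_max, std::fabs(v - r));
        }
        v = r;  // carry propagation omitted in this sketch
    }
    if (ErrFlag) {
        // Merge into ONE shared maximum (cf. sizeof(BIG_DOUBLE) instead of an
        // n/STRIDE array): retry the compare-and-swap until our value is
        // stored or another writer has already stored something larger.
        double cur = maxerr.load();
        while (local_max > cur && !maxerr.compare_exchange_weak(cur, local_max)) {
        }
    }
}
```

`normalize<true>(x, err)` is the checking variant; `normalize<false>(x, err)` should reduce to a plain rounding loop once the constant-false branches are eliminated.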

Original timings (GTX 460):
[CODE]Iteration 10000 2.2 msec/Iter M( 216091 )C, 0x30247786758b8792, n = 524288, CUDALucas v1.45
...
M( 216091 )P, n = 524288, CUDALucas v1.45
7:51 real

Iteration 10000 11 msec/Iter M( 40000000 )C, 0x2318fe9e59886055, n = 2621440, CUDALucas v1.45[/CODE]

With patch:
[CODE]Iteration 10000 1.8 msec/Iter M( 216091 )C, 0x30247786758b8792, n = 524288, CUDALucas v1.45
...
M( 216091 )P, n = 524288, CUDALucas v1.45
6:38 real

Iteration 10000 10.7 msec/Iter M( 40000000 )C, 0x2318fe9e59886055, n = 2621440, CUDALucas v1.45[/CODE]
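To put the before/after numbers in relative terms, a small helper sketch (the function names are hypothetical) computes the percentage reduction in time and converts the m:ss wall-clock readings to seconds.

```cpp
#include <cassert>

// Percentage reduction in time: 100 * (before - after) / before.
double pct_reduction(double before, double after) {
    return 100.0 * (before - after) / before;
}

// Convert an m:ss wall-clock reading such as 7:51 to seconds.
int mmss_to_seconds(int minutes, int seconds) {
    return 60 * minutes + seconds;
}
```

With the figures above: 2.2 to 1.8 msec/iter at n = 524288 is about 18.2% less time per iteration, the full M( 216091 ) run drops from 471 s to 398 s (about 15.5%), and M( 40000000 ) at n = 2621440 improves by about 2.7% (11 to 10.7 msec/iter).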

msft 2012-01-09 05:40

Hi, aaronhaviland.
Thank you, good work.

Version 1.46:
Adds aaronhaviland's patch.
[code]
Iteration 10000 5.6 msec/Iter M( 10000000 )C, 0x55318a84ffd14bc7, n = 1048576, CUDALucas v1.46
Iteration 10000 10.5 msec/Iter M( 20000000 )C, 0xb6475f8cb0888740, n = 1572864, CUDALucas v1.46
Iteration 10000 11.4 msec/Iter M( 30000000 )C, 0xbf70feed29774eba, n = 2097152, CUDALucas v1.46
Iteration 10000 17.3 msec/Iter M( 40000000 )C, 0x2318fe9e59886055, n = 2621440, CUDALucas v1.46
Iteration 10000 21.1 msec/Iter M( 50000000 )C, 0x80dabfda58bb63db, n = 3145728, CUDALucas v1.46
Iteration 10000 24.3 msec/Iter M( 60000000 )C, 0x8db3527512f3559b, n = 3670016, CUDALucas v1.46
Iteration 10000 24.8 msec/Iter M( 70000000 )C, 0x652d4a670f44317e, n = 4194304, CUDALucas v1.46
Iteration 10000 30.9 msec/Iter M( 80000000 )C, 0xa2dfe07c9f24275d, n = 4718592, CUDALucas v1.46
Iteration 10000 35.2 msec/Iter M( 90000000 )C, 0xf0703f404c4eb47a, n = 5242880, CUDALucas v1.46
[/code]
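The table pairs each exponent p with its FFT length n. Dividing p by n gives the average number of bits of the p-bit residue that each FFT element carries, which is why n has to grow with p. This is a sketch assuming the usual layout where the residue is spread across all n elements; `Run`, `bits_per_word` and `print_bits_per_word` are hypothetical names, and the data are the nine rows posted above.

```cpp
#include <cassert>
#include <cstdio>

// One row of the table: exponent p and FFT length n.
struct Run {
    double p;
    double n;
};

// Average number of residue bits held by each FFT element: p / n.
double bits_per_word(const Run& r) { return r.p / r.n; }

// Print p / n for every row of the v1.46 timing table.
void print_bits_per_word() {
    const Run runs[] = {
        {10000000, 1048576}, {20000000, 1572864}, {30000000, 2097152},
        {40000000, 2621440}, {50000000, 3145728}, {60000000, 3670016},
        {70000000, 4194304}, {80000000, 4718592}, {90000000, 5242880},
    };
    for (const Run& r : runs) {
        std::printf("p = %.0f  n = %.0f  p/n = %.2f\n", r.p, r.n, bits_per_word(r));
    }
}
```

For example, M( 10000000 ) at n = 1048576 works out to about 9.5 bits per element, and M( 90000000 ) at n = 5242880 to about 17.2.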

flashjh 2012-01-09 14:48

[QUOTE=Brain;285244]1.45 Win64 SM 2.1 compile, untested.[/QUOTE]

Good double-check of [URL="http://www.mersenne.org/report_exponent/?exp_lo=25912171&exp_hi=10000&B1=Get+status"]M25912171[/URL] with 1.45

Brain 2012-01-09 22:26

Win64 SM 1.3 compile
 
1.46 Win64 SM 1.3 compile, untested.

Brain 2012-01-09 22:28

Win64 SM 2.1 compile
 
1.46 Win64 SM 2.1 compile, untested.

flashjh 2012-01-10 01:26

[QUOTE=Brain;285626]1.46 Win64 SM 2.1 compile, untested.[/QUOTE]

Thanks for compiling! I'm testing it right now. On an M25XXXXXX exponent, times dropped from 4 to 3.4 msec/iter.

msft 2012-01-10 09:18

Hi, flashjh.
Thank you for your report.

Hi, Brain.
Thank you for all your work.

flashjh 2012-01-11 04:19

[QUOTE=Brain;285626]1.46 Win64 SM 2.1 compile, untested.[/QUOTE]

Good double-check with 1.46

M( [URL="http://www.mersenne.org/report_exponent/?exp_lo=25922849&exp_hi=&B1=Get+status"]25922849[/URL] )C, 0xce0f7802d76d17e2, n = 1572864, CUDALucas v1.46
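A double-check is good when an independent run ends with the same 64-bit residue (the hex value) as the original test; "C" marks a composite result, while "P" (as in M( 216091 )P above) marks a prime. As a hedged illustration of where such a residue comes from, the sketch below runs the Lucas-Lehmer recurrence s ← s² − 2 mod 2^p − 1, starting from s = 4, directly for tiny exponents (p ≤ 61). CUDALucas does the same squaring with FFTs on multi-million-bit numbers. It assumes a GCC/Clang compiler for the `unsigned __int128` extension, and `mod_mersenne` and `lucas_lehmer_res64` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Reduce x modulo M = 2^p - 1 using 2^p ≡ 1 (mod M): fold the high bits
// onto the low bits until the value fits, mapping M itself to 0.
static unsigned __int128 mod_mersenne(unsigned __int128 x, unsigned p) {
    const unsigned __int128 m = (static_cast<unsigned __int128>(1) << p) - 1;
    while (x > m) x = (x & m) + (x >> p);
    return x == m ? 0 : x;
}

// Lucas-Lehmer test for a small odd prime exponent p (3 <= p <= 61, so the
// square fits in 128 bits). Returns the low 64 bits of s_{p-2} mod 2^p - 1;
// M_p is prime exactly when this residue is 0.
std::uint64_t lucas_lehmer_res64(unsigned p) {
    const unsigned __int128 m = (static_cast<unsigned __int128>(1) << p) - 1;
    unsigned __int128 s = 4;
    for (unsigned i = 0; i < p - 2; ++i) {
        s = mod_mersenne(s * s, p);
        s = (s >= 2) ? s - 2 : s + m - 2;  // subtract 2 without wrapping below 0
    }
    return static_cast<std::uint64_t>(s);
}
```

M7, M13, M31 and M61 are prime and give a zero residue, while M11 = 2047 = 23 × 89 gives a nonzero one; two runs that agree on that nonzero value are what a matching double-check looks like.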

flashjh 2012-01-11 04:33

[QUOTE=Brain;285626]1.46 Win64 SM 2.1 compile, untested.[/QUOTE]

Good double-check with 1.46

M( [URL="http://www.mersenne.org/report_exponent/?exp_lo=25922849&exp_hi=&B1=Get+status"]25922849[/URL] )C, 0xce0f7802d76d17e2, n = 1572864, CUDALucas v1.46

flashjh 2012-01-11 05:28

My computer was having problems, and the post never showed as submitted. Sorry for the double post.


All times are UTC.