Win64 SM 21 compile
1 Attachment(s)
1.45 Win64 SM 2.1 compile, untested.
|
patch
1 Attachment(s)
Three things in this patch:
1) reduce memory usage of g_maxerr (no speed improvement) from sizeof(BIG_DOUBLE)*n/STRIDE to sizeof(BIG_DOUBLE)
2) template normalize_kernel and normalize_kernel_zz with g_err_flag (very minor speed improvement)
3) manually unroll the g_err_flag=0 condition of the same normalize kernels (biggest speed improvement, due to using registers more fully and re-using part of the IDX calculation)

This works fine on sm_20 and sm_21; however, ***I am unable to test this on older hardware***, so it might not work as expected on sm_13.

Original timings (GTX460):
[CODE]
Iteration 10000 2.2 msec/Iter M( 216091 )C, 0x30247786758b8792, n = 524288, CUDALucas v1.45
...
M( 216091 )P, n = 524288, CUDALucas v1.45
7:51 real
Iteration 10000 11 msec/Iter M( 40000000 )C, 0x2318fe9e59886055, n = 2621440, CUDALucas v1.45
[/CODE]
With patch:
[CODE]
Iteration 10000 1.8 msec/Iter M( 216091 )C, 0x30247786758b8792, n = 524288, CUDALucas v1.45
...
M( 216091 )P, n = 524288, CUDALucas v1.45
6:38 real
Iteration 10000 10.7 msec/Iter M( 40000000 )C, 0x2318fe9e59886055, n = 2621440, CUDALucas v1.45
[/CODE]
|
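For readers unfamiliar with the idea behind items 1) and 2) of the patch above, here is a minimal host-side C++ sketch of templating a routine on an error-check flag and tracking the maximum error in a single value. It is not the CUDALucas kernel: `normalize_sketch`, `normalize_dispatch`, and the rounding step are hypothetical stand-ins chosen for illustration, and the real `normalize_kernel` runs on the GPU and does considerably more work.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a normalize step (NOT the CUDALucas kernel).
// CheckErr is a compile-time constant, so the compiler can drop the
// error-tracking branch entirely from the CheckErr=false instantiation.
template <bool CheckErr>
void normalize_sketch(std::vector<double>& x, double* maxerr) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double rounded = std::round(x[i]);
        if (CheckErr) {
            // A single accumulator holds the maximum round-off error,
            // rather than one slot per block of elements.
            const double err = std::fabs(x[i] - rounded);
            if (err > *maxerr) *maxerr = err;
        }
        x[i] = rounded;
    }
}

// The runtime flag is tested once, outside the hot loop, and selects
// one of the two specialised instantiations.
void normalize_dispatch(std::vector<double>& x, double* maxerr, bool err_flag) {
    if (err_flag) {
        normalize_sketch<true>(x, maxerr);
    } else {
        normalize_sketch<false>(x, maxerr);
    }
}
```

Calling `normalize_dispatch(x, &err, false)` rounds the data without touching `err`, which is the case item 3) of the patch then unrolls by hand in the real kernels.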
1 Attachment(s)
Hi, aaronhaviland
Thank you, good work. Ver 1.46: added aaronhaviland's patch.
[code]
Iteration 10000 5.6 msec/Iter M( 10000000 )C, 0x55318a84ffd14bc7, n = 1048576, CUDALucas v1.46
Iteration 10000 10.5 msec/Iter M( 20000000 )C, 0xb6475f8cb0888740, n = 1572864, CUDALucas v1.46
Iteration 10000 11.4 msec/Iter M( 30000000 )C, 0xbf70feed29774eba, n = 2097152, CUDALucas v1.46
Iteration 10000 17.3 msec/Iter M( 40000000 )C, 0x2318fe9e59886055, n = 2621440, CUDALucas v1.46
Iteration 10000 21.1 msec/Iter M( 50000000 )C, 0x80dabfda58bb63db, n = 3145728, CUDALucas v1.46
Iteration 10000 24.3 msec/Iter M( 60000000 )C, 0x8db3527512f3559b, n = 3670016, CUDALucas v1.46
Iteration 10000 24.8 msec/Iter M( 70000000 )C, 0x652d4a670f44317e, n = 4194304, CUDALucas v1.46
Iteration 10000 30.9 msec/Iter M( 80000000 )C, 0xa2dfe07c9f24275d, n = 4718592, CUDALucas v1.46
Iteration 10000 35.2 msec/Iter M( 90000000 )C, 0xf0703f404c4eb47a, n = 5242880, CUDALucas v1.46
[/code]
|
[QUOTE=Brain;285244]1.45 Win64 SM 2.1 compile, untested.[/QUOTE]
Good double-check of [URL="http://www.mersenne.org/report_exponent/?exp_lo=25912171&exp_hi=10000&B1=Get+status"]M25912171[/URL] with 1.45 |
Win64 SM 13 compile
1 Attachment(s)
1.46 Win64 SM 1.3 compile, untested.
|
Win64 SM 21 compile
1 Attachment(s)
1.46 Win64 SM 2.1 compile, untested.
|
[QUOTE=Brain;285626]1.46 Win64 SM 2.1 compile, untested.[/QUOTE]
Thanks for compiling! I'm testing right now. Times on an M25XXXXXX exponent dropped from 4 to 3.4 msec/Iter. |
Hi, flashjh
Thank you for your report.
Hi, Brain
Thank you for all your work. |
[QUOTE=Brain;285626]1.46 Win64 SM 2.1 compile, untested.[/QUOTE]
Good double-check with 1.46 M( [URL="http://www.mersenne.org/report_exponent/?exp_lo=25922849&exp_hi=&B1=Get+status"]25922849[/URL] )C, 0xce0f7802d76d17e2, n = 1572864, CUDALucas v1.46 |
My computer was having problems... it never showed as posted. Sorry.
|