[QUOTE=f11ksx;284236]V1.41 is running pretty well just like v1.42 :smile:
9.3 ms/iter for 54M exponent on GTX-580 card with v1.41
8.6 ms/iter with v1.42
Domo[/QUOTE]
8.6ms still correct? Was the 580 at clock speed? |
First testing results
Two first results:
1. Found a checkpoint failure:
[CODE]
F:\Eigene Dateien\Computing\CUDALucas\cudalucas.1.4.2\win64\CUDA4.0\sm_21>CUDALucas.cuda4.0.sm_21.WIN64.exe -c1000 -t 72000503
err = 0.369665, increasing n from 3932160
Iteration 10000 18.2 msec/Iter M( 72000503 )C, 0xc4177e6840e2ebe5, n = 4718592, CUDALucas v1.4.2
CUDALucas: Could not find a checkpoint file to resume from
F:\Eigene Dateien\Computing\CUDALucas\cudalucas.1.4.2\win64\CUDA4.0\sm_21>CUDALucas.cuda4.0.sm_21.WIN64.exe -c10000 -t 72000503
CUDALucas: Resuming from checkpoint file c72000503
something wrong; error message, if any, already printed
[COLOR=Red]CUDALucas: inconsistent binary checkpoint; bad FFT run length of 4718592[/COLOR]
0:00 real
[/CODE]
2. Testing iteration times with M(72000503):
a) CUDALucas 1.2 with CUDA 3.2 --> 4096K FFT --> 13.4ms per Iter
b) CUDALucas 1.4.2 with CUDA 4.0 --> 4608K FFT --> 17.9ms per Iter
If I downscale b) down to 4096K FFT --> 15.9ms per Iter
13.4 ms --> 15.9ms makes me conclude that CUDALucas 1.4.2 is 19% slower than 1.2. :sad:
Any thoughts on that? Why does CL 1.4.2 choose a longer FFT? |
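The comparison in point 2 can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the post: it assumes iteration time scales linearly with FFT length, which is a rough approximation (an FFT costs closer to n log n), and the function names are made up for illustration.

```python
def scale_to_fft(ms_per_iter, fft_from_k, fft_to_k):
    """Rescale a timing from one FFT length to another (linear assumption)."""
    return ms_per_iter * fft_to_k / fft_from_k


def slowdown_percent(old_ms, new_ms):
    """How much longer, in percent, one iteration takes in the new version."""
    return (new_ms / old_ms - 1.0) * 100.0


# Figures from the post: 1.4.2 ran a 4608K FFT at 17.9 ms, 1.2 a 4096K FFT at 13.4 ms.
scaled = scale_to_fft(17.9, 4608, 4096)
print(round(scaled, 1))                          # 15.9
print(round(slowdown_percent(13.4, scaled)))     # 19
```

The same `slowdown_percent` helper works for any pair of timings measured at the same FFT length.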
With 1.4.2 is there any way to get ETA?
Jerry |
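Until the program prints an ETA itself, it can be estimated by hand from the msec/Iter figure. A Lucas-Lehmer test of M(p) takes p - 2 squaring iterations, so the remaining time is roughly the remaining iterations times the time per iteration. The sketch below is not part of CUDALucas; the function names are hypothetical, and it assumes the iteration time stays constant for the rest of the run.

```python
def ll_eta_seconds(exponent, ms_per_iter, iterations_done=0):
    """Estimated seconds left in a Lucas-Lehmer test of M(exponent)."""
    remaining = (exponent - 2) - iterations_done
    return max(remaining, 0) * ms_per_iter / 1000.0


def format_eta(seconds):
    """Format a duration in seconds as days, hours and minutes."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60
    return f"{days}d {hours:02d}h {minutes:02d}m"


# Example: M(72000503) at 17.9 ms per iteration, starting from scratch.
print(format_eta(ll_eta_seconds(72000503, 17.9)))
```

For the 72000503 run above at 17.9 ms per iteration this comes out at roughly 15 days.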
1 Attachment(s)
I set up a Visual Studio 2008 / CUDA 3.2 environment and compiled v1.4.2.
Timings for M36200419 on a GTX570:
CUDALucas 1.4.2 with CUDA 4.0 --> 5.35 ms per Iter
CUDALucas 1.4.2 with CUDA 3.2 --> 5.16 ms per Iter
CUDALucas 1.2 with CUDA 3.2 --> 4.50 ms per Iter |
Tried one of the binaries attached here against v1.3(beta) 4.0, cc 2.0 (which I am currently using). I am seeing the same thing the people before reported: performance is 15% to 20% lower on v1.4.2 :no:
Therefore I am staying on v1.3 for now. |
What's new in 1.4.2
I haven't been watching this thread for a while and I see a lot has happened.
I've noticed that 1.4.2 should support non-power-of-2 FFTs. What else is different from 1.2?
Thanks, Andriy
PS Could we get rid of K&R style prototypes in the code to make it more readable? |
Status report
msft coded:
1.3 with ms per Iter output
1.4 with non-power-of-2 FFTs
1.4.1 minor bugfix
1.4.2 CPU issue & some speedup
A 1.3 was also coded by ethan (eo) some time ago. I think msft resumed from his 1.2 but replaced the "caso" prints with more descriptive text.
[B]Main problem is performance slowdown.[/B] We need some GPU debugging.
I don't think msft removed the CUDA safe call that ethan removed, but this is okay for me as graphics became laggy for me. msft was very helpful with the 1.4 problems.... |
Hi ,
I can't debug rw.cu. I think I will rewrite the checkpoint routine. |
1 Attachment(s)
Ver 1.44
Fixed checkpoint issue.
Ver 1.44:
[code]
Iteration 10000 6.7 msec/Iter M( 10000000 )C, 0x55318a84ffd14bc7, n = 1048576, CUDALucas v1.44
Iteration 10000 10.7 msec/Iter M( 20000000 )C, 0xb6475f8cb0888740, n = 1572864, CUDALucas v1.44
Iteration 10000 13.8 msec/Iter M( 30000000 )C, 0xbf70feed29774eba, n = 2097152, CUDALucas v1.44
Iteration 10000 17.6 msec/Iter M( 40000000 )C, 0x2318fe9e59886055, n = 2621440, CUDALucas v1.44
Iteration 10000 21.3 msec/Iter M( 50000000 )C, 0x80dabfda58bb63db, n = 3145728, CUDALucas v1.44
Iteration 10000 24.6 msec/Iter M( 60000000 )C, 0x8db3527512f3559b, n = 3670016, CUDALucas v1.44
Iteration 10000 28 msec/Iter M( 70000000 )C, 0x652d4a670f44317e, n = 4194304, CUDALucas v1.44
Iteration 10000 31.1 msec/Iter M( 80000000 )C, 0xa2dfe07c9f24275d, n = 4718592, CUDALucas v1.44
Iteration 10000 35.5 msec/Iter M( 90000000 )C, 0xf0703f404c4eb47a, n = 5242880, CUDALucas v1.44
[/code]
Ver 1.3:
[code]
Iteration 10000 5.9 msec/Iter M( 10000000 )C, 0x55318a84ffd14bc7, n = 1048576, CUDALucas v1.3
Iteration 10000 11.6 msec/Iter M( 30000000 )C, 0xbf70feed29774eba, n = 2097152, CUDALucas v1.3
Iteration 10000 25 msec/Iter M( 70000000 )C, 0x652d4a670f44317e, n = 4194304, CUDALucas v1.3
Iteration 10000 50.4 msec/Iter M( 90000000 )C, 0xf0703f404c4eb47a, n = 8388608, CUDALucas v1.3
[/code] |
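One way to read the two logs: the non-power-of-2 lengths let v1.44 pack more bits of the exponent into each FFT element. The average is simply exponent / n. The sketch below computes that for a few of the logged runs. It is only an illustration with a hypothetical helper name; it says nothing about how CUDALucas itself picks n, which (judging by the "err = ..., increasing n" message earlier in the thread) also depends on the measured round-off error.

```python
def bits_per_word(exponent, fft_length):
    """Average number of exponent bits stored per FFT element."""
    return exponent / fft_length


# (exponent, n, version) triples copied from the logs above.
runs = [
    (70_000_000, 4_194_304, "v1.44"),
    (80_000_000, 4_718_592, "v1.44"),
    (90_000_000, 5_242_880, "v1.44"),
    (90_000_000, 8_388_608, "v1.3"),
]

for exponent, n, version in runs:
    print(f"{version:>6}  M({exponent})  n = {n}  "
          f"{bits_per_word(exponent, n):.2f} bits/word")
```

For M(90000000), v1.3 had to jump to the next power of two (8388608), which leaves each element far less full than the 5242880 length v1.44 used. That is consistent with the 50.4 vs 35.5 msec/Iter difference in the logs.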
1 Attachment(s)
Ver 1.45
Fixed performance issue.
[code]
Iteration 10000 6 msec/Iter M( 10000000 )C, 0x55318a84ffd14bc7, n = 1048576, CUDALucas v1.45
Iteration 10000 11.8 msec/Iter M( 30000000 )C, 0xbf70feed29774eba, n = 2097152, CUDALucas v1.45
Iteration 10000 25.2 msec/Iter M( 70000000 )C, 0x652d4a670f44317e, n = 4194304, CUDALucas v1.45
[/code] |
Win64 SM 13 compile
1 Attachment(s)
1.45 Win64 SM 1.3 compile, untested.
|