#727
"Jerry"
Nov 2011
Vancouver, WA
1,123 Posts
Is sm_20 for compute capability 2.0 and sm_21 for cc 2.1? Does it make a difference in performance to use a 2.0 build on a 2.0 card?

#728

Mar 2010
110011011₂ Posts
I once tried to figure that out on a GTX 480. sm_13 was the fastest; sm_20 and sm_21 were nearly identical in speed. This is a reminder that the compiler doesn't optimize the code much for a specific arch. Everything should be done by hand.
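A comparison like the one above can be reproduced by building one binary per target architecture. A minimal sketch follows; the source file name `CUDALucas.cu` and the cuFFT link flag are assumptions about the source layout, so adjust them to your CUDALucas version:

```shell
# Print the build command for each target arch; drop the "echo"
# to actually compile. "CUDALucas.cu" and "-lcufft" are assumed
# names, not confirmed against any particular CUDALucas release.
for arch in sm_13 sm_20 sm_21; do
  echo nvcc -O2 -arch="$arch" -o "CUDALucas-$arch" CUDALucas.cu -lcufft
done
```

Timing the same exponent for a few thousand iterations with each resulting binary then shows which arch target your card actually prefers.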

#729

Romulan Interpreter
Jun 2011
Thailand
22677₈ Posts
Quote:
You should consider that if one (3.2 + cc 1.3) test is bad out of 30-50 tests done, then you still gain nothing, and this is happening for me occasionally, so I found out that it is preferable to use drv 4.0 with cc 2.0. I have no idea how using cc 2.1 would influence everything; it seems that both the 580 and the Tesla can run CudaLucas compiled with cc 2.1 without significant slowdown, but their spec is cc 2.0, and as I don't know much about the differences between the two versions of cc, I prefer not to risk the accuracy. Going from 3.2 to 4 (or higher) definitely decreases the speed, and so does going from cc 1.3 to 2.0 or higher, but as I said, there should be a reason why they made it, as the only gain is in accuracy (the speed is not the gain). And trying to prove that to myself, I occasionally found wrong residues.

I have not yet tested the current CudaLucas 1.49; that is in process now (just downloaded and put it to chores). But up to 1.48 inclusive, the best choice for the 580/Tesla is cudalucas.1.3.alpha_eoc with drv 4.0 and cc 2.0. I even did experiments like starting a test with CL 1.48, then stopping it and resuming it with CL 1.3. This was very funny, because 1.48 started the test with a lower FFT (I remember the checkpoint files had 9 or 12 megs instead of 16 or 20 megs). Then 1.3.alpha resumed with THAT LOW FFT and continued like that, finishing everything much, much faster (about double the speed of the normal power-of-2 FFT used by 1.3, and about 10-20 percent faster than 1.48 would do it!). But what a pity, all the residues were wrong, haha. And the initial test could not spot this, because 859433, which I used as a testing exponent (it generates a Mersenne prime), used the same FFT on both versions.

Well, joking apart, I will keep you posted with 1.49 results on Fermis. One improvement I see already is correctly identifying the GPU (on a multi-GPU system) - GOOD! - and showing the ETAs, but they are closer to 1.48 than to 1.33 (that is, slower :P).

And the ETA is in seconds, not in HMS, as v1.3alpha had already trained me to read. BAD - I have to use a pocket calculator to see the real ETA :P. Also, for tuning reasons, I would prefer higher precision for the ms/iteration figure, as again, v1.3alpha_eoc got me used to it (at 50M iterations, 0.05 ms faster could mean almost an hour of "finishing the test faster"!). One decimal of precision is too little, and creates confusion with the "rounding or truncating" mechanism. Three decimals in the printf() would suffice.

P.S. Since I am trying to find my English words here: v1.49 successfully proved both 2^756839-1 and 2^859433-1 prime, on both the 580 and the Tesla boards. Another minor bug is that the message "could not find a checkpoint to resume from" is written at the end, after the tests were finished :D:D:P... The 4 tests took between 11 and 13 minutes (780/800 GPU clock).

Last fiddled with by LaurV on 2012-02-11 at 09:02
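The pocket-calculator step LaurV describes, turning a plain-seconds ETA into days/hours/minutes/seconds, is just integer division; a minimal sketch (the ETA value is an arbitrary example, not from any real run):

```shell
# Convert a plain-seconds ETA (as v1.49 prints it) into d hh:mm:ss.
eta=123456                       # example ETA in seconds
d=$(( eta / 86400 ))             # whole days
h=$(( eta % 86400 / 3600 ))      # leftover hours
m=$(( eta % 3600 / 60 ))         # leftover minutes
s=$(( eta % 60 ))                # leftover seconds
printf '%dd %02d:%02d:%02d\n' "$d" "$h" "$m" "$s"   # 1d 10:17:36
```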

#730

"Jerry"
Nov 2011
Vancouver, WA
2143₈ Posts
Thanks for the great info. I'll do some testing on a 580 and see what I get.
Last fiddled with by flashjh on 2012-02-11 at 13:37

#731

Jun 2005
3·43 Posts
Quote:

#732

|
"Jerry"
Nov 2011
Vancouver, WA
1123₁₀ Posts
Quote:

#733

|
Romulan Interpreter
Jun 2011
Thailand
3·3,221 Posts
The first two DCs completed with CudaLucas 1.49, drv 4.0, cc 2.0:

M( 26026433 )C, 0x457f73d49f90b822, n = 1572864, CUDALucas v1.49
M( 26176441 )C, 0x19283a19b247ba__, n = 1572864, CUDALucas v1.49

The first is a match. The second is not (therefore I masked it). Both results come from a GTX 580 at standard clock (no overclock this time). I am not going to repeat the second test as long as it is not confirmed as bad by a P95 run. After someone clears the exponent, and if my test proves bad, I will repeat it to see whether the error comes from the program.

Edit: another small observation: the -c switch does not work for the screen the way it did in the older versions (I did not check whether it still works for the checkpoint files; the screen effect is just the first one observable). With -c30000, v1.49 still outputs to the screen every 10k iterations.

Last fiddled with by LaurV on 2012-02-12 at 11:37

#734

|
Dec 2009
Peine, Germany
14B₁₆ Posts
Quote:

#735

|
Dec 2009
Peine, Germany
331 Posts
Quote:
I'm ashamed of this question, but: does anyone feed CUDALucas with something other than the command-line options? In other words, is it possible to let CL immediately start the next exponent when it finishes the current one?

#736

|
Jun 2005
3×43 Posts
They're missing on lines 1525 and 1587 of rw.cu in the version 1.49 I downloaded from post 723. Also, the one at line 1494 is newer than my old code, so it may or may not need one as well. At least that's the big difference between my version 1.2-whatever and the current builds. I'm not sure whether you're updating the source before building, which may explain why you're seeing it there.

And I don't think I touched the -c output, if that helps narrow down where to look for those changes.

#737

|
"Kieren"
Jul 2011
In My Own Galaxy!
27AE₁₆ Posts
I don't currently run CUDALucas. However, there were discussions a while back about loading a stack of assignment command lines into a batch file to feed CL. I don't remember for sure who it was, but Christenson was pretty active in those talks and is a strong proponent of batch files.
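The batch-file idea is just a loop over assignments; a minimal sketch, assuming a CUDALucas binary that takes the exponent as its argument (the binary name and argument form are assumptions, and the exponents are the two from LaurV's post above, used purely as examples):

```shell
# Queue several exponents so the next test starts as soon as the
# previous one finishes. Drop the "echo" to actually run the queue;
# the binary name and bare-exponent argument are assumed, not
# confirmed against any particular CUDALucas release.
for exp in 26026433 26176441; do
  echo ./CUDALucas "$exp"
done
```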
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| Don't DC/LL them with CudaLucas | LaurV | Data | 131 | 2017-05-02 18:41 |
| CUDALucas / cuFFT Performance on CUDA 7 / 7.5 / 8 | Brain | GPU Computing | 13 | 2016-02-19 15:53 |
| CUDALucas: which binary to use? | Karl M Johnson | GPU Computing | 15 | 2015-10-13 04:44 |
| settings for cudaLucas | fairsky | GPU Computing | 11 | 2013-11-03 02:08 |
| Trying to run CUDALucas on Windows 8 CP | Rodrigo | GPU Computing | 12 | 2012-03-07 23:20 |