mersenneforum.org  

Old 2012-03-24, 03:28   #1090
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

21438 Posts
Default

Quote:
Originally Posted by Dubslow View Post
Wait, whoops...
When you reported the result, you lost the assignment key and it was reassigned. I'd rather not poach...

Edit: Repeat for emphasis: If you want me to do a quick DC with Prime95, you must not submit the result to PrimeNet, so that we retain control of the exponent. (Yes, that does mean checking the residue for a match before submitting.) Sorry about that flash.
No, my fault. I should have checked before I submitted. I've gotten used to everything matching, so I didn't check first. I'll let my CuLu finish, but you're right. Next time...
Old 2012-03-24, 11:12   #1091
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

5×1,931 Posts
Default

It took me a while and a few re-runs, but I ended up with two good residues with v1.69 (26251817 and 26240761).

After a lot of experimenting I reached the conclusion that you must always use -t (that means: IT IS A MUST), just to be on the safe side. In that case, for "production", polite versus aggressive makes no difference. Checking the sums/errors at every iteration works much like the polite memory "trick": it gives the GPU a break of about 20%, so with polite the GPU is only about 79% busy, and with aggressive it becomes about 81% busy. In both cases -t dominates the timing, in both cases -t is necessary to stay safe (otherwise you will be sorry at the end when the residues don't match), and in both cases the computer stays responsive enough (this is good!) for daily work and moderately demanding graphics applications. If you need more output, the next step is to disable -t and enable aggressive mode at the same time. The GPU load then goes to 98-99% and you WILL get 25% more output (going from 100 down to 80 is a 20% loss, but going from 80 up to 100 is a 25% gain :P), but your computer runs hotter, louder, and much less responsively (assuming the card is also used as the primary graphics card), and you lose confidence in the result. For DC that could be OK, if you can afford it, because the earlier residue is on PrimeNet and you can check your result against it. But it is still not recommended. For first-time LL, running without -t would be a BIG mistake, unless you are sure, but SURE, objectively, not subjectively ("my card is the best because it's mine!"), that your card is very stable, produces no hardware errors, does not overheat, etc.

Much better is to leave -t on, and when you really, really want to maximize your GPU, add one copy of mfaktc alongside it. That way you earn nice credit too :D

Last fiddled with by LaurV on 2012-03-24 at 11:15
Old 2012-03-24, 12:57   #1092
James Heinrich
 
 
"James Heinrich"
May 2004
ex-Northern Ontario

11×311 Posts
Default

Quote:
Originally Posted by James Heinrich View Post
I'd like to put together a CUDALucas performance comparison chart
Thanks to those who have submitted data, but I need more data points, please.

After looking over a few benchmark results, I'm going to standardize and ask that everyone submit results using v1.69 on three specific exponents:
Code:
CUDAlucas -polite 0 26214400
CUDAlucas -polite 0 52428800
CUDAlucas -polite 0 78643200
And (importantly), I need to know which FFT size was used. You may see it start with a smaller FFT size at first and then move up if the error is too high:
Quote:
C:\Prime95\cudalucas>CUDALucas_169_20 -polite 0 26214400

start M26214400 fft length = 1310720
iteration = 22 < 1000 && err = 0.26196 >= 0.25, increasing n from 1310720


start M26214400 fft length = 1572864
Iteration 10000 M( 26214400 )C, 0x0344448e4bf0eb62, n = 1572864, CUDALucas v1.69
err = 0.02403 (0:31 real, 3.0623 ms/iter, ETA 22:17:12)

Iteration 20000 M( 26214400 )C, 0x9f4a57b1f324d325, n = 1572864, CUDALucas v1.69
err = 0.02403 (0:30 real, 3.0247 ms/iter, ETA 22:00:15)
For consistency, I'm using the timing data as reported on iteration 20000. So for anyone willing to run (or re-run) benchmark data for me, please:
* use v1.69 (Windows binaries here)
* use the exact 3 commandlines above
* send me the output from the start through iteration 20000 (as in the example above).
Old 2012-03-24, 13:40   #1093
msft
 
 
Jul 2009
Tokyo

26216 Posts
Default

Quote:
Originally Posted by LaurV View Post
you have to use -t always (that means: IS A MUST), just to be on the safe side.
Good point.
I experimented with cudaMemcpyAsync(), but it was slow.
Old 2012-03-24, 14:18   #1094
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

7,537 Posts
Default

Quote:
Originally Posted by msft View Post
Good point.
I experimented with cudaMemcpyAsync(), but it was slow.
The -t option doesn't have to copy g_error to the CPU every iteration. It could copy every 10th, or 100th, or whatever. Just make sure you check g_error before writing a new save file.

Last fiddled with by Prime95 on 2012-03-24 at 14:18
Old 2012-03-24, 14:49   #1095
msft
 
 
Jul 2009
Tokyo

61010 Posts
Default

Quote:
Originally Posted by Prime95 View Post
The -t option doesn't have to copy g_error to the CPU every iteration. It could copy every 10th, or 100th, or whatever. Just make sure you check g_error before writing a new save file.
Yes, yes. Testing now:
Code:
Iteration 80000 M( 86243 )C, 0x871aac1149a65db1, n = 4608, CUDALucas v2.00 err = 0.01172 (0:17 real, 1.7138 ms/iter, ETA 0:00)
M( 86243 )C, 0x0000000000000000, n = 4608, CUDALucas v2.00
Old 2012-03-24, 16:57   #1096
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts
Default

Quote:
Originally Posted by flashjh View Post
Cool, thanks . I'll post my CuLu re-run results when it's done...

Edit: I attached the full test run (minus the last residue)
My second run matches the original P95 result, so the P95 DC will be correct.

M( 26229943 )C, 0x76916187254012__, n = 1474560, CUDALucas v1.69
Old 2012-03-24, 16:59   #1097
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

46316 Posts
Default

I logged in to compile v2.0, but it's gone? Where did it go, msft?
Old 2012-03-24, 17:18   #1098
msft
 
 
Jul 2009
Tokyo

2×5×61 Posts
Default

Quote:
Originally Posted by flashjh View Post
I logged in to compile v2.0, but it's gone? Where did it go, msft?
Sorry, I found a fatal error.

Ver 2.00
1) Sped up the -t option.
2) Save files now use the name "sEXPONENT.ITERATION.RESIDUE.txt"
Code:
$ ./CUDALucas -polite 0 26974951
Iteration 23300000 M( 26974951 )C, 0x31b4d280a170995a, n = 1474560, CUDALucas v2.00 err = 0.1797 (0:56 real, 5.6171 ms/iter, ETA 5:43:34)
$ ./CUDALucas -polite 0 26974951 -t
Iteration 23320000 M( 26974951 )C, 0x537f9e116a703252, n = 1474560, CUDALucas v2.00 err = 0.207 (0:56 real, 5.6250 ms/iter, ETA 5:42:11)
Attached Files
File Type: bz2 CUDALucas.2.00.tar.bz2 (11.6 KB, 87 views)
Old 2012-03-24, 17:23   #1099
bcp19
 
 
Oct 2011

7×97 Posts
Default

Does anyone have a link to the 4.1 cudart64 and cufft64 DLLs? I've tested 3.2 and 4.0 on one GPU so far, and 3.2 is faster, so I want to check 4.1 as well. Thanks.
Old 2012-03-24, 17:26   #1100
ET_
Banned
 
 
"Luigi"
Aug 2002
Team Italia

2·3·11·73 Posts
Default

Quote:
Originally Posted by James Heinrich View Post
Thanks to those who have submitted data, but I need more data points, please.

After looking over a few benchmark results, I'm going to standardize and ask that everyone submit results using v1.69 on three specific exponents:
Code:
CUDAlucas -polite 0 26214400
CUDAlucas -polite 0 52428800
CUDAlucas -polite 0 78643200
And (important), I need to know what FFT size was used. You may see it start with a smaller FFT size at first and then move up if the error is too high.

For consistency, I'm using the timing data as reported on iteration 20000. So for anyone willing to run (or re-run) benchmark data for me, please:
* use v1.69 (Windows binaries here)
* use the exact 3 commandlines above
* send me the output from start to 20000 iteration (as the above example).
I am using a GTX275, CUDA toolkit 3.0, cc 1.3.

Here are my benchmarks:

Code:
luigi@luigi-desktop:~/luigi/CUDA/cudaLucas/test/cudalucas.1.69$ ./CUDALucas -polite 0 26214400

start M26214400 fft length = 1310720
iteration = 21 < 1000 && err = 0.287598 >= 0.25, increasing n from 1310720

start M26214400 fft length = 1572864
Iteration 10000 M( 26214400 )C, 0x0344448e4bf0eb62, n = 1572864, CUDALucas v1.69 err = 0.04517 (4:56 real, 29.6005 ms/iter, ETA 215:25:33)
Iteration 20000 M( 26214400 )C, 0x9f4a57b1f324d325, n = 1572864, CUDALucas v1.69 err = 0.04517 (4:54 real, 29.4113 ms/iter, ETA 213:58:00)

---

luigi@luigi-desktop:~/luigi/CUDA/cudaLucas/test/cudalucas.1.69$ ./CUDALucas -polite 0 52428800

start M52428800 fft length = 2621440
iteration = 21 < 1000 && err = 0.25 >= 0.25, increasing n from 2621440

start M52428800 fft length = 3145728
Iteration 10000 M( 52428800 )C, 0x3ceee1cc01747326, n = 3145728, CUDALucas v1.69 err = 0.05469 (9:09 real, 54.8493 ms/iter, ETA 798:30:51)
Iteration 20000 M( 52428800 )C, 0x9281347573ff62eb, n = 3145728, CUDALucas v1.69 err = 0.05469 (9:00 real, 53.9812 ms/iter, ETA 785:43:32)

---

luigi@luigi-desktop:~/luigi/CUDA/cudaLucas/test/cudalucas.1.69$ ./CUDALucas -polite 0 78643200

start M78643200 fft length = 3932160
iteration = 20 < 1000 && err = 0.25 >= 0.25, increasing n from 3932160

start M78643200 fft length = 4194304
iteration = 25 < 1000 && err = 0.339844 >= 0.25, increasing n from 4194304

start M78643200 fft length = 4718592
Iteration 10000 M( 78643200 )C, 0x0a6f35cd25e82e0f, n = 4718592, CUDALucas v1.69 err = 0.07617 (13:12 real, 79.2440 ms/iter, ETA 1730:49:15)
Iteration 20000 M( 78643200 )C, 0x00dda91d63971fb3, n = 4718592, CUDALucas v1.69 err = 0.07617 (13:16 real, 79.5197 ms/iter, ETA 1736:37:17)
The timings were higher than with v1.3, and my computer was nearly unusable (with v1.3 there was no apparent slowdown).

Luigi