#12
Serpentine Vermin Jar
Jul 2014
2×13×131 Posts
Quote:
Otherwise if you're doing first time checks and the system is a bit flaky, we won't know for years (and eventually you'll land on the "spewers of junk" list that I or perhaps some future version of me will generate based on "probably bad" machines).

I wouldn't bother mentioning it except you've already encountered the 0x2 residue loop, so just tweaking it enough to avoid that particular problem doesn't necessarily mean it's stable.

Ultimately though there are probably a lot of people who just run first-time checks and are blissfully unaware of any lurking problems, so it's good you were paying attention and saw an issue and dealt with it... kudos.
#13
Dec 2012
11100₂ Posts
Yeah, I think this is a great idea. I kicked off a first check, but is there any way I can save the progress on it, stop it, kick off a couple of double checks and then resume the initial check? Or alternatively do I consign the current work to the bin?
#14
"Kieren"
Jul 2011
In My Own Galaxy!
10158₁₀ Posts
Stop CUDALucas. Add the double-check(s) to the top of worktodo.txt, save, and restart the program. When the DCs finish, the first time check will resume where it left off.
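For example, worktodo.txt might look something like this while the double checks are queued ahead of the paused first-time test. This is only a sketch: the assignment IDs and exponents below are placeholders, and the actual lines should be pasted from the GIMPS manual assignment page rather than typed by hand.

```
DoubleCheck=0123456789ABCDEF0123456789ABCDEF,43345679,74,1
DoubleCheck=ABCDEF0123456789ABCDEF0123456789,43456781,74,1
Test=FEDCBA9876543210FEDCBA9876543210,85012347,76,0
```

CUDALucas works down worktodo.txt from the top, so the two DoubleCheck lines run first; when the Test line reaches the top again, the first-time check resumes from its existing save file, as described above.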
#15
Dec 2012
2²·7 Posts
#16
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts
Quote:
Although it is of course very plausible, and we have ample evidence that consumer GPUs have memory problems of some sort, I'm taking it entirely on your word that they're deliberately shipped that way because it doesn't matter.
#17
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts
Quote:
I didn't see identification of the GPU model you're using. Some have known issues with certain CUDALucas settings, and you'll need to avoid those. Standard operating procedure for bringing a new GPU card online in CUDALucas ought to be something like the following, in approximate time sequence:

1. Get the May 2.06beta. It includes code to check for systematic errors that most builds of 2.05.1 or earlier did not, on Windows at least. Set up to run the following in sequence, with screen output redirected to a text file.

2. Run -memtest on as many of the 25MB blocks as you can get it to run on, not just 10 or 20 starting from the low address. For modern cards this may be well over 100 blocks. Try (nominal GB of the card × 1024)/25 − 4 as a starting point. Example: 3GB card, 3072/25 − 4 = 122.88 − 4 ≈ 119. If it's too much, it will complain; reduce it by one and retry. (A quick sketch of this arithmetic appears at the end of this post.)

3. Check whether your GPU is a model known to have issues with 1024 threads, or 32 threads. Some GPU models are not compatible with running 1024 square threads, yet -threadbench will run 1024 on these and often pick 1024 as the fastest. (That's probably because some call somewhere fails, so steps in the iterations get skipped.) See the bug and wish list at http://www.mersenneforum.org/showpos...postcount=2618

4. Run -fftbench over the range of fft lengths you expect to use in a year, expanding the start and end to the powers of two bracketing that range. For example, if you expect to run exponents from 40M double checks to 100M first-time checks, that would be 2160k to 5600k fft length, so run 2048k-8192k. Check that the iteration times are generally increasing with the fft length. If longer fft lengths are yielding faster iteration times than at half the fft length, there's a problem that needs to be circumvented, and rerunning -fftbench after addressing it is recommended.

5. Run -threadbench. Check that the iteration times are fairly consistent within an fft length; normal variation within an fft length is modest, often under 10 percent. If the iteration times show large differences, there's a problem that needs to be circumvented. That may involve using the mask bits to prevent use of the 1024 or 32 thread counts, and rerunning -threadbench to create a good threads file.

6. Run it with the "-r 1" option to check many residues through 10,000 iterations. Note that this test has limitations: it currently does not include any residue tests for fft lengths larger than 8192k.

7. (Optionally: do -fftbench and -threadbench once for each CUDA level and bitness combination executable, saving the results between runs with file names that identify the version, if you want to extract the last several percent of performance out of your GPU. Which CUDA level and bitness benchmarks fastest on a given card depends on the fft length and GPU model.)

8. Running a successful test on a small Mersenne prime is recommended, and success is encouraging, but it uses a small fft length and does _not_ mean other fft lengths will be error-free.

9. Running a double check successfully on a ~40M exponent that will contribute toward the overall GIMPS double-checking effort is good practice, and a match with another person's LL test residue is encouraging, but it does not necessarily mean larger exponents will also process correctly. Small or moderate exponents may run fine on a GPU card that has error-prone memory in the midrange or high end.

Now, if all those prior tests were passed, it is eventually time to consider running a first-time exponent checked out from the manual assignments page.
Remain alert to unexpected results of any kind. Faster-than-expected iteration time is a symptom of some problems. Repeating 64-bit residues having value 0x02, 0x00, or 0xfffffffffffffffd are also symptoms of problems. It's probably a good idea to retest memory annually, and note any mismatching residues and any patterns. Hardware ages and eventually fails.
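A minimal sketch of the -memtest block-count arithmetic from the post above. Nothing here is CUDALucas-specific; it is just the (nominal GB × 1024)/25 − 4 rule of thumb, with the margin of 4 and the rounding as assumptions about where to start before CUDALucas complains:

```python
# Rough starting point for the number of 25 MB blocks to pass to CUDALucas -memtest,
# per the rule of thumb above: (nominal GB * 1024) / 25, minus a small safety margin.
# If the count is too high, CUDALucas will complain; reduce it by one and retry.
def memtest_blocks(nominal_gb, margin=4):
    return round(nominal_gb * 1024 / 25) - margin

for gb in (3, 6, 8, 11):
    print(f"{gb} GB card: start around {memtest_blocks(gb)} blocks")
# e.g. 3 GB card: 3072/25 = 122.88, minus 4 -> about 119 blocks
```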
#18
Romulan Interpreter
"name field"
Jun 2011
Thailand
2833₁₆ Posts
Wow! See? He only said he is paying in beer, and look what a post you put!
#19
Dec 2012
2²×7 Posts
Quote:
After a couple of bad results with a 970, I've just plugged in a new 1080Ti and chugged through an M4x,xxx,xxx double check successfully with downclocked memory, but I will go back and follow up with those tests you have described. Again, thank you.
#20
Dec 2012
1C₁₆ Posts
I should add, the 1080ti completed the check in ~1.5 days. The card is impressive.
#21
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
17220₈ Posts
Quote:
I get 43.2M double checks through a GTX 1070 in 32-36 hours. Your 1080Ti should be noticeably faster than a 1070. I've found that run time scales as roughly the 2.03 power of the exponent for CUDALucas. There's about a 10% variation in iteration time among the different CUDA level and bitness executables, and my impression, having benchmarked a lot but not yet built a rigorous, comprehensive table, is that the fastest executable changes a bit depending on the fft length and GPU model combination. (I haven't checked yet how long it would take to get a throughput payback on the time invested in benchmarking.)
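As a rough illustration of that ~2.03-power rule, here's a small sketch. It is not a CUDALucas feature; the 34-hour figure is just the midpoint of the 32-36 hour GTX 1070 range above, and real times depend on the card, fft length, and executable used:

```python
# Scale a measured LL run time to a different exponent using the ~p^2.03
# rule of thumb for CUDALucas quoted above.
def scaled_hours(known_hours, known_exponent, target_exponent, power=2.03):
    return known_hours * (target_exponent / known_exponent) ** power

# e.g. a 43.2M double check at ~34 hours suggests roughly 134 hours
# for an 85M first-time test on the same card.
print(f"{scaled_hours(34, 43_200_000, 85_000_000):.0f} hours")
```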
#22
Dec 2012
28₁₀ Posts
Ah, I should have added that that was during otherwise casual use, not running all the time. For the next M8x,xxx,xxx number I'll see what the estimated total time is and report back.
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| Prime95 vs. CUDALucas | storm5510 | Software | 10 | 2022-04-18 05:11 |
| Don't DC/LL them with CudaLucas | LaurV | Data | 131 | 2017-05-02 18:41 |
| CUDALucas gives all-zero residues | fivemack | GPU Computing | 4 | 2016-07-21 15:49 |
| CUDALucas: which binary to use? | Karl M Johnson | GPU Computing | 15 | 2015-10-13 04:44 |
| Primes in residual classes | Unregistered | Information & Answers | 6 | 2008-09-11 12:57 |