mersenneforum.org  

Old 2017-06-12, 15:31   #12
Madpoo
Serpentine Vermin Jar
 
 
Jul 2014

110101001110₂ Posts

Quote:
Originally Posted by evoflash View Post
Quick update to say all looks good. Dropped GPU clock to base setting and bottomed out the memory clock, no errors so far. Thanks for advice.
FYI, do yourself (and all of us) a favor and do at least one or two double-checks when doing a bunch of tweaking and testing where you're not entirely confident that it will produce an error-free run.

Otherwise, if you're doing first-time checks and the system is a bit flaky, we won't know for years (and eventually you'll land on the "spewers of junk" list that I, or perhaps some future version of me, will generate based on "probably bad" machines).

I wouldn't bother mentioning it except you've already encountered the 0x2 residue loop, so just tweaking it enough to avoid that particular problem doesn't necessarily mean it's stable.

Ultimately though there are probably a lot of people who just run first-time checks and are blissfully unaware of any lurking problems, so it's good you were paying attention and saw an issue and dealt with it... kudos.
Old 2017-06-13, 09:53   #13
evoflash
 
Dec 2012

28₁₀ Posts

Quote:
Originally Posted by Madpoo View Post
FYI, do yourself (and all of us) a favor and do at least one or two double-checks when doing a bunch of tweaking and testing where you're not entirely confident that it will produce an error-free run.
Yeah, I think this is a great idea. I kicked off a first check, but is there any way I can save the progress on it, stop it, kick off a couple of double checks and then resume the initial check? Or alternatively do I consign the current work to the bin?
Old 2017-06-13, 15:11   #14
kladner
 
 
"Kieren"
Jul 2011
In My Own Galaxy!

10011110101110₂ Posts

Quote:
Originally Posted by evoflash View Post
Yeah, I think this is a great idea. I kicked off a first check, but is there any way I can save the progress on it, stop it, kick off a couple of double checks and then resume the initial check? Or alternatively do I consign the current work to the bin?
Stop CUDALucas. Add the double-check(s) to the top of worktodo.txt, save, and restart the program. When the DCs finish, the first time check will resume where it left off.
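
For illustration only, the file might end up looking something like this after the edit (the exponents and 32-character assignment IDs below are made-up placeholders, not real assignments; use the actual lines you get from the PrimeNet manual assignment pages, and check the CUDALucas readme for exactly which worktodo line formats your build accepts):

Code:
DoubleCheck=0123456789ABCDEF0123456789ABCDEF,41234567,72,1
DoubleCheck=89ABCDEF0123456789ABCDEF01234567,41234641,72,1
Test=FEDCBA9876543210FEDCBA9876543210,87654321,76,1

The partially finished first-time test stays in the file below the new lines; its checkpoint files are untouched, so it resumes where it left off once the double-checks are done.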
Old 2017-06-13, 15:15   #15
evoflash
 
Dec 2012

2²×7 Posts

Quote:
Originally Posted by kladner View Post
Stop CUDALucas. Add the double-check(s) to the top of worktodo.txt, save, and restart the program. When the DCs finish, the first time check will resume where it left off.
I owe you beer.
Old 2017-06-13, 19:46   #16
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

1110000110101₂ Posts

Quote:
Originally Posted by LaurV View Post
+1. You just explained very well, and in fewer words, what I have been trying to say for a long time.
Probably because basically all the information in that post was stuff I've picked up from you. Although it is of course very reasonable, and we have ample evidence that consumer GPUs have memory problems of some sort, I take it entirely on your word that they're deliberately shipped that way because it doesn't matter.
Old 2017-08-03, 19:17   #17
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

7824₁₀ Posts

Quote:
Originally Posted by evoflash View Post
Hello everyone, I'm starting out with CUDALucas, and ran a trial factor test to see if it was working ok. So far so good. However, when I ran an LL double check I had a residual of 0x0000000000000002 and received an error message at 100%, which was confounding.

I put this down to high overclocking and restarted another check with a milder overclock that I use for stable gaming. I'm now getting patches of similar residue (see attachment), is this normal or am I wasting time/gpu watts?

Thanks for looking.
Lucas-Lehmer testing is particularly unforgiving of hardware error. As such, getting a configuration for which CUDALucas produces reliable results can also be useful for getting reliable results in other programs.

I didn't see identification of the GPU model you're using. Some have known issues with certain CUDALucas settings. You'll need to avoid those.

Standard operating procedure for bringing a new GPU card on line in CUDALucas ought to be something like the following, in approximate time sequence:

Get the May 2.06beta. It includes code to check for systematic errors that most builds of 2.05.1 or earlier did not, on Windows at least.

Set up to run the following in sequence, with redirection of screen output to a text file.

Run -memtest on as many of the 25MB blocks as you can get it to run on, not just 10 or 20 starting from the low address. For modern cards this may be well over 100 blocks. Try (nominal GB of the card, times 1024)/25, minus 4, as a starting point. Example: 3GB card, 3072/25 - 4 = 122.88 - 4 ≈ 119. If it's too much, it will complain; reduce the count by one and retry.
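
If it helps, here is a quick sketch of that rule of thumb (Python; the card sizes in the loop are just sample values, and the 25 MB block size and the minus-4 headroom are the figures from the rule above):

Code:
# starting block count for -memtest, per the rule of thumb above
def memtest_blocks(nominal_gb, block_mb=25, headroom_blocks=4):
    """Roughly how many 25 MB blocks to ask -memtest for, to begin with."""
    return int(nominal_gb * 1024 / block_mb) - headroom_blocks

for gb in (3, 6, 8, 11):
    print(gb, "GB card -> start around", memtest_blocks(gb), "blocks")
# 3 GB gives 118 here; the same arithmetic rounds to ~119 above.
# If the program complains, drop the count by one and retry.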

Check whether your GPU is a model known to have issues with 1024 threads. Or 32 threads. Some GPU models are not compatible with running with 1024 square threads. Yet -threadbench will run 1024 on these and often pick 1024 as the fastest. (That's probably because some call somewhere fails, so steps in the iterations get skipped.) See the bug and wish list at http://www.mersenneforum.org/showpos...postcount=2618

Run -fftbench over the range of lengths you expect to use in a year, expanding the start and end to the powers of two bracketing that range. For example, if you expect to run exponents from 40M double checks to 100M first time checks, that would be 2160k to 5600k fft length, so run 2048k-8192k. Check that the iteration times are generally increasing with the fft length. If longer fft lengths are yielding faster iteration times than at half the fft length, there's a problem that needs to be circumvented. Rerunning -fftbench after addressing the problem is recommended.
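
As a sketch of that bracketing step (Python; the 2160k/5600k endpoints are the example figures from the paragraph above, not something the code derives from your exponents):

Code:
def pow2_bracket_k(min_fft_k, max_fft_k):
    """Expand an fft-length range (in K) outward to the bracketing powers of two."""
    lo = 1
    while lo * 2 <= min_fft_k:
        lo *= 2
    hi = 1
    while hi < max_fft_k:
        hi *= 2
    return lo, hi

# 40M double checks (~2160k) up to 100M first-time checks (~5600k)
print(pow2_bracket_k(2160, 5600))   # -> (2048, 8192), i.e. run -fftbench over 2048k-8192k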

Run -threadbench. Check that the iteration times are fairly consistent within an fft length. Normal variations within an fft length are modest, often under 10 percent. If the iteration times show large differences, there's a problem that needs to be circumvented. That may involve using the mask bits to prevent use of 1024 or 32 thread counts. And rerunning -threadbench to create a good threads file.
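
A rough sketch of that consistency check (Python; the timings in the example call are invented numbers, and in practice you would read them out of the -threadbench output you redirected to a text file rather than typing them in):

Code:
def flag_inconsistent(timings_by_fft, tolerance=0.10):
    """Flag fft lengths whose timings across thread counts spread by more than ~10%."""
    for fft_k, times in timings_by_fft.items():
        spread = (max(times) - min(times)) / min(times)
        if spread > tolerance:
            print(f"{fft_k}k: {spread:.0%} spread across thread counts - investigate")

# made-up ms/iteration figures for illustration
flag_inconsistent({2048: [2.10, 2.14, 2.12], 4608: [4.90, 5.05, 9.80]})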

Run it with the "-r 1" option to check many residues through 10,000 iterations. Note that this test has limitations: it currently does not include any residue tests for fft lengths larger than 8192k.

(Optionally: do -fftbench, -threadbench, once for each CUDA level & bitness combination executable, saving between runs with file names to label them as to version, if you want to extract the last several percent of performance out of your GPU. Different CUDA levels & bitness benchmark fastest on a given card depending on the fftlength and GPU model.)

Running a successful test on a small Mersenne prime is recommended and success is encouraging, but it uses a small fft length and does _not_ mean other fft lengths will be error-free.

Running a double check successfully on a ~40M exponent that will contribute toward the overall GIMPS double check is good practice, and a match with another person's LL test residue is encouraging, but it does not necessarily mean larger exponents will also process correctly. Small or moderate exponents may run fine on a GPU card that has error-prone memory in the midrange or high end.

Now, if all those prior tests were passed, it is finally time to consider running a first-time exponent checked out from the manual assignments page.

Remain alert to unexpected results of any kind. Faster-than-expected iteration time is a symptom of some problems. Repeating 64-bit residues with values 0x02, 0x00, or 0xfffffffffffffffd are also symptoms of problems.
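
For the residue symptoms, something as simple as the following sketch can flag them in a redirected output file (Python; the file name is hypothetical, and the exact way your build prints residues may differ from these 16-digit forms):

Code:
SUSPECT = ("0x0000000000000002", "0x0000000000000000", "0xfffffffffffffffd")

def scan_log(path="cudalucas_output.txt"):
    """Print any output lines containing one of the suspect 64-bit residues."""
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            if any(r in line.lower() for r in SUSPECT):
                print(f"line {lineno}: {line.rstrip()}")

scan_log()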

It's probably a good idea to retest memory annually, and note any mismatching residues and any patterns. Hardware ages and eventually fails.
Old 2017-08-06, 18:04   #18
LaurV
Romulan Interpreter
 
 
"name field"
Jun 2011
Thailand

41·251 Posts

Wow! See? He only said he'd pay in beer, and look what a post you put up!
Old 2017-08-15, 09:10   #19
evoflash
 
Dec 2012

34₈ Posts

Quote:
Originally Posted by kriesel View Post
Lucas-Lehmer testing is particularly unforgiving of hardware error. As such, getting a configuration for which CUDALucas produces reliable results can also be useful for getting reliable results in other programs.

...
This should be a sticky post for first-time users! Thank you.

After a couple of bad results with a 970, I've just plugged in a new 1080Ti and chugged through an M4x,xxx,xxx double check successfully with downclocked memory, but I will go back and follow up with those tests you have described.

Again, thank you.
Old 2017-08-15, 09:11   #20
evoflash
 
Dec 2012

2²×7 Posts

I should add, the 1080ti completed the check in ~1.5 days. The card is impressive.
Old 2017-08-16, 17:46   #21
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁴×3×163 Posts

Quote:
Originally Posted by evoflash View Post
I should add, the 1080ti completed the check in ~1.5 days. The card is impressive.
You're welcome, and I come at these questions from the new-user perspective because I'm still a new user on the GPU side myself.

I get 43.2M double checks through a GTX 1070 in 32-36 hours. Your 1080Ti should be noticeably faster than a 1070. I've found run time scales as roughly the 2.03 power of the exponent for CUDALucas. There's about a 10% variation in iteration time among the different CUDA level and bitness executables, and my impression, from having benchmarked a lot but not yet built a rigorous comprehensive table, is that the fastest executable changes a bit depending on the fft length and GPU model combination. (I haven't checked yet how long it would take to get throughput payback on the time invested in benchmarking.)
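
For illustration, the scaling rule works out like this (Python; the 34-hour figure sits in the middle of the 32-36 hour range quoted for a 43.2M double check, and the 87M target exponent is just an example):

Code:
def estimate_hours(known_exponent, known_hours, target_exponent, power=2.03):
    """Scale a known CUDALucas run time to another exponent on the same card."""
    return known_hours * (target_exponent / known_exponent) ** power

# extrapolating a ~34 h double check at 43.2M to a first-time test near 87M
print(round(estimate_hours(43_200_000, 34, 87_000_000)))   # about 141 hours, roughly six days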
Old 2017-11-13, 12:04   #22
evoflash
 
Dec 2012

11100₂ Posts

Ah, I should have added that that was during otherwise casual use, not running all the time. For the next M8x,xxx,xxx number I'll see what the estimated total time is and report back.
