mersenneforum.org  

2019-02-07, 11:31   #298
R. Gerbicz
"Robert Gerbicz"
Oct 2005
Hungary
10110100101₂ Posts

Quote:
Originally Posted by preda
In the other direction, prime95 can be used to double-check a type-4 result from GpuOwl (because prime95 is able to do all the types).

So ideally, GpuOwl would be used for the initial PRP. If a double-check is needed, it should be done on prime95.
I've already suggested:

Quote:
Originally Posted by R. Gerbicz
It would be easier to also calculate, for all 5 types, the a^(mp-1) mod mp residue (res64, or better, res2048); that costs 0, 1, or very few more iterations. As far as I remember, gpuowl also calculates this residue with a=3.
Note that in the calculation the modulus k*b^n+c is independent of the type, because mod reduction by (k*b^n+c)/d is more costly for d>1, so even for a type where d>1 there is only one division at the end.
And once you have reached (N*{1,known_factors}+const)/{1,2} in the exponent, with a few more iterations you can reach k*b^n+c (or precisely k*b^n+c-1 if you want a Fermat residue).

Note that for Mersenne numbers, a^(mp-1) mod mp is the same as type 1 with d=1 forced (even when we know some factors of 2^p-1).

This is still not foolproof if you are using a different base for the PRP, but it is good when you are using different type value(s).
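
To make the relation between the residue types concrete, here is a minimal Python sketch (not prime95 or GpuOwl code) for a Mersenne number N=2^p-1 with a=3. It assumes the type-4 exponent (N+1)/2, assumes that resN means the low N bits of the residue, and needs Python 3.8+ for the modular inverse. The function names are made up for illustration.

```python
def type4_residue(p, a=3):
    """a^((N+1)/2) mod N for N = 2^p - 1, i.e. p-1 squarings of a."""
    N = (1 << p) - 1
    return pow(a, (N + 1) // 2, N)


def fermat_from_type4(r4, p, a=3):
    """Turn a type-4 residue into the Fermat residue a^(N-1) mod N.

    One extra squaring gives a^(N+1); multiplying by the inverse of a^2
    brings the exponent down to N-1. Requires gcd(a, N) = 1.
    """
    N = (1 << p) - 1
    return r4 * r4 % N * pow(a * a, -1, N) % N


def res(r, bits=64):
    """Low `bits` bits of a residue (assumed meaning of res64/res2048)."""
    return r & ((1 << bits) - 1)


p = 13                                    # 2^13 - 1 = 8191 is prime
N = (1 << p) - 1
r1 = fermat_from_type4(type4_residue(p), p)
assert r1 == pow(3, N - 1, N) == 1        # Fermat residue of a prime is 1
print(hex(res(r1)))
```

So going from one type to the Fermat residue costs one squaring and one multiplication here, which is the "very few more iterations" point.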

Quote:
Originally Posted by preda
In GpuOwl I simply extend the computation past the end to the first multiple of blockSize (L). If that passes, the end was validated. (the residue is extracted at the right spot "end", but the block is continued past that point).
Yes, that is also good.
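
The "run the block past the end" idea can be sketched in Python as follows. This is an illustrative toy, not GpuOwl's code. It assumes N=2^p-1, base a=3, and the residue taken after p-1 squarings. It checks at every block boundary, whereas a real implementation would presumably check far less often because each check costs L squarings. The corrupt_at parameter is hypothetical and flips one bit to imitate a hardware error.

```python
def prp_with_block_check(p, L, a=3, corrupt_at=None):
    """Return (residue at iteration `end`, True if every block check passed)."""
    N = (1 << p) - 1
    end = p - 1                      # a^(2^(p-1)) = a^((N+1)/2)
    last = -(-end // L) * L          # first multiple of L that is >= end
    u = a % N                        # u_i = a^(2^i) mod N
    d = u                            # product u_0 * u_L * u_2L * ...
    residue, ok = None, True
    for i in range(1, last + 1):
        u = u * u % N
        if i == corrupt_at:
            u ^= 1                   # simulated bit flip (hypothetical)
        if i == end:
            residue = u              # extracted at the right spot
        if i % L == 0:
            prev, d = d, d * u % N
            # Gerbicz identity: d_new == a * d_old^(2^L) (mod N)
            if d != a * pow(prev, 1 << L, N) % N:
                ok = False
    return residue, ok


residue, ok = prp_with_block_check(13, 5)       # end = 12, block runs on to 15
assert ok and residue == (1 << 13) - 1 - 3      # -3 mod N, since 2^13 - 1 is prime
_, ok = prp_with_block_check(13, 5, corrupt_at=5)
assert not ok                                   # the bit flip is caught
```

Because the last check happens at iteration 15, it also covers iteration 12, where the residue was taken.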

Last fiddled with by R. Gerbicz on 2019-02-07 at 11:51 Reason: correction

2019-02-07, 15:41   #299
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
7·1,051 Posts

Quote:
Originally Posted by R. Gerbicz
I thought I knew what type 4 was, but in undoc.txt in p95:
Code:
4 = 64-bit residue of a^((N-c+1)/2), only available if b=2
but for a Mersenne number N=2^p-1, c=-1, so N-c+1 is odd and the exponent is not an integer...
Doc bug, should be:

Code:
4 = 64-bit residue of a^((N+1)/2), only available if b=2
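
A quick arithmetic check of the two formulas, as a small Python sketch for N=2^p-1 (so b=2, c=-1). The helper name is made up for illustration.

```python
def type4_exponent_numerators(p):
    """Numerators of the documented and corrected type-4 exponents for N = 2^p - 1."""
    N, c = (1 << p) - 1, -1
    return N - c + 1, N + 1          # 2^p + 1 (odd) and 2^p (even)


documented, corrected = type4_exponent_numerators(13)
assert documented % 2 == 1           # (N-c+1)/2 is not an integer
assert corrected // 2 == 1 << 12     # (N+1)/2 = 2^(p-1), so p-1 squarings
```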

2019-02-07, 15:41   #300
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7·19·37 Posts

Quote:
Originally Posted by Prime95
We can never completely eliminate all potential problems. For example, a hardware error could corrupt the instruction pointer such that prime95 jumps straight to the "report the residue" code.
Yes, the achievable optimum is above zero, and there's also a balance to be struck between reliability and performance. The more code there is between start and finish, the more opportunity for error, either in code or in hardware/time. One could compute everything twice and compare, and the compare could fail somehow.

"But who guards against the guards?" (some ancient Roman if I recall correctly)
Analysis of optimum reliability testing (NASA; testing cycles on hardware use up a portion of the service life, improving detection of early problems but bringing eventual failure closer. Eventually additional testing cycles reduce future reliability. Astronauts want them to stop testing before that.)
"The leading cause of forest fires is trees." (Pat Paulsen, on Laugh-In)

Thanks for your continuing efforts to make it faster and reliable. (Seems sorta Sisyphean.)

2019-02-07, 17:52   #301
simon389
Aug 2013
3·29 Posts

Update: on one machine I’ve lowered the AVX512 clock speed using the “AVX3 offset” in the BIOS, and have yet to get a stable AIDA64 result, even at 3.8GHz, which is 300MHz lower than the 4.1GHz I was originally at. So now I’ve upped the voltage to 1.1 (and made it “override” instead of “adaptive”) and lowered the multiplier to 37 (i.e. 3.7GHz). Amazed I STILL can’t get a stable AVX512 setting, but non-AVX @ 4.1GHz is no problem. Maybe there’s a setting I’m missing.

On the machine that previously gave bad PRP doublechecks (though I think all four machines have that capability) I’ve edited prime.txt to add InterimFiles=1000000 and GerbiczVerbosity=3. Otherwise everything is the exact same and I have little doubt the error will repeat itself. Currently testing 78410041. I will keep the machine on and hopefully the screen gives some helpful info.

George, I will send you the file soon. Thanks.

Simon

2019-02-07, 18:00   #302
GP2
Sep 2003
5027₈ Posts

Quote:
Originally Posted by simon389
On the machine that previously gave bad PRP doublechecks (though I think all four machines have that capability) I’ve edited prime.txt to add InterimFiles=1000000 and GerbiczVerbosity=3.
In addition to InterimFiles, it might also be useful to turn on InterimResidues=1000000

In that case, assuming your PRP test of 78410041 mismatches with the first test, then a triple check could also be run with InterimResidues and the point of divergence could be identified.
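
Locating the point of divergence from two sets of interim residues could look like the following Python sketch. It assumes each run's residues have already been collected into a dict mapping iteration number to a res64 string. How to parse them out of prime95's output is not shown, and the values below are placeholders, not real residues.

```python
def first_divergence(run_a, run_b):
    """Return the first shared iteration where the residues differ, or None."""
    for iteration in sorted(run_a.keys() & run_b.keys()):
        if run_a[iteration] != run_b[iteration]:
            return iteration
    return None


# Placeholder residues, for illustration only.
first = {1000000: "AAAA", 2000000: "BBBB", 3000000: "CCCC"}
triple = {1000000: "AAAA", 2000000: "BBBB", 3000000: "DDDD"}
print(first_divergence(first, triple))
```

The error then lies somewhere between the last matching iteration and the one returned, and comparing against a third run shows which of the two went wrong.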

2019-02-07, 18:05   #303
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
19·499 Posts

Quote:
Originally Posted by simon389
On the machine that previously gave bad PRP doublechecks (though I think all four machines have that capability) I’ve edited prime.txt to add InterimFiles=1000000 and GerbiczVerbosity=3. Otherwise everything is the exact same and I have little doubt the error will repeat itself.
Please forgive me if I'm telling you how to chew gum, but I've spent a lot of my life "making friends" with kit. Sometimes said kit was uncooperative...

My advice: Keep a log-book where you write down everything you change. Label all your kit with an identification number, right down to individual RAM modules.

Lastly, where possible, try to change only one thing at a time (including firmware/software settings). And, give it time... Problems often don't manifest as quickly as one might like....

2019-02-07, 18:06   #304
Mysticial
Sep 2016
331 Posts

Quote:
Originally Posted by simon389
Update: on one machine I’ve lowered the AVX512 clock speed using the “AVX3 offset” in the BIOS, and have yet to get a stable AIDA64 result, even at 3.8GHz, which is 300MHz lower than the 4.1GHz I was originally at. So now I’ve upped the voltage to 1.1 (and made it “override” instead of “adaptive”) and lowered the multiplier to 37 (i.e. 3.7GHz). Amazed I STILL can’t get a stable AVX512 setting, but non-AVX @ 4.1GHz is no problem. Maybe there’s a setting I’m missing.

On the machine that previously gave bad PRP doublechecks (though I think all four machines have that capability) I’ve edited prime.txt to add InterimFiles=1000000 and GerbiczVerbosity=3. Otherwise everything is the exact same and I have little doubt the error will repeat itself. Currently testing 78410041. I will keep the machine on and hopefully the screen gives some helpful info.

George, I will send you the file soon. Thanks.

Simon
This might sound counter-intuitive, but try dropping the AVX speed as well.

There are some AVX512 instructions that aren't affected by the AVX512 offset. They will run at the AVX speed instead. These are called "light AVX512" instructions. (#4 on my earlier post)

2019-02-07, 18:11   #305
simon389
Aug 2013
57₁₆ Posts

Quote:
Originally Posted by Mysticial
This might sound counter-intuitive, but try dropping the AVX speed as well.

There are some AVX512 instructions that aren't affected by the AVX512 offset. They will run at the AVX speed instead. These are called "light AVX512" instructions. (#4 on my earlier post)
I’m dropping all the speeds. I keep an eye on CPU-Z and even with all speeds @ 3.8 (incl. AVX and AVX512 @ 3.8) I can’t get a stable system. Failed at 7.5 hours. Trying now with 3.7 ::shrug::

Last fiddled with by simon389 on 2019-02-07 at 18:12

2019-02-07, 18:22   #306
Mysticial
Sep 2016
331 Posts

Quote:
Originally Posted by simon389
I’m dropping all the speeds. I keep an eye on CPU-Z and even with all speeds @ 3.8 (incl. AVX and AVX512 @ 3.8) I can’t get a stable system. Failed at 7.5 hours. Trying now with 3.7 ::shrug::
wtf...

Ok, that makes things a lot more complicated... as it likely throws out the CPU AVX512-instability hypothesis. I have seen one case on my own hardware where a memory instability would only be exposed with an AVX workload. But as I've only seen it on one piece of hardware (albeit many times - enough for me to track it down), I can't say much more about it.

How memory-intensive is the AIDA64 AVX512 workload? How memory-intensive is PRP (I'm guessing just as intensive as LL)?

Have you tried other AVX512 workloads? If so, are there any that are stable?

Last fiddled with by Mysticial on 2019-02-07 at 18:23

2019-02-07, 19:07   #307
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
10010100001001₂ Posts

Quote:
Originally Posted by simon389
I’m dropping all the speeds. I keep an eye on CPU-Z and even with all speeds @ 3.8 (incl. AVX and AVX512 @ 3.8) I can’t get a stable system. Failed at 7.5 hours. Trying now with 3.7 ::shrug::
What, if anything, do the misbehaving systems have in common?

Exact same MB model and/or manufacturer? CPU model? RAM?

You seem to be in a bit of a unique situation in that you have spent a reasonable amount of coin on multiple machines, but don't have stability on many (all?) of them.

Figuring out why is valuable; both to you, and others.

2019-02-07, 19:15   #308
Mysticial
Sep 2016
331 Posts

Quote:
Originally Posted by chalsall
What, if anything, do the misbehaving systems have in common?

Exact same MB model and/or manufacturer? CPU model? RAM?

You seem to be in a bit of a unique situation in that you have spent a reasonable amount of coin on multiple machines, but don't have stability on many (all?) of them.

Figuring out why is valuable; both to you, and others.
I didn't realize there were multiple of these machines!

I guess the obvious questions/clarifications are:
  • Are all the systems identical? (same CPU, mobo, memory, etc...)
  • Are all the systems unstable?
  • If they are unstable, are they all unstable in the same way? (AIDA64 AVX512 + Prime95 PRP?)

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.