mersenneforum.org  

Old 2019-01-08, 03:19   #133
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁴×3×107 Posts
Default

Quote:
Originally Posted by kladner View Post
Just to keep things separate, this post has a modified "p95 screen cap.zip", which is "p95 screen cap 02.zip". It has an additional P95 worker window capture, with the output of an additional hang pasted onto the end of p95 screen cap.txt. It also has the latest version of results.txt.
So, no trouble reproducing the issue, I take it. What OS version, CPU type, etc. did these occur on, or is that shown in your attachments?
Old 2019-01-08, 12:31   #134
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts
Default

Quote:
Originally Posted by kriesel View Post
So, no trouble reproducing the issue, I take it. What OS version, CPU type, etc. did these occur on, or is that shown in your attachments?
Windows 7 Pro, 64 bit, i7-6700K, 4300 MHz, 16GiB RAM, 3200 MHz, Asus SABERTOOTH-Z170-MARK-1
Let me know if there is other information that might be helpful.
Old 2019-01-08, 20:52   #135
GP2
 
Sep 2003

2·5·7·37 Posts
Default Bad residues from 29.5 for b=10 and some p=small

All builds of version 29.5 produce incorrect Type 1 residues for some small exponents for base 10.

I didn't test Type 5, since I think there were some other issues reported for them in 29.4.

Note that there is no Gerbicz error checking for bases other than Mersenne and Wagstaff.

The smallest exponent affected is 1543:

Code:
[Tue Jan  8 19:17:21 2019]
{"status":"C", "k":1, "b":10, "n":1543, "c":-1, "known-factors":"9", "worktype":"PRP-3",
"res64":"2FC2CEC17A45A7A7", "residue-type":1, "fft-length":256, "error-code":"00000000", "security-code":"0C0E0C0E", "program":{"name":"Prime95",
"version":"29.4", "build":5, "port":8}, "timestamp":"2019-01-08 19:17:21", ...}
[Tue Jan  8 19:17:58 2019]
{"status":"C", "k":1, "b":10, "n":1543, "c":-1, "known-factors":"9", "worktype":"PRP-3",
"res64":"2FC2CEC17A45A7A7", "residue-type":1, "fft-length":256, "error-code":"00000000", "security-code":"0C0E0C0E", "program":{"name":"Prime95",
"version":"29.4", "build":8, "port":8}, "timestamp":"2019-01-08 19:17:58", ...}

[Tue Jan  8 19:18:08 2019]
{"status":"C", "k":1, "b":10, "n":1543, "c":-1, "known-factors":"9", "worktype":"PRP-3",
"res64":"2997FDD526659E47", "residue-type":1, "res2048":"...", "fft-length":1024, "error-code":"00000000", "security-code":"0C0E0C0E", "program":{"name":"Prime95",
"version":"29.5", "build":2, "port":8}, "timestamp":"2019-01-08 19:18:08", ...}
[Tue Jan  8 19:18:46 2019]
{"status":"C", "k":1, "b":10, "n":1543, "c":-1, "known-factors":"9", "worktype":"PRP-3",
"res64":"2997FDD526659E47", "residue-type":1, "res2048":"...", "fft-length":1024, "error-code":"00000000", "security-code":"0C0E0C0E", "program":{"name":"Prime95",
"version":"29.5", "build":6, "port":8}, "timestamp":"2019-01-08 19:18:46", ...}
Very small exponents like this can be independently verified with, for instance, Python, and that shows that version 29.4 was correct:

Code:
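# Type 1 PRP residue: low 64 bits of a^(N-1) mod N, where N = (b^p - 1)/(b - 1)
def res64(a, b, p):
    n = (b**p - 1) // (b - 1)  # the number under test, e.g. a base-10 repunit
    print("{:016X}".format(pow(a, n - 1, n) & ((1 << 64) - 1)))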

>>> res64(3, 10, 1543)
2FC2CEC17A45A7A7
This doesn't affect all small exponents. Below 1543 everything is good, from 1543 to 2861 everything is bad; 2879 is good; 2887 and 2897 are bad; 2903 is good; 2909, 2917, 2927, 2939, 2953 are bad; 2957 to at least 3000 is good. Beyond that I didn't test everything, but 12413 is bad.
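The comparison described above can be automated for exponents small enough to check directly. What follows is only a rough sketch, not Prime95 code: `reference_res64` and `matches_reference` are hypothetical helper names, and it assumes that the PRP base can be read from the `"worktype"` field (`"PRP-3"` means base 3) and that `"known-factors"` is a comma-separated string of integers. The record fields mirror the JSON lines quoted earlier in this post.

```python
from math import prod


def reference_res64(k, b, n, c, known_factors, prp_base):
    """Type 1 residue: low 64 bits of prp_base^(N-1) mod N, as 16 hex digits.

    N = (k*b^n + c) / product(known_factors). Hypothetical helper for
    cross-checking small exponents; it is not how Prime95 computes residues.
    """
    numerator = k * b**n + c
    divisor = prod(known_factors)
    if numerator % divisor != 0:
        raise ValueError("known factors do not divide k*b^n+c")
    N = numerator // divisor
    residue = pow(prp_base, N - 1, N)
    return "{:016X}".format(residue & ((1 << 64) - 1))


def matches_reference(result):
    """Compare a results.txt-style record (as a dict) with the reference value.

    Assumptions: "worktype" looks like "PRP-3" and "known-factors" is a
    comma-separated string of integers.
    """
    factors = [int(f) for f in result["known-factors"].split(",")]
    prp_base = int(result["worktype"].split("-")[1])
    expected = reference_res64(result["k"], result["b"], result["n"],
                               result["c"], factors, prp_base)
    return expected == result["res64"].upper()


# The 29.4 record for exponent 1543, reduced to the fields that matter here.
record = {"k": 1, "b": 10, "n": 1543, "c": -1, "known-factors": "9",
          "worktype": "PRP-3", "res64": "2FC2CEC17A45A7A7"}
print(matches_reference(record))
```

If the Python session above is right, this should report a match for the 29.4 residue and a mismatch when `"res64"` is replaced with the 29.5 value. Full modular exponentiation like this is only practical for very small exponents.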

Everything from 400k to 500k is good. I double-checked the residues from 29.5 against the results of the repunit prime project and got only three mismatches, but the triple check with 29.4 gave the same results as 29.5.

The residues seem to be good when the non-standard PRP base = 2 is used, but I didn't test those extensively.

Apart from this, I also have been doing a very large number of type-5 Wagstaff PRP tests, currently in the 7M range, with the various builds of 29.5. These are Gerbicz error checked and there has never been an error reported.

Although very small exponents p and bases b other than Mersenne are not a priority, it's conceivable that the same issue might arise, although much more rarely, for larger exponents or for LL tests. I remember the "shift-count > p minus 64" bug which was much more likely to strike small exponents, but also affected one LL test for M37830997.

Last fiddled with by GP2 on 2019-01-08 at 20:56 Reason: reformatted to make relevant information stand out better
Old 2019-01-08, 22:58   #136
GP2
 
Sep 2003

2·5·7·37 Posts
Default

Here is the worktodo line to run Type 1 residues with base b=10.

For exponent 1543, it's:

Code:
PRP=1,10,1543,-1,99,0,3,1,"9"
If you omit the ,99,0,3,1 section in the middle, it will do a Type 5 residue instead by default.

Changing the ,3, to ,2, would do a PRP test with base 2 instead of 3.
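For generating many such lines, a small helper along these lines may be convenient. This is only a sketch: `build_prp_worktodo` is a hypothetical name, the parameter names `how_far_factored` and `tests_saved` for the `99` and `0` fields are my assumption about what those positions mean, and joining several known factors with commas inside the quotes is likewise an assumption. Only the overall shape of the line is taken from the example above.

```python
def build_prp_worktodo(k, b, n, c, prp_base=None, residue_type=None,
                       how_far_factored=99, tests_saved=0, known_factors=()):
    """Build a PRP worktodo line shaped like PRP=1,10,1543,-1,99,0,3,1,"9".

    If prp_base or residue_type is left out, the whole middle section is
    omitted and the program falls back to its defaults (Type 5 per the post).
    The names how_far_factored and tests_saved are assumed, not confirmed.
    """
    fields = [k, b, n, c]
    if prp_base is not None and residue_type is not None:
        fields += [how_far_factored, tests_saved, prp_base, residue_type]
    line = "PRP=" + ",".join(str(f) for f in fields)
    if known_factors:
        # Assumption: multiple factors are comma-separated inside the quotes.
        line += ',"' + ",".join(str(f) for f in known_factors) + '"'
    return line


# Type 1 residue, PRP base 3, for the repunit (10^1543-1)/9.
print(build_prp_worktodo(1, 10, 1543, -1, prp_base=3, residue_type=1,
                         known_factors=[9]))
```

Passing `prp_base=2` produces the base-2 variant, and leaving both optional arguments out drops the middle section.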
Old 2019-01-09, 01:01   #137
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2²×1,873 Posts
Default

Quote:
Originally Posted by GP2 View Post
Code:
PRP=1,10,1543,-1,99,0,3,1,"9"
Works for me! Assuming this is an AVX-512 CPU, prime95 should select a length 256 FMA3 FFT. Your output indicates a length 1024 FFT is chosen.
Old 2019-01-09, 01:16   #138
GP2
 
Sep 2003

2590₁₀ Posts
Default

Quote:
Originally Posted by Prime95 View Post
Works for me! Assuming this is an AVX-512 CPU, prime95 should select a length 256 FMA3 FFT. Your output indicates a length 1024 FFT is chosen.
I am using p95v295b6.linux64.tar.gz straight from the download directory.

I guess this fix didn't make it into the Linux build?

Quote:
Originally Posted by Prime95 View Post
29.5 build 6.

1) Fixed testing tiny numbers with AVX-512 FFTs, numbers less than about 5120 bits are forced to use FMA FFTs.
For cksum I get:
Code:
cksum mprime
110777695 39985144 mprime
I also tried build 2 and got the same bad residue.

This is on a c5.xlarge instance on AWS.
Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz


I just tried it again:

Code:
[Main thread Jan 9 01:10] Mersenne number primality test program version 29.5
[Main thread Jan 9 01:10] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 1 MB, L3 cache size: 25344 KB
[Main thread Jan 9 01:10] Starting worker.
[Work thread Jan 9 01:10] Worker starting
[Work thread Jan 9 01:10] Setting affinity to run worker on CPU core #1
[Work thread Jan 9 01:10] Starting PRP test of (10^1543-1)/9 using AVX-512 FFT length 1K
[Work thread Jan 9 01:10] (10^1543-1)/9 is not prime.  RES64: 2997FDD526659E47. Wh8: 0C0E0C0E,00000000
[Work thread Jan 9 01:10] No work to do at the present time.  Waiting.
Code:
[Wed Jan  9 01:10:07 2019]
{"status":"C", "k":1, "b":10, "n":1543, "c":-1, "known-factors":"9", "worktype":"PRP-3", "res64":"2997FDD526659E47", "residue-type":1, "res2048":"...", "fft-length":1024, "error-code":"00000000", "security-code":"0C0E0C0E", "program":{"name":"Prime95", "version":"29.5", "build":6, "port":8}, "timestamp":"2019-01-09 01:10:07", ...}

Last fiddled with by GP2 on 2019-01-09 at 01:35
Old 2019-01-09, 01:22   #139
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2²×1,873 Posts
Default

Grasping at straws. Any chance the benchmark problem could be related to problems with BIOS patches for Spectre, Meltdown, etc.?

Ken, you seem to have the machine that can most easily reproduce the bug. Would you be willing to try 1) disabling HT in the BIOS and trying again, and 2) installing the latest BIOS for your motherboard and retrying?

Dell Computers, for example, had this advice early last year:

Code:
Patch Guidance (update 2018-01-22):
Intel has communicated new guidance regarding "reboot issues and unpredictable system behavior" with the microcode included in the BIOS updates released to address Spectre (Variant 2), CVE-2017-5715. Dell is advising that all customers should not deploy the BIOS update for the Spectre (Variant 2) vulnerability at this time. We have removed the impacted BIOS updates from our support pages and are working with Intel on a new BIOS update that will include new microcode from Intel.

If you have already deployed the BIOS update, in order to avoid unpredictable system behavior, you can revert back to a previous BIOS version. See the tables below.

As a reminder, the Operating System patches are not impacted and still provide mitigation to Spectre (Variant 1) and Meltdown (Variant 3). The microcode update is only required for Spectre (Variant 2), CVE-2017-5715.
Old 2019-01-09, 01:43   #140
GP2
 
Sep 2003

2·5·7·37 Posts
Default

Quote:
Originally Posted by Prime95 View Post
29.5 build 6.

1) Fixed testing tiny numbers with AVX-512 FFTs, numbers less than about 5120 bits are forced to use FMA FFTs.

...

Linux 64-bit: ftp://mersenne.org/gimps/p95v295b6.linux64.tar.gz
One hypothesis is that fix number 1) above somehow didn't make it into the Linux build.

However 10^12413-1 is much larger than 5120 bits. I think the exponent 16187 also gives a divergent result.

So maybe there is some other cause, or maybe the threshold needs to be considerably higher than 5120 bits.

Edit:
I'm pretty sure that's the answer, because ((10**1543-1)//9).bit_length() = 5123

The problem starts happening as soon as the 5120-bit threshold is exceeded. All the exponents smaller than 1543 were good.

And this confirms it:
Code:
[Main thread Jan 9 01:50] Mersenne number primality test program version 29.5
[Main thread Jan 9 01:50] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 1 MB, L3 cache size: 25344 KB
[Main thread Jan 9 01:50] Starting worker.
[Work thread Jan 9 01:50] Worker starting
[Work thread Jan 9 01:50] Setting affinity to run worker on CPU core #1
[Work thread Jan 9 01:50] Starting PRP test of (10^1531-1)/9 using FMA3 FFT length 256
[Work thread Jan 9 01:50] (10^1531-1)/9 is not prime.  RES64: 7D3E4C1A504AD151. Wh8: 0BF60BF6,00000000
[Work thread Jan 9 01:50] Starting PRP test of (10^1543-1)/9 using AVX-512 FFT length 1K
[Work thread Jan 9 01:50] (10^1543-1)/9 is not prime.  RES64: 2997FDD526659E47. Wh8: 0C0E0C0E,00000000
[Work thread Jan 9 01:50] No work to do at the present time.  Waiting.
1531 has a good residue, 1543 has a bad residue.
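The bit-length arithmetic behind this is easy to reproduce. A minimal sketch, assuming the cutoff is exactly 5120 bits (the release note only says "about 5120 bits"); `repunit_bits` and `THRESHOLD_BITS` are names made up for this illustration:

```python
def repunit_bits(p, b=10):
    """Bit length of the repunit (b^p - 1)/(b - 1)."""
    return ((b**p - 1) // (b - 1)).bit_length()


# Assumed cutoff; the build 6 note says "about 5120 bits".
THRESHOLD_BITS = 5120

for p in (1531, 1543, 12413):
    bits = repunit_bits(p)
    side = "above" if bits > THRESHOLD_BITS else "at or below"
    print(p, bits, side, "the threshold")
```

1543 is the first prime exponent whose repunit lands above the assumed cutoff, which lines up with where the bad residues start.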

Last fiddled with by GP2 on 2019-01-09 at 01:52
Old 2019-01-09, 02:36   #141
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

16504₈ Posts
Default

Quote:
Originally Posted by GP2 View Post

However 10^12413-1 is much larger than 5120 bits. I think the exponent 16187 also gives a divergent result.
My mistake, I had CpuSupportsAVX512F=0 set for a different problem. I'll have a fix for build 7.
Old 2019-01-09, 15:54   #142
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁴·3·107 Posts
Default

Quote:
Originally Posted by Prime95 View Post
Grasping at straws. Any chance the benchmark problem could be related to problems with BIOS patches for Spectre, Meltdown, etc.?

Ken, you seem to have the machine that can most easily reproduce the bug. Would you be willing to try 1) disabling HT in the BIOS and trying again, and 2) installing the latest BIOS for your motherboard and retrying?

Dell Computers, for example, had this advice early last year:

Code:
Patch Guidance (update 2018-01-22):
Intel has communicated new guidance regarding "reboot issues and unpredictable system behavior" with the microcode included in the BIOS updates released to address Spectre (Variant 2), CVE-2017-5715. Dell is advising that all customers should not deploy the BIOS update for the Spectre (Variant 2) vulnerability at this time. We have removed the impacted BIOS updates from our support pages and are working with Intel on a new BIOS update that will include new microcode from Intel.

If you have already deployed the BIOS update, in order to avoid unpredictable system behavior, you can revert back to a previous BIOS version. See the tables below.

  As a reminder, the Operating System patches are not impacted and still provide mitigation to Spectre (Variant 1) and Meltdown (Variant 3). The microcode update is only required for Spectre (Variant 2), CVE-2017-5715.
Lucky me, mine's "reliable". Thanks for digging into it. (And it's kind of spooky, that you can know, in Florida, to check the BIOS update state, while my laptop is in a house in Wisconsin, with the shades closed, displaying a notice about a BIOS update. ;)

Interestingly, that Dell advisory was about when my i7-7500U (not showing a stall issue) was getting a replacement cpu fan under warranty.

The i7-8750H is only a month old. On it, I am doing:
1) shut down all apps
2) check backups are current
3) apply pending Win 10 updates and restart
4) apply pending DELL BIOS update and restart
5) retest prime95 v29.5b6 benchmarking in Win 10 (still stalls)

6) test prime95 v29.4b8 benchmarking in Win 10 (running now, 1024k start, 1440k and continuing, end point 32768k)
7) attempt USB boot in Ubuntu 18 and v29.5b6 mprime benchmark
8) and maybe HT off in the BIOS and retry 6 & 7
9) any further suggestions in the interim
10) report an update
Attached thumbnail: peregrine-postupdates-benchmark-hang.png
Old 2019-01-09, 19:24   #143
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1D44₁₆ Posts
Default

Quote:
Originally Posted by kriesel View Post
4) apply pending DELL BIOS update and restart
5) retest prime95 v29.5b6 benchmarking in Win 10 (still stalls)
Yes, that was a long shot. Thanks for trying.

Did you ever check whether 29.5b6 hangs in day-to-day normal use (1 or 3 workers, with or without hyperthreading)? I think it should hang there too.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.