mersenneforum.org > Great Internet Mersenne Prime Search > Data > Marin's Mersenne-aries

2017-05-28, 03:31   #1464
nofaith628
Feb 2017
2·17 Posts

Taken unassigned exponents below 55M.

2017-05-28, 07:42   #1465
ewmayer
2ω=0
Sep 2002
República de California
2D7D₁₆ Posts

Quote:
Originally Posted by ewmayer
The 4 expos ~53M I grabbed are roughly 1/3rd done, each running at 21 msec/iter 2-threaded on David Stanfill's AMD Ryzen @2816K for a total throughput of 190 iters/sec. I lost nearly a day because I started them off using the new LOACC carry-math option in the as-yet-unreleased Mlucas v17, which gains 5-7% speed on Intel, but not, as it turns out, AMD (at least not on Ryzen). That mode gave a worrisome number of ROEs during the first few Miters, so I rebuilt the carry modules using HIACC and started from scratch. Once the fresh runs passed the original set I verified that all of the residues-at-time-of-suspension of the LOACC runs matched the HIACC ones. The latter have suffered only a handful of ROEs > 0.4 through the first ~16M iterations:

ewmayer@RyzenBeast:~$ grep "MaxErr = 0.4" mlucas/run*/*at
mlucas/run0/p53647547.stat:[May 15 00:20:09] M53647547 Iter# = 4370000 [ 8.15% complete] clocks = 00:00:00.000 [ 0.0216 sec/iter] Res64: 941DDF47EDABCE91. AvgMaxErr = 0.212353957. MaxErr = 0.427734375.
mlucas/run0/p53647547.stat:[May 16 02:12:57] M53647547 Iter# = 8650000 [16.12% complete] clocks = 00:00:00.000 [ 0.0215 sec/iter] Res64: FC79A210D061517F. AvgMaxErr = 0.212396012. MaxErr = 0.421875000.
mlucas/run0/p53647547.stat:[May 17 17:24:50] M53647547 Iter# = 15060000 [28.07% complete] clocks = 00:00:00.000 [ 0.0222 sec/iter] Res64: F5B5F463C3975209. AvgMaxErr = 0.212299970. MaxErr = 0.429687500.
mlucas/run1/p53648423.stat:[May 15 13:54:59] M53648423 Iter# = 6720000 [12.53% complete] clocks = 00:00:00.000 [ 0.0213 sec/iter] Res64: 10F71B565CC23530. AvgMaxErr = 0.211920335. MaxErr = 0.417968750.
mlucas/run2/p53648893.stat:[May 15 07:47:09] M53648893 Iter# = 5660000 [10.55% complete] clocks = 00:00:00.000 [ 0.0216 sec/iter] Res64: AD13CD5BB99ED5DD. AvgMaxErr = 0.211574025. MaxErr = 0.406250000.
Just finished these DCs [53647547,53648423,53648893,53648981] on the Ryzen system - all 4 final residues mismatch those of the first-test submission.

However, while the expected roundoff error warnings (due to these exponents being right at or slightly above the 2816K FFT upper limit - I forced 2816K for all via command-line FFT-length specification, which overrides the default) seem benign in terms of number and size, my earlier grep pattern above failed to alert me to something worrisome occurring on this system, namely frequent (slightly more than one per 1M iterations, on average) instantly-fatal ROEs as demonstrated below, which I only discovered on manual inspection of the final run-status files:
Code:
[May 14 13:53:53] M53648981 Iter# = 2630000 [ 4.90% complete] clocks = 00:00:00.000 [  0.0212 sec/iter] Res64: 96D0B09F8743EBA5. AvgMaxErr = 0.213127022. MaxErr = 0.281250000.
[May 14 13:57:26] M53648981 Iter# = 2640000 [ 4.92% complete] clocks = 00:00:00.000 [  0.0213 sec/iter] Res64: 01D333BEF917C27C. AvgMaxErr = 0.213169920. MaxErr = 0.312500000.
M53648981 Roundoff warning on iteration  2644687, maxerr =   0.492187500000
 Retrying iteration interval to see if roundoff error is reproducible.
Restarting M53648981 at iteration = 2640000. Res64: 01D333BEF917C27C
M53648981: using FFT length 2816K = 2883584 8-byte floats.
 this gives an average   18.604965556751598 bits per digit
Retry of iteration interval with fatal roundoff error was successful.
[May 14 14:02:39] M53648981 Iter# = 2650000 [ 4.94% complete] clocks = 00:00:00.000 [  0.0213 sec/iter] Res64: 1D5D3BF1B65111A9. AvgMaxErr = 0.213237492. MaxErr = 0.312500000.
[May 14 14:06:12] M53648981 Iter# = 2660000 [ 4.96% complete] clocks = 00:00:00.000 [  0.0213 sec/iter] Res64: D66A5FF8E31B6ECC. AvgMaxErr = 0.212939164. MaxErr = 0.281250000.
Here, "Retry of iteration interval with fatal roundoff error was successful" means that the code went back to the most-recent checkpoint file, restarted from there, and failed to encounter the same ROE (or any other kind of fatal ROE) in the ensuing rerun of the 10000-iteration interval starting from said checkpoint. If any of these 0.5ish errors were simply due to an inadequate FFT length or "unlucky" [in terms of conspiring to give an anomalously high ROE] set of FFT inputs for the iteration in question, the same retry mechanism would have reproduced the error on the retry and as a result switched to the next-larger FFT length and restarted from the same last-checkpoint file using the larger length.
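A grep for "MaxErr = 0.4" only sees the per-checkpoint summary lines, so it misses the separate "Roundoff warning" lines. The following is a minimal Python sketch of a scan that picks up both, assuming the stat-file line layout shown in the excerpt above. The function name `scan_stat_lines` and the 0.4 threshold are illustrative choices, not part of Mlucas.

```python
import re

# Checkpoint summary line, e.g.
# "... Iter# = 2630000 [...] Res64: <16 hex>. AvgMaxErr = x. MaxErr = y."
CHECKPOINT = re.compile(
    r"Iter# = (\d+) .*Res64: ([0-9A-Fa-f]{16})\..* MaxErr = (\d+\.\d+)"
)
# Separate warning line written when an iteration trips the roundoff check.
WARNING = re.compile(r"Roundoff warning on iteration\s+(\d+), maxerr =\s+(\d+\.\d+)")
RETRY_OK = "Retry of iteration interval with fatal roundoff error was successful"


def scan_stat_lines(lines, threshold=0.4):
    """Collect checkpoints with MaxErr >= threshold and all roundoff warnings."""
    high_checkpoints = []
    warnings = []
    for line in lines:
        m = WARNING.search(line)
        if m:
            warnings.append(
                {"iteration": int(m.group(1)),
                 "maxerr": float(m.group(2)),
                 "retry_ok": False}
            )
            continue
        if RETRY_OK in line and warnings:
            # Attribute the successful retry to the most recent warning.
            warnings[-1]["retry_ok"] = True
            continue
        m = CHECKPOINT.search(line)
        if m and float(m.group(3)) >= threshold:
            high_checkpoints.append((int(m.group(1)), float(m.group(3))))
    return {"high_checkpoints": high_checkpoints, "warnings": warnings}


# Lines copied from the M53648981 excerpt above.
SAMPLE = """\
[May 14 13:57:26] M53648981 Iter# = 2640000 [ 4.92% complete] clocks = 00:00:00.000 [  0.0213 sec/iter] Res64: 01D333BEF917C27C. AvgMaxErr = 0.213169920. MaxErr = 0.312500000.
M53648981 Roundoff warning on iteration  2644687, maxerr =   0.492187500000
 Retrying iteration interval to see if roundoff error is reproducible.
Restarting M53648981 at iteration = 2640000. Res64: 01D333BEF917C27C
Retry of iteration interval with fatal roundoff error was successful.
[May 14 14:02:39] M53648981 Iter# = 2650000 [ 4.94% complete] clocks = 00:00:00.000 [  0.0213 sec/iter] Res64: 1D5D3BF1B65111A9. AvgMaxErr = 0.213237492. MaxErr = 0.312500000.
""".splitlines()

report = scan_stat_lines(SAMPLE)
for w in report["warnings"]:
    print(w["iteration"], w["maxerr"], "retry ok" if w["retry_ok"] else "retry not confirmed")
```

On this excerpt the checkpoint lines stay below the threshold, which is why the original grep saw nothing, while the warning at iteration 2644687 is still reported.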

These kinds of out-of-nowhere-and-instantly-fatal errors are of the variety which I usually associate with marginal/flaky/old hardware, none of which seems to apply here - new high-end system, not overclocked. Perhaps the GPUs attached to the Ryzen mobo which David is doing his 24/7 crunching on are throwing transient glitches? I need to do more sleuthing to try to uncover the cause. While all the ensuing retry-interval attempts were successful like the above exemplar, my worry is that if such data-corruption issues are happening at all, not all of them may result in a detectable fatal ROE, i.e. some may be of the 'silent' variety w.r.t. the program's internal data-integrity checks.

So would appreciate if someone could grab 'em and do a third run on each, if at all possible with the code used being set up to print interim Res64s every 10000 iterations (or some multiple thereof), permitting cross-checking against my interim-Res64 data to localize the point of divergence in the case your final results mismatch mine.
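The cross-check described here amounts to comparing two iteration-to-Res64 maps and finding the first shared iteration where they disagree. Below is a rough Python sketch of that idea. The helper names are hypothetical, `parse_res64` assumes the Mlucas stat-line layout shown earlier in this post, and output from other programs would need its own parser.

```python
import re

# Matches the "Iter# = N ... Res64: <16 hex digits>" part of a stat-file line.
RES64_LINE = re.compile(r"Iter# = (\d+) .*Res64: ([0-9A-Fa-f]{16})")


def parse_res64(lines):
    """Build {iteration: Res64} from stat-file checkpoint lines."""
    out = {}
    for line in lines:
        m = RES64_LINE.search(line)
        if m:
            out[int(m.group(1))] = m.group(2).upper()
    return out


def first_divergence(run_a, run_b):
    """Return (last matching iteration, first mismatching iteration).

    Only iterations present in both runs are compared. The second
    element is None if the runs agree at every shared iteration.
    """
    last_match = None
    for it in sorted(set(run_a) & set(run_b)):
        if run_a[it].upper() != run_b[it].upper():
            return last_match, it
        last_match = it
    return last_match, None


# Res64 values taken from the M53648981 excerpt above.
mine = {
    2630000: "96D0B09F8743EBA5",
    2640000: "01D333BEF917C27C",
    2650000: "1D5D3BF1B65111A9",
    2660000: "D66A5FF8E31B6ECC",
}
# Artificial second run: the last residue is deliberately altered for illustration.
theirs = dict(mine)
theirs[2660000] = "0000000000000000"

print(first_divergence(mine, theirs))
```

If a mismatch shows up, the returned pair brackets the interval to rerun with finer-grained residue output.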

Last fiddled with by ewmayer on 2017-05-28 at 07:45

2017-05-28, 16:00   #1466
GP2
Sep 2003
101000011001₂ Posts

Quote:
Originally Posted by ewmayer
Just finished these DCs [53647547,53648423,53648893,53648981] on the Ryzen system - all 4 final residues mismatch those of the first-test submission.

...

So would appreciate if someone could grab 'em and do a third run on each, if at all possible with the code used being set up to print interim Res64s every 10000 iterations (or some multiple thereof), permitting cross-checking against my interim-Res64 data to localize the point of divergence in the case your final results mismatch mine.
OK, I will do the triple checks with interim residues.

2017-05-28, 21:08   #1467
ewmayer
2ω=0
Sep 2002
República de California
5×17×137 Posts

Quote:
Originally Posted by GP2
OK, I will do the triple checks with interim residues.
Thanks - here is a text file of the 1m-iter Res64s from my 4 runs, for you to grep against as your runs pass 1m multiples. If we encounter a divergence, we can do a fine-grained compare in the preceding 1m iters:
Attached Files
File Type: bz2 1m.txt.bz2 (6.1 KB, 39 views)

2017-05-29, 09:33   #1468
GP2
Sep 2003
5×11×47 Posts

Quote:
Originally Posted by ewmayer
Thanks - here is a text file of the 1m-iter Res64s from my 4 runs, for you to grep against as your runs pass 1m multiples. If we encounter a divergence, we can do a fine-grained compare in the preceding 1m iters:
OK. I'm monitoring each milestone and at the 4M mark, all four exponents match so far.

2017-05-29, 11:46   #1469
kladner
"Kieren"
Jul 2011
In My Own Galaxy!
2×3×1,693 Posts

40267771
Mismatched.

2017-05-29, 13:33   #1470
ET_
Banned
"Luigi"
Aug 2002
Team Italia
1001011010000₂ Posts

Quote:
Originally Posted by kladner
40267771
Mismatched.
...and no real P-1 done as well...

2017-05-29, 15:35   #1471
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
2930₁₀ Posts

Quote:
Originally Posted by kladner
40267771
Mismatched.
Queued. Will start tonight.

2017-05-30, 20:02   #1472
Madpoo
Serpentine Vermin Jar
Jul 2014
7×11×43 Posts

Here's a new list... it's only exponents below 60M.

Details: broken down at a monthly level, it's any machine that has no good results for that month and whose total bad results outnumber its total good results. I've included those total bad/good counts in the list (the BadT and GoodT columns) so you can see what I'm talking about (I don't think I've included those before when looking at the monthly stats for a CPU).

Each exponent listed is the smallest unknown exponent from that CPU for that month. If a machine has multiple unknowns in the month and this first one comes back bad, then we can chase down the rest. Or maybe it matches and we can chalk up a "good" result for that computer and move on to the more promising "bad" stuff:

Code:
Exponent	Bad	Good	BadT	GoodT	Unk	worktodo
40545083	0	0	25	24	1	DoubleCheck=40545083,72,1
41245613	0	0	7	6	1	DoubleCheck=41245613,73,1
41395223	0	0	7	6	1	DoubleCheck=41395223,72,1
41922103	0	0	25	24	1	DoubleCheck=41922103,72,1
42202247	0	0	1	0	1	DoubleCheck=42202247,72,1
42971651	0	0	27	21	1	DoubleCheck=42971651,72,1
43209487	0	0	3	1	2	DoubleCheck=43209487,72,1
43246309	0	0	7	6	1	DoubleCheck=43246309,72,1
43339489	0	0	2	1	2	DoubleCheck=43339489,72,1
43767511	0	0	13	11	3	DoubleCheck=43767511,72,1
44607023	0	0	1	0	1	DoubleCheck=44607023,72,1
45445843	0	0	3	2	1	DoubleCheck=45445843,72,1
45910157	0	0	1	0	1	DoubleCheck=45910157,72,1
45938873	0	0	2	1	1	DoubleCheck=45938873,73,1
46009031	0	0	2	1	1	DoubleCheck=46009031,72,1
46082471	0	0	1	0	1	DoubleCheck=46082471,72,1
46129771	0	0	11	5	1	DoubleCheck=46129771,72,1
46891709	0	0	2	1	1	DoubleCheck=46891709,72,1
47111417	0	0	5	4	1	DoubleCheck=47111417,72,1
47205713	0	0	2	1	1	DoubleCheck=47205713,72,1
47538119	0	0	5	4	1	DoubleCheck=47538119,72,1
47944483	0	0	2	1	1	DoubleCheck=47944483,72,1
48188069	0	0	3	2	4	DoubleCheck=48188069,72,1
49428013	0	0	3	2	1	DoubleCheck=49428013,72,1
49560479	0	0	4	3	6	DoubleCheck=49560479,72,1
49592923	0	0	2	1	1	DoubleCheck=49592923,72,1
49868879	0	0	13	11	3	DoubleCheck=49868879,72,1
50074643	0	0	9	5	7	DoubleCheck=50074643,73,1
50110699	0	0	3	2	1	DoubleCheck=50110699,73,1
50473831	0	0	3	2	1	DoubleCheck=50473831,73,1
50535253	0	0	2	1	1	DoubleCheck=50535253,73,1
50602247	0	0	23	21	1	DoubleCheck=50602247,73,1
50662103	0	0	2	1	2	DoubleCheck=50662103,73,1
50683939	0	0	6	5	2	DoubleCheck=50683939,73,1
50902297	0	0	23	21	1	DoubleCheck=50902297,73,1
51076063	0	0	2	1	1	DoubleCheck=51076063,73,1
51090239	0	0	3	1	3	DoubleCheck=51090239,73,1
51458051	0	0	3	2	5	DoubleCheck=51458051,73,1
51833161	0	0	23	21	1	DoubleCheck=51833161,73,1
51851221	0	0	3	2	5	DoubleCheck=51851221,73,1
51951451	1	0	37	21	15	DoubleCheck=51951451,73,1
52147999	0	0	13	11	3	DoubleCheck=52147999,73,1
52335989	0	0	6	5	1	DoubleCheck=52335989,73,1
52445741	0	0	13	11	1	DoubleCheck=52445741,73,1
52573463	0	0	13	11	3	DoubleCheck=52573463,73,1
52600231	0	0	2	1	4	DoubleCheck=52600231,73,1
52633459	0	0	9	5	8	DoubleCheck=52633459,73,1
52717579	0	0	3	1	1	DoubleCheck=52717579,73,1
52857713	0	0	40	25	6	DoubleCheck=52857713,73,1
52875601	0	0	2	1	4	DoubleCheck=52875601,73,1
52917121	0	0	2	1	1	DoubleCheck=52917121,73,1
53026177	0	0	3	2	1	DoubleCheck=53026177,73,1
53524609	0	0	23	21	1	DoubleCheck=53524609,73,1
53717549	0	0	4	3	4	DoubleCheck=53717549,73,1
53737139	0	0	4	3	4	DoubleCheck=53737139,73,1
53741773	0	0	39	29	1	DoubleCheck=53741773,73,1
54121889	0	0	2	1	3	DoubleCheck=54121889,73,1
54134383	0	0	2	1	4	DoubleCheck=54134383,73,1
54264983	0	0	36	23	4	DoubleCheck=54264983,73,1
54441767	0	0	39	29	2	DoubleCheck=54441767,73,1
54872291	0	0	3	2	2	DoubleCheck=54872291,73,1
55006123	0	0	36	23	2	DoubleCheck=55006123,73,1
55061557	0	0	2	1	4	DoubleCheck=55061557,73,1
55100737	0	0	4	3	1	DoubleCheck=55100737,73,1
55297247	0	0	39	29	1	DoubleCheck=55297247,73,1
55298843	0	0	2	1	4	DoubleCheck=55298843,73,1
55299281	0	0	39	29	1	DoubleCheck=55299281,73,1
55449209	0	0	36	23	2	DoubleCheck=55449209,73,1
55684813	0	0	2	1	2	DoubleCheck=55684813,73,1
55816259	0	0	36	23	2	DoubleCheck=55816259,73,1
55956871	0	0	36	23	2	DoubleCheck=55956871,73,1
55983827	0	0	2	1	3	DoubleCheck=55983827,73,1
56004757	0	0	2	1	4	DoubleCheck=56004757,73,1
56088629	0	0	4	3	3	DoubleCheck=56088629,73,1
56566841	0	0	3	2	1	DoubleCheck=56566841,73,1
56604923	0	0	4	3	3	DoubleCheck=56604923,73,1
56644061	0	0	3	2	2	DoubleCheck=56644061,73,1
56736437	0	0	36	23	2	DoubleCheck=56736437,73,1
57091651	0	0	39	29	4	DoubleCheck=57091651,73,1
57143657	0	0	23	21	1	DoubleCheck=57143657,73,1
57255067	0	0	2	1	3	DoubleCheck=57255067,73,1
57374939	0	0	36	23	2	DoubleCheck=57374939,73,1
57404041	0	0	3	2	1	DoubleCheck=57404041,73,1
57423991	0	0	2	1	1	DoubleCheck=57423991,73,1
57435919	0	0	37	21	6	DoubleCheck=57435919,73,1
57580739	0	0	3	2	1	DoubleCheck=57580739,73,1
57607213	0	0	4	3	1	DoubleCheck=57607213,73,1
57655019	0	0	2	1	2	DoubleCheck=57655019,73,1
57885211	0	0	19	14	2	DoubleCheck=57885211,73,1
57995117	0	0	3	2	2	DoubleCheck=57995117,73,1
58351453	0	0	23	21	1	DoubleCheck=58351453,73,1
58382663	0	0	36	23	1	DoubleCheck=58382663,73,1
58463819	0	0	36	23	2	DoubleCheck=58463819,73,1
59332439	0	0	4	3	4	DoubleCheck=59332439,73,1
59378471	0	0	39	29	4	DoubleCheck=59378471,73,1
59485343	0	0	2	1	1	DoubleCheck=59485343,75,1
59574169	0	0	4	3	3	DoubleCheck=59574169,73,1
59626481	0	0	3	2	2	DoubleCheck=59626481,73,1
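For anyone who wants to claim a slice of the list by exponent range, here is a small Python sketch that parses rows in the layout above and pulls out the worktodo entries for a chosen range. The `Row` class and function names are made up for illustration, and the column meanings are assumed from the header line and the description above.

```python
from dataclasses import dataclass


@dataclass
class Row:
    exponent: int
    bad: int         # bad results from that machine in the month
    good: int        # good results from that machine in the month
    bad_total: int   # BadT column
    good_total: int  # GoodT column
    unknown: int     # Unk column
    worktodo: str    # ready-made worktodo entry


def parse_rows(text):
    """Parse whitespace- or tab-separated rows, skipping the header line."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        nums = [int(f) for f in fields[:6]]
        rows.append(Row(*nums, fields[6]))
    return rows


def worktodo_lines(rows, lo=0, hi=60_000_000):
    """Return worktodo entries for exponents in the half-open range [lo, hi)."""
    return [r.worktodo for r in rows if lo <= r.exponent < hi]


# A few rows copied from the list above.
SAMPLE = """\
Exponent Bad Good BadT GoodT Unk worktodo
40545083 0 0 25 24 1 DoubleCheck=40545083,72,1
49868879 0 0 13 11 3 DoubleCheck=49868879,72,1
50074643 0 0 9 5 7 DoubleCheck=50074643,73,1
57580739 0 0 3 2 1 DoubleCheck=57580739,73,1
"""

rows = parse_rows(SAMPLE)
below_50m = worktodo_lines(rows, hi=50_000_000)
print("\n".join(below_50m))
```

The same rows can be checked against the stated selection rule, i.e. no good results in the month and BadT greater than GoodT.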

2017-05-30, 22:04   #1473
rudi_m
Jul 2005
2·7·13 Posts

Quote:
Originally Posted by Madpoo
Here's a new list... it's only exponents below 60M.
I took all below 50M.

2017-05-30, 22:14   #1474
ewmayer
2ω=0
Sep 2002
República de California
5·17·137 Posts

Quote:
Originally Posted by Madpoo
Here's a new list... it's only exponents below 60M.
Grabbed these:

57580739
57607213
57655019
57885211
57995117
58351453
58382663
58463819

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.