mersenneforum.org  

2017-09-13, 22:24   #34
GP2
Sep 2003
5×11×47 Posts

Quote:
Originally Posted by Prime95
BTW, you can turn off Jacobi checking, see undoc.txt. This might be a good idea for highly-reliable Amazon servers.

If the overhead is only 0.07%, as estimated in undoc.txt, then turning it off hardly seems worthwhile.

BTW, I would strongly argue that the Jacobi check of at least the final result should be mandatory, regardless of any parameter setting. Maybe even remove the JacobiErrorChecking=0 or 1 parameter altogether and just let people go ahead and set JacobiErrorCheckingInterval=N to some arbitrarily high value if they're really so inclined. But still do that final check.

The same people who overclock without due care and tip their machines into unreliability will probably be the ones most likely to turn off Jacobi checking just for the sake of it, without stopping to consider whether there's any meaningful benefit.
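
For anyone unfamiliar with the check under discussion, here is a minimal Python sketch of a Jacobi-symbol sanity check on a Lucas-Lehmer run. It relies on the identity that, starting from s_0 = 4, every later residue s_i satisfies Jacobi(s_i - 2 | 2^p - 1) = -1, so a residue that fails this test is known to be corrupt. A random corruption is caught only about half the time. The names `jacobi`, `ll_with_jacobi`, `check_every`, `corrupt_at` and `bad_value` are hypothetical and exist only for illustration; this is not prime95's actual implementation, and the unconditional final check simply mirrors the suggestion above.

```python
from typing import Optional, Tuple


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a | n) for odd positive n, by the standard binary method."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a:
        while a % 2 == 0:              # pull out factors of two
            a //= 2
            if n % 8 in (3, 5):        # (2 | n) = -1 when n is 3 or 5 mod 8
                result = -result
        a, n = n, a                    # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def ll_with_jacobi(
    p: int,
    check_every: int = 1,
    corrupt_at: Optional[int] = None,   # hypothetical: iteration to corrupt
    bad_value: int = 3,                 # hypothetical: the corrupted residue
) -> Tuple[Optional[bool], Optional[int]]:
    """Lucas-Lehmer test of 2^p - 1 (p an odd prime) with periodic Jacobi checks.

    Returns (is_prime, None) if every check passed, or (None, i) if the
    check at iteration i failed and the result cannot be trusted.
    """
    m = (1 << p) - 1
    s = 4
    last = p - 2
    for i in range(1, last + 1):
        s = (s * s - 2) % m
        if i == corrupt_at:
            s = bad_value               # simulate a hardware error
        # Check every `check_every` iterations, and always on the final one.
        if (i % check_every == 0 or i == last) and jacobi(s - 2, m) != -1:
            return None, i
    return s == 0, None
```

With these definitions, `ll_with_jacobi(13)` returns `(True, None)`, while `ll_with_jacobi(13, corrupt_at=5)` returns `(None, 5)` because the injected residue fails the check immediately.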

PS: JacobiErrorCheckingInterval=N is documented twice in undoc.txt for some reason.

Last fiddled with by GP2 on 2017-09-13 at 22:27
2017-09-13, 22:31   #35
GP2
Sep 2003
5×11×47 Posts

Quote:
Originally Posted by Prime95
Does relaunching read from a save file? If so, prime95 runs a Jacobi check and will let that save file become a .bu3 or .bu4.

Yes. During the relaunch process, a script looks for orphaned work directories in the EFS network file system. If one is found, it is "adopted"; otherwise a new work directory is created and new assignments are fetched from PrimeNet in the usual way.

However, none of the timestamps indicated in the previous messages correspond to the time of a reboot or relaunch.

I will log savefile timestamps over the next few days to see if any patterns emerge.
2017-09-13, 23:22   #36
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
5×11×137 Posts

Quote:
Originally Posted by GP2
However, none of the timestamps indicated in the previous messages correspond to the time of a reboot or relaunch.

It would correspond to the time the last save file was written before being "adopted".
2017-09-13, 23:44   #37
Gordon
Nov 2008
111110101₂ Posts

Quote:
Originally Posted by Prime95
Prime95 version 29.3 build 1 is available.

[snip]

2) The GCD step in P-1 and ECM factoring is faster.

I'm doing ECM stage 1 only in P95; the GCD already takes 0.000 seconds....
2017-09-14, 04:35   #38
bayanne
"Tony Gott"
Aug 2002
Yell, Shetland, UK
514₈ Posts

Quote:
Originally Posted by Prime95
You are my Mac guinea pig.

The .dylib is included. Try renaming it from libgmp.dylib to libgmp.10.dylib.
If that does not work, try copying it to /usr/local/lib.

Let me know what works so we can update the readme file.

Renaming it and copying it to /usr/local/lib/ worked.

The application is shown as v29.3, build 1.
2017-09-14, 05:09   #39
storm5510
Random Account
Aug 2009
2²×3×163 Posts
Event ID: 1000

Code:
Faulting application name: prime95.exe, version: 29.3.1.0, time stamp: 0x59b15479
Faulting module name: prime95.exe, version: 29.3.1.0, time stamp: 0x59b15479
Exception code: 0xc0000005
Fault offset: 0x0000000001f6a6e0
Faulting process ID: 0x86c
Faulting application start time: 0x01d32d1291406a4c
Faulting application path: C:\Prime95_293B1\prime95.exe
Faulting module path: C:\Prime95_293B1\prime95.exe
Report ID: d0d836b0-5575-42f2-991d-43697e1aefa9
Faulting package full name: 
Faulting package-relative application ID:

I found that v29.3 had stopped a short time ago. The message in front didn't indicate much. I found the above in the Application log using the Event Viewer.
2017-09-14, 11:52   #40
GP2
Sep 2003
5·11·47 Posts

Quote:
Originally Posted by Prime95
It would correspond to the time the last save file was written before being "adopted".

That's simply not the case. I will give some examples below.

Note: all of these instances were launched back on August 23/24 (the AWS console displays the instance launch times) and all were rebooted at 06:55 UTC on Sep 11, and mprime 29.3 has been running continuously on all instances since then.

These servers do nothing but run mprime, so the uptime matches the CPU time used by mprime (it's a one-core virtual machine). Here is some typical output:

Code:
$ uptime
 11:21:50 up 3 days,  4:26,  1 user,  load average: 1.00, 1.00, 1.00
$ ps -C mprime
  PID TTY          TIME CMD
 2684 ?        3-04:25:55 mprime

Here are a couple of examples that I've been tracking. Each example has four snapshots in reverse chronological order (most recent data comes first). I have dozens more examples.

Code:
p42Q9543: [Sep 14 10:29] LL (mprime): 68111035 / 76429543 [89.12%]
p42Q9543.bu: [Sep 14 09:59] LL (mprime): 68030908 / 76429543 [89.01%]
p42Q9543.bu2: [Sep 14 09:29] LL (mprime): 67950775 / 76429543 [88.91%]
p42Q9543.bu3: [Sep 14 03:59] LL (mprime): 67071461 / 76429543 [87.76%]
p42Q9543.bu4: [Sep 13 23:54] LL (mprime): 66433192 / 76429543 [86.92%]

p42Q9543: [Sep 14 00:29] LL (mprime): 66511869 / 76429543 [87.02%]
p42Q9543.bu: [Sep 13 23:59] LL (mprime): 66433193 / 76429543 [86.92%]
p42Q9543.bu2: [Sep 13 23:54] LL (mprime): 66433192 / 76429543 [86.92%]
p42Q9543.bu3: [Sep 13 15:59] LL (mprime): 65162359 / 76429543 [85.26%]
p42Q9543.bu4: [Sep 13 03:59] LL (mprime): 63248529 / 76429543 [82.75%]

p42Q9543: [Sep 13 23:29] LL (mprime): 66365194 / 76429543 [86.83%]
p42Q9543.bu: [Sep 13 22:59] LL (mprime): 66284888 / 76429543 [86.73%]
p42Q9543.bu2: [Sep 13 22:29] LL (mprime): 66204596 / 76429543 [86.62%]
p42Q9543.bu3: [Sep 13 15:59] LL (mprime): 65162359 / 76429543 [85.26%]
p42Q9543.bu4: [Sep 13 03:59] LL (mprime): 63248529 / 76429543 [82.75%]

p42Q9543: [Sep 13 18:59] LL (mprime): 65642653 / 76429543 [85.89%]
p42Q9543.bu: [Sep 13 18:29] LL (mprime): 65562329 / 76429543 [85.78%]
p42Q9543.bu2: [Sep 13 17:59] LL (mprime): 65482047 / 76429543 [85.68%]
p42Q9543.bu3: [Sep 13 15:59] LL (mprime): 65162359 / 76429543 [85.26%]
p42Q9543.bu4: [Sep 13 03:59] LL (mprime): 63248529 / 76429543 [82.75%]

Note that the timestamps of Sep 13 and 14 are long after the last reboot on Sep 11.

In the older snapshots at the bottom there is a 12-hour interval between .bu3 and .bu4, but then we see there was a savefile written at Sep 13 23:54 even though this is out of sync with the 30-minute interval for DiskWriteTime, and another savefile gets written five minutes later at Sep 13 23:59. In the most recent snapshot at the top, this 23:54 savefile has become .bu4 and the interval between .bu3 and .bu4 is no longer 12 hours.

Oddly, although there is a five-minute difference between the 23:54 and 23:59 savefiles, they differ by only one iteration! I don't know if this is some strange behavior of the network file system, or if mprime is doing something weird. It normally wouldn't take five minutes to write a file of size 9.55 MB.
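
The interval observations above can be reproduced mechanically. Here is a small Python sketch, assuming the snapshot lines keep exactly the format shown in this post. The regular expression, the helper names `parse_snapshot` and `gaps`, and the `year` default are my own choices for illustration, and the `%b` month parsing assumes an English locale. It parses one snapshot and reports the minutes and iterations between adjacent save files.

```python
import re
from datetime import datetime

# Matches lines such as:
# p42Q9543.bu2: [Sep 13 23:54] LL (mprime): 66433192 / 76429543 [86.92%]
LINE = re.compile(
    r"^(?P<name>\S+): \[(?P<stamp>\w{3} +\d+ \d\d:\d\d)\] "
    r"LL \(mprime\): (?P<iter>\d+) / (?P<total>\d+)"
)

# The second p42Q9543 snapshot from this post.
SAMPLE = """\
p42Q9543: [Sep 14 00:29] LL (mprime): 66511869 / 76429543 [87.02%]
p42Q9543.bu: [Sep 13 23:59] LL (mprime): 66433193 / 76429543 [86.92%]
p42Q9543.bu2: [Sep 13 23:54] LL (mprime): 66433192 / 76429543 [86.92%]
p42Q9543.bu3: [Sep 13 15:59] LL (mprime): 65162359 / 76429543 [85.26%]
p42Q9543.bu4: [Sep 13 03:59] LL (mprime): 63248529 / 76429543 [82.75%]
"""


def parse_snapshot(text, year=2017):
    """Return a list of (file name, timestamp, iteration) in listing order."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M")
            rows.append((m["name"], when, int(m["iter"])))
    return rows


def gaps(rows):
    """Return (newer, older, minutes apart, iterations apart) for adjacent files."""
    result = []
    for (n1, t1, i1), (n2, t2, i2) in zip(rows, rows[1:]):
        minutes = int((t1 - t2).total_seconds() // 60)
        result.append((n1, n2, minutes, i1 - i2))
    return result


if __name__ == "__main__":
    for newer, older, minutes, iters in gaps(parse_snapshot(SAMPLE)):
        print(f"{newer} -> {older}: {minutes} min, {iters} iterations")
```

For this snapshot, the pair .bu / .bu2 comes out as 5 minutes and 1 iteration, and the pair .bu3 / .bu4 as 720 minutes, matching the oddities described above.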


Code:
p8Q62549: [Sep 14 10:28] LL (mprime): 20166915 / 51862549 [38.89%]
p8Q62549.bu: [Sep 14 09:58] LL (mprime): 20047386 / 51862549 [38.65%]
p8Q62549.bu2: [Sep 14 09:28] LL (mprime): 19927848 / 51862549 [38.42%]
p8Q62549.bu3: [Sep 14 06:55] LL (mprime): 19315690 / 51862549 [37.24%]
p8Q62549.bu4: [Sep 13 18:55] LL (mprime): 16449750 / 51862549 [31.72%]

p8Q62549: [Sep 14 00:28] LL (mprime): 17778061 / 51862549 [34.28%]
p8Q62549.bu: [Sep 13 23:58] LL (mprime): 17658555 / 51862549 [34.05%]
p8Q62549.bu2: [Sep 13 23:28] LL (mprime): 17539225 / 51862549 [33.82%]
p8Q62549.bu3: [Sep 13 18:55] LL (mprime): 16449750 / 51862549 [31.72%]
p8Q62549.bu4: [Sep 13 06:55] LL (mprime): 13593769 / 51862549 [26.21%]

p8Q62549: [Sep 13 23:28] LL (mprime): 17539225 / 51862549 [33.82%]
p8Q62549.bu: [Sep 13 22:58] LL (mprime): 17419756 / 51862549 [33.59%]
p8Q62549.bu2: [Sep 13 22:28] LL (mprime): 17300385 / 51862549 [33.36%]
p8Q62549.bu3: [Sep 13 18:55] LL (mprime): 16449750 / 51862549 [31.72%]
p8Q62549.bu4: [Sep 13 06:55] LL (mprime): 13593769 / 51862549 [26.21%]

p8Q62549: [Sep 13 18:58] LL (mprime): 16464170 / 51862549 [31.75%]
p8Q62549.bu: [Sep 13 18:55] LL (mprime): 16449750 / 51862549 [31.72%]
p8Q62549.bu2: [Sep 13 18:28] LL (mprime): 16346077 / 51862549 [31.52%]
p8Q62549.bu3: [Sep 13 06:55] LL (mprime): 13593769 / 51862549 [26.21%]
p8Q62549.bu4: [Sep 13 02:54] LL (mprime): 12654927 / 51862549 [24.40%]

Again, note that the timestamps of Sep 13 and 14 are long after the last reboot on Sep 11.

Here, the oldest snapshot shows a 4-hour interval between .bu4 and .bu3 (Sep 13 02:54 and Sep 13 06:55), and a 3-minute interval between p8Q62549 and p8Q62549.bu (Sep 13 18:58 and Sep 13 18:55). In the later snapshots toward the top, there is once again a 12-hour interval between .bu3 and .bu4.
2017-09-14, 19:35   #41
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
7535₁₀ Posts

Ah! Did you get a roundoff > 0.4 error (check results.txt)? If so, prime95 can backtrack to last save file and execute a 5 minute pause (explaining only 1 iteration in 5 minutes).
2017-09-14, 21:14   #42
GP2
Sep 2003
5·11·47 Posts

Quote:
Originally Posted by Prime95
Ah! Did you get a roundoff > 0.4 error (check results.txt)? If so, prime95 can backtrack to last save file and execute a 5 minute pause (explaining only 1 iteration in 5 minutes).

No roundoff error, but results.txt does contain:

Code:
[Wed Sep 13 23:54:33 2017]
Iteration 66433192 / 76429543
FFTlen=2560K, Type=3, Arch=4, Pass1=128, Pass2=20480, clm=4 (1 core, 1 worker): 16.49 ms.  Throughput: 60.64 iter/sec.
FFTlen=2560K, Type=3, Arch=4, Pass1=128, Pass2=20480, clm=2 (1 core, 1 worker): 16.37 ms.  Throughput: 61.10 iter/sec.
... etc ...

That is, an iteration count is printed out, followed by a bunch of gwnum.txt-style benchmark data.

And more recently, the savefiles look like this:

Code:
p42Q9543: [Sep 14 20:59] LL (mprime): 69778606 / 76429543 [91.30%]
p42Q9543.bu: [Sep 14 20:54] LL (mprime): 69778605 / 76429543 [91.30%]
p42Q9543.bu2: [Sep 14 20:29] LL (mprime): 69710852 / 76429543 [91.21%]
p42Q9543.bu3: [Sep 14 15:59] LL (mprime): 68991898 / 76429543 [90.27%]
p42Q9543.bu4: [Sep 14 03:59] LL (mprime): 67071461 / 76429543 [87.76%]

and in results.txt there is:

Code:
[Thu Sep 14 20:54:33 2017]
Iteration 69778605 / 76429543
... gwnum-style benchmarking data ...

So I guess it pauses from time to time to do gwnum benchmarking, and writes a savefile before doing so? I run strategic doublechecks on exponents in all ranges, so it's certainly possible that this particular FFT length was not previously benchmarked.

Last fiddled with by GP2 on 2017-09-14 at 21:14
2017-09-14, 21:32   #43
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
1110101101111₂ Posts

Very good. Mystery solved.

The benchmark interrupted normal processing. When it resumed a save file was read and Jacobi-checked. Once the save file was Jacobi-checked it became eligible for .bu3 and .bu4.
2017-09-14, 23:12   #44
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
1110101101111₂ Posts

I'll look at restarting the 12-hour Jacobi timer after a benchmark is run.
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.