#859
"Seth"
Apr 2019
2×3×83 Posts
Thanks for the quick double-check. Weird that this is happening only on my machine :shrug: computers.
I restarted the machine with a lower memory speed, and after initially thinking that worked, I discovered I had set SumInputsErrorCheck=0 in prime.txt. The SUMOUT error still reproduces.
#860
P90 years forever!
Aug 2002
Yeehaw, FL
17·487 Posts
#861
P90 years forever!
Aug 2002
Yeehaw, FL
17×487 Posts
For now, do not use "SumInputsErrorCheck=1" with ECM.
#862
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7821₁₀ Posts
Prime95 v30.8b15, on a Win7 / dual Xeon E5645 ECC RAM system:
One worker is having serious issues (the other is fine). Some excerpts:
Comm window:
Code:
[Comm thread Feb 26 16:41:25] Sending expected completion date for M333043493: Jun 19 2023
[Comm thread Feb 26 16:41:25] Sending expected completion date for M74218931: Feb 28 2023
[Comm thread Feb 26 16:41:25] Done communicating with server.
[Main thread Feb 27 10:52:54] In write_gwnum, unexpected gwtogiant failure, retcode -1
[Main thread Feb 27 11:22:54] In write_gwnum, unexpected gwtogiant failure, retcode -1
[Main thread Feb 27 11:52:54] In write_gwnum, unexpected gwtogiant failure, retcode -1
[Main thread Feb 27 12:22:54] In write_gwnum, unexpected gwtogiant failure, retcode -1
Worker 1 window:
Code:
[Mon Feb 27 10:52:54 2023] In write_gwnum, unexpected gwtogiant failure, retcode -1
Error writing intermediate file: p333043493
Errno: 34, Result too large
DOSerrno: 2
[Mon Feb 27 11:22:54 2023] In write_gwnum, unexpected gwtogiant failure, retcode -1
Error writing intermediate file: p333043493
(This continues for about a week, without a save file ever being written successfully.)
Stop worker, rename p333043493 to wasp333043493, resume worker from p333043493.bu, cross fingers... The Jacobi check passed during the restart.
In the directory C:\Users\ken\Documents\prime95x64:
Code:
02/27/2023 10:22 AM 41,630,508 p333043493.bu
02/27/2023 09:52 AM 41,630,508 p333043493.bu2
02/27/2023 09:28 AM 41,630,508 p333043493.bu3
02/26/2023 09:28 PM 41,630,508 p333043493.bu4
02/27/2023 07:26 PM 41,630,508 wasp333043493
Same problem with a restart from .bu2 as well:
Code:
[Mar 5 13:02:13] Worker starting
[Mar 5 13:02:13] Setting affinity to run worker on CPU core #1
[Mar 5 13:02:15] Setting affinity to run helper thread 1 on CPU core #2
[Mar 5 13:02:15] Setting affinity to run helper thread 2 on CPU core #3
[Mar 5 13:02:15] Setting affinity to run helper thread 3 on CPU core #4
[Mar 5 13:02:15] Setting affinity to run helper thread 4 on CPU core #5
[Mar 5 13:02:15] Setting affinity to run helper thread 5 on CPU core #6
[Mar 5 13:02:15] Trying backup intermediate file: p333043493.bu2
[Mar 5 13:02:19] Running Jacobi error check. Passed. Time: 354.439 sec.
[Mar 5 13:08:12] Resuming primality test of M333043493 using FFT length 18M, Pass1=4K, Pass2=4608, clm=4, 6 threads
[Mar 5 13:08:12] Iteration: 214768550 / 333043493 [64.48%].
[Mar 5 13:11:54] Iteration: 214770000 / 333043493 [64.48%], ms/iter: 153.382, ETA: 209d 23:10
[Mar 5 13:37:45] Iteration: 214780000 / 333043493 [64.49%], ms/iter: 154.785, ETA: 211d 20:49
[Mar 5 14:02:13] Error writing intermediate file: p333043493
[Mar 5 14:02:13] Errno: 34, Result too large
[Mar 5 14:02:13] DOSerrno: 2
[Mar 5 14:03:23] Iteration: 214790000 / 333043493 [64.49%], ms/iter: 153.811, ETA: 210d 12:24
[Mar 5 14:29:01] Iteration: 214800000 / 333043493 [64.49%], ms/iter: 153.839, ETA: 210d 12:54
[Mar 5 14:32:13] Error writing intermediate file: p333043493
[Mar 5 14:54:35] Iteration: 214810000 / 333043493 [64.49%], ms/iter: 153.362, ETA: 209d 20:49
[Mar 5 15:02:13] Error writing intermediate file: p333043493
[Mar 5 15:15:37] Error writing intermediate file: p333043493
[Mar 5 15:15:37] Stopping primality test of M333043493 at iteration 214818176 [64.50%]
[Mar 5 15:15:37] Worker stopped.
Temperatures of the CPU cores and RAM sticks look OK to me; all max readings are 78C or lower. Trying a start from .bu3 now. I also have a .bu4, but it's not looking promising after all 3 of the later files failed. Copies of p333043493, .bu3, and .bu4 are saved.
If those don't work on that hardware either, I could try copying some over to another system/version/CPU instruction set and retry one more time.
Last fiddled with by kriesel on 2023-03-05 at 22:00
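For a sense of what losing this run costs: the ETAs in the log can be reproduced by multiplying the remaining iterations by the displayed ms/iter. That formula is my assumption about how the ETA is derived, not something taken from prime95's source; the function name `eta` is hypothetical.

```python
TOTAL_ITERS = 333_043_493  # the log's progress counter runs to the exponent


def eta(iteration: int, ms_per_iter: float, total: int = TOTAL_ITERS) -> str:
    """Remaining time as 'Dd HH:MM', assuming ETA = remaining iterations x ms/iter."""
    seconds = (total - iteration) * ms_per_iter / 1000
    days, rem = divmod(int(seconds), 86_400)
    hours, rem = divmod(rem, 3_600)
    return f"{days}d {hours:02d}:{rem // 60:02d}"


# Inputs copied from two "Iteration:" lines in the Mar 5 log.
print(eta(214_770_000, 153.382))
print(eta(214_780_000, 154.785))
```

The first line matches the logged 209d 23:10 exactly. The second gives 211d 20:50 against the logged 20:49, which is consistent with ms/iter being rounded to three decimals in the display. Either way, roughly 118M iterations at about 154 ms each means around 210 more days on this hardware.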
#863
Dec 2022
3×13² Posts
I can't deny this is a serious issue, but if other attempts fail, it seems you should be able to complete it (on some machine) by not writing any more save files. That can't be any worse than abandoning it, and, as you say, it's what was happening for a week before you restarted. (It would be nice if prime95 workers could be paused rather than stopped, retaining all current state in memory.)
#864
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3²·11·79 Posts
Re post 862: Maybe George would like to weigh in on what is happening there & here, or receive & examine a save file.
On the original (SSE2) system, the .bu4 file (the oldest available save file from the run) also resulted in an iteration loop. A separate continuation attempt from the .bu4 file on different hardware, a Xeon Phi 7250 (AVX512) with ECC MCDRAM running Windows 10 Pro and prime95 v30.8b14 (worker 2 of 2), also looped:
Quote:
Lessons learned:
- 100M-digit LL is hard. It's harder on slow hardware.
- All the software error checks and ECC RAM don't solve everything.
- If on prime95/mprime, at the outset, for more restart possibilities and for comparison between runs:
  - four backup files are not deep enough
  - set InterimFiles=50000000 or similar in prime.txt
  - set InterimResidues=10000000 or similar in prime.txt
- Some errors cannot be recovered from with the .buN files, and they are not apparent until too late.

Parallel LL runs have been started from scratch: one on prime95 v30.8b14 on the Xeon Phi 7250, and another in gpuowl v6.11-380 on a highly reliable Radeon VII. The GPU is indicating about 3.4 times the speed of the prime95 worker. Gpuowl outputs interim residues every 10k iterations; the prime95 worker is currently set for every 1M.
Last fiddled with by kriesel on 2023-03-07 at 22:20
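A back-of-envelope sketch of what those two prime.txt settings would buy on this exponent. It assumes each setting fires once every N iterations and that InterimFiles snapshots are kept rather than rotated like the four .buN files; that is my reading, not verified against prime95's source. `checkpoints` is a hypothetical helper.

```python
import math

EXPONENT = 333_043_493  # the log's progress counter runs to p


def checkpoints(interval: int, total: int = EXPONENT) -> list[int]:
    """Iterations at which an every-`interval` event fires (assumed model)."""
    return list(range(interval, total, interval))


interim_files = checkpoints(50_000_000)     # InterimFiles=50000000
interim_residues = checkpoints(10_000_000)  # InterimResidues=10000000

# Newest kept snapshot at or before the iteration the .bu2 restart resumed from.
trouble_iter = 214_768_550
fallback = max(i for i in interim_files if i <= trouble_iter)

# gpuowl prints a residue every 10k iterations and this prime95 worker every 1M,
# so the two runs can be cross-checked at every common multiple.
compare_every = math.lcm(10_000, 1_000_000)

print(len(interim_files), "kept save files; newest before the trouble:", fallback)
print(len(interim_residues), "interim residues over the whole test")
print("residues comparable every", compare_every, "iterations")
```

Under those assumptions, a bad stretch near iteration 214.77M could still be retried from the 200M snapshot instead of depending only on the most recent backups.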
#865
"Jacob"
Sep 2006
Brussels, Belgium
3642₈ Posts
Quote:
According to DOS Error Numbers, "DOSerrno: 2" would be "File not found". What is the size of the pXXXXXXXXX.* files? What is the filesystem of the location the program writes to? Could it be a file-size limitation?
Also, only the first of the recurring "1 ILLEGAL SUMOUT/bad FFT data." messages corresponds to an error actually occurring; from undoc.txt:
Quote:
#866
Dec 2022
773₈ Posts
As his post already indicated, the file sizes are just over 40 MB (essentially one residue), he has written many of them before, and he has plenty of free disk space. So it would seem impossible that file size or disk space has anything to do with it, though a hard disk error seems possible, as the trouble apparently started with writing and reading save files.
Otherwise, the options seem to be a genuine hardware error, a rare bug in prime95/gwnum, or both. A "file not found" error, if that is what it is, would seem to implicate prime95 or the OS, but when it accompanies another error, I wouldn't trust it to be reported sensibly.
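The two numbers in those log lines come from different places, as I understand the Windows C runtime: "Errno" is the C errno, where 34 is ERANGE ("Result too large" in the MSVC runtime), while "DOSerrno" is the runtime's _doserrno, which holds a Windows system error code, where 2 is ERROR_FILE_NOT_FOUND, matching Jacob's lookup. A minimal decoder sketch; the one-entry Windows table is hard-coded so it runs on any OS, `decode` is a hypothetical name, and the `os.strerror` wording varies by platform.

```python
import errno
import os

# Windows system error code ERROR_FILE_NOT_FOUND; hard-coded (one entry only)
# so this runs on non-Windows machines too.
WINDOWS_SYSTEM_ERRORS = {2: "ERROR_FILE_NOT_FOUND"}


def decode(crt_errno: int, dos_errno: int) -> tuple[str, str, str]:
    """Map prime95's 'Errno: X' and 'DOSerrno: Y' to symbolic names.

    Returns (C errno name, local strerror text, Windows error name).
    """
    crt_name = errno.errorcode.get(crt_errno, "unknown")
    dos_name = WINDOWS_SYSTEM_ERRORS.get(dos_errno, "unknown")
    return crt_name, os.strerror(crt_errno), dos_name


print(decode(34, 2))  # values from the "Error writing intermediate file" lines
```

Decoding them only says what was reported; as noted above, an errno left over from an earlier failure can be misleading, so neither value proves the disk write itself was at fault.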
#867
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7821₁₀ Posts
Quote:
333043493 / 8 / 1024 / 1024 = 39.7 MiB. (FAT32 is not for boot disks, or for data I care about, when there are better alternatives, IMO. I use it reluctantly on USB memory sticks, where it sometimes prevents storing big files.) It has occurred with the same input data on two different systems, so it's unlikely to be a drive hardware error.
Last fiddled with by kriesel on 2023-03-08 at 14:48
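The size arithmetic and the FAT32 question can be checked in a few lines. This sketch assumes the save file is dominated by the raw p-bit residue; what the few remaining bytes contain is not established in the thread. FAT32's per-file limit is 4 GiB minus one byte.

```python
EXPONENT = 333_043_493
OBSERVED_FILE_BYTES = 41_630_508   # every save file in the #862 directory listing
FAT32_MAX_FILE_BYTES = 2**32 - 1   # FAT32 per-file limit: 4 GiB minus 1 byte

residue_bytes = (EXPONENT + 7) // 8   # a p-bit residue, rounded up to whole bytes
residue_mib = residue_bytes / 1024 / 1024
extra_bytes = OBSERVED_FILE_BYTES - residue_bytes

print(f"residue: {residue_bytes:,} bytes = {residue_mib:.1f} MiB")
print(f"save file holds {extra_bytes} bytes beyond the raw residue")
print(f"save file is {OBSERVED_FILE_BYTES / FAT32_MAX_FILE_BYTES:.2%} of the FAT32 limit")
```

At under 1% of the FAT32 ceiling, even that filesystem would not reject a file of this size.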
#868
P90 years forever!
Aug 2002
Yeehaw, FL
17·487 Posts
Ken, have you tried forcing use of a 20M FFT length to get past the trouble iterations?
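The reasoning behind the 20M suggestion isn't spelled out in the thread. The usual rationale (my gloss) is that a longer FFT packs fewer residue bits into each floating-point word, which lowers the roundoff error per iteration. A quick comparison of the 18M FFT from the log against 20M, assuming prime95's "18M" means 18 × 2^20 words; `bits_per_word` is a hypothetical helper.

```python
EXPONENT = 333_043_493


def bits_per_word(fft_len_m: int, p: int = EXPONENT) -> float:
    """Average residue bits per FFT word, assuming 'NM' means N * 2**20 words."""
    return p / (fft_len_m * 2**20)


for size in (18, 20):
    print(f"{size}M FFT: {bits_per_word(size):.2f} bits/word")
```

Going from 18M to 20M drops the load from about 17.6 to about 15.9 bits per word, at the cost of a somewhat longer iteration time. Whether roundoff is what is tripping this run is not established in the thread.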
#869
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7821₁₀ Posts
No. How? I looked for a way to do that and must have missed it (in the v30.8b15 readme, undoc, and whatsnew). I was also contemplating giving v30.11b1 a try on the 7250 Xeon Phi.
Last fiddled with by kriesel on 2023-03-08 at 15:58