mersenneforum.org  

2020-06-21, 14:59   #540
pepi37
Dec 2011
After milion nines:)
1303₁₀ Posts

This is an already-reported bug/feature:


[Worker #5 Jun 20 16:44:40] Worker starting
[Worker #5 Jun 20 16:44:40] Trying backup intermediate file: p8_1100198.write
[Worker #5 Jun 20 16:44:40] Error reading intermediate file: p8_1100198.write
[Worker #5 Jun 20 16:44:40] Renaming p8_1100198.write to p8_1100198.bad1
[Worker #5 Jun 20 16:44:40] All intermediate files bad. Temporarily abandoning work unit.


All would be fine if the candidate were not also erased from worktodo.txt.
Please add a check to the program: if all intermediate files are bad, then don't delete the work unit from worktodo.txt.
Luckily I keep an XLS table, and when I sorted the results I saw that one was missing.
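
The spreadsheet cross-check described above can also be scripted. This is only a minimal sketch: it assumes you keep your own list of candidate labels and can map each result line back to the same label. The function name `find_missing` and the sample labels are hypothetical and are not part of Prime95 or mprime.

```python
def find_missing(assigned, completed):
    """Return assigned candidates that have no matching result, in original order."""
    done = set(completed)
    return [c for c in assigned if c not in done]


# Hypothetical candidate labels, for illustration only.
assigned = ["cand_A", "cand_B", "cand_C"]
completed = ["cand_C", "cand_A"]
print(find_missing(assigned, completed))
```

Anything this reports would need to be put back into worktodo.txt by hand.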

Last fiddled with by pepi37 on 2020-06-21 at 15:15

2020-07-12, 22:22   #541
delraykevin
Jul 2020
South Florida
3₈ Posts

Quote:
Originally Posted by smonkie
Hi Franz! And the others too. :)

I've read your issue carefully. I've been dealing with exactly the same trouble with a 3950X. I first tried it in an Aorus Master X570, and for the love of god I couldn't get Prime95's small FFTs stable at stock settings. I tried many BIOS versions, different fan configurations, and different RAM modules. I couldn't.

Yesterday I decided to try a new board (X570 Taichi), and even though Prime95 seems to cope a little better with UEFI defaults, one worker crashes within the first minutes.

So this is clearly related not just to a particular brand, but rather to a general power delivery issue across Ryzen boards. It's a real bummer, because I love to get all my systems Prime95 stable; it just feels right to have that kind of stability. But I can't see how we are going to get over this...

So it seems that there is going to be another frustrated 3950X parent joining your ranks...

I have a similar problem with small FFTs. It is reproducible in terms of both which cores generate the errors and the time at which it happens. I have never seen a BSOD or any other sort of error indication and the chip has handled literally everything else I've thrown at it. (No, not an argument for stability, or even a whine, just another data point.)

Given how consistently reproducible the issue was, I started looking into it and found that I could "move" the problem by varying the number of workers I used and how many cores I gave to each. However, within a configuration (say 4 workers each with 4 cores), the issue was 100% consistently reproducible both in terms of time and which cores failed.

Even though I've now "worked through" the issue, I'm not sure what I think about my conclusions, which is why I'm here.

What I found was:
  • XMP makes the problem WAY worse (errors are almost instantaneous when XMP is active.) I think that this is related to the UEFI under-volting the memory when XMP is active. I have not yet tried my own profiles, but did try tweaking the one resident in the system. For right now, XMP is just off the table for me until I get some other comfort back.
  • From my days as an overclocker (no, I don't really do that anymore, I just need my systems to be stable), it looked like vdroop may have been the cause as the CPU was always in "turbo" (something I think of as a manufacturer condoned overclock) when it hit and the voltages just looked like it. This was kind of a funny headspace to be in but all that time spent abusing CPUs is finally paying off in diagnosing one that is running stock.

So, I went and made one tweak to the UEFI, I simply set LLC. (It was an incremental change, so don't get the impression I only did it once.) That is the ONLY tweak I made to the UEFI, everything else is 100% stock on the latest and greatest revision of the UEFI.

And presto! With LLC set to "low", small FFTs (or even Blend for that matter) have run for hours. The small FFTs run was just over 8 hours. I actually stopped it to try some other tests. "Low" was the smallest setting that would do the job. From what I can tell "Auto", "Normal" and "Standard" are all the same, they were tested without any effect. The first setting that appears to be different is "Low" and it worked.

Caution: I don't consider LLC to be one of the "big" overclocking tools, but like any tool, it can still hurt you if you use it wrong. If you don't know what you're doing, approach the setting slowly, from the low end. Sorry, don't mean to preach to the choir, but you never know who may read things like this.

Now comes the quandary, what the heck is the problem? I know what the "fix" is, but why? Is it a bad chip? Is it a bad UEFI (or bad implementation of "Auto"/"Normal"/"Standard")? That memory voltage thing? A bad VRM on the board? A bad power connector? Etc, etc...

The board in question is a Gigabyte X570 Aorus Pro (not the ITX version, which has a -I designator). No, not the highest-end board in the world, but it should be more than capable of running the 3950X at stock and doing all the things I need it to do.

Sadly I don't have another Ryzen 9 lying around to swap my chip out and try that test.

I did also do some testing with only one stick of memory (and swapping the stick I used), but those results all seemed indistinguishable from the tests with two sticks.

The package temp can spike into the 80's when running Blend (I didn't look to see which test it was doing when that happened) but those are momentary. For the most part the temps stay in the low 60's (it's air cooled.)

I'd love to hear what your feedback is! The one person who responded in this thread saying they were stable at stock is the only report I've heard of someone with this chip who has not struggled getting it p95 stable. On the other hand, there doesn't seem to be a lot of discussion around this chip; people seem more interested in Cinebench benchmarks and stuff like that.

I have read forums in other places where people have described RMAing their chips (AMD seems quite accommodating on this at this point) only to get worse samples back. The one that stands out is the person who had an issue similar to mine (two "cores" consistently tossing errors at consistent times), so they RMA'd it and got a new sample that tossed errors on five "cores". That went back to AMD and they sent another that failed on even more. I can only assume that person is still working with AMD on the issue.

Sorry for the long ramble!

2020-07-13, 20:53   #542
nordi
Dec 2016
2×19 Posts

Quote:
Originally Posted by delraykevin
I'd love to hear what your feedback is!

My 3950X has been working on small FFTs on all 32 virtual cores for many weeks without any error or crash. I did not do any tweaking, it just worked out of the box.

By "small FFTs" I mean an FFT size of 1536 (not 1536k!), so everything fits into L1 cache, thus maximizing the load on the CPU. Not sure if that's what you meant by "small". My mainboard is an Asus Prime X570-Pro, btw.
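
A rough back-of-the-envelope check of the L1 claim, assuming each FFT element is one 8-byte double and using the 32 KiB per-core L1 data cache of Zen 2 parts such as the 3950X. It ignores sin/cos and weight tables and any other working data Prime95 keeps, so treat it as an estimate only. `fft_bytes` is a hypothetical helper name.

```python
BYTES_PER_DOUBLE = 8
L1D_BYTES = 32 * 1024  # per-core L1 data cache on Zen 2 (assumption stated above)


def fft_bytes(length):
    """Bytes needed to hold `length` double-precision FFT elements."""
    return length * BYTES_PER_DOUBLE


# Compare the tiny 1536-element FFT with the much larger 1536K one.
for label, length in (("1536", 1536), ("1536K", 1536 * 1024)):
    size = fft_bytes(length)
    print(f"{label}: {size / 1024:.0f} KiB, fits in L1D: {size <= L1D_BYTES}")
```

Under these assumptions the 1536-element data is about 12 KiB, while a 1536K FFT is about 12 MiB and cannot fit in L1.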

2020-07-13, 21:04   #543
pepi37
Dec 2011
After milion nines:)
1,303 Posts

I don't have a 3950X; I have a 3900X, and my motherboard costs half as much as yours. It runs rock stable with a very small undervolt and a very small overclock (it is at 3.9 GHz).
I don't test the way you do, but so far I have processed over 50000 candidates from 96K to 768K, many of them base 2, and the Gerbicz check didn't catch any error, nor did I see any error in the mprime log file.

2020-07-13, 21:10   #544
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
2⁵·3·73 Posts

FYI: The mysterious "Error writing intermediate file" bug during a PRP test has been found and fixed in version 30.1. The bug is fairly harmless and happens about 1 in every 1000 save file creation attempts.

2020-07-13, 21:25   #545
pepi37
Dec 2011
After milion nines:)
2427₈ Posts

Quote:
Originally Posted by Prime95
FYI: The mysterious "Error writing intermediate file" bug during a PRP test has been found and fixed in version 30.1. The bug is fairly harmless and happens about 1 in every 1000 save file creation attempts.

Where can I download that version?

2020-07-13, 21:28   #546
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
29·101 Posts

Quote:
Originally Posted by pepi37
Where can I download that version?

http://mersenne.org/download/
As soon as George has finished working on it and releases it. Which isn't yet.

2020-07-13, 21:30   #547
pepi37
Dec 2011
After milion nines:)
1,303 Posts

And George, could you also fix the bug where benchmarking (under Windows) starts and then exits immediately (and of course doesn't benchmark anything)?
In that case the bench.txt file is also empty (only the CPU topology is written).

Last fiddled with by pepi37 on 2020-07-13 at 21:30

2020-07-14, 03:18   #548
delraykevin
Jul 2020
South Florida
3₁₆ Posts

Quote:
Originally Posted by nordi
By "small FFTs" I mean an FFT size of 1536 (not 1536k!), so everything fits into L1 cache, thus maximizing the load on the CPU. Not sure if that's what you meant by "small". My mainboard is an Asus Prime X570-Pro, btw.

So by "small FFT" I mean the default option on the Torture Test menu (see attached screenshot). It almost seems like what you're describing is "smallest FFT". My chip has run those successfully for over 8 hours.
[Attached image: 7-13-2020 11-13-14 PM.jpg]

2020-07-14, 16:11   #549
Fan Ming
Oct 2019
5×19 Posts

Is there any option for Prime95 not to generate save files for small work units (for example, <1 GHz-day)? It's painful to delete lots of such save files on Google Drive or to adjust the DiskWriteTime option manually every time I do this kind of work.

Last fiddled with by Fan Ming on 2020-07-14 at 16:12

2020-07-19, 14:41   #550
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
10133₈ Posts

Large differences in iteration times

Running dual-Xeon systems, I'm seeing iteration times that differ by a factor of two to nearly five between workers, each using one Xeon or half of one, for modest differences in FFT length or exponent. These substantial timing differences survive a prime95 stop and restart.
[Attached images: ostrich timing anomalies.png; roa iteration time discrepancy 2.png]

Last fiddled with by kriesel on 2020-07-19 at 14:43

All times are UTC. The time now is 00:39.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.