mersenneforum.org  

2019-01-28, 19:15   #199
simon389

Aug 2013

3×29 Posts

Quote:
Originally Posted by ET_
On what (class of) exponent(s)?
https://www.mersenne.org/report_expo...1794089&full=1
2019-01-28, 19:36   #200
GP2

Sep 2003

5·11·47 Posts

Quote:
Originally Posted by simon389
My AVX512 machine is totally fine with regular green double checks on version 29.4 b8, but when I run 29.5 b9 it reports hardware errors, like 0.49 > 0.4.
Version 29.4 doesn't actually contain any AVX-512 code. So perhaps your hardware was sufficiently reliable for the old code but not for the new code.
2019-01-28, 19:54   #201
simon389

Aug 2013

3·29 Posts

Quote:
Originally Posted by GP2
Version 29.4 doesn't actually contain any AVX-512 code. So perhaps your hardware was sufficiently reliable for the old code but not for the new code.
Does running AVX512-optimized code put additional stress on the CPU and RAM?
2019-01-28, 20:18   #202
ET_
Banned

"Luigi"
Aug 2002
Team Italia

2×3×11×73 Posts

Quote:
Originally Posted by simon389
Does running AVX512-optimized code put additional stress on the CPU and RAM?
AFAIK it does: AVX512 instructions require the CPU to lower its clock frequency because of the extra stress they impose. And the FFT cutoffs are different as well.
2019-01-29, 13:38   #203
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5419₁₀ Posts
29.5b8 did not resume primality test after self initiated benchmark

What the title says.
With a primality test under way, prime95 29.5b8 decided to run a brief benchmark, did so, and then did not resume the interrupted primality test during the next 20 hours, until I found it stalled and manually intervened.

This is very similar to the earlier benchmark hangs, which followed user-initiated benchmarks. I had to kill the process with Task Manager on this one too. Continue was grayed out in the Test dropdown menu, and Stop did not return control.

This occurred on the i7-8750H Dell G3 3579 running Windows 10.
Attached Thumbnails
did not resume primality test after self initiated benchmark.png (391.2 KB)
Attached Files
window text from benchmark stop.txt (2.7 KB)
2019-01-29, 15:35   #204
ATH
Einyen

Dec 2003
Denmark

2×1,579 Posts

Quote:
Originally Posted by kriesel
What the title says.
With a primality test under way, prime95 29.5b8 decided to run a brief benchmark, did so, and then did not resume the interrupted primality test during the next 20 hours, until I found it stalled and manually intervened.
There is a build #9 now, see post #184.
2019-01-29, 16:09   #205
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5419₁₀ Posts

Quote:
Originally Posted by ATH
There is a build #9 now, see post #184.
Yes, and I had already downloaded it. I follow this thread closely and frequently. Since build 8 came out after the benchmark stall issue was thought to be resolved, the new hang seemed worth reporting promptly. It occurred on the same i7-8750H system that was probably the most "reliable" at reproducing the earlier hang behavior.
There are some things we may not learn about a build if we jump to the latest one immediately each time. Some things take a while to show up.

Last fiddled with by kriesel on 2019-01-29 at 16:10
2019-01-29, 17:08   #206
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

2·743 Posts

Quote:
Originally Posted by GP2
You should continue with the exponent.

I think we have enough confidence in Gerbicz error checking now, so the program can just continue to run with the smaller FFT length and recover from errors as necessary.
That is not in question, but with those rollbacks you are redoing iterations, so your running time will be higher. If these are really FFT computation errors, then a larger FFT size might lower the expected(!) running time; note that the number of errors alone doesn't really matter: for p~1e12, say, seeing roughly 100 rollbacks would not be an issue. And if those are only hardware errors, then changing the FFT size doesn't help.
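To make the rollback mechanics concrete, here is a minimal Python sketch of a Gerbicz-style check on a base-3 PRP test of 2^p - 1. It is an illustration under stated assumptions, not Prime95's implementation: the function names, the block size, and the random bit-flip model of a "hardware error" are all hypothetical. The checksum d is the product of the residues at every block boundary, so d_next = 3 * d^(2^block) mod N must hold; a mismatch discards the block and redoes it from the last verified checkpoint. For simplicity the sketch verifies after every block, which doubles the work; as I understand it, real implementations fold a residue into the checksum every L iterations but only verify every L^2 or so, keeping the overhead small at the cost of rolling back further.

```python
import random


def square_mod(u, n, rng, error_rate):
    """One PRP iteration u -> u^2 mod n, optionally corrupted by a random
    bit flip (a crude, hypothetical stand-in for a hardware error)."""
    u = u * u % n
    if error_rate and rng.random() < error_rate:
        u = (u ^ (1 << rng.randrange(n.bit_length()))) % n
    return u


def prp_gerbicz(p, block=8, error_rate=0.0, seed=1, max_rollbacks=100_000):
    """Compute 3^(2^p) mod 2^p - 1 with a Gerbicz-style check every `block`
    squarings. Returns (residue, rollbacks); if 2^p - 1 is prime the residue is 9."""
    n = (1 << p) - 1
    rng = random.Random(seed)
    # Last verified checkpoint: iteration i, residue u = 3^(2^i), checksum d.
    i, u, d = 0, 3, 3
    rollbacks = 0
    while i + block <= p:
        v = u
        for _ in range(block):
            v = square_mod(v, n, rng, error_rate)
        new_d = d * v % n
        # The checksum identity: d_{t+1} == 3 * d_t^(2^block) (mod n).
        if new_d == 3 * pow(d, 1 << block, n) % n:
            i, u, d = i + block, v, new_d  # accept the block, move the checkpoint
        else:
            rollbacks += 1  # discard the block and redo it from the checkpoint
            if rollbacks > max_rollbacks:
                raise RuntimeError("too many rollbacks; suspect unreliable hardware")
    # The last p - i < block squarings are not covered by the identity; this
    # sketch verifies them by running them twice and comparing (an identical
    # error in both runs would slip through).
    while True:
        a = b = u
        for _ in range(p - i):
            a = square_mod(a, n, rng, error_rate)
            b = square_mod(b, n, rng, error_rate)
        if a == b:
            return a, rollbacks
        rollbacks += 1
        if rollbacks > max_rollbacks:
            raise RuntimeError("too many rollbacks; suspect unreliable hardware")
```

Each rollback throws away at most one block of work, which is the trade-off described above: a handful of rollbacks over a very long test barely changes the running time, while a steady stream of them can make a larger (slower per iteration) FFT the faster choice overall.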
2019-01-29, 22:11   #207
Madpoo
Serpentine Vermin Jar

Jul 2014

3313₁₀ Posts

Quote:
Originally Posted by R. Gerbicz
That is not in question, but with those rollbacks you are redoing iterations, so your running time will be higher. If these are really FFT computation errors, then a larger FFT size might lower the expected(!) running time; note that the number of errors alone doesn't really matter: for p~1e12, say, seeing roughly 100 rollbacks would not be an issue. And if those are only hardware errors, then changing the FFT size doesn't help.
For what it's worth, I'm starting a deep dive into the specific error-reporting codes, including a good/bad breakdown by whether the errors were repeatable.

It's in the early stages, but at first glance it seemed like even runs with repeatable errors had a higher-than-average rate of bad results. That was somewhat surprising to me, and may be to George as well, since we flag those as "clean" rather than "suspect".

My goal is to see whether we can improve how a result is marked clean/suspect when it's turned in. It's actually pretty spot on when it comes to marking results suspect, but I think some of the results it marks as clean may not be so squeaky clean.
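A good/bad breakdown like the one described can be sketched as follows. The category labels and the (category, is_bad) input format are hypothetical, since the actual error-reporting codes and how they are stored are not given here; is_bad is assumed to mean that a later double check disagreed with the result. The sketch compares each category's bad-result rate with the overall rate and lists the categories that do worse than average, which is the kind of signal that would suggest a "clean" label is too generous.

```python
from collections import Counter


def bad_rate_breakdown(results):
    """Bad-result rate per (hypothetical) error category versus the overall rate.

    `results` is an iterable of (category, is_bad) pairs. Returns
    (rates, overall, worse_than_average), where `rates` maps each category
    to its fraction of bad results.
    """
    totals, bads = Counter(), Counter()
    for category, is_bad in results:
        totals[category] += 1
        bads[category] += int(bool(is_bad))
    if not totals:
        return {}, 0.0, []
    overall = sum(bads.values()) / sum(totals.values())
    rates = {c: bads[c] / totals[c] for c in totals}
    worse = sorted(c for c, r in rates.items() if r > overall)
    return rates, overall, worse


if __name__ == "__main__":
    # Made-up toy records for illustration only, not real GIMPS data.
    toy = ([("no errors", False)] * 95 + [("no errors", True)] * 5
           + [("repeatable errors", False)] * 8 + [("repeatable errors", True)] * 2)
    rates, overall, worse = bad_rate_breakdown(toy)
    print(rates, round(overall, 4), worse)
```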
2019-01-30, 13:47   #208
simon389

Aug 2013

127₈ Posts
A warning about AVX512 optimizations

I have four quad-channel AVX512 machines dedicated to Prime95, and all of them work fine on 29.4 but have random hardware errors on 29.5.

I have tried both 7820X and 9800X CPUs
I have tried two different kinds of quad-channel 3600 MHz RAM
I have tried both EVGA X299 Micro motherboards
I have invested in better coolers and kept temps below 70C
I have tried every build of 29.5 from 5-9
(Maybe a 400 W Platinum-rated PSU isn’t enough?)

Hardware errors like 0.49 > 0.4 on all of them.

I’m rolling back to 29.4 until this hopefully gets sorted out someday. Kind of bummed because the optimizations really did make a big difference.
2019-01-30, 16:18   #209
ATH
Einyen

Dec 2003
Denmark

2·1,579 Posts

Have you tried any double checks in 29.4 to test if they are producing good results?

Did you watch the CPU temperature when running 29.5? Was that 70C with 29.5?

Last fiddled with by ATH on 2019-01-30 at 16:20




Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.