[QUOTE=ET_;507044]On what (class of) exponent(s)?[/QUOTE]
[url]https://www.mersenne.org/report_exponent/?exp_lo=51794089&full=1[/url] |
[QUOTE=simon389;507039]My AVX512 machine is totally fine with regular green double checks on version 29.4 b8 but when I run 29.5 b9 it has hardware errors. Like 0.49 > 0.4.[/QUOTE]
Version 29.4 doesn't actually contain any AVX-512 code. So perhaps your hardware was sufficiently reliable for the old code but not for the new code. |
[QUOTE=GP2;507054]Version 29.4 doesn't actually contain any AVX-512 code. So perhaps your hardware was sufficiently reliable for the old code but not for the new code.[/QUOTE]
Does running AVX512 optimized code add additional stress on CPU and RAM? |
[QUOTE=simon389;507056]Does running AVX512 optimized code add additional stress on CPU and RAM?[/QUOTE]
AFAIK it does: AVX512 instructions make the CPU lower its clock frequency because of the extra stress they impose. The FFT cutoffs are different as well. |
29.5b8 did not resume primality test after self-initiated benchmark
2 Attachment(s)
What the title says.
Primality test under way, prime95 29.5b8 decided to run a brief benchmark, did so, and then did not resume the interrupted primality test in the next 20 hours until I found it stalled and manually intervened. This is very similar to the earlier benchmark hangs, which followed user-initiated benchmarks. I had to kill the process with Task Manager on this one also: Continue was grayed out in the Test dropdown menu, and Stop did not return control. This occurred on the i7-8750H Dell G3 3579 running Windows 10. |
[QUOTE=kriesel;507089]What the title says.
Primality test under way, prime95 29.5b8 decided to run a brief benchmark, did so, and then did not resume the interrupted primality test in the next 20 hours until I found it stalled and manually intervened.[/QUOTE]
There is a build #9 now, see post #184. |
[QUOTE=ATH;507099]There is a build #9 now, see post #184.[/QUOTE]
Yes, and I had already downloaded it. I follow this thread closely and frequently. Since build 8 came out after the benchmark stall issue was thought to be resolved, the new hang seemed worth reporting promptly. It occurred on the same i7-8750H system that was probably the most "reliable" at reproducing the earlier hang behavior. There are some things we may not learn about a build if we jump to the latest one immediately each time. Some things take a while to show up. |
[QUOTE=GP2;507046]You should continue with the exponent.
I think we have enough confidence in Gerbicz error checking now, so the program can just continue to run with the smaller FFT length and recover from errors as necessary.[/QUOTE]
That is not the question. With those rollbacks you are redoing iterations, so your running time will be higher. If these are really FFT computation errors, then a larger FFT size might lower the expected(!) running time. Note that the number of errors alone doesn't really matter: for p~1e12, say, roughly 100 rollbacks would not be an issue. And if these are only hardware errors, then changing the FFT size doesn't help. |
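The trade-off described above (redone iterations after rollbacks versus a slower but safer FFT size) can be put into a rough back-of-the-envelope model. This is only a sketch under stated assumptions: the function names, the fixed number of iterations lost per rollback, and the per-iteration slowdown of the larger FFT are all hypothetical and are not taken from Prime95.

```python
def relative_overhead(p, rollbacks, iters_lost_per_rollback):
    """Fraction of extra iterations caused by rollbacks, relative to p.

    Assumes (hypothetically) that every rollback redoes the same number
    of iterations.
    """
    return rollbacks * iters_lost_per_rollback / p


def larger_fft_pays_off(p, rollbacks, iters_lost_per_rollback, fft_slowdown):
    """True if a larger FFT would be faster overall than living with rollbacks.

    fft_slowdown is the assumed per-iteration cost ratio of the larger FFT
    relative to the smaller one (e.g. 1.05 for 5% slower). The model also
    assumes the larger FFT removes every rollback, which only holds if the
    errors are FFT computation errors rather than hardware errors.
    """
    small_fft_cost = p + rollbacks * iters_lost_per_rollback
    large_fft_cost = p * fft_slowdown
    return large_fft_cost < small_fft_cost
```

With an assumed 1,000,000 iterations lost per rollback, `relative_overhead(10**12, 100, 10**6)` gives 0.0001, i.e. 0.01% extra work, which matches the point that 100 rollbacks at p~1e12 would not be an issue. Under the same assumptions a larger FFT that is 5% slower per iteration would not pay off there.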
[QUOTE=R. Gerbicz;507107]That is not the question. With those rollbacks you are redoing iterations, so your running time will be higher. If these are really FFT computation errors, then a larger FFT size might lower the expected(!) running time. Note that the number of errors alone doesn't really matter: for p~1e12, say, roughly 100 rollbacks would not be an issue. And if these are only hardware errors, then changing the FFT size doesn't help.[/QUOTE]
For what it's worth, I'm starting a very deep dive on analyzing the specific error reporting codes, including a good/bad breakdown when looking at whether errors were repeatable. It's in the early stages, but at first glance it seemed like even a run with repeatable errors had a higher than average rate of bad results. That was somewhat surprising to me, and may be to George also since we flag those as "clean" and not "suspect". My goal in this was to see if we can improve how a result is marked clean/suspect when it's turned in... it's actually pretty spot on when it comes to marking results suspect, but I think some things it marks as clean may not be so squeaky clean. :smile: |
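A good/bad breakdown of the kind described above could be tallied along these lines. This is a minimal sketch only: the category labels and the `(category, is_bad)` record shape are hypothetical, and it says nothing about how the server actually stores or flags results.

```python
from collections import defaultdict


def bad_rate_by_category(results):
    """Map each error category to the fraction of its results that were bad.

    results: iterable of (category, is_bad) pairs. The category strings
    (e.g. "no_errors", "repeatable_errors") are hypothetical labels.
    """
    totals = defaultdict(int)
    bad = defaultdict(int)
    for category, is_bad in results:
        totals[category] += 1
        if is_bad:
            bad[category] += 1
    return {category: bad[category] / totals[category] for category in totals}
```

Comparing the rate for a category that is currently flagged "clean" against the overall average would show whether that category deserves a second look.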
A warning about AVX512 optimizations
I have four quad-channel AVX512 machines dedicated to Prime95 and all of them work fine in 29.4 but have random hardware errors on 29.5.
I have tried both 7820X and 9800X CPUs.
I have tried two different kinds of quad-channel 3600 MHz RAM.
I have tried both EVGA X299 Micro motherboards.
I have invested in better coolers and kept temps below 70C.
I have tried every build of 29.5 from 5-9.
(Maybe a 400W platinum-rated PSU isn’t enough?)
Hardware errors like 0.49 > 0.4 on all of them. I’m rolling back to 29.4 until this hopefully gets sorted out someday. Kind of bummed because the optimizations really did make a big difference. |
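The "0.49 > 0.4" figures read like a roundoff check: after a floating-point FFT multiplication every output should be very close to an integer, and the largest distance from the nearest integer is compared against a threshold. A minimal sketch of that idea, assuming this interpretation is right; the function names are hypothetical, and the 0.4 threshold is simply the value quoted in the post.

```python
def max_roundoff_error(values):
    """Largest distance between any value and its nearest integer."""
    return max(abs(v - round(v)) for v in values)


def check_roundoff(values, threshold=0.4):
    """Return (ok, error); ok is False when the error exceeds the threshold."""
    error = max_roundoff_error(values)
    return error <= threshold, error
```

For example, `check_roundoff([3.02, -7.49, 12.0])` fails the check because -7.49 is about 0.49 away from -7, while `check_roundoff([3.02, 5.98])` passes. An error that close to 0.5 means the rounding to an integer can no longer be trusted, whether the cause is an FFT size that is too small or flaky hardware.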
Have you tried any double checks in 29.4 to test if they are producing good results?
Did you watch the CPU temperature when running 29.5? Was that 70C reading taken with 29.5? |