[QUOTE=kriesel;502582]On the NVIDIA side, allowable gpu temperature specification has been declining. For older models, 100+C is common; newer say 97, 94 or 91C. See the attachment at [URL]https://www.mersenneforum.org/showpost.php?p=490611&postcount=2[/URL] for some examples with source links.[/QUOTE]
I'm not sure this 100+C figure applies very well in practice. I have always kept my GPUs under 79C; lm-sensors reports a critical temperature of 94C.
When overclocking, I find stability tends to drop as temperature rises. I don't know what the physical mechanism is, but this may in part explain why there appears to be so much headroom at stock. They use smaller coolers and may run at higher temperatures under sustained loads. Personally I aim for under 80C.
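Since a couple of posts here compare lm-sensors readings against the critical limit, here is a minimal sketch (my own, not from any post in this thread) of parsing `sensors`-style text output and flagging any reading that has drifted within a chosen margin of its critical temperature. The sample reading and the 15C margin are illustrative assumptions, not captured from real hardware.

```python
import re

# Illustrative sample in lm-sensors' text format; not a real RX 580 reading.
SAMPLE = """\
amdgpu-pci-0300
edge:         +84.0\u00b0C  (crit = +94.0\u00b0C)
"""

LINE = re.compile(r"^(\w+):\s+\+([\d.]+).C\s+\(crit = \+([\d.]+).C\)", re.M)

def hot_sensors(text, margin=15.0):
    """Return (name, temp, crit) for readings within `margin` C of critical."""
    return [(name, float(t), float(c))
            for name, t, c in LINE.findall(text)
            if float(c) - float(t) < margin]

print(hot_sensors(SAMPLE))  # 84C is within 15C of the 94C critical temp
```

In practice you would feed this the output of `sensors` (or `sensors -j` for JSON on newer lm-sensors) rather than a hard-coded string.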
[QUOTE]Are you running one, or two fans on the new cooler?[/QUOTE]
One 140mm. Does the job. At max load it’s not going above 79C. These Skylake-X CPUs are monsters when it comes to heat; I had no idea. [QUOTE]For shorter values of "long" I think anything less than 100C is perfectly fine for silicon devices.[/QUOTE] My rationale for the bigger cooler is less about wanting 20-year health for the processor and more about making sure these > 0.4 rounding errors are not my fault. Btw, even with the proper temps I’m getting multiple > 0.4 errors, which never happened under 29.4 build 8. Is this the fault of the AVX-512 optimizations?
[QUOTE=simon389;502606]Btw, even with the proper temps I’m getting multiple > 0.4 errors, which never happened under 29.4 build 8. Is this the fault of the AVX-512 optimizations?[/QUOTE]
Get the latest 29.5 build 5: [url]https://mersenneforum.org/showpost.php?p=501210&postcount=85[/url] There was a problem with the FFT crossovers in the first builds, so it chooses a too-small FFT, but the results are still fine as long as the errors don't go above roughly 0.46. I had 20+ errors on one double check that turned out fine.
[QUOTE=LaurV;502481]I hope they never will be. Separate programs and tools come from different people with different competencies. All these programs and tools have different requirements for optimization, maintenance, etc., and they stay much better as separate tools, which can be upgraded separately, debugged separately, etc.
Otherwise it will be hell; in spite of the fact that on paper everything looks wonderful, putting all the current tools in one program is as utopian as communism was... hehe... Both would work in an ideal society, but not in practice. I have a post here somewhere about buying a Swiss knife with scissors, corkscrew, nail clipper, a lot of other tools, and a small screwdriver in a corner, when what you actually need is only a big, robust screwdriver.[/QUOTE] I concur. We only have one George Woltman. He has helped at times on the gpu front in the past. I trust him to spend the ample time he volunteers where it will do the most good, given his aptitudes and inclinations. The same goes for the short list of other code authors. (Good coders are a scarce and precious project resource. Fewer than one in a thousand of those who've completed and returned a GIMPS result have also coded part of a cpu- or gpu-oriented Mersenne application in recent use, as I recall.)

On the cpu side, new chip designs are released periodically, and George attempts to keep up with optimizing assembly language for them, along with other activities. On the gpu side, there are man-decades and gpu-decades of work to test, document, bug-fix, and enhance/extend the existing separate applications. I'm gpu-years into merely testing the feasible exponent limits of CUDAPm1 v0.20 over a small sample of 9 gpu models (finding and documenting bug indications along the way). The lists of outstanding issues and wish-list items for the gpu apps are large; they represent a lot of work.

Some effort can be saved by developing code for one gpu app, proving it out there, and then employing it in other apps later. Some of that code sharing has already been done, as is apparent from comments in the various source code files. There is no CUDA PRP code (outside Preda's abandoned gpuowl extension effort), and the OpenCL LL codes are no longer being maintained.
The various gpu applications use similar or identical variables, ini-file entries, etc. in different ways, and their file formats differ. There would be a lot of recoding and testing just to merge the existing LL, P-1, and TF functionality of the gpu apps on CUDA. Merging in OpenCL for a monolithic gpu app would be additional effort, and merging that combination with mprime/prime95 more effort still. Such merges would create new wish-list items and subprojects: multiple-worker support on gpus; optimization of computing resources across a heterogeneous mix in one system (cpu versus multiple, perhaps differing, gpu models). Some of these would generate whole new research projects and new or renewed philosophical debates about what is preferable. Volunteers to help with any of the coding, compiling, testing, or documenting for the existing applications, separately: please step forward.
[QUOTE=SELROC;502586]I'm not sure this temperature 100+C applies very well in practice.
I have always been keeping my GPUs under 79C. lm-sensors shows the critical temperature = 94C[/QUOTE] RX580s? I'd love to be able to stay under 84C. The HP Z600s I have seem to be designed to make that impractical: my newer, faster gpus commonly throttle thermally in them, even though all the fans have been checked and are running. I think the Z600 was designed around the 100+C spec of the GTX4xx/Quadro 2000/4000 era; GTX1060 or higher generally thermally throttle in those systems. (The Z600 is also limited on power plugs, to about GTX1070 support.)
[QUOTE=ATH;502615]Get the latest 29.5 build 5:
[url]https://mersenneforum.org/showpost.php?p=501210&postcount=85[/url] There was a problem with the FFT crossover in the first builds, so it chooses a "too small FFT", but it is still fine as long as errors are not above like 0.46. I had 20+ errors on one double check that turned out fine.[/QUOTE] Already using that version. I think what I'll do is 5-7 double checks, and if they all come back fine (despite these new rounding errors) I'll move to LL first-timers.
[QUOTE=simon389;502606] Btw, even with the proper temps I’m getting multiple > 0.4 errors, which never happened under 29.4 build 8. Is this the fault of the AVX-512 optimizations?[/QUOTE]
Yes, AVX-512 is a culprit, but all the "fault" is mine. In 29.4, the FMA FFTs were too conservative in choosing FFT crossover points, and AVX-512 needs lower crossovers than the FMA FFTs do. There is a bit of an art to choosing the FFT crossovers: you don't want to panic users with too many roundoff warnings, and you don't want to leave performance on the table by being too conservative. If you get any errors above 0.44 or so, post them here and I may adjust the crossovers further. If you see an error above 0.48, you probably had a hardware glitch.
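The thresholds scattered through this thread can be summarized in a tiny sketch (my own framing, not Prime95 code): the warning cutoff that appears in the log lines for these FFT lengths is 0.42188, errors above roughly 0.44 are worth reporting for crossover tuning, and above roughly 0.48 a hardware glitch is the likelier explanation.

```python
# Error bands paraphrased from this thread; Prime95's real thresholds
# live in its source, not here.
WARN = 0.42188    # threshold printed in the "Possible error" log lines
REPORT = 0.44     # "post them here and I may adjust the crossovers"
GLITCH = 0.48     # "you probably had a hardware glitch"

def classify_roundoff(err):
    """Rough severity of a reported roundoff error, per the thread."""
    if err > GLITCH:
        return "suspect hardware"
    if err > REPORT:
        return "report for crossover tuning"
    if err > WARN:
        return "warned, usually harmless"
    return "ok"

print(classify_roundoff(0.4547851237))  # -> "report for crossover tuning"
```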
1 Attachment(s)
[QUOTE=Prime95;502649]If you get any errors above 0.44 or so, post them here and I may adjust the crossovers further.[/QUOTE]
I got a few from 29.5 build 5 on PRPCF and PRPCFDC exponents:

Iteration: 1369666/6734381, Possible error: round off (0.4547851237) > 0.42188
Iteration: 516374/6736859, Possible error: round off (0.4698537946) > 0.42188
Iteration: 3802238/6737123, Possible error: round off (0.442078719) > 0.42188
Iteration: 498437/8641147, Possible error: round off (0.456402335) > 0.42188

See attached logs. The 3 x 6.7M DCs were fine, but the 8.6M has still not been double checked.
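Pulling reportable errors out of a worker log by hand gets tedious; here is a hedged sketch that scans log text for "Possible error: round off" lines in the format quoted above and extracts any exceeding the 0.44 report threshold. The log format is assumed from this post's excerpts, not from Prime95's source.

```python
import re

# Matches the log-line shape quoted in this post.
PAT = re.compile(
    r"Iteration: (\d+)/(\d+), Possible error: round off \(([\d.]+)\)")

# Illustrative excerpt; the first line is from this post, the second invented.
LOG = """\
Iteration: 1369666/6734381, Possible error: round off (0.4547851237) > 0.42188
Iteration: 100000/6737123, Possible error: round off (0.4101562500) > 0.40625
"""

def reportable(text, threshold=0.44):
    """Return (iteration, exponent, roundoff) tuples above `threshold`."""
    return [(int(i), int(p), float(r))
            for i, p, r in PAT.findall(text)
            if float(r) > threshold]

print(reportable(LOG))
```

Pointing this at a whole results/worker log would collect everything worth posting back to the thread in one pass.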
Thanks, keep that data coming. I'll adjust the crossovers downward next release.