mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Software
2018-12-13, 09:07   #12
SELROC

3²·11·67 Posts

Quote:
Originally Posted by kriesel
On the NVIDIA side, the allowable gpu temperature specification has been declining. For older models 100+C is common; newer models specify 97, 94 or 91C. See the attachment at https://www.mersenneforum.org/showpo...11&postcount=2 for some examples with source links.

I'm not sure the 100+C figure applies very well in practice.


I have always kept my GPUs under 79C.


lm-sensors shows CRITICAL TEMPERATURE = 94C.
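For reference, lm-sensors reads these values from the Linux kernel's hwmon sysfs interface, where `tempN_input` and `tempN_crit` hold integers in millidegrees Celsius. A minimal Python sketch, assuming a Linux system whose GPU driver exposes its sensors under `/sys/class/hwmon`; the function names and the 80C comfort limit (echoing the targets mentioned in this thread) are illustrative choices, not part of lm-sensors:

```python
import glob
import os


def read_millidegrees(path):
    """hwmon temperature files hold an integer in millidegrees Celsius."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0


def read_text(path, default):
    """Return the stripped contents of a small sysfs file, or default if absent."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return f.read().strip()


def scan_hwmon(root="/sys/class/hwmon", limit_c=80.0):
    """List (label, temp_c, crit_c, over_limit) for every tempN_input under root.

    limit_c is a user-chosen comfort limit, separate from the
    driver-reported critical temperature (tempN_crit).
    """
    results = []
    for inp in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        chip_dir = os.path.dirname(inp)
        sensor = os.path.basename(inp)[: -len("_input")]  # e.g. "temp1"
        chip = read_text(os.path.join(chip_dir, "name"), os.path.basename(chip_dir))
        temp_c = read_millidegrees(inp)
        crit_path = os.path.join(chip_dir, sensor + "_crit")
        crit_c = read_millidegrees(crit_path) if os.path.exists(crit_path) else None
        results.append((f"{chip}/{sensor}", temp_c, crit_c, temp_c >= limit_c))
    return results


if __name__ == "__main__":
    for label, temp_c, crit_c, over in scan_hwmon():
        crit = f"{crit_c:.0f}C" if crit_c is not None else "n/a"
        flag = "  <-- above limit" if over else ""
        print(f"{label:24s} {temp_c:5.1f}C  crit {crit}{flag}")
```

Keeping the comfort limit separate from the critical value mirrors the distinction drawn here between a self-imposed 79-80C target and the 94C critical temperature the driver reports.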
2018-12-13, 10:56   #13
mackerel

Feb 2016
UK

663₈ Posts

When overclocking, I find stability tends to drop as temperature rises. I don't know the physical mechanism, but this may partly explain why there appears to be so much headroom at stock: stock cards use smaller coolers and may run at higher temperatures under sustained loads. Personally I aim for under 80C.
2018-12-13, 14:39   #14
simon389

Aug 2013

3×29 Posts

Quote:
Are you running one or two fans on the new cooler?
One 140mm fan. It does the job; at max load it's not going above 79C. These Skylake-X CPUs are monsters when it comes to high temps. I had no idea.

Quote:
For shorter values of "long" I think anything less than 100C is perfectly fine for silicon devices.
My rationale for the bigger cooler is less about 20-year health for the processor and more about making sure these > 0.4 rounding errors are not my fault. Btw, even with the lower temps I'm getting multiple > 0.4 errors, which never happened under 29.4 build 8. Is this the fault of the AVX512 optimizations?
2018-12-13, 15:30   #15
ATH
Einyen

Dec 2003
Denmark

2×1,579 Posts

Quote:
Originally Posted by simon389
Btw, even with the lower temps I'm getting multiple > 0.4 errors, which never happened under 29.4 build 8. Is this the fault of the AVX512 optimizations?
Get the latest 29.5 build 5:
https://mersenneforum.org/showpost.p...0&postcount=85

There was a problem with the FFT crossovers in the first builds, so they choose an FFT that is "too small", but it is still fine as long as errors do not go above about 0.46. I had 20+ errors on one double check that turned out fine.
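Why a "too small" FFT raises the round-off error can be seen in a toy version of the multiplication: the number is split into words of a given bit size, squared with a floating-point FFT convolution, and each output is rounded back to an integer; the largest distance from the nearest integer is the kind of round-off figure these programs report. This is a simplified zero-padded convolution in pure Python, assumed for illustration only, not the weighted transform Prime95 actually uses. Packing more bits into each word means fewer words (and potentially a shorter FFT), but larger convolution outputs and more floating-point error:

```python
import cmath
import random


def fft(a, invert=False):
    """Recursive radix-2 FFT (len(a) must be a power of two); unnormalized."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def square_with_roundoff(x, bits_per_word, fft_len):
    """Square x via a floating-point FFT convolution of its base-2**bits_per_word words.

    Returns (reconstructed x*x, max distance of any output from the nearest integer).
    """
    base = 1 << bits_per_word
    words = []
    while x:
        words.append(x % base)
        x //= base
    if 2 * len(words) > fft_len:
        raise ValueError("FFT too short to hold the full product")
    a = [complex(v) for v in words] + [0j] * (fft_len - len(words))
    spectrum = fft(a)
    conv = fft([s * s for s in spectrum], invert=True)
    result, max_err = 0, 0.0
    for i, c in enumerate(conv):
        value = c.real / fft_len  # normalize the inverse transform
        nearest = round(value)
        max_err = max(max_err, abs(value - nearest))
        result += nearest << (bits_per_word * i)
    return result, max_err


if __name__ == "__main__":
    random.seed(1)
    x = random.getrandbits(2048)
    # Same number, fewer but larger words: outputs grow and round-off typically grows too.
    for bits, n in [(8, 512), (16, 256), (20, 256), (24, 256)]:
        product, err = square_with_roundoff(x, bits, n)
        print(f"{bits:2d} bits/word, FFT length {n:3d}: "
              f"exact={product == x * x}, max round-off {err:.3g}")
```

Once the round-off approaches 0.5, a rounded output can land on the wrong integer, which is why a persistently high round-off is worth watching even when the final result turns out fine.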
2018-12-13, 17:20   #16
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

12452₈ Posts

Quote:
Originally Posted by LaurV
I hope they will never be. Separate programs and tools come from different people who have different competencies. All these programs and tools have different requirements for optimizations, maintenance, etc., and they work much better as separate tools, which can be upgraded separately, debugged separately, etc.

Otherwise it will be hell. On paper everything looks wonderful, but putting all current tools in the same program is as utopian as communism was... hehe... Both would work in an ideal society, but not in practice.

I have a post here somewhere about buying a Swiss Army knife with scissors, a corkscrew, a nail clipper, lots of other tools, and a small screwdriver in a corner, when what you actually need is only a big, robust screwdriver.
I concur.

We only have one George Woltman. He has helped on the gpu front at times in the past. I trust him to spend the ample time he volunteers where it will do the most good, given his aptitudes and inclinations. The same goes for the small list of other code authors. (Good coders are a scarce and precious project resource. As I recall, fewer than one in a thousand of those who have completed and returned a GIMPS result have also coded part of a cpu- or gpu-oriented Mersenne application in recent use.)

On the cpu side, new chip designs are released periodically and George attempts to keep up with optimizing assembly language for them, along with other activities.

On the gpu side, there are man-decades and gpu-decades of work to test, document, bug-fix, and enhance/extend the existing separate applications. I'm gpu-years into merely testing the feasible exponent limits of CUDAPm1 V0.20 over a small sample of 9 gpu models (finding and documenting bug indications along the way).

The lists of outstanding issues and wish-list items for the gpu apps are long. They represent a lot of work.
Some effort can be saved by developing code for one gpu app, proving it out there, and later employing it in other apps too. Some of that code sharing has already been done, as comments in the various source code files show. There is no CUDA PRP code (outside Preda's abandoned gpuowl extension effort). The OpenCL LL codes are no longer being maintained.

The various gpu applications use similar or identical variables, ini file entries, etc. in different ways. File formats differ. There would be a lot of recoding and testing just to merge the existing functionality of the CUDA gpu apps for LL, P-1, and TF.
Merging OpenCL into a monolithic gpu app would be additional effort.
Merging that combination with mprime/prime95 would be more effort still.

Such merges would create new wish-list items and subprojects: multiple-worker support for gpu usage, and optimization of computing resources across a heterogeneous mix in one system (cpu vs. multiple gpus, perhaps of differing models). Some of these would generate whole new research projects and new or renewed philosophical debates about what is preferable.

Volunteers to help with any of the coding, compiling, testing, or documenting of the existing, separate applications, please step forward.
2018-12-13, 17:29   #17
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×3²×7×43 Posts

Quote:
Originally Posted by SELROC
I'm not sure the 100+C figure applies very well in practice.

I have always kept my GPUs under 79C.

lm-sensors shows CRITICAL TEMPERATURE = 94C.
RX580s?

I'd love to be able to stay under 84C, but the HP Z600s I have seem designed to make that impractical. My newer, faster gpus commonly throttle thermally in them, even though all the fans have been checked and are running. I think the Z600 was designed for the 100+C spec of the GTX4xx/Quadro 2000/4000; GTX1060 and higher generally throttle thermally in the same systems. (The Z600 is also limited in power plugs, to about GTX1070 support.)

2018-12-13, 18:10   #18
simon389

Aug 2013

87₁₀ Posts

Quote:
Originally Posted by ATH
Get the latest 29.5 build 5:
https://mersenneforum.org/showpost.p...0&postcount=85

There was a problem with the FFT crossovers in the first builds, so they choose an FFT that is "too small", but it is still fine as long as errors do not go above about 0.46. I had 20+ errors on one double check that turned out fine.
Already using that version. I think what I'll do is run 5-7 double checks, and if they all come back fine (despite these new rounding errors), I'll move on to LL first-time tests.

2018-12-13, 18:25   #19
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

5×11×137 Posts

Quote:
Originally Posted by simon389
Btw, even with the lower temps I'm getting multiple > 0.4 errors, which never happened under 29.4 build 8. Is this the fault of the AVX512 optimizations?
Yes, AVX-512 is a culprit, but I get all the "fault". In 29.4, the FMA FFTs were too conservative in choosing FFT crossover points. AVX-512 FFTs need lower crossovers than FMA FFTs. There is a bit of an art to choosing the FFT crossovers: you don't want to panic users with too many round-off warnings, and you don't want to leave performance on the table by being too conservative.
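One way to picture the crossover trade-off is a table giving the largest exponent each FFT length is trusted with; lowering the crossovers moves each switch point down, so borderline exponents get the next larger (slower, but less error-prone) FFT. A minimal sketch, assuming a hypothetical table whose limits are invented for illustration and are not Prime95's real crossover points; `choose_fft_length` and `safety` are likewise illustrative names:

```python
# Hypothetical crossover table: (FFT length in words, largest exponent trusted at that length).
# These limits are invented for illustration and are NOT Prime95's real crossover points.
CROSSOVERS = [
    (288 * 1024, 6_000_000),
    (320 * 1024, 6_800_000),
    (384 * 1024, 7_900_000),
    (448 * 1024, 9_200_000),
]


def choose_fft_length(exponent, table=CROSSOVERS, safety=1.0):
    """Return the smallest FFT length whose (scaled) limit still covers exponent.

    safety < 1.0 lowers every crossover, so borderline exponents move to the
    next larger FFT, which packs fewer bits into each word.
    """
    for fft_len, max_exponent in table:
        if exponent <= max_exponent * safety:
            return fft_len
    raise ValueError("exponent is beyond the largest FFT in the table")


def bits_per_word(exponent, fft_len):
    """Average number of exponent bits packed into each FFT word."""
    return exponent / fft_len


if __name__ == "__main__":
    p = 6_734_381
    for s in (1.0, 0.98):
        n = choose_fft_length(p, safety=s)
        print(f"safety={s}: FFT length {n // 1024}K, {bits_per_word(p, n):.2f} bits/word")
```

With these made-up limits, an exponent such as 6,734,381 lands in the 320K FFT by default, while crossovers lowered by 2% push it to 384K: fewer bits per word and smaller round-off, at the cost of speed.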

If you get any errors above 0.44 or so, post them here and I may adjust the crossovers further.

If you see an error above 0.48 you probably had a hardware glitch.
2018-12-13, 20:56   #20
ATH
Einyen

Dec 2003
Denmark

C56₁₆ Posts

Quote:
Originally Posted by Prime95
If you get any errors above 0.44 or so, post them here and I may adjust the crossovers further.
I got a few from 29.5 build 5 on PRPCF and PRPCFDC exponents:

Iteration: 1369666/6734381, Possible error: round off (0.4547851237) > 0.42188

Iteration: 516374/6736859, Possible error: round off (0.4698537946) > 0.42188

Iteration: 3802238/6737123, Possible error: round off (0.442078719) > 0.42188

Iteration: 498437/8641147, Possible error: round off (0.456402335) > 0.42188

See attached logs.

The three 6.7M double checks were fine, but the 8.6M exponent has not been double checked yet.
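Warning lines like the ones above can be triaged against the rules of thumb in Prime95's earlier post (post anything above about 0.44; above 0.48 probably means a hardware glitch). A minimal Python sketch; the regular expression assumes only the line format shown in this post, and `classify`/`summarize` are hypothetical helper names:

```python
import re

# Matches lines of the form shown in the post, e.g.
# "Iteration: 1369666/6734381, Possible error: round off (0.4547851237) > 0.42188"
LINE_RE = re.compile(
    r"Iteration:\s*(\d+)/(\d+),\s*Possible error: round off \(([\d.]+)\)\s*>\s*[\d.]+"
)

REPORT_AT = 0.44  # "If you get any errors above 0.44 or so, post them here"
GLITCH_AT = 0.48  # "If you see an error above 0.48 you probably had a hardware glitch"


def classify(err, report_at=REPORT_AT, glitch_at=GLITCH_AT):
    """Map one round-off value to the rule of thumb it falls under."""
    if err > glitch_at:
        return "likely hardware glitch"
    if err > report_at:
        return "report to developer"
    return "warning only"


def summarize(log_text):
    """Return (iteration, total, roundoff, verdict) for each warning line found."""
    rows = []
    for m in LINE_RE.finditer(log_text):
        iteration, total = int(m.group(1)), int(m.group(2))
        err = float(m.group(3))
        rows.append((iteration, total, err, classify(err)))
    return rows


if __name__ == "__main__":
    sample = """\
Iteration: 1369666/6734381, Possible error: round off (0.4547851237) > 0.42188
Iteration: 498437/8641147, Possible error: round off (0.456402335) > 0.42188
"""
    for iteration, total, err, verdict in summarize(sample):
        print(f"{iteration}/{total}: {err:.4f} -> {verdict}")
```

By those rules of thumb, all four values in this post sit in the "post them here" band and none reaches the hardware-glitch level, consistent with the 6.7M double checks matching.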
Attached Files
roundoff.txt (22.0 KB)
2018-12-13, 21:01   #21
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

1110101101111₂ Posts

Thanks, keep that data coming. I'll adjust the crossovers downward next release.
All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.