[QUOTE=ewmayer;539344]Interrupt was a machine crash - this is my ever-flaky Haswell system. [snip][/QUOTE]
My Haswell won't boot unless I give it more voltage: vcore 1.0V --> 1.1V (?)
[QUOTE=paulunderwood;539352]My Haswell won't boot unless I give it more voltage: vcore 1v-->1.1v (?)[/QUOTE]
Good thought - mine's never failed to boot, but not infrequently freezes immediately on reboot. I'll check this setting after the next BSOD. Especially in very warm weather like we're having in NoCal currently, it shouldn't be very long.
My first Haswell would crash when going from a high power state to a low power state (e.g. stopping prime95). I worked around the problem by disabling C states.
[QUOTE=ewmayer;539344]So data corruption is likely responsible for the symptoms - so the 'invalid' is telling me the program found a corrupted primary restart file and is resorting to using the secondary one as a result?[/QUOTE]
Yes. All gpuowl does when saving is write the file and close it. From that point on, it's the OS's job to persist the file to disk. It turns out the OS is often lazy and prefers to keep the data in RAM (the page cache) for a while longer, and if an OS crash happens in this window, the savefile isn't properly persisted.
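For anyone writing their own checkpointing, a common way to close that window is to fsync the data before relying on it, write to a temporary file and atomically rename it over the old one, and keep the previous save as a fallback with a checksum so that a corrupt primary can be detected. The sketch below is generic Python, not gpuowl's actual code; the `.bak` name and the 4-byte CRC32 header are assumptions for illustration, and the directory fsync only works on POSIX systems:

```python
import os
import struct
import tempfile
import zlib
from typing import Optional


def _fsync_dir(directory: str) -> None:
    # Persist the directory entry (i.e. the rename itself); POSIX only.
    if hasattr(os, "O_DIRECTORY"):
        fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)


def save_checkpoint(path: str, payload: bytes) -> None:
    """Write CRC32 + payload durably; the previous save becomes path + '.bak'."""
    directory = os.path.dirname(os.path.abspath(path))
    record = struct.pack("<I", zlib.crc32(payload)) + payload
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(record)
            f.flush()             # Python's buffer -> OS page cache
            os.fsync(f.fileno())  # OS page cache -> disk (a plain close doesn't do this)
    except BaseException:
        os.unlink(tmp)
        raise
    if os.path.exists(path):
        os.replace(path, path + ".bak")  # keep the last save as the fallback
    os.replace(tmp, path)                # atomic rename into place
    _fsync_dir(directory)


def _read_valid(path: str) -> Optional[bytes]:
    # Return the payload only if the file exists and its CRC32 matches.
    try:
        with open(path, "rb") as f:
            blob = f.read()
    except FileNotFoundError:
        return None
    if len(blob) < 4:
        return None
    (crc,) = struct.unpack("<I", blob[:4])
    payload = blob[4:]
    return payload if zlib.crc32(payload) == crc else None


def load_checkpoint(path: str) -> Optional[bytes]:
    """Primary file if it checks out, else the backup, else None."""
    for candidate in (path, path + ".bak"):
        payload = _read_valid(candidate)
        if payload is not None:
            return payload
    return None
```

On restart, `load_checkpoint` plays the role of the primary/secondary logic asked about above: if the newest file fails its CRC, the previous save is used instead.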
[QUOTE=Prime95;539358]My first Haswell would crash when going from a high power state to a low power state (e.g. stopping prime95). I worked around the problem by disabling C states.[/QUOTE]
Yes, I get this a lot too with Mlucas on the Haswell. E.g. if I see that a given job has switched to a larger FFT length due to a ROE >= 0.4375, but the error is an outlier in the context of the run, I go to kill the job in order to restart it and force the default lower FFT length, and the machine crashes. Had another crash overnight, so I had a look at core voltage in the BIOS: 1.03-something. But I couldn't see a way to change it in the setup menu for [url=https://us.msi.com/Motherboard/support/Z87-G41-PC-Mate]my MSI MoBo[/url] - there is a [manual] setting option for it, but when I enabled that and scrolled to the actual setting field beneath it, the latter permitted no changes. Maybe somewhere in the Overclock submenu? Ah, now that I actually RTFM, under the OC submenu I see a bunch of stuff including VCCIN voltage (sets the CPU input voltage), CPU core voltage/CPU ring voltage/CPU GT voltage, and overvoltage protection. George, what model MoBo do you have? I see nothing resembling "C states" in the MSI manual.
[QUOTE=ewmayer;539449]George, what model MoBo do you have? I see nothing resembling "C states" in the MSI manual.[/QUOTE]
I'm not home right now. Probably Gigabyte or MSI. C states are discussed on page 3-23 of your motherboard's manual. It could be that different Intel chipsets expose different BIOS settings. Also, look for a setting that boosts voltage when AVX instructions are detected. IIRC, that should be set to add 0.1V.
Radeon VII performance comparison
Hi, I propose a standard setup for performance numbers on RadeonVII which would make them easier to compare.
So the proposed "standard" RadeonVII gpuowl setup for perf measurements would be:
- sclk 4 (about 1520 MHz)
- memory at 1180 MHz
Also, the GPU should not be cold: the measurements should be taken in a "stable state", e.g. after at least a few minutes of running. The GPU should also not be thermal throttling; the fan should be high enough to keep the GPU relatively cool. I think a GPU with a hot-spot temperature (the highest of the three temperatures reported) below 90C (possibly even a few degrees higher) would not be throttling.
Please include the FFT size with perf measurements. Of course the most important FFT size is the "wavefront" one, which ATM is 5M, but the wavefront FFT size changes over time, so it's better to be specific. For example, on ROCm 3.1 I get 664 us/it at 5M FFT. This is with sclk 4, RAM 1180 MHz, temperature 90C, power about 185W, and the fan at about 2500 RPM; Linux kernel 5.4.24. If 1180 MHz RAM is too fast for some GPUs, we could settle on a lower value that's acceptable for almost everybody (maybe 1150?).
[QUOTE=preda;539687]Hi, I propose a standard setup for performance numbers on RadeonVII which would make them easier to compare.
So the proposed "standard" RadeonVII gpuowl setup for perf measurements would be: - sclk 4 (about 1520 [B]M[/B]Hz) - memory at 1180 Also, the GPU should not be cold -- the measurements should be in "stable-state", e.g. after at least a few minutes of running. Also the GPU should not be "thermal throttling", the fan should be high enough to keep the GPU relatively cool. I think that a GPU with the hot-spot temperature (the highest temperature among the three reported) less than 90 (possibly even a few degrees higher) would not be throttling[/QUOTE]
sclk presumes Linux; as far as I know there is no Windows equivalent. Some GPUs cannot run a 1180 MHz memory clock reliably, or a 1520 MHz GPU clock. Whatever the performance run parameters are, all relevant parameters should be stated along with the timing, so that the timing is not meaningless.
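One way to make "all relevant parameters" hard to forget is to record each timing as a structured record in which missing conditions show up explicitly. The field list below is a hypothetical suggestion drawn from the parameters mentioned in this thread, not an existing gpuowl format:

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class PerfReport:
    """A timing together with the conditions it was measured under."""
    us_per_iter: float
    fft_size: str                          # e.g. "5M"
    system: str                            # OS and kernel, e.g. "Linux 5.4.24"
    driver: str                            # e.g. "ROCm 3.1"
    gpu_clock_mhz: Optional[int] = None    # actual clock; sclk levels are Linux-only
    mem_clock_mhz: Optional[int] = None
    hotspot_c: Optional[float] = None      # hottest reported sensor
    power_w: Optional[float] = None
    fan_rpm: Optional[int] = None

    def missing_fields(self) -> List[str]:
        """Names of conditions that were not reported."""
        return [name for name, value in asdict(self).items() if value is None]

    def summary(self) -> str:
        """One line; unknown conditions appear as '?' instead of being dropped."""
        return ", ".join(
            f"{name}={'?' if value is None else value}"
            for name, value in asdict(self).items()
        )


# The example numbers from the first post (sclk 4 taken as ~1520 MHz).
report = PerfReport(664, "5M", "Linux 5.4.24", "ROCm 3.1",
                    gpu_clock_mhz=1520, mem_clock_mhz=1180,
                    hotspot_c=90, power_w=185, fan_rpm=2500)
print(report.summary())
print(report.missing_fields())
```

Recording the actual GPU clock in MHz rather than an sclk level keeps the record meaningful for Windows users too.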
[QUOTE=preda;539687][snip] If 1180 RAM is too fast for some GPUs, we could settle on a lower value that's acceptable for almost everybody (maybe 1150?).[/QUOTE]
My Radeon VII won't run except at the stock memory clock speed. Even 1050 MHz produces errors.
[QUOTE=kriesel;539701]sclk presumes linux. As far as I know there is not a Windows equivalent.
some gpus can not run 1180Mhz memory clock reliably, or 1520 gpu clock. Whatever the performance run parameters are, all relevant parameters should be stated along with the timing, so that the timing is not meaningless.[/QUOTE]
I've still only ever gotten the manual sclk-setting working under Ubuntu 19.10 ... Preda may recall [url=https://mersenneforum.org/showthread.php?t=24979&page=9]my flailings-about in trying to fiddle the mem-clocking[/url]. In post #64 of that thread I tabulated per-iter times @5632K FFT on my system, mem-clock at the default 1001MHz:
[code]--setsclk 5: 757 us/iter, temp = 70C, watts = 400 [~120 of those are baseline, including an ongoing 4-thread Mlucas job on the CPU]
--setsclk 4: 792 us/iter, temp = 65C, watts = 350
--setsclk 3: 848 us/iter, temp = 63C, watts = 300[/code]
The temps are not meaningful except in a comparative sense: it seems e.g. Win and Linux interfaces take temps from different sensors on the GPU. I've seen Win users talking about temps routinely in the 90-100C range, whereas in my setup the card starts throttling any time whichever sensor is measuring the above rocm-smi-displayed temp gets close to 80C.
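Since the table gives both time per iteration and wall power, the two can be folded into energy per iteration (W x us = uJ). A sketch, assuming the ~120W baseline is the same at every sclk level, which the post only states for the sclk 5 row:

```python
# (sclk level, us/iter, total system watts) from the table above
RUNS = [(5, 757, 400), (4, 792, 350), (3, 848, 300)]
BASELINE_W = 120  # assumed constant non-GPU draw (CPU job etc.)


def energy_per_iter_mj(us_per_iter: float, watts: float) -> float:
    # watts * microseconds = microjoules; divide by 1000 for millijoules
    return watts * us_per_iter / 1000.0


for level, us, watts in RUNS:
    total = energy_per_iter_mj(us, watts)
    gpu = energy_per_iter_mj(us, watts - BASELINE_W)
    print(f"sclk {level}: {total:.1f} mJ/iter total, ~{gpu:.1f} mJ/iter above baseline")
```

With these numbers, sclk 3 is about 12% slower than sclk 5 but uses about 28% less energy per iteration above baseline (152.6 vs 212.0 mJ), so under that assumption the lower level is the more efficient one.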
[QUOTE=ewmayer;539719] it seems e.g. Win and Linux interfaces take temps from different sensors on the GPU, I've seen Win users talking about temps being routinely in the 90-100C range, whereas in my setup the card starts throttling any time whichever sensor is measuring the above rocm-smi-displayed temp gets close to 80C.[/QUOTE]
There are lots of sensors on a Radeon VII, apparently numerous ones built into the chip. Similarly for the RX 5700: [URL]https://www.tomshardware.com/news/amd-rx-5700-graphics-card-thermal-management,40144.html[/URL] It throttles by design at 110C on the hottest of the many sensors.
GPU-Z temperature displays for a Radeon VII (this one is cut back considerably on clock rates for reliability, so it runs cooler than most):
GPU 62C
GPU hot spot 72C
memory 64C
GPU VRM 63C
SOC VRM 58C
Mem1 64C
Mem2 66C
fan speed 33%
CPU temp 63C
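Because throttling keys off the hottest sensor rather than whichever one a given tool happens to display, a quick check is to take the maximum over all readings. A sketch using the GPU-Z numbers above, assuming the 110C throttle point quoted for the RX 5700 applies to this card as well:

```python
# GPU-Z readings quoted above, in degrees C (fan speed and CPU temp left out)
READINGS_C = {
    "GPU": 62, "GPU hot spot": 72, "memory": 64, "GPU VRM": 63,
    "SOC VRM": 58, "Mem1": 64, "Mem2": 66,
}
THROTTLE_C = 110  # assumed throttle point on the hottest sensor


def hottest(readings):
    """Return (sensor name, temperature) for the hottest sensor."""
    name = max(readings, key=readings.get)
    return name, readings[name]


name, temp = hottest(READINGS_C)
print(f"hottest: {name} at {temp}C, {THROTTLE_C - temp}C below the {THROTTLE_C}C throttle point")
```

This also illustrates why two tools can disagree by 10C or more on the "same" card: one may be showing the edge temperature while the other shows the hot spot.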