mersenneforum.org  

2019-06-20, 21:47   #78
ewmayer
2ω=0

Sep 2002
República de California

2²×2,939 Posts

I've seen timing variations of that order of magnitude from one run invocation to the next - it seems to be partly a function of whatever memory mapping one gets at run start. I've also seen poor timings one day, then the same run cranking along 4-5% faster the next day, without any change in user-perspective system load or ambient temps. (I only use my odroid for Mlucas runs, plus occasional builds and print jobs - it's the only one of my devices which supports my bought-this-year cheapie HP printer, as I have no Windows devices and my Mac's OS is older than said printer requires.)

Re. heat, I had my N2 (before I shipped it off in Paul L's direction) sitting on my desk, no fan air, just relying on convective air wafting around the heat sink, which did get quite warm to the touch. The heat sink is large enough that any small amount of moving air should suffice, whether secondary air from a nearby device's exhaust fan or an open window.

2019-06-20, 23:53   #79
paulunderwood

Sep 2002
Database er0rr

1001001001101₂ Posts

Quote:
Originally Posted by ewmayer View Post
I've seen timing variations of that order of magnitude from one run invocation to the next - it seems to be partly a function of whatever memory mapping one gets at run start. I've also seen poor timings one day, then the same run cranking along 4-5% faster the next day, without any change in user-perspective system load or ambient temps. (I only use my odroid for Mlucas runs, plus occasional builds and print jobs - it's the only one of my devices which supports my bought-this-year cheapie HP printer, as I have no Windows devices and my Mac's OS is older than said printer requires.)

Re. heat, I had my N2 (before I shipped it off in Paul L's direction) sitting on my desk, no fan air, just relying on convective air wafting around the heat sink, which did get quite warm to the touch. The heat sink is large enough that any small amount of moving air should suffice, whether secondary air from a nearby device's exhaust fan or an open window.
I've ordered a to-arrive-tomorrow USB fan to put underneath it. My C integer only program heats my R-pi 3B+ up to 80C and it throttles. So, your floating point is going to heat up the N2! Although the heat sink is huge, it is quite hot to touch if I leave my finger on it for 3 or more seconds -- it must have taken several days for the heat to build in it. When the fan is installed I will try to ascertain cause and effect.

2019-06-21, 02:31   #80
retina
Undefined

"The unspeakable one"
Jun 2006
My evil lair

15211₈ Posts

Quote:
Originally Posted by paulunderwood View Post
... it is quite hot to touch if I leave my finger on it for 3 or more seconds ...
To me that would appear to be ~55C. Generally I find that 60C equates to about 1 second for most people before the pain dictates some quick action. And 50C equates to no desperate need to remove the fingers.

But the heat-sink temperature isn't really too important. It is the junction temps that need monitoring. If those are less than 90C then everything should be good without the need for throttling. If there is some setting in the OS it might be worth adjusting it to get better throttling behaviour.

2019-06-21, 03:12   #81
paulunderwood

Sep 2002
Database er0rr

124D₁₆ Posts

Quote:
Originally Posted by retina View Post
To me that would appear to be ~55C. Generally I find that 60C equates to about 1 second for most people before the pain dictates some quick action. And 50C equates to no desperate need to remove the fingers.

But the heat-sink temperature isn't really too important. It is the junction temps that need monitoring. If those are less than 90C then everything should be good without the need for throttling. If there is some setting in the OS it might be worth adjusting it to get better throttling behaviour.
I installed "cpulimit" and limited the mlucas_v19 process to 300% (it was 400%). The temperature of the heat sink should start to drop. "lm-sensors" does not work on the N2, but I will do the finger test in a few hours to see if it has cooled significantly. I am hoping the new fan cures the overheating problem. I have seen videos where the first thing done with a new N2 is to replace the thermal paste, but I don't plan to do that.

2019-06-21, 04:45   #82
nomead

"Sam Laur"
Dec 2018
Turku, Finland

317 Posts

Quote:
Originally Posted by paulunderwood View Post
My C integer only program heats my R-pi 3B+ up to 80C and it throttles. So, your floating point is going to heat up the N2!
The Raspberry Pi (3B+ and 3A+) will actually start throttling the clock at 60C, unless set otherwise (temp_soft_limit in config.txt). Very light throttling in the beginning, 1.4 GHz to 1.2 GHz and dropping the core voltage a bit at the same time. Then there's a hard limit at 85C. What I've found is that without a heat sink, there is no hope at all to run at full load without throttling. And with Mlucas using the NEON ASIMD instructions, there needs to be some airflow over that heatsink. Not much though. I've stacked five RPI 3A+ with 14x14x14 mm heatsinks and have two of these stacks side by side. Without a fan, they will go up to about 75C. With a single undervolted 120mm fan (12V fan running at 5V) cooling the whole 2x5 stack, all of them stay comfortable at around 45-50C.

But the N2 is very much a different beast. The Raspberry Pi processor is made on a 40 nm process which is not that great anymore for anything that needs to do things instead of mostly sitting idle. Smaller process nodes may have bigger leakage currents i.e. idle consumption, but the operating power consumption is still getting smaller from node to node. And the N2 processor, Amlogic S922X, is made in 12 nm, so even with the bigger A73 cores, the stock heatsink should be enough for most normal loads. Of course, again the vector instructions generate more heat than normal float.

For temperature monitoring take a peek in /sys/class/thermal. This varies a bit from device to device, but there should be at least one directory called thermal_zone0 below that. Maybe more, thermal_zone1 etc. depending on the chip. Anyway, in each of those directories, there is a file called temp that tells the temperature (probably scaled by 1000, i.e. in millidegrees C) and another file called type that tells what's being measured.
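
To poll every zone in one go, something like the following Python sketch might do. It assumes the usual Linux sysfs layout described above (a temp file in millidegrees C and a type file per thermal_zone directory); the function name read_thermal_zones and its base parameter are only illustrative, and the layout may differ on some kernels.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return a list of (zone_name, sensor_type, temp_in_C) tuples.

    Assumes each thermal_zone* directory holds a 'temp' file in
    millidegrees C and a 'type' file naming what is being measured.
    """
    readings = []
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            temp_c = int((zone / "temp").read_text().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # unreadable or non-numeric entry: skip this zone
        readings.append((zone.name, kind, temp_c))
    return readings


if __name__ == "__main__":
    for name, kind, temp_c in read_thermal_zones():
        print(f"{name} ({kind}): {temp_c:.1f} C")
```

Run it in a loop (or under watch) while Mlucas is crunching to see how the temperatures track the load.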

2019-06-21, 13:01   #83
paulunderwood

Sep 2002
Database er0rr

5·937 Posts

Thanks to nomead for the pointers to /sys/class/thermal/thermal_zone0/temp. With cpulimit set to 300% the temperature was 49.5C; without it (i.e. at 400%) it was 61.1C. On fitting the USB fan the temperature dropped rapidly to 44.1C. However, it seems there is no impact on iteration time, and Ernst's observation that different runs have different timings seems to be true. So I will be restarting mlucas_v19 until I get it back to 102.5ms instead of 109.5ms.

2019-06-21, 18:58   #84
ewmayer
2ω=0

Sep 2002
República de California

2²·2,939 Posts

Quote:
Originally Posted by paulunderwood View Post
Thanks to nomead for the pointers to /sys/class/thermal/thermal_zone0/temp. With cpulimit set to 300% the temperature was 49.5C; without it (i.e. at 400%) it was 61.1C. On fitting the USB fan the temperature dropped rapidly to 44.1C. However, it seems there is no impact on iteration time, and Ernst's observation that different runs have different timings seems to be true. So I will be restarting mlucas_v19 until I get it back to 102.5ms instead of 109.5ms.
Yeah, crank it up - no guts, no glory! Like I said, my low-tech way to detect throttling is simply watching the timings, but of course that is only reliable within the context of a single run, i.e. in the absence of run-to-run timing variability, which I've found to be much larger on my ARM devices than my x86 ones. The /sys/class/thermal tip is definitely useful in terms of more-precise tracking of thermals. What do the various entries mean? On my Odroid C2, /sys/class/thermal has links to cooling_device0-3 and thermal_zone0-1 ... don't know what the former are about; the latter 2 have 'temp' files with entries 68000 (which presumably means 68C) and 2000. Paul, does your N2 have separate thermal_zone dirs for the a53 and a73 portions of the chip?
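
A rough Python sketch of the watch-the-timings approach, for what it's worth. It compares each per-checkpoint ms/iteration figure from a single run against that run's best value; the function name, the 3% tolerance and the sample numbers are all made up for illustration and are not anything Mlucas provides.

```python
def flag_slow_samples(timings_ms, tolerance=0.03):
    """Return indices of samples slower than the run's best by > tolerance.

    timings_ms: per-checkpoint ms/iteration values taken from ONE run.
    tolerance:  fractional slowdown treated as noise (hypothetical default).
    """
    if not timings_ms:
        return []
    best = min(timings_ms)
    limit = best * (1.0 + tolerance)
    return [i for i, t in enumerate(timings_ms) if t > limit]


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    samples = [102.5, 102.7, 103.0, 109.5, 110.2, 102.6]
    print(flag_slow_samples(samples))
```

Because it only looks at one run's own best timing, it says nothing about a run that starts out slow and stays slow, which is exactly the run-to-run variability caveat.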

2019-06-21, 19:15   #85
paulunderwood

Sep 2002
Database er0rr

5×937 Posts

Quote:
Originally Posted by ewmayer View Post
Yeah, crank it up - no guts, no glory! Like I said, my low-tech way to detect throttling is simply watching the timings, but of course that is only reliable within the context of a single run, i.e. in the absence of run-to-run timing variability, which I've found to be much larger on my ARM devices than my x86 ones. The /sys/class/thermal tip is definitely useful in terms of more-precise tracking of thermals. What do the various entries mean? On my Odroid C2, /sys/class/thermal has links to cooling_device0-3 and thermal_zone0-1 ... don't know what the former are about; the latter 2 have 'temp' files with entries 68000 (which presumably means 68C) and 2000. Paul, does your N2 have separate thermal_zone dirs for the a53 and a73 portions of the chip?
It has two zones -- which I guess is for each chip:

Code:
 cat /sys/class/thermal/thermal_zone0/temp
45100
Code:
 cat /sys/class/thermal/thermal_zone1/temp
42900
On this run I am getting a minimum of 107.7ms/iteration.

I guess there will be no use using all six cores. Will the a53 slow down the a73?

Last fiddled with by paulunderwood on 2019-06-21 at 19:32

2019-06-21, 19:48   #86
ewmayer
2ω=0

Sep 2002
República de California

2²×2,939 Posts

Quote:
Originally Posted by paulunderwood View Post
It has two zones -- which I guess is for each chip:

Code:
 cat /sys/class/thermal/thermal_zone0/temp
45100
Code:
 cat /sys/class/thermal/thermal_zone1/temp
42900
On this run I am getting a minimum of 107.7ms/iteration.

I guess there will be no use using all six cores. Will the a53 slow down the a73?
OK, so it seems the thermal_zone1 on my C2 is just a placeholder, since there is no second CPU on the die. Re. running on both - yes, that will slow down your a73 run, but the total throughput will still be more than a73-only, albeit only modestly. On my N2, here were the approximate numbers from my runs:

o Dual-core a53 is ~1/4 the FLOPS of the 4-core a73, with respect to each running in standalone mode at max throughput (2-threaded Mlucas on a53, 4-threaded on a73);

o Running on both a53 and a73 slows each down by ~10% versus that-CPU-only running, thus total throughput is equivalent to roughly 0.9*(4+1) = 4.5 a73 cores.
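
The arithmetic behind that estimate, as a small Python sketch. The function and parameter names are only illustrative; the inputs are the approximate figures quoted above (4 a73 cores, the dual-core a53 counted as roughly 1 a73 core, ~10% mutual slowdown), not fresh measurements.

```python
def combined_throughput(a73_cores=4, a53_as_a73_cores=1.0, slowdown=0.10):
    """Estimate total throughput in units of one standalone a73 core.

    a53_as_a73_cores: the dual-core a53 at ~1/4 the FLOPS of the
                      4-core a73, i.e. about one a73 core.
    slowdown:         fractional hit each CPU takes when both are busy.
    """
    return (1.0 - slowdown) * (a73_cores + a53_as_a73_cores)


if __name__ == "__main__":
    both = combined_throughput()
    gain = both / 4.0 - 1.0  # relative gain over running the a73 alone
    print(f"both CPUs: {both:.2f} a73 cores, gain {gain:.1%}")
```

So with these numbers, running on all six cores is worth about 12.5% more throughput than the a73 alone, which matches the "only modestly" above.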

2019-06-21, 19:52   #87
paulunderwood

Sep 2002
Database er0rr

5·937 Posts

Quote:
Originally Posted by ewmayer View Post
OK, so it seems the thermal_zone1 on my C2 is just a placeholder, since there is no second CPU on the die. Re. running on both - yes, that will slow down your a73 run, but the total throughput will still be more than a73-only, albeit only modestly. On my N2, here were the approximate numbers from my runs:

o Dual-core a53 is ~1/4 the FLOPS of the 4-core a73, with respect to each running in standalone mode at max throughput (2-threaded Mlucas on a53, 4-threaded on a73);

o Running on both a53 and a73 slows each down by ~10% versus that-CPU-only running, thus total throughput is equivalent to roughly 0.9*(4+1) = 4.5 a73 cores.
I'd like to try all six. Do I need to rerun the self test/configuration?

2019-06-21, 20:14   #88
ewmayer
2ω=0

Sep 2002
República de California

26754₈ Posts

Quote:
Originally Posted by paulunderwood View Post
I'd like to try all six. Do I need to rerun the self test/configuration?
You should only have to re-run self-tests for the a53 CPU ... in my experience using both CPUs increases the absolute timings but does not appreciably affect the optimal-FFT-parameters for each CPU, so you can run the a53 self-test without pausing your a73 job. Just make sure to run the a53 timings in a separate dir, so as to create a separate mlucas.cfg file for that CPU, and use -s m -cpu 0:1, obviously. Even using just the default 100 iterations per timing sample that self-test will take a while due to the puniness of the a53 CPU, probably a couple of hours.

If I still had my N2 I'd shoot you the a53-specific mlucas.cfg file, but I didn't save copies of those config files before shipping the unit off to Paul L., I only copied the .stat and savefiles for the 2 jobs I was running on it - the a73 LL-test is now queued up on my Intel NUC, and the a53 DC on my Odroid C2.

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.