2019-06-21, 20:39   #89
paulunderwood

Quote:
Originally Posted by ewmayer View Post
You should only have to re-run self-tests for the a53 CPU ... in my experience using both CPUs increases the absolute timings but does not appreciably affect the optimal-FFT-parameters for each CPU, so you can run the a53 self-test without pausing your a73 job. Just make sure to run the a53 timings in a separate dir, so as to create a separate mlucas.cfg file for that CPU, and use -s m -cpu 0:1, obviously. Even using just the default 100 iterations per timing sample that self-test will take a while due to the puniness of the a53 CPU, probably a couple of hours.

If I still had my N2 I'd shoot you the a53-specific mlucas.cfg file, but I didn't save copies of those config files before shipping the unit off to Paul L., I only copied the .stat and savefiles for the 2 jobs I was running on it - the a73 LL-test is now queued up on my Intel NUC, and the a53 DC on my Odroid C2.
Thanks, but I have just now configured it for one LL on all six cores. top is showing ~470%. Of course this will be more when I shut down the browser for the night!

Last fiddled with by paulunderwood on 2019-06-21 at 20:39
2019-06-21, 20:54   #90
ewmayer

Quote:
Originally Posted by paulunderwood View Post
Thanks, but I have just now configured it for one LL on all six cores. top is showing ~470%. Of course this will be more when I shut down the browser for the night!
Hmm ... 470% sounds great, but what do the actual run timings show? In my experience, running one job across both CPUs is worse than just using the a73 CPU.
2019-06-21, 21:42   #91
paulunderwood

Quote:
Originally Posted by ewmayer View Post
Hmm ... 470% sounds great, but what do the actual run timings show? In my experience, running one job across both CPUs is worse than just using the a73 CPU.
136.4894ms, but the machine was not idle. I have reverted to -cpu 2:5 (with the correct configuration file). The CPUs each dropped 3 °C during the 6-core run. I am hoping for a great run-to-run value.
2019-07-03, 16:50   #92
ldesnogu

I don't know where to post so here we go

I ran mlucas_v19 (posted on this thread IIRC) on a board with a Qualcomm SD845 (Cortex-A75) https://www.96boards.org/product/rb3-platform/

There's no heatsink and no fan so obviously the CPU throttled during the run.

setaffinity was failing; it works for 0:3 but these are the little CPU; 4:7 fails for some reason. I'll take a look at that.

Code:
./mlucas -s m -iters 100 -cpu 4:7
18.0
      2048  msec/iter =   44.03  ROE[avg,max] = [0.255691964, 0.312500000]  radices = 128 16 16 32  0  0  0  0  0  0
      2304  msec/iter =   54.09  ROE[avg,max] = [0.247767857, 0.312500000]  radices = 288 16 16 16  0  0  0  0  0  0
      2560  msec/iter =   55.49  ROE[avg,max] = [0.236635045, 0.281250000]  radices = 160 16 16 32  0  0  0  0  0  0
      2816  msec/iter =   65.30  ROE[avg,max] = [0.223967634, 0.250000000]  radices =  44 32 32 32  0  0  0  0  0  0
      3072  msec/iter =   67.83  ROE[avg,max] = [0.270591518, 0.312500000]  radices = 192 16 16 32  0  0  0  0  0  0
      3328  msec/iter =   74.63  ROE[avg,max] = [0.224553571, 0.281250000]  radices = 208  8  8  8 16  0  0  0  0  0
      3584  msec/iter =   80.38  ROE[avg,max] = [0.273772321, 0.312500000]  radices = 224 16 16 32  0  0  0  0  0  0
      3840  msec/iter =   83.62  ROE[avg,max] = [0.249135045, 0.312500000]  radices = 240 16 16 32  0  0  0  0  0  0
      4096  msec/iter =   91.08  ROE[avg,max] = [0.252901786, 0.281250000]  radices = 128 16 32 32  0  0  0  0  0  0
      4608  msec/iter =  101.29  ROE[avg,max] = [0.248046875, 0.312500000]  radices = 288 32 16 16  0  0  0  0  0  0
      5120  msec/iter =  106.82  ROE[avg,max] = [0.235030692, 0.281250000]  radices = 160  8  8 16 16  0  0  0  0  0
      5632  msec/iter =  129.85  ROE[avg,max] = [0.223102679, 0.250000000]  radices = 352 16 16 32  0  0  0  0  0  0
      6144  msec/iter =  138.96  ROE[avg,max] = [0.222753906, 0.281250000]  radices = 768 16 16 16  0  0  0  0  0  0
      6656  msec/iter =  149.21  ROE[avg,max] = [0.271651786, 0.312500000]  radices = 208  8  8 16 16  0  0  0  0  0
      7168  msec/iter =  146.54  ROE[avg,max] = [0.242801339, 0.312500000]  radices = 224 16 32 32  0  0  0  0  0  0
      7680  msec/iter =  150.23  ROE[avg,max] = [0.243743025, 0.312500000]  radices = 240 16 32 32  0  0  0  0  0  0
I cross-compiled it myself.
2019-07-03, 19:06   #93
ewmayer

Quote:
Originally Posted by ldesnogu View Post
I don't know where to post so here we go

I ran mlucas_v19 (posted on this thread IIRC) on a board with a Qualcomm SD845 (Cortex-A75) https://www.96boards.org/product/rb3-platform/

There's no heatsink and no fan so obviously the CPU throttled during the run.

setaffinity was failing; it works for 0:3 but these are the little CPU; 4:7 fails for some reason. I'll take a look at that.

Code:
./mlucas -s m -iters 100 -cpu 4:7
[snip]
I cross-compiled it myself.
So what about -cpu 4:7 failed for you? The data you posted look OK, only weirdness I see is that 5632K seems slower than expected and the timings from 6144-7680 are all in a narrow range, possibly related to throttling.

If you have any way to get some decent airflow over the CPU during your tests, even if it's not a practical one for long-term running, that might be useful, as well as monitoring the temperature data in /sys/class/thermal/thermal_zone*/temp during the run, if those files exist on your system.

Oh, what does the cfg-file for -cpu 0:3 look like?

Last fiddled with by ewmayer on 2019-07-03 at 19:12
2019-07-04, 06:24   #94
ldesnogu

Quote:
Originally Posted by ewmayer View Post
So what about -cpu 4:7 failed for you?
I get messages like this during the run:


Code:
sched_setaffinity: Invalid argument

I will see if I can fix this issue.



Quote:
The data you posted look OK, only weirdness I see is that 5632K seems slower than expected and the timings from 6144-7680 are all in a narrow range, possibly related to throttling.

If you have any way to get some decent airflow over the CPU during your tests, even if it's not a practical one for long-term running, that might be useful, as well as monitoring the temperature data in /sys/class/thermal/thermal_zone*/temp during the run, if those files exist on your system.
Yeah, I will definitely have to fix that throttling issue, as I intend to use the board for benchmarking.


Quote:
Oh, what does the cfg-file for -cpu 0:3 look like?
Here it is:
Code:
      2048  msec/iter =   71.01  ROE[avg,max] = [0.238755580, 0.312500000]  radices = 256 16 16 16  0  0  0  0  0  0
      2304  msec/iter =   79.91  ROE[avg,max] = [0.247767857, 0.312500000]  radices = 288 16 16 16  0  0  0  0  0  0
      2560  msec/iter =   87.87  ROE[avg,max] = [0.236635045, 0.281250000]  radices = 160 16 16 32  0  0  0  0  0  0
      2816  msec/iter =   98.75  ROE[avg,max] = [0.270312500, 0.375000000]  radices = 176 16 16 32  0  0  0  0  0  0
      3072  msec/iter =  106.91  ROE[avg,max] = [0.270591518, 0.312500000]  radices = 192 16 16 32  0  0  0  0  0  0
      3328  msec/iter =  116.82  ROE[avg,max] = [0.252232143, 0.312500000]  radices = 208 16 16 32  0  0  0  0  0  0
      3584  msec/iter =  123.03  ROE[avg,max] = [0.273772321, 0.312500000]  radices = 224 16 16 32  0  0  0  0  0  0
      3840  msec/iter =  133.95  ROE[avg,max] = [0.249135045, 0.312500000]  radices = 240 16 16 32  0  0  0  0  0  0
      4096  msec/iter =  139.86  ROE[avg,max] = [0.227650670, 0.250000000]  radices = 256 16 16 32  0  0  0  0  0  0
      4608  msec/iter =  160.17  ROE[avg,max] = [0.250837054, 0.343750000]  radices = 288 16 16 32  0  0  0  0  0  0
      5120  msec/iter =  180.82  ROE[avg,max] = [0.296875000, 0.343750000]  radices = 320 16 16 32  0  0  0  0  0  0
      5632  msec/iter =  196.83  ROE[avg,max] = [0.223102679, 0.250000000]  radices = 352 16 16 32  0  0  0  0  0  0
      6144  msec/iter =  223.74  ROE[avg,max] = [0.253571429, 0.281250000]  radices = 192 16 32 32  0  0  0  0  0  0
      6656  msec/iter =  243.88  ROE[avg,max] = [0.232924107, 0.250000000]  radices = 208 16 32 32  0  0  0  0  0  0
      7168  msec/iter =  259.09  ROE[avg,max] = [0.242801339, 0.312500000]  radices = 224 16 32 32  0  0  0  0  0  0
      7680  msec/iter =  280.20  ROE[avg,max] = [0.243743025, 0.312500000]  radices = 240 16 32 32  0  0  0  0  0  0

By the way results.txt contains these 3 lines:
Code:
FATAL: iter =         14; nonzero exit carry in radix384_ditN_cy_dif1 - input wordsize may be too small.
FATAL: iter =         14; nonzero exit carry in radix384_ditN_cy_dif1 - input wordsize may be too small.
FATAL: iter =         12; nonzero exit carry in radix384_ditN_cy_dif1 - input wordsize may be too small.
2019-07-04, 10:48   #95
M344587487

Quote:
Originally Posted by ldesnogu View Post
...
setaffinity was failing; it works for 0:3 but these are the little CPU; 4:7 fails for some reason. I'll take a look at that.
...
I've encountered much weirdness when it comes to big.LITTLE CPUs in phones, maybe your problem is related. There are different ways the cores can be configured and presented to the user. Try "cat /proc/cpuinfo" under no load, under load, and many times in quick succession (to see how the SoC reacts to going from no load to some load). All the issues boiled down to the number of cores presented to the user being dynamic and seemingly done differently by every manufacturer and gen to gen. If a core is not present in cpuinfo when mlucas tries to do something with that specific core (like set affinity) it usually fails. It looks like your chip is a Snapdragon 845 with DynamIQ, the successor to big.LITTLE ( https://en.wikipedia.org/wiki/DynamIQ#Scheduling ). Never tested one so it'll be interesting to see what quirks it has. If it does need some caressing hopefully it's as simple as running two jobs on 0:3 and letting the SoC do the load balancing for you.
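
One way to check what the kernel actually presents is sketched below. It is only a sketch, assuming a Linux userland with util-linux's taskset available; check_affinity is a made-up helper name, and the core list uses taskset's syntax (4-7 for the 4:7 range discussed above). The sched_setaffinity man page lists EINVAL for a mask that contains no processors that are both present and permitted to the thread, so comparing the online list with the allowed list may narrow things down.

```shell
#!/bin/sh
# Hypothetical helper (not part of mlucas): report which cores the kernel
# will let this shell pin work to.
# Usage: check_affinity CPULIST   (taskset syntax, e.g. 4-7)
check_affinity() {
    cpus="$1"
    # Cores the kernel currently reports as online.
    if [ -r /sys/devices/system/cpu/online ]; then
        echo "online:  $(cat /sys/devices/system/cpu/online)"
    fi
    # Cores this process is allowed to run on (cpusets can shrink this list).
    if [ -r /proc/self/status ]; then
        grep '^Cpus_allowed_list' /proc/self/status
    fi
    # Try to pin a no-op command to the requested cores.
    if taskset -c "$cpus" true 2>/dev/null; then
        echo "pin $cpus: ok"
    else
        echo "pin $cpus: failed"
    fi
    return 0
}
```

Running check_affinity 0-3 and then check_affinity 4-7 from the same shell that launches mlucas should show whether the big cores are missing from the allowed list or whether the failure lies elsewhere.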

Last fiddled with by M344587487 on 2019-07-04 at 10:49 Reason: Arrays start at 0
2019-07-04, 12:43   #96
ldesnogu

Quote:
Originally Posted by M344587487 View Post
I've encountered much weirdness when it comes to big.LITTLE CPUs in phones, maybe your problem is related. There are different ways the cores can be configured and presented to the user. Try "cat /proc/cpuinfo" under no load, under load, and many times in quick succession (to see how the SoC reacts to going from no load to some load).
I tried that but /proc/cpuinfo always displays the same result.

Quote:
All the issues boiled down to the number of cores presented to the user being dynamic and seemingly done differently by every manufacturer and gen to gen. If a core is not present in cpuinfo when mlucas tries to do something with that specific core (like set affinity) it usually fails. It looks like your chip is a Snapdragon 845 with DynamIQ, the successor to big.LITTLE ( https://en.wikipedia.org/wiki/DynamIQ#Scheduling ). Never tested one so it'll be interesting to see what quirks it has. If it does need some caressing hopefully it's as simple as running two jobs on 0:3 and letting the SoC do the load balancing for you.
Time permitting I will investigate some more. Thanks!

I set the governor to performance and got this:
Code:
      2048  msec/iter =   39.31  ROE[avg,max] = [0.255691964, 0.312500000]  radices = 128 16 16 32  0  0  0  0  0  0
      2304  msec/iter =   47.31  ROE[avg,max] = [0.228906250, 0.265625000]  radices =  36 32 32 32  0  0  0  0  0  0
      2560  msec/iter =   48.45  ROE[avg,max] = [0.236635045, 0.281250000]  radices = 160 16 16 32  0  0  0  0  0  0
      2816  msec/iter =   62.14  ROE[avg,max] = [0.243805804, 0.312500000]  radices = 352 16 16 16  0  0  0  0  0  0
      3072  msec/iter =   63.29  ROE[avg,max] = [0.217623465, 0.250000000]  radices =  48 32 32 32  0  0  0  0  0  0
      3328  msec/iter =   70.39  ROE[avg,max] = [0.219866071, 0.250000000]  radices =  52 32 32 32  0  0  0  0  0  0
      3584  msec/iter =   73.67  ROE[avg,max] = [0.213588170, 0.265625000]  radices =  56 32 32 32  0  0  0  0  0  0
      3840  msec/iter =   78.74  ROE[avg,max] = [0.249135045, 0.312500000]  radices = 240 16 16 32  0  0  0  0  0  0
      4096  msec/iter =   81.66  ROE[avg,max] = [0.252901786, 0.281250000]  radices = 128 16 32 32  0  0  0  0  0  0
      4608  msec/iter =   92.61  ROE[avg,max] = [0.299107143, 0.375000000]  radices = 144 16 32 32  0  0  0  0  0  0
      5120  msec/iter =  100.73  ROE[avg,max] = [0.234685407, 0.281250000]  radices = 160 16 32 32  0  0  0  0  0  0
      5632  msec/iter =  118.50  ROE[avg,max] = [0.246205357, 0.312500000]  radices = 176  8  8 16 16  0  0  0  0  0
      6144  msec/iter =  127.28  ROE[avg,max] = [0.253571429, 0.281250000]  radices = 192 16 32 32  0  0  0  0  0  0
      6656  msec/iter =  139.66  ROE[avg,max] = [0.271651786, 0.312500000]  radices = 208  8  8 16 16  0  0  0  0  0
      7168  msec/iter =  144.24  ROE[avg,max] = [0.242801339, 0.312500000]  radices = 224 16 32 32  0  0  0  0  0  0
      7680  msec/iter =  154.26  ROE[avg,max] = [0.243743025, 0.312500000]  radices = 240 16 32 32  0  0  0  0  0  0
It's faster than the run above except for 7680:
Code:
FFT(K)  perf    before  ratio
2048    39.31   44.03   1.12
2304    47.31   54.09   1.14
2560    48.45   55.49   1.15
2816    62.14   65.3    1.05
3072    63.29   67.83   1.07
3328    70.39   74.63   1.06
3584    73.67   80.38   1.09
3840    78.74   83.62   1.06
4096    81.66   91.08   1.12
4608    92.61   101.29  1.09
5120    100.73  106.82  1.06
5632    118.5   129.85  1.10
6144    127.28  138.96  1.09
6656    139.66  149.21  1.07
7168    144.24  146.54  1.02
7680    154.26  150.23  0.97
I saw the temperature going above 95 degrees in some of the thermal zones (the kernel exposes more than 70 thermal zones, hard to know what is what). With nothing running max temp is 75 degrees.

I checked the frequency a few times, and it was always 2.8 GHz on the fast CPU and 1.8 GHz on the slow one. Given the ratios above, it's possible that part of the last two sizes ran on the slower CPU.
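
For anyone who wants to regenerate the ratio column from two cfg files, here is a minimal sketch. It assumes the timing-line layout shown above (FFT length in field 1, the literal msec/iter in field 2, the timing in field 4); ratio_table and the file names are placeholders of mine, not anything mlucas provides.

```shell
#!/bin/sh
# Hypothetical helper: compare two mlucas.cfg-style files.
# Usage: ratio_table PERF.cfg BEFORE.cfg
# Prints: FFT length, timing from PERF, timing from BEFORE, BEFORE/PERF.
ratio_table() {
    awk '
        # Skip anything that is not a timing line (e.g. the version line).
        $2 != "msec/iter" { next }
        # First file: remember the timing for each FFT length.
        NR == FNR { perf[$1] = $4; next }
        # Second file: print both timings and their ratio.
        ($1 in perf) && perf[$1] > 0 {
            printf "%s\t%s\t%s\t%.2f\n", $1, perf[$1], $4, $4 / perf[$1]
        }
    ' "$1" "$2"
}
```

A ratio above 1 means the first file (here the performance-governor run) was faster at that FFT length.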
2019-07-04, 14:04   #97
ldesnogu

Another run:
Code:
      2048  msec/iter =   35.88  ROE[avg,max] = [0.250446429, 0.281250000]  radices = 1024 32 32  0  0  0  0  0  0  0
      2304  msec/iter =   42.26  ROE[avg,max] = [0.228906250, 0.265625000]  radices =  36 32 32 32  0  0  0  0  0  0
      2560  msec/iter =   45.14  ROE[avg,max] = [0.241992188, 0.281250000]  radices =  40 32 32 32  0  0  0  0  0  0
      2816  msec/iter =   54.73  ROE[avg,max] = [0.223967634, 0.250000000]  radices =  44 32 32 32  0  0  0  0  0  0
      3072  msec/iter =   55.49  ROE[avg,max] = [0.270591518, 0.312500000]  radices = 192 16 16 32  0  0  0  0  0  0
      3328  msec/iter =   66.12  ROE[avg,max] = [0.252232143, 0.312500000]  radices = 208 16 16 32  0  0  0  0  0  0
      3584  msec/iter =   70.29  ROE[avg,max] = [0.273772321, 0.312500000]  radices = 224 16 16 32  0  0  0  0  0  0
      3840  msec/iter =   74.62  ROE[avg,max] = [0.249135045, 0.312500000]  radices = 240 16 16 32  0  0  0  0  0  0
      4096  msec/iter =   80.18  ROE[avg,max] = [0.252901786, 0.281250000]  radices = 128 16 32 32  0  0  0  0  0  0
      4608  msec/iter =   92.21  ROE[avg,max] = [0.299107143, 0.375000000]  radices = 144 16 32 32  0  0  0  0  0  0
      5120  msec/iter =  100.66  ROE[avg,max] = [0.234685407, 0.281250000]  radices = 160 16 32 32  0  0  0  0  0  0
      5632  msec/iter =  115.37  ROE[avg,max] = [0.223102679, 0.250000000]  radices = 352 16 16 32  0  0  0  0  0  0
      6144  msec/iter =  127.71  ROE[avg,max] = [0.253571429, 0.281250000]  radices = 192 16 32 32  0  0  0  0  0  0
      6656  msec/iter =  140.97  ROE[avg,max] = [0.232924107, 0.250000000]  radices = 208 16 32 32  0  0  0  0  0  0
      7168  msec/iter =  146.96  ROE[avg,max] = [0.242801339, 0.312500000]  radices = 224 16 32 32  0  0  0  0  0  0
      7680  msec/iter =  156.33  ROE[avg,max] = [0.243743025, 0.312500000]  radices = 240 16 32 32  0  0  0  0  0  0
Crazy variability against previous run.

I monitored frequency and temps every second and frequency didn't change.
Code:
while true; do sleep 1; cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq | tr '\012' '       '; cat /sys/class/thermal/thermal_zone*/temp 2> /dev/null | sort -n | uniq | tail -1 | tr '\012' '    '; date "+%H:%M:%S"; done
After some research, it's not obvious that I will be able to cool the beast properly, since it uses PoP RAM (the RAM chip is stacked on the SoC). Add to that the fact that the Linux on it is useless (cross-compilation is required), and I'm starting to wonder if that board isn't completely useless for my needs.
2019-07-04, 14:36   #98
nomead

Quote:
Originally Posted by ldesnogu View Post
I saw the temperature going above 95 degrees in some of the thermal zones (the kernel exposes more than 70 thermal zones, hard to know what is what). With nothing running max temp is 75 degrees.
In the same directory as the temp file, see the type file. It should tell what the corresponding temperature reading is for.
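
A minimal sketch of that lookup, assuming the usual sysfs layout (one thermal_zone* directory per zone, each with a type and a temp file). list_zones is a name I made up, and the optional argument exists only so the function can be pointed at a different directory. The temp value is usually reported in millidegrees Celsius, but treat that as an assumption for any given board.

```shell
#!/bin/sh
# Hypothetical helper: print "zone<TAB>type<TAB>temp" for each thermal zone.
# Usage: list_zones [BASE]   (BASE defaults to /sys/class/thermal)
list_zones() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        # Skip zones whose temp file is missing or cannot be read.
        temp=$(cat "$zone/temp" 2>/dev/null) || continue
        # Fall back to "unknown" if the type file cannot be read.
        ztype=$(cat "$zone/type" 2>/dev/null) || ztype="unknown"
        printf '%s\t%s\t%s\n' "${zone##*/}" "$ztype" "$temp"
    done
}
```

Piping the output through sort -k3,3n | tail -n 5 lists the five hottest zones together with their names, which should help with 70-odd zones.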
2019-07-04, 19:35   #99
ewmayer

Laurent, thanks for the data. Based on your timings, it seems the OS is doing a decent job of load-balancing even in the affinity-fail cases, though some of the large run-to-run timing variability you observe may be due to that.

The radix-384 errors you got are expected: that is a newly added front-end radix in v19, and there is a bug in multithreaded runs of it that I've so far been unable to find. Its failing the self-tests leaves you no worse off at FFT lengths of the form 3*2^k than you would be using v18.

Oh, would you be so kind as to post a copy of your /proc/cpuinfo file? Thanks.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
