I don't have one to play with, but I wonder how much difference disabling HDMI makes to idle power. From what I just read, on the earlier Pi 3 it saves about 30 mA. Maybe there are more savings available on the RPi 4, since it has two HDMI ports, or at least more powerful graphics processing?
Also, maybe the USB ports could be disabled too (assuming you only access it via SSH) for some savings? Is there a build of powertop or a similar program that breaks down which devices the power is going to? |
[QUOTE=hansl;520809]I don't have one to play with, but I wonder what sort of difference in idle power it makes if HDMI is disabled. From what I just read, on the earlier pi3 it saves about 30mA. Maybe more savings available for rpi4 since it has two HDMI ports, or at least more powerful graphics processing?[/QUOTE]
I can test this, but I left the thing on my desk at work... Monday at the earliest, then. Also, I don't know what power saving tricks the official Raspbian distribution does by default, maybe it runs cooler? But yeah, Monday.
[QUOTE=hansl;520809]Also maybe USB ports could be disabled too(assuming you just access via SSH) for some savings?[/QUOTE]
Hmm... will have to look into it. Anyway, there is apparently a firmware update now available for the USB chip that can reduce the power consumption somewhat, by about 300 mW.
[URL="https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=243500&p=1490467&hilit=vl805#p1490467"]https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=243500&p=1490467&hilit=vl805#p1490467[/URL]
But apparently this needs to be done under 32-bit Linux (for example plain old Raspbian); trying to run the upgrade utility here just gives me an error message.
[QUOTE=hansl;520809]Is there a build of powertop or similar program which breaks down what devices power is going towards?[/QUOTE]
Not to my knowledge, no. I was under the impression that it can only tell where the CPU power consumption is going, not the peripherals. |
Nomead, thanks for the data. Re. idle power, on the Odroid-C2 there is a removable jumper; pulling it off saves some power. Not sure if there is anything similar on your board.
[QUOTE=nomead;520797]The Pi3B (and 3A) likes to use radix-352 at 2816K FFT size, but the Pi4 for some reason is slower with it (not by much, 78.82 ms/iter for radix-352 vs. 76.70 ms for radix-176). By the way, the same thing happens on the Cortex-A57 on the Jetson Nano. Also, only 2 of 5 radix sets for 2304K passed, so it was skipped, no entry in mlucas.cfg .[/QUOTE]
352 is mainly geared for 5632K, where it makes a significant difference on most of my ARM devices; whether it also helps at 2816K is hit or miss. Do you still have the screen log from your self-tests? I'd like to look at the 2304K self-test outputs to see why only 2 of the various FFT-radix combos at that length passed. Thanks. |
1 Attachment(s)
[QUOTE=ewmayer;520817]Nomead, thanks for the data. Re. idle-power, on the Odroid-C2 there is a removable jumper whose pulling-off saves some power, not sure if anything similar on your board.
352 is mainly geared for 5632K where it makes a significant difference on most of my ARM devices, whether it also helps at 2816K is hit or miss. Do you still have the screen log from your self-tests? I'd like to look at the 2304K self-tests outputs to see why only 2 of the various FFT-radix combos at that length passed. Thanks.[/QUOTE]
No jumpers on the Pi4... if something can be disabled, it needs to be done via software or firmware. 352 happens to help on Pi3 / BCM2837 so there it's a definite hit. I'll attach the screenlog to this message. |
[QUOTE=nomead;520819]352 happens to help on Pi3 / BCM2837 so there it's a definite hit.
I'll attach the screenlog to this message.[/QUOTE]
Thanks - looking at the 2304K self-tests in the log, 3 of the 5 runs suffer ROE >= 0.4375, and since that means fewer than half the tests passed, the code treats that FFT length as having failed the self-tests. Based on the good data points, here is a manually created cfg-file line for that length:
[code]
2304 msec/iter = 61.16 ROE[avg,max] = [0.249911153, 0.343750000] radices = 288 16 16 16 0 0 0 0 0 0
[/code]
I quick-checked that length on my Odroid C2 just now and 3 of 5 tests passed so it wrote a cfgfile entry for me ... that difference had me puzzled - same code, same CPU hardware - until I recalled that the random residue shift can lead to such otherwise-identical-everything differences. If you try rerunning just that one FFT length in self-test mode via
[i]./Mlucas -fftlen 2304 -iters 100 -cpu 0:3[/i]
you should see different residue shifts from your Mlucas -s m run, and perhaps will get the one more good data point that is needed for the cfg-file to get written. The self-test exponents are already set at the extreme high end of the range computed for each FFT length, so sometimes a little manual hackery of this kind is needed to get a complete set of cfg-file entries. |
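For anyone following along, the acceptance rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the post (a run is bad at ROE >= 0.4375, and a length is dropped when fewer than half its radix sets pass), not the actual Mlucas source. The function names, the set labels and the sample ROE values are all made up for the example.

```python
ROE_LIMIT = 0.4375  # threshold quoted in the post above


def passing_sets(max_roe_by_set):
    """Return the radix sets whose maximum roundoff error stays below the limit."""
    return [name for name, roe in max_roe_by_set.items() if roe < ROE_LIMIT]


def length_passes(max_roe_by_set):
    """A length is dropped when fewer than half of its radix sets passed.

    Assumption: "fewer than half" is read literally, so exactly half would pass.
    """
    good = passing_sets(max_roe_by_set)
    return 2 * len(good) >= len(max_roe_by_set)


# Hypothetical sample data: labels and ROE values are invented for illustration.
pi4_like = {"set_a": 0.34375, "set_b": 0.40625, "set_c": 0.4375,
            "set_d": 0.5, "set_e": 0.4375}
odroid_like = {"set_a": 0.34375, "set_b": 0.40625, "set_c": 0.375,
               "set_d": 0.5, "set_e": 0.4375}

print(len(passing_sets(pi4_like)), length_passes(pi4_like))
print(len(passing_sets(odroid_like)), length_passes(odroid_like))
```

With 2 of 5 sets good the length is skipped, with 3 of 5 it gets a cfg entry, which is the difference between the two boards discussed here.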
[QUOTE=ewmayer;520822]T
[i] ./Mlucas -fftlen 2304 -iters 100 -cpu 0:3 [/i] you should see different residue shifts from your Mlucas -s m run, and perhaps will get the one more good data point that is needed for the cfg-file to get written. The self-test exponents are already set at the extreme high end of the range computed for each FFT length, so sometimes a little manual hackery of this kind is needed to get a complete set of cfg-file entries.[/QUOTE]
Yup - and it gives the same [C]288 16 16 16[/C] radix set on three consecutive tries, with 3 of 5 sets passed.
Oh, and here are the rest of the self-test runs.
v18.0:
[CODE]
4096 msec/iter = 116.38 ROE[avg,max] = [0.000227303, 0.312500000] radices = 256 16 16 32 0 0 0 0 0 0
4608 msec/iter = 129.56 ROE[avg,max] = [0.000248429, 0.312500000] radices = 288 16 16 32 0 0 0 0 0 0
5120 msec/iter = 181.85 ROE[avg,max] = [0.000234485, 0.281250000] radices = 160 32 32 16 0 0 0 0 0 0
5632 msec/iter = 204.47 ROE[avg,max] = [0.000257845, 0.343750000] radices = 176 32 32 16 0 0 0 0 0 0
6144 msec/iter = 225.43 ROE[avg,max] = [0.000247003, 0.312500000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 242.89 ROE[avg,max] = [0.000266479, 0.375000000] radices = 208 32 32 16 0 0 0 0 0 0
7168 msec/iter = 262.44 ROE[avg,max] = [0.000226100, 0.281250000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 290.09 ROE[avg,max] = [0.000236377, 0.312500000] radices = 240 32 32 16 0 0 0 0 0 0
[/CODE]
preview version:
[CODE]
4096 msec/iter = 124.18 ROE[avg,max] = [0.227270067, 0.281250000] radices = 256 16 16 32 0 0 0 0 0 0
4608 msec/iter = 130.03 ROE[avg,max] = [0.249110271, 0.312500000] radices = 288 16 16 32 0 0 0 0 0 0
5120 msec/iter = 154.51 ROE[avg,max] = [0.296955541, 0.375000000] radices = 320 16 16 32 0 0 0 0 0 0
5632 msec/iter = 166.74 ROE[avg,max] = [0.223459145, 0.281250000] radices = 352 16 16 32 0 0 0 0 0 0
6144 msec/iter = 226.16 ROE[avg,max] = [0.246091736, 0.343750000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 243.35 ROE[avg,max] = [0.230394501, 0.312500000] radices = 208 32 32 16 0 0 0 0 0 0
7168 msec/iter = 265.73 ROE[avg,max] = [0.236601462, 0.312500000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 283.72 ROE[avg,max] = [0.235477282, 0.343750000] radices = 240 32 32 16 0 0 0 0 0 0
[/CODE]
Indeed, there's a huge difference in 5120K and 5632K FFT sizes' speeds, because of those radix sets with 320 and 352. |
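To put a number on that difference, here is a rough Python sketch that parses cfg-style lines like the ones posted and compares msec/iter between the two builds. The regular expression and the helper names ([C]parse_cfg[/C], [C]speedup[/C]) are my own assumptions based on the line layout shown in this thread, not anything shipped with Mlucas. Only the 5120K and 5632K entries are copied in.

```python
import re

# Matches lines of the form seen in the thread:
#   5120 msec/iter = 181.85 ROE[avg,max] = [...] radices = 160 32 32 16 0 0 ...
CFG_LINE = re.compile(r"^\s*(\d+)\s+msec/iter\s*=\s*([\d.]+).*radices\s*=\s*(.*)$")


def parse_cfg(text):
    """Map FFT length (in K) -> (msec/iter, radices with the zero padding dropped)."""
    entries = {}
    for line in text.splitlines():
        m = CFG_LINE.match(line)
        if not m:
            continue
        radices = [int(r) for r in m.group(3).split() if int(r) != 0]
        entries[int(m.group(1))] = (float(m.group(2)), radices)
    return entries


def speedup(old, new, fft_k):
    """Ratio of old to new msec/iter at one FFT length (values above 1 mean faster)."""
    return old[fft_k][0] / new[fft_k][0]


V18 = """
5120 msec/iter = 181.85 ROE[avg,max] = [0.000234485, 0.281250000] radices = 160 32 32 16 0 0 0 0 0 0
5632 msec/iter = 204.47 ROE[avg,max] = [0.000257845, 0.343750000] radices = 176 32 32 16 0 0 0 0 0 0
"""
PREVIEW = """
5120 msec/iter = 154.51 ROE[avg,max] = [0.296955541, 0.375000000] radices = 320 16 16 32 0 0 0 0 0 0
5632 msec/iter = 166.74 ROE[avg,max] = [0.223459145, 0.281250000] radices = 352 16 16 32 0 0 0 0 0 0
"""

v18, preview = parse_cfg(V18), parse_cfg(PREVIEW)
for fft_k in (5120, 5632):
    print(fft_k, preview[fft_k][1], f"{speedup(v18, preview, fft_k):.2f}x")
```

On these figures the preview build comes out roughly 18% faster per iteration at 5120K and roughly 23% faster at 5632K.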
[QUOTE=nomead;520830]Yup - and it gives the same [C]288 16 16 16[/C] radix set on three consecutive tries, with 3 of 5 sets passed.[/quote]
If you rerun the same single-FFT-length way, you will get the same initial residue shift, and thus run-to-run data will be identical. You can, however, manually fiddle the initial shift via the -shift flag, if you like.
[timings snipped]
[quote]Indeed, there's a huge difference in 5120K and 5632K FFT sizes' speeds, because of those radix sets with 320 and 352.[/QUOTE]
Hmm ... a healthy speedup at 5632K I can believe because radix-352 is new in v19, but radix-320 was already there in v18. Maybe rerun the 5120K self-test once more using each of the v18 and v19 builds? |
[QUOTE=ewmayer;520839]Hmm ... a healthy speedup at 5632K I can believe because radix-352 is new in v19, but radix-320 was already there in v18. Maybe rerun the 5120K self-test once more using each of the v18 and v19 builds?[/QUOTE]
Apparently v18 happened to give excessive roundoff on both [C]320 16 16 32[/C] and [C]320 32 16 16[/C] so that's why it wasn't using it. So again, yes, hand-massaging the test would help here. |
[QUOTE=nomead;520861]Apparently v18 happened to give excessive roundoff on both [C]320 16 16 32[/C] and [C]320 32 16 16[/C] so that's why it wasn't using it. So again, yes, hand-massaging the test would help here.[/QUOTE]
Sounds like I need to back off a bit on the self-test exponents in v19, to make sure faster but slightly more roundoff-prone FFT radix combos don't go by the wayside like that. |
Okay, power saving measurements:
[C]force_turbo=1[/C] in the configuration file for both Gentoo and Debian to keep it at 1.5 GHz even when idle.
Baseline (Gentoo 64-bit, nothing disabled yet):
0.69A idle -> 1.29A Mlucas running a doublecheck at 2816K FFT
Raspbian (because the firmware updater only runs on 32-bit Linux):
0.65A idle before USB update
0.59A idle after USB update
So yes, Raspbian does something different and saves a bit more power at idle.
Gentoo after USB firmware update, HDMI still on:
0.61A idle -> 1.21A Mlucas
Turning HDMI off with [C]tvservice -o[/C] saves a further 0.02 Amps apparently. |
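As a quick sanity check on those readings, here is a small Python sketch that converts the current differences into watts. It assumes a nominal 5 V supply, which is not stated anywhere in the thread, so treat the wattages as approximate. The helper name [C]saved_watts[/C] is just for this example.

```python
SUPPLY_VOLTS = 5.0  # assumption: nominal 5 V supply, not a measured value


def saved_watts(amps_before, amps_after, volts=SUPPLY_VOLTS):
    """Power saved, in watts, for a drop in supply current at a fixed voltage."""
    return (amps_before - amps_after) * volts


# Current readings taken from the post above, in amps.
savings = {
    "Raspbian idle, USB firmware update": saved_watts(0.65, 0.59),
    "Gentoo idle, USB firmware update": saved_watts(0.69, 0.61),
    "Gentoo running Mlucas, USB firmware update": saved_watts(1.29, 1.21),
    "HDMI off (tvservice -o)": saved_watts(0.02, 0.0),
}

for label, watts in savings.items():
    print(f"{label}: {watts:.2f} W")
```

Under that 5 V assumption, the 0.06 A drop on Raspbian works out to about 0.3 W, in line with the roughly 300 mW mentioned earlier for the USB firmware update, and turning HDMI off adds about 0.1 W.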
powertop is worth a shot if it works; on my laptop it can disable controllers for USB, ethernet, SATA and other PCI devices. The older Pis' USB/ethernet controller was a power hog, if I remember rightly.
|