mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Hardware (https://www.mersenneforum.org/forumdisplay.php?f=9)
-   -   Glorious CCCG thread -- Cellphone Compute Cluster for GIMPS (https://www.mersenneforum.org/showthread.php?t=23998)

M344587487 2019-02-06 09:54

1 Attachment(s)
Just received a phone containing a Helio X25 SoC, a 10-core 20 nm part (2xA72 + 4xA53 + 4xA53). Its /proc/cpuinfo is very odd: it only shows cores 4 and 5. I haven't been able to root it yet, so I don't know whether mlucas will run with cores other than 4 and 5 specified. It got me thinking that perhaps the S7 results are not optimal, so here's testing at 1024K with different worker groupings. Defining more than 4 cores per worker doesn't work, nor does specifying cores 6 and 7. The prior testing used (0:3)(4,5) because /proc/cpuinfo shows those as two distinct sets of cores.

[B]2 workers, (0:3)(4,5) (redone because ambient is higher than before)[/B]
Primary average: ~41.3 ms/it
Secondary average: ~46.8 ms/it
Combined: ~21.9 ms/it

[B]2 workers (0:3) (0:3) (see if a worker gets migrated successfully to the second cluster as (4:7) or otherwise)[/B]
Primary average: ~122.9 ms/it
Secondary average: ~122.9 ms/it
Combined: ~61.45 ms/it

Not only could neither worker utilise the second cluster, but competition for CPU time on the first cluster tanked performance far beyond the roughly-halved rate you'd expect from simple sharing.

[B]3 workers, (0:3)(4,5)(4,5) (hope the dual core workers will be spread over cores 6 and 7)[/B]
Primary average: ~49.4 ms/it
Secondary average: ~63.9 ms/it
Tertiary average: ~63.9 ms/it
Combined: ~19.4 ms/it

This is a nice improvement over (0:3)(4,5), showing that maximising throughput is not as simple as assigning a worker per cluster.
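As a sanity check on the arithmetic, the "Combined" figures above are just the harmonic combination of the per-worker times: each worker advances its own exponent, so aggregate throughput is the sum of the individual iteration rates, and the combined time-per-iteration is the reciprocal of that sum. A minimal sketch, using the timings from the tests above:

```python
# Combined ms/it for several mlucas workers running concurrently.
# Each worker contributes a rate of 1/t_i iterations per ms; the combined
# time-per-iteration is the reciprocal of the summed rates.

def combined_ms_per_iter(worker_ms_per_iter):
    return 1.0 / sum(1.0 / t for t in worker_ms_per_iter)

# 3 workers, (0:3)(4,5)(4,5) -- per-worker averages from the test above
print(round(combined_ms_per_iter([49.4, 63.9, 63.9]), 1))  # ~19.4 ms/it

# 8 single-core workers, all averaging ~115.5 ms/it
print(round(combined_ms_per_iter([115.5] * 8), 2))  # ~14.44 ms/it
```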

[B]8 workers (0)(1)(2)(3)(4)(4)(5)(5) (allow the scheduler the most flexibility and let it know we really want to use all 8 cores)[/B]

Sample of top:
[code]16189 auron 20 0 35884 0 2972 S 94.0 0.0 11:21.98 mlucas
16169 auron 20 0 35884 0 2976 R 91.7 0.0 10:35.65 mlucas
16161 auron 20 0 35884 0 3088 S 90.8 0.0 10:34.31 mlucas
16181 auron 20 0 35884 0 2972 S 89.5 0.0 10:28.45 mlucas
16185 auron 20 0 35884 0 3076 S 87.0 0.0 11:26.20 mlucas
16177 auron 20 0 35884 0 2972 S 84.1 0.0 10:34.64 mlucas
16173 auron 20 0 35884 0 3036 R 82.2 0.0 10:28.98 mlucas
16165 auron 20 0 35884 0 2976 S 81.3 0.0 10:35.28 mlucas[/code]Jobs keep sleeping; it's possible that only a limited number of background tasks can be active at any one time. CPU percentages bounce around the mid 80s, whereas previous tests with fewer workers were rock solid at 100%.

All workers averaged ~115.5 ms/it each
Combined: ~14.44 ms/it

It's interesting that for the first 10k iterations one job hung on to core 4 and another to core 5, finishing ~4:30 ahead of the other jobs, but after that the timings averaged out; the jobs must have bounced around everywhere for that to happen. Regardless, these results are much better than one worker per assignable cluster.

[B]5 workers (0:3)(4)(4)(5)(5) (last test had a lot of bouncing around, here the quad core worker should not be able to get bounced to 4:7. Hopefully the rest will make good use of 6 and 7)[/B]

Sample of top:
[code] PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18268 auron 20 0 85132 0 3028 R 100.0 0.0 85:27.17 mlucas
18256 auron 20 0 35884 0 3152 S 94.2 0.0 32:50.38 mlucas
18260 auron 20 0 35884 0 3100 S 90.0 0.0 32:44.48 mlucas
18264 auron 20 0 35884 0 3040 S 89.4 0.0 32:50.18 mlucas
18252 auron 20 0 35884 0 3028 R 86.8 0.0 32:46.25 mlucas[/code](0:3) stayed at 100.0%, the rest bounced around.

(0:3) average: ~48.2 ms/it
The rest: ~89.8 ms/it each
Combined: ~15.32 ms/it

[B]6 workers (0)(1)(2)(3)(4)(5) (sanity check that overloading cores 4 and 5 does push work to cores 6 and 7 and has a positive effect)[/B]

[code] PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19883 auron 20 0 35884 0 3040 S 100.0 0.0 26:50.88 mlucas
19904 auron 20 0 35884 0 3040 S 100.0 0.0 27:00.89 mlucas
19888 auron 20 0 35884 0 2972 S 100.0 0.0 26:51.93 mlucas
19900 auron 20 0 35884 0 2972 S 97.1 0.0 26:50.02 mlucas
19892 auron 20 0 35884 0 3040 R 95.5 0.0 26:55.67 mlucas
19896 auron 20 0 35884 0 2972 S 94.2 0.0 26:45.11 mlucas[/code]Not rock solid but all cores stay close to 100% utilisation.

All workers averaged ~92.1 ms/it each
Combined: ~15.35 ms/it

Also tried 8 workers with no -cpu flag and 8 workers all with -cpu 0; in both cases each showed ~20% utilisation in top and couldn't spit out a result, as they were fighting for CPU time. So the -cpu flag matters beyond defining how many cores a worker uses, which doesn't bode well for Helio performance even if it can be rooted.
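For context, per-process core pinning of the kind mlucas's -cpu flag performs is done on Linux via sched_setaffinity(2). A minimal sketch (assuming a Linux system; the core set {0, 1} is purely illustrative) of what pinning and its failure mode look like:

```python
import os

# Pin the current process to a set of logical CPUs. Without an explicit
# mask, all workers share whatever cores the kernel scheduler picks --
# which is how eight unpinned workers ended up fighting over CPU time.

def pin_to_cpus(cpus):
    try:
        os.sched_setaffinity(0, cpus)  # 0 = the calling process
    except OSError as e:
        # On some SoCs the call fails with EINVAL for certain cores,
        # which shows up as "sched_setaffinity: Invalid argument" spam.
        print(f"could not pin to {sorted(cpus)}: {e}")
    return os.sched_getaffinity(0)

print(pin_to_cpus({0, 1}))  # the CPU set actually granted to this process
```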

ewmayer 2019-02-06 20:53

[QUOTE=M344587487;507799] Also tried 8 workers no cpu flag and 8 workers all -cpu 0, in both cases they all had ~20% utilisation on top and couldn't spit out a result as they were fighting for CPU time. So the CPU flag is important beyond defining how many cores a worker uses, which doesn't bode well for Helio performance if it can be rooted.[/QUOTE]

On the plus side, you were able to boost your S7 throughput by 50% by using 8 workers and letting the scheduler bounce them among the available cores, presumably including the 2 not-listed-in-proc/cpuinfo ones. Indeed very weird re. the contents of that file on the Helio, though.

M344587487 2019-02-06 21:41

1 Attachment(s)
[QUOTE=ewmayer;507857]On the plus side, you were able to boost your S7 throughput by 50% by using 8 workers and letting the scheduler bounce them among the available cores, presumably including the 2 not-listed-in-proc/cpuinfo ones. Indeed very weird re. the contents of that file on the Helio, though.[/QUOTE]
Very pleased with the boost to throughput; I'll have to do some device juggling to measure power consumption, as currently they're all powered from the same hub. Got mlucas running on the Helio device, but it's as temperamental as expected: I can only assign one or two cores to a worker, and they have to be defined to mlucas as 0 and 1. Having trouble getting a worker-config test going; I manually overloaded cores 0 and 1 with 5 single-core workers each, but that ground ssh to a halt and then killed it. The test might be running as we speak, but I can't check without invalidating the results. I can't access cpu data in /sys/devices/system/cpu/cpu* except for some of cores 4 and 5; I had no trouble on the S7, even for cores 6 and 7 I believe. The selftest log is littered with "sched_setaffinity: Invalid argument". lscpu segfaults when not executed as root, and prints nothing when executed as root. I think this device is destined for the shelf for a little while.


Attached are the configs and selftests of one and two workers, and the /proc/cpuinfo. This /proc/cpuinfo labels them as cores 0 and 1, I swear a few reboots ago it labelled them as cores 4 and 5. The odd access of data in /sys/* gives me hope that I'm not going completely batty ;)

M344587487 2019-02-06 23:50

Helio X25 10 workers 1024K
 
[QUOTE=M344587487;507863]...Can only set one or two cores to a worker and they have to be defined to mlucas as 0 and 1. Having trouble getting a worker config test going, overloaded cores 0 and 1 manually with 5 single core workers each but that ground ssh to a halt then killed it. The test might be running as we speak but I can't check without invalidating the results...[/QUOTE]Surprisingly the test worked and more surprisingly seemed to utilise many of the cores:
[code]INFO: no restart file found...starting run from scratch.
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 32 16 16
[Feb 06 21:07:23] M20000047 Iter# = 10000 [ 0.05% complete] clocks = 00:37:20.036 [ 0.2240 sec/iter] Res64: 9A2AF744DE060296. AvgMaxErr = 0.230922558. MaxErr = 0.312500000.
[Feb 06 21:45:06] M20000047 Iter# = 20000 [ 0.10% complete] clocks = 00:37:43.092 [ 0.2263 sec/iter] Res64: D99B4D255F5C0C74. AvgMaxErr = 0.231116997. MaxErr = 0.343750000.
[Feb 06 22:22:46] M20000047 Iter# = 30000 [ 0.15% complete] clocks = 00:37:39.339 [ 0.2259 sec/iter] Res64: 284DC13397AAE4DB. AvgMaxErr = 0.231374126. MaxErr = 0.312500000.
[Feb 06 23:00:18] M20000047 Iter# = 40000 [ 0.20% complete] clocks = 00:37:31.660 [ 0.2252 sec/iter] Res64: C3ECAE145A41D57D. AvgMaxErr = 0.231443100. MaxErr = 0.343750000.
INFO: no restart file found...starting run from scratch.
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 32 16 16
[Feb 06 21:07:36] M20000047 Iter# = 10000 [ 0.05% complete] clocks = 00:37:30.727 [ 0.2251 sec/iter] Res64: 9A2AF744DE060296. AvgMaxErr = 0.230922558. MaxErr = 0.312500000.
[Feb 06 21:45:31] M20000047 Iter# = 20000 [ 0.10% complete] clocks = 00:37:55.136 [ 0.2275 sec/iter] Res64: D99B4D255F5C0C74. AvgMaxErr = 0.231116997. MaxErr = 0.343750000.
[Feb 06 22:23:25] M20000047 Iter# = 30000 [ 0.15% complete] clocks = 00:37:53.227 [ 0.2273 sec/iter] Res64: 284DC13397AAE4DB. AvgMaxErr = 0.231374126. MaxErr = 0.312500000.
[Feb 06 23:01:11] M20000047 Iter# = 40000 [ 0.20% complete] clocks = 00:37:45.907 [ 0.2266 sec/iter] Res64: C3ECAE145A41D57D. AvgMaxErr = 0.231443100. MaxErr = 0.343750000.
INFO: no restart file found...starting run from scratch.
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 32 16 16
[Feb 06 21:07:25] M20000047 Iter# = 10000 [ 0.05% complete] clocks = 00:37:18.322 [ 0.2238 sec/iter] Res64: 9A2AF744DE060296. AvgMaxErr = 0.230922558. MaxErr = 0.312500000.
[Feb 06 21:45:11] M20000047 Iter# = 20000 [ 0.10% complete] clocks = 00:37:45.640 [ 0.2266 sec/iter] Res64: D99B4D255F5C0C74. AvgMaxErr = 0.231116997. MaxErr = 0.343750000.
[Feb 06 22:22:48] M20000047 Iter# = 30000 [ 0.15% complete] clocks = 00:37:35.923 [ 0.2256 sec/iter] Res64: 284DC13397AAE4DB. AvgMaxErr = 0.231374126. MaxErr = 0.312500000.
[Feb 06 23:00:21] M20000047 Iter# = 40000 [ 0.20% complete] clocks = 00:37:33.209 [ 0.2253 sec/iter] Res64: C3ECAE145A41D57D. AvgMaxErr = 0.231443100. MaxErr = 0.343750000.
INFO: no restart file found...starting run from scratch.
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 32 16 16
[Feb 06 21:07:27] M20000047 Iter# = 10000 [ 0.05% complete] clocks = 00:37:18.261 [ 0.2238 sec/iter] Res64: 9A2AF744DE060296. AvgMaxErr = 0.230922558. MaxErr = 0.312500000.
[Feb 06 21:45:04] M20000047 Iter# = 20000 [ 0.10% complete] clocks = 00:37:36.859 [ 0.2257 sec/iter] Res64: D99B4D255F5C0C74. AvgMaxErr = 0.231116997. MaxErr = 0.343750000.
[Feb 06 22:22:45] M20000047 Iter# = 30000 [ 0.15% complete] clocks = 00:37:40.944 [ 0.2261 sec/iter] Res64: 284DC13397AAE4DB. AvgMaxErr = 0.231374126. MaxErr = 0.312500000.
[Feb 06 23:00:12] M20000047 Iter# = 40000 [ 0.20% complete] clocks = 00:37:26.261 [ 0.2246 sec/iter] Res64: C3ECAE145A41D57D. AvgMaxErr = 0.231443100. MaxErr = 0.343750000.
INFO: no restart file found...starting run from scratch.
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 32 16 16
[Feb 06 21:07:17] M20000047 Iter# = 10000 [ 0.05% complete] clocks = 00:37:06.340 [ 0.2226 sec/iter] Res64: 9A2AF744DE060296. AvgMaxErr = 0.230922558. MaxErr = 0.312500000.
[Feb 06 21:44:51] M20000047 Iter# = 20000 [ 0.10% complete] clocks = 00:37:33.542 [ 0.2254 sec/iter] Res64: D99B4D255F5C0C74. AvgMaxErr = 0.231116997. MaxErr = 0.343750000.
[Feb 06 22:22:27] M20000047 Iter# = 30000 [ 0.15% complete] clocks = 00:37:36.272 [ 0.2256 sec/iter] Res64: 284DC13397AAE4DB. AvgMaxErr = 0.231374126. MaxErr = 0.312500000.
[Feb 06 22:59:52] M20000047 Iter# = 40000 [ 0.20% complete] clocks = 00:37:23.807 [ 0.2244 sec/iter] Res64: C3ECAE145A41D57D. AvgMaxErr = 0.231443100. MaxErr = 0.343750000.
INFO: no restart file found...starting run from scratch.
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 32 16 16
[Feb 06 20:38:34] M20000047 Iter# = 10000 [ 0.05% complete] clocks = 00:26:31.411 [ 0.1591 sec/iter] Res64: 9A2AF744DE060296. AvgMaxErr = 0.230922558. MaxErr = 0.312500000.
[Feb 06 21:15:49] M20000047 Iter# = 20000 [ 0.10% complete] clocks = 00:37:14.495 [ 0.2234 sec/iter] Res64: D99B4D255F5C0C74. AvgMaxErr = 0.231116997. MaxErr = 0.343750000.
[Feb 06 21:53:30] M20000047 Iter# = 30000 [ 0.15% complete] clocks = 00:37:40.617 [ 0.2261 sec/iter] Res64: 284DC13397AAE4DB. AvgMaxErr = 0.231374126. MaxErr = 0.312500000.
[Feb 06 22:31:01] M20000047 Iter# = 40000 [ 0.20% complete] clocks = 00:37:30.070 [ 0.2250 sec/iter] Res64: C3ECAE145A41D57D. AvgMaxErr = 0.231443100. MaxErr = 0.343750000.
[Feb 06 23:08:21] M20000047 Iter# = 50000 [ 0.25% complete] clocks = 00:37:19.793 [ 0.2240 sec/iter] Res64: 39BDD0E9AB6C5ACB. AvgMaxErr = 0.231758791. MaxErr = 0.312500000.
INFO: no restart file found...starting run from scratch.
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 32 16 16
[Feb 06 20:39:10] M20000047 Iter# = 10000 [ 0.05% complete] clocks = 00:26:49.547 [ 0.1610 sec/iter] Res64: 9A2AF744DE060296. AvgMaxErr = 0.230922558. MaxErr = 0.312500000.
[Feb 06 21:16:25] M20000047 Iter# = 20000 [ 0.10% complete] clocks = 00:37:13.820 [ 0.2234 sec/iter] Res64: D99B4D255F5C0C74. AvgMaxErr = 0.231116997. MaxErr = 0.343750000.
[Feb 06 21:54:06] M20000047 Iter# = 30000 [ 0.15% complete] clocks = 00:37:40.511 [ 0.2261 sec/iter] Res64: 284DC13397AAE4DB. AvgMaxErr = 0.231374126. MaxErr = 0.312500000.
[Feb 06 22:31:36] M20000047 Iter# = 40000 [ 0.20% complete] clocks = 00:37:29.689 [ 0.2250 sec/iter] Res64: C3ECAE145A41D57D. AvgMaxErr = 0.231443100. MaxErr = 0.343750000.
[Feb 06 23:09:03] M20000047 Iter# = 50000 [ 0.25% complete] clocks = 00:37:26.491 [ 0.2246 sec/iter] Res64: 39BDD0E9AB6C5ACB. AvgMaxErr = 0.231758791. MaxErr = 0.312500000.
INFO: no restart file found...starting run from scratch.
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 32 16 16
[Feb 06 20:37:09] M20000047 Iter# = 10000 [ 0.05% complete] clocks = 00:25:57.792 [ 0.1558 sec/iter] Res64: 9A2AF744DE060296. AvgMaxErr = 0.230922558. MaxErr = 0.312500000.
[Feb 06 21:14:18] M20000047 Iter# = 20000 [ 0.10% complete] clocks = 00:37:08.944 [ 0.2229 sec/iter] Res64: D99B4D255F5C0C74. AvgMaxErr = 0.231116997. MaxErr = 0.343750000.
[Feb 06 21:51:50] M20000047 Iter# = 30000 [ 0.15% complete] clocks = 00:37:31.510 [ 0.2252 sec/iter] Res64: 284DC13397AAE4DB. AvgMaxErr = 0.231374126. MaxErr = 0.312500000.
[Feb 06 22:29:20] M20000047 Iter# = 40000 [ 0.20% complete] clocks = 00:37:29.360 [ 0.2249 sec/iter] Res64: C3ECAE145A41D57D. AvgMaxErr = 0.231443100. MaxErr = 0.343750000.
[Feb 06 23:06:37] M20000047 Iter# = 50000 [ 0.25% complete] clocks = 00:37:17.098 [ 0.2237 sec/iter] Res64: 39BDD0E9AB6C5ACB. AvgMaxErr = 0.231758791. MaxErr = 0.312500000.
INFO: no restart file found...starting run from scratch.
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 32 16 16
[Feb 06 20:36:52] M20000047 Iter# = 10000 [ 0.05% complete] clocks = 00:25:46.168 [ 0.1546 sec/iter] Res64: 9A2AF744DE060296. AvgMaxErr = 0.230922558. MaxErr = 0.312500000.
[Feb 06 21:14:03] M20000047 Iter# = 20000 [ 0.10% complete] clocks = 00:37:10.774 [ 0.2231 sec/iter] Res64: D99B4D255F5C0C74. AvgMaxErr = 0.231116997. MaxErr = 0.343750000.
[Feb 06 21:51:38] M20000047 Iter# = 30000 [ 0.15% complete] clocks = 00:37:34.347 [ 0.2254 sec/iter] Res64: 284DC13397AAE4DB. AvgMaxErr = 0.231374126. MaxErr = 0.312500000.
[Feb 06 22:29:14] M20000047 Iter# = 40000 [ 0.20% complete] clocks = 00:37:35.553 [ 0.2256 sec/iter] Res64: C3ECAE145A41D57D. AvgMaxErr = 0.231443100. MaxErr = 0.343750000.
[Feb 06 23:06:36] M20000047 Iter# = 50000 [ 0.25% complete] clocks = 00:37:21.427 [ 0.2241 sec/iter] Res64: 39BDD0E9AB6C5ACB. AvgMaxErr = 0.231758791. MaxErr = 0.312500000.
INFO: no restart file found...starting run from scratch.
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 32 16 16
[Feb 06 20:36:49] M20000047 Iter# = 10000 [ 0.05% complete] clocks = 00:25:47.597 [ 0.1548 sec/iter] Res64: 9A2AF744DE060296. AvgMaxErr = 0.230922558. MaxErr = 0.312500000.
[Feb 06 21:14:00] M20000047 Iter# = 20000 [ 0.10% complete] clocks = 00:37:09.658 [ 0.2230 sec/iter] Res64: D99B4D255F5C0C74. AvgMaxErr = 0.231116997. MaxErr = 0.343750000.
[Feb 06 21:51:28] M20000047 Iter# = 30000 [ 0.15% complete] clocks = 00:37:27.659 [ 0.2248 sec/iter] Res64: 284DC13397AAE4DB. AvgMaxErr = 0.231374126. MaxErr = 0.312500000.
[Feb 06 22:28:54] M20000047 Iter# = 40000 [ 0.20% complete] clocks = 00:37:26.105 [ 0.2246 sec/iter] Res64: C3ECAE145A41D57D. AvgMaxErr = 0.231443100. MaxErr = 0.343750000.
[Feb 06 23:06:10] M20000047 Iter# = 50000 [ 0.25% complete] clocks = 00:37:15.388 [ 0.2235 sec/iter] Res64: 39BDD0E9AB6C5ACB. AvgMaxErr = 0.231758791. MaxErr = 0.312500000.[/code]The first 10K timings were quicker on the last 5 workers because I started those first; ssh then became unresponsive, and I managed to limp back on and start the first 5 before getting kicked completely. tl;dr: the synthetic 1024K timing is ~22.5 ms/it combining 10 workers. I didn't expect it to overcome the affinity oddities, but it seems to have. If it's going to make a habit of killing ssh, that will put a wrench into actually administering the device; I had to restart the environment from the device itself to get back on.

/proc/cpuinfo is also spitting out 4 and 5 again:
[code]Processor : AArch64 Processor rev 4 (aarch64)
processor : 4
model name : AArch64 Processor rev 4 (aarch64)
BogoMIPS : 26.00
BogoMIPS : 26.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 5
model name : AArch64 Processor rev 4 (aarch64)
BogoMIPS : 26.00
BogoMIPS : 26.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

Hardware : MT6797T[/code]

ewmayer 2019-02-07 02:31

10 multimegadigit-modulus LL tests running simultaneously on a smartphone - the mind boggles...

kriesel 2019-02-07 15:22

[QUOTE=ewmayer;507883]10 multimegadigit-modulus LL tests running simultaneously on a smartphone - the mind boggles...[/QUOTE]
So, ~0.5sec/iter x 10 processes for 50M double-checks? ~3 DCs per year per phone, with a long latency. If Moore's law holds a while yet for phones that would be good.

ewmayer 2019-02-07 22:13

[QUOTE=kriesel;507916]So, ~0.5sec/iter x 10 processes for 50M double-checks? ~3 DCs per year per phone, with a long latency. If Moore's law holds a while yet for phones that would be good.[/QUOTE]

The numbers are actually better than that - 0.5sec/iter means each job does ~170,000 iter/day, thus total throughput ~1.7Miter/day, thus ~1 DC per month on average. And on Galaxy S7, M344587487's latest throughput-maximizing tweaks yielded ~50% more throughput than that, ~1 DC per 20 days.
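The back-of-envelope estimate above can be reproduced directly: an LL double-check of a ~50M exponent takes roughly as many squarings as the exponent itself, so at 0.5 sec/iter across 10 workers:

```python
# Throughput estimate: 10 workers at ~0.5 sec/iter each, DC of a ~50M exponent.

SEC_PER_ITER = 0.5           # per worker
WORKERS = 10
EXPONENT = 50_000_000        # ~iterations needed for one double-check

iters_per_day_per_worker = 86400 / SEC_PER_ITER            # 172,800
total_iters_per_day = iters_per_day_per_worker * WORKERS   # ~1.73M
days_per_dc = EXPONENT / total_iters_per_day
print(round(iters_per_day_per_worker))  # 172800 iter/day per worker
print(round(days_per_dc, 1))            # ~28.9 days, i.e. ~1 DC per month
```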

M344587487 2019-02-07 22:44

[QUOTE=M344587487;507184]Ran the simultaneous 1024K test for a while, it's taking ~118 minutes to consume 0.01 kWh, giving a power consumption of ~5.085W unless my maths is off. It's still running so the figure may get slightly more accurate, but it's ticked over three times now and they've all been ~118 minutes for 0.01 kWh.[/QUOTE]
For the 0 1 2 3 4 4 5 5 config it takes ~88 minutes to consume 0.01 kWh, giving a power consumption of ~6.82 W. This is still more efficient than before as the timings are so much better.
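The plug-meter arithmetic is simply energy per tick divided by time per tick (the meter ticks over every 0.01 kWh); a quick check of both configurations:

```python
# Power draw from plug-meter readings: kWh per tick over hours per tick.

def watts(kwh_per_tick, minutes_per_tick):
    hours = minutes_per_tick / 60.0
    return kwh_per_tick / hours * 1000.0  # kW -> W

print(round(watts(0.01, 118), 3))  # ~5.085 W, the earlier (0:3)(4,5) config
print(round(watts(0.01, 88), 2))   # ~6.82 W, the (0)(1)(2)(3)(4)(4)(5)(5) config

# At ~6.82 W and roughly one DC per month:
print(round(6.82 * 30 * 24 / 1000, 1))  # ~4.9 kWh per double-check
```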

[QUOTE=ewmayer;507970]The numbers are actually better than that - 0.5sec/iter means each job does ~170,000 iter/day, thus total throughput ~1.7Miter/day, thus ~1 DC per month on average. And on Galaxy S7, M344587487's latest throughput-maximizing tweaks yielded ~50% more throughput than that, ~1 DC per 20 days.[/QUOTE]
The numbers haven't had much of a chance to settle yet, but the S7 8-worker 2816K test below is showing ~1 DC per month on average for 51M; unfortunately the S7 is only an 8-core. I make that out to be ~4.9 kWh per DC; how does that compare?

[code]INFO: no restart file found...starting run from scratch.
M51347137: using FFT length 2816K = 2883584 8-byte floats.
this gives an average 17.806707555597480 bits per digit
Using complex FFT radices 176 16 16 32
[Feb 07 21:39:50] M51347137 Iter# = 10000 [ 0.02% complete] clocks = 01:02:38.442 [ 0.3758 sec/iter] Res64: 3A8A50F278914957. AvgMaxErr = 0.071504672. MaxErr = 0.093750000.
[Feb 07 22:43:09] M51347137 Iter# = 20000 [ 0.04% complete] clocks = 01:03:17.943 [ 0.3798 sec/iter] Res64: 8EBC72230AB26EC8. AvgMaxErr = 0.071739440. MaxErr = 0.093750000.
INFO: no restart file found...starting run from scratch.
M51347137: using FFT length 2816K = 2883584 8-byte floats.
this gives an average 17.806707555597480 bits per digit
Using complex FFT radices 176 16 16 32
[Feb 07 21:39:00] M51347137 Iter# = 10000 [ 0.02% complete] clocks = 01:01:43.985 [ 0.3704 sec/iter] Res64: 3A8A50F278914957. AvgMaxErr = 0.071504672. MaxErr = 0.093750000.
[Feb 07 22:41:13] M51347137 Iter# = 20000 [ 0.04% complete] clocks = 01:02:11.978 [ 0.3732 sec/iter] Res64: 8EBC72230AB26EC8. AvgMaxErr = 0.071739440. MaxErr = 0.093750000.
INFO: no restart file found...starting run from scratch.
M51347137: using FFT length 2816K = 2883584 8-byte floats.
this gives an average 17.806707555597480 bits per digit
Using complex FFT radices 176 16 16 32
[Feb 07 21:39:48] M51347137 Iter# = 10000 [ 0.02% complete] clocks = 01:02:27.807 [ 0.3748 sec/iter] Res64: 3A8A50F278914957. AvgMaxErr = 0.071504672. MaxErr = 0.093750000.
[Feb 07 22:42:18] M51347137 Iter# = 20000 [ 0.04% complete] clocks = 01:02:29.213 [ 0.3749 sec/iter] Res64: 8EBC72230AB26EC8. AvgMaxErr = 0.071739440. MaxErr = 0.093750000.
INFO: no restart file found...starting run from scratch.
M51347137: using FFT length 2816K = 2883584 8-byte floats.
this gives an average 17.806707555597480 bits per digit
Using complex FFT radices 176 16 16 32
[Feb 07 21:39:58] M51347137 Iter# = 10000 [ 0.02% complete] clocks = 01:02:35.987 [ 0.3756 sec/iter] Res64: 3A8A50F278914957. AvgMaxErr = 0.071504672. MaxErr = 0.093750000.
INFO: no restart file found...starting run from scratch.
M51347137: using FFT length 2816K = 2883584 8-byte floats.
this gives an average 17.806707555597480 bits per digit
Using complex FFT radices 44 8 16 16 16
[Feb 07 22:04:12] M51347137 Iter# = 10000 [ 0.02% complete] clocks = 01:26:43.203 [ 0.5203 sec/iter] Res64: 3A8A50F278914957. AvgMaxErr = 0.079237470. MaxErr = 0.109375000.
INFO: no restart file found...starting run from scratch.
M51347137: using FFT length 2816K = 2883584 8-byte floats.
this gives an average 17.806707555597480 bits per digit
Using complex FFT radices 44 8 16 16 16
[Feb 07 22:03:58] M51347137 Iter# = 10000 [ 0.02% complete] clocks = 01:26:30.866 [ 0.5191 sec/iter] Res64: 3A8A50F278914957. AvgMaxErr = 0.079237470. MaxErr = 0.109375000.
INFO: no restart file found...starting run from scratch.
M51347137: using FFT length 2816K = 2883584 8-byte floats.
this gives an average 17.806707555597480 bits per digit
Using complex FFT radices 44 8 16 16 16
[Feb 07 21:31:49] M51347137 Iter# = 10000 [ 0.02% complete] clocks = 00:54:23.420 [ 0.3263 sec/iter] Res64: 3A8A50F278914957. AvgMaxErr = 0.079237470. MaxErr = 0.109375000.
[Feb 07 22:26:06] M51347137 Iter# = 20000 [ 0.04% complete] clocks = 00:54:15.822 [ 0.3256 sec/iter] Res64: 8EBC72230AB26EC8. AvgMaxErr = 0.079258855. MaxErr = 0.109375000.
INFO: no restart file found...starting run from scratch.
M51347137: using FFT length 2816K = 2883584 8-byte floats.
this gives an average 17.806707555597480 bits per digit
Using complex FFT radices 44 8 16 16 16
[Feb 07 21:42:19] M51347137 Iter# = 10000 [ 0.02% complete] clocks = 01:04:55.295 [ 0.3895 sec/iter] Res64: 3A8A50F278914957. AvgMaxErr = 0.079237470. MaxErr = 0.109375000.[/code]

ewmayer 2019-02-07 23:23

[QUOTE=M344587487;507975]The numbers haven't had much of a chance to settle yet but the S7 8 worker 2816K test below is showing ~1 DC per month on average for 51M, unfortunately the S7 is only an 8 core. I make that out to be ~4.9 kWh per DC, how does that compare?[/QUOTE]

Right - 50M only needs a 2560K FFT, so you are suffering a slowdown due to the larger FFT length, as well as things not scaling quite as n*log(n)-wise as one would like vs your 1024K numbers.
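A rough sketch of that scaling argument: if per-iteration cost grows as n*log(n), going from a 1024K FFT to 2816K should cost about a factor of 3, while the observed S7 timings (taking ~115.5 ms/it per worker at 1024K and ~375 ms/it at 2816K as representative figures from the tests above) come out a bit worse:

```python
import math

# Predicted per-iteration slowdown under n*log(n) scaling,
# comparing the 2816K FFT run against the 1024K baseline.
n1 = 1024 * 1024   # 1024K 8-byte floats
n2 = 2816 * 1024   # 2816K 8-byte floats
predicted = (n2 * math.log(n2)) / (n1 * math.log(n1))
print(round(predicted, 2))  # ~2.95

# Observed on the S7: ~115.5 ms/it per worker at 1024K vs ~375 ms/it at 2816K
print(round(375 / 115.5, 2))  # ~3.25, somewhat worse than the prediction
```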

Ken could probably provide more informed comment re. the power numbers vs top-end (in terms of Prime95 throughput/watt) x86 gear, but I suspect ~5kWh/DC is fairly competitive.

kriesel 2019-02-08 00:00

[QUOTE=ewmayer;507970]The numbers are actually better than that - 0.5sec/iter means each job does ~170,000 iter/day, thus total throughput ~1.7Miter/day, thus ~1 DC per month on average. And on Galaxy S7, M344587487's latest throughput-maximizing tweaks yielded ~50% more throughput than that, ~1 DC per 20 days.[/QUOTE]
My bad, for some reason I scaled it up roughly to 5M fft length, which is much more than needed. So, good news, yes.

Re KWh/DC data, I have none. Other than a large aggregate power bill that is. Wood stove and silicon keeping my house warm in an ice storm.

chalsall 2019-02-08 00:14

[QUOTE=kriesel;507982]My bad, for some reason I scaled it up roughly to 5M fft length, which is much more than needed.[/QUOTE]

Take it amusingly.

Sometimes people don't have a clue about magnitude. What is a Watt per Hour vs. a Pound per square inch?

