|
|
#122 | |
|
∂²ω=0
Sep 2002
República de California
2²×2,939 Posts |
Quote:
Many thanks for the builds and timings, Tom and Laurent. So the A57 is a big step up (roughly 3x faster, comparing the respective 4-threaded timing data) from my little A53, performance-wise, even with no better SIMD-versus-not speedup factor. The || scaling on the A57 is really nice - based on my x86 experience, I did not expect such good numbers in going from 4 to 8 threads.
Tom, without giving away any trade secrets, can you comment on the odds of improved per-cycle SIMD in forthcoming updates of the ARM architecture? As I noted earlier, even a fairly modest upgrade to the dual-issue capabilities, namely being able to do one vector add and one vector mul per cycle, would give the FFT code a nice boost. I expect/hope that performing an FADD instruction in hardware draws sufficiently less power than FMUL/FMA that such restricted dual-issuance would not wreck the low-power aspects of the architecture. (If such an enhancement is not on the current roadmap, maybe we could take key HW-planners out for dinner, get them good and drunk, and extract suitable promises from them...)
The A57 8-core/8-thread timings are only ~20% slower than those I get from an AVX2 build of the code on my dual-2-GHz-Broadwell-core Intel NUC (running 4-threaded on the 2 hardware cores, which gives me the best overall throughput on that system). It would be illuminating to compare the wattages of those two low-power-oriented solutions. Roughly what is the cost of a bare-bones 8-core A57? My NUC cost somewhere in the $400-500 range.
Last fiddled with by ewmayer on 2017-11-10 at 01:18 |
|
|
|
|
|
|
#123 |
|
∂²ω=0
Sep 2002
República de California
11756₁₀ Posts |
Tom's Hardware piece from January 2016 on the A72 - it mentions reduced latency for e.g. FMA, but actual numbers will tell the tale of whether that boosts the FFT code appreciably.
|
|
|
|
|
|
#124 | |
|
Sep 2003
2·5·7·37 Posts |
Quote:
The values cortex-a57.cortex-a53, cortex-a72.cortex-a53, cortex-a73.cortex-a35, cortex-a73.cortex-a53 specify that GCC should tune for a big.LITTLE system. |
|
|
|
|
|
|
#125 |
|
Nov 2017
2 Posts |
I have access to the GCC Farm (https://cfarm.tetaneutral.net/machines/list/), so I compiled Mlucas on those machines and ran its benchmark; here are the results.
If you need other tests or more info, just ask.
I used Mlucas v17.1.
Compile:
$ gcc -c -O3 -DUSE_ARM_V8_SIMD ../src/*.c |& tee build.log
Great, no errors.
$ gcc --version
gcc (SUSE Linux) 5.3.1 20160301 [gcc-5-branch revision 233849]
$ gcc -o mlucas *.o -lm -lpthread -lrt
$ ./mlucas -s m |& tee selftest.log
On AMD Opteron 1100 (gcc118), the content of mlucas.cfg is:
Code:
17.1
1024 msec/iter = 49.45 ROE[avg,max] = [0.217382812, 0.281250000] radices = 16 32 32 32 0 0 0 0 0 0
1152 msec/iter = 60.75 ROE[avg,max] = [0.237018694, 0.281250000] radices = 36 16 32 32 0 0 0 0 0 0
1280 msec/iter = 66.72 ROE[avg,max] = [0.289955357, 0.375000000] radices = 20 32 32 32 0 0 0 0 0 0
1408 msec/iter = 78.04 ROE[avg,max] = [0.244419643, 0.312500000] radices = 44 16 32 32 0 0 0 0 0 0
1536 msec/iter = 80.59 ROE[avg,max] = [0.227845982, 0.281250000] radices = 24 32 32 32 0 0 0 0 0 0
1664 msec/iter = 94.73 ROE[avg,max] = [0.226674107, 0.281250000] radices = 52 16 32 32 0 0 0 0 0 0
1792 msec/iter = 97.68 ROE[avg,max] = [0.230078125, 0.281250000] radices = 56 16 32 32 0 0 0 0 0 0
1920 msec/iter = 110.56 ROE[avg,max] = [0.234151786, 0.265625000] radices = 60 16 32 32 0 0 0 0 0 0
2048 msec/iter = 109.44 ROE[avg,max] = [0.228236607, 0.281250000] radices = 32 32 32 32 0 0 0 0 0 0
2304 msec/iter = 132.20 ROE[avg,max] = [0.250456892, 0.312500000] radices = 36 32 32 32 0 0 0 0 0 0
2560 msec/iter = 146.44 ROE[avg,max] = [0.236383929, 0.312500000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 167.99 ROE[avg,max] = [0.260044643, 0.312500000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 175.99 ROE[avg,max] = [0.224818638, 0.251953125] radices = 48 32 32 32 0 0 0 0 0 0
3328 msec/iter = 197.09 ROE[avg,max] = [0.280803571, 0.375000000] radices = 208 32 16 16 0 0 0 0 0 0
3584 msec/iter = 206.07 ROE[avg,max] = [0.223172433, 0.250000000] radices = 56 32 32 32 0 0 0 0 0 0
3840 msec/iter = 229.05 ROE[avg,max] = [0.248437500, 0.343750000] radices = 240 32 16 16 0 0 0 0 0 0
4096 msec/iter = 240.88 ROE[avg,max] = [0.243750000, 0.296875000] radices = 64 32 32 32 0 0 0 0 0 0
4608 msec/iter = 276.55 ROE[avg,max] = [0.251339286, 0.281250000] radices = 288 32 16 16 0 0 0 0 0 0
5120 msec/iter = 300.46 ROE[avg,max] = [0.237053571, 0.265625000] radices = 160 16 32 32 0 0 0 0 0 0
5632 msec/iter = 342.46 ROE[avg,max] = [0.261160714, 0.281250000] radices = 176 16 32 32 0 0 0 0 0 0
6144 msec/iter = 380.17 ROE[avg,max] = [0.255022321, 0.343750000] radices = 192 16 32 32 0 0 0 0 0 0
6656 msec/iter = 403.75 ROE[avg,max] = [0.266085379, 0.312500000] radices = 208 16 32 32 0 0 0 0 0 0
7168 msec/iter = 433.80 ROE[avg,max] = [0.233168248, 0.312500000] radices = 224 16 32 32 0 0 0 0 0 0
7680 msec/iter = 468.23 ROE[avg,max] = [0.239662388, 0.281250000] radices = 240 16 32 32 0 0 0 0 0 0
On the APM X-Gene Mustang board (gcc113), the content of mlucas.cfg is:
Code:
17.1
1024 msec/iter = 70.94 ROE[avg,max] = [0.228257533, 0.281250000] radices = 32 8 8 16 16 0 0 0 0 0
1152 msec/iter = 81.80 ROE[avg,max] = [0.221268136, 0.250000000] radices = 288 8 16 16 0 0 0 0 0 0
1280 msec/iter = 91.57 ROE[avg,max] = [0.264508929, 0.343750000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 107.83 ROE[avg,max] = [0.227343750, 0.265625000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 114.63 ROE[avg,max] = [0.272656250, 0.312500000] radices = 48 16 32 32 0 0 0 0 0 0
1664 msec/iter = 127.96 ROE[avg,max] = [0.270758929, 0.312500000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 137.40 ROE[avg,max] = [0.230078125, 0.281250000] radices = 56 16 32 32 0 0 0 0 0 0
1920 msec/iter = 146.12 ROE[avg,max] = [0.257756696, 0.312500000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 155.02 ROE[avg,max] = [0.236921038, 0.281250000] radices = 256 16 16 16 0 0 0 0 0 0
2304 msec/iter = 177.40 ROE[avg,max] = [0.248751395, 0.312500000] radices = 288 16 16 16 0 0 0 0 0 0
2560 msec/iter = 194.36 ROE[avg,max] = [0.236383929, 0.312500000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 225.86 ROE[avg,max] = [0.260044643, 0.312500000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 243.38 ROE[avg,max] = [0.267466518, 0.312500000] radices = 192 16 16 32 0 0 0 0 0 0
3328 msec/iter = 266.68 ROE[avg,max] = [0.279910714, 0.343750000] radices = 208 16 16 32 0 0 0 0 0 0
3584 msec/iter = 285.43 ROE[avg,max] = [0.252566964, 0.312500000] radices = 224 16 16 32 0 0 0 0 0 0
3840 msec/iter = 303.83 ROE[avg,max] = [0.249302455, 0.343750000] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 322.17 ROE[avg,max] = [0.229129464, 0.281250000] radices = 256 16 16 32 0 0 0 0 0 0
4608 msec/iter = 366.85 ROE[avg,max] = [0.249079241, 0.281250000] radices = 288 16 16 32 0 0 0 0 0 0
5120 msec/iter = 414.81 ROE[avg,max] = [0.232087054, 0.250000000] radices = 160 8 8 16 16 0 0 0 0 0
5632 msec/iter = 480.58 ROE[avg,max] = [0.232352121, 0.281250000] radices = 176 8 8 16 16 0 0 0 0 0
6144 msec/iter = 518.04 ROE[avg,max] = [0.297767857, 0.343750000] radices = 192 8 8 16 16 0 0 0 0 0
6656 msec/iter = 568.81 ROE[avg,max] = [0.310044643, 0.375000000] radices = 208 8 8 16 16 0 0 0 0 0
7168 msec/iter = 610.64 ROE[avg,max] = [0.234877232, 0.281250000] radices = 224 8 8 16 16 0 0 0 0 0
7680 msec/iter = 650.18 ROE[avg,max] = [0.245975167, 0.281250000] radices = 240 8 8 16 16 0 0 0 0 0
gcc113:
$ uname -a
Linux gcc113 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:45:34 UTC 2016 aarch64 aarch64 aarch64 GNU/Linux
$ cat /proc/cpuinfo
Processor : AArch64 Processor rev 1 (aarch64)
processor : 0
BogoMIPS : 100.00
processor : 1
BogoMIPS : 100.00
processor : 2
BogoMIPS : 100.00
processor : 3
BogoMIPS : 100.00
processor : 4
BogoMIPS : 100.00
processor : 5
BogoMIPS : 100.00
processor : 6
BogoMIPS : 100.00
processor : 7
BogoMIPS : 100.00
Features : fp asimd evtstrm
CPU implementer : 0x50
CPU architecture: AArch64
CPU variant : 0x0
CPU part : 0x000
CPU revision : 1
Hardware : APM X-Gene Mustang board
$ free
                    total     used     free  shared  buffers   cached
Mem:             32969968 15259696 17710272      92   259780 13519520
-/+ buffers/cache:         1480396 31489572
Swap:            20409340    55456 20353884
gcc118:
$ uname -a
Linux gcc118 4.1.12-1-default #1 SMP Thu Oct 29 06:43:42 UTC 2015 (e24bad1) aarch64 aarch64 aarch64 GNU/Linux
$ cat /proc/cpuinfo
processor : 0 [nid: 0]
...
processor : 7 [nid: 0]
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x1
CPU part : 0xd07
CPU revision : 2
$ free
                    total     used     free  shared  buffers   cached
Mem:             16642240 15505856  1136384    7680     4032 15058944
-/+ buffers/cache:          442880 16199360
Swap:             2104256     3712  2100544
Is it useful? Do you need more info?
Best regards,
Rocky
Last fiddled with by ewmayer on 2017-11-17 at 22:43 Reason: wrapped cfg-file data in code flags for readability |
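For anyone who wants to compare cfg files like the two above programmatically, here is a minimal Python sketch. The regular expression is inferred only from the lines shown in this post, not from the Mlucas documentation, and the names `CFG_LINE`, `parse_cfg` and `SAMPLE` are made up for illustration.

```python
import re

# Pattern inferred from the cfg lines quoted in this thread (an assumption,
# not an official Mlucas format description).
CFG_LINE = re.compile(
    r"^\s*(?P<fft_k>\d+)\s+msec/iter\s*=\s*(?P<msec>[\d.]+)\s+"
    r"ROE\[avg,max\]\s*=\s*\[\s*(?P<roe_avg>[\d.]+)\s*,\s*(?P<roe_max>[\d.]+)\s*\]\s+"
    r"radices\s*=\s*(?P<radices>[\d\s]+)$"
)


def parse_cfg(text):
    """Return {first-column value: timing record} for each timing line in text."""
    rows = {}
    for line in text.splitlines():
        match = CFG_LINE.match(line)
        if match is None:
            continue  # skips the "17.1" version line and any blank lines
        # Drop the trailing zero padding from the radix list.
        radices = [int(r) for r in match.group("radices").split() if int(r) != 0]
        rows[int(match.group("fft_k"))] = {
            "msec_per_iter": float(match.group("msec")),
            "roe_avg": float(match.group("roe_avg")),
            "roe_max": float(match.group("roe_max")),
            "radices": radices,
        }
    return rows


# Two lines copied from the Opteron 1100 cfg file above.
SAMPLE = """17.1
1024 msec/iter = 49.45 ROE[avg,max] = [0.217382812, 0.281250000] radices = 16 32 32 32 0 0 0 0 0 0
1152 msec/iter = 60.75 ROE[avg,max] = [0.237018694, 0.281250000] radices = 36 16 32 32 0 0 0 0 0 0
"""

if __name__ == "__main__":
    for key, record in parse_cfg(SAMPLE).items():
        print(key, record["msec_per_iter"], record["radices"])
```

Keying the result on the first column makes it easy to line up two machines row by row, which is all the comparisons in this thread need.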
|
|
|
|
|
#126 |
|
∂²ω=0
Sep 2002
República de California
2DEC₁₆ Posts |
Thanks for the build/system info and timings, Rocky - I wrapped your cfg-file data in code flags for better readability.
Some questions:
1. Are these based on specific Cortex-A[something] versions, or are they AMD-custom implementations of the ARMv8 architecture? Are they available retail, and if so, at what prices?
2. What does e.g. 'gcc118' refer to? Is that some variant of the GCC compiler? (I've only heard of GCC versions up to 7, so the '118' puzzles me, since v11.8 makes no sense.) It seems rather to refer to the 2 boards you ran on, is that right?
3. Your Opteron 1100 cfg-file timings look a lot like Tom Womack's for A57 in post 114 on page 11 of this thread, if we extrapolate backward from his 4-threaded timings to your 1-threaded ones. Since both of your boards are octocores, can you also run the timings 4-threaded (./mlucas -s m -cpu 0:3) and 8-threaded (./mlucas -s m -cpu 0:7) on both systems and post the resulting cfg-file data? (Please wrap them in code-blocks like I did for readability.) |
|
|
|
|
|
#127 |
|
Jan 2008
France
3×199 Posts |
Opteron 1100 is an 8-core Cortex-A57 (and likely what Tom used). X-Gene uses a custom core.
Both would cost you a lot, if you can find them, I'm afraid. gcc113 and gcc118 are the names of the machines in the GCC build farm that Rocky linked in his message. |
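The same identification can be read off the `CPU implementer` and `CPU part` fields of the /proc/cpuinfo dumps posted earlier. Below is a small Python sketch of that lookup. The two tables are deliberately tiny and only cover the values relevant here (implementer 0x41 is ARM, 0x50 is Applied Micro; part 0xd07 is Cortex-A57); the function name `identify_core` and the table names are hypothetical, and a real tool would need a far more complete list.

```python
# Minimal lookup tables; only a handful of well-known values, not a full list.
IMPLEMENTERS = {0x41: "ARM Ltd.", 0x50: "Applied Micro (APM)"}
ARM_PARTS = {0xD03: "Cortex-A53", 0xD07: "Cortex-A57", 0xD08: "Cortex-A72"}


def identify_core(cpuinfo_text):
    """Return (vendor, core) from the text of /proc/cpuinfo on an AArch64 box."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # keep only "key : value" lines
            fields[key.strip()] = value.strip()
    implementer = int(fields["CPU implementer"], 16)
    part = int(fields["CPU part"], 16)
    vendor = IMPLEMENTERS.get(implementer, f"unknown implementer {implementer:#x}")
    if implementer == 0x41:
        core = ARM_PARTS.get(part, f"unknown part {part:#x}")
    else:
        # Part numbers are only meaningful per implementer, so do not guess.
        core = f"vendor-specific part {part:#05x}"
    return vendor, core


# Field values copied from the gcc118 and gcc113 dumps in this thread.
GCC118 = "CPU implementer : 0x41\nCPU architecture: 8\nCPU part : 0xd07\n"
GCC113 = "CPU implementer : 0x50\nCPU architecture: AArch64\nCPU part : 0x000\n"

if __name__ == "__main__":
    print("gcc118:", identify_core(GCC118))
    print("gcc113:", identify_core(GCC113))
```

For gcc118 this yields an ARM-designed Cortex-A57, and for gcc113 a non-ARM implementer, consistent with the X-Gene being a custom core.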
|
|
|
|
|
#128 |
|
∂²ω=0
Sep 2002
República de California
2²·2,939 Posts |
Thanks Laurent - looks like what we need is to convince an outfit like Hardkernel to put out an A57-based version of their Odroids. The A57 has been out for enough years that I'm rather surprised at the lack of cheap micro-PC boards based on it.
|
|
|
|
|
|
#129 |
|
Banned
"Luigi"
Aug 2002
Team Italia
4865₁₀ Posts |
Here is my testbed:
Code:
Linux pi64 4.10.0-rc5-v8 #1 SMP PREEMPT Wed Jan 25 20:13:50 GMT 2017 aarch64 GNU/Linux
grep asimd /proc/cpuinfo
Features : fp asimd evtstrm crc32
Features : fp asimd evtstrm crc32
Features : fp asimd evtstrm crc32
Features : fp asimd evtstrm crc32
gcc version 5.4.0 (Gentoo 5.4.0-r2 p1.2, pie-0.6.5)
Code:
17.1
1024 msec/iter = 65.15 ROE[avg,max] = [0.254687500, 0.312500000] radices = 256 8 16 16 0 0 0 0 0 0
1152 msec/iter = 75.04 ROE[avg,max] = [0.223256138, 0.281250000] radices = 144 16 16 16 0 0 0 0 0 0
1280 msec/iter = 81.41 ROE[avg,max] = [0.264508929, 0.343750000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 94.69 ROE[avg,max] = [0.227343750, 0.265625000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 103.01 ROE[avg,max] = [0.254241071, 0.312500000] radices = 192 16 16 16 0 0 0 0 0 0
1664 msec/iter = 114.01 ROE[avg,max] = [0.270758929, 0.312500000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 123.92 ROE[avg,max] = [0.220532663, 0.250000000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 134.22 ROE[avg,max] = [0.257756696, 0.312500000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 143.26 ROE[avg,max] = [0.236921038, 0.281250000] radices = 256 16 16 16 0 0 0 0 0 0
2304 msec/iter = 165.91 ROE[avg,max] = [0.248751395, 0.312500000] radices = 288 16 16 16 0 0 0 0 0 0
2560 msec/iter = 187.56 ROE[avg,max] = [0.236908831, 0.312500000] radices = 160 32 16 16 0 0 0 0 0 0
2816 msec/iter = 215.95 ROE[avg,max] = [0.262500000, 0.312500000] radices = 176 32 16 16 0 0 0 0 0 0
3072 msec/iter = 234.68 ROE[avg,max] = [0.262111119, 0.312500000] radices = 192 32 16 16 0 0 0 0 0 0
3328 msec/iter = 259.13 ROE[avg,max] = [0.281250000, 0.375000000] radices = 208 32 16 16 0 0 0 0 0 0
3584 msec/iter = 282.45 ROE[avg,max] = [0.252343750, 0.312500000] radices = 224 32 16 16 0 0 0 0 0 0
3840 msec/iter = 306.40 ROE[avg,max] = [0.248437500, 0.343750000] radices = 240 32 16 16 0 0 0 0 0 0
4096 msec/iter = 326.20 ROE[avg,max] = [0.228655134, 0.281250000] radices = 256 32 16 16 0 0 0 0 0 0
4608 msec/iter = 378.54 ROE[avg,max] = [0.251339286, 0.281250000] radices = 288 32 16 16 0 0 0 0 0 0
5120 msec/iter = 417.17 ROE[avg,max] = [0.237137277, 0.281250000] radices = 160 32 32 16 0 0 0 0 0 0
5632 msec/iter = 479.39 ROE[avg,max] = [0.256919643, 0.312500000] radices = 176 32 32 16 0 0 0 0 0 0
6144 msec/iter = 527.16 ROE[avg,max] = [0.246651786, 0.281250000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 581.98 ROE[avg,max] = [0.262500000, 0.312500000] radices = 208 32 32 16 0 0 0 0 0 0
7168 msec/iter = 635.62 ROE[avg,max] = [0.224874442, 0.281250000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 691.29 ROE[avg,max] = [0.237053571, 0.281250000] radices = 240 32 32 16 0 0 0 0 0 0
Luigi
Last fiddled with by ET_ on 2017-11-19 at 12:30 |
|
|
|
|
|
#130 | |
|
Nov 2017
2 Posts |
Quote:
APM X-Gene Mustang board, ./Mlucas -s m -cpu 0:7 (8 cpus):
Code:
17.1
1024 msec/iter = 10.23 ROE[avg,max] = [0.231349041, 0.296875000] radices = 64 16 16 32 0 0 0 0 0 0
1152 msec/iter = 12.36 ROE[avg,max] = [0.223343084, 0.312500000] radices = 288 8 16 16 0 0 0 0 0 0
1280 msec/iter = 14.86 ROE[avg,max] = [0.264203447, 0.375000000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 17.70 ROE[avg,max] = [0.228616585, 0.312500000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 20.17 ROE[avg,max] = [0.271927352, 0.375000000] radices = 48 16 32 32 0 0 0 0 0 0
1664 msec/iter = 22.53 ROE[avg,max] = [0.272265625, 0.406250000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 24.26 ROE[avg,max] = [0.222731285, 0.312500000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 26.30 ROE[avg,max] = [0.255133245, 0.375000000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 27.92 ROE[avg,max] = [0.312242268, 0.406250000] radices = 128 16 16 32 0 0 0 0 0 0
2304 msec/iter = 32.79 ROE[avg,max] = [0.249449173, 0.312500000] radices = 288 16 16 16 0 0 0 0 0 0
2560 msec/iter = 36.00 ROE[avg,max] = [0.233106476, 0.312500000] radices = 160 32 16 16 0 0 0 0 0 0
2816 msec/iter = 40.67 ROE[avg,max] = [0.260065641, 0.375000000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 44.44 ROE[avg,max] = [0.266442494, 0.375000000] radices = 192 16 16 32 0 0 0 0 0 0
3328 msec/iter = 48.37 ROE[avg,max] = [0.280374114, 0.375000000] radices = 208 16 16 32 0 0 0 0 0 0
3584 msec/iter = 52.39 ROE[avg,max] = [0.254961340, 0.312500000] radices = 224 16 16 32 0 0 0 0 0 0
3840 msec/iter = 56.34 ROE[avg,max] = [0.247425197, 0.343750000] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 59.62 ROE[avg,max] = [0.227451789, 0.281250000] radices = 256 16 16 32 0 0 0 0 0 0
4608 msec/iter = 68.25 ROE[avg,max] = [0.248973603, 0.312500000] radices = 288 16 16 32 0 0 0 0 0 0
5120 msec/iter = 77.27 ROE[avg,max] = [0.234943193, 0.312500000] radices = 160 16 32 32 0 0 0 0 0 0
5632 msec/iter = 88.35 ROE[avg,max] = [0.261650290, 0.343750000] radices = 176 16 32 32 0 0 0 0 0 0
6144 msec/iter = 98.88 ROE[avg,max] = [0.245978727, 0.343750000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 107.94 ROE[avg,max] = [0.268344274, 0.375000000] radices = 208 16 32 32 0 0 0 0 0 0
7168 msec/iter = 115.25 ROE[avg,max] = [0.230832822, 0.312500000] radices = 224 16 32 32 0 0 0 0 0 0
7680 msec/iter = 124.74 ROE[avg,max] = [0.241652278, 0.343750000] radices = 240 16 32 32 0 0 0 0 0 0
AMD Opteron 1100, 8 threads:
Code:
1024 msec/iter = 6.91 ROE[avg,max] = [0.241764659, 0.343750000] radices = 32 32 32 16 0 0 0 0 0 0
1152 msec/iter = 8.74 ROE[avg,max] = [0.223343084, 0.312500000] radices = 288 8 16 16 0 0 0 0 0 0
1280 msec/iter = 9.55 ROE[avg,max] = [0.264203447, 0.375000000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 11.00 ROE[avg,max] = [0.228616585, 0.312500000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 11.50 ROE[avg,max] = [0.271927352, 0.375000000] radices = 48 16 32 32 0 0 0 0 0 0
1664 msec/iter = 13.25 ROE[avg,max] = [0.272265625, 0.406250000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 14.53 ROE[avg,max] = [0.222731285, 0.312500000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 15.95 ROE[avg,max] = [0.255133245, 0.375000000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 16.46 ROE[avg,max] = [0.312242268, 0.406250000] radices = 128 16 16 32 0 0 0 0 0 0
2304 msec/iter = 19.40 ROE[avg,max] = [0.270892397, 0.375000000] radices = 144 32 16 16 0 0 0 0 0 0
2560 msec/iter = 21.23 ROE[avg,max] = [0.236825121, 0.312500000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 24.05 ROE[avg,max] = [0.225605129, 0.281250000] radices = 176 8 8 8 16 0 0 0 0 0
3072 msec/iter = 26.94 ROE[avg,max] = [0.248841087, 0.312500000] radices = 192 8 8 8 16 0 0 0 0 0
3328 msec/iter = 28.65 ROE[avg,max] = [0.231577450, 0.312500000] radices = 208 8 8 8 16 0 0 0 0 0
3584 msec/iter = 31.00 ROE[avg,max] = [0.254961340, 0.312500000] radices = 224 16 16 32 0 0 0 0 0 0
3840 msec/iter = 33.57 ROE[avg,max] = [0.247425197, 0.343750000] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 35.65 ROE[avg,max] = [0.227451789, 0.281250000] radices = 256 16 16 32 0 0 0 0 0 0
4608 msec/iter = 40.67 ROE[avg,max] = [0.228279161, 0.312500000] radices = 288 8 8 8 16 0 0 0 0 0
5120 msec/iter = 46.35 ROE[avg,max] = [0.234943193, 0.312500000] radices = 160 16 32 32 0 0 0 0 0 0
5632 msec/iter = 51.91 ROE[avg,max] = [0.259536082, 0.343750000] radices = 176 32 32 16 0 0 0 0 0 0
6144 msec/iter = 58.14 ROE[avg,max] = [0.245978727, 0.343750000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 61.66 ROE[avg,max] = [0.266108247, 0.375000000] radices = 208 32 32 16 0 0 0 0 0 0
7168 msec/iter = 66.42 ROE[avg,max] = [0.225737707, 0.312500000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 71.11 ROE[avg,max] = [0.236637429, 0.312500000] radices = 240 32 32 16 0 0 0 0 0 0
AMD Opteron 1100, 4 threads:
Code:
1024 msec/iter = 13.10 ROE[avg,max] = [0.241406250, 0.312500000] radices = 32 32 32 16 0 0 0 0 0 0
1152 msec/iter = 16.24 ROE[avg,max] = [0.221044922, 0.250000000] radices = 288 8 16 16 0 0 0 0 0 0
1280 msec/iter = 17.22 ROE[avg,max] = [0.250167411, 0.312500000] radices = 40 16 32 32 0 0 0 0 0 0
1408 msec/iter = 21.08 ROE[avg,max] = [0.227343750, 0.265625000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 21.65 ROE[avg,max] = [0.272656250, 0.312500000] radices = 48 16 32 32 0 0 0 0 0 0
1664 msec/iter = 25.20 ROE[avg,max] = [0.270758929, 0.312500000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 26.09 ROE[avg,max] = [0.230078125, 0.281250000] radices = 56 16 32 32 0 0 0 0 0 0
1920 msec/iter = 29.81 ROE[avg,max] = [0.257756696, 0.312500000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 30.09 ROE[avg,max] = [0.228236607, 0.281250000] radices = 32 32 32 32 0 0 0 0 0 0
2304 msec/iter = 35.97 ROE[avg,max] = [0.272405134, 0.343750000] radices = 144 16 16 32 0 0 0 0 0 0
2560 msec/iter = 39.08 ROE[avg,max] = [0.245999581, 0.312500000] radices = 40 32 32 32 0 0 0 0 0 0
2816 msec/iter = 44.60 ROE[avg,max] = [0.260044643, 0.312500000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 48.20 ROE[avg,max] = [0.224818638, 0.251953125] radices = 48 32 32 32 0 0 0 0 0 0
3328 msec/iter = 52.66 ROE[avg,max] = [0.279017857, 0.343750000] radices = 208 16 16 32 0 0 0 0 0 0
3584 msec/iter = 56.96 ROE[avg,max] = [0.252566964, 0.312500000] radices = 224 16 16 32 0 0 0 0 0 0
3840 msec/iter = 61.36 ROE[avg,max] = [0.249302455, 0.343750000] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 64.82 ROE[avg,max] = [0.229129464, 0.281250000] radices = 256 16 16 32 0 0 0 0 0 0
4608 msec/iter = 74.32 ROE[avg,max] = [0.249079241, 0.281250000] radices = 288 16 16 32 0 0 0 0 0 0
5120 msec/iter = 81.62 ROE[avg,max] = [0.237137277, 0.281250000] radices = 160 32 32 16 0 0 0 0 0 0
5632 msec/iter = 92.76 ROE[avg,max] = [0.256919643, 0.312500000] radices = 176 32 32 16 0 0 0 0 0 0
6144 msec/iter = 102.56 ROE[avg,max] = [0.246651786, 0.281250000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 109.54 ROE[avg,max] = [0.262500000, 0.312500000] radices = 208 32 32 16 0 0 0 0 0 0
7168 msec/iter = 117.78 ROE[avg,max] = [0.224874442, 0.281250000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 126.96 ROE[avg,max] = [0.237053571, 0.281250000] radices = 240 32 32 16 0 0 0 0 0 0
|
|
|
|
|
|
|
#131 | |
|
∂²ω=0
Sep 2002
República de California
11756₁₀ Posts |
Quote:
Rocky, thanks for the multithreaded timings. Here are the resulting speedup-factor tables (timings in msec/iter):
X-Gene:
Code:
FFT(K)   1-thr:   8-thr:  speedup
1024      70.94    10.23    6.93x
1152      81.80    12.36    6.62x
1280      91.57    14.86    6.16x
1408     107.83    17.70    6.09x
1536     114.63    20.17    5.68x
1664     127.96    22.53    5.68x
1792     137.40    24.26    5.66x
1920     146.12    26.30    5.56x
2048     155.02    27.92    5.55x
2304     177.40    32.79    5.41x
2560     194.36    36.00    5.40x
2816     225.86    40.67    5.55x
3072     243.38    44.44    5.48x
3328     266.68    48.37    5.51x
3584     285.43    52.39    5.45x
3840     303.83    56.34    5.39x
4096     322.17    59.62    5.40x
4608     366.85    68.25    5.38x
5120     414.81    77.27    5.37x
5632     480.58    88.35    5.44x
6144     518.04    98.88    5.24x
6656     568.81   107.94    5.27x
7168     610.64   115.25    5.30x
7680     650.18   124.74    5.21x
Opteron 1100:
Code:
FFT(K)   1-thr:   4-thr:  speedup   8-thr:  speedup
1024      49.45    13.10    3.77x     6.91    7.16x
1152      60.75    16.24    3.74x     8.74    6.95x
1280      66.72    17.22    3.87x     9.55    6.99x
1408      78.04    21.08    3.70x    11.00    7.09x
1536      80.59    21.65    3.72x    11.50    7.01x
1664      94.73    25.20    3.76x    13.25    7.15x
1792      97.68    26.09    3.74x    14.53    6.72x
1920     110.56    29.81    3.71x    15.95    6.93x
2048     109.44    30.09    3.64x    16.46    6.65x
2304     132.20    35.97    3.68x    19.40    6.81x
2560     146.44    39.08    3.75x    21.23    6.90x
2816     167.99    44.60    3.77x    24.05    6.99x
3072     175.99    48.20    3.65x    26.94    6.53x
3328     197.09    52.66    3.74x    28.65    6.88x
3584     206.07    56.96    3.62x    31.00    6.65x
3840     229.05    61.36    3.73x    33.57    6.82x
4096     240.88    64.82    3.72x    35.65    6.76x
4608     276.55    74.32    3.72x    40.67    6.80x
5120     300.46    81.62    3.68x    46.35    6.48x
5632     342.46    92.76    3.69x    51.91    6.60x
6144     380.17   102.56    3.72x    58.14    6.54x
6656     403.75   109.54    3.69x    61.66    6.55x
7168     433.80   117.78    3.68x    66.42    6.53x
7680     468.23   126.96    3.69x    71.11    6.58x |
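The speedup columns are simply the 1-thread msec/iter divided by the N-thread msec/iter. A short Python sketch of that arithmetic follows; it also reports parallel efficiency (speedup divided by thread count), which the tables do not list. The function name `speedup_table` and the dict layout are illustrative assumptions, and only four X-Gene rows from the timings above are included.

```python
def speedup_table(single, multi, n_threads):
    """Return (fft_k, speedup, efficiency) rows for FFT lengths present in both dicts.

    single, multi: {FFT length in K: msec/iter} for 1 thread and n_threads threads.
    """
    rows = []
    for fft_k in sorted(set(single) & set(multi)):
        speedup = single[fft_k] / multi[fft_k]
        rows.append((fft_k, speedup, speedup / n_threads))
    return rows


# A few X-Gene timings copied from the cfg data posted in this thread.
XGENE_1THR = {1024: 70.94, 2048: 155.02, 4096: 322.17, 7680: 650.18}
XGENE_8THR = {1024: 10.23, 2048: 27.92, 4096: 59.62, 7680: 124.74}

if __name__ == "__main__":
    for fft_k, speedup, eff in speedup_table(XGENE_1THR, XGENE_8THR, 8):
        print(f"{fft_k:5d}K  speedup {speedup:.2f}x  efficiency {eff:.0%}")
```

Efficiency makes the two boards easier to compare at a glance: a speedup of 5.4x on 8 threads is under 70% efficiency, whereas 6.9x is above 85%.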
|
|
|
|
|
|
#132 |
|
Banned
"Luigi"
Aug 2002
Team Italia
5×7×139 Posts |
Yes. Slower clock, (s)lower memory and no cooling surfaces (as well as tests with other software) put the PI at 60% - 70% of the Odroid.
|
|
|
|