#155 |
|
"Victor de Hollander"
Aug 2011
the Netherlands
1179₁₀ Posts |
Odroid-U2 (4x Cortex-A9, armv7)
Mlucas 17.1 compiled with GCC 6.3.0, 4 threads (scalar build). Code:
17.1
1024 msec/iter = 90.68 ROE[avg,max] = [0.247321429, 0.312500000] radices = 128 16 16 16 0 0 0 0 0 0
1152 msec/iter = 102.31 ROE[avg,max] = [0.225341797, 0.265625000] radices = 144 16 16 16 0 0 0 0 0 0
1280 msec/iter = 119.48 ROE[avg,max] = [0.245872280, 0.312500000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 128.28 ROE[avg,max] = [0.228543527, 0.281250000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 157.27 ROE[avg,max] = [0.238232422, 0.281250000] radices = 192 16 16 16 0 0 0 0 0 0
1664 msec/iter = 151.83 ROE[avg,max] = [0.252692522, 0.312500000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 164.11 ROE[avg,max] = [0.231361607, 0.281250000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 182.04 ROE[avg,max] = [0.234856306, 0.281250000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 188.62 ROE[avg,max] = [0.250000000, 0.281250000] radices = 256 16 16 16 0 0 0 0 0 0
2304 msec/iter = 214.94 ROE[avg,max] = [0.227878244, 0.250000000] radices = 144 8 8 8 16 0 0 0 0 0
2560 msec/iter = 252.43 ROE[avg,max] = [0.276897321, 0.343750000] radices = 160 8 8 8 16 0 0 0 0 0
2816 msec/iter = 267.55 ROE[avg,max] = [0.242633929, 0.281250000] radices = 176 8 8 8 16 0 0 0 0 0
3072 msec/iter = 315.78 ROE[avg,max] = [0.248883929, 0.312500000] radices = 192 8 8 8 16 0 0 0 0 0
3328 msec/iter = 319.69 ROE[avg,max] = [0.278794643, 0.343750000] radices = 208 8 8 8 16 0 0 0 0 0
3584 msec/iter = 343.25 ROE[avg,max] = [0.249330357, 0.281250000] radices = 224 8 8 8 16 0 0 0 0 0
3840 msec/iter = 400.92 ROE[avg,max] = [0.241594587, 0.281250000] radices = 240 8 8 8 16 0 0 0 0 0
4096 msec/iter = 414.05 ROE[avg,max] = [0.256026786, 0.312500000] radices = 256 8 8 8 16 0 0 0 0 0
4608 msec/iter = 493.21 ROE[avg,max] = [0.231989397, 0.281250000] radices = 288 8 8 8 16 0 0 0 0 0
5120 msec/iter = 583.92 ROE[avg,max] = [0.262723214, 0.312500000] radices = 160 8 8 16 16 0 0 0 0 0
5632 msec/iter = 628.49 ROE[avg,max] = [0.237304688, 0.281250000] radices = 176 8 8 16 16 0 0 0 0 0
6144 msec/iter = 731.52 ROE[avg,max] = [0.242550223, 0.281250000] radices = 192 8 8 16 16 0 0 0 0 0
6656 msec/iter = 739.47 ROE[avg,max] = [0.270982143, 0.312500000] radices = 208 8 8 16 16 0 0 0 0 0
7168 msec/iter = 799.27 ROE[avg,max] = [0.240229143, 0.281250000] radices = 224 8 8 16 16 0 0 0 0 0
7680 msec/iter = 894.03 ROE[avg,max] = [0.247879464, 0.312500000] radices = 240 8 8 16 16 0 0 0 0 0
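For anyone who wants to compare these tables programmatically, here is a minimal Python sketch that parses one timing line into a record. The names `CfgEntry` and `parse_cfg_line` are made up for illustration and are not part of Mlucas. The sanity check relies on the observation that, in every row posted here, twice the product of the nonzero complex radices equals the FFT length in doubles.

```python
import math
import re
from typing import NamedTuple, Optional


class CfgEntry(NamedTuple):
    """One timing row (hypothetical record type, not an Mlucas structure)."""
    fft_k: int              # FFT length in K (1K = 1024 doubles)
    msec_per_iter: float
    roe_avg: float
    roe_max: float
    radices: tuple          # nonzero complex FFT radices only


LINE_RE = re.compile(
    r"^(\d+)\s+msec/iter\s*=\s*([\d.]+)\s+"
    r"ROE\[avg,max\]\s*=\s*\[([\d.]+),\s*([\d.]+)\]\s+"
    r"radices\s*=\s*([\d\s]+)$"
)


def parse_cfg_line(line: str) -> Optional[CfgEntry]:
    """Parse one timing line; return None for lines that do not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Trailing zeros are padding, so keep only the radices actually used.
    radices = tuple(int(r) for r in m.group(5).split() if int(r) != 0)
    return CfgEntry(int(m.group(1)), float(m.group(2)),
                    float(m.group(3)), float(m.group(4)), radices)


sample = ("1024 msec/iter = 90.68 ROE[avg,max] = [0.247321429, 0.312500000] "
          "radices = 128 16 16 16 0 0 0 0 0 0")
entry = parse_cfg_line(sample)
print(entry)

# Sanity check: 2 * product(complex radices) == FFT length in doubles.
radices_consistent = 2 * math.prod(entry.radices) == entry.fft_k * 1024
print("radices consistent with FFT length:", radices_consistent)
```

The version header line ("17.1") does not match the pattern, so `parse_cfg_line` returns None for it and a caller can simply skip it.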
#156 | |
|
∂²ω=0
Sep 2002
República de California
2²×2,939 Posts |
Quote:
Oh, hey, could you also try the scalar-double (non-SIMD) binary from the pair of precompiled-on-Odroid-C2 binaries I posted to the README page a couple of days ago, and let me know if you hit any issues (or see notable timing differences versus your results above)? Thanks.
#157 |
|
"Composite as Heck"
Oct 2017
2·5²·19 Posts |
ROC-RK3328-CC
Image: ROC-RK3328-CC_Ubuntu16.04_Arch64_20180309
GCC: 7.2.0 (asimd build)
They got their act together: it's now comparable to a Pi 3B. It should be noticeably better given the higher clocks, so maybe the image is still not fully tailored to the hardware (lscpu does report 1392 for CPU max MHz, but I don't know whether it stays there under load). It doesn't look like the better-on-paper RAM has done anything for Mlucas. Code:
17.1
1024 msec/iter = 62.34 ROE[avg,max] = [0.231563895, 0.281250000] radices = 64 32 16 16 0 0 0 0 0 0
1152 msec/iter = 67.97 ROE[avg,max] = [0.221044922, 0.250000000] radices = 288 8 16 16 0 0 0 0 0 0
1280 msec/iter = 77.26 ROE[avg,max] = [0.264508929, 0.343750000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 90.94 ROE[avg,max] = [0.227343750, 0.265625000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 102.08 ROE[avg,max] = [0.267187500, 0.343750000] radices = 48 32 32 16 0 0 0 0 0 0
1664 msec/iter = 110.82 ROE[avg,max] = [0.270758929, 0.312500000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 120.42 ROE[avg,max] = [0.220532663, 0.250000000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 131.74 ROE[avg,max] = [0.257756696, 0.312500000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 140.40 ROE[avg,max] = [0.223493304, 0.250000000] radices = 64 32 32 16 0 0 0 0 0 0
2304 msec/iter = 163.35 ROE[avg,max] = [0.248751395, 0.312500000] radices = 288 16 16 16 0 0 0 0 0 0
2560 msec/iter = 179.19 ROE[avg,max] = [0.236908831, 0.312500000] radices = 160 32 16 16 0 0 0 0 0 0
2816 msec/iter = 208.29 ROE[avg,max] = [0.263392857, 0.312500000] radices = 176 32 16 16 0 0 0 0 0 0
3072 msec/iter = 231.04 ROE[avg,max] = [0.224818638, 0.251953125] radices = 48 32 32 32 0 0 0 0 0 0
3328 msec/iter = 251.91 ROE[avg,max] = [0.281250000, 0.375000000] radices = 208 32 16 16 0 0 0 0 0 0
3584 msec/iter = 272.49 ROE[avg,max] = [0.252343750, 0.312500000] radices = 224 32 16 16 0 0 0 0 0 0
3840 msec/iter = 295.94 ROE[avg,max] = [0.248437500, 0.343750000] radices = 240 32 16 16 0 0 0 0 0 0
4096 msec/iter = 307.20 ROE[avg,max] = [0.295089286, 0.343750000] radices = 128 32 32 16 0 0 0 0 0 0
4608 msec/iter = 356.76 ROE[avg,max] = [0.258928571, 0.312500000] radices = 144 32 32 16 0 0 0 0 0 0
5120 msec/iter = 390.13 ROE[avg,max] = [0.237137277, 0.281250000] radices = 160 32 32 16 0 0 0 0 0 0
5632 msec/iter = 455.21 ROE[avg,max] = [0.256919643, 0.312500000] radices = 176 32 32 16 0 0 0 0 0 0
6144 msec/iter = 512.26 ROE[avg,max] = [0.246651786, 0.281250000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 550.61 ROE[avg,max] = [0.262500000, 0.312500000] radices = 208 32 32 16 0 0 0 0 0 0
7168 msec/iter = 606.13 ROE[avg,max] = [0.224874442, 0.281250000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 664.46 ROE[avg,max] = [0.237053571, 0.281250000] radices = 240 32 32 16 0 0 0 0 0 0
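A rough way to put the Odroid-U2 table (post #155) next to the ROC-RK3328-CC table is to take per-length ratios of msec/iter. The sketch below copies four rows from each table by hand. Keep in mind the two runs differ in more than the board (scalar armv7 build versus asimd build, different GCC versions), so the ratios only describe these particular runs. The function name `iteration_time_ratios` is invented for this example.

```python
# msec/iter copied from the two tables in this thread, keyed by FFT length in K.
odroid_u2_scalar = {1024: 90.68, 2048: 188.62, 4096: 414.05, 7680: 894.03}
roc_rk3328_asimd = {1024: 62.34, 2048: 140.40, 4096: 307.20, 7680: 664.46}


def iteration_time_ratios(a, b):
    """Return a[k] / b[k] for every FFT length k present in both tables."""
    return {k: a[k] / b[k] for k in sorted(a.keys() & b.keys())}


ratios = iteration_time_ratios(odroid_u2_scalar, roc_rk3328_asimd)
for fft_k, ratio in ratios.items():
    print(f"{fft_k:5d}K: Odroid-U2 run takes {ratio:.2f}x as long per iteration")
```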
#158 | |
|
∂²ω=0
Sep 2002
República de California
2²·2,939 Posts |
Quote:
Last fiddled with by ewmayer on 2018-03-14 at 03:01 |
#159 |
|
Aug 2010
Republic of Belarus
2628 Posts |
Hello! I would like to share benchmarks for the ARMv8 MONSTER with 96 cores (2x48)!
CPU: Cavium ThunderX SoC (96 physical cores @ 2.0 GHz, 2 × Cavium ThunderX)
RAM: 128 GB of DDR4 ECC RAM
OS: CentOS 7
GCC: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
lscpu: Code:
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 1
Core(s) per socket: 48
Socket(s): 2
NUMA node(s): 2
NUMA node0 CPU(s): 0-47
NUMA node1 CPU(s): 48-95
processor : 0
BogoMIPS : 200.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
CPU implementer : 0x43
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x0a1
CPU revision : 1
Code:
Mlucas 17.1
http://hogranch.com/mayer/README.html
ERROR: at line 1831 of file ../src/util.c
Assertion failed: #define USE_ARM_V8_SIMD invoked but no advanced-SIMD support detected on this CPU!
INFO: testing qfloat routines...
CPU Family = ARM Embedded ABI, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 4.8.5 20150623 (Red Hat 4.8.5-16).
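The assertion above is surprising, since the Features line posted for this machine does list asimd. The Python sketch below is one way to double-check what the kernel reports. It only parses /proc/cpuinfo-style text and makes no claim about how Mlucas itself performs the detection; the helper name `cpu_features` is invented for this example, and the sample text is copied from the post.

```python
def cpu_features(cpuinfo_text):
    """Return the set of flags on the first 'Features' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            return set(value.split())
    return set()


# Sample copied from the post; on a live system one could instead read the
# text with open("/proc/cpuinfo").read().
sample = """processor : 0
BogoMIPS : 200.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
CPU implementer : 0x43
"""

features = cpu_features(sample)
has_asimd = "asimd" in features
print("asimd reported by kernel:", has_asimd)
```

If the flag is reported, as it is here, the failed check may lie in how the binary was built or how it probes the CPU rather than in the hardware, but that is a guess that would need confirming against the Mlucas source.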
./Mlucas -s m -iters 100 -nthread 96 >& selftest.log Code:
Mlucas 17.1
http://hogranch.com/mayer/README.html
INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
INFO: testing FFT radix tables...
Mlucas selftest running.....
/****************************************************************************/
INFO: Unable to find/open mlucas.cfg file in r+ mode ... creating from scratch.
NTHREADS = 96
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 1024 16 32
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.
WARN: At line 425 of file ../src/radix1024_ditN_cy_dif1.c:
n_div_nwt%CY_THREADS != 0
Return with code ERR_ASSERT
Error detected - this radix set will not be used.
NTHREADS = 96
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 1024 32 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.
WARN: At line 425 of file ../src/radix1024_ditN_cy_dif1.c:
n_div_nwt%CY_THREADS != 0
Return with code ERR_ASSERT
Error detected - this radix set will not be used.
NTHREADS = 96
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 256 8 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.
WARN: At line 590 of file ../src/radix256_ditN_cy_dif1.c:
n_div_nwt%CY_THREADS != 0
Return with code ERR_ASSERT
Error detected - this radix set will not be used.
NTHREADS = 96
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 128 16 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.
WARN: At line 451 of file ../src/radix128_ditN_cy_dif1.c:
n_div_nwt%CY_THREADS != 0
Return with code ERR_ASSERT
Error detected - this radix set will not be used.
NTHREADS = 96
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 16 16 32
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.
Using 64 threads in carry step
100 iterations of M20000047 with FFT length 1048576 = 1024 K
Res64: DD61B3E031F1E0BA. AvgMaxErr = 0.211633301. MaxErr = 0.250000000. Program: E17.1
Res mod 2^36 = 837935290
Res mod 2^35 - 1 = 6238131189
Res mod 2^36 - 1 = 41735145962
Clocks = 00:00:01.126
NTHREADS = 96
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 32 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.
100 iterations of M20000047 with FFT length 1048576 = 1024 K
Res64: DD61B3E031F1E0BA. AvgMaxErr = 0.209770857. MaxErr = 0.250000000. Program: E17.1
Res mod 2^36 = 837935290
Res mod 2^35 - 1 = 6238131189
Res mod 2^36 - 1 = 41735145962
Clocks = 00:00:01.176
NTHREADS = 96
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 8 8 8 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.
100 iterations of M20000047 with FFT length 1048576 = 1024 K
Res64: DD61B3E031F1E0BA. AvgMaxErr = 0.224783761. MaxErr = 0.281250000. Program: E17.1
Res mod 2^36 = 837935290
Res mod 2^35 - 1 = 6238131189
Res mod 2^36 - 1 = 41735145962
Clocks = 00:00:01.085
NTHREADS = 96
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 32 16 32 32
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.
Using 64 threads in carry step
100 iterations of M20000047 with FFT length 1048576 = 1024 K
Res64: DD61B3E031F1E0BA. AvgMaxErr = 0.275223214. MaxErr = 0.375000000. Program: E17.1
Res mod 2^36 = 837935290
Res mod 2^35 - 1 = 6238131189
Res mod 2^36 - 1 = 41735145962
Clocks = 00:00:01.383
NTHREADS = 96
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 32 32 32 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.
100 iterations of M20000047 with FFT length 1048576 = 1024 K
Res64: DD61B3E031F1E0BA. AvgMaxErr = 0.268080357. MaxErr = 0.312500000. Program: E17.1
Res mod 2^36 = 837935290
Res mod 2^35 - 1 = 6238131189
Res mod 2^36 - 1 = 41735145962
Clocks = 00:00:01.577
NTHREADS = 96
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 32 8 8 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.
100 iterations of M20000047 with FFT length 1048576 = 1024 K
Res64: DD61B3E031F1E0BA. AvgMaxErr = 0.296428571. MaxErr = 0.375000000. Program: E17.1
Res mod 2^36 = 837935290
Res mod 2^35 - 1 = 6238131189
Res mod 2^36 - 1 = 41735145962
Clocks = 00:00:01.341
NTHREADS = 96
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 16 32 32 32
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.
Using 64 threads in carry step
100 iterations of M20000047 with FFT length 1048576 = 1024 K
Res64: DD61B3E031F1E0BA. AvgMaxErr = 0.221902902. MaxErr = 0.281250000. Program: E17.1
Res mod 2^36 = 837935290
Res mod 2^35 - 1 = 6238131189
Res mod 2^36 - 1 = 41735145962
Clocks = 00:00:02.362
NTHREADS = 96
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 16 8 16 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.
100 iterations of M20000047 with FFT length 1048576 = 1024 K
Res64: DD61B3E031F1E0BA. AvgMaxErr = 0.230357143. MaxErr = 0.312500000. Program: E17.1
Res mod 2^36 = 837935290
Res mod 2^35 - 1 = 6238131189
Res mod 2^36 - 1 = 41735145962
Clocks = 00:00:02.175
NTHREADS = 96
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 8 16 16 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.
ERROR: at line 120 of file ../src/radix8_ditN_cy_dif1.c
Assertion failed: radix8_ditN_cy_dif1.c: CY_THREADS not a power of 2!
INFO: testing qfloat routines...
CPU Family = ARM Embedded ABI, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 4.8.5 20150623 (Red Hat 4.8.5-16).
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
INFO: System has 96 available processor cores.
Set affinity for the following 96 cores: 0.1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21.22.23.24.25.26.27.28.29.30.31.32.33.34.35.36.37.38.39.40.41.42.43.44.45.46.47.48.49.50.51.52.53.54.55.56.57.58.59.60.61.62.63.64.65.66.67.68.69.70.71.72.73.74.75.76.77.78.79.80.81.82.83.84.85.86.87.88.89.90.91.92.93.94.95.
mers_mod_square: Init threadpool of 96 threads
mers_mod_square: Init threadpool of 96 threads
mers_mod_square: Init threadpool of 96 threads
mers_mod_square: Init threadpool of 96 threads
mers_mod_square: Init threadpool of 96 threads
mers_mod_square: Init threadpool of 96 threads
mers_mod_square: Init threadpool of 96 threads
mers_mod_square: Init threadpool of 96 threads
mers_mod_square: Init threadpool of 96 threads
mers_mod_square: Init threadpool of 96 threads
mers_mod_square: Init threadpool of 96 threads
mers_mod_square: Init threadpool of 96 threads
mers_mod_square: Init threadpool of 96 threads
[root@lorenzo2 src]#
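The warnings in this log come down to 96 = 32 × 3 not being a power of two. The sketch below reproduces the arithmetic behind two of the messages: whether radix0/2 is divisible by the thread count, and what the largest power of two not exceeding the thread count is. It illustrates the numbers only and is not taken from the Mlucas source; all function names are made up.

```python
def is_power_of_two(n):
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def largest_power_of_two_at_most(n):
    """Largest power of two that is <= n, for n >= 1."""
    return 1 << (n.bit_length() - 1)


def half_radix0_divisible(radix0, nthreads):
    """True if radix0/2 splits evenly across nthreads."""
    return (radix0 // 2) % nthreads == 0


nthreads = 96
leading_radices = (1024, 256, 128, 64, 32, 16, 8)  # radix0 values seen in the log

print("96 is a power of two:", is_power_of_two(nthreads))
print("largest power of two <= 96:", largest_power_of_two_at_most(nthreads))
for radix0 in leading_radices:
    ok = half_radix0_divisible(radix0, nthreads)
    print(f"radix0 = {radix0:4d}: radix0/2 divisible by {nthreads}? {ok}")
```

Since every leading radix in the log is a power of two and 96 has a factor of 3, the divisibility test fails for all of them, which matches the warning repeating for every radix set. The value 64 from `largest_power_of_two_at_most(96)` also matches the "Using 64 threads in carry step" lines, though whether Mlucas derives it this way is an assumption.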
./Mlucas -s m -iters 100 -nthread 60 >& selftest.log Code:
17.1
1024 msec/iter = 10.36 ROE[avg,max] = [0.224783761, 0.281250000] radices = 64 8 8 8 16 0 0 0 0 0
1152 msec/iter = 13.41 ROE[avg,max] = [0.209650530, 0.250000000] radices = 144 16 16 16 0 0 0 0 0 0
1280 msec/iter = 15.83 ROE[avg,max] = [0.223046875, 0.250000000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 16.36 ROE[avg,max] = [0.227852958, 0.250000000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 18.24 ROE[avg,max] = [0.234375000, 0.312500000] radices = 192 16 16 16 0 0 0 0 0 0
1664 msec/iter = 18.33 ROE[avg,max] = [0.229310826, 0.281250000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 19.33 ROE[avg,max] = [0.221177455, 0.281250000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 22.15 ROE[avg,max] = [0.226757812, 0.250000000] radices = 60 16 32 32 0 0 0 0 0 0
2048 msec/iter = 22.12 ROE[avg,max] = [0.215150670, 0.250000000] radices = 128 16 16 32 0 0 0 0 0 0
2304 msec/iter = 24.11 ROE[avg,max] = [0.223395647, 0.250000000] radices = 144 16 16 32 0 0 0 0 0 0
Code:
17.1
1024 msec/iter = 8.75 ROE[avg,max] = [0.211633301, 0.250000000] radices = 64 16 16 32 0 0 0 0 0 0
1152 msec/iter = 11.57 ROE[avg,max] = [0.209650530, 0.250000000] radices = 144 16 16 16 0 0 0 0 0 0
1280 msec/iter = 13.96 ROE[avg,max] = [0.223046875, 0.250000000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 14.65 ROE[avg,max] = [0.227852958, 0.250000000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 15.97 ROE[avg,max] = [0.234375000, 0.312500000] radices = 192 16 16 16 0 0 0 0 0 0
1664 msec/iter = 17.98 ROE[avg,max] = [0.229310826, 0.281250000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 19.09 ROE[avg,max] = [0.221177455, 0.281250000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 21.41 ROE[avg,max] = [0.226757812, 0.250000000] radices = 60 16 32 32 0 0 0 0 0 0
2048 msec/iter = 20.34 ROE[avg,max] = [0.215150670, 0.250000000] radices = 128 16 16 32 0 0 0 0 0 0
2304 msec/iter = 21.62 ROE[avg,max] = [0.223395647, 0.250000000] radices = 144 16 16 32 0 0 0 0 0 0
2560 msec/iter = 26.38 ROE[avg,max] = [0.302678571, 0.375000000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 26.95 ROE[avg,max] = [0.266071429, 0.312500000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 30.10 ROE[avg,max] = [0.219042969, 0.281250000] radices = 192 16 16 32 0 0 0 0 0 0
3328 msec/iter = 33.33 ROE[avg,max] = [0.290401786, 0.343750000] radices = 208 16 16 32 0 0 0 0 0 0
3584 msec/iter = 35.76 ROE[avg,max] = [0.227008929, 0.281250000] radices = 224 8 8 8 16 0 0 0 0 0
3840 msec/iter = 39.92 ROE[avg,max] = [0.228404018, 0.257812500] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 41.12 ROE[avg,max] = [0.228041295, 0.312500000] radices = 256 16 16 32 0 0 0 0 0 0
4608 msec/iter = 45.24 ROE[avg,max] = [0.233426339, 0.312500000] radices = 144 8 8 16 16 0 0 0 0 0
5120 msec/iter = 55.08 ROE[avg,max] = [0.260044643, 0.312500000] radices = 160 8 8 16 16 0 0 0 0 0
5632 msec/iter = 55.22 ROE[avg,max] = [0.218415179, 0.281250000] radices = 176 16 32 32 0 0 0 0 0 0
6144 msec/iter = 61.61 ROE[avg,max] = [0.253236607, 0.312500000] radices = 192 8 8 16 16 0 0 0 0 0
6656 msec/iter = 68.61 ROE[avg,max] = [0.320982143, 0.375000000] radices = 208 8 8 16 16 0 0 0 0 0
7168 msec/iter = 73.92 ROE[avg,max] = [0.327455357, 0.375000000] radices = 224 8 8 16 16 0 0 0 0 0
7680 msec/iter = 81.06 ROE[avg,max] = [0.232477679, 0.281250000] radices = 240 8 8 16 16 0 0 0 0 0
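To put msec/iter into perspective: a Lucas-Lehmer test of M(p) needs p - 2 squarings, so a full test takes roughly (p - 2) × msec/iter. The sketch below applies that to the self-test exponent from the log above (M20000047 at 1024K) using the 8.75 msec/iter figure from the table just above, whose thread count is not stated in the post. It also reproduces the "bits per digit" value printed in the log. The function names are invented, and the estimate ignores checkpointing and any other overhead.

```python
def bits_per_digit(exponent, fft_len_doubles):
    """Average number of bits of the residue stored per FFT element."""
    return exponent / fft_len_doubles


def ll_test_seconds(exponent, msec_per_iter):
    """Rough Lucas-Lehmer runtime: (p - 2) squarings at the given speed."""
    return (exponent - 2) * msec_per_iter / 1000.0


p = 20000047              # self-test exponent shown in the log
fft_len = 1024 * 1024     # 1024K doubles
msec_per_iter = 8.75      # 1024K row of the table above

bpd = bits_per_digit(p, fft_len)
seconds = ll_test_seconds(p, msec_per_iter)
days = seconds / 86400

print(f"bits per digit: {bpd:.15f}")
print(f"estimated full test: {seconds:.0f} s (about {days:.2f} days)")
```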
./Mlucas -s l -iters 100 -nthread 48 >& selftest.log Code:
8192 msec/iter = 82.33 ROE[avg,max] = [0.326562500, 0.375000000] radices = 256 8 8 16 16 0 0 0 0 0
9216 msec/iter = 96.31 ROE[avg,max] = [0.248883929, 0.312500000] radices = 144 32 32 32 0 0 0 0 0 0
10240 msec/iter = 114.42 ROE[avg,max] = [0.288392857, 0.312500000] radices = 160 32 32 32 0 0 0 0 0 0
11264 msec/iter = 113.96 ROE[avg,max] = [0.217041016, 0.265625000] radices = 176 32 32 32 0 0 0 0 0 0
12288 msec/iter = 127.86 ROE[avg,max] = [0.232700893, 0.281250000] radices = 192 32 32 32 0 0 0 0 0 0
13312 msec/iter = 145.45 ROE[avg,max] = [0.236021205, 0.281250000] radices = 208 32 32 32 0 0 0 0 0 0
14336 msec/iter = 149.82 ROE[avg,max] = [0.232903181, 0.281250000] radices = 224 32 32 32 0 0 0 0 0 0
15360 msec/iter = 167.64 ROE[avg,max] = [0.266294643, 0.312500000] radices = 240 32 32 32 0 0 0 0 0 0
16384 msec/iter = 172.04 ROE[avg,max] = [0.233035714, 0.250000000] radices = 256 32 32 32 0 0 0 0 0 0
18432 msec/iter = 201.83 ROE[avg,max] = [0.215608433, 0.250000000] radices = 144 16 16 16 16 0 0 0 0 0
20480 msec/iter = 240.79 ROE[avg,max] = [0.285714286, 0.343750000] radices = 160 16 16 16 16 0 0 0 0 0
22528 msec/iter = 241.65 ROE[avg,max] = [0.234933036, 0.281250000] radices = 176 16 16 16 16 0 0 0 0 0
24576 msec/iter = 267.75 ROE[avg,max] = [0.237276786, 0.281250000] radices = 192 16 16 16 16 0 0 0 0 0
26624 msec/iter = 302.91 ROE[avg,max] = [0.256473214, 0.312500000] radices = 208 16 16 16 16 0 0 0 0 0
28672 msec/iter = 317.19 ROE[avg,max] = [0.216406250, 0.250000000] radices = 224 16 16 16 16 0 0 0 0 0
30720 msec/iter = 353.90 ROE[avg,max] = [0.245089286, 0.312500000] radices = 240 16 16 16 16 0 0 0 0 0
32768 msec/iter = 359.45 ROE[avg,max] = [0.326339286, 0.375000000] radices = 256 16 16 16 16 0 0 0 0 0
dmidecode Code:
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Table at 0x10FFF1E0000.
Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
Vendor: American Megatrends Inc.
Version: G31FB12A
Release Date: 10/26/2016
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 16384 kB
Characteristics:
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 5.11
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: FOXCONN
Product Name: R2-1221R-A4
Version: 1A21HH300-600-G
Serial Number: 7CE642P2KZ
UUID: 10000000-AE90-0958-D62D-70106FB9EAD0
Wake-up Type: Power Switch
SKU Number: C2U4N
Family: NULL
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: FOXCONN
Product Name: C2U4N_MB
Version: 1A42D1P00-600-G1
Serial Number: 1A42D1P00TX14700C
Asset Tag: NULL
Features:
Board is a hosting board
Board is replaceable
Location In Chassis: REAR
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0
Handle 0x0003, DMI type 3, 22 bytes
Chassis Information
Manufacturer: FOXCONN
Type: Other
Lock: Not Present
Version: C2U4N
Serial Number: 7CE642P0SE
Asset Tag: NULL
Boot-up State: Safe
Power Supply State: Safe
Thermal State: Safe
Security Status: None
OEM Information: 0x00000000
Height: 2 U
Number Of Power Cords: 2
Contained Elements: 0
SKU Number: NULL
Handle 0x0009, DMI type 9, 17 bytes
System Slot Information
Designation: PCIe Slot
Type: x8 PCI Express 3 x8
Current Usage: Available
Length: Short
ID: 1
Characteristics:
3.3 V is provided
PME signal is supported
Bus Address: 0000:00:00.0
Handle 0x0010, DMI type 11, 5 bytes
OEM Strings
String 1: NULL
Handle 0x0011, DMI type 13, 22 bytes
BIOS Language Information
Language Description Format: Long
Installable Languages: 1
en|US|iso8859-1
Currently Installed Language: en|US|iso8859-1
Handle 0x0012, DMI type 32, 11 bytes
System Boot Information
Status: No errors detected
Handle 0x0013, DMI type 41, 11 bytes
Onboard Device
Reference Designation: VGA
Type: Video
Status: Enabled
Type Instance: 1
Bus Address: 0004:15:00.0
Handle 0x0014, DMI type 38, 18 bytes
IPMI Device Information
Interface Type: SSIF (SMBus System Interface)
Specification Version: 2.0
I2C Slave Address: 0x10
NV Storage Device: Not Present
Base Address: 0x12 (SMBus)
Handle 0x0023, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Not Specified
Internal Connector Type: None
External Reference Designator: USB 3.0 Port 1
External Connector Type: Access Bus (USB)
Port Type: USB
Handle 0x0024, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Not Specified
Internal Connector Type: None
External Reference Designator: Rear Video
External Connector Type: DB-15 female
Port Type: Video Port
Handle 0x0025, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Not Specified
Internal Connector Type: None
External Reference Designator: 1GbE
External Connector Type: RJ-45
Port Type: Network Port
Handle 0x0026, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Not Specified
Internal Connector Type: None
External Reference Designator: 10G SFP+ 1
External Connector Type: Other
Port Type: Network Port
Handle 0x0027, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Not Specified
Internal Connector Type: None
External Reference Designator: 10G SFP+ 2
External Connector Type: Other
Port Type: Network Port
Handle 0x0028, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Mini SAS Port 0
Internal Connector Type: SAS/SATA Plug Receptacle
External Reference Designator: Not Specified
External Connector Type: None
Port Type: SAS
Handle 0x0029, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Mini SAS Port 1
Internal Connector Type: SAS/SATA Plug Receptacle
External Reference Designator: Not Specified
External Connector Type: None
Port Type: SAS
Handle 0x002A, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: SATA M.2
Internal Connector Type: Other
External Reference Designator: Not Specified
External Connector Type: None
Port Type: SATA
Handle 0x002B, DMI type 12, 5 bytes
System Configuration Options
Option 1: JP19(PASSWORD CLEAR) Jumper 1-2: Normal (Default), Jumper 2-3: Password Clear
Handle 0x002C, DMI type 4, 42 bytes
Processor Information
Socket Designation: SoC 0
Type: Central Processor
Family: ARM
Manufacturer: Cavium Inc.
ID: 11 0A 1F 43 00 00 00 00
Version: 2.1
Voltage: 1.0 V
External Clock: 50 MHz
Max Speed: 2000 MHz
Current Speed: 2000 MHz
Status: Populated, Enabled
Upgrade: None
L1 Cache Handle: 0x002D
L2 Cache Handle: 0x002F
L3 Cache Handle: 0x0000
Serial Number: 0180-1009-001E-31F1-8100-1000
Asset Tag: NULL
Part Number: CN8890H-2000BG2601-AAP-Y-G
Core Count: 48
Core Enabled: 48
Thread Count: 48
Characteristics:
64-bit capable
Multi-Core
Execute Protection
Enhanced Virtualization
Power/Performance Control
Handle 0x002D, DMI type 7, 19 bytes
Cache Information
Socket Designation: Internal L1D Cache
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 1536 kB
Maximum Size: 1536 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Data
Associativity: 32-way Set-associative
Handle 0x002E, DMI type 7, 19 bytes
Cache Information
Socket Designation: Internal L1I Cache
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 3744 kB
Maximum Size: 3744 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Instruction
Associativity: Other
Handle 0x002F, DMI type 7, 19 bytes
Cache Information
Socket Designation: Internal L2 Cache
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Back
Location: Internal
Installed Size: 16384 kB
Maximum Size: 16384 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Unified
Associativity: 16-way Set-associative
Handle 0x0030, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Single-bit ECC
Maximum Capacity: 256 GB
Error Information Handle: Not Provided
Number Of Devices: 8
Handle 0x0031, DMI type 19, 31 bytes
Memory Array Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x00FFFFFFFFF
Range Size: 64 GB
Physical Array Handle: 0x0030
Partition Width: 4
Handle 0x0032, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0030
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: Unknown
Locator: DIMM_A0
Bank Locator: SoC 0
Type: DDR4
Type Detail: Registered (Buffered)
Speed: 2400 MHz
Manufacturer: Samsung
Serial Number: #021631330C5A43
Asset Tag: None
Part Number: M393A2G40EB1-CRC
Rank: 2
Configured Clock Speed: 2133 MHz
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Handle 0x0033, DMI type 20, 35 bytes
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x003FFFFFFFF
Range Size: 16 GB
Physical Device Handle: 0x0032
Memory Array Mapped Address Handle: 0x0031
Partition Row Position: 1
Interleave Position: Unknown
Interleaved Data Depth: Unknown
Handle 0x0034, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0030
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: Unknown
Locator: DIMM_A1
Bank Locator: SoC 0
Type: DDR4
Type Detail: Unknown
Speed: Unknown
Manufacturer: NO DIMM
Serial Number: NO DIMM
Asset Tag: None
Part Number: NO DIMM
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x0035, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0030
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: Unknown
Locator: DIMM_B0
Bank Locator: SoC 0
Type: DDR4
Type Detail: Registered (Buffered)
Speed: 2400 MHz
Manufacturer: Samsung
Serial Number: #021631330C5727
Asset Tag: None
Part Number: M393A2G40EB1-CRC
Rank: 2
Configured Clock Speed: 2133 MHz
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Handle 0x0036, DMI type 20, 35 bytes
Memory Device Mapped Address
Starting Address: 0x00400000000
Ending Address: 0x007FFFFFFFF
Range Size: 16 GB
Physical Device Handle: 0x0035
Memory Array Mapped Address Handle: 0x0031
Partition Row Position: 1
Interleave Position: Unknown
Interleaved Data Depth: Unknown
Handle 0x0037, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0030
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: Unknown
Locator: DIMM_B1
Bank Locator: SoC 0
Type: DDR4
Type Detail: Unknown
Speed: Unknown
Manufacturer: NO DIMM
Serial Number: NO DIMM
Asset Tag: None
Part Number: NO DIMM
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x0038, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0030
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: Unknown
Locator: DIMM_C0
Bank Locator: SoC 0
Type: DDR4
Type Detail: Registered (Buffered)
Speed: 2400 MHz
Manufacturer: Samsung
Serial Number: #021631330C5726
Asset Tag: None
Part Number: M393A2G40EB1-CRC
Rank: 2
Configured Clock Speed: 2133 MHz
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Handle 0x0039, DMI type 20, 35 bytes
Memory Device Mapped Address
Starting Address: 0x00800000000
Ending Address: 0x00BFFFFFFFF
Range Size: 16 GB
Physical Device Handle: 0x0038
Memory Array Mapped Address Handle: 0x0031
Partition Row Position: 1
Interleave Position: Unknown
Interleaved Data Depth: Unknown
Handle 0x003A, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0030
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: Unknown
Locator: DIMM_C1
Bank Locator: SoC 0
Type: DDR4
Type Detail: Unknown
Speed: Unknown
Manufacturer: NO DIMM
Serial Number: NO DIMM
Asset Tag: None
Part Number: NO DIMM
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x003B, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0030
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: Unknown
Locator: DIMM_D0
Bank Locator: SoC 0
Type: DDR4
Type Detail: Registered (Buffered)
Speed: 2400 MHz
Manufacturer: Samsung
Serial Number: #021631330C5058
Asset Tag: None
Part Number: M393A2G40EB1-CRC
Rank: 2
Configured Clock Speed: 2133 MHz
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Handle 0x003C, DMI type 20, 35 bytes
Memory Device Mapped Address
Starting Address: 0x00C00000000
Ending Address: 0x00FFFFFFFFF
Range Size: 16 GB
Physical Device Handle: 0x003B
Memory Array Mapped Address Handle: 0x0031
Partition Row Position: 1
Interleave Position: Unknown
Interleaved Data Depth: Unknown
Handle 0x003D, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0030
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: Unknown
Locator: DIMM_D1
Bank Locator: SoC 0
Type: DDR4
Type Detail: Unknown
Speed: Unknown
Manufacturer: NO DIMM
Serial Number: NO DIMM
Asset Tag: None
Part Number: NO DIMM
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x003E, DMI type 4, 42 bytes
Processor Information
Socket Designation: SoC 1
Type: Central Processor
Family: ARM
Manufacturer: Cavium Inc.
ID: 11 0A 1F 43 00 00 00 00
Version: 2.1
Voltage: 1.0 V
External Clock: 50 MHz
Max Speed: 2000 MHz
Current Speed: 2000 MHz
Status: Populated, Enabled
Upgrade: None
L1 Cache Handle: 0x003F
L2 Cache Handle: 0x0041
L3 Cache Handle: 0x0000
Serial Number: 0160-2803-001E-31F1-8020-1000
Asset Tag: NULL
Part Number: CN8890H-2000BG2601-AAP-Y-G
Core Count: 48
Core Enabled: 48
Thread Count: 48
Characteristics:
64-bit capable
Multi-Core
Execute Protection
Enhanced Virtualization
Power/Performance Control
Handle 0x003F, DMI type 7, 19 bytes
Cache Information
Socket Designation: Internal L1D Cache
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 1536 kB
Maximum Size: 1536 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Data
Associativity: 32-way Set-associative
Handle 0x0040, DMI type 7, 19 bytes
Cache Information
Socket Designation: Internal L1I Cache
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 3744 kB
Maximum Size: 3744 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Instruction
Associativity: Other
Handle 0x0041, DMI type 7, 19 bytes
Cache Information
Socket Designation: Internal L2 Cache
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Back
Location: Internal
Installed Size: 16384 kB
Maximum Size: 16384 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Unified
Associativity: 16-way Set-associative
Handle 0x0042, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Single-bit ECC
Maximum Capacity: 256 GB
Error Information Handle: Not Provided
Number Of Devices: 8
Handle 0x0043, DMI type 19, 31 bytes
Memory Array Mapped Address
Starting Address: 0x01000000000
Ending Address: 0x02FFFFFFFFF
Range Size: 128 GB
Physical Array Handle: 0x0042
Partition Width: 4
Handle 0x0044, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0042
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: Unknown
Locator: DIMM_E0
Bank Locator: SoC 1
Type: DDR4
Type Detail: Registered (Buffered)
Speed: 2400 MHz
Manufacturer: Samsung
Serial Number: #021631330C56CB
Asset Tag: None
Part Number: M393A2G40EB1-CRC
Rank: 2
Configured Clock Speed: 2133 MHz
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Handle 0x0045, DMI type 20, 35 bytes
Memory Device Mapped Address
Starting Address: 0x01000000000
Ending Address: 0x013FFFFFFFF
Range Size: 16 GB
Physical Device Handle: 0x0044
Memory Array Mapped Address Handle: 0x0043
Partition Row Position: 1
Interleave Position: Unknown
Interleaved Data Depth: Unknown
Handle 0x0046, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0042
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: Unknown
Locator: DIMM_E1
Bank Locator: SoC 1
Type: DDR4
Type Detail: Unknown
Speed: Unknown
Manufacturer: NO DIMM
Serial Number: NO DIMM
Asset Tag: None
Part Number: NO DIMM
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x0047, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0042
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: Unknown
Locator: DIMM_F0
Bank Locator: SoC 1
Type: DDR4
Type Detail: Registered (Buffered)
Speed: 2400 MHz
Manufacturer: Samsung
Serial Number: #021631330C5723
Asset Tag: None
Part Number: M393A2G40EB1-CRC
Rank: 2
Configured Clock Speed: 2133 MHz
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Handle 0x0048, DMI type 20, 35 bytes
Memory Device Mapped Address
Starting Address: 0x01400000000
Ending Address: 0x017FFFFFFFF
Range Size: 16 GB
Physical Device Handle: 0x0047
Memory Array Mapped Address Handle: 0x0043
Partition Row Position: 1
Interleave Position: Unknown
Interleaved Data Depth: Unknown
Handle 0x0049, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0042
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: Unknown
Locator: DIMM_F1
Bank Locator: SoC 1
Type: DDR4
Type Detail: Unknown
Speed: Unknown
Manufacturer: NO DIMM
Serial Number: NO DIMM
Asset Tag: None
Part Number: NO DIMM
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x004A, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0042
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: Unknown
Locator: DIMM_G0
Bank Locator: SoC 1
Type: DDR4
Type Detail: Registered (Buffered)
Speed: 2400 MHz
Manufacturer: Samsung
Serial Number: #021631330C5A49
Asset Tag: None
Part Number: M393A2G40EB1-CRC
Rank: 2
Configured Clock Speed: 2133 MHz
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Handle 0x004B, DMI type 20, 35 bytes
Memory Device Mapped Address
Starting Address: 0x01800000000
Ending Address: 0x01BFFFFFFFF
Range Size: 16 GB
Physical Device Handle: 0x004A
Memory Array Mapped Address Handle: 0x0043
Partition Row Position: 1
Interleave Position: Unknown
Interleaved Data Depth: Unknown
Handle 0x004C, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0042
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: Unknown
Locator: DIMM_G1
Bank Locator: SoC 1
Type: DDR4
Type Detail: Unknown
Speed: Unknown
Manufacturer: NO DIMM
Serial Number: NO DIMM
Asset Tag: None
Part Number: NO DIMM
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x004D, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0042
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: Unknown
Locator: DIMM_H0
Bank Locator: SoC 1
Type: DDR4
Type Detail: Registered (Buffered)
Speed: 2400 MHz
Manufacturer: Samsung
Serial Number: #021631330C51F1
Asset Tag: None
Part Number: M393A2G40EB1-CRC
Rank: 2
Configured Clock Speed: 2133 MHz
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Handle 0x004E, DMI type 20, 35 bytes
Memory Device Mapped Address
Starting Address: 0x01C00000000
Ending Address: 0x01FFFFFFFFF
Range Size: 16 GB
Physical Device Handle: 0x004D
Memory Array Mapped Address Handle: 0x0043
Partition Row Position: 1
Interleave Position: Unknown
Interleaved Data Depth: Unknown
Handle 0x004F, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0042
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: Unknown
Locator: DIMM_H1
Bank Locator: SoC 1
Type: DDR4
Type Detail: Unknown
Speed: Unknown
Manufacturer: NO DIMM
Serial Number: NO DIMM
Asset Tag: None
Part Number: NO DIMM
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown
Handle 0x0050, DMI type 127, 4 bytes
End Of Table
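The "Range Size" values in the mapped-address entries above can be cross-checked from the start and end addresses. A minimal Python sketch, assuming the ranges are inclusive and that dmidecode's "GB" means binary gigabytes (2^30 bytes); `range_size_gib` is a hypothetical helper name, not part of dmidecode:

```python
def range_size_gib(start_hex, end_hex):
    """Size in GiB of an inclusive [start, end] physical address range."""
    start = int(start_hex, 16)
    end = int(end_hex, 16)
    if end < start:
        raise ValueError("end address precedes start address")
    # +1 because the ending address is the last byte inside the range
    return (end - start + 1) / 2**30


# Per-DIMM range (handle 0x0039, backing DIMM_C0)
print(range_size_gib("0x00800000000", "0x00BFFFFFFFF"))
# Whole-array range for SoC 1 (handle 0x0043)
print(range_size_gib("0x01000000000", "0x02FFFFFFFFF"))
```

The two calls correspond to the 16 GB and 128 GB entries reported in the table.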
|
|
|
|
|
|
#160 |
|
∂²ω=0
Sep 2002
República de California
2²·2,939 Posts |
Hi, Lorenzo, and thanks for the build data on this interesting manycore ARM implementation. Couple of notes:
[1] You are highly unlikely to get decent parallelism beyond 16 cores. Moreover, the self-test error messages you get for 48 and 96 cores reflect limitations in how much ||ism can be obtained for the specific-radix carry routines in question. (The reasons for these limitations are technical ... the self-test will print the error message and skip to the next FFT radix combination in such cases, but when #threads gets large as in your attempts there will be few radix combos which can run that many threads.) So I suggest for now limiting yourself to fewer threads, say try the following core counts in your self-tests:
-cpu 0:3 [when tests done, move mlucas.cfg to mlucas.cfg.4]
-cpu 0:7 [when tests done, move mlucas.cfg to mlucas.cfg.8]
-cpu 0:15 [when tests done, move mlucas.cfg to mlucas.cfg.16]
Those should tell us roughly where the #threads 'sweet spot' is. Once we find it, you would - if you were going to do actual production GIMPS work on this arch - run multiple jobs, each having that many threads, assigned to disjoint core sets, e.g. if 8-core is best, one job running with -cpu 0:7, a second with -cpu 8:15, etc.
[2] For the SIMD build, I suggest for now simply commenting out the ASSERT at util.c:1831, recompiling that file and relinking the binary (the SIMD obj-files and binary should be in a separate directory from those for the non-SIMD build), and seeing if the self-test now runs. If so, run the same moderate-threadcount self-tests as in [1], then we can compare the 2 sets of cfg-files to see if SIMD gives the expected boost. I may have to change the SIMD-available checking code in util.c to read the /proc/cpuinfo file directly, since the current way, which calls getauxval(AT_HWCAP), does not appear to be as portable as I'd been led to believe.
Last fiddled with by ewmayer on 2018-03-20 at 01:12 |
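The disjoint core-set assignment described in [1] can be sketched in a few lines. This is a minimal Python sketch under stated assumptions: `core_sets` is a hypothetical helper (not part of Mlucas), cores are numbered from 0, and it only builds the `lo:hi` strings in the form used with the `-cpu` flag in the post.

```python
def core_sets(total_cores, threads_per_job):
    """Split cores 0..total_cores-1 into disjoint 'lo:hi' ranges, one per job.

    Cores left over when total_cores is not a multiple of
    threads_per_job are simply not assigned to any job.
    """
    if threads_per_job <= 0 or total_cores < threads_per_job:
        raise ValueError("need 0 < threads_per_job <= total_cores")
    n_jobs = total_cores // threads_per_job
    return [
        f"{job * threads_per_job}:{(job + 1) * threads_per_job - 1}"
        for job in range(n_jobs)
    ]


# Example: one 48-core SoC, assuming 8 threads per job turns out to be best
for spec in core_sets(48, 8):
    print("-cpu", spec)
```

Each printed range would go to a separate job, so that no two jobs share a core.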
|
|
|
|
|
#161 |
|
"Kieren"
Jul 2011
In My Own Galaxy!
2×3×1,693 Posts |
As I can't attach to a PM
Ernst, in some interaction between your ||(ism) and my screen, I saw these symbols as "jism."
|
|
|
|
|
|
#162 |
|
"Composite as Heck"
Oct 2017
950₁₀ Posts |
With the latest image the Renegade now performs roughly as you'd expect. It may get slightly more optimised, but this is my last bench of it, honest. It is about 10% quicker than a stock Pi 3B once we get up to size 4096 and beyond.
Image: ROC-RK3328-CC_Ubuntu16.04_Arch64_20180315
GCC: 7.2.0
Code:
17.1
1024 msec/iter = 56.53 ROE[avg,max] = [0.254687500, 0.312500000] radices = 256 8 16 16 0 0 0 0 0 0
1152 msec/iter = 61.35 ROE[avg,max] = [0.221044922, 0.250000000] radices = 288 8 16 16 0 0 0 0 0 0
1280 msec/iter = 68.82 ROE[avg,max] = [0.264508929, 0.343750000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 81.40 ROE[avg,max] = [0.227343750, 0.265625000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 91.96 ROE[avg,max] = [0.267187500, 0.343750000] radices = 48 32 32 16 0 0 0 0 0 0
1664 msec/iter = 98.80 ROE[avg,max] = [0.270758929, 0.312500000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 106.67 ROE[avg,max] = [0.220532663, 0.250000000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 115.22 ROE[avg,max] = [0.257756696, 0.312500000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 123.77 ROE[avg,max] = [0.236921038, 0.281250000] radices = 256 16 16 16 0 0 0 0 0 0
2304 msec/iter = 143.72 ROE[avg,max] = [0.248751395, 0.312500000] radices = 288 16 16 16 0 0 0 0 0 0
2560 msec/iter = 159.73 ROE[avg,max] = [0.236908831, 0.312500000] radices = 160 32 16 16 0 0 0 0 0 0
2816 msec/iter = 186.83 ROE[avg,max] = [0.263392857, 0.312500000] radices = 176 32 16 16 0 0 0 0 0 0
3072 msec/iter = 206.21 ROE[avg,max] = [0.224818638, 0.251953125] radices = 48 32 32 32 0 0 0 0 0 0
3328 msec/iter = 227.02 ROE[avg,max] = [0.281250000, 0.375000000] radices = 208 32 16 16 0 0 0 0 0 0
3584 msec/iter = 245.98 ROE[avg,max] = [0.252343750, 0.312500000] radices = 224 32 16 16 0 0 0 0 0 0
3840 msec/iter = 267.59 ROE[avg,max] = [0.248437500, 0.343750000] radices = 240 32 16 16 0 0 0 0 0 0
4096 msec/iter = 279.72 ROE[avg,max] = [0.295089286, 0.343750000] radices = 128 32 32 16 0 0 0 0 0 0
4608 msec/iter = 324.20 ROE[avg,max] = [0.258928571, 0.312500000] radices = 144 32 32 16 0 0 0 0 0 0
5120 msec/iter = 354.06 ROE[avg,max] = [0.237137277, 0.281250000] radices = 160 32 32 16 0 0 0 0 0 0
5632 msec/iter = 407.10 ROE[avg,max] = [0.256919643, 0.312500000] radices = 176 32 32 16 0 0 0 0 0 0
6144 msec/iter = 457.47 ROE[avg,max] = [0.246651786, 0.281250000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 492.90 ROE[avg,max] = [0.262500000, 0.312500000] radices = 208 32 32 16 0 0 0 0 0 0
7168 msec/iter = 542.15 ROE[avg,max] = [0.224874442, 0.281250000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 592.53 ROE[avg,max] = [0.237053571, 0.281250000] radices = 240 32 32 16 0 0 0 0 0 0
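Timing lines in this layout are regular enough to compare two runs by script, for example two boards or a scalar and a SIMD build. A minimal Python sketch, assuming each timing line starts with the FFT length followed by `msec/iter = <value>`; `parse_cfg` and `speedup` are hypothetical helper names, and the two sample lines are copied from the results posted in this thread (Odroid-U2 and Renegade at FFT length 1024):

```python
import re

# Captures the FFT length and the msec/iter value; the rest of the line is ignored.
TIMING_RE = re.compile(r"^\s*(\d+)\s+msec/iter\s*=\s*([\d.]+)")


def parse_cfg(text):
    """Return {fft_length: msec_per_iter} for every timing line in text."""
    timings = {}
    for line in text.splitlines():
        match = TIMING_RE.match(line)
        if match:  # header lines such as "17.1" do not match and are skipped
            timings[int(match.group(1))] = float(match.group(2))
    return timings


def speedup(baseline, candidate):
    """Per-length ratio baseline/candidate; a value above 1 means candidate is faster."""
    common = sorted(baseline.keys() & candidate.keys())
    return {n: baseline[n] / candidate[n] for n in common}


odroid = parse_cfg(
    "17.1\n"
    "1024 msec/iter = 90.68 ROE[avg,max] = [0.247321429, 0.312500000] radices = 128 16 16 16 0 0 0 0 0 0\n"
)
renegade = parse_cfg(
    "17.1\n"
    "1024 msec/iter = 56.53 ROE[avg,max] = [0.254687500, 0.312500000] radices = 256 8 16 16 0 0 0 0 0 0\n"
)
for n, ratio in speedup(odroid, renegade).items():
    print(f"{n}K: {ratio:.2f}x")
```

Only FFT lengths present in both inputs are compared, so files covering different size ranges can be passed in as they are.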
|
|
|
|
|
|
#163 |
|
I moo ablest echo power!
May 2013
3·619 Posts |
I haven't got a Pi3b+, but contact the person who made the Gentoo image (sakaki). They were very responsive, polite, and helpful.
|
|
|
|
|
|
#164 |
|
"Victor de Hollander"
Aug 2011
the Netherlands
3²×131 Posts |
ARM Cortex-A76 announced
Some key features:
Performance orientated design
OoO (Out of Order)
4-wide (=wider than the previous A57, A72, A73, A75)
Dual 128-bit ASIMD/FP execution pipelines
Increased memory bandwidth / lower latency throughout the caches/memory
https://www.anandtech.com/show/12785...7nm-powerhouse
Last fiddled with by VictordeHolland on 2018-06-14 at 14:56 Reason: removed extra newlines |
|
|
|
|
|
#165 |
|
∂²ω=0
Sep 2002
República de California
2²×2,939 Posts |
Thanks for the link, Victor. I've been hearing about Apple working on a custom high-perf ARM-based chip for future PCs, and I surmised that 256-bit-wide vectors would be a key part of that. Let's just hope that the various 256-bit CPUs are code-compatible.
|
|
|
|