2021-02-17, 20:31   #11
Lorenzo
Aug 2010, Republic of Belarus, 2·89 Posts

Hello! I just want to share my experience with the Apple M1 CPU. It compiled smoothly without issues (I included the -DUSE_ARM_V8_SIMD flag, as described on the README page).

Code:
    CPU Family = ARM Embedded ABI, OS = OS X, 64-bit Version, compiled with Gnu-C-compatible [llvm/clang], Version 12.0.0 (clang-1200.0.32.29).
    INFO: Build uses ARMv8 advanced-SIMD instruction set.

CPU extensions:

Code:
    m1@599160f8-fb7f-41df-adc2-2b7f4da1aac7 src % sysctl hw.optional
    hw.optional.floatingpoint: 1
    hw.optional.watchpoint: 4
    hw.optional.breakpoint: 6
    hw.optional.neon: 1
    hw.optional.neon_hpfp: 1
    hw.optional.neon_fp16: 1
    hw.optional.armv8_1_atomics: 1
    hw.optional.armv8_crc32: 1
    hw.optional.armv8_2_fhm: 1
    hw.optional.armv8_2_sha512: 1
    hw.optional.armv8_2_sha3: 1
    hw.optional.amx_version: 2
    hw.optional.ucnormal_mem: 1
    hw.optional.arm64: 1

./Mlucas -s m -cpu 0:7

Code:
    19.1
    2048 msec/iter = 3.32 ROE[avg,max] = [0.215347133, 0.312500000] radices = 32 32 32 32 0 0 0 0 0 0
    2304 msec/iter = 4.00 ROE[avg,max] = [0.193772149, 0.250000000] radices = 144 32 16 16 0 0 0 0 0 0
    2560 msec/iter = 4.28 ROE[avg,max] = [0.178074945, 0.234375000] radices = 160 32 16 16 0 0 0 0 0 0
    2816 msec/iter = 4.98 ROE[avg,max] = [0.194841334, 0.281250000] radices = 176 32 16 16 0 0 0 0 0 0
    3072 msec/iter = 5.27 ROE[avg,max] = [0.208759866, 0.312500000] radices = 48 32 32 32 0 0 0 0 0 0
    3328 msec/iter = 5.94 ROE[avg,max] = [0.324307345, 0.406250000] radices = 208 32 16 16 0 0 0 0 0 0
    3584 msec/iter = 6.01 ROE[avg,max] = [0.198822084, 0.250000000] radices = 56 32 32 32 0 0 0 0 0 0
    3840 msec/iter = 6.54 ROE[avg,max] = [0.187369624, 0.250000000] radices = 60 32 32 32 0 0 0 0 0 0
    4096 msec/iter = 6.88 ROE[avg,max] = [0.176231022, 0.218750000] radices = 64 32 32 32 0 0 0 0 0 0
    4608 msec/iter = 7.91 ROE[avg,max] = [0.206297821, 0.281250000] radices = 288 32 16 16 0 0 0 0 0 0
    5120 msec/iter = 8.42 ROE[avg,max] = [0.193601628, 0.250000000] radices = 160 16 32 32 0 0 0 0 0 0
    5632 msec/iter = 9.70 ROE[avg,max] = [0.221504510, 0.281250000] radices = 352 32 16 16 0 0 0 0 0 0
    6144 msec/iter = 10.67 ROE[avg,max] = [0.183728153, 0.250000000] radices = 192 16 32 32 0 0 0 0 0 0
    6656 msec/iter = 11.75 ROE[avg,max] = [0.176554163, 0.218750000] radices = 208 16 32 32 0 0 0 0 0 0
    7168 msec/iter = 11.84 ROE[avg,max] = [0.213558111, 0.312500000] radices = 224 16 32 32 0 0 0 0 0 0
    7680 msec/iter = 13.14 ROE[avg,max] = [0.211455481, 0.281250000] radices = 240 16 32 32 0 0 0 0 0 0
    8192 msec/iter = 13.50 ROE[avg,max] = [0.243920143, 0.312500000] radices = 256 16 32 32 0 0 0 0 0 0
    9216 msec/iter = 15.44 ROE[avg,max] = [0.256431218, 0.343750000] radices = 288 16 32 32 0 0 0 0 0 0
    10240 msec/iter = 16.75 ROE[avg,max] = [0.293991624, 0.375000000] radices = 160 32 32 32 0 0 0 0 0 0
    11264 msec/iter = 18.68 ROE[avg,max] = [0.222417407, 0.281250000] radices = 352 16 32 32 0 0 0 0 0 0
    12288 msec/iter = 21.39 ROE[avg,max] = [0.219849010, 0.281250000] radices = 192 32 32 32 0 0 0 0 0 0
    13312 msec/iter = 23.73 ROE[avg,max] = [0.258116543, 0.312500000] radices = 208 32 32 32 0 0 0 0 0 0
    14336 msec/iter = 23.98 ROE[avg,max] = [0.231325382, 0.281250000] radices = 224 32 32 32 0 0 0 0 0 0
    15360 msec/iter = 26.76 ROE[avg,max] = [0.235138002, 0.281250000] radices = 240 32 32 32 0 0 0 0 0 0
    16384 msec/iter = 26.98 ROE[avg,max] = [0.230396011, 0.312500000] radices = 256 32 32 32 0 0 0 0 0 0
    18432 msec/iter = 31.07 ROE[avg,max] = [0.276530284, 0.375000000] radices = 288 32 32 32 0 0 0 0 0 0
    20480 msec/iter = 35.91 ROE[avg,max] = [0.229381947, 0.312500000] radices = 320 32 32 32 0 0 0 0 0 0
    22528 msec/iter = 37.85 ROE[avg,max] = [0.235262715, 0.296875000] radices = 352 32 32 32 0 0 0 0 0 0
    24576 msec/iter = 42.70 ROE[avg,max] = [0.238062530, 0.375000000] radices = 768 16 32 32 0 0 0 0 0 0
    26624 msec/iter = 60.50 ROE[avg,max] = [0.254043170, 0.312500000] radices = 208 16 16 16 16 0 0 0 0 0

./Mlucas -s m -cpu 0:3

It looks like the heavily loaded threads are assigned automatically to the faster cores.

Code:
    19.1
    2048 msec/iter = 3.88 ROE[avg,max] = [0.215133698, 0.312500000] radices = 32 32 32 32 0 0 0 0 0 0
    2304 msec/iter = 4.84 ROE[avg,max] = [0.194502305, 0.281250000] radices = 144 32 16 16 0 0 0 0 0 0
    2560 msec/iter = 5.00 ROE[avg,max] = [0.184244498, 0.250000000] radices = 40 32 32 32 0 0 0 0 0 0
    2816 msec/iter = 6.03 ROE[avg,max] = [0.193770639, 0.250000000] radices = 176 32 16 16 0 0 0 0 0 0
    3072 msec/iter = 6.17 ROE[avg,max] = [0.209568299, 0.281250000] radices = 48 32 32 32 0 0 0 0 0 0
    3328 msec/iter = 7.15 ROE[avg,max] = [0.221850838, 0.281250000] radices = 52 32 32 32 0 0 0 0 0 0
    3584 msec/iter = 7.12 ROE[avg,max] = [0.199199621, 0.281250000] radices = 56 32 32 32 0 0 0 0 0 0
    3840 msec/iter = 7.90 ROE[avg,max] = [0.187449630, 0.250000000] radices = 60 32 32 32 0 0 0 0 0 0
    4096 msec/iter = 8.21 ROE[avg,max] = [0.174905238, 0.218750000] radices = 64 32 32 32 0 0 0 0 0 0
    4608 msec/iter = 9.57 ROE[avg,max] = [0.205330823, 0.281250000] radices = 288 32 16 16 0 0 0 0 0 0
    5120 msec/iter = 10.01 ROE[avg,max] = [0.193377434, 0.250000000] radices = 160 16 32 32 0 0 0 0 0 0
    5632 msec/iter = 11.74 ROE[avg,max] = [0.221915271, 0.281250000] radices = 352 32 16 16 0 0 0 0 0 0
    6144 msec/iter = 12.89 ROE[avg,max] = [0.183260259, 0.250000000] radices = 192 16 32 32 0 0 0 0 0 0
    6656 msec/iter = 14.32 ROE[avg,max] = [0.176914974, 0.250000000] radices = 208 16 32 32 0 0 0 0 0 0
    7168 msec/iter = 14.40 ROE[avg,max] = [0.213720200, 0.281250000] radices = 224 16 32 32 0 0 0 0 0 0
    7680 msec/iter = 16.16 ROE[avg,max] = [0.211763551, 0.281250000] radices = 240 16 32 32 0 0 0 0 0 0

Performance looks awesome for a mobile CPU. Just to compare the timings with AVX2 on an i3-8100 (4 cores): the M1 is much faster. For example, at FFT length 4096K the M1 takes 6.88 msec/iter with -cpu 0:7 and 8.21 msec/iter with -cpu 0:3, versus 10.92 msec/iter on the i3-8100.

AVX2 on i3-8100:

Code:
    19.1
    2048 msec/iter = 4.75 ROE[avg,max] = [0.167383863, 0.218750000] radices = 128 16 16 32 0 0 0 0 0 0
    2304 msec/iter = 5.44 ROE[avg,max] = [0.182823637, 0.218750000] radices = 144 16 16 32 0 0 0 0 0 0
    2560 msec/iter = 6.29 ROE[avg,max] = [0.224905364, 0.281250000] radices = 160 16 16 32 0 0 0 0 0 0
    2816 msec/iter = 6.63 ROE[avg,max] = [0.183906382, 0.230468750] radices = 176 16 16 32 0 0 0 0 0 0
    3072 msec/iter = 7.42 ROE[avg,max] = [0.252202803, 0.312500000] radices = 192 16 16 32 0 0 0 0 0 0
    3328 msec/iter = 7.52 ROE[avg,max] = [0.225825548, 0.281250000] radices = 208 16 16 32 0 0 0 0 0 0
    3584 msec/iter = 8.12 ROE[avg,max] = [0.260567010, 0.375000000] radices = 224 16 16 32 0 0 0 0 0 0
    3840 msec/iter = 9.15 ROE[avg,max] = [0.200714048, 0.281250000] radices = 240 16 16 32 0 0 0 0 0 0
    4096 msec/iter = 10.92 ROE[avg,max] = [0.165220469, 0.218750000] radices = 64 32 32 32 0 0 0 0 0 0
    4608 msec/iter = 11.15 ROE[avg,max] = [0.192892739, 0.250000000] radices = 288 16 16 32 0 0 0 0 0 0
    5120 msec/iter = 12.18 ROE[avg,max] = [0.229244523, 0.312500000] radices = 160 32 32 16 0 0 0 0 0 0
    5632 msec/iter = 13.47 ROE[avg,max] = [0.187610146, 0.250000000] radices = 352 16 16 32 0 0 0 0 0 0
    6144 msec/iter = 16.09 ROE[avg,max] = [0.209471649, 0.281250000] radices = 192 32 32 16 0 0 0 0 0 0
    6656 msec/iter = 16.86 ROE[avg,max] = [0.196862667, 0.250000000] radices = 208 16 32 32 0 0 0 0 0 0
    7168 msec/iter = 17.38 ROE[avg,max] = [0.196444104, 0.250000000] radices = 224 32 32 16 0 0 0 0 0 0
    7680 msec/iter = 23.23 ROE[avg,max] = [0.239954494, 0.343750000] radices = 240 32 32 16 0 0 0 0 0 0
    8192 msec/iter = 19.79 ROE[avg,max] = [0.272732764, 0.375000000] radices = 256 32 32 16 0 0 0 0 0 0
    9216 msec/iter = 23.01 ROE[avg,max] = [0.242732915, 0.281250000] radices = 288 32 32 16 0 0 0 0 0 0
    10240 msec/iter = 27.24 ROE[avg,max] = [0.271287049, 0.375000000] radices = 320 32 32 16 0 0 0 0 0 0
    11264 msec/iter = 28.87 ROE[avg,max] = [0.271818621, 0.375000000] radices = 352 32 32 16 0 0 0 0 0 0
    12288 msec/iter = 32.04 ROE[avg,max] = [0.259570478, 0.312500000] radices = 768 16 16 32 0 0 0 0 0 0
    13312 msec/iter = 37.85 ROE[avg,max] = [0.254703482, 0.312500000] radices = 208 32 32 32 0 0 0 0 0 0
    14336 msec/iter = 40.34 ROE[avg,max] = [0.234003331, 0.296875000] radices = 224 32 32 32 0 0 0 0 0 0
    15360 msec/iter = 43.84 ROE[avg,max] = [0.245504855, 0.312500000] radices = 960 16 16 32 0 0 0 0 0 0
    16384 msec/iter = 45.62 ROE[avg,max] = [0.272600878, 0.375000000] radices = 256 32 32 32 0 0 0 0 0 0
    18432 msec/iter = 53.16 ROE[avg,max] = [0.236424995, 0.281250000] radices = 288 32 32 32 0 0 0 0 0 0
    20480 msec/iter = 62.92 ROE[avg,max] = [0.237479031, 0.312500000] radices = 320 32 32 32 0 0 0 0 0 0
    22528 msec/iter = 66.03 ROE[avg,max] = [0.228240432, 0.312500000] radices = 352 32 32 32 0 0 0 0 0 0
    24576 msec/iter = 69.49 ROE[avg,max] = [0.261424145, 0.343750000] radices = 768 16 32 32 0 0 0 0 0 0

Looking forward to their desktop M1X. Good job.
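
To put a number on the M1 vs. i3-8100 comparison, here is a minimal Python sketch that parses benchmark rows of the form shown above and computes the per-FFT-length speedup. The helper names (`parse_timings`, `speedups`) and the sample constants are hypothetical and not part of Mlucas. Only four rows per machine are copied in for brevity, and the rows are shortened to the timing field; the full rows parse the same way.

```python
import re

# A few rows copied from the tables in this post (first column -> msec/iter).
# M1 rows are from the "-cpu 0:7" run; i3-8100 rows are from the AVX2 run.
M1_8_THREADS = """
2048 msec/iter = 3.32
4096 msec/iter = 6.88
8192 msec/iter = 13.50
16384 msec/iter = 26.98
"""

I3_8100 = """
2048 msec/iter = 4.75
4096 msec/iter = 10.92
8192 msec/iter = 19.79
16384 msec/iter = 45.62
"""

# Matches the leading "<length> msec/iter = <time>" part of a benchmark row;
# anything after the timing (ROE, radices) is ignored.
ROW = re.compile(r"^\s*(\d+)\s+msec/iter\s*=\s*([\d.]+)")


def parse_timings(text):
    """Return {fft_length: msec_per_iter} for every benchmark row in text."""
    timings = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            timings[int(match.group(1))] = float(match.group(2))
    return timings


def speedups(fast, slow):
    """Return {fft_length: slow/fast} for lengths present in both runs."""
    common = sorted(fast.keys() & slow.keys())
    return {n: slow[n] / fast[n] for n in common}


if __name__ == "__main__":
    ratios = speedups(parse_timings(M1_8_THREADS), parse_timings(I3_8100))
    for length, ratio in ratios.items():
        print(f"{length:>6}: M1 is {ratio:.2f}x faster")
```

For these four rows the ratio comes out at roughly 1.4x to 1.7x in favor of the M1. Pasting the complete tables into the two strings, or reading them from the respective mlucas.cfg files, would give the same comparison for every FFT length the two runs share.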