2021-02-17, 20:31   #11
Lorenzo
 
 
Aug 2010
Republic of Belarus

2·89 Posts

Hello! I just want to share my experience with the Apple M1 CPU.

Mlucas compiled smoothly, without issues (I included the -DUSE_ARM_V8_SIMD flag, as the README page instructs).

Code:
CPU Family = ARM Embedded ABI, OS = OS X, 64-bit Version, compiled with Gnu-C-compatible [llvm/clang], Version 12.0.0 (clang-1200.0.32.29).
INFO: Build uses ARMv8 advanced-SIMD instruction set.
CPU extensions:

Code:
m1@599160f8-fb7f-41df-adc2-2b7f4da1aac7 src % sysctl hw.optional
hw.optional.floatingpoint: 1
hw.optional.watchpoint: 4
hw.optional.breakpoint: 6
hw.optional.neon: 1
hw.optional.neon_hpfp: 1
hw.optional.neon_fp16: 1
hw.optional.armv8_1_atomics: 1
hw.optional.armv8_crc32: 1
hw.optional.armv8_2_fhm: 1
hw.optional.armv8_2_sha512: 1
hw.optional.armv8_2_sha3: 1
hw.optional.amx_version: 2
hw.optional.ucnormal_mem: 1
hw.optional.arm64: 1
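
For anyone who wants to check these flags from a script before building with -DUSE_ARM_V8_SIMD, here is a minimal sketch that parses `sysctl hw.optional` output into a dictionary. The function names and the sample text are my own illustration, and treating `hw.optional.neon: 1` as "advanced SIMD is available" is an assumption based on the listing above, not something taken from the Mlucas README.

```python
import subprocess

def parse_sysctl(text):
    """Parse 'key: value' lines, as printed by `sysctl hw.optional`, into a dict."""
    flags = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip shell prompts and anything that is not an hw.optional entry.
        if sep and key.strip().startswith("hw.optional."):
            flags[key.strip()] = value.strip()
    return flags

def read_live_flags():
    """Query the running system (macOS only); hypothetical helper, not called here."""
    out = subprocess.run(["sysctl", "hw.optional"],
                         capture_output=True, text=True, check=True)
    return parse_sysctl(out.stdout)

# A few lines copied from the listing above.
SAMPLE = """hw.optional.neon: 1
hw.optional.armv8_crc32: 1
hw.optional.amx_version: 2
hw.optional.arm64: 1"""

flags = parse_sysctl(SAMPLE)
has_neon = flags.get("hw.optional.neon") == "1"
print("NEON reported:", has_neon)
```

On an actual Mac, `read_live_flags()` could replace the pasted sample; values are kept as strings because some entries (such as `amx_version`) are not simple 0/1 switches.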
./Mlucas -s m -cpu 0:7
Code:
19.1
      2048  msec/iter =    3.32  ROE[avg,max] = [0.215347133, 0.312500000]  radices =  32 32 32 32  0  0  0  0  0  0
      2304  msec/iter =    4.00  ROE[avg,max] = [0.193772149, 0.250000000]  radices = 144 32 16 16  0  0  0  0  0  0
      2560  msec/iter =    4.28  ROE[avg,max] = [0.178074945, 0.234375000]  radices = 160 32 16 16  0  0  0  0  0  0
      2816  msec/iter =    4.98  ROE[avg,max] = [0.194841334, 0.281250000]  radices = 176 32 16 16  0  0  0  0  0  0
      3072  msec/iter =    5.27  ROE[avg,max] = [0.208759866, 0.312500000]  radices =  48 32 32 32  0  0  0  0  0  0
      3328  msec/iter =    5.94  ROE[avg,max] = [0.324307345, 0.406250000]  radices = 208 32 16 16  0  0  0  0  0  0
      3584  msec/iter =    6.01  ROE[avg,max] = [0.198822084, 0.250000000]  radices =  56 32 32 32  0  0  0  0  0  0
      3840  msec/iter =    6.54  ROE[avg,max] = [0.187369624, 0.250000000]  radices =  60 32 32 32  0  0  0  0  0  0
      4096  msec/iter =    6.88  ROE[avg,max] = [0.176231022, 0.218750000]  radices =  64 32 32 32  0  0  0  0  0  0
      4608  msec/iter =    7.91  ROE[avg,max] = [0.206297821, 0.281250000]  radices = 288 32 16 16  0  0  0  0  0  0
      5120  msec/iter =    8.42  ROE[avg,max] = [0.193601628, 0.250000000]  radices = 160 16 32 32  0  0  0  0  0  0
      5632  msec/iter =    9.70  ROE[avg,max] = [0.221504510, 0.281250000]  radices = 352 32 16 16  0  0  0  0  0  0
      6144  msec/iter =   10.67  ROE[avg,max] = [0.183728153, 0.250000000]  radices = 192 16 32 32  0  0  0  0  0  0
      6656  msec/iter =   11.75  ROE[avg,max] = [0.176554163, 0.218750000]  radices = 208 16 32 32  0  0  0  0  0  0
      7168  msec/iter =   11.84  ROE[avg,max] = [0.213558111, 0.312500000]  radices = 224 16 32 32  0  0  0  0  0  0
      7680  msec/iter =   13.14  ROE[avg,max] = [0.211455481, 0.281250000]  radices = 240 16 32 32  0  0  0  0  0  0
      8192  msec/iter =   13.50  ROE[avg,max] = [0.243920143, 0.312500000]  radices = 256 16 32 32  0  0  0  0  0  0
      9216  msec/iter =   15.44  ROE[avg,max] = [0.256431218, 0.343750000]  radices = 288 16 32 32  0  0  0  0  0  0
     10240  msec/iter =   16.75  ROE[avg,max] = [0.293991624, 0.375000000]  radices = 160 32 32 32  0  0  0  0  0  0
     11264  msec/iter =   18.68  ROE[avg,max] = [0.222417407, 0.281250000]  radices = 352 16 32 32  0  0  0  0  0  0
     12288  msec/iter =   21.39  ROE[avg,max] = [0.219849010, 0.281250000]  radices = 192 32 32 32  0  0  0  0  0  0
     13312  msec/iter =   23.73  ROE[avg,max] = [0.258116543, 0.312500000]  radices = 208 32 32 32  0  0  0  0  0  0
     14336  msec/iter =   23.98  ROE[avg,max] = [0.231325382, 0.281250000]  radices = 224 32 32 32  0  0  0  0  0  0
     15360  msec/iter =   26.76  ROE[avg,max] = [0.235138002, 0.281250000]  radices = 240 32 32 32  0  0  0  0  0  0
     16384  msec/iter =   26.98  ROE[avg,max] = [0.230396011, 0.312500000]  radices = 256 32 32 32  0  0  0  0  0  0
     18432  msec/iter =   31.07  ROE[avg,max] = [0.276530284, 0.375000000]  radices = 288 32 32 32  0  0  0  0  0  0
     20480  msec/iter =   35.91  ROE[avg,max] = [0.229381947, 0.312500000]  radices = 320 32 32 32  0  0  0  0  0  0
     22528  msec/iter =   37.85  ROE[avg,max] = [0.235262715, 0.296875000]  radices = 352 32 32 32  0  0  0  0  0  0
     24576  msec/iter =   42.70  ROE[avg,max] = [0.238062530, 0.375000000]  radices = 768 16 32 32  0  0  0  0  0  0
     26624  msec/iter =   60.50  ROE[avg,max] = [0.254043170, 0.312500000]  radices = 208 16 16 16 16  0  0  0  0  0
./Mlucas -s m -cpu 0:3
It looks like the heavily loaded threads are automatically assigned to the faster cores.
Code:
19.1
      2048  msec/iter =    3.88  ROE[avg,max] = [0.215133698, 0.312500000]  radices =  32 32 32 32  0  0  0  0  0  0
      2304  msec/iter =    4.84  ROE[avg,max] = [0.194502305, 0.281250000]  radices = 144 32 16 16  0  0  0  0  0  0
      2560  msec/iter =    5.00  ROE[avg,max] = [0.184244498, 0.250000000]  radices =  40 32 32 32  0  0  0  0  0  0
      2816  msec/iter =    6.03  ROE[avg,max] = [0.193770639, 0.250000000]  radices = 176 32 16 16  0  0  0  0  0  0
      3072  msec/iter =    6.17  ROE[avg,max] = [0.209568299, 0.281250000]  radices =  48 32 32 32  0  0  0  0  0  0
      3328  msec/iter =    7.15  ROE[avg,max] = [0.221850838, 0.281250000]  radices =  52 32 32 32  0  0  0  0  0  0
      3584  msec/iter =    7.12  ROE[avg,max] = [0.199199621, 0.281250000]  radices =  56 32 32 32  0  0  0  0  0  0
      3840  msec/iter =    7.90  ROE[avg,max] = [0.187449630, 0.250000000]  radices =  60 32 32 32  0  0  0  0  0  0
      4096  msec/iter =    8.21  ROE[avg,max] = [0.174905238, 0.218750000]  radices =  64 32 32 32  0  0  0  0  0  0
      4608  msec/iter =    9.57  ROE[avg,max] = [0.205330823, 0.281250000]  radices = 288 32 16 16  0  0  0  0  0  0
      5120  msec/iter =   10.01  ROE[avg,max] = [0.193377434, 0.250000000]  radices = 160 16 32 32  0  0  0  0  0  0
      5632  msec/iter =   11.74  ROE[avg,max] = [0.221915271, 0.281250000]  radices = 352 32 16 16  0  0  0  0  0  0
      6144  msec/iter =   12.89  ROE[avg,max] = [0.183260259, 0.250000000]  radices = 192 16 32 32  0  0  0  0  0  0
      6656  msec/iter =   14.32  ROE[avg,max] = [0.176914974, 0.250000000]  radices = 208 16 32 32  0  0  0  0  0  0
      7168  msec/iter =   14.40  ROE[avg,max] = [0.213720200, 0.281250000]  radices = 224 16 32 32  0  0  0  0  0  0
      7680  msec/iter =   16.16  ROE[avg,max] = [0.211763551, 0.281250000]  radices = 240 16 32 32  0  0  0  0  0  0
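Performance looks awesome for a mobile CPU. For comparison, here are the timings with AVX2 on an i3-8100 (4 cores). The M1 is much faster: at FFT length 4096, for example, it takes 8.21 msec/iter with -cpu 0:3 versus 10.92 msec/iter on the i3.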

AVX2 on i3-8100:
Code:
19.1
      2048  msec/iter =    4.75  ROE[avg,max] = [0.167383863, 0.218750000]  radices = 128 16 16 32  0  0  0  0  0  0
      2304  msec/iter =    5.44  ROE[avg,max] = [0.182823637, 0.218750000]  radices = 144 16 16 32  0  0  0  0  0  0
      2560  msec/iter =    6.29  ROE[avg,max] = [0.224905364, 0.281250000]  radices = 160 16 16 32  0  0  0  0  0  0
      2816  msec/iter =    6.63  ROE[avg,max] = [0.183906382, 0.230468750]  radices = 176 16 16 32  0  0  0  0  0  0
      3072  msec/iter =    7.42  ROE[avg,max] = [0.252202803, 0.312500000]  radices = 192 16 16 32  0  0  0  0  0  0
      3328  msec/iter =    7.52  ROE[avg,max] = [0.225825548, 0.281250000]  radices = 208 16 16 32  0  0  0  0  0  0
      3584  msec/iter =    8.12  ROE[avg,max] = [0.260567010, 0.375000000]  radices = 224 16 16 32  0  0  0  0  0  0
      3840  msec/iter =    9.15  ROE[avg,max] = [0.200714048, 0.281250000]  radices = 240 16 16 32  0  0  0  0  0  0
      4096  msec/iter =   10.92  ROE[avg,max] = [0.165220469, 0.218750000]  radices =  64 32 32 32  0  0  0  0  0  0
      4608  msec/iter =   11.15  ROE[avg,max] = [0.192892739, 0.250000000]  radices = 288 16 16 32  0  0  0  0  0  0
      5120  msec/iter =   12.18  ROE[avg,max] = [0.229244523, 0.312500000]  radices = 160 32 32 16  0  0  0  0  0  0
      5632  msec/iter =   13.47  ROE[avg,max] = [0.187610146, 0.250000000]  radices = 352 16 16 32  0  0  0  0  0  0
      6144  msec/iter =   16.09  ROE[avg,max] = [0.209471649, 0.281250000]  radices = 192 32 32 16  0  0  0  0  0  0
      6656  msec/iter =   16.86  ROE[avg,max] = [0.196862667, 0.250000000]  radices = 208 16 32 32  0  0  0  0  0  0
      7168  msec/iter =   17.38  ROE[avg,max] = [0.196444104, 0.250000000]  radices = 224 32 32 16  0  0  0  0  0  0
      7680  msec/iter =   23.23  ROE[avg,max] = [0.239954494, 0.343750000]  radices = 240 32 32 16  0  0  0  0  0  0
      8192  msec/iter =   19.79  ROE[avg,max] = [0.272732764, 0.375000000]  radices = 256 32 32 16  0  0  0  0  0  0
      9216  msec/iter =   23.01  ROE[avg,max] = [0.242732915, 0.281250000]  radices = 288 32 32 16  0  0  0  0  0  0
     10240  msec/iter =   27.24  ROE[avg,max] = [0.271287049, 0.375000000]  radices = 320 32 32 16  0  0  0  0  0  0
     11264  msec/iter =   28.87  ROE[avg,max] = [0.271818621, 0.375000000]  radices = 352 32 32 16  0  0  0  0  0  0
     12288  msec/iter =   32.04  ROE[avg,max] = [0.259570478, 0.312500000]  radices = 768 16 16 32  0  0  0  0  0  0
     13312  msec/iter =   37.85  ROE[avg,max] = [0.254703482, 0.312500000]  radices = 208 32 32 32  0  0  0  0  0  0
     14336  msec/iter =   40.34  ROE[avg,max] = [0.234003331, 0.296875000]  radices = 224 32 32 32  0  0  0  0  0  0
     15360  msec/iter =   43.84  ROE[avg,max] = [0.245504855, 0.312500000]  radices = 960 16 16 32  0  0  0  0  0  0
     16384  msec/iter =   45.62  ROE[avg,max] = [0.272600878, 0.375000000]  radices = 256 32 32 32  0  0  0  0  0  0
     18432  msec/iter =   53.16  ROE[avg,max] = [0.236424995, 0.281250000]  radices = 288 32 32 32  0  0  0  0  0  0
     20480  msec/iter =   62.92  ROE[avg,max] = [0.237479031, 0.312500000]  radices = 320 32 32 32  0  0  0  0  0  0
     22528  msec/iter =   66.03  ROE[avg,max] = [0.228240432, 0.312500000]  radices = 352 32 32 32  0  0  0  0  0  0
     24576  msec/iter =   69.49  ROE[avg,max] = [0.261424145, 0.343750000]  radices = 768 16 32 32  0  0  0  0  0  0
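
To put numbers on the comparison, the timing lines can be parsed and divided row by row. This is only a rough sketch: the regular expression is written against the line format shown in the tables above, the function and variable names are made up for illustration, and only a few rows are pasted in as sample data.

```python
import re

# Matches the start of one timing line:
#   "<FFT length>  msec/iter = <time>  ROE[avg,max] = ..."
LINE_RE = re.compile(r"^\s*(\d+)\s+msec/iter\s*=\s*([\d.]+)")

def parse_timings(text):
    """Return {fft_length: msec_per_iter} for every timing line in text."""
    timings = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:  # lines such as the "19.1" version header are skipped
            timings[int(m.group(1))] = float(m.group(2))
    return timings

def speedups(baseline, candidate):
    """baseline/candidate time at each shared FFT length (>1: candidate is faster)."""
    shared = sorted(baseline.keys() & candidate.keys())
    return {n: baseline[n] / candidate[n] for n in shared}

# Rows copied from the tables above (M1 with -cpu 0:3, and the i3-8100).
M1_4T = """19.1
      2048  msec/iter =    3.88  ROE[avg,max] = [0.215133698, 0.312500000]  radices =  32 32 32 32  0  0  0  0  0  0
      4096  msec/iter =    8.21  ROE[avg,max] = [0.174905238, 0.218750000]  radices =  64 32 32 32  0  0  0  0  0  0
      7680  msec/iter =   16.16  ROE[avg,max] = [0.211763551, 0.281250000]  radices = 240 16 32 32  0  0  0  0  0  0"""
I3 = """19.1
      2048  msec/iter =    4.75  ROE[avg,max] = [0.167383863, 0.218750000]  radices = 128 16 16 32  0  0  0  0  0  0
      4096  msec/iter =   10.92  ROE[avg,max] = [0.165220469, 0.218750000]  radices =  64 32 32 32  0  0  0  0  0  0
      7680  msec/iter =   23.23  ROE[avg,max] = [0.239954494, 0.343750000]  radices = 240 32 32 16  0  0  0  0  0  0"""

ratios = speedups(parse_timings(I3), parse_timings(M1_4T))
for fft_len, ratio in ratios.items():
    print(f"{fft_len:6d}  i3 time / M1 time = {ratio:.2f}")
```

For these three rows the M1 on four threads comes out roughly 1.2x to 1.4x faster than the i3-8100. Pasting the full tables in place of the sample strings would give the ratio at every FFT length the two runs share.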
Looking forward to their desktop M1X. Good job.