Compiled it on the usual c5d.9xlarge with 18 cores and 36 threads:
gcc -c -O3 -march=skylake-avx512 -DUSE_AVX512 -DUSE_THREADS ../src/*.c >& build.log
grep -i error build.log
[Assuming above grep comes up empty]
gcc -o Mlucas *.o -lm -lpthread -lrt
-DCARRY_16_WAY is not needed in v18, right?
This time using all 18 cores (-cpu 0:17, 2.96 msec/iter) was fastest for some reason.
Code:
18.0
./Mlucas -fftlen 4608 -iters 10000 -nthread 36
4608 msec/iter = 3.24 ROE[avg,max] = [0.246743758, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 34
4608 msec/iter = 3.18 ROE[avg,max] = [0.246743758, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 32
4608 msec/iter = 3.15 ROE[avg,max] = [0.246743758, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 30
4608 msec/iter = 3.07 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 28
4608 msec/iter = 3.03 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 26
4608 msec/iter = 3.08 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:17
4608 msec/iter = 2.96 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:16
4608 msec/iter = 3.12 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:15
4608 msec/iter = 3.09 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:14
4608 msec/iter = 4.05 ROE[avg,max] = [0.246727988, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:13
4608 msec/iter = 4.18 ROE[avg,max] = [0.246727988, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 18:35
4608 msec/iter = 3.00 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:34:2
4608 msec/iter = 4.27 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
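For anyone wanting to repeat the -cpu 0:hi part of the sweep without typing each line, here is a rough sketch. sweep_cmds is a hypothetical helper of mine, not part of Mlucas; it only prints the command lines, using the same -fftlen, -iters and -cpu flags as the runs above, and assumes ./Mlucas is in the current directory when you actually run them.

```shell
# Print (do not run) one Mlucas command line per -cpu 0:hi range,
# counting hi down from max_hi to min_hi.
# sweep_cmds is a hypothetical helper name, not an Mlucas feature.
sweep_cmds() {
    fftlen=$1; iters=$2; max_hi=$3; min_hi=$4
    hi=$max_hi
    while [ "$hi" -ge "$min_hi" ]; do
        echo "./Mlucas -fftlen $fftlen -iters $iters -cpu 0:$hi"
        hi=$((hi - 1))
    done
}

# Same ranges as the 0:17 .. 0:13 runs above.
sweep_cmds 4608 10000 17 13
```

Piping the output to sh (sweep_cmds 4608 10000 17 13 | sh) would run the timings one after another.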
Going by the README.html, should this be -cpu 0:n-1 ?
Quote:
Hyperthreaded x86 CPUs: If Intel, use -cpu 0:n, where n is the number of physical cores on your system
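To spell out the off-by-one I mean: if the lo:hi range is inclusive at both ends (my assumption, but it fits the runs above, where -cpu 18:35 times about the same as -cpu 0:17), then n cores starting at 0 end at n-1. cpu_range below is just a hypothetical helper to show the arithmetic.

```shell
# cpu_range N: print the -cpu argument that covers N CPUs numbered from 0,
# assuming the lo:hi range is inclusive at both ends.
# cpu_range is a hypothetical helper name, not an Mlucas feature.
cpu_range() {
    n=$1
    echo "0:$((n - 1))"
}

cpu_range 18   # prints 0:17
```

On this 18-core box that gives 0:17, whereas 0:18 would take in 19 CPUs.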