#1 |
|
∂²ω=0
Sep 2002
República de California
2DEC₁₆ Posts |
Mlucas v18 has gone live. Use this thread to report bugs, build issues, and for any other related discussion.
Last fiddled with by ewmayer on 2019-03-06 at 21:55 |
|
|
|
|
|
#2 |
|
Jul 2009
Germany
2C2₁₆ Posts |
I have always wanted to try it out, but unfortunately I cannot compile a multi-threaded build because I still use Windows 7 Professional. It would be great if someone could upload an exe file for the AMD K10 architecture.
|
|
|
|
|
|
#3 |
|
"Composite as Heck"
Oct 2017
2×5²×19 Posts |
Must be my birthday :)
|
|
|
|
|
|
#4 |
|
"Composite as Heck"
Oct 2017
2×5²×19 Posts |
It compiles and doesn't segfault on the Samsung S7. Well done on fixing the ARM issues, this is great.
|
|
|
|
|
|
#5 |
|
Jul 2009
Germany
2×353 Posts |
|
|
|
|
|
|
#6 |
|
∂²ω=0
Sep 2002
República de California
2²·2,939 Posts |
|
|
|
|
|
|
#7 | ||
|
"Composite as Heck"
Oct 2017
2·5²·19 Posts |
Quote:
Sounds like your phone has a Snapdragon 415, which is a 28nm 4xA53 + 4xA53 part. It should work, but unfortunately it doesn't come close in efficiency to an S7's 14nm 4xM1 + 4xA53. It should handily beat a Raspberry Pi 3's 40nm chip in both efficiency and throughput, and slot in somewhere behind the 20nm 10-core Helio X25 ( https://www.mersenneforum.org/showpo...8&postcount=83 ).
Attached is the v18 ARM asimd binary from the S7 on the off chance you find it useful. AFAIK you need a rooted phone to run it, and if you have a rooted phone you could easily build Mlucas from source yourself, but there it is.
Quote:
I'll try to create an APK tomorrow. There's a chance it works where v17.1 failed, as that build produced clobber-related error messages like this:
Code:
/home/u18/AndroidStudioProjects/MlucasAPK/app/src/main/cpp/mi64.c:813:19: error: unknown register name 'rax' in asm
: "cc","memory","rax","rbx","rcx","rsi","r10","r11" /* Clobbered registers */\
|
||
|
|
|
|
|
#8 | ||
|
∂²ω=0
Sep 2002
República de California
2²×2,939 Posts |
Quote:
Quote:
|
||
|
|
|
|
|
#9 |
|
Jul 2009
Germany
2×353 Posts |
|
|
|
|
|
|
#10 | |
|
Einyen
Dec 2003
Denmark
2²×863 Posts |
Compiled it on the usual c5d.9xlarge with 18 cores and 36 threads:
gcc -c -O3 -march=skylake-avx512 -DUSE_AVX512 -DUSE_THREADS ../src/*.c >& build.log
grep -i error build.log
[Assuming above grep comes up empty]
gcc -o Mlucas *.o -lm -lpthread -lrt
-DCARRY_16_WAY is not needed in v18, right?
This time using all 18 cores was fastest, for some reason.
Code:
18.0
./Mlucas -fftlen 4608 -iters 10000 -nthread 36
4608 msec/iter = 3.24 ROE[avg,max] = [0.246743758, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 34
4608 msec/iter = 3.18 ROE[avg,max] = [0.246743758, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 32
4608 msec/iter = 3.15 ROE[avg,max] = [0.246743758, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 30
4608 msec/iter = 3.07 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 28
4608 msec/iter = 3.03 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -nthread 26
4608 msec/iter = 3.08 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:17
4608 msec/iter = 2.96 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:16
4608 msec/iter = 3.12 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:15
4608 msec/iter = 3.09 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:14
4608 msec/iter = 4.05 ROE[avg,max] = [0.246727988, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:13
4608 msec/iter = 4.18 ROE[avg,max] = [0.246727988, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 18:35
4608 msec/iter = 3.00 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:34:2
4608 msec/iter = 4.27 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
In README.html, should this be -cpu 0:n-1?
Quote:
Last fiddled with by ATH on 2019-02-22 at 12:07 |
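For anyone who wants to compare timings like the ones above without scanning each line, here is a minimal Python sketch that pulls msec/iter and the 64-bit residue out of output in this format and picks the fastest run. It is not part of Mlucas: the function name parse_runs and the regular expressions are assumptions based only on the lines quoted in this post, and only three of the runs are reproduced as sample input.

```python
import re

# Three records copied from the listing above: each is a command line
# followed by the one-line summary printed for that run.
SAMPLE = """\
./Mlucas -fftlen 4608 -iters 10000 -nthread 36
4608 msec/iter = 3.24 ROE[avg,max] = [0.246743758, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:17
4608 msec/iter = 2.96 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
./Mlucas -fftlen 4608 -iters 10000 -cpu 0:34:2
4608 msec/iter = 4.27 ROE[avg,max] = [0.246740330, 0.312500000] radices = 144 16 32 32 0 0 0 0 0 0 10000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 13BB5C9DDF0CD3D6, 15982066709, 51703797107
"""

# Patterns inferred from the sample lines (an assumption, not a documented format).
TIMING = re.compile(r"msec/iter\s*=\s*([0-9.]+)")
RESIDUE = re.compile(r"Res mod 2\^64, 2\^35-1, 2\^36-1\s*=\s*([0-9A-F]+)")


def parse_runs(text):
    """Return a list of (thread_option, msec_per_iter, res64) tuples."""
    runs = []
    command = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("./Mlucas"):
            command = line  # remember the command until its result line arrives
            continue
        timing = TIMING.search(line)
        if command and timing:
            residue = RESIDUE.search(line)
            # The last two tokens of the command are the thread/core option.
            option = " ".join(command.split()[-2:])
            res64 = residue.group(1) if residue else None
            runs.append((option, float(timing.group(1)), res64))
            command = None
    return runs


runs = parse_runs(SAMPLE)
fastest = min(runs, key=lambda run: run[1])
residues_agree = len({run[2] for run in runs}) == 1

print("fastest:", fastest[0], fastest[1], "msec/iter")
print("all residues agree:", residues_agree)
```

Checking that every run reports the same Res64 is a cheap sanity test that the different thread layouts computed the same thing; the timing comparison only means something if that holds.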
|
|
|
|
|
|
#11 | ||
|
∂²ω=0
Sep 2002
República de California
10110111101100₂ Posts |
Correct - if you open platform.h and search for CARRY_16_WAY you'll see it's now on by default for avx-512 builds.
Quote:
Quote:
From a job-management perspective it is of course easier to just run one job using all the physical cores, and as long as n <= 4 one won't sacrifice much total throughput by doing so. So on both my non-HT Intel quad Haswell and my quad-ARM64-core Odroid C2 I use -cpu 0:3. I do the same on my HT-enabled dual-core Intel Broadwell NUC: there I want 2 threads per physical core, and a single 4-thread job gives me nearly the same throughput as separate jobs using -cpu 0,2 and -cpu 1,3. I need to carefully re-read the README.html page to catch any remaining ,-versus-: mixups, because they are easy to overlook. |
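To make the ,-versus-: distinction concrete, here is a small Python sketch that expands a -cpu argument into the list of core indices it names. It assumes the reading suggested by the examples in this thread: lo:hi is an inclusive range, lo:hi:stride steps through that range, and commas separate individual cores or ranges. expand_cpu_spec is a hypothetical helper, not Mlucas code, and the real option parser may accept or reject forms that this one does not.

```python
def expand_cpu_spec(spec):
    """Expand a -cpu style string into a list of core indices.

    Assumed forms (inferred from the thread, not from the Mlucas source):
      "3"        -> a single core
      "0:3"      -> inclusive range 0,1,2,3
      "0:34:2"   -> inclusive range with a stride
      "0,2"      -> comma-separated list of any of the above
    """
    cores = []
    for part in spec.split(","):
        fields = [int(field) for field in part.split(":")]
        if len(fields) == 1:
            cores.append(fields[0])
        elif len(fields) in (2, 3):
            lo, hi = fields[0], fields[1]
            stride = fields[2] if len(fields) == 3 else 1
            if stride < 1 or hi < lo:
                raise ValueError("bad range in -cpu spec: " + part)
            cores.extend(range(lo, hi + 1, stride))
        else:
            raise ValueError("too many ':' fields in -cpu spec: " + part)
    return cores


for spec in ("0:3", "0,2", "1,3", "0:17", "0:34:2"):
    cores = expand_cpu_spec(spec)
    print(spec, "->", len(cores), "cores:", cores)
```

Under this reading, -cpu 0:3 names four cores while -cpu 0,3 names only two, which is why the mixup matters when setting up jobs.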
||
|
|
|