All-integer Mersenne prime checker for ARM
[url]https://github.com/ncw/iprime[/url]
|
How does the performance compare to mlucas?
|
[QUOTE=SELROC;491983][url]https://github.com/ncw/iprime[/url][/QUOTE]
One should ask Nick whether he used ARMv8 vectorized (SIMD) instructions... BTW, Ernst Mayer has ported his multithreaded, asm-aware code to ARM as well. It works like a charm on Raspberry Pis, and Ernst is working on implementing PRP testing as well... |
[QUOTE=ET_;491985]One should ask Nick if he used armV8 vectorized instructions...[/quote]
That's ARMv5T code (for StrongARM as far as I could see). So 32-bit and no SIMD. OTOH I expect the code to be very carefully tuned, Nick is an excellent programmer :smile: |
Look at the dates, the files are mostly from 5 years ago, a few from 3 years ago.
Seems to be abandonware, unfortunately. |
[QUOTE=GP2;492010]Look at the dates, the files are mostly from 5 years ago, a few from 3 years ago.
Seems to be abandonware, unfortunately.[/QUOTE]
Dec 2015 for iprime is more recent than the last announced release (not prerelease or beta) of:
CUDALucas 2.05.1: Feb 2015
CUDAPm1 0.20: Nov 2013
mfaktc 0.21: Oct 2014
mfakto 0.14: Nov 2014
gpulucas 0.9.4: Feb 2012
and nearly as recent as clLucas (Jan 2016). Some of these are in heavy current use, because they adequately fulfill their function. Unfortunately, iprime does not seem to have save files implemented, per the road map in the README.md. |
It would actually be useful to compare timings between Nick's non-SIMD integer code and Mlucas on ARMv8. If someone has the bandwidth to test them both on the same system, I would be appreciative.
|
[QUOTE=ewmayer;492424]It would actually be useful to compare timings between Nick's non-SIMD integer code and Mlucas on ARMv8. If someone has the bandwidth to test them both on the same system, I would be appreciative.[/QUOTE]
1) Would an RPi Model 3B+ suffice? 2) What timings do you need/want? |
[QUOTE=wombatman;492429]1) Would an RPi Model 3B+ suffice?
2) What timings do you need/want?[/QUOTE]
That's an ARMv8 CPU, yes? If so, build Mlucas (as per the online [url=http://www.mersenneforum.org/mayer/README.html]readme page[/url]) for v8/SIMD, then:
1. If the integer checker supports multithreading, run the Mlucas self-tests on all 4 cores via './Mlucas -s m -iters 100 -cpu 0:3'. Afterward, have a look at the mlucas.cfg file and run the integer code at the FFT lengths nearest to those listed there, in whatever short-length timing mode it supports.
2. If the integer checker does not support multithreading, do as in [1], but run the Mlucas self-tests on just 1 core via './Mlucas -s m -iters 100'.
Since the two codes will likely permit appreciably different maximum exponents at any given transform length, the timing comparisons will need to be interpreted in that light, i.e. in "timing for comparable exponent" fashion. Thanks! |
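The "nearest FFT length" step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the line shape is taken from the single "1024 msec/iter = 159.98" entry quoted in this thread (FFT length in K, then the per-iteration time), the integer checker is assumed to use power-of-two transform sizes as its "fft size 2**20" output suggests, and `parse_cfg` and `nearest_pow2_log` are hypothetical helper names, not part of Mlucas or iprime.

```python
import re

# Assumed line shape: FFT length in K (units of 1024), then "msec/iter = <time>".
# Anything after the timing value is ignored.
CFG_LINE = re.compile(r"^\s*(\d+)\s+msec/iter\s*=\s*([0-9.]+)")


def parse_cfg(text):
    """Return {fft_length_in_K: msec_per_iter} for every matching line."""
    timings = {}
    for line in text.splitlines():
        m = CFG_LINE.match(line)
        if m:
            timings[int(m.group(1))] = float(m.group(2))
    return timings


def nearest_pow2_log(fft_k):
    """Return k such that 2**k is the power of two closest to fft_k * 1024.

    Ties go to the smaller size.
    """
    n = fft_k * 1024
    lo = n.bit_length() - 1  # floor(log2(n))
    below = n - 2 ** lo
    above = 2 ** (lo + 1) - n
    return lo if below <= above else lo + 1


if __name__ == "__main__":
    # Only the one entry reported in this thread is used as sample input.
    sample = "   1024  msec/iter =  159.98\n"
    for fft_k, msec in parse_cfg(sample).items():
        print(f"Mlucas {fft_k}K: {msec} msec/iter -> integer FFT size 2**{nearest_pow2_log(fft_k)}")
```

Mlucas also offers non-power-of-two lengths, so for those the mapping only picks the closest power of two; the exponent each code can handle at that size still has to be checked separately.
|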
Looks like the iprime code is single-threaded, so I ran mlucas single-thread as well. I only did one test of iprime, and here's why:
iprime:
Testing 2**20000047-1 with fft size 2**20 for 100 iterations
Residue 0xDD61B3E031F1E0BA
That took 4m23.108015142s for 100 iterations which is 2.631080151s per iteration
mlucas:
1024 msec/iter = 159.98 (used the exponent here for the iprime test)
So mlucas is well over 10x faster than iprime (about 16x: 2631 ms vs. 160 ms per iteration) for the same FFT size and exponent. And iprime crashed trying to run p = 49005071. |
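For anyone who wants to redo the arithmetic on the two timings reported above, here is a minimal Python sketch. The helper name `go_duration_seconds` is hypothetical, and it handles only the h/m/s units that appear in the iprime output, not the full Go duration syntax.

```python
import re


def go_duration_seconds(s):
    """Convert a duration like '4m23.108015142s' to seconds (h, m, s units only)."""
    units = {"h": 3600.0, "m": 60.0, "s": 1.0}
    parts = re.findall(r"([0-9]*\.?[0-9]+)([hms])", s)
    return sum(float(value) * units[unit] for value, unit in parts)


# Figures as reported in the thread.
iprime_total_s = go_duration_seconds("4m23.108015142s")
iterations = 100
iprime_ms_per_iter = iprime_total_s / iterations * 1000.0
mlucas_ms_per_iter = 159.98  # from the mlucas.cfg entry for 1024K

ratio = iprime_ms_per_iter / mlucas_ms_per_iter

if __name__ == "__main__":
    print(f"iprime: {iprime_ms_per_iter:.2f} ms/iter")
    print(f"mlucas: {mlucas_ms_per_iter:.2f} ms/iter")
    print(f"ratio:  {ratio:.1f}x")
```

This is a raw same-FFT-length ratio. It does not account for any difference in the maximum exponent each code supports at that length.
|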
Thanks, Wombatman! A quick determination of "it's hopeless" is arguably as useful as the "this looks promising" ones.
I got only a modest 1.5x per-cycle boost from SIMD assembly versus plain C on ARMv8. I suspect this is because, unlike x86, the ARM is designed to share as many underlying functional units between SIMD and non-SIMD instructions as possible, i.e. both SIMD and basic C code access the same number of hardware resources, but careful SIMD coding makes better use of them. So the integer code's lack of SIMD would seem to serve as only a small mitigation. |