mersenneforum.org

SELROC 2018-07-17 10:49

All integer Mersenne prime checker for ARM
 
[url]https://github.com/ncw/iprime[/url]

henryzz 2018-07-17 11:00

How does the performance compare to mlucas?

ET_ 2018-07-17 11:04

[QUOTE=SELROC;491983][url]https://github.com/ncw/iprime[/url][/QUOTE]

One should ask Nick if he used armV8 vectorized instructions...

BTW, Ernst Mayer ported his multithreading asm-aware code to Arm as well. It works like a charm on Raspberries and Ernst is working on implementing prp testing as well...

ldesnogu 2018-07-17 11:11

[QUOTE=ET_;491985]One should ask Nick if he used armV8 vectorized instructions...[/QUOTE]
That's ARMv5T code (for StrongARM as far as I could see). So 32-bit and no SIMD.


OTOH I expect the code to be very carefully tuned, Nick is an excellent programmer :smile:

GP2 2018-07-17 17:10

Look at the dates, the files are mostly from 5 years ago, a few from 3 years ago.

Seems to be abandonware, unfortunately.

kriesel 2018-07-17 23:27

[QUOTE=GP2;492010]Look at the dates, the files are mostly from 5 years ago, a few from 3 years ago.

Seems to be abandonware, unfortunately.[/QUOTE]
Dec 2015 for iprime is more recent than the last announced release (not prerelease or beta) of:
CUDALucas 2.05.1 Feb 2015
CUDAPm1 0.20 Nov 2013
mfaktc 0.21 Oct 2014
mfakto 0.14 Nov 2014
gpulucas 0.9.4 Feb 2012
and almost clLucas at Jan 2016.
Some of these are in heavy current use, because they adequately fulfill the function.

Unfortunately, iprime does not seem to have save files implemented, per the road map in its README.md.

ewmayer 2018-07-24 20:30

It actually would be useful to compare timings between Nick's non-SIMD integer code vs Mlucas on ARMv8, if someone has bandwidth to test them both on the same system, I would be appreciative.

wombatman 2018-07-24 23:50

[QUOTE=ewmayer;492424]It actually would be useful to compare timings between Nick's non-SIMD integer code vs Mlucas on ARMv8, if someone has bandwidth to test them both on the same system, I would be appreciative.[/QUOTE]

1) Would an RPi Model 3B+ suffice?
2) What timings do you need/want?

ewmayer 2018-07-25 20:53

[QUOTE=wombatman;492429]1) Would an RPi Model 3B+ suffice?
2) What timings do you need/want?[/QUOTE]

That's an ARMv8 CPU, yes? If so, build Mlucas (as per online [url=http://www.mersenneforum.org/mayer/README.html]readme page[/url]) for v8/simd, then:

1. If the integer checker supports multithreading, run Mlucas self-tests on all 4 cores via './Mlucas -s m -iters 100 -cpu 0:3', afterward have a look at the mlucas.cfg file and run the integer code at the nearest FFT lengths to those in whatever short-length timing mode it supports;

2. If the integer checker does not support multithreading, do as in [1] but run Mlucas self-tests on just 2 core via './Mlucas -s m -iters 100'.

Since the 2 codes will likely permit appreciably different max-exponents at any given transform length, the timing comparisons will need to be interpreted in that light, i.e. in "timing for comparable exponent" fashion.
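
Picking the nearest FFT length out of mlucas.cfg can be scripted. The following is only a rough sketch in Python: it assumes each timing line starts with the FFT length in K followed by `msec/iter = <time>` and ignores anything after that, and the names `parse_cfg` and `nearest_fft` are made up for illustration, not part of Mlucas.

```python
import re

# Assumed line shape: "<FFT length in K> msec/iter = <time>", e.g.
# "1024 msec/iter = 159.98". Any further fields on the line are ignored.
CFG_LINE = re.compile(r"^\s*(\d+)\s+msec/iter\s*=\s*([0-9.]+)")

def parse_cfg(text: str) -> dict[int, float]:
    """Map FFT length (in K) to msec/iter for every line that matches."""
    timings = {}
    for line in text.splitlines():
        m = CFG_LINE.match(line)
        if m:
            timings[int(m.group(1))] = float(m.group(2))
    return timings

def nearest_fft(timings: dict[int, float], target_k: int) -> int:
    """Return the tabulated FFT length (in K) closest to target_k."""
    return min(timings, key=lambda k: abs(k - target_k))

# In practice the text would come from open("mlucas.cfg").read().
sample = "1024 msec/iter = 159.98\n"
timings = parse_cfg(sample)
k = nearest_fft(timings, 2**20 // 1024)  # a 2**20-word FFT is 1024K
print(k, timings[k])
```

The integer code would then be run at the FFT size closest to `k`, keeping in mind the caveat about differing max-exponents.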

Thanks!

wombatman 2018-07-27 04:05

Looks like the iprime code is single-threaded, so I ran mlucas single-thread as well. I only did one test of iprime, and here's why:

iprime:

Testing 2**20000047-1 with fft size 2**20 for 100 iterations
Residue 0xDD61B3E031F1E0BA
That took 4m23.108015142s for 100 iterations which is 2.631080151s per iteration

mlucas:

1024 msec/iter = 159.98 (used the exponent here for the iprime test)

So mlucas is ~16x faster than iprime (about 2631 msec/iter vs. 160 msec/iter) for the same FFT size and exponent. And iprime crashed trying to run p = 49005071.
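
The ratio follows directly from the two timings quoted above. A minimal sketch in Python that uses only the figures in this post (the helper name `speedup` is just for illustration):

```python
# Figures copied from the two runs quoted above.
iprime_total_s = 4 * 60 + 23.108015142  # "4m23.108015142s" for 100 iterations
iterations = 100
mlucas_ms_per_iter = 159.98             # mlucas timing at the 1024K FFT length

def speedup(slow_ms: float, fast_ms: float) -> float:
    """Return how many times larger the first timing is than the second."""
    return slow_ms / fast_ms

# Convert the iprime total to milliseconds per iteration, then compare.
iprime_ms_per_iter = iprime_total_s / iterations * 1000.0
ratio = speedup(iprime_ms_per_iter, mlucas_ms_per_iter)
print(f"iprime: {iprime_ms_per_iter:.1f} msec/iter, ratio: {ratio:.1f}x")
```

This compares a single iprime run against a single mlucas timing, so it is a rough figure rather than a careful benchmark.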

ewmayer 2018-07-27 22:13

Thanks, Wombatman! A quick determination of "it's hopeless" is arguably as useful as the "this looks promising" ones.

I got only a modest 1.5x per-cycle boost from SIMD assembly versus no SIMD on ARMv8. I suspect this is because, unlike x86, the ARM is designed to share as many underlying functional units as possible between SIMD and non-SIMD instructions, i.e. both SIMD and basic C code access the same hardware resources, but careful SIMD coding makes better use of them. So the non-SIMD nature of the integer code would seem to serve as only a small mitigation.

