2020-03-07, 11:54   #6
kuratkull
Mar 2007
Estonia

2×67 Posts

(running all this on a Skylake i7)

Using zero-padded FMA3 FFT length 384K for both:
4 threads: 39547695*2^3664022-1 is not prime. LLR Res64: DA225779C3F421FD Time : 1409.882 sec.
3 threads: 39547695*2^3664034-1 is not prime. LLR Res64: 7F70D51694B8E6AF Time : 1645.871 sec.
384K / 4 threads = 96K per thread
384K / 3 threads = 128K per thread
I would have expected it to work better with 3 threads. I will have to look into optimal settings and recommendations for users.
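To put numbers on that expectation, here is a small sketch that computes the naive per-thread share of the FFT and the measured gain from going from 3 to 4 threads. It assumes the FFT work is split evenly across threads, which is a simplification, and the two candidates differ slightly in n (3664022 vs 3664034), so the comparison is approximate.

```python
# Figures copied from the two runs above (both used a 384K FFT length).
FFT_LENGTH_K = 384
TIMES_SEC = {3: 1645.871, 4: 1409.882}


def fft_share_per_thread(fft_k: int, threads: int) -> float:
    """Naive even split of the FFT length across threads, in K elements."""
    return fft_k / threads


for t in sorted(TIMES_SEC):
    print(f"{t} threads: {fft_share_per_thread(FFT_LENGTH_K, t):.0f}K per thread, "
          f"{TIMES_SEC[t]:.1f} s")

speedup = TIMES_SEC[3] / TIMES_SEC[4]  # measured gain going from 3 to 4 threads
ideal = 4 / 3                          # gain if the work scaled perfectly with threads
print(f"speedup {speedup:.3f}x vs ideal {ideal:.3f}x "
      f"({speedup / ideal:.0%} parallel efficiency)")
```

So 4 threads did win, by roughly 17%, even though the 128K-per-thread split looks tidier on paper.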

Also, I updated my code so that it should be at least as fast as LLR64, most likely faster (because I don't use checks and safety features in the core loop). On my CPU, RPT seems to be about 1% faster.
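For readers wondering what the "core loop" is: both programs run the Lucas-Lehmer-Riesel test on N = k*2^n-1, whose inner loop is n-2 modular squarings u -> u^2 - 2 mod N. Below is a minimal Python sketch of that test, assuming Rödseth's rule for choosing the starting value (one of several published seed choices; this is not rpt's or LLR's code). Real implementations do each squaring with a large floating-point FFT, which is where the FFT lengths in this post come from; this sketch just uses Python big integers and, like the loop described above, does no error checking. The Res64-style string (low 64 bits of the final residue in hex) is an illustration of the idea, not a claim to reproduce LLR's values bit for bit.

```python
def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a:
        while a % 2 == 0:              # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                    # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def lucas_v(k: int, p: int, n: int) -> int:
    """V_k(P, 1) mod n via a binary Lucas ladder on the pair (V_m, V_{m+1})."""
    a, b = 2, p % n                    # V_0, V_1
    for bit in bin(k)[2:]:
        if bit == "1":
            a, b = (a * b - p) % n, (b * b - 2) % n   # -> (V_2m+1, V_2m+2)
        else:
            a, b = (a * a - 2) % n, (a * b - p) % n   # -> (V_2m, V_2m+1)
    return a


def llr_test(k: int, n: int) -> tuple[bool, str]:
    """Lucas-Lehmer-Riesel test of N = k*2^n - 1 (k odd, k < 2^n, n >= 3).

    Returns (is_prime, low 64 bits of the final residue as 16 hex digits).
    """
    assert k % 2 == 1 and 0 < k < 2 ** n and n >= 3
    N = k * 2 ** n - 1
    # Roedseth's seed rule: smallest P >= 3 with (P-2 / N) = +1 and (P+2 / N) = -1.
    for p in range(3, 10_000):
        if jacobi(p - 2, N) == 1 and jacobi(p + 2, N) == -1:
            break
    else:
        raise ValueError("no suitable P found in search range")
    u = lucas_v(k, p, N)
    # The core loop: n - 2 modular squarings with no checks of any kind.
    for _ in range(n - 2):
        u = (u * u - 2) % N
    return u == 0, f"{u & (2 ** 64 - 1):016X}"


if __name__ == "__main__":
    for n in range(3, 9):
        prime, res64 = llr_test(3, n)
        print(f"3*2^{n}-1 = {3 * 2 ** n - 1}: prime={prime} res64={res64}")
```

A prime ends with residue 0 (Res64 of all zeros); any nonzero residue proves N composite.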

39547695*2^506636-1 uses an FFT length of 50K:

./rpt 39547695 506636 1 took 68820ms
./rpt 39547695 506636 2 took 65739ms
./rpt 39547695 506636 3 took 57495ms
./rpt 39547695 506636 4 took 63035ms

./llr64 -d -q"39547695*2^506636-1" -t1 -> Time : 69.294 sec.
./llr64 -d -q"39547695*2^506636-1" -t2 -> Time : 66.765 sec.
./llr64 -d -q"39547695*2^506636-1" -t3 -> Time : 59.057 sec.
./llr64 -d -q"39547695*2^506636-1" -t4 -> Time : 63.470 sec.
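A quick sketch that lines up the two sets of timings above (rpt reports milliseconds, llr64 seconds) and computes how much faster rpt was at each thread count. The numbers are copied from this post; the helper name is just for illustration.

```python
# Timings copied from the runs above: rpt reports milliseconds, llr64 seconds.
RPT_MS = {1: 68820, 2: 65739, 3: 57495, 4: 63035}
LLR_SEC = {1: 69.294, 2: 66.765, 3: 59.057, 4: 63.470}


def percent_faster(new_sec: float, baseline_sec: float) -> float:
    """How much less wall time `new` took than `baseline`, in percent."""
    return (baseline_sec - new_sec) / baseline_sec * 100


gains = {t: percent_faster(RPT_MS[t] / 1000, LLR_SEC[t]) for t in RPT_MS}
for t, g in gains.items():
    print(f"-t{t}: rpt {RPT_MS[t] / 1000:.3f} s, llr64 {LLR_SEC[t]:.3f} s, "
          f"rpt {g:+.2f}%")

best_rpt = min(RPT_MS, key=RPT_MS.get)   # fastest thread count for rpt
best_llr = min(LLR_SEC, key=LLR_SEC.get)  # fastest thread count for llr64
print(f"fastest thread count: rpt {best_rpt}, llr64 {best_llr}")
```

At this 50K FFT length, rpt comes out between roughly 0.7% and 2.6% faster, and both programs are fastest with 3 threads rather than 4.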

And comparing at the larger n (4 threads):
4 threads: 39547695*2^3664022-1 is not prime. LLR Res64: DA225779C3F421FD Time : 1409.882 sec.
4 threads: rpt 1397.022 seconds
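Since the test does roughly n modular squarings (plus a short setup), dividing wall time by n gives a rough cost per squaring. This sketch applies that to the 4-thread runs in this post; treating every iteration as equal cost is a simplifying assumption.

```python
# 4-thread wall times from this post (seconds) and the FFT length each run used.
RUNS = {
    506636: {"fft_k": 50, "rpt": 63.035, "llr64": 63.470},
    3664022: {"fft_k": 384, "rpt": 1397.022, "llr64": 1409.882},
}


def ms_per_iteration(seconds: float, n: int) -> float:
    """Approximate milliseconds per squaring, assuming about n equal-cost iterations."""
    return seconds / n * 1000


for n, r in RUNS.items():
    gain = (r["llr64"] - r["rpt"]) / r["llr64"] * 100  # rpt's lead over llr64, percent
    print(f"n={n} ({r['fft_k']}K FFT): "
          f"rpt {ms_per_iteration(r['rpt'], n):.4f} ms/iter, "
          f"llr64 {ms_per_iteration(r['llr64'], n):.4f} ms/iter, "
          f"rpt {gain:.2f}% faster")
```

At n = 3664022 the gap works out to just under 1%, consistent with the estimate above.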
I pushed the new binary (0.0.2) to GitHub releases: https://github.com/dkull/rpt/releases/tag/v0.0.2