mersenneforum.org APRCL implementations comparison

2015-09-13, 15:46   #1
ldesnogu

Jan 2008
France

2⁴×3×11 Posts
APRCL implementations comparison

A post from Dana pushed me to time both mpz_aprcl and Pari/GP more thoroughly.

Quote:
 Originally Posted by danaj I was testing on an otherwise idle 4.3 GHz 4770K running Linux, using Pari 2.7.0 and mpz_aprcl 1.1, averaging over 5 random n-digit primes at each size. At 500 digits Pari was 20.8s, mpz_aprcl 13.3s. At 600 digits, Pari 35.4s, mpz_aprcl 27.8s. My spreadsheet shows no size where Pari came out faster, so it wasn't just a lucky few numbers. At 2000 digits it was 5039s for Pari, 4374s for mpz_aprcl. GMP 6.0.0a made a big difference, but you have recent versions. I did the timing in May 2014.
Attached is a spreadsheet showing timings from 100 to 1000 digits. Starting at 600 digits Pari/GP is faster (except at 700 digits). I also added the chosen t parameter, which seems to be the main difference between the two implementations.
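For context: in APR-CL the parameter t must satisfy e(t)² > N, where e(t) = 2·∏ q^(v_q(t)+1) over primes q with q−1 dividing t, and a smaller valid t generally means fewer Jacobi-sum tests. A minimal Python sketch of that selection rule (my own illustration, not code from mpz_aprcl or Pari/GP; real implementations restrict t to a precomputed table):

```python
# Sketch: e(t) = 2 * prod over primes q with (q-1) | t of q^(v_q(t)+1),
# where v_q(t) is the exponent of q in t. APR-CL needs e(t)^2 > N.

def is_prime(n):
    """Trial division; only ever used on small q <= t+1."""
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True

def e(t):
    """e(t) = 2 * prod of q^(v_q(t)+1) over primes q with (q-1) | t."""
    result = 2
    for q in range(2, t + 2):
        if t % (q - 1) == 0 and is_prime(q):
            power = q          # q^(v_q(t)+1), starting at exponent 1
            tt = t
            while tt % q == 0:
                tt //= q
                power *= q
            result *= power
    return result

def choose_t(n):
    """Smallest even t with e(t)^2 > n (brute force; real code uses a table)."""
    t = 2
    while e(t) ** 2 <= n:
        t += 2
    return t
```

For example e(12) = 65520, so t = 12 already certifies numbers up to e(12)² ≈ 4.3·10⁹, while e(5040) ≈ 1.5·10⁵² covers roughly 100-digit inputs.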

The primes were generated using Pari/GP:
Code:
setrand(2)
\\ for each size 100*i digits, print 10 random primes of exactly that many digits
for (i=1, 10, for (j=1, 10, a=0; until(a > 10^(i*100-1)-1, a=randomprime(10^(i*100))); print(a)))
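For readers without Pari/GP handy, roughly equivalent test inputs can be produced with a short Python sketch (the function names are mine, not from either package). Note that Miller-Rabin with random bases only yields *probable* primes; the point of the thread is then proving such candidates prime with APRCL:

```python
# Sketch: generate random n-digit probable primes for benchmarking.
import random

def is_probable_prime(n, rounds=25):
    """Miller-Rabin with random bases (probabilistic, unlike APRCL)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def random_ndigit_prime(ndigits):
    """Random probable prime with exactly ndigits decimal digits."""
    lo, hi = 10 ** (ndigits - 1), 10 ** ndigits
    while True:
        candidate = random.randrange(lo, hi) | 1   # force odd
        if is_probable_prime(candidate):
            return candidate
```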
The environment is as follows:
• 4770K with HT enabled and no OC except RAM @2400.
• Pari git abee4fd1a75
• gmp hg 16822
• mpz_aprcl 1.2
Attached Files
 mpz-aprcl-vs-pari.ods.gz (17.4 KB, 138 views)

2015-09-14, 02:39   #2
danaj

"Dana Jacobsen"
Feb 2011
Bangkok, TH

2·11·41 Posts

I just reran everything on my machine. The inputs are a bit different, but that shouldn't have a really big impact. I get nearly identical times for mpz_aprcl. I get substantially similar times for dj-ecpp. I get much faster times for Pari than I did back in early 2014 (right after 2.7.0 was released). I'm not sure why, as there is nothing substantially faster about today's HEAD vs. 2.7.0 in APR-CL.

Attaching a similar spreadsheet. Sheet 1 is ldesnogu's, Sheet 2 is mine.

Pari has some odd timing behavior. The times for 500 digits are more than double those for 400, but 600 is only barely more than 500. 700 is a huge jump, then 800 a small one, and 900 almost no difference.

I'd like to see this extended a little farther. My results show mpz_aprcl faster up to 500 digits, slower at 600, faster at 700 and 800, then slower at 900 and substantially slower at 1000. I'd also like to improve the speed of my ECPP implementation at the larger sizes, but that's another topic.
Attached Files
 mpz-aprcl-vs-pari.ods.gz (60.5 KB, 79 views)

2015-09-14, 17:45   #3
ldesnogu

Jan 2008
France

2⁴×3×11 Posts

Thanks for checking. Your mpz_aprcl results are about 20% faster than mine, while the Pari/GP ones vary from 10% slower to 60% faster; that's strange.

I need to clean up the changes I made to mpz_aprcl; they bring a 10%-20% speedup over mpz_aprcl-1.2. What's blocking me is finding a good way to build the list of t values to use (this probably requires estimating the cost of computing the Jacobi sums), which would bring mpz_aprcl closer to Pari/GP (I get the same time for one of the 1000-digit primes for mpz_aprcl and Pari/GP, while mpz_aprcl-1.2 is 60% slower).
2015-09-15, 00:26   #4
CRGreathouse

Aug 2006

5863₁₀ Posts

Did you tune your PARI? Sounds like some cutoffs are misplaced.
2015-09-15, 00:39   #5
danaj

"Dana Jacobsen"
Feb 2011
Bangkok, TH

1110000110₂ Posts

Quote:
 Originally Posted by CRGreathouse Did you tune your PARI? Sounds like some cutoffs are misplaced.
I did not -- just ./Configure, make gp. I'm running ./Configure --tune now and will see what happens.

2015-09-15, 03:36   #6
danaj

"Dana Jacobsen"
Feb 2011
Bangkok, TH

1606₈ Posts

make clean; ./Configure --tune churned away for an hour and made a tune.h file. The resulting gp produces essentially identical times.

My CPU is faster, but memory maybe a little slower (unless you don't have dual channel). That it is faster doesn't surprise me, but the differences in Pari times aren't clear. Nice to hear about your mpz_aprcl speedups.
2015-09-16, 07:58   #7
ldesnogu

Jan 2008
France

2⁴×3×11 Posts

Quote:
 Originally Posted by danaj make clean; ./Configure --tune churned away for an hour and made a tune.h file. The resulting gp produces essentially identical times.
I re-tuned too.

Quote:
 My CPU is faster, but memory maybe a little slower (unless you don't have dual channel). That it is faster doesn't surprise me, but the differences in Pari times aren't clear.
I have dual channel. The difference in Pari/GP times comes from a stupid mistake on my side: my benchmark script used the default Pari/GP of my system... Now I'm 10% behind you, which is what I expect given your OC. That leaves the 20% on mpz_aprcl to explain.

Quote:
 Originally Posted by danaj Nice to hear about your mpz_aprcl speedups.
There's still a lot of work to make it available. I could perhaps put it on Github (David gave me his approval, provided I change the name).

2015-09-20, 23:56   #8
WraithX

Mar 2006

111011000₂ Posts

Quote:
 Originally Posted by ldesnogu I re-tuned too. I have dual channel. The difference in Pari/GP times comes from a stupid mistake on my side: my benchmark script used the default Pari/GP of my system... Now I'm 10% behind you which is what I expect given your OC. That leaves the 20% to explain on mpz_aprcl. There's still a lot of work to make it available. I could perhaps put it on Github (David gave me his approval, provided I change the name).
I'd be happy to update mpz_aprcl to include any more speed-ups you have developed. I could even update the copyright notice to indicate that you are a developer of the code, if you like. I would prefer to go this route rather than having multiple/similar projects available online. I'd like this so that everyone would have just one place to go to get the latest/greatest version of mpz_aprcl that is available. Would you be interested in this idea, or would you still like to go the Github route? Either way is fine with me.

Also, I once had some code to generate t values. However, once I incorporated Jason Moxham's Jacobi sums, I was locked into that particular set of t values. I started writing code to generate new Jacobi sums based on a different set of t values, but was getting decent performance with the original set, so never really finished it. If you're interested, I can pick that back up and try to finish it so that we can test different sets of t values.
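As background, the tables in question hold Jacobi sums J(χ₁, χ₂) = Σ χ₁(x)·χ₂(1−x) over a finite field F_q. A tiny floating-point sketch (purely illustrative and mine alone; mpz_aprcl and Pari/GP store Jacobi sums exactly as cyclotomic-integer coefficient vectors, not complex floats) that exercises the definition:

```python
# Sketch: Jacobi sum via a discrete-log table, using complex floats.
import cmath
import math

def jacobi_sum(q, g, a, b):
    """J(chi_a, chi_b) = sum over x in F_q of chi_a(x) * chi_b(1 - x),
    where chi_a(g^k) = exp(2*pi*i*a*k/(q-1)), g is a primitive root mod q,
    and chi(0) = 0 (so x = 0 and x = 1 contribute nothing)."""
    dlog = {}
    v = 1
    for k in range(q - 1):   # discrete-log table: dlog[g^k mod q] = k
        dlog[v] = k
        v = v * g % q
    w = 2j * math.pi / (q - 1)
    return sum(cmath.exp(w * a * dlog[x]) * cmath.exp(w * b * dlog[(1 - x) % q])
               for x in range(2, q))
```

When χ_a, χ_b, and χ_a·χ_b are all nontrivial, |J| = √q; e.g. |jacobi_sum(13, 2, 1, 1)| = √13.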

2015-09-23, 13:02   #9
ldesnogu

Jan 2008
France

2⁴×3×11 Posts

Quote:
 Originally Posted by WraithX I'd be happy to update mpz_aprcl to include any more speed-ups you have developed. I could even update the copyright notice to indicate that you are a developer of the code, if you like. I would prefer to go this route rather than having multiple/similar projects available online. I'd like this so that everyone would have just one place to go to get the latest/greatest version of mpz_aprcl that is available. Would you be interested in this idea, or would you still like to go the Github route? Either way is fine with me.
My main interest is in learning how to implement APRCL efficiently, not in mpz_aprcl in itself (and don't take this as a criticism or whatever, I really like the code). So I think the best thing to do for the community would be for me to give you everything I have already done. The problem is that the code has been heavily re-indented and "extended", so the work to get it back into mpz_aprcl.c might be a real pain. I can send you a copy of my git repository if you want to take a look before we decide how to proceed.

Quote:
 Also, I once had some code to generate t values. However, once I incorporated Jason Moxham's Jacobi sums, I was locked into that particular set of t values. I started writing code to generate new Jacobi sums based on a different set of t values, but was getting decent performance with the original set, so never really finished it. If you're interested, I can pick that back up and try to finish it so that we can test different sets of t values.
I have such an implementation that I pushed up to 18401055938125660800. I also have everything in place to generate the tables needed by mpz_aprcl.

What I'm missing is a way to select a good t value. This depends on the underlying hardware, libraries, and the APRCL implementation's performance. Pari/GP is slightly better here, but not perfect (which explains why it sometimes loses against mpz_aprcl). It looks like FLINT's APRCL is even faster for some of the ranges (I'll post results later). I'd also guess each of these implementations has optimizations the others are missing.

2015-09-23, 13:53   #10
ldesnogu

Jan 2008
France

528₁₀ Posts

Here are corrected results plus new FLINT APRCL results.

The FLINT implementation is always faster than both mpz_aprcl and Pari/GP.
Attached Files
 mpz-aprcl-vs-pari.ods.gz (47.8 KB, 73 views)

2015-10-28, 00:36   #11
vladgl

"Vladimir Glazachev"
Oct 2015
Russia, Saint-Petersburg

1 Posts

Hi! I worked on the implementation of APRCL in FLINT this summer during GSoC 2015. I wrote some blog posts, so maybe someone will find them useful - link. My code is still not merged into FLINT, so it hasn't been checked very carefully. I also have some ideas for speeding it up. Thanks for your timings, that's interesting.

