mersenneforum.org > YAFU AVX-ECM
 Register FAQ Search Today's Posts Mark Forums Read

2020-11-01, 17:02   #56
wombatman
I moo ablest echo power!

May 2013

6CD16 Posts

Quote:
 Originally Posted by bsquared Looks like you just need to use more makefile options. You need at least NFS=1 to bring in the nfs stuff and SKYLAKEX=1 for the avx512 vector arithmetic stuff. And if you use SKYLAKEX=1 then you also need USE_AVX2=1. My typical build line looks like this: make NFS=1 USE_AVX2=1 USE_BMI2=1 SKYLAKEX=1 also, ECM=1 is a msieve thing, not a yafu thing, so you can leave that off.
Sorry, I should have been clearer. The CPU is an Ivy Bridge-E, so it can't do any of the AVX-ECM stuff (to my understanding).

2020-11-01, 17:06   #57
bsquared

"Ben"
Feb 2007

D2B16 Posts

Quote:
 Originally Posted by wombatman Sorry, I should have been clearer. The CPU is an Ivy Bridge-E, so it can't do any of the AVX-ECM stuff (to my understanding).
Ah, ok. I have only checked that things work *with* avx512, did not stop to consider that things might break without it. Thanks for the report... stay tuned.

2020-11-01, 17:18   #58
wombatman
I moo ablest echo power!

May 2013

1,741 Posts

Quote:
 Originally Posted by bsquared Ah, ok. I have only checked that things work *with* avx512, did not stop to consider that things might break without it. Thanks for the report... stay tuned.
No worries! If it's just a simple makefile change, I'm good with doing that.

2020-11-01, 17:26   #59
bsquared

"Ben"
Feb 2007

64538 Posts

Quote:
 Originally Posted by bsquared Ah, ok. I have only checked that things work *with* avx512, did not stop to consider that things might break without it. Thanks for the report... stay tuned.
Ok, try r394.

2020-11-01, 17:33   #60
wombatman
I moo ablest echo power!

May 2013

174110 Posts

Quote:
 Originally Posted by bsquared Ok, try r394.
Builds successfully and seems to be working well using factor(rsa(300)) to check both ECM and SIQS (my primary usage). Thanks very much for the quick response!

 2020-11-02, 18:49 #61 bsquared     "Ben" Feb 2007 D2B16 Posts addmod and submod can also be simplified for Mersenne inputs and this has a measurable improvement... about 3% faster in stage 1 and 7% in stage 2. Both yafu and avx-ecm repos have been updated. Also, I've seen that avx-ecm is about 10-20% faster than yafu. I'm not sure why, the only difference that I can see is that yafu has a lot more code around it and I can't see how the overhead would be that large. I've done some testing, but if anyone is able to test either package on finding known Mersenne factors, that'd be great. Last fiddled with by bsquared on 2020-11-02 at 18:50
2020-11-02, 19:34   #62
bsquared

"Ben"
Feb 2007

3,371 Posts

Quote:
 Originally Posted by bsquared Also, I've seen that avx-ecm is about 10-20% faster than yafu. I'm not sure why, the only difference that I can see is that yafu has a lot more code around it and I can't see how the overhead would be that large.
Ok, most of that is explained by different compiler options. avx-ecm was using -O3 and yafu -O2. With -O3, yafu is much better, but avx-ecm is still 5% faster or so.

Given all of the above, for 2^1277-1, AVX-ECM is 6.2 times faster than GMP-ECM now. I hope I have GMP-ECM configured optimally for this system for a fair comparison... here is what it shows for a run with -v:

Code:
 echo "2^1277-1" | ~/ecm-704-linux/ecm -v 7000000
GMP-ECM 7.0.4 [configured with GMP 6.2.0, --enable-asm-redc] [ECM]
Tuned for x86_64/k8/params.h
Running on cpu
Input number is 2^1277-1 (385 digits)
Using special division for factor of 2^1277-1
Using B1=7000000, B2=17125579390, polynomial Dickson(12), sigma=0:1088740413510573297
dF=16384, k=6, d=158340, d2=11, i0=34
Expected number of curves to find a factor of n digits:
35      40      45      50      55      60      65      70      75      80
167     1024    7351    60402   558826  5744532 6.5e+07 7.8e+08 1.1e+10 2.8e+11
Step 1 took 60676ms
B2 takes 32 sec vs. 55 sec for 8 curves (@ B2=100*B1)

2020-11-02, 22:15   #63
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

10010010001012 Posts

Quote:
 Originally Posted by bsquared B2 takes 32 sec vs. 55 sec for 8 curves (@ B2=100*B1)
Which one is which for this comparison? Are both GMP-ECM and avx-ecm running the same B2, or is the longer time related to GMP-ECM running its standard B2 level?

Your stage 1 might be so fast that it calls for running the lesser stage 2 and not using GMP-ECM at all; I haven't quite decided which way to go for stage 2.

Also, if I'm reading the start of this thread correctly, the avx-ECM speedup on generic smallish (say, 1000 bits or under) speedup is around double GMP-ECM stage 1 speed, while on special mersenne numbers it's 5x or more?

My use cases are M1277 in the short run, and production work around 900 bits with no special form in the long run. Seems maybe M1277 calls for no GMP-ECM stage 2, while production work may call for using the stage1-save option and GMP-ECM for stage 2.

2020-11-02, 22:36   #64
bsquared

"Ben"
Feb 2007

3,371 Posts

Quote:
 Originally Posted by VBCurtis Which one is which for this comparison? Are both GMP-ECM and avx-ecm running the same B2, or is the longer time related to GMP-ECM running its standard B2 level? Your stage 1 might be so fast that it calls for running the lesser stage 2 and not using GMP-ECM at all; I haven't quite decided which way to go for stage 2. Also, if I'm reading the start of this thread correctly, the avx-ECM speedup on generic smallish (say, 1000 bits or under) speedup is around double GMP-ECM stage 1 speed, while on special mersenne numbers it's 5x or more? My use cases are M1277 in the short run, and production work around 900 bits with no special form in the long run. Seems maybe M1277 calls for no GMP-ECM stage 2, while production work may call for using the stage1-save option and GMP-ECM for stage 2.
gmp-ecm ran its larger stage 2 in 32 seconds while avx-ecm ran its smaller stage 2 in 55 seconds, or 6.8 sec/curve.
gmp-ecm B2 was B2=17125579390
avx-ecm B2 was 700000000 (24.4 times smaller)

This B2 size disparity will grow the larger B1 gets.
Also complicating matters is that gmp-ecm uses the Brent-Suyama extension for stage 2, which increases the odds of finding a factor slightly. avx-ecm doesn't do that.

So unfortunately it is a complicated tradeoff that has to consider factor-finding probabilities as well as speed. I don't think there is enough data for avx-ecm yet to compare probability of finding a factor per unit time (or per curve), compared to gmp-ecm. Not sure I would even know how to do that analysis if I had the data, although maybe I could figure it out.

ATH's data set here did show avx-ecm finding *more* factors per curve than gmp-ecm did, but stuff like that will happen for randomized algorithms and small data sets.

After thinking about it a bit, I think we could extract the probabilities from gmp-ecm -v, while running with power 1 (no brent-suyama) and forcing B2=100B1.

Last fiddled with by bsquared on 2020-11-02 at 22:37

2020-11-02, 22:47   #65
bsquared

"Ben"
Feb 2007

3,371 Posts

Quote:
 Originally Posted by bsquared After thinking about it a bit, I think we could extract the probabilities from gmp-ecm -v, while running with power 1 (no brent-suyama) and forcing B2=100B1.
Here's what gmp-ecm spit out:

Input number is (2^1277-1) (385 digits)
Using special division for factor of 2^1277-1
Using B1=10000000000, B2=637998273712822, polynomial Dickson(30), sigma=0:6343133439254916928
dF=4194304, k=3, d=48498450, d2=23, i0=184
Expected number of curves to find a factor of n digits:
35 40 45 50 55 60 65 70 75 80
7 19 52 161 545 1993 7815 32642 144421 673682

Input number is (2^1277-1) (385 digits)
Using special division for factor of 2^1277-1
Using B1=10000000000, B2=1000000000000, polynomial x^1, sigma=0:11834000506611778745
dF=131072, k=6, d=1345890, d2=11, i0=7420
Expected number of curves to find a factor of n digits:
35 40 45 50 55 60 65 70 75 80
16 44 134 442 1578 6046 24696 106998 489087 2349846

The 75-digit number of curves ratio is 3.38, which is essentially a multiplier in favor of gmp-ecm curves.

So if avx-ecm is 6.2 times higher throughput, but 3.38 times less likely to find a factor at t75, then the net gain for avx-ecm is about 1.83 for 2^1277-1.

Similar calculations would have to be applied to generic inputs at particular B1's, I think. gmp-ecm's advantage is its vastly superior stage 2. So the avx-ecm stage 1 followed by gmp-ecm stage 2 recipe is probably a winner in a lot of cases. But for 2^1277-1 it looks like pure avx-ecm wins.

Thoughts?

 2020-11-03, 01:47 #66 VBCurtis     "Curtis" Feb 2005 Riverside, CA 467710 Posts When GMP-ECM is faster for Stage 2 with a higher B2, there is only advantage from using it. Save time, get more work- what's not to like? The difficult tradeoff would be on larger numbers, when the avx-ecm stage 2 is likely faster than GMP-ECM but a smaller B2. That leads to some need for study.