Thanks, it worked. I tested it on an EC2 instance with 1 core / 2 threads (Xeon Cascade Lake) using a 402-bit prime number:

__160 curves at B1=1,000,000:__

GMP-ECM: **300.6 sec**

AVX-ECM, 1 thread: **179.5 sec**

AVX-ECM, 2 threads: **144.8 sec**

I then created a 402-bit composite number with a 30-digit (97-bit) factor and ran it 10 times at B1=1,000,000 (the 35-digit level) to see how quickly GMP-ECM and AVX-ECM found the factor:

| GMP-ECM | AVX-ECM |
|---|---|
| curve 23 | curves 0-15 |
| curve 34 | curves 16-31 |
| curve 48 | curves 32-47 |
| curve 81 | curves 48-63 (found twice) |
| curve 89 | curves 64-79 |
| curve 129 | curves 144-159 |
| curve 135 | curves 144-159 |
| curve 168 | curves 160-175 (found twice) |
| curve 195 | curves 384-399 |
| curve 264 | curves 416-431 |

So, based on the per-curve speeds from the first test (the 2-thread figure for AVX-ECM), GMP-ECM found the factor 10 times in 2191 seconds, and AVX-ECM found it 10 times (12 counting the two double finds) in 1419 seconds.
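
For anyone who wants to check the arithmetic, here is a rough sketch of how those totals can be reproduced. It assumes each run is charged for every curve up to and including the curve (or range of curves) that found the factor, and that the AVX-ECM total uses the 2-thread timing; the variable names are made up for illustration.

```python
# Timings from the first test: seconds for 160 curves at B1=1,000,000.
CURVES_TIMED = 160
GMP_ECM_SECONDS = 300.6
AVX_ECM_2T_SECONDS = 144.8  # assumption: the 2-thread figure is the one used

# GMP-ECM: curve number at which each of the 10 runs found the factor.
gmp_hits = [23, 34, 48, 81, 89, 129, 135, 168, 195, 264]

# AVX-ECM: last curve index (0-based) of the range that found the factor.
avx_range_ends = [15, 31, 47, 63, 79, 159, 159, 175, 399, 431]

# Total curves run across all 10 runs (AVX-ECM indices start at 0, hence +1).
gmp_curves = sum(gmp_hits)
avx_curves = sum(end + 1 for end in avx_range_ends)

# Scale by the measured time per curve.
gmp_total = gmp_curves * GMP_ECM_SECONDS / CURVES_TIMED
avx_total = avx_curves * AVX_ECM_2T_SECONDS / CURVES_TIMED

print(f"GMP-ECM: {gmp_curves} curves, {gmp_total:.0f} s")
print(f"AVX-ECM: {avx_curves} curves, {avx_total:.0f} s")
```

Under those assumptions this comes to 1166 curves for GMP-ECM and 1568 curves for AVX-ECM, which matches the 2191 and 1419 second totals.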

I took the 12 sigma values for which AVX-ECM found the factor and tested them in GMP-ECM, and it found the factor with all 12 sigmas as well.