![]() |
![]() |
#1 |
"Ben"
Feb 2007
23·163 Posts |
![]()
As mentioned here, the microecm/tinyecm code in yafu appears to be helpful in the ggnfs lattice sievers as a replacement for mpqs when splitting cofactors remaining after trial divison.
I pulled the lasieve4_64 code into yafu with the changes to use ecm. I've built it on two different linux systems and with gcc-9, gcc-11, and icc. The gcc-9 build was on WSL and I needed to add -no-pie to CFLAGS, but other than that it went smoothly. Here are some comparative timings on a few inputs. Obviously much more testing is needed, but take this as a good sign that it could be an improvement across the board. 300 bits, I11e, -f 650000 -c 10000 Code:
====== uecm version ====== Total yield: 51520 0/0 mpqs failures, 2821/28 vain mpqs milliseconds total: Sieve 5820 Sched 0 medsched 6400 TD 4910 (Init 1030, MPQS 400) Sieve-Change 17040 34.392u 0.233s 0:35.11 98.6% 0+0k 800+6848io 1pf+0w ====== mpqs version ====== Total yield: 51520 0/0 mpqs failures, 2821/28 vain mpqs milliseconds total: Sieve 5730 Sched 0 medsched 6430 TD 8190 (Init 1180, MPQS 3470) Sieve-Change 16720 37.289u 0.233s 0:37.95 98.8% 0+0k 800+6848io 1pf+0w Code:
====== uecm version ====== Total yield: 112189 0/0 mpqs failures, 77770/22906 vain mpqs milliseconds total: Sieve 23770 Sched 0 medsched 16730 TD 16400 (Init 2270, MPQS 2370) Sieve-Change 23750 80.383u 0.891s 1:21.86 99.2% 0+0k 0+16224io 0pf+0w ====== mpqs version ====== Total yield: 112189 0/0 mpqs failures, 77770/22906 vain mpqs milliseconds total: Sieve 23390 Sched 0 medsched 16300 TD 36800 (Init 2510, MPQS 22050) Sieve-Change 23520 99.705u 0.929s 1:41.19 99.4% 0+0k 8+16224io 0pf+0w Code:
====== uecm version ====== Total yield: 28932 0/0 mpqs failures, 23373/6029 vain mpqs milliseconds total: Sieve 51640 Sched 0 medsched 21420 TD 13870 (Init 1480, MPQS 560) Sieve-Change 19980 106.987u 1.517s 1:48.85 99.6% 0+0k 712+4528io 1pf+0w ====== mpqs version ====== Total yield: 28932 0/0 mpqs failures, 23373/6029 vain mpqs milliseconds total: Sieve 54140 Sched 0 medsched 21630 TD 18650 (Init 1280, MPQS 5680) Sieve-Change 20280 114.655u 1.626s 1:56.85 99.5% 0+0k 0+4528io 0pf+0w Code:
====== uecm version ====== Total yield: 18051 0/0 mpqs failures, 15358/4583 vain mpqs milliseconds total: Sieve 46570 Sched 0 medsched 22940 TD 11160 (Init 760, MPQS 570) Sieve-Change 30600 112.513u 1.480s 1:54.54 99.5% 0+0k 0+2976io 0pf+0w ====== mpqs version ====== Total yield: 18051 0/0 mpqs failures, 15358/4583 vain mpqs milliseconds total: Sieve 48100 Sched 0 medsched 21990 TD 14530 (Init 960, MPQS 3910) Sieve-Change 30460 116.308u 1.485s 1:59.34 98.6% 0+0k 0+2976io 0pf+0w 420 bits, I13e, -f 6500000 -c 2000 Code:
====== uecm version ====== Total yield: 2766 0/0 mpqs failures, 2193/633 vain mpqs milliseconds total: Sieve 21200 Sched 0 medsched 11240 TD 4060 (Init 280, MPQS 100) Sieve-Change 22970 63.047u 0.608s 1:04.10 99.2% 0+0k 0+488io 0pf+0w ====== mpqs version ====== Total yield: 2766 0/0 mpqs failures, 2193/633 vain mpqs milliseconds total: Sieve 21910 Sched 0 medsched 11290 TD 4310 (Init 330, MPQS 560) Sieve-Change 22660 63.718u 0.637s 1:05.06 98.8% 0+0k 8+488io 0pf+0w Contrived case, with three large primes on rational side of I think it was 2^523-1 with 14e. lpbr/a 30, mfbr 90, mfba 60, rlambda 3.6 alambda 2.6 Code:
====== uecm version ====== Total yield: 3442 0/0 mpqs failures, 68120/2701 vain mpqs milliseconds total: Sieve 41680 Sched 0 medsched 16230 TD 110120 (Init 1670, MPQS 88770) Sieve-Change 48560 218.973u 1.240s 3:40.75 99.7% 0+0k 696+704io 1pf+0w ====== mpqs version ====== Total yield: 3590 0/0 mpqs failures, 68138/146190 vain mpqs milliseconds total: Sieve 42420 Sched 0 medsched 15620 TD 134580 (Init 1690, MPQS 114620) Sieve-Change 50040 245.104u 1.206s 4:07.07 99.6% 0+0k 8+736io 0pf+0w That's all for now. Any more successful builds and testing would be great to hear about! Last fiddled with by bsquared on 2022-10-01 at 03:35 Reason: duplicated a code block |
![]() |
![]() |
![]() |
#2 |
"Ben"
Feb 2007
23·163 Posts |
![]()
Just to note, building with CFLAGS=-DGGNFS_MPQS during the final make should enable the original mpqs code.
|
![]() |
![]() |
![]() |
#3 |
Apr 2020
2·52·19 Posts |
![]()
I know the CADO developers don't like to include other people's code, but it's likely that your new ECM code is also faster than CADO's internal ECM that it uses for splitting cofactors. Might be worth dropping them a line just in case some of them are interested.
|
![]() |
![]() |
![]() |
#4 | |
"Ben"
Feb 2007
23×163 Posts |
![]() Quote:
But I've communicated with Paul Z. before, I can see if he's interested. |
|
![]() |
![]() |
![]() |
#5 |
"Ben"
Feb 2007
23×163 Posts |
![]()
I've been studying and working with the ggnfs lattice siever lately. The assembly code is pretty much unassailable, but there are C equivalents for most of the assembly code. I've made some progress understanding these... enough to be able to do some vectorized re-writes using AVX512. The first one I picked is the Sieve-Change section (lasieve_setup). Example timings for the orignial code shown here for a random 512-bit semiprime:
Code:
time ./gnfs-lasieve4I14e -v -f 15300000 -c 2000 -a nfs512_15300k_I14.job gnfs-lasieve4I14e (with asm64): L1_BITS=15, SVN $Revision: 430 $ Warning: lowering FB_bound to 15299999. FBsize 990256+0 (deg 5), 1892784+0 (deg 1) total yield: 3136, q=15302017 (0.05278 sec/rel) ETA 0h00m) 108 Special q, 158 reduction iterations reports: 19855429->2233021->1979662->1025589->638030->385088 Number of relations with k rational and l algebraic primes for (k,l)=: Total yield: 3136 0/0 mpqs failures, 2729/1443 vain mpqs milliseconds total: Sieve 74220 Sched 0 medsched 23260 TD 13650 (Init 420, MPQS 1030) Sieve-Change 54390 TD side 0: init/small/medium/large/search: 1730 870 610 2010 1100 sieve: init/small/medium/large/search: 1140 15470 920 20890 1290 TD side 1: init/small/medium/large/search: 1980 640 730 1850 510 sieve: init/small/medium/large/search: 830 10280 620 22300 480 171.995u 1.947s 2:54.71 99.5% 0+0k 848+624io 1pf+0w Code:
time ./gnfs-lasieve4I14e -v -f 15300000 -c 2000 -a nfs512_15300k_I14.job gnfs-lasieve4I14e (with asm64): L1_BITS=15, SVN $Revision: 430 $ Warning: lowering FB_bound to 15299999. FBsize 990256+0 (deg 5), 1892784+0 (deg 1) total yield: 3136, q=15302017 (0.04164 sec/rel) ETA 0h00m) 108 Special q, 158 reduction iterations reports: 19853879->2232572->1979660->1025587->638029->385088 Number of relations with k rational and l algebraic primes for (k,l)=: Total yield: 3136 0/0 mpqs failures, 2729/1442 vain mpqs milliseconds total: Sieve 73830 Sched 0 medsched 25360 TD 12910 (Init 380, MPQS 190) Sieve-Change 18470 TD side 0: init/small/medium/large/search: 1560 880 630 2070 1090 sieve: init/small/medium/large/search: 920 16610 710 20290 1310 TD side 1: init/small/medium/large/search: 1930 520 790 2010 720 sieve: init/small/medium/large/search: 560 11060 810 21380 180 136.897u 1.872s 2:19.38 99.5% 0+0k 840+624io 1pf+0w I'll be maintaining and working with the ggnfs lattice siever in the yafu github. These new changes are available there. Build process is the same, except add CFLAGS=-DAVX512_LASIEVE_SETUP when making the gnfs-lasieve4I executables (after building the liblasieve libraries in the asm/ directory, the same as before). Other sections of the code are looking approachable as well, I hope there is more to come. Last fiddled with by bsquared on 2022-09-30 at 13:55 |
![]() |
![]() |
![]() |
#6 |
I moo ablest echo power!
May 2013
73F16 Posts |
![]()
I think I'm compiling this incorrectly, but how do I compile without AVX-512 instructions? I'm getting a number of similar errors like this:
Code:
recurrence6.c:327:17: error: incompatible types when assigning to type ‘__m512i’ {aka ‘__vector(8) long long int’} from type ‘int’ 327 | k = _mm512_mask_div_epu32(k, m, b, c); |
![]() |
![]() |
![]() |
#7 | |
"Ben"
Feb 2007
23×163 Posts |
![]() Quote:
Last fiddled with by bsquared on 2022-10-02 at 19:32 |
|
![]() |
![]() |
![]() |
#8 |
I moo ablest echo power!
May 2013
111001111112 Posts |
![]() |
![]() |
![]() |
![]() |
#9 | |
"Ben"
Feb 2007
23×163 Posts |
![]() Quote:
Probably -DAVX512_LASIEVE_SETUP will only work with the Intel compiler right now, because I use the _mm512_div_epu32 and _mm512_rem_epu32 intrinsics which I think only Intel has definitions of. Trying to make statically fails because the static cilkrts library doesn't exist... looking into including a copy of the cilkrts library with OpenCilk. Last fiddled with by bsquared on 2022-10-03 at 14:53 |
|
![]() |
![]() |
![]() |
#10 | |
I moo ablest echo power!
May 2013
5×7×53 Posts |
![]() Quote:
![]() |
|
![]() |
![]() |
![]() |
#11 |
"Ben"
Feb 2007
23·163 Posts |
![]()
The large prime bucket sieve (lasched) and trial division (MMX-TD) benefitted from AVX512 as well. The changes were actually very similar to things I did in yafu's SIQS.
Here is a summary of a few tests. Test cases: time ./gnfs-lasieve4I11e -v -f 650000 -c 10000 -a nfs300_650k_I11.job time ./gnfs-lasieve4I12e -v -f 900000 -c 5000 -a nfs330_900k_I12.job time ./gnfs-lasieve4I13e -v -f 1600000 -c 5000 -a nfs360_1600k_I13.job time ./gnfs-lasieve4I13e -v -f 2700000 -c 2000 -a nfs390_2700k_I13.job time ./gnfs-lasieve4I13e -v -f 6500000 -c 2000 -a nfs420_6500k_I13.job time ./gnfs-lasieve4I14e -v -f 15300000 -c 2000 -a nfs512_15300k_I14.job time ./gnfs-lasieve4I16e -v -f 50000000 -c 500 -a nfs631_50000k_I16.job The first 6 .jobs were produced by yafu with default parameters for the indicated semiprime sizes. The last one I pulled from the msieve poly request thread on a GNFS C190 and then used yafu to fill with default (probably bad) parameters. Code:
Original GGNFS: size time (s) 300 41 330 53 360 121 390 44 420 59 512 170 631 613 With AVX512 versions of lasched, lasieve_setup, and MMX-TD size time (s) 300 24 330 32 360 92 390 32 420 41 512 121 631 473 The build process is slightly different now. To make with all of the new code do: cd asm make liblasieve.a AVX512_TD=1 make liblasieveI11.a AVX512_TD=1 make liblasieveI12.a AVX512_TD=1 make liblasieveI13.a AVX512_TD=1 make liblasieveI14.a AVX512_TD=1 make liblasieveI15.a AVX512_TD=1 make liblasieveI16.a AVX512_TD=1 cp *.a .. cd .. make all AVX512_LASCHED=1 AVX512_LASETUP=1 AVX512_TD=1 LASTATS=1 I think the new stuff might only build with icc. That's all I've tested with for now, anyway. If you have it, add CC=icc to the make lines. As before, there is a small speedup with tinyecm if all you have is gcc and a non-avx512 system. To verify it all still works, I factored 2^523-1 using the new sievers. Last fiddled with by bsquared on 2022-10-05 at 17:59 Reason: added snfs job results. fixed the build line |
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
yafu ignoring yafu.ini | chris2be8 | YAFU | 9 | 2022-02-17 17:52 |
YAFU + GGNFS Confirmation | nivek000 | YAFU | 1 | 2021-12-10 22:35 |
Running YAFU via Aliqueit doesn't find yafu.ini | EdH | YAFU | 8 | 2018-03-14 17:22 |
GGNFS or something better? | Zeta-Flux | Factoring | 1 | 2007-08-07 22:40 |
ggnfs | ATH | Factoring | 3 | 2006-08-12 22:50 |