mersenneforum.org > YAFU ggnfs improvements
 Register FAQ Search Today's Posts Mark Forums Read

2022-10-13, 21:54   #34
bsquared

"Ben"
Feb 2007

E9516 Posts

Quote:
 Originally Posted by bsquared Seems to be working for me, so the precompiled binary probably needs the dynamic library -cilkrts. But I'm surprised it doesn't just error out for you. You could try the opencilk git repository. Here's what I get: Code: time ./gnfs-lasieve4I15e -v -f 100000000 -c 1000 -r snfs_100000k_I15.txt gnfs-lasieve4I15e (with asm64,avx-512 mmx-td,avx-512 lasetup,avx-512 lasched,avx-512 sieve1): L1_BITS=15 Warning: lowering FB_bound to 99999999. FBsize 14496980+0 (deg 6), 5761454+0 (deg 1) total yield: 3248, q=100001029 (0.11645 sec/rel) ETA 0h00m) 54 Special q, 422 reduction iterations reports: 132797745->29057200->27213568->21932573->8129900->4480475 Number of relations with k rational and l algebraic primes for (k,l)=: Total yield: 3248 0/0 mpqs failures, 44983/2850 vain mpqs milliseconds total: Sieve 66710 Sched 110120 medsched 4950 TD 117090 (Init 3730, MPQS 62140) Sieve-Change 40, lasieve_setup 79330 TD side 0: init/small/medium/large/search: 1540 10910 1100 8060 10180 sieve: init/small/medium/large/search: 1220 16450 1090 11410 4920 TD side 1: init/small/medium/large/search: 820 7180 890 5550 4550 sieve: init/small/medium/large/search: 1210 19470 1050 8430 1460 421.525u 4.794s 7:12.06 98.6% 0+0k 0+736io 0pf+0w Here's the original for comparison: Code: time ./gnfs-lasieve4I15e-mpqs -v -f 100000000 -c 1000 -r snfs_100000k_I15.txt gnfs-lasieve4I15e (with asm64): L1_BITS=15, SVN $Revision: 430$ Warning: lowering FB_bound to 99999999. FBsize 14496980+0 (deg 6), 5761454+0 (deg 1) total yield: 3357, q=100001029 (0.17463 sec/rel) ETA 0h00m) 54 Special q, 422 reduction iterations reports: 132797299->29063205->27210474->21929840->8129762->4480483 Number of relations with k rational and l algebraic primes for (k,l)=: Total yield: 3357 0/0 mpqs failures, 44987/87347 vain mpqs milliseconds total: Sieve 66740 Sched 132560 medsched 12840 TD 147830 (Init 3470, MPQS 93390) Sieve-Change 30, lasieve_setup 226240 TD side 0: init/small/medium/large/search: 2170 10000 1090 7830 9740 sieve: init/small/medium/large/search: 1260 17260 1110 11360 4220 TD side 1: init/small/medium/large/search: 1560 7200 1050 5340 4460 sieve: init/small/medium/large/search: 1060 19700 840 8760 1170 630.200u 4.517s 10:35.57 99.8% 0+0k 784+760io 1pf+0w Tinyecm is missing some of the 3LP's. I should look into putting mpqs back in there as a fallback if tinyecm fails to factor.
I improved the tinyecm 3LP factorization strategy a bit, and made a 104-bit vectorized version of it. This helps 3LP jobs.

Code:
time ./gnfs-lasieve4I15e -v -f 100000000 -c 1000 -r snfs_100000k_I15.txt
gnfs-lasieve4I15e (with asm64,avx-512 mmx-td,avx-512 lasetup,avx-512 lasched,avx-512 sieve1,avx-512 ecm): L1_BITS=15
Warning:  lowering FB_bound to 99999999.
FBsize 14496980+0 (deg 6), 5761454+0 (deg 1)
total yield: 3292, q=100001029 (0.09862 sec/rel) ETA 0h00m)
54 Special q, 422 reduction iterations
reports: 132797745->29057200->27213568->21932573->8129900->4480475
Number of relations with k rational and l algebraic primes for (k,l)=:

Total yield: 3292
0/0 mpqs failures, 44981/2888 vain mpqs
milliseconds total: Sieve 138570 Sched 0 medsched 34580
TD 77570 (Init 3540, MPQS 25900) Sieve-Change 20, lasieve_setup 73940
TD side 0: init/small/medium/large/search: 1630 10550 970 7180 9370
sieve: init/small/medium/large/search: 1480 17190 930 45940 4380
TD side 1: init/small/medium/large/search: 520 6840 1030 5140 4570
sieve: init/small/medium/large/search: 1130 20930 990 44170 1430
321.303u 4.323s 5:26.59 99.7%   0+0k 2840+752io 1pf+0w
Also updated the static binaries.

Last fiddled with by bsquared on 2022-10-13 at 21:55

 2022-10-14, 04:23 #35 VBCurtis     "Curtis" Feb 2005 Riverside, CA 33×11×19 Posts I recognise that you're not offering to become the official ggnfs dev, but your tinyecm speed enhancements make ggnfs massively more interesting than it was a few months ago for cutting-edge work. The cutting edge would benefit from the 16e siever working properly with -J 16 flag, which would make it effectively 16.5e. This flag works on 15e -J 15, and sometimes works as 16e -J 16 but sometimes crashes. There's a small chance those crashes can be fixed, and even a chance your new code happens to remedy the code path that caused the intermittent crashing. If -J 16 can be used, we in principle could factor SNFS-350 with ggnfs, or GNFS-235ish. Of course, we can just use CADO for extra-large sieve regions... but a new ggnfs revision holds out hope to be BOINCified to extend the life of the big nfs@home queue.
2022-10-14, 10:21   #36
henryzz
Just call me Henry

"David"
Sep 2007
Liverpool (GMT/BST)

24·13·29 Posts

Quote:
 Originally Posted by VBCurtis I recognise that you're not offering to become the official ggnfs dev, but your tinyecm speed enhancements make ggnfs massively more interesting than it was a few months ago for cutting-edge work. The cutting edge would benefit from the 16e siever working properly with -J 16 flag, which would make it effectively 16.5e. This flag works on 15e -J 15, and sometimes works as 16e -J 16 but sometimes crashes. There's a small chance those crashes can be fixed, and even a chance your new code happens to remedy the code path that caused the intermittent crashing. If -J 16 can be used, we in principle could factor SNFS-350 with ggnfs, or GNFS-235ish. Of course, we can just use CADO for extra-large sieve regions... but a new ggnfs revision holds out hope to be BOINCified to extend the life of the big nfs@home queue.
On a side note to this:
Ben is rewriting much of the asm code as it is. How much more difficult would it be to create a version that would support I>16 compared with the work he is currently doing? Would we expect this to use an insane amount of memory(2x + another 2x for 17e vs 16e is my guess) or run loads slower(1.5x to 2x is my guess)?

Multithreading is the other improvement that would be seriously useful especially if it doesn't increase memory useage much over a single thread(and reduces it over multiple parallel runs).

Both of these ideas would probably be quite a bit of work but it appears that the ggnfs siever still has some milage vs the CADO siever.

Last fiddled with by henryzz on 2022-10-14 at 10:22

2022-10-14, 13:16   #37
bsquared

"Ben"
Feb 2007

3,733 Posts

Quote:
 Originally Posted by VBCurtis I recognise that you're not offering to become the official ggnfs dev, but your tinyecm speed enhancements make ggnfs massively more interesting than it was a few months ago for cutting-edge work. The cutting edge would benefit from the 16e siever working properly with -J 16 flag, which would make it effectively 16.5e. This flag works on 15e -J 15, and sometimes works as 16e -J 16 but sometimes crashes. There's a small chance those crashes can be fixed, and even a chance your new code happens to remedy the code path that caused the intermittent crashing. If -J 16 can be used, we in principle could factor SNFS-350 with ggnfs, or GNFS-235ish. Of course, we can just use CADO for extra-large sieve regions... but a new ggnfs revision holds out hope to be BOINCified to extend the life of the big nfs@home queue.
I can look into a fix, sure. Do you know if the crashes are repeatable? And if so, do you have a test case?

2022-10-14, 13:35   #38
wreck

"Bo Chen"
Oct 2005
Wuhan,China

3×61 Posts

The new binary seems a bit faster again, but relations sieved still less than official ggnfs.

533 version
Code:
chenbo@chenbo:~/D/chenbo/my/math/cpp/yafu/yafu533/yafu-master/factor/lasieve4_64$./gnfs-lasieve4I16e -v -f 316000000 -c 1000 -o R1340L_16e_a_316000000_316001000.out -a R1340L_poly.txt -R gnfs-lasieve4I16e (with asm64,avx-512 mmx-td,avx-512 lasetup,avx-512 lasched,avx-512 sieve1): L1_BITS=15 Resuming with -f 316000000 -c 1000 Warning: lowering FB_bound to 315999999. FBsize 17068601+0 (deg 8), 26355865+0 (deg 1) total yield: 1016, q=316001009 (0.74686 sec/rel) ETA 0h00m) 48 Special q, 369 reduction iterations reports: 196252933->18535990->16754616->14918491->7199150->2604600 Number of relations with k rational and l algebraic primes for (k,l)=: Total yield: 1016 0/0 mpqs failures, 615/5273 vain mpqs milliseconds total: Sieve 153464 Sched 322285 medsched 342 TD 150804 (Init 2311, MPQS 47217) Sieve-Change 34, lasieve_setup 131883 TD side 0: init/small/medium/large/search: 2054 25496 442 18705 8146 sieve: init/small/medium/large/search: 3440 40229 355 22448 7287 TD side 1: init/small/medium/large/search: 2634 18516 440 17998 6464 sieve: init/small/medium/large/search: 1880 52551 356 23756 1162 chenbo@chenbo:~/D/chenbo/my/math/cpp/yafu/yafu533/yafu-master/factor/lasieve4_64$
535 version
Code:
chenbo@chenbo:~/D/chenbo/my/math/ggnfs/Repunit/R1340L/R1340L_3$./cmd_R1340L_31.sh ./gnfs-lasieve4I16e_bsquared2 -v -f 316000000 -c 1000 -o R1340L_16e_a_316000000_316001000.out -a R1340L_poly.txt -R gnfs-lasieve4I16e (with asm64,avx-512 mmx-td,avx-512 lasetup,avx-512 lasched,avx-512 sieve1): L1_BITS=15 Resuming with -f 316000000 -c 1000 Warning: lowering FB_bound to 315999999. FBsize 17068601+0 (deg 8), 26355865+0 (deg 1) total yield: 1046, q=316001009 (0.70359 sec/rel) ETA 0h00m) 48 Special q, 369 reduction iterations reports: 196252933->18535990->16754616->14918491->7199150->2604600 Number of relations with k rational and l algebraic primes for (k,l)=: Total yield: 1046 0/0 mpqs failures, 620/5273 vain mpqs milliseconds total: Sieve 383352 Sched 0 medsched 98730 TD 121986 (Init 2296, MPQS 17974) Sieve-Change 35, lasieve_setup 131862 TD side 0: init/small/medium/large/search: 2081 25563 445 18850 8096 sieve: init/small/medium/large/search: 3398 40614 366 136886 7334 TD side 1: init/small/medium/large/search: 2637 18572 436 18108 6545 sieve: init/small/medium/large/search: 1959 52980 368 138268 1179 chenbo@chenbo:~/D/chenbo/my/math/ggnfs/Repunit/R1340L/R1340L_3$

official 441 version (compiled by myself using Edh's method)
Code:
chenbo@chenbo:~/D/chenbo/my/math/ggnfs/Repunit/R1340L/R1340L_3$./cmd_R1340L_31.sh ./gnfs-lasieve4I16e_boc_L15 -v -f 316000000 -c 1000 -o R1340L_16e_a_316000000_316001000.out -a R1340L_poly.txt -R gnfs-lasieve4I16e (with asm64): L1_BITS=15, SVN$Revision: 430 $Resuming with -f 316000000 -c 1000 Warning: lowering FB_bound to 315999999. FBsize 17068601+0 (deg 8), 26355865+0 (deg 1) total yield: 1242, q=316001009 (0.92359 sec/rel) ETA 0h00m) 48 Special q, 369 reduction iterations reports: 196265020->18544215->16754559->14918439->7199123->2604605 Number of relations with k rational and l algebraic primes for (k,l)=: Total yield: 1242 3/0 mpqs failures, 59841/21024 vain mpqs milliseconds total: Sieve 533510 Sched 0 medsched 91584 TD 166418 (Init 2157, MPQS 55174) Sieve-Change 355597 TD side 0: init/small/medium/large/search: 5852 24449 412 17979 7413 sieve: init/small/medium/large/search: 3872 44722 347 207775 7318 TD side 1: init/small/medium/large/search: 11239 17804 415 17312 5834 sieve: init/small/medium/large/search: 2917 56320 347 208746 1147 chenbo@chenbo:~/D/chenbo/my/math/ggnfs/Repunit/R1340L/R1340L_3$
used polynomial file attached.
Attached Files
 R1340L_poly.txt (903 Bytes, 38 views)

2022-10-14, 13:52   #39
bsquared

"Ben"
Feb 2007

E9516 Posts

Quote:
 Originally Posted by wreck The new binary seems a bit faster again, but relations sieved still less than official ggnfs. 535 version Code: chenbo@chenbo:~/D/chenbo/my/math/ggnfs/Repunit/R1340L/R1340L_3$./cmd_R1340L_31.sh ./gnfs-lasieve4I16e_bsquared2 -v -f 316000000 -c 1000 -o R1340L_16e_a_316000000_316001000.out -a R1340L_poly.txt -R gnfs-lasieve4I16e (with asm64,avx-512 mmx-td,avx-512 lasetup,avx-512 lasched,avx-512 sieve1): L1_BITS=15 Resuming with -f 316000000 -c 1000 Warning: lowering FB_bound to 315999999. FBsize 17068601+0 (deg 8), 26355865+0 (deg 1) total yield: 1046, q=316001009 (0.70359 sec/rel) ETA 0h00m) 48 Special q, 369 reduction iterations reports: 196252933->18535990->16754616->14918491->7199150->2604600 Number of relations with k rational and l algebraic primes for (k,l)=: Total yield: 1046 0/0 mpqs failures, 620/5273 vain mpqs milliseconds total: Sieve 383352 Sched 0 medsched 98730 TD 121986 (Init 2296, MPQS 17974) Sieve-Change 35, lasieve_setup 131862 TD side 0: init/small/medium/large/search: 2081 25563 445 18850 8096 sieve: init/small/medium/large/search: 3398 40614 366 136886 7334 TD side 1: init/small/medium/large/search: 2637 18572 436 18108 6545 sieve: init/small/medium/large/search: 1959 52980 368 138268 1179 chenbo@chenbo:~/D/chenbo/my/math/ggnfs/Repunit/R1340L/R1340L_3$ official 441 version (compiled by myself using Edh's method) Code: chenbo@chenbo:~/D/chenbo/my/math/ggnfs/Repunit/R1340L/R1340L_3$./cmd_R1340L_31.sh ./gnfs-lasieve4I16e_boc_L15 -v -f 316000000 -c 1000 -o R1340L_16e_a_316000000_316001000.out -a R1340L_poly.txt -R gnfs-lasieve4I16e (with asm64): L1_BITS=15, SVN$Revision: 430 $Resuming with -f 316000000 -c 1000 Warning: lowering FB_bound to 315999999. FBsize 17068601+0 (deg 8), 26355865+0 (deg 1) total yield: 1242, q=316001009 (0.92359 sec/rel) ETA 0h00m) 48 Special q, 369 reduction iterations reports: 196265020->18544215->16754559->14918439->7199123->2604605 Number of relations with k rational and l algebraic primes for (k,l)=: Total yield: 1242 3/0 mpqs failures, 59841/21024 vain mpqs milliseconds total: Sieve 533510 Sched 0 medsched 91584 TD 166418 (Init 2157, MPQS 55174) Sieve-Change 355597 TD side 0: init/small/medium/large/search: 5852 24449 412 17979 7413 sieve: init/small/medium/large/search: 3872 44722 347 207775 7318 TD side 1: init/small/medium/large/search: 11239 17804 415 17312 5834 sieve: init/small/medium/large/search: 2917 56320 347 208746 1147 chenbo@chenbo:~/D/chenbo/my/math/ggnfs/Repunit/R1340L/R1340L_3$ used polynomial file attached.
Thank you for the data. It's true that you will get fewer relations when using this ECM approach; that is one of the tradeoffs. The amount of effort ECM uses can be adjusted (in theory), but even so it will likely not find every 3LP relation that mpqs does. The good news is that if ECM doesn't find a factor that MPQS would have found, a probable reason is that the primes involved in that 3LP are larger, in which case they are less likely to participate in an eventual cycle. On the other hand given the amount of work it takes to discover each potential relation, it seems wasteful to throw some away that could be found with just a little more work.

So. The real answer is that I don't know to what degree this will impact an overall factorization quite yet. It will depend on the parameters chosen.

If someone can suggest an easier candidate number that uses 3LP, I can try to run some tests with a variety of parameter choices.

2022-10-14, 15:32   #40
bsquared

"Ben"
Feb 2007

3,733 Posts

Quote:
 Originally Posted by wreck The new binary seems a bit faster again, but relations sieved still less than official ggnfs.
After further review of this job I realized there is an error in the current implementation. I did not allow for the possibility that the residue after tinyecm finds a 1st factor could be larger than 64 bits. When lpbr/a are larger than 32, this can definitely happen. After fixing that the situation is better:

Code:
time ./gnfs-lasieve4I16e -v -f 316000000 -c 1000 -a R1340L_poly.txt -o R1340L_16e_a_316000000_316001000.out.2
gnfs-lasieve4I16e (with asm64,avx-512 mmx-td,avx-512 lasetup,avx-512 lasched,avx-512 sieve1,avx-512 ecm): L1_BITS=15
Warning:  lowering FB_bound to 315999999.
FBsize 26351441+0 (deg 8), 26355865+0 (deg 1)
total yield: 1142, q=316001009 (0.83914 sec/rel) ETA 0h00m)
48 Special q, 369 reduction iterations
reports: 239715573->22542070->20471524->18368663->7200755->2605199
Number of relations with k rational and l algebraic primes for (k,l)=:

Total yield: 1142
0/0 mpqs failures, 697/5271 vain mpqs
milliseconds total: Sieve 210380 Sched 414030 medsched 1040
TD 156050 (Init 3290, MPQS 25890) Sieve-Change 50, lasieve_setup 176760
TD side 0: init/small/medium/large/search: 2660 31640 1260 22950 13090
sieve: init/small/medium/large/search: 4160 50670 960 35690 11850
TD side 1: init/small/medium/large/search: 3150 22390 1310 21520 5860
sieve: init/small/medium/large/search: 3900 66380 1120 33230 2420
945.404u 15.583s 16:01.43 99.9% 0+0k 2840+288io 1pf+0w
I will get the new code checked in right away. Thanks again for your post!

Also, FYI, when using tinyecm there is no longer a 96-bit limitation on mfbr/a. With AVX512_ECM the limit is 104 and otherwise it is 128. Maybe this is helpful for 3LP jobs with lpbr/a > 32.

2022-10-14, 15:54   #41
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

33·11·19 Posts

Quote:
 Originally Posted by bsquared I can look into a fix, sure. Do you know if the crashes are repeatable? And if so, do you have a test case?
When it crashes, the crash is repeatable. I'm busy today and tomorrow but I should be able to generate a test case in a couple days.

2022-10-14, 16:17   #42
bsquared

"Ben"
Feb 2007

3,733 Posts

Quote:
 Originally Posted by henryzz On a side note to this: Ben is rewriting much of the asm code as it is. How much more difficult would it be to create a version that would support I>16 compared with the work he is currently doing? Would we expect this to use an insane amount of memory(2x + another 2x for 17e vs 16e is my guess) or run loads slower(1.5x to 2x is my guess)? Multithreading is the other improvement that would be seriously useful especially if it doesn't increase memory useage much over a single thread(and reduces it over multiple parallel runs). Both of these ideas would probably be quite a bit of work but it appears that the ggnfs siever still has some milage vs the CADO siever.
If L1_BITS is set to 16 then it's possible to build and run gnfs-lasieve4I17e... but it fails with a complaint right away:

Code:
time ./gnfs-lasieve4I17e -v -f 316000000 -c 500 -a R1340L_poly.txt -o R1340L_17e_a_316000000_316000500.out.1
gnfs-lasieve4I17e (with asm64,avx-512 mmx-td,avx-512 lasetup,avx-512 lasched,avx-512 sieve1,avx-512 ecm): L1_BITS=16
Warning:  lowering FB_bound to 315999999.
FBsize 26351441+0 (deg 8), 26355865+0 (deg 1)
Recurrence init: A=65536 exceeds 65535
2.490u 0.193s 0:02.92 91.7%     0+0k 2088+0io 1pf+0w
All sorts of code uses uint16_t's. It looks pretty difficult to change that, even if the asm code can be ignored with the new updates. I don't know the memory impact exactly but 2x sounds like a good estimate. Speed impact is harder to estimate. I don't think it will be 2x slower because a good chunk of the time is spent in bucket sieving which already uses 32-bit arithmetic and only depends on L1_BITS, which can remain at 16. Anyway the work involved is too much to contemplate now... especially if 16.5e can be made to work as a stopgap.

 2022-10-15, 00:44 #43 wreck     "Bo Chen" Oct 2005 Wuhan,China 3·61 Posts 1. Validity check. For safety reason, I write a small program and have verified that for the small test relation set (R1340L, q=316M, -c 1000), the avx512 ggnfs version's relations are all belong to the official ggnfs version's. 2. Other run result. Another two persons run the 533 binary under Linux, all failed. Linux on AMD-computer: AVX512-error Linux on Intel-computer: illegal command. I am curious if without using avx512 instruction set, whether ggnfs is still faster when using ecm-tiny. And if there is a method to detect the CPU not has avx512, it would be better to use native code automatically. 3. About the local build. When I build gnfs-lasieve4I16e, it pop some errors, I change the Makefile under lasieve_64/asm, add a line "CC=icc", then build again, the gnfs-lasieve4I16e could build successfully, and also could run smoothly. When building, there is a warning says a library has no static version and it run using dynamic version.
2022-10-15, 01:16   #44
charybdis

Apr 2020

929 Posts

Quote:
 Originally Posted by bsquared Also, FYI, when using tinyecm there is no longer a 96-bit limitation on mfbr/a. With AVX512_ECM the limit is 104 and otherwise it is 128. Maybe this is helpful for 3LP jobs with lpbr/a > 32.
This limitation was already removed in lasieve5. Have you been modifying lasieve4 or lasieve5?

 Similar Threads Thread Thread Starter Forum Replies Last Post chris2be8 YAFU 9 2022-02-17 17:52 nivek000 YAFU 1 2021-12-10 22:35 EdH YAFU 8 2018-03-14 17:22 Zeta-Flux Factoring 1 2007-08-07 22:40 ATH Factoring 3 2006-08-12 22:50

All times are UTC. The time now is 07:18.

Tue Feb 7 07:18:50 UTC 2023 up 173 days, 4:47, 1 user, load averages: 1.00, 1.07, 0.99