mersenneforum.org > YAFU ggnfs improvements

2022-10-15, 03:34   #45
bsquared

"Ben"
Feb 2007

3,733 Posts

Quote:
 Originally Posted by wreck 1. Validity check. To be safe, I wrote a small program and verified that, for the small test relation set (R1340L, q=316M, -c 1000), all of the AVX512 ggnfs version's relations also appear in the official ggnfs version's output.
Thank you! I have also done similar tests.
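A minimal sketch of the kind of subset check described above, written in C since that is what the sievers use. It assumes each relation is one text line whose `a,b` pair comes before the first `:`, and it compares only that pair, because two builds may print the same relation's factors in a different order. `count_missing`, `key_len` and `cmp_key` are hypothetical names, not part of ggnfs or YAFU, and reading the two .out files into arrays of lines is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of a relation's key: the text before the first ':'
   (assumed to be the "a,b" pair), or the whole line if there is none. */
static size_t key_len(const char *rel) {
    const char *colon = strchr(rel, ':');
    return colon ? (size_t)(colon - rel) : strlen(rel);
}

/* qsort/bsearch comparator on keys; array elements are `const char *`. */
static int cmp_key(const void *x, const void *y) {
    const char *a = *(const char *const *)x;
    const char *b = *(const char *const *)y;
    size_t la = key_len(a), lb = key_len(b);
    int c = memcmp(a, b, la < lb ? la : lb);
    if (c != 0)
        return c;
    return (la > lb) - (la < lb);   /* shorter key sorts first */
}

/* Counts relations in `test` whose key does not occur in `ref`.
   Sorts `ref` in place; a result of 0 means every (a,b) pair in
   `test` was also found by the reference siever. */
static size_t count_missing(const char **test, size_t ntest,
                            const char **ref, size_t nref) {
    assert(ntest == 0 || test != NULL);
    if (nref == 0)
        return ntest;
    qsort(ref, nref, sizeof *ref, cmp_key);
    size_t missing = 0;
    for (size_t i = 0; i < ntest; i++)
        if (bsearch(&test[i], ref, nref, sizeof *ref, cmp_key) == NULL)
            missing++;
    return missing;
}
```

Sorting the reference set once and then using `bsearch` keeps the check at roughly O((n + m) log m) for n test and m reference relations; a hash set would work just as well.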

Quote:
 Originally Posted by wreck 2. Other run results. Two other people ran the 533 binary under Linux, and both runs failed: on an AMD computer it gave an AVX512 error, and on an Intel computer an illegal instruction error. I am curious whether ggnfs is still faster with tinyecm when the AVX512 instruction set is not used. And if there were a way to detect that the CPU lacks AVX512, it would be better to fall back to the native code automatically.
Putting some usability checks in there is something I've been meaning to do, to make sure the cpu has the required instructions before proceeding. Looks like sooner rather than later would be good.
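One possible shape for such a CPU check, sketched in C under these assumptions: a GCC-compatible compiler that provides `__builtin_cpu_init` and `__builtin_cpu_supports` (GCC and Clang do, and icx is LLVM-based), and AVX512F plus AVX512BW as the required subsets, which is a guess rather than the sievers' actual requirement list. `cpu_has_avx512` and `check_cpu_or_warn` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Nonzero if the running CPU reports AVX512F and AVX512BW.
   The exact subsets the sievers need are an assumption here. */
static int cpu_has_avx512(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  /* strictly needed only in constructors/ifunc resolvers */
    return __builtin_cpu_supports("avx512f") &&
           __builtin_cpu_supports("avx512bw");
#else
    return 0;  /* unknown compiler or architecture: assume no AVX512 */
#endif
}

/* Hypothetical startup check: returns 1 if it is safe to continue,
   otherwise prints a readable message instead of letting the program
   die later with "Illegal instruction" (SIGILL). */
static int check_cpu_or_warn(const char *prog) {
    assert(prog != NULL);
    if (cpu_has_avx512())
        return 1;
    fprintf(stderr, "%s: built for AVX512, but this CPU does not report it; "
                    "please use a non-AVX512 build\n", prog);
    return 0;
}
```

The check only helps if it runs before any AVX512 instruction executes, so it would need to live in a translation unit compiled without AVX512 flags; otherwise the compiler may already have emitted AVX512 code on the path leading to it.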

Quote:
 Originally Posted by wreck 3. About the local build. When I built gnfs-lasieve4I16e, it produced some errors. I changed the Makefile under lasieve_64/asm by adding a line "CC=icc" and built again; gnfs-lasieve4I16e then built successfully and also ran smoothly. During the build there was a warning saying that a library has no static version, so the dynamic version is used.
Yes, you can also add CC=icc on the make line. Here is how I build:

Code:
cd asm/
make liblasieve.a CC=icc AVX512_TD=1
make liblasieveI11.a CC=icc AVX512_TD=1
make liblasieveI12.a CC=icc AVX512_TD=1
make liblasieveI13.a CC=icc AVX512_TD=1
make liblasieveI14.a CC=icc AVX512_TD=1
make liblasieveI15.a CC=icc AVX512_TD=1
make liblasieveI16.a CC=icc AVX512_TD=1
cd ..
cp asm/liblasieve*.a .
make all CC=icc AVX512_ALL=1 LASTATS=1

The LASTATS=1 flag on that last line is optional; it provides timing for lasched, and more accurate timings for the other categories, if you run with -v.

Quote:
 Originally Posted by charybdis This limitation was already removed in lasieve5. Have you been modifying lasieve4 or lasieve5?
4

2022-10-15, 12:16   #46
charybdis

Apr 2020

3A1₁₆ Posts

Quote:
 Originally Posted by bsquared 4
Would it be possible to make similar changes to lasieve5, as that's what NFS@Home 16e uses? It's already slightly faster than lasieve4.

2022-10-15, 15:04   #47
bsquared

"Ben"
Feb 2007

3,733 Posts

Quote:
 Originally Posted by charybdis Would it be possible to make similar changes to lasieve5, as that's what NFS@Home 16e uses? It's already slightly faster than lasieve4.
I assume it's similar enough that the changes would also apply, but I've never seen it so I can't say for sure.

Is this it? Or is there somewhere else?

2022-10-16, 17:06   #48
bsquared

"Ben"
Feb 2007

3,733 Posts

I discovered that the missing factors in tinyecm processing of lpbr/a > 32 jobs all come from a fairly specific class of inputs... namely 2LPs composed of two factors >= 32 bits, such that the input large factor is greater than 64 bits but <= lpbr/a*2 in size. Fortunately, these are easy to identify and split using either mpqs or more effort in tinyecm. Now we find almost all the factors that pure mpqs does, still at a small fraction of the effort. Very large 3LPs may still be missed here and there, but I expect this factor-finding rate should largely hold.

Code:
time ./gnfs-lasieve4I16e -v -f 316000000 -c 1000 -a R1340L_poly.txt -o R1340L_16e_a_316000000_316001000.out.12
gnfs-lasieve4I16e (with asm64,avx-512 mmx-td,avx-512 lasetup,avx-512 lasched,avx-512 sieve1,avx-512 ecm): L1_BITS=15
Warning: lowering FB_bound to 315999999.
FBsize 26351441+0 (deg 8), 26355865+0 (deg 1)
total yield: 1242, q=316001009 (0.77841 sec/rel) ETA 0h00m)
48 Special q, 369 reduction iterations
reports: 239715573->22542070->20471524->18368663->7200755->2605199
Number of relations with k rational and l algebraic primes for (k,l)=:

Total yield: 1242
0/0 mpqs failures, 1108/20196 vain mpqs
milliseconds total: Sieve 210330 Sched 416710 medsched 840
TD 161120 (Init 4220, MPQS 30740) Sieve-Change 30, lasieve_setup 177760
TD side 0: init/small/medium/large/search: 2420 32510 900 22630 12730
sieve: init/small/medium/large/search: 3370 50940 1320 34640 11040
TD side 1: init/small/medium/large/search: 3110 22690 1120 21560 5470
sieve: init/small/medium/large/search: 3810 68060 1130 33790 2230
953.632u 15.924s 16:10.01 99.9% 0+0k 2104+312io 1pf+0w

New code has been checked in.
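The input class described in post #48 can be expressed as a size test on the cofactor. The C sketch below assumes `lpb` is the large-prime bound in bits (lpbr or lpba) and that the cofactor is held in two 64-bit words; `bit_length` and `is_wide_2lp_candidate` are hypothetical helpers, not the checked-in code. Since both primes of a 2LP are below 2^lpb, their product has at most 2*lpb bits, so the window (64, 2*lpb] is empty whenever lpb <= 32, which fits the observation that only lpbr/a > 32 jobs were losing factors.

```c
#include <assert.h>
#include <stdint.h>

/* Bit length of a cofactor stored as two 64-bit words (hi:lo). */
static int bit_length(uint64_t hi, uint64_t lo) {
    uint64_t top = hi ? hi : lo;
    int n = 0;
    while (top) {          /* count significant bits of the top word */
        n++;
        top >>= 1;
    }
    return hi ? n + 64 : n;
}

/* Nonzero if a composite cofactor of `bits` bits lies in the window
   described above: wider than 64 bits, yet no wider than 2*lpb bits,
   so it may be a 2LP whose two primes are both at least 32 bits.
   Such inputs would be handed to mpqs (or to a harder tinyecm pass).
   For lpb <= 32 the window is empty. */
static int is_wide_2lp_candidate(int bits, int lpb) {
    assert(bits >= 0 && lpb > 0);
    return bits > 64 && bits <= 2 * lpb;
}
```

For example, with lpb = 33 only 65- and 66-bit cofactors fall in the window.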
2022-10-16, 18:11   #49
kruoli

"Oliver"
Sep 2017
Porta Westfalica, DE

2²×3²×37 Posts

Would you mind trying to build it as C99, so that your compiler complains about implicit declarations (maybe with -Werror)? I can then give it a try with ICX, since that will eliminate a lot of guesswork. Thanks.
2022-10-16, 18:27   #50
bsquared

"Ben"
Feb 2007

3,733 Posts

Quote:
 Originally Posted by kruoli Would you mind trying to build it as C99, so that your compiler complains about implicit declarations (maybe with -Werror)? I can then give it a try with ICX, since that will eliminate a lot of guesswork. Thanks.
I am in the middle of doing that now with a newly installed icx from here. Lots of implicit function declarations here, oof.
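For anyone following along, a small C illustration of why implicit declarations are worth hunting down. Before C99, calling a function with no visible declaration made the compiler assume it returns `int`; GCC and Clang-based compilers such as icx report this with `-Wimplicit-function-declaration`, and `-Werror=implicit-function-declaration` makes it fatal. `scale_factor` and `scaled` are hypothetical names; the fix shown is simply a prototype before first use.

```c
#include <assert.h>

/* Prototype before first use (normally in a header). Without it, a
   pre-C99 compiler assumes `int scale_factor()`, so the double result
   would be read back as an int: undefined behavior that tends to show
   up only as wrong numbers. With -std=c99 and
   -Werror=implicit-function-declaration the omission becomes a build
   error instead. */
double scale_factor(int k);

static double scaled(int k, double x) {
    return x * scale_factor(k);
}

/* Hypothetical function, deliberately defined after its first use. */
double scale_factor(int k) {
    assert(k >= 0 && k < 31);
    return 1.0 / (double)(1 << k);
}
```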

2022-10-16, 18:45   #51
kruoli

"Oliver"
Sep 2017
Porta Westfalica, DE

10100110100₂ Posts

At least for the YAFU code, I should have resolved the vast majority of them in my other thread. Great to hear you're trying this!
2022-10-16, 23:07   #52
bsquared

"Ben"
Feb 2007

3,733 Posts

The sievers should now build with CC=icx, including all of the new AVX512 code. For others who may not know: icx can be downloaded for free from Intel. I was not aware of this until a few days ago. If you wouldn't mind doing some sanity checking by comparing small runs of the new versions against the old/original sievers, I would appreciate it.
2022-10-17, 14:25   #53
bsquared

"Ben"
Feb 2007

E95₁₆ Posts

Quote:
 Originally Posted by VBCurtis I recognise that you're not offering to become the official ggnfs dev, but your tinyecm speed enhancements make ggnfs massively more interesting for cutting-edge work than it was a few months ago. The cutting edge would benefit from the 16e siever working properly with the -J 16 flag, which would make it effectively 16.5e. This flag works with 15e (-J 15), and sometimes works with 16e (-J 16) but sometimes crashes. There's a small chance those crashes can be fixed, and even a chance your new code happens to remedy the code path that caused the intermittent crashing. If -J 16 can be used, we could in principle factor SNFS-350 with ggnfs, or GNFS-235ish. Of course, we can just use CADO for extra-large sieve regions... but a new ggnfs revision holds out hope of being BOINCified to extend the life of the big nfs@home queue.
I tried to run wreck's poly using -J 16 and it immediately errors out:

Code:
./gnfs-lasieve4I16e -v -f 316000000 -c 1000 -a R1340L_poly.txt -J 16 -o R1340L_16e_a_316000000_316001000.out
gnfs-lasieve4I16e (with asm64,avx-512 mmx-td,avx-512 lasetup,avx-512 lasched,avx-512 sieve1,avx-512 ecm): L1_BITS=15
Warning:  lowering FB_bound to 315999999.
FBsize 26351441+0 (deg 8), 26355865+0 (deg 1)
Recurrence init: ub=32768 exceeds 16384

Maybe this is one of the things lasieve5 resolves?
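The "effectively 16.5e" figure can be checked with a little arithmetic on the sieve area. This assumes, without having checked the lasieve4 source, that an I-bit siever covers 2^I values of i and by default 2^(I-1) values of j, and that -J sets log2 of the j range:

```latex
\mathrm{area}(I, J) = 2^{I} \cdot 2^{J}, \qquad J_{\mathrm{default}} = I - 1
\;\Rightarrow\; \mathrm{area}(I, I - 1) = 2^{2I - 1}

\mathrm{area}(16, 16) = 2^{32} = 2^{2 \cdot 16.5 - 1}

\mathrm{area}(15, 15) = 2^{30} = 2^{2 \cdot 15.5 - 1}
```

Under that assumption, raising J by one doubles the area, which is what an extra half-step in I would give.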

2022-10-17, 14:52   #54
wreck

"Bo Chen"
Oct 2005
Wuhan,China

B7₁₆ Posts

Quote:
 Originally Posted by bsquared I assume it's similar enough that the changes would also apply, but I've never seen it so I can't say for sure. Is this it? Or is there somewhere else?
I think it is. This source needs the cweb tools; I think that means the ctangle command has to be installed and working.

I was able to build that source 5 years ago, but strangely, I can't compile it now.

There is also a commit from Greg on GitHub (search for lasieve5 on GitHub and you will find Greg's repository); perhaps it is newer. The version on Greg's GitHub still does not support degree 8, but I think he has already finished that code, since NFS@Home handles degree 8 normally.

2022-10-17, 16:05   #55
charybdis

Apr 2020

1641₈ Posts

Quote:
 Originally Posted by bsquared I assume it's similar enough that the changes would also apply, but I've never seen it so I can't say for sure. Is this it? Or is there somewhere else?
I can't remember where I got it from. I do recall having trouble compiling it; I have a feeling that I couldn't get the code in that post to build, even with the changes in that thread. I think I found some corrected code somewhere else on the forum. Sorry I can't be more helpful.


All times are UTC. The time now is 10:19.
