2022-11-08, 18:58  #78
"Ben"
Feb 2007
111010110100_{2} Posts 
Thank you very much!
Found and fixed, uploaded just now.
Code:
time ./gnfs-lasieve4I12e -k -v -n0 -a c1032.job.T1 -o test.out
gnfs-lasieve4I12e (with asm64,avx512 mmxtd,lasetup,lasched,sieve1,ecm,tds0,search0,tdsched): L1_BITS=15
FBsize 100777+0 (deg 4), 171534+0 (deg 1)
Sorted factor base on side 0: 1: 32495 2: 32397
Sorted factor base on side 1: 1: 168023
total yield: 155958, q=1327517 (0.00051 sec/rel) ETA 0h00m)
937 Special q, 1411 reduction iterations
reports: 165365787>20138278>18040597>11058350>11056552>10304931>321750 (5306059)
Total yield: 155958
milliseconds total: Sieve 24660 Sched 8950 medsched 6040
TD 25660 (Init 970, MPQS 3550) SieveChange 13800
TD side 0: init/small/medium/large/search: 790 2510 930 1140 6040
sieve: init/small/medium/large/search: 1470 10160 750 860 1170
TD side 1: init/small/medium/large/search: 130 2460 1010 810 4910
sieve: init/small/medium/large/search: 640 6140 990 1180 1300
aborts: 0 0
Expected yield/cost: 4.15e+04 0
p1: 0 tests, 0 successes
ecm: 0 tests, 0 successes
MPQSAUX 0
COF: 280255 tests, 0 ecm, 0 aux: 0 mpqs, 0 mpqs3, 0 ecm, 0 too big
77.763u 1.444s 1:19.53 99.5% 0+0k 1648+23208io 1pf+0w
2022-11-09, 16:38  #79
Sep 2009
4627_{8} Posts 
Why does c1032.job.T1 contain:
Quote:
Code:
print OUTF "m: $M\n" if ($M);
Code:
print OUTF "n: $N\nm: $M\n";

2022-11-09, 16:56  #80
"Ben"
Feb 2007
2^{2}×941 Posts 
Quote:
It was an inexact translation of #defines like this to AVX512 in lasieve_prepn:
#define A1MOD0(p) ((aux = absa1%p) > 0 ? p-aux : 0)
where I neglected to account for the "else" case (where absa1%p == 0).
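To illustrate the kind of slip described above, here is a minimal scalar sketch of what a single vector lane computes. The function names are hypothetical and this is not the code from lasieve_prepn; it only assumes the macro computes p minus the remainder, or 0 when the remainder is 0. Dropping the conditional makes a lane return p instead of 0 whenever p divides absa1, and a mask (standing in for an AVX512 write mask) restores the intended result.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference, following the macro quoted above:
   p - (absa1 % p), or 0 when p divides absa1. */
static uint32_t a1mod0_scalar(uint32_t absa1, uint32_t p)
{
    assert(p != 0);
    uint32_t aux = absa1 % p;
    return aux > 0 ? p - aux : 0;
}

/* What one lane computes if the conditional is dropped during
   vectorization: a zero remainder gives p instead of 0. */
static uint32_t a1mod0_lane_naive(uint32_t absa1, uint32_t p)
{
    assert(p != 0);
    return p - absa1 % p;
}

/* Branch-free per-lane form: an all-ones or all-zeros mask plays the
   role of a vector write mask and zeroes lanes whose remainder is 0. */
static uint32_t a1mod0_lane_masked(uint32_t absa1, uint32_t p)
{
    assert(p != 0);
    uint32_t aux = absa1 % p;
    uint32_t keep = (uint32_t)0 - (uint32_t)(aux != 0); /* 0xFFFFFFFF or 0 */
    return (p - aux) & keep;
}
```

With absa1 = 15 and p = 5, the naive lane returns 5 while the scalar reference and the masked lane both return 0; for inputs with a nonzero remainder all three agree.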

2022-11-18, 16:14  #81
"Bo Chen"
Oct 2005
Wuhan,China
2×3×31 Posts 
lpb34 crash
When sieving R942 with lpb34, avx512lasieve5 crashes immediately,
while the original lasieve5 works properly. The command is ./gnfs-lasieve4I16e -v -f 380000000 -c 1000 -o R942_16e_r_380000000_380001000.out -r R942_poly.txt -R R942_poly.txt file attached.
2022-11-20, 16:04  #82
"Ben"
Feb 2007
2^{2}×941 Posts 
Fixed and checked in! Thanks for the report!

2022-11-25, 12:53  #83
"Bo Chen"
Oct 2005
Wuhan,China
BA_{16} Posts 
Thanks for the fix.
Another point of confusion: with rlim and alim set to 500000000, version 551 (the newest) is about 10% slower than version 550 (the second newest). I'm not sure whether changing _mm512_set1_epi32(ij_ub) to _mm512_set1_epu32(ij_ub), or some similar modification, would resolve the slowdown.
2023-02-10, 21:48  #84
Jul 2003
So Cal
101001010111_{2} Posts 
Quote:


2023-02-10, 22:33  #85
"Ben"
Feb 2007
2^{2}×941 Posts 
Quote:


2023-02-11, 01:24  #86
Jul 2003
So Cal
2,647 Posts 
Just those two SVML intrinsics? Then let's see if we can replace them without a performance hit. It'll make it much easier to work with if it's not tied to the Intel compilers.

2023-02-11, 01:31  #87
"Ben"
Feb 2007
2^{2}·941 Posts 
Yep, pretty sure just those. I started to do that, based on code I wrote a long time ago that used the fast reciprocal intrinsic (_mm512_rcp14_pd, I think), followed by a couple of rounds of Newton iteration. But I didn't finish it to the point where it works. I will try to find it.
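For readers unfamiliar with the approach mentioned above, here is a minimal scalar sketch of refining a low-precision reciprocal with Newton iteration. It is not the unfinished code referred to in the post. The helper approx_recip14 is a hypothetical stand-in for a hardware estimate with roughly 14 good bits: it cheats by dividing and then discarding mantissa bits, purely so the example runs anywhere. Each Newton step r = r * (2 - d * r) roughly doubles the number of correct bits, so two rounds take about 14 bits to roughly full double precision.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a fast reciprocal estimate: compute 1/d,
   then clear the low 38 of the 52 mantissa bits so that only about
   14 bits of precision remain. */
static double approx_recip14(double d)
{
    assert(d != 0.0);
    double r = 1.0 / d;
    uint64_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= ~((UINT64_C(1) << 38) - 1);
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Refine the estimate with `rounds` Newton steps for f(r) = 1/r - d,
   i.e. r <- r * (2 - d * r). The relative error squares on each step. */
static double newton_recip(double d, int rounds)
{
    double r = approx_recip14(d);
    for (int i = 0; i < rounds; i++)
        r = r * (2.0 - d * r);
    return r;
}
```

In a vector version the loop body would map to one multiply and one fused multiply-add per round, which is why this can compete with a full-precision divide.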

2023-02-16, 21:02  #88
"Ben"
Feb 2007
7264_{8} Posts 
Also needed a replacement for _mm512_rem_epu64, which was a taller order.
I put in a 52-bit vector Barrett multiplier to do the job of the _mm512_rem_epu64 SVML intrinsic (in modmul32_16). Replacing the 32-bit div/rem SVML intrinsics was fairly straightforward using double-precision floating point divides instead.
So the good news is that the lasieve5_64 project will now compile with GCC (tested with gcc 11.1.0)! The bad news is that some of the AVX512 routines cause segfaults when built with GCC. Testing them one at a time, these seem to work: ECM, LASETUP, TDS0. LASCHED, SIEVE1, TDSCHED, SEARCH, and TD all lead to segfaults.
I initially suspected alignment issues, since icc/icx is very lenient on alignment (mallocs all get an alignment to 64B boundaries automatically, I think, and load/store get compiled to loadu/storeu since there is no difference in speed). But so far, working with LASCHED, that isn't helping. I'll keep working at it as I get time.
Meanwhile, building with AVX512_LASETUP=1 AVX512_TDS0=1 AVX512_ECM=1 works so far and gives some speedup over no AVX512.
[edit] Tests with larger factor bases are failing, so this isn't quite ready yet.
Last fiddled with by bsquared on 2023-02-16 at 22:11
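As a rough illustration of the 32-bit div/rem replacement described in this post, here is a scalar sketch of what each lane would compute. The function names are hypothetical and this is not the project's code. It relies on the fact that every unsigned 32-bit value is exactly representable in a double, so for operands of this size the truncated double quotient equals the integer quotient.

```c
#include <assert.h>
#include <stdint.h>

/* Quotient of two unsigned 32-bit values using a double-precision divide.
   Both operands convert to double exactly (53-bit significand), and the
   rounding error of the divide is too small to push the result across
   the next integer, so truncation yields floor(a / b). */
static uint32_t div_u32_via_double(uint32_t a, uint32_t b)
{
    assert(b != 0);
    return (uint32_t)((double)a / (double)b);
}

/* Remainder recovered from the quotient: a - q * b.
   Since q * b <= a, the product cannot overflow 32 bits. */
static uint32_t rem_u32_via_double(uint32_t a, uint32_t b)
{
    uint32_t q = div_u32_via_double(a, b);
    return a - q * b;
}
```

A vectorized version would do the same three steps on packed values: convert to double, divide (for example with _mm512_div_pd), and convert back with truncation. The 64-bit remainder does not fit this scheme because 64-bit operands exceed a double's 53-bit significand, which is presumably why a Barrett-style reduction was used there.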