mersenneforum.org  

2022-11-08, 18:58   #78
bsquared
"Ben"
Feb 2007
111010110100₂ Posts

Thank you very much!

Found and fixed, uploaded just now.

Code:
time ./gnfs-lasieve4I12e -k -v -n0 -a c103-2.job.T1 -o test.out
gnfs-lasieve4I12e (with asm64,avx-512 mmx-td,lasetup,lasched,sieve1,ecm,tds0,search0,tdsched): L1_BITS=15
FBsize 100777+0 (deg 4), 171534+0 (deg 1)
Sorted factor base on side 0: 1: 32495 2: 32397
Sorted factor base on side 1: 1: 168023
total yield: 155958, q=1327517 (0.00051 sec/rel) ETA 0h00m)
937 Special q, 1411 reduction iterations

reports: 165365787->20138278->18040597->11058350->11056552->10304931->321750 (5306059)

Total yield: 155958
milliseconds total: Sieve 24660 Sched 8950 medsched 6040
TD 25660 (Init 970, MPQS 3550) Sieve-Change 13800
TD side 0: init/small/medium/large/search: 790 2510 930 1140 6040
sieve: init/small/medium/large/search: 1470 10160 750 860 1170
TD side 1: init/small/medium/large/search: 130 2460 1010 810 4910
sieve: init/small/medium/large/search: 640 6140 990 1180 1300
aborts: 0 0
Expected yield/cost: 4.15e+04  0
p-1: 0 tests, 0 successes  ecm: 0 tests, 0 successes
MPQS-AUX 0
COF: 280255 tests, 0 ecm, 0 aux:
       0 mpqs, 0 mpqs3, 0 ecm, 0 too big
77.763u 1.444s 1:19.53 99.5%    0+0k 1648+23208io 1pf+0w

2022-11-09, 16:38   #79
chris2be8
Sep 2009
4627₈ Posts

Why does c103-2.job.T1 contain:
Quote:
m:
My version of factMsieve.pl contains (shortly after "sub makeJobFile"):
Code:
print OUTF "m: $M\n" if ($M);
The original version contains:
Code:
print OUTF "n: $N\nm: $M\n";
So I think that's a bug I fixed some time ago. Try removing m: from the job file and re-testing to be sure if that is the issue.

2022-11-09, 16:56   #80
bsquared
"Ben"
Feb 2007
2²×941 Posts

Quote:
Originally Posted by chris2be8 View Post
Why does c103-2.job.T1 contain:


My version of factMsieve.pl contains (shortly after "sub makeJobFile"):
Code:
print OUTF "m: $M\n" if ($M);
The original version contains:
Code:
print OUTF "n: $N\nm: $M\n";
So I think that's a bug I fixed some time ago. Try removing m: from the job file and re-testing to be sure if that is the issue.
That could be a separate issue, but there really was a problem that I fixed.

It was an inexact translation of #defines like this to AVX512 in lasieve_prepn:
#define A1MOD0(p) ((aux= absa1%p)> 0 ? p-aux : 0 )

where I neglected to account for the "else" case (where absa1%p == 0)
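
As a minimal sketch of that pitfall (not the actual lasieve_prepn code), the portable C below emulates 16 vector lanes with a plain loop. The function names, the LANES constant, and the all-ones mask trick are assumptions made for illustration; a real AVX-512 version would presumably use a mask register instead. A branch-free `p - aux` yields p rather than 0 in every lane where absa1 % p == 0, so the result has to be masked:

```c
#include <stdint.h>

#define LANES 16 /* one 512-bit register holds 16 32-bit lanes */

/* Scalar reference, equivalent to the A1MOD0 macro quoted above. */
static uint32_t a1mod0_ref(uint32_t absa1, uint32_t p) {
    uint32_t aux = absa1 % p;
    return aux > 0 ? p - aux : 0;
}

/* Naive lane-wise translation that forgets the "else" case:
 * lanes with absa1 % p == 0 come out as p instead of 0. */
static void a1mod0_buggy(const uint32_t *p, uint32_t absa1, uint32_t *out) {
    for (int i = 0; i < LANES; i++) {
        out[i] = p[i] - absa1 % p[i];
    }
}

/* Corrected translation: build an all-ones mask for lanes with aux != 0
 * and zero the remaining lanes, which restores the "else" branch. */
static void a1mod0_masked(const uint32_t *p, uint32_t absa1, uint32_t *out) {
    for (int i = 0; i < LANES; i++) {
        uint32_t aux = absa1 % p[i];
        uint32_t mask = (uint32_t)0 - (uint32_t)(aux != 0);
        out[i] = (p[i] - aux) & mask;
    }
}
```

With absa1 = 15 and p = 3, 5, 7, ... the naive form returns 3 and 5 in the first two lanes, while the masked form returns 0 there and agrees with the scalar macro in every lane.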

2022-11-18, 16:14   #81
wreck
"Bo Chen"
Oct 2005
Wuhan, China
2×3×31 Posts
lpb34 crash

When sieving R942 with lpb34, avx512-lasieve5 crashes immediately, while the original lasieve5 works properly.

The command is:
./gnfs-lasieve4I16e -v -f 380000000 -c 1000 -o R942_16e_r_380000000_380001000.out -r R942_poly.txt -R

R942_poly.txt file attached.
Attached Files
File Type: txt R942_poly.txt (899 Bytes, 81 views)

2022-11-20, 16:04   #82
bsquared
"Ben"
Feb 2007
2²×941 Posts

Quote:
Originally Posted by wreck View Post
When sieving R942 with lpb34, avx512-lasieve5 crashes immediately, while the original lasieve5 works properly.

The command is:
./gnfs-lasieve4I16e -v -f 380000000 -c 1000 -o R942_16e_r_380000000_380001000.out -r R942_poly.txt -R

R942_poly.txt file attached.
Fixed and checked in! Thanks for the report!

2022-11-25, 12:53   #83
wreck
"Bo Chen"
Oct 2005
Wuhan, China
BA₁₆ Posts

Thanks for the fix.

Another thing that puzzles me: with rlim and alim set to 500000000, version 551 (the newest) is about 10% slower than version 550 (the second newest).

I'm not sure whether changing _mm512_set1_epi32(ij_ub) to _mm512_set1_epu32(ij_ub), or some similar modification, would resolve the slowdown.

2023-02-10, 21:48   #84
frmky
Jul 2003
So Cal
101001010111₂ Posts

Quote:
Originally Posted by bsquared View Post
What I should have done is make the changes to the cweb files in lasieve 5 instead of working with the ctangled code. That is a chore. I was porting from ctangled code in lasieve4, so that's what made sense at the time. I might circle back and see if I can do that.

Probably next on the agenda is seeing about a windows build, and doing some more testing.
I spent some time today looking at the changes and starting a diff relative to the Windows branch in my repository. Did you have a chance to make the changes in the .w files? If not, I'll just make a separate branch for this that uses the .c and .h files rather than the .w files. Also, does this compile with gcc, or does it need icc/icx? Thanks!

2023-02-10, 22:33   #85
bsquared
"Ben"
Feb 2007
2²×941 Posts

Quote:
Originally Posted by frmky View Post
I spent some time today looking at the changes and starting a diff relative to the Windows branch in my repository. Did you have a chance to make the changes in the .w files? If not, I'll just make a separate branch for this that uses the .c and .h files rather than the .w files. Also, does this compile with gcc, or does it need icc/icx? Thanks!
I basically stopped working on it as of my last post here; overcome by events. So the latest is in the .c/.h files, and it needs icc/icx because of the _mm512_div_epu32/_mm512_rem_epu32 macro-intrinsics that gcc doesn't implement. I would still love to make it usable on Windows; any help along those lines would be wonderful!

2023-02-11, 01:24   #86
frmky
Jul 2003
So Cal
2,647 Posts

Just those two SVML intrinsics? Then let's see if we can replace them without a performance hit. It'll make it much easier to work with if it's not tied to the Intel compilers.

2023-02-11, 01:31   #87
bsquared
"Ben"
Feb 2007
2²·941 Posts

Quote:
Originally Posted by frmky View Post
Just those two SVML intrinsics? Then let's see if we can replace them without a performance hit. It'll make it much easier to work with if it's not tied to the Intel compilers.
Yep, pretty sure just those. I started to do that, based on code I wrote a long time ago that used the fast reciprocal intrinsic (_mm512_rcp14_pd, I think), followed by a couple of rounds of Newton iteration. But I didn't finish it to the point where it works. I will try to find it.
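
The reciprocal-plus-Newton idea can be sketched in scalar C. This is an illustration under stated assumptions, not the unfinished code mentioned above: `approx_recip` is a hypothetical stand-in for a roughly 14-bit hardware estimate (it cheats by truncating a real 1/d), `divrem_newton` is an invented name, and the final fix-up step is an added safeguard so that the quotient is exact.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a ~14-bit reciprocal estimate: compute 1/d in
 * float precision, then clear the low 9 of the 23 mantissa bits. */
static double approx_recip(uint32_t d) {
    float f = (float)(1.0 / (double)d);
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits &= 0xFFFFFE00u;
    memcpy(&f, &bits, sizeof f);
    return (double)f;
}

/* Unsigned 32-bit divide via reciprocal refinement; d must be nonzero. */
static void divrem_newton(uint32_t n, uint32_t d, uint32_t *q, uint32_t *r) {
    double x = approx_recip(d);

    /* Each Newton step x <- x * (2 - d*x) roughly doubles the number of
     * correct bits: about 14 -> 28 -> full double precision. */
    x = x * (2.0 - (double)d * x);
    x = x * (2.0 - (double)d * x);

    uint32_t quot = (uint32_t)((double)n * x);
    int64_t rem = (int64_t)n - (int64_t)quot * (int64_t)d;

    /* The truncated product can be off by one in either direction,
     * so correct it with exact integer arithmetic. */
    if (rem < 0) {
        quot--;
        rem += d;
    } else if (rem >= (int64_t)d) {
        quot++;
        rem -= d;
    }
    *q = quot;
    *r = (uint32_t)rem;
}
```

Two Newton steps are enough here because the starting estimate already carries about 14 good bits; a cruder estimate would need more rounds.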

2023-02-16, 21:02   #88
bsquared
"Ben"
Feb 2007
7264₈ Posts

Also needed a replacement for _mm512_rem_epu64, which was a taller order.

I put in a 52-bit vector Barrett multiplier to do the job of the _mm512_rem_epu64 SVML intrinsic (in modmul32_16). Replacing the 32-bit div/rem SVML intrinsics was fairly straightforward using double-precision floating point divides instead.
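
Both replacements can be illustrated in scalar form. The sketch below is not the lasieve5_64 code: the function names are hypothetical, the Barrett routine works on plain 64-bit integers rather than 52-bit vector lanes, and it relies on the `unsigned __int128` extension available in GCC and Clang. The first function shows why a double-precision divide is enough for 32-bit div/rem; the second shows the Barrett idea of replacing a remainder by a multiply with a precomputed constant.

```c
#include <stdint.h>

/* 32-bit unsigned div/rem through one double-precision divide.
 * Both operands are exact in a double, and because d * (q + 1) < 2^53 the
 * correctly rounded quotient truncates to floor(n / d). d must be nonzero. */
static void divrem_u32_double(uint32_t n, uint32_t d, uint32_t *q, uint32_t *r) {
    uint32_t quot = (uint32_t)((double)n / (double)d);
    *q = quot;
    *r = n - quot * d; /* quot * d <= n, so this cannot wrap */
}

/* Barrett constant mu = floor(2^64 / p), valid for 2 <= p < 2^63. */
static uint64_t barrett_mu(uint64_t p) {
    return (uint64_t)((((unsigned __int128)1) << 64) / p);
}

/* x mod p using one wide multiply instead of a division.
 * The estimated quotient is either exact or one too small, so a single
 * conditional subtraction finishes the reduction. */
static uint64_t barrett_rem(uint64_t x, uint64_t p, uint64_t mu) {
    uint64_t q = (uint64_t)(((unsigned __int128)x * mu) >> 64);
    uint64_t r = x - q * p; /* 0 <= r < 2p */
    if (r >= p) {
        r -= p;
    }
    return r;
}

/* Hypothetical modular multiply of two 32-bit values by a 32-bit modulus,
 * the kind of job a 64-bit remainder intrinsic would otherwise do. */
static uint32_t modmul32_sketch(uint32_t a, uint32_t b, uint32_t p, uint64_t mu) {
    return (uint32_t)barrett_rem((uint64_t)a * b, p, mu);
}
```

The constant mu depends only on p, so in a sieve it can be computed once per prime and reused for every reduction by that prime.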

So the good news is that the lasieve5_64 project will now compile with GCC (tested with gcc 11.1.0)!

The bad news is that some of the AVX512 routines cause segfaults when built with GCC. Testing them one at a time, these seem to work: ECM, LASETUP, TDS0. LASCHED, SIEVE1, TDSCHED, SEARCH, and TD all lead to segfaults.

I initially suspected alignment issues, since icc/icx is very lenient on alignment (mallocs all get aligned to 64B boundaries automatically, I think, and load/store get compiled to loadu/storeu since there is no difference in speed). But so far, working with LASCHED, that isn't helping. I'll keep working at it as I get time. Meanwhile, building with AVX512_LASETUP=1 AVX512_TDS0=1 AVX512_ECM=1 works so far and gives some speedup over no AVX512.
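
For anyone wanting to rule alignment in or out under GCC, one option is to request 64-byte alignment explicitly rather than relying on the allocator. The sketch below uses C11 `aligned_alloc`; the helper names are hypothetical and do not come from lasieve5.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for n 32-bit values on a 64-byte boundary.
 * aligned_alloc expects the size to be a multiple of the alignment,
 * so the byte count is rounded up. Returns NULL on failure. */
static uint32_t *alloc_u32_aligned64(size_t n) {
    if (n > (SIZE_MAX - 63) / sizeof(uint32_t)) {
        return NULL; /* byte count would overflow */
    }
    size_t bytes = n * sizeof(uint32_t);
    bytes = (bytes + 63) & ~(size_t)63;
    if (bytes == 0) {
        bytes = 64;
    }
    return aligned_alloc(64, bytes);
}

/* True when ptr sits on a 64-byte boundary. */
static int is_aligned64(const void *ptr) {
    return ((uintptr_t)ptr & 63u) == 0;
}
```

Buffers obtained this way satisfy the 64-byte requirement of aligned loads such as `_mm512_load_si512`; switching a suspect load to the unaligned `_mm512_loadu_si512` is the other common way to test the hypothesis. Memory from `aligned_alloc` is released with the ordinary `free`.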

[edit]
Tests with larger factor bases are failing, so this isn't quite ready yet.

Last fiddled with by bsquared on 2023-02-16 at 22:11
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
