mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Factoring

Reply
 
Thread Tools
Old 2019-07-12, 23:41   #133
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

22×5×373 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
I will set up and run my old siever this weekend. It gives such details. However,
it will differ from ggnfs.

My code does NOT do 'trial factoring'. Once it identifies lattice points that might
be smooth it finds the actual factorizations by running the sieve again, but saving
the primes that hit the 'might be smooth' locations. It then tries to split the
cofactors after the factor base primes have been pulled out.
OK. Here is a sample output from my siever.

The sample number is 11^223 + 9^223. The factor bases both have 1.8 million
primes [Bound ~ 29 million]. Both sides use 2LP with LPB = 735 million.

Total time to process 100501 : 7.653997
Total sieve time = 3.383201
Int line: 0.291134
Alg line: 0.255823
Int vector: 1.416120
Alg vector: 1.420123
Total resieve time = 0.966726
Int resieve: 0.645097
Alg resieve: 0.321629
Trial Int time: 0.038185
Trial Alg time: 0.899352
Find startpts time: 1.231500
Alg scan time: 0.413940
Lattice reduce time: 0.000000
QS/Squfof time: 0.485843
Prepare regions time: 0.137046
Inverse time = 0.421117
Prime Test time = 0.141093

Meaning: The "100501" means that the 100501'st prime in the factor base in being
used as the special Q. I line sieve the smallest primes; those that are guaranteed
to take several hits in along a single line within the lattice. The output gives
the time to run the line sieve for both sides, the vector sieve for both sides, the time
to do the resieve, the time to find the starting points in the lattice for this special q,
the time to scan the lattice for potential smooth values after sieving, the time
running QS to split the cofactors, the time to compute the boundaries of the sieve
region for all of the primes, the time spent computing modular inverses during
initialization and the time spent prime testing cofactors before they get split by QS.
Total time to run this special Q: 7.6 seconds

Here is the ouput count for special Q from 100001 to 100003:

100503
260 0 12 9 10 13 51 49 56 60

This shows the last q processed, the total relations, the FF, the FP, PF, PPF, FPP, PP
QPP PPQ PPPP

where e.g. FPP means the left side was fully factored, the right side had 2 LP,
QPP means the left side had 1 LP, the rhs had 2LP etc.
R.D. Silverman is offline   Reply With Quote
Old 2019-07-12, 23:45   #134
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

22×5×373 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
OK. Here is a sample output from my siever.

The sample number is 11^223 + 9^223. The factor bases both have 1.8 million
primes [Bound ~ 29 million]. Both sides use 2LP with LPB = 735 million.

Total time to process 100501 : 7.653997
Total sieve time = 3.383201
Int line: 0.291134
Alg line: 0.255823
Int vector: 1.416120
Alg vector: 1.420123
Total resieve time = 0.966726
Int resieve: 0.645097
Alg resieve: 0.321629
Trial Int time: 0.038185
Trial Alg time: 0.899352
Find startpts time: 1.231500
Alg scan time: 0.413940
Lattice reduce time: 0.000000
QS/Squfof time: 0.485843
Prepare regions time: 0.137046
Inverse time = 0.421117
Prime Test time = 0.141093

Meaning: The "100501" means that the 100501'st prime in the factor base in being
used as the special Q. I line sieve the smallest primes; those that are guaranteed
to take several hits in along a single line within the lattice. The output gives
the time to run the line sieve for both sides, the vector sieve for both sides, the time
to do the resieve, the time to find the starting points in the lattice for this special q,
the time to scan the lattice for potential smooth values after sieving, the time
running QS to split the cofactors, the time to compute the boundaries of the sieve
region for all of the primes, the time spent computing modular inverses during
initialization and the time spent prime testing cofactors before they get split by QS.
Total time to run this special Q: 7.6 seconds

Here is the ouput count for special Q from 100001 to 100003:

100503
260 0 12 9 10 13 51 49 56 60

This shows the last q processed, the total relations, the FF, the FP, PF, PPF, FPP, PP
QPP PPQ PPPP

where e.g. FPP means the left side was fully factored, the right side had 2 LP,
QPP means the left side had 1 LP, the rhs had 2LP etc.
And the sieve region was 9K x 18K.
R.D. Silverman is offline   Reply With Quote
Old 2019-07-13, 03:19   #135
bsquared
 
bsquared's Avatar
 
"Ben"
Feb 2007

22×3×293 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
By taking e.g. GCD(Pi, N) where N is the norm and Pi is the product of 8 small primes???
One still needs to separate the primes in Pi.

Actually because I do not sieve with respect to the smallest primes (aka Small Primes),
I do perform division with them. Sieving is done with respect to their powers, e.g.
256, 81, 125, etc. These are primes up to 31.
No, it is just more assembly coding, this time with SIMD vectors. Candidate offsets are broadcast into avx2 registers where I can vector subtract them from the final sieve progressions, then vector divide the differences by vectors of factor base primes. The divides are really multiplies using some precomputed constants.

Resieving is also vectorized. Actually most of the sieve steps are vectorized. I've always wondered how much of the vectorization I've developed is transferrable to NFS. I've never had time to understand nfs well enough to find out.
bsquared is offline   Reply With Quote
Old 2019-07-13, 15:52   #136
chris2be8
 
chris2be8's Avatar
 
Sep 2009

22·523 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
OK. Here is a sample output from my siever.

(snip)
Thanks for that. I'll have to read it carefully and think about it, but it should give me a better idea how to tune the sievers.

Chris
chris2be8 is offline   Reply With Quote
Old 2019-07-13, 20:05   #137
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

3·1,181 Posts
Default

Roll-up of links to relevant code:

Batch factoring in Msieve
(this works pretty nicely but hasn't been tried on truly large problems)

My modifications to Bob's NFS siever

(uses 3 large primes, uses GGNFS relation output, uses optimized SIQS, includes MSVC project file, includes general cleanup). Note that the modifications are not enough to allow sieving for large problems, the statically sized arrays of things are not big enough to hold tons of sieve reports and I got a crash the one time I tested it. You will need GMP or MPIR for the new code.

There are many sources for inline assembler, Msieve includes a fair amount of it and we have several experts here who can help.

Brian Gladman maintains the MSVC projects that allow many of our factoring libraries and applications to compile in Visual Studio. You will need a fairly recent version of the tools though.
jasonp is offline   Reply With Quote
Old 2019-07-13, 20:38   #138
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

22·5·373 Posts
Default

Quote:
Originally Posted by jasonp View Post
Roll-up of links to relevant code:

Batch factoring in Msieve
(this works pretty nicely but hasn't been tried on truly large problems)

My modifications to Bob's NFS siever

(uses 3 large primes, uses GGNFS relation output, uses optimized SIQS, includes MSVC project file, includes general cleanup). Note that the modifications are not enough to allow sieving for large problems, the statically sized arrays of things are not big enough to hold tons of sieve reports and I got a crash the one time I tested it. You will need GMP or MPIR for the new code.

There are many sources for inline assembler, Msieve includes a fair amount of it and we have several experts here who can help.

Brian Gladman maintains the MSVC projects that allow many of our factoring libraries and applications to compile in Visual Studio. You will need a fairly recent version of the tools though.
A lot of the referred code is from an older version of mine. And it contains at least
one (bad) bug. clockticks.c needs to save & restore the eax & edx registers. The
given code does not.
R.D. Silverman is offline   Reply With Quote
Old 2019-07-14, 09:48   #139
R. Gerbicz
 
R. Gerbicz's Avatar
 
"Robert Gerbicz"
Oct 2005
Hungary

2×743 Posts
Default

Quote:
Originally Posted by jasonp View Post
Roll-up of links to relevant code:

Batch factoring in Msieve
(this works pretty nicely but hasn't been tried on truly large problems)
An easy improvement over Bernstein's factorization algorithm (maybe it was not in my code, but I've known it for years):

Say you know H,H1 and you want to get the prime divisors set of N2:
Code:
  N     the prime divisors sets:    H
 / \                               / \
N1  N2                            H1  H2
Since we know that N=N1*N2 it follows that H=H1 union H2,
so H-H1 is in H2
that means that you need to consider only H1 instead of the larger H, when you want to get the prime set of N2,
which makes the algorithm faster.
Of course if you get G, then H2=G union (H-H1) [here this is a disjoint union!].
R. Gerbicz is offline   Reply With Quote
Old 2019-07-14, 13:32   #140
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

3×1,181 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
A lot of the referred code is from an older version of mine. And it contains at least
one (bad) bug. clockticks.c needs to save & restore the eax & edx registers. The
given code does not.
If you are amenable, I can PM you my current email and would be willing to port my changes to your latest siever code.

Robert, upon reflection I use Bernstein's batch GCD and not his batch factoring algorithm, as a preprocessing step to reject most relations before trying to use expensive methods to compute their largest few factors. The code can efficiently find the 1-2% of relations whose unfactored part has alll factors less than a large bound, allowing QS variants to effectively run 100x faster. But of course there's a limit to how large that bound can be, which controls the size of the input batch.

Last fiddled with by jasonp on 2019-07-14 at 13:43
jasonp is offline   Reply With Quote
Old 2019-07-14, 14:51   #141
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

22×5×373 Posts
Default

Quote:
Originally Posted by jasonp View Post
If you are amenable, I can PM you my current email and would be willing to port my changes to your latest siever code.

Robert, upon reflection I use Bernstein's batch GCD and not his batch factoring algorithm, as a preprocessing step to reject most relations before trying to use expensive methods to compute their largest few factors. The code can efficiently find the 1-2% of relations whose unfactored part has all factors less than a large bound, allowing QS variants to effectively run 100x faster. But of course there's a limit to how large that bound can be, which controls the size of the input batch.
I am having a stupidity attack. After resieve I identify the potentially smooth relations.
I know which lattice points have norms that are potentially smooth by the value of
the accumulated logarithm. I therefore know (approximately) how large the cofactor
will be after removal of the factor base primes. I only split cofactors that have a chance
to be the product of two primes, each larger than the largest factor in the factor base
and smaller than the large prime limit. One can, of course, extend this to 3 primes.
Sometimes after I divide out the FB primes the cofactor IS a little too big to bother
to try splitting it, but this is rare.

What does your batch GCD code add to this procedure? How does it make it faster?

Doesn't Bernstein's method require having ALL prime factors known in advance?
There are a LOT of large primes (larger than the factor base) that need to be known.
[e.g. for 31 bit large primes it needs another ~ 1/2Gbyte of storage]
This is a lot of storage. Doesn't it need to multiply all of the candidates together?
That is going to be a very large integer, requiring yet more space......With a sieve
area of ~200M and a 2% potential success rate, that is 4 million norms, each
with ~4-5 limbs..... another 100Mbytes or so. NFS is already strained for space.

When using SIQS or squfof with 3 primes, does your extension apply SIQS directly
to the 96-bit (or so) cofactor or does it try to pull out one of the primes first? If so,
how?

And the storage problem only gets worse with 3LP.

I plan to completely re-write my sieve code. I will reuse portions of it of course.

BTW, two simple optimizations that I should have done years ago:

When computing the sieve start points, one must compute a modular inverse for
each prime. [There is no 'SI' variant for NFS]. However, when computing
1/a, 1/b, 1/c mod q it is faster to compute 1/(abc) then multiply by bc, ac, and ab. etc
A single precision multiplication is much faster than a mod inverse!
[It is clear brain damage that I did not put this in the original code]

Similarly, when dividing out known factor base primes that divide a given norm one
can compute N/(a*b) rather than N/a followed by (N/a)/b. provided ab fits in a
single uint64. [again, more brain damage]

The biggest impact that I foresee will come from eliminating the sieve boundary
computations in my current code , going to a linear array to store lattice points,
and improving the bucket management. If you look at the sample output that I
gave my code spends only 6% of the time running QS and less than 2% checking
primality of cofactors. While my cofactor splitting can be improved, the payoff
will be less than a 4% improvement even if speed doubles.

BTW, bsquared's ECM code looks awesome, but I will need to port the gcc
assembler to MASM, which means learning the gcc syntax. I run windoze
on my home PC for a variety of reasons.
R.D. Silverman is offline   Reply With Quote
Old 2019-07-14, 14:58   #142
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

746010 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
I am having a stupidity attack. After resieve I identify the potentially smooth relations.
I know which lattice points have norms that are potentially smooth by the value of
the accumulated logarithm. I therefore know (approximately) how large the cofactor
will be after removal of the factor base primes. I only split cofactors that have a chance
to be the product of two primes, each larger than the largest factor in the factor base
and smaller than the large prime limit. One can, of course, extend this to 3 primes.
Sometimes after I divide out the FB primes the cofactor IS a little too big to bother
to try splitting it, but this is rare.

What does your batch GCD code add to this procedure? How does it make it faster?

Doesn't Bernstein's method require having ALL prime factors known in advance?
There are a LOT of large primes (larger than the factor base) that need to be known.
[e.g. for 31 bit large primes it needs another ~ 1/2Gbyte of storage]
This is a lot of storage. Doesn't it need to multiply all of the candidates together?
That is going to be a very large integer, requiring yet more space......With a sieve
area of ~200M and a 2% potential success rate, that is 4 million norms, each
with ~4-5 limbs..... another 100Mbytes or so. NFS is already strained for space.

When using SIQS or squfof with 3 primes, does your extension apply SIQS directly
to the 96-bit (or so) cofactor or does it try to pull out one of the primes first? If so,
how?

And the storage problem only gets worse with 3LP.

I plan to completely re-write my sieve code. I will reuse portions of it of course.

BTW, two simple optimizations that I should have done years ago:

When computing the sieve start points, one must compute a modular inverse for
each prime. [There is no 'SI' variant for NFS]. However, when computing
1/a, 1/b, 1/c mod q it is faster to compute 1/(abc) then multiply by bc, ac, and ab. etc
A single precision multiplication is much faster than a mod inverse!
[It is clear brain damage that I did not put this in the original code]

Similarly, when dividing out known factor base primes that divide a given norm one
can compute N/(a*b) rather than N/a followed by (N/a)/b. provided ab fits in a
single uint64. [again, more brain damage]

The biggest impact that I foresee will come from eliminating the sieve boundary
computations in my current code , going to a linear array to store lattice points,
and improving the bucket management. If you look at the sample output that I
gave my code spends only 6% of the time running QS and less than 2% checking
primality of cofactors. While my cofactor splitting can be improved, the payoff
will be less than a 4% improvement even if speed doubles.

BTW, bsquared's ECM code looks awesome, but I will need to port the gcc
assembler to MASM, which means learning the gcc syntax. I run windoze
on my home PC for a variety of reasons.

BTW, is mpir.lib available for 64 bit windoze? I may switch to MPIR from my
current 32-bit MP code, or I may port my 32 bit MP code to 64 bits. This will
come AFTER the other work.
R.D. Silverman is offline   Reply With Quote
Old 2019-07-14, 15:45   #143
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

22·5·373 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
BTW, is mpir.lib available for 64 bit windoze? I may switch to MPIR from my
current 32-bit MP code, or I may port my 32 bit MP code to 64 bits. This will
come AFTER the other work.
Do I have to install git to retrieve MPIR? I do not have git currently. I am running VS2010.
I prefer not to spend the big $$ to buy a newer VS compiler.

I have looked at Brian's github repository. I see a lot of updated files, but no obvious
download of a Windows .lib file. Do I need to download the entire source and compile it?
Do I need to download GMP first?
R.D. Silverman is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Factoring Mersenne numbers paulunderwood Miscellaneous Math 18 2017-08-27 14:56
Factoring Big Numbers In c# ShridharRasal Factoring 10 2008-03-20 17:17
Question about factoring code Peter Nelson Software 9 2005-03-30 18:28
Factoring Fermat Numbers Axel Fox Software 14 2003-07-04 18:57
Questions about SSE2 code and Factoring Joe O Software 2 2002-09-13 23:39

All times are UTC. The time now is 08:57.


Mon Aug 2 08:57:27 UTC 2021 up 10 days, 3:26, 0 users, load averages: 1.06, 1.27, 1.35

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.