mersenneforum.org > YAFU Explanation for yafu output
 Register FAQ Search Today's Posts Mark Forums Read

 2021-04-05, 07:13 #1 bur   Aug 2020 167 Posts Explanation for yafu output There are some things I wonder about, is there a simple explanation what's going on? I know this isn't strictly yafu, but I hope this is still the most fitting subforum. Feel free to explain any single line of the output I pasted, when waiting for a factorization to finish it's interesting to know what is going on. :) My knowledge in math is limited though. How is the "need at least x relations" for NFS and SIQS chosen? Code: found 542503 relations, need at least 4468504 . SIQS mentions full and partial relations, what are those? It's mentioned in the Pollard paper, but not in a way that I can understand. . After nfs found the polynomial it accumulates relations: Code: total yield: 13895, q=1150861 (0.00246 sec/rel; ETA 0h02m) What is q? When does it stop for a specific q? . After that it says Code: 326 Special q, 488 reduction iterations reports: 99084057->10611292->9562226->4168473->4168470->3934638 Number of relations with k rational and l algebraic primes for (k,l)=: Can this output be explained in simple terms? . Afterwards several of these Code: Total yield: 70275 0/0 mpqs failures, 32943/35804 vain mpqs milliseconds total: Sieve 39943 Sched 0 medsched 23512 TD 72687 (Init 3342, MPQS 45683) Sieve-Change 57558 TD side 0: init/small/medium/large/search: 859 3317 1029 1114 6297 come up. It seems this yields some more relations. Once more, can it be explained in simple terms? . In the linear algebra/filtering step it says Code: keeping 5478273 ideals with weight <= 200, target excess is 37555 commencing in-memory singleton removal begin with 4730654 relations and 5478273 unique ideals reduce to 1339923 relations and 1328963 ideals in 22 passes max relations containing the same ideal: 85 nfs: raising min_rels by 5.00 percent to 4997472 What is the "target excess"? And what is the goal here? It seems like it wasn't reached, as min_rels was increased and sieving started again. This sometimes happens over and over. In this case it stopped when "... same ideal: 77". . Code: found 313232 cycles, need 308046 weight of 308046 cycles is about 21812975 (70.81/cycle) distribution of cycle lengths: 1 relations: 32845 2 relations: 31392 3 relations: 31314 4 relations: 28921 5 relations: 26046 6 relations: 22902 7 relations: 20085 8 relations: 17671 9 relations: 15533 10+ relations: 81337 heaviest cycle: 26 relations commencing cycle optimization start with 2114433 relations pruned 48198 relations Can this part be explained briefly? . Now it seems like the matrix is build Code: matrix is 307866 x 308046 (93.8 MB) with weight 29784334 (96.69/col) sparse part has weight 20889229 (67.81/col) filtering completed in 2 passes matrix is 306949 x 307129 (93.7 MB) with weight 29742133 (96.84/col) sparse part has weight 20875519 (67.97/col) matrix starts at (0, 0) matrix is 306949 x 307129 (93.7 MB) with weight 29742133 (96.84/col) sparse part has weight 20875519 (67.97/col) saving the first 48 matrix rows for later matrix includes 64 packed rows matrix is 306901 x 307129 (89.6 MB) with weight 23610028 (76.87/col) sparse part has weight 20425605 (66.50/col) using block size 65536 for processor cache size 4096 kB commencing Lanczos iteration (4 threads) Does the weight per col have to be large enough or why is it goind through several iterations? A large weight equals small matrix size? . Then I think it is looking for the gcd Code: commencing square root phase reading relations for dependency 1 read 153621 cycles cycles contain 571496 unique relations read 571496 relations multiplying 571496 relations multiply complete, coefficients have about 23.13 million bits initial square root is modulo 4407917 This commences until gcd is differente from 1 or N. Each time the dependency increases, what does that mean? And is there a specific reason why the size of the coefficients is mentioned? It seems to fluctuate slightly with each dependency.
 2021-04-05, 07:42 #2 LaurV Romulan Interpreter     Jun 2011 Thailand 24E716 Posts Generally, you have to solve a system with m equations, with n variables. As long as you have independent equations, and more equations than variables, you can solve the system. All the "sieving" phase is to find those equations. Then the linear algebra will solve the system. Say you want to find out x, y, and z, and you ask your friends and one tells you that 2x+3y=7, you still don't know them. Another friend will tell you than x+y+z=6, you still don't know them. Now you ask a third friend and he tells you that 2x+2x+2z=12, you kick his ass, because this is not useful to you, it is not independent from the second, and then he says sorry and tells you 2y-z+w=5. Now, you don't need w, but you can't get more from this guy, you already pulled his fingernails and squeezed his nose.. but you continue asking, ans somebody else tells you that 5x-w=1. Now you have two "partial" relations which you can add together to get 5x+2y-z=6, and you can solve the system, and get x, y, and z. Factoring is the same, but your variables are small primes, the coefficients in front are powers of these primes, and the addition is multiplication. Now you know... With SIQS the things are simple, you start with a number of "variables" (primes) that is dependent of the size of the number you want to factor, larger number, more variables, and you kinda know "exactly" how many equations you will need. With NFS the waters are murky... Five bucks for who explains it simpler.. (and don't kick me hard if the system is not solvable, I made it up as I was writing it, I think it is ok...) Last fiddled with by LaurV on 2021-04-05 at 07:45
 2021-04-05, 12:51 #3 Dr Sardonicus     Feb 2017 Nowhere 10001110000002 Posts If memory serves, for quadratic sieves the result you need here is that a homogeneous system of linear equations with more equations than variables automatically contains a linear dependency. (A system is homogeneous when there are no constant terms. A homogeneous system is thus automatically consistent, since setting all the variables equal to 0 gives a solution. The application to factoring the composite number N is to find a non-trivial solution to x^2 == y^2 (mod N) (i.e. where neither x - y nor x + y is divisible by N). As I recall, the basic idea is to select a bound B, and then look for squares (mod N) whose prime factors are all less than B (i.e are "B-smooth"). Say there are m such primes, p1 to pm. Say we find x^2 == p1^e1*...* pm^em. Since we're only interested in squareness, we can reduce the exponents (mod 2), so the e's are all either 0 or 1. Multiplying x-values corresponds to adding the exponents (which can then be reduced mod 2). So we can replace our x-values by rows of m 0's and 1's (mod 2). If we have k such x-values, we can form a k by m matrix of 0's and 1's (mod 2). Adding rows corresponds to multiplying x-values. The idea is that if k > m (and there are no duplicate rows) there will automatically be a non-trivial dependency among the rows, which means a nontrivial relation x^2 == y^2 (mod N). Luckily, the integers mod 2 obey the same rules of addition and multiplication as the rational or real numbers (a "field") so dependencies can be found by doing Gaussian elimination on the matrix (linear algebra). The success of the method depends on selecting B appropriately, and finding enough "B-smooth" squares (mod N). There are ways to fiddle the B-smoothness requirement slightly, but it's been ages since I looked at this closely. I am not sufficiently familiar with number field sieves (using polynomials of degree greater than 2) to explain much.
2021-04-05, 15:51   #4
charybdis

Apr 2020

1000001112 Posts

Quote:
 Originally Posted by LaurV Now you ask a third friend and he tells you that 2x+2x+2z=12, you kick his ass, because this is not useful to you, it is not independent from the second, and then he says sorry and tells you 2y-z+w=5. Now, you don't need w, but you can't get more from this guy, you already pulled his fingernails and squeezed his nose

I don't have much more to say about QS; LaurV and Dr Sardonicus have done a pretty good job of explaining the basics. The one thing I'd add to connect their two posts is that we can allow x^2 mod N to have one or two factors ("large primes") a bit larger than the smoothness bound B; these are the "partial relations" that LaurV refers to, and they can be combined into full relations when the same large prime appears in more than one partial relation.

The mathematics involved in NFS is more sophisticated than QS, but the general idea is similar. Roughly speaking, "relations" and "cycles" in NFS correspond respectively to "partial relations" and "full/combined relations" in QS.

Quote:
 Originally Posted by bur How is the "need at least x relations" for NFS and SIQS chosen? Code: found 542503 relations, need at least 4468504
For QS, we know the number of variables in the system of equations that we need to solve, so the number of (full/combined) relations needed is set to slightly more than that number of variables. With NFS, the number of relations needed is just an estimate produced by yafu. Often this is an underestimate, so if there aren't enough relations, yafu adds 5% to its estimate and goes back to sieving.

Quote:
 After nfs found the polynomial it accumulates relations: Code: total yield: 13895, q=1150861 (0.00246 sec/rel; ETA 0h02m) What is q? When does it stop for a specific q?
Relations in NFS consist of two numbers (the "rational norm" and "algebraic norm") which both satisfy some smoothness conditions, i.e. their prime factors are small. q=1150861 means that the siever is searching for relations where 1150861 is a factor of the algebraic norm, or maybe the rational norm if this is SNFS.

How long is spent on each q value depends on which siever yafu chooses. The sievers are labelled gnfs-lasieve4I<n>e with <n> ranging from 11 to 16; the larger this value, the longer the siever spends on a specific q.

Quote:
 After that it says Code: 326 Special q, 488 reduction iterations reports: 99084057->10611292->9562226->4168473->4168470->3934638 Number of relations with k rational and l algebraic primes for (k,l)=: Can this output be explained in simple terms?
"326 Special q" tells you how many q values have been tried in that range. (This is actually a slight lie for reasons that I won't go into.) I don't know what the rest of the numbers mean but I'd guess that they relate to the finer details of what the siever has been doing.

Quote:
 Afterwards several of these Code: Total yield: 70275 0/0 mpqs failures, 32943/35804 vain mpqs milliseconds total: Sieve 39943 Sched 0 medsched 23512 TD 72687 (Init 3342, MPQS 45683) Sieve-Change 57558 TD side 0: init/small/medium/large/search: 859 3317 1029 1114 6297 come up. It seems this yields some more relations. Once more, can it be explained in simple terms?
"Total yield: 70275" is the number of relations that the siever found in the range it's just completed. I think most of the other numbers here are timing information, showing how much CPU time was spent in each stage of the sieving process.

Quote:
 In the linear algebra/filtering step it says Code: keeping 5478273 ideals with weight <= 200, target excess is 37555 commencing in-memory singleton removal begin with 4730654 relations and 5478273 unique ideals reduce to 1339923 relations and 1328963 ideals in 22 passes max relations containing the same ideal: 85 nfs: raising min_rels by 5.00 percent to 4997472 What is the "target excess"? And what is the goal here? It seems like it wasn't reached, as min_rels was increased and sieving started again. This sometimes happens over and over. In this case it stopped when "... same ideal: 77".
The goal is to have enough relations to finish the factorization. yafu (well, really msieve) wants to know if it has enough relations to start combining ("merging") them into cycles. In this case it discovers that it doesn't, because 1339923-1328963 < 37555.

Quote:
 Code: found 313232 cycles, need 308046 weight of 308046 cycles is about 21812975 (70.81/cycle) distribution of cycle lengths: 1 relations: 32845 2 relations: 31392 3 relations: 31314 4 relations: 28921 5 relations: 26046 6 relations: 22902 7 relations: 20085 8 relations: 17671 9 relations: 15533 10+ relations: 81337 heaviest cycle: 26 relations commencing cycle optimization start with 2114433 relations pruned 48198 relations Can this part be explained briefly?
This is a breakdown of the cycles by how many relations have been merged to make them.

Quote:
 Code: matrix is 307866 x 308046 (93.8 MB) with weight 29784334 (96.69/col) sparse part has weight 20889229 (67.81/col) filtering completed in 2 passes matrix is 306949 x 307129 (93.7 MB) with weight 29742133 (96.84/col) sparse part has weight 20875519 (67.97/col) matrix starts at (0, 0) matrix is 306949 x 307129 (93.7 MB) with weight 29742133 (96.84/col) sparse part has weight 20875519 (67.97/col) saving the first 48 matrix rows for later matrix includes 64 packed rows matrix is 306901 x 307129 (89.6 MB) with weight 23610028 (76.87/col) sparse part has weight 20425605 (66.50/col) using block size 65536 for processor cache size 4096 kB commencing Lanczos iteration (4 threads) Does the weight per col have to be large enough or why is it goind through several iterations? A large weight equals small matrix size?
The weight is the number of non-zero entries in the matrix, i.e. the number of 1s, since the matrix is mod 2 as Dr Sardonicus explained. The lower the weight, the easier it is to find the linear dependencies, but there's a trade-off, as if we aim for a sparser matrix during the merge phase, it ends up being larger. This is an important consideration when handling large numbers for which the matrix step can take weeks, but the target matrix density can only be altered by calling msieve manually rather than via yafu.

Quote:
 Then I think it is looking for the gcd Code: commencing square root phase reading relations for dependency 1 read 153621 cycles cycles contain 571496 unique relations read 571496 relations multiplying 571496 relations multiply complete, coefficients have about 23.13 million bits initial square root is modulo 4407917 This commences until gcd is differente from 1 or N. Each time the dependency increases, what does that mean? And is there a specific reason why the size of the coefficients is mentioned? It seems to fluctuate slightly with each dependency.
In QS, the aim is to find x and y such that x^2 = y^2 mod N, and then taking gcd(N, x-y) gives a factor of N. However, half of the time this factor is 1 or N. To insure against this possibility, we want to find lots of (x,y) pairs. Luckily, this requires virtually no extra effort compared to finding one such pair. Suppose that instead of n+1 equations in n variables, we have n+50 equations in n variables. Now we get 50 independent solutions (dependencies), and if each of these has a 1/2 probability of producing a non-trivial factor of N, then it's overwhelmingly likely that we find the factors. 50 more relations can be found very quickly, so when running QS, yafu sieves until there are enough relations that a factorization is virtually guaranteed.

With NFS, the principle is the same: we have lots of linear equations mod 2, and each independent solution/dependency has a 1/2 chance of finding the factors. However, in NFS, we don't get x and y such that x^2 = y^2 mod N straight away; we have to take the square root of an enormous complex number to find them. As for the coefficient sizes, you'd have to ask msieve's creator jasonp to find out why they're printed. But at least they give an idea of the size of the number we're taking the square root of.

 2021-04-05, 16:42 #5 bsquared     "Ben" Feb 2007 2×32×191 Posts Excellent info provided so far. The only thing I can add is some specifics on how many relations are needed for NFS and SIQS. For SIQS, the number of relations needed is just a few more than the number of factor base primes. We know at any point during the sieving process how many relations have been found: the number of full relations + the number of relations combined from partials. A full relation occurs whenever Q(x)/a = (ax + 2b)x + c fully factors over the factor base. (a, b, and c are coefficients of a quadratic polynomial and x is some small integer). If Q(x)/a almost fully factors, with 1 or 2 large primes remaining, then we have a partial relation. These are added to a special data structure that can be quickly processed to tell us how these partial relations can be combined into fulls (I can't beat the simplicity of LaurV's example here). It is a little more work to gather and process these partial relations, but it is well worth it. Large jobs obtain most of their full relations by combining from partials. For NFS, the number of relations needed is mostly a function of the large prime bounds. These are the lpba and lpbr lines of the .job file. This thread provides a little more info on the estimates yafu makes (the last few posts in particular). Last fiddled with by bsquared on 2021-04-05 at 16:44
 2021-04-05, 21:09 #6 henryzz Just call me Henry     "David" Sep 2007 Cambridge (GMT/BST) 10110111011102 Posts In reality NFS has similar requirements to qs for how many relations are needed. We just count them differently which messes that up. In QS we tend to count full and combined relations while in nfs we count partial relations. For 1 and 2 large primes it is possible to fully count how many combined relations have been found. This becomes a lot harder for 3 or more(a lot of nfs has 2 for rational poly and 2 for algebraic making 4).
 2021-04-07, 17:34 #7 bur   Aug 2020 167 Posts Thanks a lot guys, that is some great reading. You know someone really understood a topic if they can explain the basics in very simple terms. Reading it put a smile to my face both due to finally understanding what's going on (on a low level probably) and also because the poor guy got his nails pulled (shame on me). It also made me appreciate what a genius idea these sieves are and how enormous the computational work is. I will have to go through it a few more times and likely some questions will come up. Thanks again.
2021-04-07, 17:51   #8
bsquared

"Ben"
Feb 2007

343810 Posts

Quote:
 Originally Posted by LaurV you ask a third "friend" ... you kick his ass ... you already pulled his fingernails and squeezed his nose..
That's how you treat your friends, eh? Remind me never to become your enemy

 2021-04-10, 08:52 #9 bur   Aug 2020 167 Posts Code: commencing square root phase reading relations for dependency 1 read 182232 cycles cycles contain 637316 unique relations read 637316 relations multiplying 637316 relations multiply complete, coefficients have about 27.84 million bits initial square root is modulo 98794741 GCD is 1, no factor found reading relations for dependency 2 read 181783 cycles cycles contain 636612 unique relations read 636612 relations multiplying 636612 relations multiply complete, coefficients have about 27.80 million bits initial square root is modulo 96579421 GCD is N, no factor found reading relations for dependency 3 ... charybdis, I didn't really get the explanation for the last part. I included another example showing that it takes several attempts to find a GCD != 1 or N. If I got you right, more than n relations are found for n variables, so with the excess our chance of finding a factor are is 50% for each excess relation. But why does it happen so often that I still see GCD is 1 or N with this large excess? So it's two questions, first why does it happen so often and second, what is changed (apparently "the dependency") until finally a non-trivial GCD is found? And something else: Code: begin with 1395803 relations and 1627193 unique ideals reduce to 1343077 relations and 1305488 ideals in 9 passes max relations containing the same ideal: 84 I understood that target excess is 28409 means that 28409 more relations than ideals are required and it's fulfilled in the example, but what are those ideals? Maybe it's even possible to explain the difference between unique ideals and ideals?
2021-04-10, 12:50   #10
charybdis

Apr 2020

263 Posts

Quote:
 Originally Posted by bur charybdis, I didn't really get the explanation for the last part. I included another example showing that it takes several attempts to find a GCD != 1 or N. If I got you right, more than n relations are found for n variables, so with the excess our chance of finding a factor are is 50% for each excess relation. But why does it happen so often that I still see GCD is 1 or N with this large excess? So it's two questions, first why does it happen so often and second, what is changed (apparently "the dependency") until finally a non-trivial GCD is found?
Each dependency is a single solution to the equations, i.e. a set of columns of the matrix which add to 0. At the end of the linear algebra you'll see something like "recovered 30 nontrivial dependencies". Each of these dependencies has a 1/2 probability of finding a non-trivial gcd, assuming there are only two factors.

Quote:
 I understood that target excess is 28409 means that 28409 more relations than ideals are required and it's fulfilled in the example, but what are those ideals? Maybe it's even possible to explain the difference between unique ideals and ideals?
Continuing the analogy from earlier, ideals are essentially the variables that our equations are in. (They correspond roughly to factors of the rational and algebraic norms.) I don't think there's any difference between "ideals" and "unique ideals" here; the word "unique" implies that we count each ideal only once, no matter how many relations it appears in.

Last fiddled with by charybdis on 2021-04-10 at 13:38 Reason: rows -> columns

2021-04-10, 15:41   #11
LaurV
Romulan Interpreter

Jun 2011
Thailand

3·47·67 Posts

Quote:
 Originally Posted by bsquared That's how you treat your friends, eh? Remind me never to become your enemy
"Please God protect me from my friends. From my enemies I protect myself." (old Romanian proverb).

 Similar Threads Thread Thread Starter Forum Replies Last Post James Heinrich YAFU 8 2021-03-15 19:43 EdH YAFU 8 2018-03-14 17:22 BudgieJane YAFU 3 2016-02-22 15:14 Uncwilly Lounge 4 2011-04-01 19:15 firejuggler Aliquot Sequences 7 2010-05-29 02:46

All times are UTC. The time now is 08:39.

Fri May 14 08:39:52 UTC 2021 up 36 days, 3:20, 0 users, load averages: 2.78, 2.63, 2.42