mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Cunningham Tables (https://www.mersenneforum.org/forumdisplay.php?f=51)
-   -   5- Table Discussion and OddPerfect.org (https://www.mersenneforum.org/showthread.php?t=4425)

ewmayer 2005-12-05 20:32

[QUOTE=akruppa]The exam is over and went much better than I had expected. Many thanks to Prof. Ernst W. Mayr for a very relaxed and successful examination! :wink:[/QUOTE]For those who were wondering - no relation there. (My last name has an extra 'e' - and neither I nor Alex's examining professor is any relation to the recently-deceased famous Harvard zoologist and evolutionary theorist Ernst Mayr, AFAIK.) Alex and I may be buddies, but even we have our limits as far as beer-lubricated academic nepotism is concerned. ;)

akruppa 2005-12-06 09:58

I've split the patents discussion into a [URL=http://www.mersenneforum.org/showthread.php?t=5092]new thread[/URL] in the Lounge.

Alex

akruppa 2005-12-26 17:46

The CWI filter program gave me a bit of trouble. It turned out that removing dupes and singletons worked fine, but merging didn't! Internally, filter uses the 31st bit of the dword storing the primes as a flag for whether the prime has been paired yet or not, so 31-bit primes would get mangled. I changed the code to store primes shifted right by one bit; after all, almost all of them are odd, so the lsb doesn't hold a lot of information. The prime 2 is represented by 0.

Enter the next catch - free relations have the different ideals stored in place of the primes in the relations, and the ideals can be even or odd! So another night's worth of hacking went by, making the code treat free and non-free relations differently. Only primes are shifted right now, ideals aren't. This means I'll get free relations only on primes <2^30, but that's ok - I don't think there are many primes >2^30 where all ideals are present in my relations.
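To make the packed representation concrete, here is a toy Python sketch (my own reconstruction, not the CWI code): the pairing flag is taken to be bit 30, i.e. the 31st bit counted from 1, and `encode`/`decode` are hypothetical helper names.

```python
FLAG = 1 << 30           # "the 31st bit": marks a prime as already paired
VALUE_MASK = FLAG - 1    # the remaining low bits hold the shifted prime

def encode(p):
    """Store prime p shifted right by one bit; 2 is represented by 0."""
    return 0 if p == 2 else p >> 1    # odd p -> (p - 1) / 2

def decode(word):
    """Recover the prime, ignoring the pairing flag."""
    v = word & VALUE_MASK
    return 2 if v == 0 else (v << 1) | 1

p = 2**30 + 3            # an odd 31-bit value, standing in for a 31-bit prime
w = encode(p) | FLAG     # after the shift, flagging no longer mangles the value
assert decode(w) == p
assert decode(encode(2)) == 2
```

Note that the unshifted value 2^30 + 3 would have collided with the flag bit, which is exactly the mangling described above.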

Started a merging run last night, it now is at pass 18 and still has 8.6M unbalanced ideals :sad:. I doubt I'll get the matrix size much below (8M)^2 which will take awfully long to solve. Maybe I borked something during the hacking and merging doesn't work like it should... need to do a regression test next.

Stay tuned to see me mumble to myself some more,
Alex

Numbers 2005-12-28 15:28

Please feel free to mumble away. I'm sure I am not alone in being fascinated to see how this turns out. In fact I have what is possibly a very silly question.

As I understand it, if we are trying to factor n, the idea is to find as many pairs x, y as satisfy x^2 = y^2 (mod n), and GCD(xy, n) = 1,

when it will be seen that GCD(x-y, n) is a factor of n.

I get that bit. And, obviously, sometimes GCD(x-y, n) = 1, which is not a very useful factor. Which explains why we need a lot of relations, because some of them are not useful. But seeing as you only need one relation to find a factor, why do you start by finding millions of them? Isn't that just overkill, or have I completely misunderstood?

xilman 2005-12-28 16:06

[QUOTE=Numbers]Please feel free to mumble away. I'm sure I am not alone in being fascinated to see how this turns out. In fact I have what is possibly a very silly question.

As I understand it, if we are trying to factor n, the idea is to find as many pairs x, y as satisfy x^2 = y^2 (mod n), and GCD(xy, n) = 1,

when it will be seen that GCD(x-y, n) is a factor of n.

I get that bit. And, obviously, sometimes GCD(x-y, n) = 1, which is not a very useful factor. Which explains why we need a lot of relations, because some of them are not useful. But seeing as you only need one relation to find a factor, why do you start by finding millions of them? Isn't that just overkill, or have I completely misunderstood?[/QUOTE]Close but no cigar.

What we are looking for is a set of relations which [b]when multiplied together[/b] satisfy x^2==y^2 (mod n).

For instance, suppose we have -6 == 10 mod N, -2==5 mod N and 27 == 8 mod N
(I'm just making these up as I go along!)

None of these are of the form x^2==y^2 mod N, but if you multiply them together, you find 324 == 400 mod N, or 18^2 == 20^2 mod N, from which gcd(18+20,N) may be a useful factor of N, if N is divisible by 19, that is.

The example above is greatly oversimplified, but it illustrates the fundamental idea behind virtually all factor-base methods for factoring, including CFRAC, QS and NFS.
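The same mechanism can be run end-to-end with tiny made-up numbers (mine, not Paul's): two relations whose right-hand sides are individually non-square, but whose product is a square.

```python
from math import gcd, prod

n = 77                        # toy composite: 7 * 11
# Two made-up relations: 13^2 == 15 (mod 77) and 20^2 == 15 (mod 77).
# 15 = 3 * 5 is not a square, but 15 * 15 obviously is.
assert 13 * 13 % n == 15 and 20 * 20 % n == 15

x = prod([13, 20]) % n        # 260 == 29 (mod 77)
y = 15                        # sqrt(15 * 15)
assert (x * x - y * y) % n == 0   # x^2 == y^2 (mod n)

f = gcd(x - y, n)
print(f, n // f)              # 7 11 -- a nontrivial split of 77
```

In a real factorization the relations come from sieving, and linear algebra over GF(2) on the exponent vectors picks out which subset multiplies to a square on both sides.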

Paul

alpertron 2005-12-28 16:44

This is also explained in the introduction of the [URL=http://www.mersennewiki.org/index.php/SIQS]article about SIQS[/URL] on MersenneWiki.

akruppa 2005-12-28 17:33

The merging finished yesterday, filter reports 7.21M unbalanced ideals. With small primes added, the matrix should be about (7.35M)^2, so the last merging passes fared better than I expected.
I'm currently merging relations for 3,463+. The hacked version has finished already, I'll repeat with the same parameters and input file but using the original CWI version. Fingers crossed that the output will be identical...

Alex

Numbers 2005-12-28 23:50

[QUOTE=xilman]Close but no cigar.[/QUOTE]

Shame about the cigar, would have gone well with a brandy after dinner.

Found a nice paper on this:
[url]http://www.mersenneforum.org/attachments/pdfs/FSP200-50SNV.pdf[/url]

highly readable and not too difficult, but probably a bit out of date.

Many thanks, again.

akruppa 2005-12-30 23:36

Oh, the joy of hacking someone else's code.

My attempts to re-run the merging pass with the original filter program were met with limited success - the program locked up each time. It keeps running but does not even respond to kill -9. What's more, if you start it more than once, the whole machine locks up hard! This is now the third time the admin has had to reboot the node I was using. Not sure if he'll see the humour in that.

Turns out that the CWI tools put a time stamp into binary format files, stored simply as a time_t variable. But in 64-bit applications sizeof(time_t) = 8, while in 32-bit it is only 4. I created the file with my 64-bit version, so reading it with the original 32-bit binary causes the data to be read shifted by 4 bytes. This causes a length value to be read as 0, which becomes negative after a header length is subtracted. This small negative value is then used as the length to read from a file. If I'm not mistaken, the value gets cast to 64 bits with sign extension on the way to the system call. If the kernel then interprets it as unsigned, it'll try to read 2^63-1 bytes = 10,000,000 TBytes from disk. Would explain why it takes so long...
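The 4-byte vs. 8-byte mismatch is easy to reproduce in miniature. The header layout below is only my guess at the general shape (timestamp followed by a length field), not the actual CWI format:

```python
import struct

# Writer (64-bit build): 8-byte time_t followed by a 4-byte length field.
header = struct.pack("<qI", 1135900000, 1234)

# Reader (32-bit build): expects a 4-byte time_t, so the length is read
# from the wrong offset and lands on the zero high half of the timestamp.
_, bogus_len = struct.unpack_from("<iI", header)
print(bogus_len)                         # 0 instead of 1234

# Subtract a (hypothetical) 16-byte header length: a small negative number
# that, sign-extended and reinterpreted as unsigned 64-bit, is enormous.
print((bogus_len - 16) & (2**64 - 1))    # 18446744073709551600
```

The exact offsets differ from the real file format, but the failure mode is the same: one field read at the wrong offset turns into a read request of essentially 2^64 bytes.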

Long story short, beware of binary relations files.

Alex

Wacky 2005-12-31 00:53

[QUOTE=akruppa]Long story short, beware of binary relations files.[/QUOTE]

Or, more generally, be careful about all of your relations.

akruppa 2006-01-20 17:58

[QUOTE=Wacky]Or, more generally, be careful about all of your relations.[/QUOTE]

The nice thing about NFS is that you can get a lot of excess relations and then choose which ones to keep! :wink:

Brief history: after sieving special-q on the rational side in [30M, 70M], it turned out that a sieve region of 16384x8192 (gnfs-lasieveI14e) would use a lot of special-q values (up to about 150M) and probably produce a pretty large matrix. So sieving on the algebraic side was done with a 32768x16384 region. There is no mention of a gnfs-lasieveI15e program in Franke's documentation, but it turned out that a simple "make gnfs-lasieveI15e" worked just fine. The binary did not run on Pentiums, as their L1 cache is too small, but posed no problem on Athlons or Opterons. Sieving of special-q on the algebraic side was done in the interval [20M, 70M]. In addition, some line sieving over the lattice siever factor base primes was done. Later, a part of the rational side was re-sieved with the large sieve region size, but unsurprisingly this didn't add many new relations.

All in all there were 174237784 unique relations. After removing singletons among ideals of norm >70M, 107136618 relations remained on 79210759 ideals of norm >70M (plus about 2*4118064 smaller ideals), for a real excess of about 19689731.
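Singleton removal itself is simple enough to sketch in a few lines of Python (a minimal illustration of the idea, not the CWI filter): a relation containing an ideal that occurs nowhere else can never be matched, and removing it can create new singletons, so the pass iterates to a fixed point.

```python
from collections import Counter

def remove_singletons(relations):
    """Repeatedly drop relations containing an ideal of count 1."""
    relations = [frozenset(r) for r in relations]
    while True:
        counts = Counter(p for r in relations for p in r)
        keep = [r for r in relations if all(counts[p] > 1 for p in r)]
        if len(keep) == len(relations):
            return keep
        relations = keep              # removal may create new singletons

# Hypothetical ideal sets: the 3-cycle survives, {7, 11} does not.
kept = remove_singletons([{2, 3}, {3, 5}, {2, 5}, {7, 11}])
print(len(kept))                      # 3
```

The excess quoted above is then just the surviving relation count minus the surviving ideal count.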

The clique algorithm was run in many small passes. By default, the clique algorithm tries to halve the excess in each pass, but experiments suggested that running the algorithm more often, discarding fewer relations each time, leads to better results. When the algorithm was allowed to discard at most 100000 relations per pass, the resulting data set included about 260000 fewer ideals than after running it with the default behaviour (23505974 vs. 23769293). Only about a 1% improvement, but so easily obtained that it should be considered worthwhile.
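A rough sketch of how such cliques can be found (my reading of the standard technique, not the actual filter code): an ideal occurring in exactly two relations chains those two relations together; the connected components of that graph are the cliques, and a pass deletes whole components at once, so no partner relation is left stranded.

```python
from collections import Counter, defaultdict

def cliques(relations):
    """Group relations into components linked by ideals of count 2."""
    counts = Counter(p for r in relations for p in r)
    parent = list(range(len(relations)))   # union-find over relation indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    seen = {}                              # ideal -> first relation holding it
    for i, r in enumerate(relations):
        for p in r:
            if counts[p] == 2:
                if p in seen:
                    parent[find(seen[p])] = find(i)
                else:
                    seen[p] = i

    comps = defaultdict(list)
    for i in range(len(relations)):
        comps[find(i)].append(i)
    return list(comps.values())

# Hypothetical ideal sets: 3, 5 and 13 each occur exactly twice, chaining
# the first four relations into one clique; the last stands alone.
rels = [{2, 3}, {3, 5}, {5, 7, 13}, {11, 13}, {17, 19}]
print(sorted(map(len, cliques(rels))))     # [1, 4]
```

A pass would then rank the components by weight and discard the heaviest ones until the per-pass discard budget (e.g. the 100000 relations mentioned above) is used up.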

Merging with 35 passes, merging ideals that appear up to 25 times and discarding relation sets with more than 17.5 relations, left 7308156 equations on 7139734 ideals >1M, the resulting matrix being 7296206x7314657. The filter program was hacked (note the pattern?) so that relation sets of weight >=maxrels+0.5 are discarded only if the maximum allowed number of relations has been discarded. That way another pass can easily be run to discard exactly as many of the heaviest relation sets as desired. This left 7290681 equations on 7139474 ideals >1M and produced a matrix of 7296195x7297185, oversquare by 990. This matrix is currently running.

The behaviour of the parallel block Lanczos (not the distributed one, mind you, as I don't have it) is pretty unintuitive... the single-cpu version had an estimated time of 53 days to solve the matrix, and the two-cpu version almost exactly the same, so the second cpu gave no speedup at all. However, the four-cpu version has an estimated time of about 28 days. That is now running. Depending on how often I get time on a cluster node, we might see factors in late February or early March.

Alex

