mersenneforum.org (https://www.mersenneforum.org/index.php)
-   YAFU (https://www.mersenneforum.org/forumdisplay.php?f=96)
-   -   Yafu (https://www.mersenneforum.org/showthread.php?t=10871)

bsquared 2011-10-27 13:33

[QUOTE=bsquared;275977]That said, the limit of C150 is arbitrary. I don't know of a reason why it wouldn't continue to work above that, but at the same time wouldn't be surprised if it didn't. [/QUOTE]


[CODE]starting SIQS on c125: 68525671337287457013612258265324281116005247164488030328858023717563314386831050030455580919563012104834409828620867405731871
Floating exception
[/CODE]

It appears it is broken for larger inputs anyway. I got a C116 to start (at 12.85 rels/sec) but it fails somewhere between there and C125. That was somewhat of a surprise, but I'm not really motivated to fix it right now.

The C116 would probably take about 4 days on one thread at that rate.

bsquared 2011-10-27 13:56

[QUOTE=LaurV;275952] (I observed that the time almost doubles with every 5-6 digits, and I assume that is because each 5-6 digits means a new "digit" in your internal numeration base). [/QUOTE]

Not exactly. It grows at that rate both because it becomes more and more difficult to find "good" factorizations of numbers that grow as approximately sqrt(N), and because you need more and more of them. For details see, for example, Appendix A of [URL="http://www.crypto-world.com/documents/contini_siqs.pdf"]Contini's masters thesis[/URL].
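The doubling LaurV observed is roughly what the heuristic QS complexity predicts. A back-of-the-envelope sketch (this ignores the (1 + o(1)) factor in the L-function, so only ratios are even approximately meaningful, and fixed overheads in a real run are not modeled):

```python
import math

def qs_work(digits):
    """Heuristic QS cost L(N) = exp(sqrt(ln N * ln ln N)) for N ~ 10^digits.
    The (1 + o(1)) exponent factor is dropped, so only ratios make sense."""
    ln_n = digits * math.log(10)
    return math.exp(math.sqrt(ln_n * math.log(ln_n)))

# Predicted growth in work going from a C120 to a C125:
ratio = qs_work(125) / qs_work(120)
print(f"predicted work ratio per 5 digits near C120: {ratio:.2f}")
```

The bare formula gives roughly a 2.5x increase per 5 digits at this size; the near-doubling seen in practice is consistent once lower-order terms and per-run overheads are taken into account.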

LaurV 2011-10-27 14:51

I know the paper and I TOTALLY agree with the author (first chapter, where he explains his four reasons why he believes SIQS should not be dismissed so easily). In fact, that is one of the reasons I like SIQS more than NFS (the other reason being that I understand it much better than NFS, of which I don't understand very much).

I will be back in no more than 4000 CPU days with a factorization, so you won't be able to argue against raising that limit anymore :P

bsquared 2011-10-27 18:53

You will have to use the latest SVN if you are really serious about trying out a huge job. Over my lunch hour I fixed some low hanging bugs and now things appear to work up to about C145.

[CODE]==== sieve params ====
n = 145 digits, 480 bits
factor base: 1076032 primes (max prime = 35089877)
single large prime cutoff: 4280964994 (122 * pmax)
double large prime range from 51 to 58 bits
double large prime cutoff: 217157592290480736
allocating 24 large prime slices of factor base
buckets hold 2048 elements
sieve interval: 92 blocks of size 32768
polynomial A has ~ 19 factors
using multiplier of 1
using SPV correction of 20 bits, starting at offset 33
using SSE2 for x128 sieve scanning
using SSE2 for resieving 13-16 bit primes
trial factoring cutoff at 127 bits
==== sieving in progress (1 thread): 1076096 relations needed ====
==== Press ctrl-c to abort and save state ====
Bpoly 71 of 65536: buffered 1 rels, checked 8726 (0.20 rels/sec)
Bpoly 187 of 65536: buffered 1 rels, checked 22955 (0.10 rels/sec)
Bpoly 303 of 65536: buffered 1 rels, checked 37339 (0.07 rels/sec)
Bpoly 419 of 65536: buffered 1 rels, checked 51638 (0.05 rels/sec)
Bpoly 535 of 65536: buffered 1 rels, checked 66007 (0.04 rels/sec)
Bpoly 651 of 65536: buffered 1 rels, checked 80373 (0.03 rels/sec)
Bpoly 767 of 65536: buffered 3 rels, checked 94643 (0.09 rels/sec)
Bpoly 883 of 65536: buffered 3 rels, checked 109141 (0.07 rels/sec)
[/CODE]

Above ~C145 I got crashes with a bunch of memory corruption errors.

You can see that it will be a lot of work. I have some rough extrapolations that say you'll need about 30M relations for a C145. At (optimistically) 0.1 rels/sec, that's 3472 CPU days on a high-end workstation.
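For reference, the 3472 figure follows directly from the relation count and the sieving rate:

```python
relations_needed = 30_000_000   # rough extrapolation for a C145
rels_per_sec = 0.1              # optimistic single-thread rate
seconds = relations_needed / rels_per_sec
cpu_days = seconds / 86400      # 86400 seconds per day
print(f"{cpu_days:.0f} CPU days")  # -> 3472 CPU days
```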

Andi47 2011-10-28 18:33

Yafu not resuming NFS sqrt
 
This morning I had to interrupt an NFS job (started with aliqueit; the interruption was necessary because I had to leave home for work and take my laptop with me; sqrt was at "reading dependency 6..."), which was in the middle of the NFS square root. I just resumed it (starting aliqueit with the -p -e -y flags and with the R=1 flag set in yafu.ini) - it looked for relations files, and then started filtering and linalg again. Fortunately it is just a c105, so I lost only a few minutes.

Does Yafu not look if there already is a solved matrix ready, when it attempts to resume an NFS job?

bsquared 2011-10-28 18:53

[QUOTE=Andi47;276154]This morning I had to interrupt an NFS job (started with aliqueit), which was in the middle of the NFS square root. I just resumed it (starting aliqueit with the -p -e -y flags and with the R=1 flag set in yafu.ini) - it looked for relations files, and then started filtering and linalg again. Fortunately it is just a c105, so I lost only a few minutes.

Does Yafu not look if there already is a solved matrix ready, when it attempts to resume an NFS job?[/QUOTE]


No. And as far as I know, there's no easy way to tell in an automated way if a given solved matrix corresponds to the current input. The matrix, matrix checkpoints, and dependency files don't carry along the input, AFAIK. So resuming sqrt requires a special flag. Currently msieve has -nc3 for this.

Actually, the same is true of -R. Doing -R is essentially an assertion to yafu that any sieving data files that currently exist are valid for the current factorization, so go ahead and append to them. Without this assertion, there's no easy way to tell.

I'm in the middle of revamping some of yafu's nfs and one of the things I'm doing is adding in several flags for resuming jobs - including the sqrt step. So this capability is coming.

Integration with aliqueit is a different matter. Even after the flags are in place in yafu, you'd have to add nc3=1 (or something like that) in yafu.ini before restarting aliqueit, and then remove the flag after the sqrt restarts (so the next iteration doesn't try sqrt right away).

schickel 2011-10-28 19:26

[QUOTE=bsquared;276158]Integration with aliqueit is a different matter. Even after the flags are in place in yafu, you'd have to add nc3=1 (or something like that) in yafu.ini before restarting aliqueit, and then remove the flag after the sqrt restarts (so the next iteration doesn't try sqrt right away).[/QUOTE]Or have several one-off flags similar to "-e" that tell aliqueit to skip (or advance to) some particular step on the current line only.

bsquared 2011-10-28 19:30

[QUOTE=schickel;276164]Or have several one-off flags similar to "-e" that tell aliqueit to skip (or advance to) some particular step on the current line only.[/QUOTE]

Yeah, that'd be great. You volunteering? :smile:

bchaffin 2011-10-28 20:18

As long as you're revamping nfs... :smile:

One thing which would be nice is the ability to use a different number of threads for NFS linear algebra than for all other factoring steps. On my Westmere 6-core, I get best performance with 12 threads for ECM/SIQS/GNFS sieving, but 6 threads for linear algebra.

Some more info for the curious:

I've tested the Hyper-Threading speedup for ECM, SIQS, and GNFS sieving; they're all highly optimized code so they don't gain much, but running 2 threads per core gives marginally better total throughput (~5-7%) than running with HT disabled.

But linear algebra is different; it's significantly slower with HT. On my to-do list (which is not making much progress...) is to investigate whether this is due to cache-size parameters and could be fixed by telling msieve that the per-core caches are half the size. But I have no idea whether there is any easy fix, others probably know better.

In my experience, for linear algebra disabling HT entirely via BIOS is faster than running 6/12 threads, which is faster than running 12/12 threads. I don't know if the penalty for having HT enabled is due to interference/cache pollution from the unused threads or because Linux is not perfect about spreading active threads across different cores.

So for big factorizations I recommend rebooting and turning off HT. But obviously that's not worth it for every c105 you do, so in that case just using 1 thread per core is better than nothing.

Note that all of this is just from my Core i7 running Ubuntu, your mileage may vary. This almost certainly doesn't apply to AMD.

jasonp 2011-10-28 21:36

If you're compiling your own Msieve binary, you can also try running make with LARGEBLOCKS=1, which makes the LA do matrix multiplies in much larger pieces. For big matrices multiple people have reported nice speedups; the increased cache footprint is more than offset by the increased cache reuse.

bsquared 2011-10-28 21:46

[QUOTE=jasonp;276177]If you're compiling your own Msieve binary, you can also try running make with LARGEBLOCKS=1, which makes the LA do matrix multiplies in much larger pieces. For big matrices multiple people have reported nice speedups; the increased cache footprint is more than offset by the increased cache reuse.[/QUOTE]

You can also play with TARGET_DENSITY, right?

Also, bchaffin, as you're probably aware, you'd have to recompile msieve and then recompile yafu to link in the changes. I should probably also mention that the version of msieve I link into yafu is slightly different from the SVN repository. I make local changes to pass the "too few cycles, matrix probably cannot build" error back up the stack instead of exiting the application, so that yafu can handle it gracefully. I haven't pushed to incorporate that change into SVN because factMsieve relies on the exit signal to handle the error. It'd be nice if the script could be changed, but it's also not hard to edit my local msieve source every so often.

