mersenneforum.org - CADO-NFS - CADO NFS
(https://www.mersenneforum.org/showthread.php?t=11948)

ET_ 2019-04-18 08:38

[QUOTE=EdH;514016]I have the scripts pretty much rewritten and they are "mostly" working correctly.

Here is a file of info for 91 composites factored using VBCurtis' params files, where available. (There may be duplicate runs for some of the numbers.)[/QUOTE]

I downloaded the CADO package last week, per your instructions.

What should I do now with this file?

EdH 2019-04-18 14:25

[QUOTE=ET_;514028]I downloaded the CADO package last week, per your instructions.

What should I do now with this file?[/QUOTE]
My post was a bit confusing. The scripts I referred to were the ones I'd lost earlier, which I use for my Aliquot factoring via ecmpi and CADO-NFS across several machines. The file I posted contains data from the CADO runs I did using VBCurtis' modified params files from the "[URL="https://www.mersenneforum.org/showthread.php?t=24274"]improved params...[/URL]" thread. The data are mostly for VBCurtis to review, but anyone else can look them over for timing and poly/sieve info that can be used to adjust the params files for local use.

Sorry for the confusion.

VBCurtis 2019-04-18 15:01

[QUOTE=ET_;514028]I downloaded the CADO package last week, per your instructions.

What should I do now with this file?[/QUOTE]

If you used git to download the package, just cd into the cado-nfs folder and run "make". It will ask whether it may download cmake; accept.
An hour or so later you'll have a working copy of CADO.

To test it, while in the cado-nfs directory invoke:
./cado-nfs.py {input number}
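
For example, a full session might look like this (the composite is the 59-digit test number from the CADO-NFS README; any small input works):

[code]
$ cd cado-nfs
$ make     # accept the prompt to download cmake
$ ./cado-nfs.py 90377629292003121684002147101760858109247336549001090677693
[/code]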

With stock parameters, CADO is a bit slower than YAFU; with my improved params files, it is somewhat faster. I'm waiting on tests like Ed's to measure just how much faster across a variety of hardware, but so far I estimate 20 to 30% over YAFU. For example, I used 3 Haswell 3.3 GHz cores to factor a C139 in 38 hours with CADO.

To use my params files, grab them from the "improved parameters" thread and save them to cado-nfs/parameters/factor. You can rename the original ones for safekeeping, or overwrite them.
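
For example, assuming the stock params.cNN naming (the download location here is hypothetical):

[code]
$ cd cado-nfs/parameters/factor
$ mv params.c120 params.c120.stock   # set the original aside for comparison
$ cp ~/Downloads/params.c120 .       # the improved file from the thread
[/code]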

If you do so, please post a before & after time on the same composite or on two composites very close in size (say, a factor of 2 apart). It seems CADO performs quite differently by hardware type.

ET_ 2019-04-18 17:07

[QUOTE=VBCurtis;514046]
To use my params files, grab them from the "improved parameters" thread and save them to cado-nfs/parameters/factor. You can rename the original ones for safekeeping, or overwrite them.

If you do so, please post a before & after time on the same composite or on two composites very close in size (say, a factor of 2 apart). It seems CADO performs quite differently by hardware type.[/QUOTE]

I will do both tests to improve my knowledge of the system.
I just need a hint on the size of the composites (you said C139?) and where to pick them from. I will gladly test numbers of your choosing before and after the file substitution.

VBCurtis 2019-04-18 19:17

Any composites you have an interest in will do; some folks have factored the same number twice, or even three times (YAFU, stock CADO, CADO with my files) to compare speeds.

If a direct comparison with the same input is your style, I suggest finding a composite between 100 and 120 digits; if you have none from your own interests, PM me and I can send an entry from an aliquot sequence. If you'd rather do productive work, find any two inputs of similar size (say, within a factor of two) for an A/B test, where "a" can be YAFU or stock CADO and "b" is improved CADO.

I am currently developing files for C125, C135, and C140; C95 to C120 (and C130) are already posted. CADO rounds an input to the nearest multiple of 5 digits for parameter choice; we could write custom params for an individual number, but until 165+ digits that seems like unnecessary complication.
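
Concretely (file names per the stock params.cNN convention):

[code]
# parameter-file selection by nearest multiple of 5 digits:
#   C123 -> params.c125    C131 -> params.c130    C138 -> params.c140
[/code]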

RichD is currently testing my beta C140 file against stock CADO on a pair of C138s, a size that takes between 1 and 1.5 days on a quad-core desktop; he already has a strong idea of what speed his rig manages with YAFU.

A C120 takes a couple of hours, while C105 takes half an hour or so. As you can imagine, I've done much more testing on C110 and lower because the tests are so fast; I'm mostly using a best-fit curve of time vs difficulty to evaluate whether C125+ files are "as fast" as the files for smaller numbers.

fivemack 2019-04-22 20:42

CADO for the sieving step
 
So, it's nice and easy to download and install (I was a little surprised that it carefully downloaded and installed its own copy of cmake), and not too difficult to give it a polynomial.

The automatic script fell over very quickly after trying to run ten separate two-threaded sievers at 13 GB memory usage each on my 96 GB machine (though it spent ten minutes producing a free-relations file before starting to sieve); it left behind a file project.wucmd containing plausible-looking raw 'las' command lines.

I am a little disconcerted that I can't find evidence of the out-of-memory failure in /var/log or in any of the log files (I'm deducing the memory usage by running the line from project.wucmd); the console output ends

[code]
Info:Lattice Sieving: Adding workunit L2253_sieving_230045000-230050000 to database
[/code]

and some process is clearly still running on that console even though no 'las' processes are running on the system.

Running las with '-t 20' gives a process that uses 21 GB, does seem to be running on twenty CPUs at least some of the time, and generates about three relations a second.

VBCurtis 2019-04-23 03:38

My plan for the C207 team sieve is to set tasks.sieve.las.threads = 4, which makes every client las (siever) process run 4-threaded. This produces an occasional small error noting that some bucket is full, but las allocates a bit more memory for said bucket (I believe the relevant setting is bkmult) and continues, with a suggestion that maybe I'm using too many threads for my choice of lim's. It's consistent, then, that your 20-threaded single process would trigger that error more often and require more RAM than my 4-threaded process at 12-13 GB does.
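For reference, the setting goes on the cado-nfs.py command line or in the params file; as I understand the syntax:

[code]
$ ./cado-nfs.py <composite> tasks.sieve.las.threads=4
# or, equivalently, in the parameter file:
tasks.sieve.las.threads = 4
[/code]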
When running the degree-6 poly 4-threaded on a 20-core machine with 30+ other threads busy, I was also getting about 3 relations per second; top showed CPU use near 400%, so your timing confuses me a little.
I've been trying the obvious flags on the command line for cado-nfs-client.py to set the number of threads on the client side, with no luck.

fivemack 2019-04-23 06:40

The log at the end of the 20-threaded job was
[code]
$ taskset -c 10-19,30-39 /scratch/cado-gnfs/cado-nfs-2.3.0/build/oak/sieve/las \
    -I 16 -poly L2253.jon/L2253.poly -q0 231000000 -q1 231001000 \
    -lim0 268000000 -lim1 268000000 -lpb0 33 -lpb1 33 -mfb0 99 -mfb1 99 \
    -lambda0 3.1 -lambda1 3.2 -fb L2253.jon/L2253.roots.gz \
    -out L2253.jon/231M-231M001 -t 20 -sqside 1 -stats-stderr

# Average J=32760 for 68 special-q's, max bucket fill 0.796379
# Discarded 0 special-q's out of 68 pushed
# Total cpu time 28131.83s [norm 7.14+23.2, sieving 8781.8 (7022.8 + 448.1 + 1310.9), factor 19319.7 (1995.1 + 17324.6)]
# Total elapsed time 2051.09s, per special-q 30.163s, per relation 0.238249s
# PeakMemusage (MB) = 25783
# Total 8609 reports [3.27s/r, 126.6r/sq]
[/code]

This is a *lot* slower than gnfs-lasieve4I16e, but looking at that command line I am using 3LP on the rational side, which is usually very slow; I'm rerunning with -mfb0 66 before contemplating changing the factor-large-part parameters.
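
That is, the only change for the rerun (the "3lpa 2lpr" results below) is the side-0 cofactor bound:

[code]
# before: -mfb0 99   (99 = 3 x 33: up to three 33-bit large primes on the rational side)
# after:  -mfb0 66   (66 = 2 x 33: at most two)
[/code]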

henryzz 2019-04-23 08:58

Why is the number of threads per client not a client-side setting?

fivemack 2019-04-23 19:21

[CODE]
I=16, 3lpa 2lpr, -t20
# Average J=32760 for 33 special-q's, max bucket fill 0.507688
# Discarded 0 special-q's out of 33 pushed
# Total cpu time 5855.10s [norm 2.46+10.1, sieving 4878.9 (3963.3 + 224.9 + 690.7), factor 963.7 (532.0 + 431.7)]
# Total elapsed time 363.64s, per special-q 11.0193s, per relation 0.0980416s
# PeakMemusage (MB) = 25783
# Total 3709 reports [1.58s/r, 112.4r/sq]

I=15, 3lpa 2lpr, -t20

# Average J=16360 for 47 special-q's, max bucket fill 0.409821
# Discarded 0 special-q's out of 47 pushed
# Total cpu time 1473.23s [norm 0.99+4.8, sieving 1010.2 (705.3 + 76.7 + 228.3), factor 457.2 (197.0 + 260.2)]
# Total elapsed time 80.44s, per special-q 1.7114s, per relation 0.0352634s
# PeakMemusage (MB) = 8456
# Total 2281 reports [0.646s/r, 48.5r/sq]

[/CODE]

I should do a -t 40 run, since my comparison is 40 parallel gnfs-lasieve4I15e jobs.

VBCurtis 2019-04-23 19:29

And it looks like I should do an I=15 run, both 2- and 4-threaded, to see if we benefit from I=16 at all! CADO is just fine with very large Q values, so it's not important if yield on I=15 is worse than half of I=16's; that would suggest a Q-range of perhaps 20M to 1100M rather than 20M to 450M, and memory use under 4 GB per process. It may also turn out that optimal lim choices for I=15 are a bit higher than for I=16; memory footprint constrains us on I=16 to lim's about where I tested (Rlim 260M, Alim 550M), but we have no such constraint on I=15.
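
Concretely, that would be Tom's command above rerun with -I 15 and a smaller thread count, something like this (output name hypothetical; the lambda values would need revisiting for the new mfb0, so I've left them to the defaults):

[code]
$ /scratch/cado-gnfs/cado-nfs-2.3.0/build/oak/sieve/las -I 15 \
    -poly L2253.jon/L2253.poly -q0 231000000 -q1 231001000 \
    -lim0 268000000 -lim1 268000000 -lpb0 33 -lpb1 33 -mfb0 66 -mfb1 99 \
    -fb L2253.jon/L2253.roots.gz -out L2253.jon/231M-231M001.I15 \
    -t 4 -sqside 1 -stats-stderr
[/code]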

Did you run makefb before las to generate the factor-base file? I haven't yet had success finding relations using las free-standing, but that would clearly be preferable.
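
For anyone following along, the factor-base file comes from makefb; I believe the invocation is roughly as follows (options per the CADO docs; the values here are guesses matched to Tom's las line):

[code]
$ /scratch/cado-gnfs/cado-nfs-2.3.0/build/oak/sieve/makefb \
    -poly L2253.jon/L2253.poly -lim 268000000 -maxbits 16 \
    -out L2253.jon/L2253.roots.gz -t 4
[/code]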

