2019-04-18, 08:38   #353
ET_

Quote:
Originally Posted by EdH
I have the scripts pretty much rewritten and they are "mostly" working correctly.

Here is a file of info for 91 composites factored using VBCurtis' params files, where available. (There may be duplicate runs for some of the numbers.)
I downloaded the CADO package last week, per your instructions.

What should I do now with this file?

2019-04-18, 14:25   #354
EdH

Quote:
Originally Posted by ET_
I downloaded the CADO package last week, per your instructions.

What should I do now with this file?
I was a bit unclear in my post. The scripts I referred to were the ones I'd lost earlier, which I use for my Aliquot factoring via ecmpi and CADO-NFS across several machines. The file I posted was data from the CADO runs I did using VBCurtis' modified params files from the "improved params..." thread. They are mostly for VBCurtis to review, but anyone else can look them over for timing and poly/sieve info that can be used to adjust the params files for local use.

Sorry for the confusion.

2019-04-18, 15:01   #355
VBCurtis

Quote:
Originally Posted by ET_
I downloaded the CADO package last week, per your instructions.

What should I do now with this file?
If you used git to download the package, just cd into the cado-nfs folder and run "make". It will ask whether you allow it to download cmake; accept.
An hour or so later you'll have a working copy of CADO.

To test it, while in the cado-nfs directory invoke:
./cado-nfs.py {input number}
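
For example, the whole sequence looks something like this (just a sketch; ~/cado-nfs is a placeholder for wherever your checkout actually lives):
Code:
# one-time build (assumes the git checkout is in ~/cado-nfs)
cd ~/cado-nfs
make                # it will offer to download its own copy of cmake; accept

# then, from the same directory, factor something:
./cado-nfs.py {input number}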

With stock parameters, CADO is a bit slower than YAFU; with my improved params files, CADO is somewhat faster than YAFU. I await tests like Ed's to measure just how much faster it is across a variety of hardware, but so far I estimate it's 20 to 30% faster than YAFU. For example, I used 3 Haswell 3.3 GHz cores to factor a C139 in 38 hours with CADO.

To use my params files, grab them from the "improved parameters" thread, and save to the folder /cado-nfs/parameters/factor. You can rename the original ones for safekeeping, or overwrite them.
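
Something like this, for instance (a sketch only: I'm assuming the checkout is in ~/cado-nfs, that the downloaded files landed in ~/Downloads, and that c110 is one of the sizes you grabbed; the stock files in parameters/factor are named params.c95, params.c100 and so on):
Code:
cd ~/cado-nfs/parameters/factor
cp params.c110 params.c110.stock     # keep the original for a before/after comparison
cp ~/Downloads/params.c110 .         # drop in the improved file from the thread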

If you do so, please post a before & after time on the same composite or on two composites very close in size (say, a factor of 2 apart). It seems CADO performs quite differently by hardware type.

2019-04-18, 17:07   #356
ET_

Quote:
Originally Posted by VBCurtis
To use my params files, grab them from the "improved parameters" thread, and save to the folder /cado-nfs/parameters/factor. You can rename the original ones for safekeeping, or overwrite them.

If you do so, please post a before & after time on the same composite or on two composites very close in size (say, a factor of 2 apart). It seems CADO performs quite differently by hardware type.
I will do both tests to improve my knowledge of the system.
Just need a hint on the size of the composites (you said C139?) and where to choose them. I will gladly perform a test on numbers you want to run before and after the file substitution.

2019-04-18, 19:17   #357
VBCurtis

Any composites you have an interest in will do; some folks have factored the same number twice, or even three times (YAFU, stock CADO, CADO with my files) to compare speeds.

If a direct comparison with the same input is your style, I suggest finding a composite between 100 and 120 digits; if you have none from your own interests, PM me and I can send an entry from an aliquot sequence. If you'd rather do productive work, find any two inputs of similar size (say, within a factor of two) for an a/b test, where "a" can be YAFU or stock CADO and "b" is improved CADO.

I am currently developing files for C125, C135, C140; C95 to C120 (and C130) are presently posted. CADO rounds inputs to the nearest multiple-of-5-digits for parameter choice; we can write custom params for an individual number, but until 165+ digits that seems like unnecessary complication.

RichD is currently testing my beta C140 file against stock-CADO on a pair of C138s, a size that takes between 1 and 1.5 days on a quad-core desktop; he already has a strong idea of what speed his rig manages with YAFU.

A C120 takes a couple of hours, while C105 takes half an hour or so. As you can imagine, I've done much more testing on C110 and lower because the tests are so fast; I'm mostly using a best-fit curve of time vs difficulty to evaluate whether C125+ files are "as fast" as the files for smaller numbers.


2019-04-22, 20:42   #358
fivemack
CADO for the sieving step

So, it's nice and easy to download and install (I was a little surprised that it carefully downloaded and installed its own copy of cmake), and not too difficult to give a polynomial.

The automatic script fell over very quickly after trying to run ten separate two-threaded sievers at 13GB memory usage each on my 96GB machine (though it spent ten minutes producing a free-relations file before starting to sieve); it left behind a file project.wucmd which contained plausible-looking raw 'las' command lines.

I am a little disconcerted that I can't find evidence of the out-of-memory failure in /var/log or in any of the log files (I'm deducing the memory usage by running the line from project.wucmd); the console output ends

Code:
Info:Lattice Sieving: Adding workunit L2253_sieving_230045000-230050000 to database
and some process is clearly still running on that console even though no 'las' processes are running on the system.

Running las with '-t 20' gives a process which uses 21GB and does seem to be running on twenty CPUs at least some of the time, and generating about three relations a second.


2019-04-23, 03:38   #359
VBCurtis

My plan for the C207 team sieve is to set tasks.sieve.las.threads = 4, which will cause every client las (siever) process to run 4-threaded. This causes an occasional small error noting some bucket is full, but it allocates a bit more memory for said bucket (I believe the setting is bkmult) and continues with a suggestion that maybe I'm using too many threads for my choice of lim's. It's consistent, then, that your 20-threaded single process would trigger that error more frequently and require more RAM than my 4-threaded process @12-13GB does.
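(For anyone following along: that setting can, I believe, simply be passed to the server as an extra parameter=value argument on the command line, as in the sketch below, where the composite is a placeholder.)
Code:
# every client las (siever) process then runs 4-threaded
./cado-nfs.py {input number} tasks.sieve.las.threads=4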
When running the deg 6 poly 4-threaded on a 20-core machine with 30+ other threads busy, I was also finding about 3 relations per second; top showed CPU use near 400%, so your timing confuses me a little.
I've been trying the obvious flags on the command line for cado-nfs-client.py to set the number of threads on the client side, with no luck.

2019-04-23, 06:40   #360
fivemack

The log at the end of the 20-threaded job was
Code:
$ taskset -c 10-19,30-39 /scratch/cado-gnfs/cado-nfs-2.3.0/build/oak/sieve/las -I 16 -poly L2253.jon/L2253.poly -q0 231000000 -q1 231001000 -lim0 268000000 -lim1 268000000 -lpb0 33 -lpb1 33 -mfb0 99 -mfb1 99 -lambda0 3.1 -lambda1 3.2 -fb L2253.jon/L2253.roots.gz -out L2253.jon/231M-231M001 -t 20 -sqside 1 -stats-stderr

# Average J=32760 for 68 special-q's, max bucket fill 0.796379
# Discarded 0 special-q's out of 68 pushed
# Total cpu time 28131.83s [norm 7.14+23.2, sieving 8781.8 (7022.8 + 448.1 + 1310.9), factor 19319.7 (1995.1 + 17324.6)]
# Total elapsed time 2051.09s, per special-q 30.163s, per relation 0.238249s
# PeakMemusage (MB) = 25783
# Total 8609 reports [3.27s/r, 126.6r/sq]
This is a *lot* slower than gnfs-lasieve4I16e, but looking at that command line I am using 3LP on the rational side, which is usually very slow; I'm rerunning with -mfb0 66 before contemplating changing the factor-large-part parameters.
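
Concretely, that rerun is the same command line as above with only the side-0 cofactor bound changed (a sketch; every other flag, including -lambda0, left exactly as in the run above):
Code:
# -mfb0 66 allows at most two 33-bit large primes on the rational side instead of three;
# -lambda0 arguably wants lowering to match, but this changes only -mfb0 as described
taskset -c 10-19,30-39 /scratch/cado-gnfs/cado-nfs-2.3.0/build/oak/sieve/las -I 16 \
  -poly L2253.jon/L2253.poly -q0 231000000 -q1 231001000 \
  -lim0 268000000 -lim1 268000000 -lpb0 33 -lpb1 33 \
  -mfb0 66 -mfb1 99 -lambda0 3.1 -lambda1 3.2 \
  -fb L2253.jon/L2253.roots.gz -out L2253.jon/231M-231M001 -t 20 -sqside 1 -stats-stderr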

2019-04-23, 08:58   #361
henryzz

Why is the number of threads per client not a client-side setting?

2019-04-23, 19:21   #362
fivemack

Code:
I=16, 3lpa 2lpr, -t20
# Average J=32760 for 33 special-q's, max bucket fill 0.507688                                         
# Discarded 0 special-q's out of 33 pushed                                                             
# Total cpu time 5855.10s [norm 2.46+10.1, sieving 4878.9 (3963.3 + 224.9 + 690.7), factor 963.7 (532.0 + 431.7)
# Total elapsed time 363.64s, per special-q 11.0193s, per relation 0.0980416s                          
# PeakMemusage (MB) = 25783                                                                            
# Total 3709 reports [1.58s/r, 112.4r/sq]                 

I=15, 3lpa 2lpr,  -t20
                                             
# Average J=16360 for 47 special-q's, max bucket fill 0.409821                                         
# Discarded 0 special-q's out of 47 pushed                                                             
# Total cpu time 1473.23s [norm 0.99+4.8, sieving 1010.2 (705.3 + 76.7 + 228.3), factor 457.2 (197.0 + 260.2)
# Total elapsed time 80.44s, per special-q 1.7114s, per relation 0.0352634s                            
# PeakMemusage (MB) = 8456                                                                             
# Total 2281 reports [0.646s/r, 48.5r/sq]
I should do a -t40 run since my comparison is 40 parallel gnfs-lasieve4I15e jobs


2019-04-23, 19:29   #363
VBCurtis

And it looks like I should do an I=15 run both 2- and 4-threaded to see if we benefit from I=16 at all! CADO is just fine with very large Q values, so it's not important if yield is worse than half on I=15 vs I=16; that would suggest a Q-range of perhaps 20M to 1100M rather than 20M to 450M, and memory use under 4GB per process. It also may turn out that optimal lim choices for I=15 are a bit higher than for I=16; we are constrained by memory footprint on I=16 to have lim's about where I tested (Rlim 260M, Alim 550M), but we have no such constraints on I=15.

Did you run makefb before las to generate the factor-base file? I haven't yet had success finding relations using las free-standing, but that would clearly be preferable.
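
If that's the missing step, I'd guess the invocation looks roughly like the sketch below (a guess only: I haven't got the free-standing workflow producing relations myself, and the flag names are from memory of the 2.3.0 build, so check makefb's usage output before trusting them):
Code:
# build the factor-base file that las reads via -fb (flag spellings are my assumption)
/scratch/cado-gnfs/cado-nfs-2.3.0/build/oak/sieve/makefb -poly L2253.jon/L2253.poly \
  -lim 268000000 -maxbits 16 -t 4 -out L2253.jon/L2253.roots.gz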