mersenneforum.org  

Factoring Projects > CADO-NFS
Old 2018-08-14, 04:27   #298
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

3²×479 Posts

My current GNFS-180 CADO run is using roughly 15 clients (30 cores), and also drops client tasks randomly. The ones that drop are on a LAN with the server, and on a faster machine than the other clients. It doesn't make much sense, but I just restart the ones that drop. I don't have the CADO server set up to auto-issue clients to other machines.

If you recall what text the server writes when a file doesn't come back in time, you could grep for that text in the log and see what ad-value causes the hangup. If it's in the middle of the poly search range, you might choose to increase the time allowed before giving up on an issued task. I think it's tasks.wu.timeout, but I'm not certain.

I believe the hang occurs because CADO only reissues a task twice before giving up, and the wrapper logic isn't smart enough to issue one more workunit above admax nor move to root opt when one range fails to come back.
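As a rough sketch of the grep suggested above: the exact wording of the server's timeout message isn't confirmed here, so the default pattern `timed.out` is only a guess, and `find_timeouts` is a made-up helper name.

```shell
# Hypothetical helper: list the lines of a server log that match a timeout message.
# The default pattern is an assumption, not CADO's confirmed wording; pass your own
# pattern as the second argument once you know what the server really prints.
find_timeouts() {
    logfile=$1
    pattern=${2:-timed.out}
    # -n prints line numbers, -i ignores case
    grep -n -i -- "$pattern" "$logfile"
}
```

Something like `find_timeouts /path/to/server.log` would then print the matching lines with their line numbers, and from there you can look for the ad-value of the workunit that failed to return.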
Old 2018-08-14, 09:13   #299
henryzz
Just call me Henry
 
"David"
Sep 2007
Cambridge (GMT/BST)

1011001011010₂ Posts

Quote:
Originally Posted by EdH
Thanks! I was just going to go search that out. admin was 1e3 and adrange was 5e2. I changed them to 8400 and 420, respectively and restarted from scratch.

I think I'm running 54 clients now, but for some reason, I keep losing some. Since they're in bash generated gnome-terminals, I never catch any crash info. They're just closed and gone.
I believe that there are good ways of stopping them disappearing. The pause command is a possibility. I would imagine others may have better ideas.
Old 2018-08-14, 16:03   #300
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

3×5×223 Posts

Quote:
Originally Posted by VBCurtis
My current GNFS-180 CADO run is using roughly 15 clients (30 cores), and also drops client tasks randomly. The ones that drop are on a LAN with the server, and on a faster machine than the other clients. It doesn't make much sense, but I just restart the ones that drop. I don't have the CADO server set up to auto-issue clients to other machines.
I start all mine separately, too. They are all LAN connected and run with local scripts that retask them when they are done sieving. Each machine has one script that starts all of its clients and another that starts a single client.

Quote:
Originally Posted by VBCurtis
If you recall what text the server writes when a file doesn't come back in time, you could grep for that text in the log and see what ad-value causes the hangup. If it's in the middle of the poly search range, you might choose to increase the time allowed before giving up on an issued task. I think it's tasks.wu.timeout, but I'm not certain.

I believe the hang occurs because CADO only reissues a task twice before giving up, and the wrapper logic isn't smart enough to issue one more workunit above admax nor move to root opt when one range fails to come back.
This part I may have to study more...

Quote:
Originally Posted by henryzz
I believe that there are good ways of stopping them disappearing. The pause command is a possibility. I would imagine others may have better ideas.
I suppose I could add ";sleep 43200" to my gnome-terminal command, too, but I'd have to do that for every script on all the machines. Maybe the next time I do a machine-wide rewrite...
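Another option, sketched here under assumptions, is to keep the crash output rather than the window: wrap each client command so everything it prints, plus its exit status, lands in a log file. `run_logged` is a hypothetical helper name, and the log path and client command line in the usage comment are placeholders.

```shell
# Hypothetical wrapper: run a command, append its output to a log file,
# and record the exit status, so crash info survives a closed terminal.
run_logged() {
    logfile=$1
    shift
    status=0
    # Send both stdout and stderr of the command to the log
    "$@" >>"$logfile" 2>&1 || status=$?
    printf '%s exited with status %d\n' "$1" "$status" >>"$logfile"
    return "$status"
}

# Example (placeholder arguments):
#   run_logged "$HOME/client1.log" ./cado-nfs-client.py --server=<server URL>
```

The log then shows whatever the client printed last before it disappeared, which is the part that gets lost when gnome-terminal closes.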
Old 2018-08-15, 12:15   #301
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

3×5×223 Posts

Hey Curtis,


Here's a c151 run with your enhancements from before:
Code:
Info:Polynomial Selection (size optimized): Aggregate statistics:
Info:Polynomial Selection (size optimized): potential collisions: 167972
Info:Polynomial Selection (size optimized): raw lognorm (nr/min/av/max/std): 168252/45.550/54.833/60.700/1.149
Info:Polynomial Selection (size optimized): optimized lognorm (nr/min/av/max/std): 168252/43.600/48.610/55.130/1.161
Info:Polynomial Selection (size optimized): Total time: 302064
Info:Polynomial Selection (root optimized): Aggregate statistics:
Info:Polynomial Selection (root optimized): Total time: 8840.08
Info:Polynomial Selection (root optimized): Rootsieve time: 8838.9
Info:Generate Factor Base: Total cpu/real time for makefb: 32.13/6.7074
Info:Generate Free Relations: Total cpu/real time for freerel: 1239.29/161.525
Info:Lattice Sieving: Aggregate statistics:
Info:Lattice Sieving: Total number of relations: 76147350
Info:Lattice Sieving: Average J: 7754.82 for 1053607 special-q, max bucket fill: 0.61839
Info:Lattice Sieving: Total CPU time: 3.56919e+06s
Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 459.26/299.79
Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1: 298.7s
Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 1633.11/704.758
Info:Filtering - Singleton removal: Total cpu/real time for purge: 1167.91/523.987
Info:Filtering - Merging: Total cpu/real time for merge: 1506.16/1320.63
Info:Filtering - Merging: Total cpu/real time for replay: 137.02/115.897
Info:Linear Algebra: Total cpu/real time for bwc: 401675/0.000190258
Info:Linear Algebra: Aggregate statistics:
Info:Linear Algebra: Krylov: WCT time 33366.92
Info:Linear Algebra: Lingen CPU time 1324.14, WCT time 203.31
Info:Linear Algebra: Mksol: WCT time 18229.37
Info:Quadratic Characters: Total cpu/real time for characters: 169.38/47.1608
Info:Square Root: Total cpu/real time for sqrt: 9452.87/1305.95
Info:HTTP server: Shutting down HTTP server
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 4.29757e+06/107871
Info:root: Cleaning up computation data in /tmp/cado.rma48tdr
1142592283136731570545301662765774155859478622044326713 2907521994696204311372547245994693384494827511814130029530759988300338906185818977360167429826659
Ed
Old 2018-08-17, 03:52   #302
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

3×5×223 Posts

Here's a c132 run using your c130 file from post 294. I added admin = 1260 and changed adrange to 840:
Code:
Info:Polynomial Selection (size optimized): Aggregate statistics:
Info:Polynomial Selection (size optimized): potential collisions: 293781
Info:Polynomial Selection (size optimized): raw lognorm (nr/min/av/max/std): 297375/38.750/48.467/58.100/1.577
Info:Polynomial Selection (size optimized): optimized lognorm (nr/min/av/max/std): 297375/37.450/42.329/49.080/0.895
Info:Polynomial Selection (size optimized): Total time: 77673.3
Info:Polynomial Selection (root optimized): Aggregate statistics:
Info:Polynomial Selection (root optimized): Total time: 1253.51
Info:Polynomial Selection (root optimized): Rootsieve time: 1252.44
Info:Generate Factor Base: Total cpu/real time for makefb: 8.08/1.76903
Info:Generate Free Relations: Total cpu/real time for freerel: 619.78/81.2041
Info:Lattice Sieving: Aggregate statistics:
Info:Lattice Sieving: Total number of relations: 36903672
Info:Lattice Sieving: Average J: 3789.6 for 383367 special-q, max bucket fill: 0.739391
Info:Lattice Sieving: Total CPU time: 333999s
Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 98.16/316.572
Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1: 316.3s
Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 458.01/214.492
Info:Filtering - Singleton removal: Total cpu/real time for purge: 184.18/63.6672
Info:Filtering - Merging: Total cpu/real time for merge: 242.93/211.991
Info:Filtering - Merging: Total cpu/real time for replay: 35.22/28.8648
Info:Linear Algebra: Total cpu/real time for bwc: 27425.2/0.000190735
Info:Linear Algebra: Aggregate statistics:
Info:Linear Algebra: Krylov: WCT time 2275.83
Info:Linear Algebra: Lingen CPU time 321.19, WCT time 50.15
Info:Linear Algebra: Mksol: WCT time 1244.45
Info:Quadratic Characters: Total cpu/real time for characters: 46.15/12.6496
Info:Square Root: Total cpu/real time for sqrt: 2378.22/323.447
Info:HTTP server: Shutting down HTTP server
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 444422/9602.2
Info:root: Cleaning up computation data in /tmp/cado.0z5tf6z_
1638528746893449983213861346839041825501626828319370583078901 317138241249483753195631898143827752284973893404091800009687302134027717
Old 2018-08-17, 06:20   #303
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

3²·479 Posts

Thanks, Ed! This gives us a baseline for future tests on your farm, and gives me more info about how different systems produce quite different balances of matrix time vs sieve time.
I have a couple of quite fast results at c140 and c145; I'll get those two files posted next week. I have little data in the 125- and 135-digit regions yet, and no factorization results fast enough to fit the trendline of other best results from 100 to 150 digits.
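To put a number on the sieve share for logs like the two posted above, here is a small awk sketch. It assumes the summary lines look exactly like the ones in those logs ("Lattice Sieving: Total CPU time" and "Complete Factorization: Total cpu/elapsed time"); `sieve_share` is a made-up name.

```shell
# Hypothetical helper: percentage of total CPU time spent in lattice sieving,
# read from a CADO-NFS summary on stdin or from the files given as arguments.
sieve_share() {
    awk -F': ' '
        /Lattice Sieving: Total CPU time/ {
            sieve = $NF
            sub(/s$/, "", sieve)        # drop the trailing "s" unit
            sieve = sieve + 0
        }
        /Complete Factorization: Total cpu\/elapsed time/ {
            split($NF, t, "/")          # cpu/elapsed: keep the cpu part
            total = t[1] + 0
        }
        END { if (total > 0) printf "%.1f\n", 100 * sieve / total }
    ' "$@"
}
```

Fed the c151 summary it prints 83.1, and for the c132 summary 75.2, so sieving is the bulk of the CPU time in both runs, but noticeably less so for the smaller job.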
Old 2018-10-03, 03:43   #304
wombatman
I moo ablest echo power!
 
May 2013

6C9₁₆ Posts

Is there a way to force re-running of factorization with CADO-NFS at a particular step? My HD filled up when I wasn't paying attention and CADO couldn't save files related to the mksol step. Thus, it errored out at the end of that step and won't restart mksol since it "completed", but I'm not sure if I can edit something to cause just the mksol (and subsequent) steps to be run.
Old 2018-10-03, 05:50   #305
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

10327₈ Posts

I got away for a while from refining parameters for small tasks, but I did complete a GNFS-180 (13*2^906-1) using CADO for sieving!
I used msieve for poly select. Poly score was 9.891e-14, a record for C180.
CADO params attached; I did a little test-sieving with GGNFS, as well as trying two or three sets of parameters for a day each on CADO. I ended up using lim's of 60M and 100M, 32/33LP, I=15, and 64/95 for MFB. ncurves was set to 17 on the 2LP side, 12 on 3LP side.
Sieving Q=10M to 87M yielded just over 620M relations; alas, the host machine ran out of disk space while filtering, so I copied all the relations to a single file on another machine and set msieve to work. Density 96 allowed a matrix 23.5M in size; I ran out of patience and disk to sieve enough for my preferred density around 110.
Roughly 50 cores were used for 5 weeks for sieving; nearly my entire farm.
In hindsight, 32/33 is one LP too big. Default CADO uses 31/32 and I=14; I should have bumped one of these one step, but not both. I had initial yield around 12, and average yield of 8.0-8.1; even for me, that's a bit high! If I were to try another this size with CADO, I would choose 31/32LP and I=15. My next job will be GNFS-186, for which I plan to bump the lim's to 80M/120M, but leave the other parameters alone.
Yield from CADO was substantially better than GGNFS; alas, my GGNFS test sieve notes were lost during the factorization and I'm too lazy to repeat the tests.
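As a quick sanity check on the yield figure, using only the numbers quoted above (about 620M relations from Q=10M to 87M), the average yield works out as below. `avg_yield` is just an illustrative name.

```shell
# Hypothetical helper: average yield = relations / (qmax - qmin)
avg_yield() {
    awk -v rels="$1" -v qmin="$2" -v qmax="$3" \
        'BEGIN { printf "%.2f\n", rels / (qmax - qmin) }'
}

avg_yield 620e6 10e6 87e6   # prints 8.05
```

That is about 8.05 relations per unit of Q over the range, in line with the average yield of 8.0-8.1 mentioned above.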
Attached Files
File Type: txt params.c180.txt (2.0 KB, 29 views)
Old 2018-10-03, 18:44   #306
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

110100010001₂ Posts

Quote:
Originally Posted by wombatman
Is there a way to force re-running of factorization with CADO-NFS at a particular step? My HD filled up when I wasn't paying attention and CADO couldn't save files related to the mksol step. Thus, it errored out at the end of that step and won't restart mksol since it "completed", but I'm not sure if I can edit something to cause just the mksol (and subsequent) steps to be run.
If it's helpful, I'm currently running a hybrid CADO-NFS/msieve procedure for my factoring. All the LA is done by msieve.

I use the following in a bash script with the composite ($1) in the command line. I force the temporary directory to be /tmp/hybrid so I know where it is:
Code:
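#!/bin/bash
# Hybrid run: CADO-NFS up to the end of sieving, then msieve for the rest.
# Usage: pass the composite as the first argument ($1).

cd ~/Math/cado-nfs || exit 1
./cado-nfs.py "$1" tasks.workdir=/tmp/hybrid tasks.filter.run=false
echo "Finished cado-nfs!"
cd /tmp/hybrid || exit 1
# Gather all uploaded relation files into a single file for msieve
cat c*.upload/*.gz >comp.dat.gz
# Written via a temporary name, presumably so an existing comp.poly is not read as input
cat *.poly >comp.polyT
mv comp.polyT comp.poly
echo "n: $1" >comp.n
echo "N $1" >comp.fb
~/Math/cado-nfs/poly2fb
~/Math/msieve/msieve -i comp.n -s comp.dat.gz -l compmsieve.log -nf comp.fb -t 8 -nc
grep " factor: " compmsieve.log
grep " factor: " compmsieve.log > ~/FactorList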
The last line is in case /tmp/hybrid gets removed. Of note, I have run into requests for more relations on occasion.

At some point, I'll probably add the procedure to my "How I ..." pages, but I haven't yet because several parameter issues remain unsolved.
Old 2018-10-04, 02:42   #307
wombatman
I moo ablest echo power!
 
May 2013

11011001001₂ Posts

Thanks. If I can't figure out anything else, I'll take a shot with that.
Old 2018-10-05, 02:19   #308
wombatman
I moo ablest echo power!
 
May 2013

3²·193 Posts

I was able to get the linear algebra to re-run by deleting the bwc folder under the /tmp/ work directory. Just posting this in case someone else runs into the same issue (or I do again...)
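For anyone wanting to script that, here is a cautious sketch that moves the folder aside instead of deleting it. The folder naming is an assumption (it looks for `bwc` or anything ending in `.bwc` directly under the work directory), `reset_bwc` is a made-up name, and whether CADO then redoes the linear algebra is only what was observed above, not something this sketch guarantees.

```shell
# Hypothetical helper: rename the bwc folder(s) in a CADO work directory
# so the linear algebra files are out of the way but not lost.
reset_bwc() {
    workdir=$1
    # Refuse to run on an empty or missing work directory
    [ -n "$workdir" ] && [ -d "$workdir" ] || return 1
    for d in "$workdir"/*.bwc "$workdir"/bwc; do
        [ -d "$d" ] || continue          # skip unmatched patterns
        if [ -e "$d.old" ]; then
            echo "not touching $d: $d.old already exists" >&2
            continue
        fi
        mv -- "$d" "$d.old"
    done
    return 0
}
```

After something like `reset_bwc /tmp/<workdir>`, restart the factorization as usual; once it has finished, the `.old` copy can be deleted to free the disk space.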

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.