I'm playing with cado-nfs again and I have some data for a c156 run below. My farm is only about half the size it was before, and many of the machines are throttling due to heat.
I wouldn't mind having the optimum params for the sizes I don't have yet. I'm probably going to be playing with my hybrid setup (cado-nfs/msieve) more than running complete jobs with cado-nfs, but if you have a particular request, let me know and I'll run a pure cado-nfs session. This c156 used your c155 enhanced params:
[code]
Info:Polynomial Selection (size optimized): Aggregate statistics:
Info:Polynomial Selection (size optimized): potential collisions: 150312
Info:Polynomial Selection (size optimized): raw lognorm (nr/min/av/max/std): 151793/47.410/56.749/62.720/1.079
Info:Polynomial Selection (size optimized): optimized lognorm (nr/min/av/max/std): 151793/46.330/50.825/57.370/1.514
Info:Polynomial Selection (size optimized): Total time: 295917
Info:Polynomial Selection (root optimized): Aggregate statistics:
Info:Polynomial Selection (root optimized): Total time: 14779.5
Info:Polynomial Selection (root optimized): Rootsieve time: 14778.1
Info:Generate Factor Base: Total cpu/real time for makefb: 37.33/8.10258
Info:Generate Free Relations: Total cpu/real time for freerel: 2466.57/322.135
Info:Lattice Sieving: Aggregate statistics:
Info:Lattice Sieving: Total number of relations: 139321222
Info:Lattice Sieving: Average J: 8229.15 for 1556330 special-q, max bucket fill: 0.629541
Info:Lattice Sieving: Total CPU time: 5.94073e+06s
Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 756.79/462.649
Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1: 462.3s
Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 2275.86/656.124
Info:Filtering - Singleton removal: Total cpu/real time for purge: 884.42/348.291
Info:Filtering - Merging: Total cpu/real time for merge: 1805.72/1607.63
Info:Filtering - Merging: Total cpu/real time for replay: 200.41/181.724
Info:Linear Algebra: Total cpu/real time for bwc: 822208/0.000218153
Info:Linear Algebra: Aggregate statistics:
Info:Linear Algebra: Krylov: WCT time 68695.93
Info:Linear Algebra: Lingen CPU time 1900.97, WCT time 291.16
Info:Linear Algebra: Mksol: WCT time 37246.96
Info:Quadratic Characters: Total cpu/real time for characters: 247.36/69.3931
Info:Square Root: Total cpu/real time for sqrt: 13184.9/1945.28
Info:HTTP server: Shutting down HTTP server
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 7.09549e+06/172154
Info:root: Cleaning up computation data in /tmp/cado.s0wg3y09
50991922118453733800893403222829283966167525130934549348631327443632984128217423175047
17463527688603612045476003591361612022388344033034895907221918699698189
[/code]
Here's a c123 run with default params:
[code]
Info:Polynomial Selection (size optimized): Aggregate statistics:
Info:Polynomial Selection (size optimized): potential collisions: 21528.1
Info:Polynomial Selection (size optimized): raw lognorm (nr/min/av/max/std): 21720/36.270/43.664/48.210/0.888
Info:Polynomial Selection (size optimized): optimized lognorm (nr/min/av/max/std): 21720/35.540/39.371/44.430/1.211
Info:Polynomial Selection (size optimized): Total time: 2285.24
Info:Polynomial Selection (root optimized): Aggregate statistics:
Info:Polynomial Selection (root optimized): Total time: 1865.24
Info:Polynomial Selection (root optimized): Rootsieve time: 1863.87
Info:Generate Factor Base: Total cpu/real time for makefb: 10.62/2.3481
Info:Generate Free Relations: Total cpu/real time for freerel: 156.16/20.4439
Info:Lattice Sieving: Aggregate statistics:
Info:Lattice Sieving: Total number of relations: 13059349
Info:Lattice Sieving: Average J: 3800.88 for 178693 special-q, max bucket fill: 0.772261
Info:Lattice Sieving: Total CPU time: 149258s
Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 34.46/59.7937
Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1: 59.7s
Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 153.42/49.1674
Info:Filtering - Singleton removal: Total cpu/real time for purge: 138.1/36.692
Info:Filtering - Merging: Total cpu/real time for merge: 203.37/175.179
Info:Filtering - Merging: Total cpu/real time for replay: 20.59/17.1169
Info:Linear Algebra: Total cpu/real time for bwc: 11504.3/0.000172377
Info:Linear Algebra: Aggregate statistics:
Info:Linear Algebra: Krylov: WCT time 899.45
Info:Linear Algebra: Lingen CPU time 175.19, WCT time 27.82
Info:Linear Algebra: Mksol: WCT time 515.74
Info:Quadratic Characters: Total cpu/real time for characters: 23.33/6.71058
Info:Square Root: Total cpu/real time for sqrt: 995.39/135.9
Info:HTTP server: Shutting down HTTP server
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 166648/4008.26
Info:root: Cleaning up computation data in /tmp/cado.8n8o3zi4
103994263537692083322430805948369677781344426601569708601629807617921045503073013
4877573650317223159406644589665201167150419
[/code]
3 Attachment(s)
Attached are my current params choices for c115, c120, and c130. I do not plan to tweak c115 any further, but I'm still doing a little tinkering with the bigger files.
I've only done 3 jobs at c125 level, so I don't have a new file for that just yet; I'll get to work this week on that size.
Thanks, Curtis,
Now I have to go back and research something, though. My latest run, a c151, which is using your enhanced params.c150, appears to have gotten stuck:
[code]
Info:Polynomial Selection (size optimized): Marking workunit c150_polyselect1_184500-185000 as ok (99.2% => ETA Mon Aug 13 21:37:53 2018)
[/code]
There is no work being assigned, all clients are waiting, and it doesn't look like the host is doing anything anymore, either. I'll have to see if I can retrieve the restart info and attempt that. I have to go look, though - didn't I have this trouble before?

Ed
Ed-
You did, and I've been putting admin = xxxx settings into the c125-and-up files since you had that trouble; it seems that very small starting coefficients sometimes get stuck. Try adding "admin = 8400" directly above the "admax = " line in the file. Since you use a ton of clients, you may also choose to reduce adrange to 840 or 420, so that smaller units of work are distributed; that will keep more clients fed for more of the poly select phase. Just make sure you use a multiple of 420, since poly select is 2-threaded and we only search multiples of 210. That reminds me: I forgot to put admin = 1260 into the c130 params file just above.
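For reference, the poly select block of the params file would then read something like this (the admax value here is only a stand-in for whatever your file already uses):
[code]
# skip the very small leading coefficients that sometimes hang
tasks.polyselect.admin = 8400
# smaller workunits keep more clients fed; keep this a multiple of 420
tasks.polyselect.adrange = 420
tasks.polyselect.admax = 100e3
[/code]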
[QUOTE=VBCurtis;493844]Ed-
You did, and I've been putting admin = xxxx settings into the c125-and-up files since you had that trouble; it seems that very small starting coefficients sometimes get stuck. Try adding "admin = 8400" directly above the "admax = " line in the file. Since you use a ton of clients, you may also choose to reduce adrange to 840 or 420, so that smaller units of work are distributed; that will keep more clients fed for more of the poly select phase. Just make sure you use a multiple of 420, since poly select is 2-threaded and we only search multiples of 210. That reminds me: I forgot to put admin = 1260 into the c130 params file just above.[/QUOTE]
Thanks! I was just going to go search that out. admin was 1e3 and adrange was 5e2; I changed them to 8400 and 420, respectively, and restarted from scratch.

I think I'm running 54 clients now, but for some reason I keep losing some. Since they're in bash-generated gnome-terminals, I never catch any crash info. They're just closed and gone.
My current GNFS-180 CADO run is using roughly 15 clients (30 cores), and also drops client tasks randomly. The ones that drop are on a LAN with the server, and on a faster machine than the other clients. It doesn't make much sense, but I just restart the ones that drop. I don't have the CADO server set up to auto-issue clients to other machines.
If you recall what text the server writes when a file doesn't come back in time, you could grep for that text in the log and see what ad-value causes the hangup. If it's in the middle of the poly search range, you might choose to increase the time allowed before giving up on an issued task. I think it's tasks.wu.timeout, but I'm not certain. I believe the hang occurs because CADO only reissues a task twice before giving up, and the wrapper logic isn't smart enough to issue one more workunit above admax nor move to root opt when one range fails to come back.
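Something like the following ought to turn up the offending workunit; I'm writing the message text from memory, so check your own log for the exact wording:
[code]
egrep -i "timeout|expired" c150.log | less
[/code]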
[QUOTE=EdH;493847]Thanks! I was just going to go search that out. admin was 1e3 and adrange was 5e2; I changed them to 8400 and 420, respectively, and restarted from scratch.

I think I'm running 54 clients now, but for some reason I keep losing some. Since they're in bash-generated gnome-terminals, I never catch any crash info. They're just closed and gone.[/QUOTE]
I believe that there are good ways of stopping them from disappearing. The pause command is a possibility. I would imagine others may have better ideas.
[QUOTE=VBCurtis;493849]My current GNFS-180 CADO run is using roughly 15 clients (30 cores), and also drops client tasks randomly. The ones that drop are on a LAN with the server, and on a faster machine than the other clients. It doesn't make much sense, but I just restart the ones that drop. I don't have the CADO server set up to auto-issue clients to other machines.[/QUOTE]
I start all mine separately, too. They are all LAN-connected and run with local scripts that retask them when they are done sieving. I have scripts on each machine for all clients, and a single one.
[QUOTE=VBCurtis;493849]If you recall what text the server writes when a file doesn't come back in time, you could grep for that text in the log and see what ad-value causes the hangup. If it's in the middle of the poly search range, you might choose to increase the time allowed before giving up on an issued task. I think it's tasks.wu.timeout, but I'm not certain. I believe the hang occurs because CADO only reissues a task twice before giving up, and the wrapper logic isn't smart enough to issue one more workunit above admax nor move to root opt when one range fails to come back.[/QUOTE]
This part I may have to study more...
[QUOTE=henryzz;493861]I believe that there are good ways of stopping them from disappearing. The pause command is a possibility. I would imagine others may have better ideas.[/QUOTE]
I suppose I could add ";sleep 43200" to my gnome-terminal command, too, but I'd have to do that for every script on all the machines. Maybe the next time I do a machine-wide rewrite...
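For example, something like this ought to keep the window (and any crash output) around for ~12 hours; the server address here is made up, and older gnome-terminal versions want -e rather than --:
[code]
gnome-terminal -- bash -c "./cado-nfs-client.py --server=http://myserver:8012; sleep 43200"
[/code]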
Hey Curtis,
Here's a c151 run with your enhancements from before:
[code]
Info:Polynomial Selection (size optimized): Aggregate statistics:
Info:Polynomial Selection (size optimized): potential collisions: 167972
Info:Polynomial Selection (size optimized): raw lognorm (nr/min/av/max/std): 168252/45.550/54.833/60.700/1.149
Info:Polynomial Selection (size optimized): optimized lognorm (nr/min/av/max/std): 168252/43.600/48.610/55.130/1.161
Info:Polynomial Selection (size optimized): Total time: 302064
Info:Polynomial Selection (root optimized): Aggregate statistics:
Info:Polynomial Selection (root optimized): Total time: 8840.08
Info:Polynomial Selection (root optimized): Rootsieve time: 8838.9
Info:Generate Factor Base: Total cpu/real time for makefb: 32.13/6.7074
Info:Generate Free Relations: Total cpu/real time for freerel: 1239.29/161.525
Info:Lattice Sieving: Aggregate statistics:
Info:Lattice Sieving: Total number of relations: 76147350
Info:Lattice Sieving: Average J: 7754.82 for 1053607 special-q, max bucket fill: 0.61839
Info:Lattice Sieving: Total CPU time: 3.56919e+06s
Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 459.26/299.79
Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1: 298.7s
Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 1633.11/704.758
Info:Filtering - Singleton removal: Total cpu/real time for purge: 1167.91/523.987
Info:Filtering - Merging: Total cpu/real time for merge: 1506.16/1320.63
Info:Filtering - Merging: Total cpu/real time for replay: 137.02/115.897
Info:Linear Algebra: Total cpu/real time for bwc: 401675/0.000190258
Info:Linear Algebra: Aggregate statistics:
Info:Linear Algebra: Krylov: WCT time 33366.92
Info:Linear Algebra: Lingen CPU time 1324.14, WCT time 203.31
Info:Linear Algebra: Mksol: WCT time 18229.37
Info:Quadratic Characters: Total cpu/real time for characters: 169.38/47.1608
Info:Square Root: Total cpu/real time for sqrt: 9452.87/1305.95
Info:HTTP server: Shutting down HTTP server
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 4.29757e+06/107871
Info:root: Cleaning up computation data in /tmp/cado.rma48tdr
1142592283136731570545301662765774155859478622044326713
2907521994696204311372547245994693384494827511814130029530759988300338906185818977360167429826659
[/code]
Ed
Here's a c132 run using your c130 file from post 294. I added admin = 1260 and changed adrange to 840:
[code]
Info:Square Root: Total cpu/real time for sqrt: 2378.22/323.447
Info:Polynomial Selection (size optimized): Aggregate statistics:
Info:Polynomial Selection (size optimized): potential collisions: 293781
Info:Polynomial Selection (size optimized): raw lognorm (nr/min/av/max/std): 297375/38.750/48.467/58.100/1.577
Info:Polynomial Selection (size optimized): optimized lognorm (nr/min/av/max/std): 297375/37.450/42.329/49.080/0.895
Info:Polynomial Selection (size optimized): Total time: 77673.3
Info:Polynomial Selection (root optimized): Aggregate statistics:
Info:Polynomial Selection (root optimized): Total time: 1253.51
Info:Polynomial Selection (root optimized): Rootsieve time: 1252.44
Info:Generate Factor Base: Total cpu/real time for makefb: 8.08/1.76903
Info:Generate Free Relations: Total cpu/real time for freerel: 619.78/81.2041
Info:Lattice Sieving: Aggregate statistics:
Info:Lattice Sieving: Total number of relations: 36903672
Info:Lattice Sieving: Average J: 3789.6 for 383367 special-q, max bucket fill: 0.739391
Info:Lattice Sieving: Total CPU time: 333999s
Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 98.16/316.572
Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1: 316.3s
Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 458.01/214.492
Info:Filtering - Singleton removal: Total cpu/real time for purge: 184.18/63.6672
Info:Filtering - Merging: Total cpu/real time for merge: 242.93/211.991
Info:Filtering - Merging: Total cpu/real time for replay: 35.22/28.8648
Info:Linear Algebra: Total cpu/real time for bwc: 27425.2/0.000190735
Info:Linear Algebra: Aggregate statistics:
Info:Linear Algebra: Krylov: WCT time 2275.83
Info:Linear Algebra: Lingen CPU time 321.19, WCT time 50.15
Info:Linear Algebra: Mksol: WCT time 1244.45
Info:Quadratic Characters: Total cpu/real time for characters: 46.15/12.6496
Info:Square Root: Total cpu/real time for sqrt: 2378.22/323.447
Info:HTTP server: Shutting down HTTP server
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 444422/9602.2
Info:root: Cleaning up computation data in /tmp/cado.0z5tf6z_
1638528746893449983213861346839041825501626828319370583078901
317138241249483753195631898143827752284973893404091800009687302134027717
[/code]
Thanks, Ed! This gives us a baseline for future tests on your farm, and gives me more info about how different systems produce quite different balances of matrix time vs sieve time.

I have a couple of quite fast results at c140 and c145; I'll get those two files posted next week. I have little data in the 125 and 135 regions yet, and no factorization results fast enough to fit the trendline of the other best results from 100 to 150 digits.
Is there a way to force re-running of factorization with CADO-NFS at a particular step? My HD filled up when I wasn't paying attention and CADO couldn't save files related to the mksol step. Thus, it errored out at the end of that step and won't restart mksol since it "completed", but I'm not sure if I can edit something to cause just the mksol (and subsequent) steps to be run.
1 Attachment(s)
I've been away from refining parameters for small tasks for a while, but I did complete a GNFS-180 (13*2^906-1) using CADO for sieving!
I used msieve for poly select. The poly score was 9.891e-14, a record for C180. CADO params are attached; I did a little test-sieving with GGNFS, as well as trying two or three sets of parameters for a day each on CADO. I ended up using lim's of 60M and 100M, 32/33LP, I=15, and 64/95 for MFB. ncurves was set to 17 on the 2LP side, 12 on the 3LP side.

Sieving Q=10M to 87M yielded just over 620M relations; alas, the host machine ran out of disk space while filtering, so I copied all the relations to a single file on another machine and set msieve to work. Density 96 allowed a matrix 23.5M in size; I ran out of patience and disk to sieve enough for my preferred density around 110. Roughly 50 cores were used for 5 weeks of sieving; nearly my entire farm.

In hindsight, 32/33 is one LP too big. Default CADO uses 31/32 and I=14; I should have bumped one of these one step, but not both. I had an initial yield around 12 and an average yield of 8.0-8.1; even for me, that's a bit high! If I were to try another this size with CADO, I would choose 31/32LP and I=15.

My next job will be GNFS-186, for which I plan to bump the lim's to 80M/120M but leave the other parameters alone. Yield from CADO was substantially better than GGNFS; alas, my GGNFS test-sieve notes were lost during the factorization and I'm too lazy to repeat the tests.
[QUOTE=wombatman;497263]Is there a way to force re-running of factorization with CADO-NFS at a particular step? My HD filled up when I wasn't paying attention and CADO couldn't save files related to the mksol step. Thus, it errored out at the end of that step and won't restart mksol since it "completed", but I'm not sure if I can edit something to cause just the mksol (and subsequent) steps to be run.[/QUOTE]
If it's helpful, I'm currently running a hybrid CADO-NFS/msieve procedure for my factoring. All the LA is done by msieve. I use the following in a bash script with the composite ($1) on the command line. I force the temporary directory to be /tmp/hybrid so I know where it is:
[code]
#!/bin/bash
cd Math/cado-nfs
./cado-nfs.py $1 tasks.workdir=/tmp/hybrid tasks.filter.run=false
echo "Finished cado-nfs!"
cd /tmp/hybrid
cat c*.upload/*.gz >comp.dat.gz
cat *.poly >comp.polyT
mv comp.polyT comp.poly
echo "n: $1" >comp.n
echo "N $1" >comp.fb
~/Math/cado-nfs/poly2fb
~/Math/msieve/msieve -i comp.n -s comp.dat.gz -l compmsieve.log -nf comp.fb -t 8 -nc
cat compmsieve.log | grep " factor: "
cat compmsieve.log | grep " factor: " > ~/FactorList
[/code]
The last line is in case /tmp/hybrid gets removed. Of note, I have run into requests for more relations on occasion. At some point, I'll probably add the procedure to my "How I ..." pages, but I haven't yet because I still have several parameter issues to solve.
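A typical invocation, assuming the script above is saved as hybrid.sh (the name is just for illustration):
[code]
chmod +x hybrid.sh
./hybrid.sh 90377629292003121684002147101760858109247336549001090677693
[/code]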
Thanks. If I can figure out anything else, I'll take a shot with that.
I was able to get the linear algebra to re-run by deleting the bwc folder under the /tmp/ work directory. Just posting this in case someone else runs into the same issue (or I do again...)
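Concretely, it was something along these lines (the workdir name is randomly generated, so yours will differ, and the bwc folder may carry the job name as a prefix):
[code]
# removing the bwc checkpoints forces CADO to redo the linear algebra on resume
rm -rf /tmp/cado.XXXXXXXX/*bwc*
[/code]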
[QUOTE=wombatman;497391]I was able to get the linear algebra to re-run by deleting the bwc folder under the /tmp/ work directory. Just posting this in case someone else runs into the same issue (or I do again...)[/QUOTE]
That sounds good. I wondered if there might be a way, but I don't know enough about CADO-NFS to know where to look. I remember deleting some files to rerun some msieve steps in a similar fashion, but I'd have to research which files those were, too.
I'm running a C146 using the default parameters.c145 with CADO-NFS (a 3.0-dev build from a few months ago),
so I can compare it with VBCurtis's better parameters :)
2 Attachment(s)
I did do two C116s with different parameters, but the difference was negligible:
Aliquot 2515098 index 518 C116 (default c115 parameters)
[code]57260624163558114907105482591325041150144795085724341980289789123163352120442484657512017354806368372606893536015689[/code]
[code]
Info:Square Root: Factors: 5366819040225876374810043852991007541729343594563691842154849129 10669378589882202242046713973010894938014632978556641
Info:Polynomial Selection (size optimized): Aggregate statistics:
Info:Polynomial Selection (size optimized): potential collisions: 21492.9
Info:Polynomial Selection (size optimized): raw lognorm (nr/min/av/max/std): 21788/33.700/41.357/46.600/1.112
Info:Polynomial Selection (size optimized): optimized lognorm (nr/min/av/max/std): 21788/32.870/36.651/41.770/0.956
Info:Polynomial Selection (size optimized): Total time: 2839.91
Info:Polynomial Selection (root optimized): Aggregate statistics:
Info:Polynomial Selection (root optimized): Total time: 501.88
Info:Polynomial Selection (root optimized): Rootsieve time: 500.25
Info:Generate Factor Base: Total cpu/real time for makefb: 9.41/0.595049
Info:Generate Free Relations: Total cpu/real time for freerel: 108.79/3.46431
Info:Lattice Sieving: Aggregate statistics:
Info:Lattice Sieving: Total number of relations: 7476517
Info:Lattice Sieving: Average J: 1911.27 for 175501 special-q, max bucket fill -bkmult 1.0,1s:1.156260
Info:Lattice Sieving: Total time: 39802.7s
Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 26.93/38.9862
Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1: 37.9s
Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 206.85/149.014
Info:Filtering - Duplicate Removal, removal pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, removal pass: CPU time for dup2: 125.0s
Info:Filtering - Singleton removal: Total cpu/real time for purge: 177.66/156.63
Info:Filtering - Merging: Total cpu/real time for merge: 177.67/141.849
Info:Filtering - Merging: Total cpu/real time for replay: 22.74/18.5985
Info:Linear Algebra: Total cpu/real time for bwc: 4464.77/404.3
Info:Linear Algebra: Aggregate statistics:
Info:Linear Algebra: Krylov: WCT time 238.7, iteration CPU time 0.01, COMM 0.0, cpu-wait 0.0, comm-wait 0.0 (15000 iterations)
Info:Linear Algebra: Lingen CPU time 189.98, WCT time 17.9
Info:Linear Algebra: Mksol: WCT time 127.46, iteration CPU time 0.01, COMM 0.0, cpu-wait 0.0, comm-wait 0.0 (7000 iterations)
Info:Quadratic Characters: Total cpu/real time for characters: 32.85/5.83791
Info:Square Root: Total cpu/real time for sqrt: 550.98/79.8058
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 102295/3488.36
[/code]
Aliquot 2324328 index 1754 C116 (parameters from [URL]https://mersenneforum.org/showpost.php?p=493816&postcount=294[/URL])
[code]16231073505513910375586118092559047695476874861018044665771298171360337986331251933825629231182569153705365622682051[/code]
[code]
Info:Square Root: Factors: 24031256700679804831899451534455089680339988350903629141920263071775427401 675415094086809031967441080036510893844651
Info:Polynomial Selection (size optimized): Aggregate statistics:
Info:Polynomial Selection (size optimized): potential collisions: 12883.5
Info:Polynomial Selection (size optimized): raw lognorm (nr/min/av/max/std): 13313/32.880/41.399/50.410/1.390
Info:Polynomial Selection (size optimized): optimized lognorm (nr/min/av/max/std): 13313/32.880/36.561/42.320/1.095
Info:Polynomial Selection (size optimized): Total time: 2000.54
Info:Polynomial Selection (root optimized): Aggregate statistics:
Info:Polynomial Selection (root optimized): Total time: 523.23
Info:Polynomial Selection (root optimized): Rootsieve time: 522.04
Info:Generate Factor Base: Total cpu/real time for makefb: 6.27/0.403468
Info:Generate Free Relations: Total cpu/real time for freerel: 216.27/6.85179
Info:Lattice Sieving: Aggregate statistics:
Info:Lattice Sieving: Total number of relations: 13327906
Info:Lattice Sieving: Average J: 1888.34 for 188639 special-q, max bucket fill -bkmult 1.0,1s:1.177090
Info:Lattice Sieving: Total time: 40384.9s
Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 47.52/70.735
Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1: 70.19999999999999s
Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 289.07/166.718
Info:Filtering - Duplicate Removal, removal pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, removal pass: CPU time for dup2: 140.10000000000002s
Info:Filtering - Singleton removal: Total cpu/real time for purge: 191.33/155.735
Info:Filtering - Merging: Total cpu/real time for merge: 156.48/120.167
Info:Filtering - Merging: Total cpu/real time for replay: 22.89/18.0962
Info:Linear Algebra: Total cpu/real time for bwc: 4680.55/469.7
Info:Linear Algebra: Aggregate statistics:
Info:Linear Algebra: Krylov: WCT time 292.48, iteration CPU time 0.01, COMM 0.0, cpu-wait 0.01, comm-wait 0.0 (14000 iterations)
Info:Linear Algebra: Lingen CPU time 184.26, WCT time 17.55
Info:Linear Algebra: Mksol: WCT time 138.62, iteration CPU time 0.01, COMM 0.0, cpu-wait 0.0, comm-wait 0.0 (7000 iterations)
Info:Quadratic Characters: Total cpu/real time for characters: 35.84/6.89125
Info:Square Root: Total cpu/real time for sqrt: 660.36/92.6011
Info:HTTP server: Shutting down HTTP server
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 101763/3579.63
[/code]
Aliquot 2380332 index 1881 C120 (default parameters)
[code]199287076077883734257277351040454144441000030759204246737980837594631635996650954243527367115572838436022957492373596441[/code]
[code]
Info:Square Root: Factors: 7881718165498387521633582355355082520893963326151214533539440818991 25284724966473365753246256372756737619902873207696951
Info:Polynomial Selection (size optimized): Aggregate statistics:
Info:Polynomial Selection (size optimized): potential collisions: 19893.3
Info:Polynomial Selection (size optimized): raw lognorm (nr/min/av/max/std): 20038/34.700/42.654/47.580/1.068
Info:Polynomial Selection (size optimized): optimized lognorm (nr/min/av/max/std): 20038/33.860/37.787/43.120/0.946
Info:Polynomial Selection (size optimized): Total time: 4368.44
Info:Polynomial Selection (root optimized): Aggregate statistics:
Info:Polynomial Selection (root optimized): Total time: 1693.06
Info:Polynomial Selection (root optimized): Rootsieve time: 1691.2
Info:Generate Factor Base: Total cpu/real time for makefb: 11.41/0.716381
Info:Generate Free Relations: Total cpu/real time for freerel: 215.96/6.88374
Info:Lattice Sieving: Aggregate statistics:
Info:Lattice Sieving: Total number of relations: 13210726
Info:Lattice Sieving: Average J: 1886.16 for 231501 special-q, max bucket fill -bkmult 1.0,1s:1.181410
Info:Lattice Sieving: Total time: 57872.1s
Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 46.94/79.2834
Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1: 78.29999999999998s
Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 368.44/240.362
Info:Filtering - Duplicate Removal, removal pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, removal pass: CPU time for dup2: 200.29999999999998s
Info:Filtering - Singleton removal: Total cpu/real time for purge: 282.74/235.527
Info:Filtering - Merging: Total cpu/real time for merge: 280.69/221.189
Info:Filtering - Merging: Total cpu/real time for replay: 30.63/24.3347
Info:Linear Algebra: Total cpu/real time for bwc: 8233.92/769.52
Info:Linear Algebra: Aggregate statistics:
Info:Linear Algebra: Krylov: WCT time 469.89, iteration CPU time 0.02, COMM 0.0, cpu-wait 0.01, comm-wait 0.0 (19000 iterations)
Info:Linear Algebra: Lingen CPU time 247.53, WCT time 24.12
Info:Linear Algebra: Mksol: WCT time 257.82, iteration CPU time 0.02, COMM 0.0, cpu-wait 0.01, comm-wait 0.0 (9500 iterations)
Info:Quadratic Characters: Total cpu/real time for characters: 48.31/9.27324
Info:Square Root: Total cpu/real time for sqrt: 892.67/125.86
Info:HTTP server: Shutting down HTTP server
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 147386/5296.77
[/code]
Aliquot 2729472 index 1295 C120 (parameters from [URL]https://mersenneforum.org/showpost.php?p=493816&postcount=294[/URL])
[code]205139061732834871832777149640498376066462709627022246947242115024979684074456599994309638679576103310646656416412986993[/code]
[code]
Info:Square Root: Factors: 551378319562220129379188795853993269123323508651567304673518464786424349 372047747353776779859748207675312690256707865957
Info:Square Root: Total cpu/real time for sqrt: 883.47/124.153
Info:Polynomial Selection (size optimized): Aggregate statistics:
Info:Polynomial Selection (size optimized): potential collisions: 20828.5
Info:Polynomial Selection (size optimized): raw lognorm (nr/min/av/max/std): 20997/35.120/43.125/51.980/1.501
Info:Polynomial Selection (size optimized): optimized lognorm (nr/min/av/max/std): 20997/34.030/37.941/43.730/1.044
Info:Polynomial Selection (size optimized): Total time: 4206.01
Info:Polynomial Selection (root optimized): Aggregate statistics:
Info:Polynomial Selection (root optimized): Total time: 1018.39
Info:Polynomial Selection (root optimized): Rootsieve time: 1017.02
Info:Generate Factor Base: Total cpu/real time for makefb: 11.48/0.695448
Info:Generate Free Relations: Total cpu/real time for freerel: 425.41/13.5391
Info:Lattice Sieving: Aggregate statistics:
Info:Lattice Sieving: Total number of relations: 17128933
Info:Lattice Sieving: Average J: 1906.83 for 200409 special-q, max bucket fill -bkmult 1.0,1s:1.224170
Info:Lattice Sieving: Total time: 53632.9s
Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 60.59/98.9291
Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1: 98.8s
Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 301.1/103.734
Info:Filtering - Duplicate Removal, removal pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, removal pass: CPU time for dup2: 95.39999999999999s
Info:Filtering - Singleton removal: Total cpu/real time for purge: 124.38/42.177
Info:Filtering - Merging: Total cpu/real time for merge: 202.8/152.94
Info:Filtering - Merging: Total cpu/real time for replay: 28.72/22.8961
Info:Linear Algebra: Total cpu/real time for bwc: 8543.11/836.9
Info:Linear Algebra: Aggregate statistics:
Info:Linear Algebra: Krylov: WCT time 485.25, iteration CPU time 0.02, COMM 0.0, cpu-wait 0.01, comm-wait 0.0 (19000 iterations)
Info:Linear Algebra: Lingen CPU time 232.35, WCT time 22.34
Info:Linear Algebra: Mksol: WCT time 303.19, iteration CPU time 0.02, COMM 0.0, cpu-wait 0.01, comm-wait 0.0 (10000 iterations)
Info:Quadratic Characters: Total cpu/real time for characters: 45.39/7.23988
Info:Square Root: Total cpu/real time for sqrt: 883.47/124.153
Info:HTTP server: Shutting down HTTP server
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 139378/5141.93
[/code]
So those do improve CPU and elapsed time:
147386/5296.77
[B]139378/5141.93[/B]
I sent my params files for C120 and lower to CADO in early 2018; I believe they're included in the 3.0 git versions. The "new" params you list from the forum in August are the result of trying to eke out 5% more speed, e.g. from spending less time in poly select.
I think I didn't have a clear case that the new August params were obviously faster, so I didn't send them to the CADO folks. I do believe they're faster, but barely so.

In your case, the second input was a factor of 3 smaller, so it "should" have been ~5% faster even using the same params; yet in this test the second job was only ~500 thread-seconds (0.5%) faster. Hrmmm.

Thanks for reporting your test!
I get a 30% improvement (faster sieving and smaller matrix) compared to current git for a c100 with this patch:
[CODE]diff --git a/parameters/factor/params.c100 b/parameters/factor/params.c100
index e0f8b151c..e72a78e41 100644
--- a/parameters/factor/params.c100
+++ b/parameters/factor/params.c100
@@ -27,18 +27,23 @@ tasks.polyselect.ropteffort = 0.5
 # Sieve
 ###########################################################################
-tasks.lim0 = 919082
-tasks.lim1 = 1051872
-tasks.lpb0 = 24
-tasks.lpb1 = 25
-tasks.sieve.mfb0 = 49
-tasks.sieve.mfb1 = 50
+tasks.lim0 = 620000
+tasks.lim1 = 950000
+tasks.lpb0 = 26
+tasks.lpb1 = 26
+tasks.sieve.mfb0 = 52
+tasks.sieve.mfb1 = 52
 tasks.sieve.ncurves0 = 11
 tasks.sieve.ncurves1 = 16
-tasks.I = 11
+tasks.sieve.lambda0 = 1.775
+tasks.sieve.lambda1 = 1.775
+tasks.I = 12
+
+#tasks.sieve.qrange = 10000
+tasks.sieve.qrange = 2000
+tasks.qmin = 300000
+tasks.sieve.rels_wanted = 3200000
-tasks.sieve.qrange = 10000
-tasks.qmin = 1051872
 ###########################################################################
 # Filtering[/CODE]
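To try it, save the diff to a file and apply it from the CADO source root (the file name is arbitrary):
[CODE]git apply params-c100.patch
[/CODE]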
[QUOTE=Gimarel;502155]I get a 30% improvement (faster sieving and smaller matrix) compared to current git for a c100 with this patch:
[/QUOTE]
RSA100

[B]default params[/B]
Total cpu/elapsed time for entire factorization: 22154.3/645.17

[B]Your params[/B]
Total cpu/elapsed time for entire factorization: 16324.5/624.355

:bow:
1 Attachment(s)
Attached is a new params file for C105. (delete the .txt suffix for CADO to recognise)
I've discovered two things. First, adding ~5% to the target relations wanted yields a 30-50% reduction in matrix-solve time. Second, starting at very low Q produces lots of duplicate relations but still sieves faster than a more normal starting Q value; I've moved qmin from lim0/4 to lim0/16. I'd appreciate any comparison runs against CADO-git 3.0 or msieve/factmsieve.py.
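In params-file terms, the two changes amount to something like this (numbers invented for a job with lim0 near 1M; the attached file has the real values):
[code]
# qmin at lim0/16 instead of lim0/4
tasks.qmin = 60000
# ~5% more relations than CADO would otherwise stop at
tasks.sieve.rels_wanted = 6500000
[/code]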
[QUOTE=VBCurtis;504740]Attached is a new params file for C105. (delete the .txt suffix for CADO to recognise)
I've discovered two things. First, adding ~5% to the target relations wanted yields a 30-50% reduction in matrix-solve time. Second, starting at very low Q produces lots of duplicate relations but still sieves faster than a more normal starting Q value; I've moved qmin from lim0/4 to lim0/16. I'd appreciate any comparison runs against CADO-git 3.0 or msieve/factmsieve.py.[/QUOTE]
default c105 params
[code]109632612184518587513828778388808335173534513011647049031182822755389306787273092842624843623516386466817[/code]
[code]
Info:Polynomial Selection (size optimized): Aggregate statistics:
Info:Polynomial Selection (size optimized): potential collisions: 9952.64
Info:Polynomial Selection (size optimized): raw lognorm (nr/min/av/max/std): 9418/30.140/36.778/42.590/1.071
Info:Polynomial Selection (size optimized): optimized lognorm (nr/min/av/max/std): 9418/29.410/32.699/37.260/0.964
Info:Polynomial Selection (size optimized): Total time: 808.92
Info:Polynomial Selection (root optimized): Aggregate statistics:
Info:Polynomial Selection (root optimized): Total time: 235.3
Info:Polynomial Selection (root optimized): Rootsieve time: 234.53
Info:Generate Factor Base: Total cpu/real time for makefb: 3.32/0.233966
Info:Generate Free Relations: Total cpu/real time for freerel: 107.57/3.43082
Info:Lattice Sieving: Aggregate statistics:
Info:Lattice Sieving: Total number of relations: 5314822
Info:Lattice Sieving: Average J: 1901.73 for 48049 special-q, max bucket fill -bkmult 1.0,1s:1.191890
Info:Lattice Sieving: Total time: 9941.66s
Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 18.54/26.7606
Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1: 26.4s
Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 85.03/49.4793
Info:Filtering - Duplicate Removal, removal pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, removal pass: CPU time for dup2: 43.599999999999994s
Info:Filtering - Singleton removal: Total cpu/real time for purge: 61.84/31.2877
Info:Filtering - Merging: Total cpu/real time for merge: 61.04/42.6015
Info:Filtering - Merging: Total cpu/real time for replay: 8.2/6.48986
Info:Linear Algebra: Total cpu/real time for bwc: 743.78/87.26
Info:Linear Algebra: Aggregate statistics:
Info:Linear Algebra: Krylov: WCT time 50.75, iteration CPU time 0, COMM 0.0, cpu-wait 0.0, comm-wait 0.0 (6000 iterations)
Info:Linear Algebra: Lingen CPU time 59.71, WCT time 5.66
Info:Linear Algebra: Mksol: WCT time 12.55, iteration CPU time 0, COMM 0.0, cpu-wait 0.0, comm-wait 0.0 (3000 iterations)
Info:Quadratic Characters: Total cpu/real time for characters: 11.27/1.93752
Info:Square Root: Total cpu/real time for sqrt: 183.65/27.1619
Info:HTTP server: Shutting down HTTP server
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 29423.9/1054.2
20134157657651598100858476591180697221596297298877
5445105479386907888285946208202118057060224985751395221
[/code]
[B]29,424 CPUsec / 1,054 WCT[/B]

new c105 params
[code]107774169621361997338509361816744634033698066592448031046848063179170577441576972428346870619448060739421[/code]
[code]
Info:Polynomial Selection (size optimized): Aggregate statistics:
Info:Polynomial Selection (size optimized): potential collisions: 4473.5
Info:Polynomial Selection (size optimized): raw lognorm (nr/min/av/max/std): 4331/30.060/37.058/44.280/1.429
Info:Polynomial Selection (size optimized): optimized lognorm (nr/min/av/max/std): 4331/29.800/32.734/37.690/1.062
Info:Polynomial Selection (size optimized): Total time: 413.75
Info:Polynomial Selection (root optimized): Aggregate statistics:
Info:Polynomial Selection (root optimized): Total time: 156.39
Info:Polynomial Selection (root optimized): Rootsieve time: 155.82
Info:Generate Factor Base: Total cpu/real time for makefb: 2.9/0.21063
Info:Generate Free Relations: Total cpu/real time for freerel: 107.34/3.42818
Info:Lattice Sieving: Aggregate statistics:
Info:Lattice Sieving: Total number of relations: 5795540
Info:Lattice Sieving: Average J: 1920.26 for 48157 special-q, max bucket fill -bkmult 1.0,1s:1.252780
Info:Lattice Sieving: Total time: 9458.12s
Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 19.41/27.519
Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1: 27.2s
Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 88/33.5168
Info:Filtering - Duplicate Removal, removal pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, removal pass: CPU time for dup2: 29.499999999999996s
Info:Filtering - Singleton removal: Total cpu/real time for purge: 40.61/18.0718
Info:Filtering - Merging: Total cpu/real time for merge: 92.75/64.2703
Info:Filtering - Merging: Total cpu/real time for replay: 11.11/8.90364
Info:Linear Algebra: Total cpu/real time for bwc: 995.79/81.67
Info:Linear Algebra: Aggregate statistics:
Info:Linear Algebra: Krylov: WCT time 36.87, iteration CPU time 0, COMM 0.0, cpu-wait 0.0, comm-wait 0.0 (7000 iterations)
Info:Linear Algebra: Lingen CPU time 82.87, WCT time 7.8
Info:Linear Algebra: Mksol: WCT time 27.01, iteration CPU time 0.01, COMM 0.0, cpu-wait 0.0, comm-wait 0.0 (4000 iterations)
Info:Quadratic Characters: Total cpu/real time for characters: 17.61/3.01295
Info:Square Root: Total cpu/real time for sqrt: 255.01/37.0997
Info:HTTP server: Shutting down HTTP server
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 30104.3/951.544
86095283758496432104324268439192937
1251801084989468478426112275293353943851931081438584113030607271192533
[/code]
[B]30,104 CPUsec / 952 WCT[/B]

The machine (dual Xeon E5-2650) has quite a bit of variance on these small composites, so I might need to do a couple of runs and take the average to make a fair comparison.
another fast c105
[code]106940938640385179198475163991847262203326129306969135559785399736892912049980963873983444092667843094963[/code]
for this one I used your polyselect params, but slightly different sieve params:
[code]
tasks.lim0 = 750000
tasks.lim1 = 1200000
tasks.lpb0 = 26
tasks.lpb1 = 26
tasks.sieve.mfb0 = 52
tasks.sieve.mfb1 = 52
tasks.sieve.ncurves0 = 11
tasks.sieve.ncurves1 = 16
tasks.I = 12
tasks.sieve.qrange = 5000
tasks.sieve.qmin = 100000
tasks.sieve.rels_wanted = 5700000

###########################################################################
# Filtering
###########################################################################

tasks.filter.purge.keep = 170
tasks.filter.maxlevel = 20
tasks.filter.target_density = 155.0
[/code]
[code]
Info:Polynomial Selection (size optimized): Aggregate statistics:
Info:Polynomial Selection (size optimized): potential collisions: 4473.5
Info:Polynomial Selection (size optimized): raw lognorm (nr/min/av/max/std): 4246/29.740/37.012/44.730/1.470
Info:Polynomial Selection (size optimized): optimized lognorm (nr/min/av/max/std): 4246/29.280/32.778/38.300/1.149
Info:Polynomial Selection (size optimized): Total time: 405.26
Info:Polynomial Selection (root optimized): Aggregate statistics:
Info:Polynomial Selection (root optimized): Total time: 173.41
Info:Polynomial Selection (root optimized): Rootsieve time: 172.83
Info:Generate Factor Base: Total cpu/real time for makefb: 2.54/0.168372
Info:Generate Free Relations: Total cpu/real time for freerel: 108.61/3.46267
Info:Lattice Sieving: Aggregate statistics:
Info:Lattice Sieving: Total number of relations: 6718856
Info:Lattice Sieving: Average J: 1910.06 for 43933 special-q, max bucket fill -bkmult 1.0,1s:1.294890
Info:Lattice Sieving: Total time: 9501.16s
Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 23.24/32.0983
Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1: 31.900000000000002s
Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 103.57/57.6821
Info:Filtering - Duplicate Removal, removal pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, removal pass: CPU time for dup2: 51.3s
Info:Filtering - Singleton removal: Total cpu/real time for purge: 56.14/39.6721
Info:Filtering - Merging: Total cpu/real time for merge: 73.47/50.1402
Info:Filtering - Merging: Total cpu/real time for replay: 8.51/6.87158
Info:Linear Algebra: Total cpu/real time for bwc: 716.98/59.17
Info:Linear Algebra: Aggregate statistics:
Info:Linear Algebra: Krylov: WCT time 29.15, iteration CPU time 0, COMM 0.0, cpu-wait 0.0, comm-wait 0.0 (6000 iterations)
Info:Linear Algebra: Lingen CPU time 69.91, WCT time 6.98
Info:Linear Algebra: Mksol: WCT time 14.8, iteration CPU time 0, COMM 0.0, cpu-wait 0.0, comm-wait 0.0 (3000 iterations)
Info:Quadratic Characters: Total cpu/real time for characters: 15.29/2.85822
Info:Square Root: Total cpu/real time for sqrt: 232.4/33.9271
Info:HTTP server: Shutting down HTTP server
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 28071.1/882.506
96419734812774740119994365852866859903514856721672957
1109118779968024502616433643203969184292927435816559
[/code]
[B]28,071 CPUsec / 883 WCT[/B]

Could be an outlier, so we need moar data!
Thanks; it seems I'm only helping wall-clock time (nice, but not all that much help).
I'll do some more research. I've been tracking poly select time, sieve time, and bwc time, but I have been using the sum of those three as a proxy for job length, ignoring filtering time and other little steps (e.g. free relations).
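For anyone tracking the same three numbers, a quick filter over the main log pulls them out (the log name is hypothetical; the patterns match the Info lines quoted throughout this thread):
[code]
egrep "Total time|time for bwc" c105.log
[/code]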
C105 Parameter
1 Attachment(s)
Better sieving parameters for C105. I didn't change the poly selection part.
In my experience [C]tasks.lim1[/C] should be equal to the top end of the (expected) sieving range. If you want to oversieve a bit to reduce the matrix size, it's better to use the parameter [C]tasks.filter.required_excess[/C] than to specify [C]tasks.sieve.rels_wanted[/C]. In my example the [C]lambda0[/C] and [C]lambda1[/C] parameters are essential, because otherwise the siever produces too many useless relations. These are not optimized but should be about right. The parameter [C]tasks.sieve.rels_wanted[/C] is needed in my example, because cado overestimates the needed relations by a factor of 2. The parameters [C]tasks.lim0[/C] and [C]tasks.qmin[/C] are not optimised. With these parameters I get a sieving speedup of about 10-15% and a smaller matrix.
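For readers without the attachment, the kind of entries I mean look like this (the values here are illustrative, not the tuned ones):
[code]
# cap cofactor sizes at roughly lambda * lpb bits on each side
tasks.sieve.lambda0 = 1.775
tasks.sieve.lambda1 = 1.775
# oversieve by a relative excess instead of a fixed relation count
tasks.filter.required_excess = 0.05
[/code]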
[QUOTE=Gimarel;504908]Better sieving parameters for C105. I didn't change the poly selection part.
In my experience [C]tasks.lim1[/C] should be equal to the top end of the (expected) sieving range. If you want to oversieve a bit to reduce the matrix size, it's better to use the parameter [C]tasks.filter.required_excess[/C] than to specify [C]tasks.sieve.rels_wanted[/C]. In my example the [C]lambda0[/C] and [C]lambda1[/C] parameters are essential, because otherwise the siever produces too many useless relations. These are not optimized but should be about right. The parameter [C]tasks.sieve.rels_wanted[/C] is needed in my example, because cado overestimates the needed relations by a factor of 2. The parameters [C]tasks.lim0[/C] and [C]tasks.qmin[/C] are not optimised. With these parameters I get a sieving speedup of about 10-15% and a smaller matrix.[/QUOTE]
We want more parameter files from you!

c105
[code]107713203868901378890486921109668147250599518916591688453404410233186403423078799985908643904700547429021[/code]
[code]
Info:Square Root: Factors: 464884127923667954781021992089902060842382186081760172952730363 231699035090712841274341777335868170736967
Info:Polynomial Selection (size optimized): Aggregate statistics:
Info:Polynomial Selection (size optimized): potential collisions: 4473.5
Info:Polynomial Selection (size optimized): raw lognorm (nr/min/av/max/std): 4342/28.940/37.049/44.340/1.438
Info:Polynomial Selection (size optimized): optimized lognorm (nr/min/av/max/std): 4342/28.940/32.749/37.280/1.096
Info:Polynomial Selection (size optimized): Total time: 366.14
Info:Polynomial Selection (root optimized): Aggregate statistics:
Info:Polynomial Selection (root optimized): Total time: 195.9
Info:Polynomial Selection (root optimized): Rootsieve time: 195.31
Info:Generate Factor Base: Total cpu/real time for makefb: 2.06/0.151476
Info:Generate Free Relations: Total cpu/real time for freerel: 215.04/6.83992
Info:Lattice Sieving: Aggregate statistics:
Info:Lattice Sieving: Total number of relations: 6319648
Info:Lattice Sieving: Average J: 1918.22 for 63809 special-q, max bucket fill -bkmult 1.0,1s:1.301890
Info:Lattice Sieving: Total time: 8788.08s
Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 21.97/25.6841
Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1: 25.5s
Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 111.77/40.0544
Info:Filtering - Duplicate Removal, removal pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, removal pass: CPU time for dup2: 30.6s
Info:Filtering - Singleton removal: Total cpu/real time for purge: 57.94/24.8971
Info:Filtering - Merging: Total cpu/real time for merge: 45.12/29.8022
Info:Filtering - Merging: Total cpu/real time for replay: 7.98/6.32849
Info:Linear Algebra: Total cpu/real time for bwc: 534.81/42.85
Info:Linear Algebra: Aggregate statistics:
Info:Linear Algebra: Krylov: WCT time 18.05, iteration CPU time 0, COMM 0.0, cpu-wait 0.0, comm-wait 0.0 (5000 iterations)
Info:Linear Algebra: Lingen CPU time 57.86, WCT time 5.44
Info:Linear Algebra: Mksol: WCT time 12.33, iteration CPU time 0, COMM 0.0, cpu-wait 0.0, comm-wait 0.0 (3000 iterations)
Info:Quadratic Characters: Total cpu/real time for characters: 13.61/2.23025
Info:Square Root: Total cpu/real time for sqrt: 150.89/22.1238
Info:HTTP server: Shutting down HTTP server
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 25453.4/787.263
[/code]
[B]25453 CPUsec / 787 WCT[/B]
[QUOTE=Gimarel;504908]Better sieving parameters for C105. I didn't change the poly selection part.
In my experience [C]tasks.lim1[/C] should be equal to the top end of the (expected) sieving range. If you want to oversieve a bit to reduce the matrix size, it's better to use the parameter [C]tasks.filter.required_excess[/C] than to specify [C]tasks.sieve.rels_wanted[/C]. In my example the [C]lambda0[/C] and [C]lambda1[/C] parameters are essential, because otherwise the siever produces too many useless relations. These are not optimized but should be about right. The parameter [C]tasks.sieve.rels_wanted[/C] is needed in my example, because cado overestimates the needed relations by a factor of 2. The parameters [C]tasks.lim0[/C] and [C]tasks.qmin[/C] are not optimised. With these parameters I get a sieving speedup of about 10-15% and a smaller matrix.[/QUOTE]
Nice! I tested on a C103, and got a faster time than my previous best at C102. I used my own poly-select parameters (posted with my C105 file).

If I understand lambda correctly, 1.775 * 27 is roughly 48, so you're using mfb0 and mfb1 of 48 for a 27LP job. Interesting! I confirmed this by setting those to 48 rather than 54, with almost no change in sieve time. I then changed qmin to 60k and rels_wanted to 6.5M (to correct for the massive number of duplicate relations produced at small Q). This job filtered twice, as did Gimarel's parameters; sieve time, CPU time, and WCT were all 8+% better than with Gimarel's settings. I run 30-threaded on a Xeon, so quite a lot of relations are found during the first filtering pass; other testers may find different timings using fewer threads.

My results:
Gimarel's params: sieve time 4890, CPU time 17700, WCT 541.
Setting Q=60k: sieve time 4220, CPU time 16460, WCT 502.

I did many other tests, such as lambda = 1.75 or 1.8 and qmin of 30k, 40k, or 80k; none was faster than 502 WCT, but lots were around 520. It is clear that CPU time has some calculation flaw, as WCT * threads > CPU time. The machine is a dual 10-core, running 10-threaded msieve LA; I use 30 threads for tasks, 20 threads for server tasks. I'll next try different lim's and LA settings.
Curtis:
Regarding the c172 from last week: if I run more GNFS (though I don't plan to right this moment), I will commit to not fiddling with parameters during its run as I did here. In particular, I changed the target matrix density from 170 to 110 (I was getting a lot of duplicates and not many relations per workunit), and I tried adjusting qmin down (from 19600000 to 9800000) towards the end, though I don't think that had any effect at all---it certainly did not start sieving below what it had already done.

Lattice Sieving: Total time: 1.49137e+07s
Linear Algebra: Total cpu/real time for bwc: 4.80653e+06/653502
Complete Factorization: Total cpu/elapsed time for entire factorization: 2.65919e+07/1.17675e+06
Filtering - Merging: Merged matrix has 15059957 rows and total weight 1656595352 (110.0 entries per row on average)

Like before, the timing information is a bit of a mess. To extract timing information from the log:

$ egrep -i 'total.+time' 37771_279.log | grep -viw debug | sed -e's/^.\+Info://' | uniq -c | less

A bit of a hack, but it lets me see that the sieving time only increases as I stop and start the process. Apologies if this is less than useful. I need to be a little more methodical about how I approach this.
Thanks for the data! Also thanks for the unix protip to extract the info.
Your changes don't distort the data much, if at all; changing q-min after the run begins won't alter sieve behavior, as you discovered. Changing matrix density alters post-processing, but does nothing to the sieve process.
Ubuntu 18.04 and CADO-NFS
I've upgraded some of my machines that were running CADO-NFS from Ubuntu 16.04 to Ubuntu 18.04. Now they won't run the previous CADO-NFS, and trying to make from scratch also fails. Any thoughts? Is there something simple I'm missing?
[code]
[ 44%] Building C object sieve/strategies/CMakeFiles/benchfm.dir/utils_st/tab_strategy.c.o
Linking CXX executable benchfm
/usr/bin/ld: CMakeFiles/benchfm.dir/utils_st/tab_point.c.o: relocation R_X86_64_32 against `.rodata' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
sieve/strategies/CMakeFiles/benchfm.dir/build.make:287: recipe for target 'sieve/strategies/benchfm' failed
make[2]: *** [sieve/strategies/benchfm] Error 1
CMakeFiles/Makefile2:1454: recipe for target 'sieve/strategies/CMakeFiles/benchfm.dir/all' failed
make[1]: *** [sieve/strategies/CMakeFiles/benchfm.dir/all] Error 2
Makefile:123: recipe for target 'all' failed
make: *** [all] Error 2
Makefile:7: recipe for target 'all' failed
make: *** [all] Error 2
[/code]
I have recompiled GMP and GMP-ECM with no issues, and YAFU still runs OK, but I haven't tried recompiling YAFU or any of my other packages yet.
Previous post update
Apparently, the upgrade to 18.04 on my machines removed Python. Installing both Python and Python3 seems to have fixed the issue.
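For reference, the fix on 18.04 amounted to something like this (the python package supplies Python 2):
[code]
sudo apt-get install python python3
[/code]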
[QUOTE=EdH;511983]Apparently, the upgrade to 18.04 on my machines resulted in Python being removed. Installing both Python and Python3 has seemed to fix the issue.[/QUOTE]
Had the same experience a few weeks ago with 18.04. CADO would just crash, despite the fact that I had installed Python3. It all worked after I installed the missing Python package. EdH - can you add this tip to your excellent install guide for CADO?
[QUOTE=swellman;512013]Had the same experience a few weeks ago with 18.04. CADO would just crash despite the fact I had installed Python3. It all worked after I installed the missing Python package.
EdH - can you add this tip to your excellent install guide for CADO?[/QUOTE] Thanks for the confirmation. I do see that I only need to reinstall Python, rather than Python3. I will add a note in a day or so. I didn't want to edit anything while the board was acting up. |
I have one machine running 18.04, and a few running 16.04. The one running 18.04 had a CADO install that won't play well with the installs on the other machines; when trying a distributed GNFS job, the clients on 16.04 (even with the newest CADO git) notice that /download/las is different from the one on the server, download las from the server, and then crash upon invocation because 18.04 has a newer GCC, so the libs don't match.
So, until/unless I upgrade my 16.04 machines and rebuild CADO, I can't run a server/client setup for a big job. Annoying; I wish CADO wouldn't check the client las against the server las. Unfortunately, the machine running 18.04 is the only one that all my others can connect to, sigh.
I'm not sure if this helps, but I had a similar issue due to various hardware and have to use --binddir=build/<username>/ on my clients. My current server is 16.04 and some clients are 18.04.
EdH is right, --bindir is the flag you want.
I don't point it at my build/ directory, but at /pkg/cado/lib/cado-nfs-3.0.0 in an experimental containerized setup I've been working on, where something like this is required, since downloading copies of binaries is ill-advised in that setup. I've also used it when doing ad-hoc clustering across machines with different CPU types.
[QUOTE=EdH;512288]I'm not sure if this helps, but I had a similar issue due to various hardware and have to use --binddir=build/<username>/ on my clients. My current server is 16.04 and some clients are 18.04.[/QUOTE]
I tried this invocation, and got "error: no such option: --binddir":

./cado-nfs-client.py --server=http://{servername:port} --binddir=build/{install directory}

This is on the 18.04 machine, running recent-git (say, two weeks old) CADO.
The flag is called --bindir. One "d", not two.
The build/ subdirectory is found in the unpacked source tree after compiling, or in the directory you built from if you followed the instructions in the "configure using cmake directly" section of the README file. In there is usually a directory named for the machine where CADO-NFS was compiled; ".mpi" is added if CADO-NFS was configured to use MPI. The other option for --bindir is the lib/cado-nfs-3.0.0 subdirectory under the installation root. Both of these locations have these subdirectories:
[CODE]$ cd ~/cado-nfs/build/{hostname}.mpi; ls -d */
CMakeFiles/  filter/  linalg/  numbertheory/  scripts/  sqrt/  utils/
config/      gf2x/    misc/    polyselect/    sieve/    tests/

$ cd /mnt/pkg/cado/lib/cado-nfs-3.0.0/; ls -d */
filter/  misc/           polyselect/  sieve/  utils/
linalg/  numbertheory/   scripts/     sqrt/
[/CODE]
I have not used the build dir as my setting for --bindir, but I expect it would work.

(My installation root is a bit odd because /mnt hosts a glusterfs filesystem and I install 3rd-party packages into subdirectories of /pkg to keep things separated. Theoretically I could keep the CADO source code on glusterfs, but with my current setup that makes compiling prohibitively slow. These details aren't really important, though.)
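So the earlier invocation, corrected, would be something along these lines, with the placeholders filled in for your own setup:
[CODE]./cado-nfs-client.py --server=http://{servername:port} --bindir=build/{hostname}
[/CODE]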
Darn! And, I'm always trying to be so precise...
Sorry about that misspelling.:sad: |
Minor Issue With Development Version
[code]
[code]
...
Linking CXX executable test-flint
[100%] Built target test-flint
Scanning dependencies of target check_rels
[100%] Building C object misc/CMakeFiles/check_rels.dir/check_rels.c.o
Linking CXX executable check_rels
[100%] Built target check_rels
user@machine:~/Math/cado-nfs$ ./cado-nfs.py 90377629292003121684002147101760858109247336549001090677693
Traceback (most recent call last):
  File "./cado-nfs.py", line 43, in <module>
    import cadotask
  File "./scripts/cadofactor/cadotask.py", line 2960, in <module>
    patterns.Observer):
  File "./scripts/cadofactor/cadotask.py", line 3151, in SievingTask
    if tuple(sys.version_info)[0] < 3:
NameError: name 'sys' is not defined
[/code]This was reproduced on several machines. I added:
[code]
import sys
[/code]to ./scripts/cadofactor/cadotask.py and all machines work now. Is there any anticipated trouble running the latest (developmental) version on the server and a mixture of dev and 2.3.0 on the clients? Are there better params files or modifications to make if there will be over a dozen multi-core machines running as clients? What info would be useful to gather in a compiled log? I'm looking at something like "egrep"ping some of the lines from the log files for all my cado-nfs runs. |
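As a starting point for that kind of data mining, here is a minimal sketch; the log path and output file name are placeholders, and the label strings are copied from the summaries quoted in this thread:
[code]
#!/bin/bash
# Collect the headline numbers from each CADO-NFS log into one summary file.
# The glob is hypothetical; point it at your own work directories.
for log in /tmp/cado*/*.log; do
    echo "==== $log ===="
    egrep "Murphy_E|Total number of relations|time for bwc|entire factorization" "$log"
done > cado_summary.txt
[/code]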
Ed-
I have improved params files for C90 through C125 completed; I've been putting off posting them until I have concrete a/b comparisons for timings, something I've been lazy about doing. I did finish C90 and C95 timing: on 1 thread, CADO-git from Feb factoring a C90 was 2236 seconds, while my file was 941 seconds and YAFU was 841 seconds. On 6 threads, YAFU did a C95 in 1510 seconds, CADO stock 1008 seconds, CADO with my params 625 seconds.

For a farm like yours, where sieving is manycore but postprocessing is single-machine, it makes sense to add 10% to the target number of relations. This adds 10% to sieving time, but reduces matrix time by 30-60%. On a single machine this is a slight waste of time (at least on my params files, where I've tried to find the ideal number of relations such that the extra time spent sieving balances the time saved on matrix), but you'll save quite a lot in wall-clock time.

I have draft files for C130 to C155, which I think are better than stock but not yet fully refined. Which files would you like? |
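By way of illustration (these are not VBCurtis' actual numbers), the 10% bump could be set in the params file through the rels_wanted parameter; this sketch assumes the tasks.sieve.rels_wanted spelling that matches the 'rels_wanted' key visible in the task-state dumps later in the thread:
[code]
# 0 means "let CADO estimate the target" (the default):
# tasks.sieve.rels_wanted = 0
# explicit target, roughly 10% above what the job would otherwise collect
# (value purely illustrative, for a c120-sized job):
tasks.sieve.rels_wanted = 12700000
[/code]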
[QUOTE=VBCurtis;513146]Ed-
I have improved params files for C90 through C125 completed; I've been putting off posting them until I have concrete a/b comparisons for timings, something I've been lazy about doing. I did finish C90 and C95 timing: on 1 thread, CADO-git from Feb factoring a C90 was 2236 seconds, while my file was 941 seconds and YAFU was 841 seconds. On 6 threads, YAFU did a C95 in 1510 seconds, CADO stock 1008 seconds, CADO with my params 625 seconds. For a farm like yours, where sieving is manycore but postprocessing is single-machine, it makes sense to add 10% to the target number of relations. This adds 10% to sieving time, but reduces matrix time by 30-60%. On a single machine this is a slight waste of time (at least on my params files, where I've tried to find the ideal number of relations such that the extra time spent sieving balances the time saved on matrix), but you'll save quite a lot in wall-clock time. I have draft files for C130 to C155, which I think are better than stock but not yet fully refined. Which files would you like?[/QUOTE]I caught the new thread on params files. I'll get them from there later. I have adjusted my plans somewhat: I think I'll work on refining my data mining first. Then I'll work with the current params to get a larger sample. After that, I'll try the modified files as they are, and finally try the adjustments you suggest for my "farm."

A question arises: although the ideal is to compare exact composite factorizations, is there value in comparing different composites of the same size? I ask this because my current interest is in scripts that take Aliquot sequences up to ~140 dd via ecmpi and CADO-NFS. If I aggregate the data for same-sized, but different, composites, would that still be valuable?

A last, and quite different, subject: since I have a working openmpi cluster and there is a section within the CADO-NFS documentation discussing mpi distributed LA, I am inclined to study this a bit. Is it implemented and working, or just on its way to being available? |
Your last question is, to me, the most interesting. I believe MPI is working but not documented/supported by the cado-nfs.py script; I expect you'll have to invoke the filtering/bwc steps individually. That said, I haven't peeked inside the cado-nfs.py file itself to see if there are MPI flags afoot.
I collect data by job-size too, and believe that aggregating average job times for a given length is helpful for refining parameters. I use a combination of "these parameters produced the record-quickest job of this size" and "those parameters have a lower average time per job" to decide on fastest param choices; I've also learned after 100+ jobs that some inputs just get lucky polynomials and the params have little to do with the record-low job time. I conclude that it's important to record poly score for each job, as well as first digit (C130 with first digit 6 will clearly be slower than C130 with first digit 1, more than halfway to a C131). If you also gather multiple results for a specific input size, I think that will be very helpful. If refining parameters for C130-155 job sizes interests you, I'll be happy to share on the other thread my procedures and which specific things I've tried; my results may be more influenced by my hardware than I realize, so your data-gathering may alter some of the choices. |
[QUOTE=EdH;513201]Since I have a working openmpi cluster and there is a section within the CADO-NFS documentation discussing mpi distributed LA, I am inclined to study this a bit. Is it implemented and working or just on its way to being available?[/QUOTE]
Have a look at local.sh.example in the CADO folder. Seems MPI is ready-to-enable. |
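For what it's worth, a sketch of how that might look; the MPI variable is the one documented in local.sh.example, but check your own copy, since the exact syntax here is an assumption from memory rather than a verified setting:
[code]
# local.sh at the top of the source tree (see local.sh.example);
# point MPI at an MPI installation (or set MPI=1 for the system default),
# then rebuild so bwc is compiled with MPI support.
MPI=/usr/lib/openmpi
[/code]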
Is there a way to run with python 2.7? I would like to test on a system on which I am unable to install new software. Alternatively, is it easy to install python3 somewhere in my home directory (where I am able to) and then have cado use that and not the system python?
[edit] Found a guide for installing python in a home directory, involving compiling from source (ugh). [url]https://codeghar.wordpress.com/2013/09/26/install-python-3-locally-under-home-directory-in-centos-6-4/[/url] |
Hey Curtis,
Thanks for all the info. I'll need to spend a bit of time with mpi after I get all the other things figured out. Thanks for pointing me to the example file. I've compiled a bunch of data, a taste of which is below. Is there enough info in the listings? I can't seem to find any version info in the logs, nor can I figure a way to display params file unique ids. Would you be able to identify modified versus unmodified params files from any particular entry within the param file that was used? Would you even need that? I suppose I could add things like lpb0/1 and mfb0/1 values, etc., but the less needed, the better. Here's a sample of what I've compiled thus far: [code]
N<138> = 107858998268122985412779892463164903278148452826296404480200645178455386381787979632919305998206531705192000227253731053070278282139941833
Polynomial Selection (root optimized): Finished, best polynomial from file /tmp/cadofactor/c140.upload/c140.polyselect2.4wl6t5p_.opt_60 has Murphy_E = 3.56e-07
Generate Factor Base: Total cpu/real time for makefb: 14.5/2.95031
Generate Free Relations: Total cpu/real time for freerel: 616.08/80.0004
Lattice Sieving: Total number of relations: 32432783
Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 89.32/212.046
Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 429.15/154.403
Filtering - Singleton removal: Total cpu/real time for purge: 210.65/72.3947
Filtering - Merging: Total cpu/real time for merge: 768.14/667.166
Filtering - Merging: Total cpu/real time for replay: 59.26/48.687
Linear Algebra: Total cpu/real time for bwc: 35648.5/9138.31
Quadratic Characters: Total cpu/real time for characters: 70.24/20.2549
Square Root: Factors: 4549852818280784137048891955891900188206561071 23706041178905384340613948799770096216924619543119243174197081516303277264106367581589913223
Square Root: Total cpu/real time for sqrt: 3600.09/502.14
Generate Factor Base: Total cpu/real time for makefb: 14.5/2.95031
Generate Free Relations: Total cpu/real time for freerel: 616.08/80.0004
Lattice Sieving: Total number of relations: 32432783
Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 89.32/212.046
Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 429.15/154.403
Filtering - Singleton removal: Total cpu/real time for purge: 210.65/72.3947
Filtering - Merging: Total cpu/real time for merge: 768.14/667.166
Filtering - Merging: Total cpu/real time for replay: 59.26/48.687
Linear Algebra: Total cpu/real time for bwc: 35648.5/9138.31
Quadratic Characters: Total cpu/real time for characters: 70.24/20.2549
Square Root: Total cpu/real time for sqrt: 3600.09/502.14
Complete Factorization: Total cpu/elapsed time for entire factorization: 709733/16633.3
====================================================
N<133> = 1727651299324421928413446724594590880782759412370592923996274508812661722242606590744478625899708857621177484486201878763223620024233
Polynomial Selection (root optimized): Finished, best polynomial from file /tmp/cadofactor/c135.upload/c135.polyselect2.sy3j0pmg.opt_54 has Murphy_E = 1.56e-07
Generate Factor Base: Total cpu/real time for makefb: 18.49/4.08387
Generate Free Relations: Total cpu/real time for freerel: 308.64/40.4427
Lattice Sieving: Total number of relations: 16820659
Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 47.48/122.93
Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 237.05/225.074
Filtering - Singleton removal: Total cpu/real time for purge: 125.5/52.3022
Filtering - Merging: Total cpu/real time for merge: 701.01/618.263
Filtering - Merging: Total cpu/real time for replay: 49.69/41.2704
Linear Algebra: Total cpu/real time for bwc: 25661.9/6836.89
Quadratic Characters: Total cpu/real time for characters: 49.05/16.6863
Square Root: Factors: 30694206313771024179893176877347248620777395297910299794744836039393412307041777132449849 56285908867085034103571361947375561950143217
Square Root: Total cpu/real time for sqrt: 2458.36/380.537
Generate Factor Base: Total cpu/real time for makefb: 18.49/4.08387
Generate Free Relations: Total cpu/real time for freerel: 308.64/40.4427
Lattice Sieving: Total number of relations: 16820659
Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 47.48/122.93
Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 237.05/225.074
Filtering - Singleton removal: Total cpu/real time for purge: 125.5/52.3022
Filtering - Merging: Total cpu/real time for merge: 701.01/618.263
Filtering - Merging: Total cpu/real time for replay: 49.69/41.2704
Linear Algebra: Total cpu/real time for bwc: 25661.9/6836.89
Quadratic Characters: Total cpu/real time for characters: 49.05/16.6863
Square Root: Total cpu/real time for sqrt: 2458.36/380.537
Complete Factorization: Total cpu/elapsed time for entire factorization: 551879/12586.8
====================================================
N<135> = 125844213559622587695829140394565697295294400320119878829054659162532607479389416797743340395107681033406552019669161119071940264791049
Polynomial Selection (root optimized): Finished, best polynomial from file /tmp/cadofactor/c135.upload/c135.polyselect2.xnurtmbv.opt_90 has Murphy_E = 1.26e-07
Generate Factor Base: Total cpu/real time for makefb: 18.98/4.3001
Generate Free Relations: Total cpu/real time for freerel: 308.8/44.8066
Lattice Sieving: Total number of relations: 16814708
Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 48.67/123.119
Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 231.22/151.544
Filtering - Singleton removal: Total cpu/real time for purge: 103.84/40.3791
Filtering - Merging: Total cpu/real time for merge: 852.76/755.683
Filtering - Merging: Total cpu/real time for replay: 58.03/48.4404
Linear Algebra: Total cpu/real time for bwc: 34137.1/9071.89
Quadratic Characters: Total cpu/real time for characters: 56.94/19.1343
Square Root: Factors: 1112732091037743546494994765551139874188005873287498358284451704438039 113094800242760311335825733173241877904256425883496448249505690591
Square Root: Total cpu/real time for sqrt: 3070.26/472.152
Generate Factor Base: Total cpu/real time for makefb: 18.98/4.3001
Generate Free Relations: Total cpu/real time for freerel: 308.8/44.8066
Lattice Sieving: Total number of relations: 16814708
Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 48.67/123.119
Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 231.22/151.544
Filtering - Singleton removal: Total cpu/real time for purge: 103.84/40.3791
Filtering - Merging: Total cpu/real time for merge: 852.76/755.683
Filtering - Merging: Total cpu/real time for replay: 58.03/48.4404
Linear Algebra: Total cpu/real time for bwc: 34137.1/9071.89
Quadratic Characters: Total cpu/real time for characters: 56.94/19.1343
Square Root: Total cpu/real time for sqrt: 3070.26/472.152
Complete Factorization: Total cpu/elapsed time for entire factorization: 680225/16017.4
[/code]This is from a fresh install of the dev version from a few days ago. |
Some Disconcerting ETAs, but then "All Is Well..."
[code]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Thu Apr 11 17:20:51 2019 [0.143 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Thu Apr 11 18:10:45 2019 [0.285 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Thu Apr 11 19:00:35 2019 [0.428 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Thu Apr 11 19:50:28 2019 [0.570 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Thu Apr 11 20:40:24 2019 [0.713 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Thu Apr 11 21:30:19 2019 [0.855 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Thu Apr 11 22:20:10 2019 [0.998 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Thu Apr 11 23:10:04 2019 [1.140 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Thu Apr 11 23:59:56 2019 [1.283 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Fri Apr 12 00:49:46 2019 [1.425 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Fri Apr 12 01:39:38 2019 [1.568 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Fri Apr 12 02:29:32 2019 [1.710 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Fri Apr 12 03:19:24 2019 [1.853 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Fri Apr 12 04:09:15 2019 [1.995 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Fri Apr 12 04:59:07 2019 [2.138 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Fri Apr 12 05:48:58 2019 [2.280 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Fri Apr 12 06:38:50 2019 [2.423 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Fri Apr 12 07:28:45 2019 [2.565 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Fri Apr 12 08:18:38 2019 [2.708 s/iter]
Info:Linear Algebra: mksol: N=1000 ; ETA (N=21000): Fri Apr 12 09:08:29 2019 [2.850 s/iter]
Info:Linear Algebra: mksol: N=860 ; ETA (N=21000): Fri Apr 12 12:40:41 2019 [3.456 s/iter]
Info:Quadratic Characters: Starting
Info:Square Root: Starting
Info:Square Root: Creating file of (a,b) values
Info:Square Root: finished
Info:Square Root: Factors: 76011547790026822726326568236942510573210375997181897566667718127 635610967526542663809596695842346695879983625491514026964044776163
Info:Polynomial Selection (size optimized): Aggregate statistics:
Info:Polynomial Selection (size optimized): potential collisions: 38210.8
Info:Polynomial Selection (size optimized): raw lognorm (nr/min/av/max/std): 38840/38.730/46.683/50.720/0.852
Info:Polynomial Selection (size optimized): optimized lognorm (nr/min/av/max/std): 38840/37.740/42.075/47.510/1.204
Info:Polynomial Selection (size optimized): Total time: 7618.67
Info:Polynomial Selection (root optimized): Aggregate statistics:
Info:Polynomial Selection (root optimized): Total time: 2994.82
Info:Polynomial Selection (root optimized): Rootsieve time: 2993.53
Info:Generate Factor Base: Total cpu/real time for makefb: 50.58/11.3882
Info:Generate Free Relations: Total cpu/real time for freerel: 311.62/45.3888
Info:Lattice Sieving: Aggregate statistics:
Info:Lattice Sieving: Total number of relations: 27681208
Info:Lattice Sieving: Average J: 7707.97 for 87893 special-q, max bucket fill -bkmult 1.0,1s:1.113920
Info:Lattice Sieving: Total time: 181352s
Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 73.17/198.683
Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1: 198.3s
Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 424.32/391.023
Info:Filtering - Duplicate Removal, removal pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, removal pass: CPU time for dup2: 367.7s
Info:Filtering - Singleton removal: Total cpu/real time for purge: 368.81/174.105
Info:Filtering - Merging: Total cpu/real time for merge: 879.66/789.756
Info:Filtering - Merging: Total cpu/real time for replay: 67.5/62.5504
Info:Linear Algebra: Total cpu/real time for bwc: 32922.6/8766.06
Info:Linear Algebra: Aggregate statistics:
Info:Linear Algebra: Krylov: WCT time 5463.93, iteration CPU time 0.12, COMM 0.01, cpu-wait 0.0, comm-wait 0.0 (42000 iterations)
Info:Linear Algebra: Lingen CPU time 360.93, WCT time 104.03
Info:Linear Algebra: Mksol: WCT time 2974.44, iteration CPU time 0.13, COMM 0.01, cpu-wait 0.0, comm-wait 0.0 (21000 iterations)
Info:Quadratic Characters: Total cpu/real time for characters: 59.92/19.9351
Info:Square Root: Total cpu/real time for sqrt: 3066.22/474.952
Info:HTTP server: Shutting down HTTP server
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 418847/14039.8
76011547790026822726326568236942510573210375997181897566667718127 635610967526542663809596695842346695879983625491514026964044776163
nfsDone                                  100%  276     0.3KB/s   00:00
user@computer:~$ date
Thu Apr 11 17:28:48 EDT 2019
[/code] |
All of my params files include a target number of relations, so I would be able to tell when your runs use one of my files because the number of relations found would be just over my target number set.
Also, my files use a substantially larger LP bound than default, so the number of relations is quite a bit larger than default settings. The only item I'd want that wasn't in your paste of the data dump is the size of the input number. For my own data-gathering, I record Q-range sieved (which isn't in the summary, but also isn't that important), poly select time (size and root, but total is enough), sieve time, bwc time, wall-clock time. Recently I've been also recording the total weight of the matrix (the number printed just before linear algebra starts), so that I could see how various density selections influence matrix size; again, not too important but if we start collaborating on C150+ settings it would be nice to have. |
I encoded the composite size in "<XXX>" right after the N in the first line:
[code]
N<[B]138[/B]> = 107858998268122985412779892463164903278148452826296404480200645178455386381787979632919305998206531705192000227253731053070278282139941833
[/code]I put it there instead of at the end, so it would be more accessible. I'll work on adding your list items. Should I trim the rest, or leave everything?

Actually, I'm wondering if you were referring to the correct post. I had two posts earlier. The first was the samples for a C138, C133 and C135. The second was to point out a discrepancy in the LA mksol ETAs. Do any of the interested parties watch this thread for this type of info, or should I try to post to the CADO-NFS project? |
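Given that encoding, pulling the sizes and totals back out of a compiled file is a one-liner; a sketch, with the file name hypothetical:
[code]
# list composite sizes and total factorization times from the compiled data
grep -E "^N<[0-9]+>|entire factorization" compiled_runs.txt
# or extract just the sizes:
sed -n 's/^N<\([0-9]*\)>.*/\1/p' compiled_runs.txt
[/code]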
I see the composite size now, that's an excellent way to encode it.
I don't know what is valuable for other people; the timings in the summary are all that's really needed to evaluate the params for a composite. |
I realize these have duplication, but I haven't figured out how to automatically remove duplicate lines, yet.
Anyway, is all the info you need contained in the following three samples? C86, C129, C121: [code]
N<86> = 56667759878435851136396148508097449099773621318662989162599675933873393187564668667973
Lattice Sieving: params = {'maxresubmit': 5, 'lim0': 393010, 'workdir': '/tmp/cadofactor', 'maxwuerror': 2, 'name': 'c85', 'rels_wanted': 0, 'wutimeout': 10800, 'lim1': 551399, 'qmin': 146453, 'sqside': 1, 'run': True, 'gzip': True, 'maxtimedout': 100, 'qrange': 1000, 'maxwu': 10, 'maxfailed': 100}
Polynomial Selection (root optimized): Finished, best polynomial from file /tmp/cadofactor/c85.upload/c85.polyselect2.dq2cf04y.opt_12 has Murphy_E = 2.95e-06
Generate Factor Base: Total cpu/real time for makefb: 0.47/0.135556
Generate Free Relations: Total cpu/real time for freerel: 3.85/0.515574
Lattice Sieving: sieving.run(): Task state: {'wu_submitted': 0, 'wu_timedout': 0, 'start_achievement': -1, 'qnext': 146453, 'start_real_time': 0, 'rels_wanted': 440080, 'wu_received': 0, 'wu_failed': 0, 'rels_found': 0}
Lattice Sieving: Total number of relations: 470449
Lattice Sieving: Total time: 518.71s
Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 1.06/0.725687
Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 3.44/1.06687
Filtering - Singleton removal: Total cpu/real time for purge: 0.6/0.373629
Lattice Sieving: sieving.run(): Task state: {'wu_submitted': 219, 'rels_wanted': 498934, 'start_achievement': 0.004069714597345937, 'rels_found': 470449, 'qnext': 365000, 'wu_failed': 0, 'stats_avg_J': '1002.231756149383 12237', 'wu_timedout': 0, 'stats_total_cpu_time': '912.3099999999998', 'start_real_time': 3763830410.102031, 'wu_received': 153, 'stats_total_time': '518.71', 'stats_max_bucket_fill': '1.0,1s:1.594150'}
Lattice Sieving: sieving.run(): Task state: {'wu_submitted': 285, 'rels_wanted': 522895, 'start_achievement': 0.004069714597345937, 'rels_found': 504252, 'qnext': 431000, 'wu_failed': 0, 'stats_avg_J': '1001.8836890243903 13120', 'wu_timedout': 0, 'stats_total_cpu_time': '972.5799999999998', 'start_real_time': 3763830410.102031, 'wu_received': 164, 'stats_total_time': '551.3100000000002', 'stats_max_bucket_fill': '1.0,1s:1.594150'}
Filtering - Merging: Merged matrix has 41245 rows and total weight 7011876 (170.0 entries per row on average)
Square Root: Factors: 358031419880876938192954600148936974177 158275940969901931610142052723104530012975662949
Generate Factor Base: Total cpu/real time for makefb: 0.47/0.135556
Generate Free Relations: Total cpu/real time for freerel: 3.85/0.515574
Lattice Sieving: Total number of relations: 651829
Lattice Sieving: Total time: 722.52s
Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 1.46/1.11501
Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 5.82/2.44343
Filtering - Singleton removal: Total cpu/real time for purge: 5.91/2.61454
Filtering - Merging: Total cpu/real time for merge: 13.92/9.40176
Filtering - Merging: Total cpu/real time for replay: 1.34/1.11267
Linear Algebra: Total cpu/real time for bwc: 27.41/8.48
Quadratic Characters: Total cpu/real time for characters: 1.43/0.41662
Square Root: Total cpu/real time for sqrt: 31.34/4.62864
Complete Factorization: Total cpu/elapsed time for entire factorization: 3518.26/98.8682
====================================================
N<129> = 148184180288647082302426937158351493009802210451497766956818199852902417872086615646923058242179971571945498904372691387599121333
Lattice Sieving: params = {'lim1': 44217255, 'qmin': 711270, 'maxresubmit': 5, 'maxwu': 10, 'rels_wanted': 0, 'run': True, 'maxwuerror': 2, 'qrange': 10000, 'wutimeout': 10800, 'sqside': 1, 'name': 'c130', 'workdir': '/tmp/cadofactor', 'gzip': True, 'lim0': 13124945, 'maxtimedout': 100, 'maxfailed': 100}
Polynomial Selection (root optimized): Finished, best polynomial from file /tmp/cadofactor/c130.upload/c130.polyselect2.qxg0sxtu.opt_48 has Murphy_E = 5.93e-07
Generate Factor Base: Total cpu/real time for makefb: 50/10.6005
Generate Free Relations: Total cpu/real time for freerel: 311.55/40.6101
Lattice Sieving: sieving.run(): Task state: {'qnext': 711270, 'rels_found': 0, 'start_achievement': -1, 'wu_submitted': 0, 'wu_received': 0, 'wu_timedout': 0, 'rels_wanted': 22129742, 'start_real_time': 0, 'wu_failed': 0}
Lattice Sieving: Total number of relations: 22630418
Lattice Sieving: Total time: 101935s
Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 59.73/158.776
Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 312.5/289.52
Filtering - Singleton removal: Total cpu/real time for purge: 64.6/62.8826
Lattice Sieving: sieving.run(): Task state: {'stats_avg_J': '7660.828321977037 47991', 'qnext': 2220000, 'wu_submitted': 151, 'stats_total_time': '101935.03999999998', 'start_real_time': 3763831072.010403, 'rels_wanted': 22856722, 'start_achievement': 0.013695957232578672, 'rels_found': 22630418, 'stats_max_bucket_fill': '1.0,1s:1.120020', 'wu_timedout': 0, 'stats_total_cpu_time': '195411.35000000003', 'wu_received': 67, 'wu_failed': 0}
Filtering - Merging: Merged matrix has 1228821 rows and total weight 208900009 (170.0 entries per row on average)
Square Root: Factors: 12674601840486459448841848682136209646192693900874822145430131644978522484248447 11691426851398408680908975753372972164838725377739
Generate Factor Base: Total cpu/real time for makefb: 50/10.6005
Generate Free Relations: Total cpu/real time for freerel: 311.55/40.6101
Lattice Sieving: Total number of relations: 25564242
Lattice Sieving: Total time: 120205s
Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 67.39/179.278
Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 396.05/371.704
Filtering - Singleton removal: Total cpu/real time for purge: 337.35/309.169
Filtering - Merging: Total cpu/real time for merge: 750.5/700.049
Filtering - Merging: Total cpu/real time for replay: 47.22/38.9662
Linear Algebra: Total cpu/real time for bwc: 24120.6/6204.46
Quadratic Characters: Total cpu/real time for characters: 56.22/15.795
Square Root: Total cpu/real time for sqrt: 3060.65/418.769
Complete Factorization: Total cpu/elapsed time for entire factorization: 304850/10683
====================================================
N<121> = 8752900354798992135535713351999727922615660863269877993396052619685931654568649609816781542442396746201949471440217022437
Lattice Sieving: params = {'lim0': 3000000, 'gzip': True, 'maxwuerror': 2, 'maxtimedout': 100, 'run': True, 'qmin': 1000000, 'qrange': 10000, 'rels_wanted': 0, 'maxresubmit': 5, 'wutimeout': 10800, 'workdir': '/tmp/cadofactor', 'sqside': 1, 'maxfailed': 100, 'maxwu': 10, 'lim1': 5500000, 'name': 'c120'}
Polynomial Selection (root optimized): Finished, best polynomial from file /tmp/cadofactor/c120.upload/c120.polyselect2.v5ylhooc.opt_90 has Murphy_E = 1.32e-06
Generate Factor Base: Total cpu/real time for makefb: 6.22/1.45258
Generate Free Relations: Total cpu/real time for freerel: 155.53/20.4213
Lattice Sieving: sieving.run(): Task state: {'start_real_time': 0, 'start_achievement': -1, 'wu_submitted': 0, 'rels_found': 0, 'wu_timedout': 0, 'wu_failed': 0, 'wu_received': 0, 'rels_wanted': 11474681, 'qnext': 1000000}
Lattice Sieving: Total number of relations: 11499970
Lattice Sieving: Total time: 42836.9s
Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 29.66/71.9494
Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 125.83/53.849
Filtering - Singleton removal: Total cpu/real time for purge: 21.02/9.09563
Lattice Sieving: sieving.run(): Task state: {'start_achievement': 0.003286453017735308, 'wu_submitted': 453, 'stats_max_bucket_fill': '1.0,1s:1.360290', 'wu_received': 370, 'rels_wanted': 11964241, 'stats_total_time': '42836.850000000006', 'start_real_time': 3763877574.831329, 'stats_avg_J': '1913.329901275031 250190', 'qnext': 5530000, 'stats_total_cpu_time': '79603.62000000001', 'rels_found': 11499970, 'wu_timedout': 0, 'wu_failed': 0}
Lattice Sieving: sieving.run(): Task state: {'start_achievement': 0.003286453017735308, 'wu_submitted': 464, 'stats_max_bucket_fill': '1.0,1s:1.360290', 'wu_received': 429, 'rels_wanted': 13184371, 'stats_total_time': '49254.79', 'start_real_time': 3763877574.831329, 'stats_avg_J': '1911.9272845857893 288247', 'qnext': 5640000, 'stats_total_cpu_time': '91392.13000000002', 'rels_found': 12923469, 'wu_timedout': 0, 'wu_failed': 0}
Lattice Sieving: sieving.run(): Task state: {'start_achievement': 0.003286453017735308, 'wu_submitted': 475, 'stats_max_bucket_fill': '1.0,1s:1.360290', 'wu_received': 446, 'rels_wanted': 13482396, 'stats_total_time': '51120.21000000001', 'start_real_time': 3763877574.831329, 'stats_avg_J': '1911.4947376864536 299203', 'qnext': 5750000, 'stats_total_cpu_time': '94879.38000000002', 'rels_found': 13325503, 'wu_timedout': 0, 'wu_failed': 0}
Lattice Sieving: sieving.run(): Task state: {'start_achievement': 0.003286453017735308, 'wu_submitted': 536, 'stats_max_bucket_fill': '1.0,1s:1.360290', 'wu_received': 453, 'rels_wanted': 13611673, 'stats_total_time': '51708.15000000001', 'start_real_time': 3763877574.831329, 'stats_avg_J': '1911.302068579334 303590', 'qnext': 6360000, 'stats_total_cpu_time': '95994.38000000002', 'rels_found': 13484536, 'wu_timedout': 0, 'wu_failed': 0}
Filtering - Merging: Merged matrix has 742731 rows and total weight 74273389 (100.0 entries per row on average)
Square Root: Factors: 63177231098564251400862698002146040730555401588791894828480702706907 138545172091245831321071555847351711191164740172937791
Generate Factor Base: Total cpu/real time for makefb: 6.22/1.45258
Generate Free Relations: Total cpu/real time for freerel: 155.53/20.4213
Lattice Sieving: Total number of relations: 13906880
Lattice Sieving: Total time: 53405.4s
Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 35.98/78.1032
Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 226.18/131.068
Filtering - Singleton removal: Total cpu/real time for purge: 160.12/80.2486
Filtering - Merging: Total cpu/real time for merge: 169.38/142.37
Filtering - Merging: Total cpu/real time for replay: 19.23/14.9333
Linear Algebra: Total cpu/real time for bwc: 5288.67/1377.31
Quadratic Characters: Total cpu/real time for characters: 28.14/7.54837
Square Root: Total cpu/real time for sqrt: 1244.19/172.8
Complete Factorization: Total cpu/elapsed time for entire factorization: 141585/3000.22
[/code] |
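On the duplicate-line problem mentioned at the top of this post: a standard awk idiom drops repeated lines while preserving order, which is enough here since the duplicated summary lines are byte-identical (file name hypothetical):
[code]
# print each distinct line only the first time it is seen, preserving order
awk '!seen[$0]++' compiled_runs.txt > compiled_runs.dedup.txt
[/code]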
Sure, looks like enough to make some observations.
On your C129, sieve time was 120ksec, while bwc was 24ksec; in wall clock time, bwc was 6200 of your 10600 sec job. On your C121, sieve time 53ksec, bwc 5300 sec; in wall clock time, bwc was 1300 of your 3000 sec job. So, the C121 was over-sieved enough to make bwc time 1/10th of sieve time, resulting in a substantial savings in wall clock time. The files I posted target bwc time to be 1/6th to 1/5th of sieving time; I think your C121 shows the way for your setup at bwc time = 1/10th of sieving time. Factoring a C120 in under an hour is pretty snazzy. |
1 Attachment(s)
[QUOTE=VBCurtis;513560]Sure, looks like enough to make some observations.
On your C129, sieve time was 120ksec, while bwc was 24ksec; in wall clock time, bwc was 6200 of your 10600 sec job. On your C121, sieve time 53ksec, bwc 5300 sec; in wall clock time, bwc was 1300 of your 3000 sec job. So, the C121 was over-sieved enough to make bwc time 1/10th of sieve time, resulting in a substantial savings in wall clock time. The files I posted target bwc time to be 1/6th to 1/5th of sieving time; I think your C121 shows the way for your setup at bwc time = 1/10th of sieving time. Factoring a C120 in under an hour is pretty snazzy.[/QUOTE]I also like the C138 in just over 4.5 hours, although I think my CADO/msieve hybrid would do better, which is what's driving my interest in your params and the mpi bwc. Anyway, here is a file of data for 87 runs ranging from 74 through 138 dd. It does include all previous samples. These are with unmodified params files from the original developmental version from a few days ago: |
I'm running a "last" test of new C120 and C130 files; if results fit the time-vs-size curve I expect, I'll have them posted in the morning for you to the params thread. C125 is still a little slower than the curve suggests should be possible, I still have some tweaking to try there to get a bit more speed.
|
The Last Four Unmodified Runs
1 Attachment(s)
I have installed all your modified params files from the other thread. These are the last four unmodified runs. Of course, I realize that not all of the params files in my current set are modified, but I'm considering the next sets as Modified:
|
Of All the STUPID Things I've Done!!
I accidentally wiped out all my scripts that ran my Aliquot Sequences on my controlling machine - the ones that communicated with the db and ran the ecm and nfs scripts!
The nfs script itself was untouched and I've found an earlier backup for the main controlling script, which is quite fortunate, since I remember having to solve several issues and I have no recollection of how I did solve them. I would have had to research and relearn a bunch. Still, reconstructing all the other scripts may be time consuming, so it might be a while before I get any more data compiled and uploaded. [SIZE=1]BTW, in case you're interested, I used a recovery program to try to bring them back, but was only able to get the filenames, which unfortunately overwrote the original "deleted" files with zero-byte replacements.[/SIZE] |
Back Factoring...
1 Attachment(s)
I have the scripts pretty much rewritten and they are "mostly" working correctly.
Here is a file of Info for 91 composites factored using VBCurtis' params files, where available. (There may be duplicate runs for some of the numbers.) |
[QUOTE=EdH;514016]I have the scripts pretty much rewritten and they are "mostly" working correctly.
Here is a file of Info for 91 composites factored using VBCurtis' params files, where available. (There may be duplicate runs for some of the numbers.)[/QUOTE] I downloaded the CADO package last week, per your instructions. What should I do now with this file? |
[QUOTE=ET_;514028]I downloaded the CADO package last week, per your instructions.
What should I do now with this file?[/QUOTE] My post was a bit confusing. The scripts I referred to were the ones I'd lost earlier, which I use for my Aliquot factoring via ecmpi and CADO-NFS across several machines. The file I posted was data from the CADO runs I did using VBCurtis' modified params files from the "[URL="https://www.mersenneforum.org/showthread.php?t=24274"]improved params...[/URL]" thread. They are mostly for VBCurtis to review, but anyone else can look them over for timing and poly/sieve info that can be used to adjust the params files for local use. Sorry for the confusion. |
[QUOTE=ET_;514028]I downloaded the CADO package last week, per your instructions.
What should I do now with this file?[/QUOTE] If you used git to download the package, just cd into the cado-nfs folder and "make". It will ask if you allow downloading cmake, accept. An hour or so later you'll have a working copy of CADO. To test it, while in the cado-nfs directory invoke: ./cado-nfs.py {input number} With stock parameters, CADO is a bit slower than YAFU; with my improved params files, CADO is somewhat faster than YAFU. I await tests like Ed's to measure just how much faster across a variety of hardware, but so far I estimate it's 20 to 30% faster than YAFU. For example, I used 3 Haswell 3.3ghz cores to factor a C139 in 38 hours with CADO. To use my params files, grab them from the "improved parameters" thread, and save to the folder /cado-nfs/parameters/factor. You can rename the original ones for safekeeping, or overwrite them. If you do so, please post a before & after time on the same composite or on two composites very close in size (say, a factor of 2 apart). It seems CADO performs quite differently by hardware type. |
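In shell form, the steps above look like this; the input number is just the small test composite from the traceback earlier in the thread:
[code]
cd cado-nfs
make    # accept the offer to download cmake if asked
./cado-nfs.py 90377629292003121684002147101760858109247336549001090677693
[/code]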
[QUOTE=VBCurtis;514046]
To use my params files, grab them from the "improved parameters" thread, and save to the folder /cado-nfs/parameters/factor. You can rename the original ones for safekeeping, or overwrite them. If you do so, please post a before & after time on the same composite or on two composites very close in size (say, a factor of 2 apart). It seems CADO performs quite differently by hardware type.[/QUOTE] I will do both tests to improve my knowledge of the system. Just need a hint on the size of the composites (you said C139?) and where to choose them. I will gladly perform a test on numbers you want to run before and after the file substitution. |
Any composites you have an interest in will do; some folks have factored the same number twice, or even three times (Yafu, Cado stock, cado my files) to compare speeds.
If a direct comparison with the same input is your style, I suggest finding a composite between 100 and 120 digits; if you have none from your own interests, PM me and I can send an entry from an aliquot sequence. If you'd rather do productive work, find any two inputs of similar size (say, within a factor of two) for an a/b test, where "a" can be YAFU or stock CADO and "b" is improved CADO. I am currently developing files for C125, C135, C140; C95 to C120 (and C130) are presently posted. CADO rounds inputs to the nearest multiple-of-5-digits for parameter choice; we can write custom params for an individual number, but until 165+ digits that seems like unnecessary complication. RichD is currently testing my beta C140 file against stock-CADO on a pair of C138s, a size that takes between 1 and 1.5 days on a quad-core desktop; he already has a strong idea of what speed his rig manages with YAFU. A C120 takes a couple of hours, while C105 takes half an hour or so. As you can imagine, I've done much more testing on C110 and lower because the tests are so fast; I'm mostly using a best-fit curve of time vs difficulty to evaluate whether C125+ files are "as fast" as the files for smaller numbers. |
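Installing the improved files is just a copy into the parameters tree; a sketch, with the download location hypothetical:
[code]
cd cado-nfs/parameters/factor
mv params.c120 params.c120.stock   # keep the original for a/b tests
cp ~/Downloads/params.c120 .       # VBCurtis' file from the params thread
[/code]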
CADO for the sieving step
So, it's nice and easy to download and install (I was a little surprised that it carefully downloaded and installed its own copy of cmake), and not too difficult to give a polynomial.
The automatic script fell over very quickly after trying to run ten separate two-threaded sievers at 13GB memory usage each on my 96GB machine (though it spent ten minutes producing a free-relations file before starting to sieve); it left behind a file project.wucmd which contained plausible-looking raw 'las' command lines. I am a little disconcerted that I can't find evidence of the out-of-memory failure in /var/log or in any of the log files (I'm deducing the memory usage by running the line from project.wucmd); the console output ends
[code]
Info:Lattice Sieving: Adding workunit L2253_sieving_230045000-230050000 to database
[/code]
and some process is clearly still running on that console even though no 'las' processes are running on the system.

Running las with '-t 20' gives a process which uses 21GB and does seem to be running on twenty CPUs at least some of the time, generating about three relations a second. |
My plan for the C207 team sieve is to set tasks.sieve.las.threads = 4, which will cause every client las (siever) process to run 4-threaded. This causes an occasional small error noting some bucket is full, but it allocates a bit more memory for said bucket (I believe the setting is bkmult) and continues with a suggestion that maybe I'm using too many threads for my choice of lim's. It's consistent, then, that your 20-threaded single process would trigger that error more frequently and require more RAM than my 4-threaded process @12-13GB does.
When running the deg 6 poly 4-threaded on a 20-core machine with 30+ other threads busy, I was also finding about 3 relations per second; top showed CPU use near 400%, so your timing confuses me a little. I've been trying the obvious flags on the command line for cado-nfs-client.py to set the number of threads on the client side, with no luck. |
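For reference, the server-side setting mentioned above is a single line in the params file:
[code]
# every client las (siever) process runs 4-threaded
tasks.sieve.las.threads = 4
[/code]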
The log at the end of the 20-threaded job was
[code]
$ taskset -c 10-19,30-39 /scratch/cado-gnfs/cado-nfs-2.3.0/build/oak/sieve/las -I 16 -poly L2253.jon/L2253.poly -q0 231000000 -q1 231001000 -lim0 268000000 -lim1 268000000 -lpb0 33 -lpb1 33 -mfb0 99 -mfb1 99 -lambda0 3.1 -lambda1 3.2 -fb L2253.jon/L2253.roots.gz -out L2253.jon/231M-231M001 -t 20 -sqside 1 -stats-stderr
# Average J=32760 for 68 special-q's, max bucket fill 0.796379
# Discarded 0 special-q's out of 68 pushed
# Total cpu time 28131.83s [norm 7.14+23.2, sieving 8781.8 (7022.8 + 448.1 + 1310.9), factor 19319.7 (1995.1 + 17324.6)]
# Total elapsed time 2051.09s, per special-q 30.163s, per relation 0.238249s
# PeakMemusage (MB) = 25783
# Total 8609 reports [3.27s/r, 126.6r/sq]
[/code]
This is a *lot* slower than gnfs-lasieve4I16e, but looking at that command line I am using 3LP on the rational side, which is usually very slow; rerunning with -mfb0 66 before contemplating changing the factor-large-part parameters. |
Why is number of threads per client not a client side setting?
|
[CODE]
I=16, 3lpa 2lpr, -t20
# Average J=32760 for 33 special-q's, max bucket fill 0.507688
# Discarded 0 special-q's out of 33 pushed
# Total cpu time 5855.10s [norm 2.46+10.1, sieving 4878.9 (3963.3 + 224.9 + 690.7), factor 963.7 (532.0 + 431.7)]
# Total elapsed time 363.64s, per special-q 11.0193s, per relation 0.0980416s
# PeakMemusage (MB) = 25783
# Total 3709 reports [1.58s/r, 112.4r/sq]

I=15, 3lpa 2lpr, -t20
# Average J=16360 for 47 special-q's, max bucket fill 0.409821
# Discarded 0 special-q's out of 47 pushed
# Total cpu time 1473.23s [norm 0.99+4.8, sieving 1010.2 (705.3 + 76.7 + 228.3), factor 457.2 (197.0 + 260.2)]
# Total elapsed time 80.44s, per special-q 1.7114s, per relation 0.0352634s
# PeakMemusage (MB) = 8456
# Total 2281 reports [0.646s/r, 48.5r/sq]
[/CODE]
I should do a -t40 run, since my comparison is 40 parallel gnfs-lasieve4I15e jobs. |
And it looks like I should do an I=15 run both 2- and 4-threaded to see if we benefit from I=16 at all! CADO is just fine with very large Q values, so it's not important if yield is worse than half on I=15 vs I=16; that would suggest a Q-range of perhaps 20M to 1100M rather than 20M to 450M, and memory use under 4GB per process. It also may turn out that optimal lim choices for I=15 are a bit higher than I=16; we are constrained by memory footprint on I=16 to have lim's about where I tested (Rlim 260M, Alim 550M), but we have no such constraints on I=15.
Did you run makefb before las to generate the factor-base file? I haven't yet had success finding relations using las free-standing, but that would clearly be preferable. |
I generated the factor-base file by running the cado-nfs script and letting it call makefb for me, then transcribed the command lines in L2253.wucmd to get raw las command lines.
[code]
/scratch/cado-gnfs/cado-nfs-2.3.0/build/oak/sieve/makefb -poly L2253.jon/L2253.poly -lim 268000000 -maxbits 16 -out L2253.jon/L2253.roots.gz -t 40
[/code]
seems to be the command line for makefb. L2253.poly is just as for gnfs-lasieve4I15e except that it's just the polynomial and no sieving parameters:
[code]
n: 2688333615331433020642446747149440986283678638176205541641754312932820814295074220965678187428410875031545875881257723735692836520162515677425285432734833508071695321927492427322546769031971029
skew: 368024608.71
c0: 123272612786479316312350884349842837614862419023
c1: -8888039873820606651882838453601725169289
c2: -6398405309760975776966814915379
c3: -74102914865935467793635
c4: -344134329264960
c5: 556920
Y0: -21713911810858617860786743761277388982
Y1: 11185023447043546081
[/code] |
I would be a bit wary about running with I=15 and trusting in really large Q; yield drops off as Q goes up, and you do start to get perceptible problems with duplicates (try running a C135 with gnfs-lasieve4I12e to see both these effects)
|
I did not notice the L2253 in your initial post; I thought you were testing the C207 from the Cunningham project. Your results with I=15 vs I=16 make more sense now, and while I'll try I=15 for the sake of thoroughness (and because the CADO default C210 file uses I=15) I won't expect greatness.
Thanks for breaking down the makefb startup & how you got set up for testing. |
If I recall correctly, Bob has suggested running smaller special-q with a larger sieve range in the past. Maybe very small q (0-1M?) could be run with 17e. I believe duplicates won't be such an issue due to the increased sieve range (unless the sieve range is related to q size?), and yield will be incredible at tiny q.
|
I=15 larger-scale run
[code]
$ /scratch/cado-gnfs/cado-nfs-2.3.0/build/oak/sieve/las -I 15 \
    -poly L2253.jon/L2253.poly \
    -q0 232000000 -q1 233000000 -lim0 268000000 -lim1 268000000 \
    -lpb0 33 -lpb1 33 -mfb0 66 -mfb1 99 -lambda0 2.2 -lambda1 3.2 \
    -fb L2253.jon/L2253.roots.gz -out L2253.jon/232M \
    -t 40 -sqside 1 -stats-stderr
# Average J=16320 for 52112 special-q's, max bucket fill 0.360714
# Discarded 0 special-q's out of 52112 pushed
# Total cpu time 1718332.50s [norm 1106.12+5086.3, sieving 1158819.1 (821794.0 + 100381.7 + 236643.5), factor 553321.0 (283784.5 + 269536.5)]
# Total elapsed time 49211.82s, per special-q 0.944347s, per relation 0.0194419s
# PeakMemusage (MB) = 13467
# Total 2531226 reports [0.679s/r, 48.6r/sq]
[/code]
That's slightly lower yield (but only by a couple of percent) than I would expect from gnfs-lasieve4I15e; on the other hand it used only about a third as much memory (13467M vs forty jobs at 911M apiece = 36440M). I can't say much about timings because I haven't run on that machine at that range; will do 233-234 to get a realistic comparison. It's 20% slower than forty copies of 15e were on average over Q=120-126.

But first, trying -ncurves0={20,25,30} and -ncurves1={20,25,30} on 16e for a fixed Q range (probably about 40k, which should get reasonable speed and reasonable statistical significance). |
1 Attachment(s)
That's interesting; the timing on the ncurves runs moves around like a thermometer rather than like anything to do with mathematics. Doing large multi-threaded benchmarking runs on contemporary hardware with all its turbo boosts and the like appears quite a difficult problem.
I tried to run a 17e test; you have to rebuild everything with -DLOG_BUCKET_REGION=17, and this then doubles the size of a number of arrays; a one-threaded job tries to allocate 39GB and falls over on my busy 64G machine:
[code]
malloc_aligned(0x200000,2097152) called
malloc_aligned(0x9d1800000,2097152) called
code BUG() : condition rc == 0 failed in malloc_aligned
[/code]
On a busyish 96G machine it falls over when trying to allocate a second 39GB array; after allocating more swap and kill -STOPping everything else on the machine I actually get some relations:
[code]
# 291 relation(s) for side-1 (232000009,20697965)
# Time for this special-q: 441.4965s [norm 0.1389+0.6362, sieving 392.7559 (323.9631 + 29.1809 + 39.6120), factor 47.9654 (29.1407 + 18.8247)]
[/code]
Compare 15e:
[code]
# 52 relation(s) for side-1 (232000009,20697965)
# Time for this special-q: 33.3656s [norm 0.0152+0.1131, sieving 22.2214 (15.6162 + 1.8664 + 4.7388), factor 11.0159 (5.5952 + 5.4208)]
[/code]
and 16e (unfortunately for a different special-Q, but time/relations is the relevant metric):
[code]
# 127 relation(s) for side-1 (231960013,7709469)
# Time for this special-q: 144.9776s [norm 0.0846+0.3650, sieving 113.0854 (82.6639 + 9.1164 + 21.3050), factor 31.4426 (18.1851 + 13.2576)]
[/code]
So for this number, which is a GNFS-193, we're seeing:
[code]
15e 0.64s/r
16e 1.14s/r
17e 1.52s/r
[/code]
I suppose the 2019 way to look at this would have been to hire an r5.12xlarge for a couple of hours, which actually would only have cost a dollar. |
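For anyone reproducing the 17e build: the define has to reach every compilation unit, so one way is to pass it through the build flags in local.sh. A sketch, assuming the usual CFLAGS/CXXFLAGS hooks there (check local.sh.example for the exact variable names in your version):
[code]
# local.sh: rebuild everything with 17-bit bucket regions
CFLAGS="$CFLAGS -DLOG_BUCKET_REGION=17"
CXXFLAGS="$CXXFLAGS -DLOG_BUCKET_REGION=17"
[/code]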
[QUOTE=fivemack;514643]That's interesting; the timing on the ncurves runs moves around like a thermometer rather than like anything to do with mathematics.[/QUOTE]
What are the axes? Y-axis is seconds-per-specialQ and X-axis is just the sequence of specialQ; I'm running the same set of specialQ nine times with different ncurves0/ncurves1 parameters, so I was expecting nine fuzzy lines at different heights. |
Sorry henryzz, I clicked 'edit' rather than 'quote' and abused my supermoderatorial powers.
There is something in the data under all this noise: changing ncurves1 substantially changes the time for that factorisation phase whilst changing the yield very little, so I'm going to try ncurves1=5,10,15 next.
[code]
30.20.2:# Total cpu time 284849.56s [norm 197.30+649.6, sieving 223552.3 (161447.4 + 17982.5 + 44122.5), factor 60450.3 (38705.0 + 21745.3)]
30.20.2:# Total 236498 reports [1.2s/r, 112.7r/sq]
30.25.2:# Total cpu time 289695.86s [norm 198.30+650.6, sieving 222878.8 (161022.2 + 17985.1 + 43871.5), factor 65968.2 (38722.5 + 27245.7)]
30.25.2:# Total 236786 reports [1.22s/r, 112.8r/sq]
30.30.2:# Total cpu time 288201.77s [norm 195.66+647.7, sieving 216766.5 (155109.5 + 17964.5 + 43692.4), factor 70591.9 (38023.6 + 32568.3)]
30.30.2:# Total 236814 reports [1.22s/r, 112.8r/sq]
[/code]
The sieving time is the one with masses of multithreaded memory access, so I can see an argument that it is going to be noisier than the other lines; indeed, the first component of sieving time contains all the wiggles in the noisy graph I posted, while the rest are much closer to flat within a block. |
Revised 17e numbers
The sieving part of the first 17e special-Q was much slower than the others, so my numbers in post 369 are unrealistic.
More plausible numbers (comparison is from single-threaded jobs at I=15,16,17, range=232000000..232000010; note that these are with a binary built with 17-bit bucket support, so quite possibly less efficient than the 16-bit-bucket default):
[code]
grep -E "(this special-q|relation)" ../cado-nfs-2.3.0-B17/L2253.j/1?e.x
15e.x:# 50 relation(s) for side-1 (232000009,175376172)
15e.x:# Time for this special-q: 17.4141s [norm 0.0080+0.0920, sieving 12.5049 (9.0474 + 0.8440 + 2.6135), factor 4.8092 (1.7133 + 3.0959)]
16e.x:# 113 relation(s) for side-1 (232000009,175376172)
16e.x:# Time for this special-q: 49.5885s [norm 0.0600+0.1760, sieving 34.8262 (21.4750 + 2.9080 + 10.4432), factor 14.5263 (7.1553 + 7.3710)]
17e.x:# 274 relation(s) for side-1 (232000009,175376172)
17e.x:# Time for this special-q: 266.6024s [norm 0.1643+0.6817, sieving 216.8890 (157.5922 + 14.4013 + 44.8955), factor 48.8674 (30.8008 + 18.0666)]
[/code]
That works out to:
15e: 0.348 s/r
16e: 0.439 s/r
17e: 0.973 s/r

So for numbers this small 17e really doesn't make sense, which we knew already. A more interesting question is what kind of yield 17e might give on SNFS jobs, particularly ugly quartics; I will use Fib(1625), quartic SNFS difficulty 271.4, as a test case. |
OK, I am reasonably confident that for 16e lpa=33 lim=268000000 it is worth using ncurves1=15 rather than the default 25 (basically, each curve factors 25% of the remaining usable composites).
[code]
ncurves1  yield   total time  t_factor0  t_factor1
 5        185576  270537.39   40156.7     6422.3
10        223265  275086.68   39581.3    11120.9
15        234399  277763.44   39406.4    16285
20        236494  277285.09   38674.7    21739.8
25        236782  289479.05   38263.5    27276.3
30        236810  288675.62   37848.9    32619.5
[/code]
ncurves1=15 comes from fitting lines to t_factor0+t_factor1, adding the average non-factor time, and then optimising expected-yield / total-expected-time. |
17e is not really enough to make difficulty-270 quartic SNFS practical.
The yield is a bit higher, but you have to go to 34-bit large primes to get more than one relation per Q, and collecting a billion relations at nine CPU-seconds per relation is not a plausible job.
[code]
 I  lpr  time/rel      rels/Q
15  32   10.09365591   0.06975
15  33    5.61544898   0.1225
15  34    3.367857143  0.21
16  32   11.4795212    0.18275
16  33    6.513562066  0.32425
16  34    3.922789209  0.54675
17  32   28.05035398   0.452
17  33   15.07548578   0.844
17  34    8.857269553  1.432
[/code]
(this is with rlim=268M alim=67M lpba=30, because big quartics are very asymmetric towards the rational side)

With -t4, 15e takes 7579MB, 16e takes 25610MB, 17e takes 89361MB. |
VBCurtis,
I'm having a problem I think I had before: my system is crashing during polyselect; the host is hanging up. The composite is a C139, so CADO-NFS is using params.c140. Do you have a modified c140 I should try before I do something with the default one? Ed |
1 Attachment(s)
Ed-
RichD and I have been working on params for c140 and c145. Attached is the fastest c140 file we've yet run; I'm almost ready to post it "officially", but I want to test a few more inputs to make sure I didn't just get a lucky poly on this run. I think the hang happens on very small c5 values; we worked around it by using a small admin value such as tasks.polyselect.admin = 1680 (used in this file, in fact). Note the file has an input and name and number of clients for Rich's 4-core i5; you should reset the top section to what you normally use. Edit: may as well include timing data: On a non-HT i5-quad-core desktop, 16k thread-sec poly select, 195k thread-sec sieve, 29k thread-sec matrix. Wall clock time 112k sec, 31 hrs and change. |
Thanks Curtis,
I'll try to run this tomorrow. I did remove the RichD specific lines. -Ed |
Just an update. The original c139 that I mentioned above would not factor with cado-nfs. The host kept crashing.
I tried a different set of composites and all ran fine. I then put the original c139 back in the queue and ECM factored it while I wasn't looking. I haven't bothered finding and reconstructing the c139, but several other composites have been using the modified params.c140 file without issue. I will try to get some details up soon. Thanks! |
I'm running a c150 by myself (e.g. ./cado-nfs.py <BIG_NUMBER> --screenlog DEBUG --filelog DEBUG), but I'm not seeing datetime/timestamps in my log files.
In VBCurtis's coordinated Team CADO solve of 2,2330L all the log files have both PID and datetime. What option(s) do I set to get datetime in my logs? |
The only option I set for CADO jobs is --server-threads, if I didn't set it in the params file.
Maybe try ditching the DEBUG flags? |
Every so often I am getting an error for full buckets while sieving. Is there an easy fix for this?
|
@VBCurtis:
I modified the "improved" params.c140 into a "modified" params.c150 (although I forgot to rename it within the file) based on your suggestions in the "Improved" thread and am running it on another c152 composite. I hadn't noticed this before, but maybe it's normal behavior: after settling with receipt of several WUs from many clients, it gave an ETA of ~7 PM. Later on a check showed an ETA of ~7:30 and then it kept moving later. I then started up a few more clients and even after they had submitted several WUs, the march into the future continues. At present, the original ETA of ~7 PM has transformed into later than 9:30 PM. I can foresee it as an elusive goal. I assume (I know!) this is because of diminishing relations as the search area progresses, but thought I'd offer this bit of info in case it is helpful in reaching better parameters. -Ed |
[QUOTE=EdH;526939]@VBCurtis:
I modified the "improved" params.c140 into a "modified" params.c150 (although I forgot to rename it within the file) based on your suggestions in the "Improved" thread and am running it on another c152 composite. I hadn't noticed this before, but maybe it's normal behavior: after settling with receipt of several WUs from many clients, it gave an ETA of ~7 PM. Later on a check showed an ETA of ~7:30 and then it kept moving later. I then started up a few more clients and even after they had submitted several WUs, the march into the future continues. At present, the original ETA of ~7 PM has transformed into later than 9:30 PM. I can foresee it as an elusive goal. I assume (I know!) this is because of diminishing relations as the search area progresses, but thought I'd offer this bit of info in case it is helpful in reaching better parameters. -Ed[/QUOTE] I've noticed it too when I did some stand-alone (single box) jobs for VBCurtis. The ETA on sieving perpetually increases, most likely from what you mentioned above. |
[QUOTE=RichD;526941]I've noticed it too when I did some stand-alone (single box) jobs for VBCurtis. The ETA on sieving perpetually increases, most likely from what you mentioned above.[/QUOTE]I was somewhat disappointed that adding several more totally unique clients did not seem to offset the increasing loss.
|
[QUOTE=EdH;526942]I was somewhat disappointed that adding several more totally unique clients did not seem to offset the increasing loss.[/QUOTE]
I can't think of a reason except that the first few WUs being returned may have a temporary reverse in ETA. After that is established you would have a constant flow of WUs with each being fewer relations [strike]as[/strike] than before. |
This is a side effect of one of the settings I use that CADO default does not:
I start with very low Q values, where relations are much faster and higher-yield. CADO starts with Q in the general vicinity of lim0, a higher value than my jobs usually end at. So, CADO's files run over a Q-range where sec/rel is fairly steady, while we run from Q where relations are very fast to a medium-sized Q where relations are still faster than CADO's ranges but measurably slower than the low starting Q values. So, since ETA is based on the average rate of relations, the ETA keeps slipping throughout the job. The massive yield and sec/rel improvement at low Q is also why I shift to I=13 and I=14 so much lower than the default files. Turns out I=13 at 120 digits finds so many relations at small Q values that it's overall faster than I=12, even though I=12 is faster over any particular Q-range as measured by sec/rel. I'm pretty sure that a hybrid of, say, I=14 at low Q and I=13 at higher Q would be even faster for 135-150 digits, but I haven't gotten 'round to asking the CADO mailing list if they would consider implementing such an option. |
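The two knobs being described are a one-line change each in a params file. A sketch with illustrative values; the spellings assume tasks.I as used in the posted params files, and a qmin setting matching the 'qmin' key visible in the task-state dumps above:
[code]
tasks.I = 13        # smaller sieve area per special-q
tasks.qmin = 20000  # start sieving at very low special-q, where yield is high
[/code]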
[QUOTE=VBCurtis;526946]. . .
I'm pretty sure that a hybrid of, say, I=14 at low Q and I=13 at higher Q would be even faster for 135-150 digits, but I haven't gotten 'round to asking the CADO mailing list if they would consider implementing such an option.[/QUOTE]Since I'm using scripts, it seems I should be able to catch a percentage of completion and interrupt CADO-NFS, then swap to I=13 with a snapshot restart. Isn't this what you manually did with 2,2330L?

I'm also considering more CADO-NFS/msieve hybrid work. The catch is if msieve fails to create a matrix with too few relations. At such a failure, can I alter rels_wanted and restart with a snapshot?

I've been working with my scripts quite extensively in recent days and might try to add some of the above as well, if time permits. Thanks for all your help, -Ed |
Yes, That's what I did with 2330L, and you definitely could alter the snapshot file from I=14 to I=13 and restart. A wild guess is to do so after 30% of relations for c135 params, 50% for 140, 75% for 145, and not to bother (leave it all I=14) for c150. I'm actually planning to test I=13 for c135 params; that might be too soon for I=14.
I also agree that if msieve fails to build a matrix, you can add 10% to rels_wanted in the snapshot file and restart CADO. 5% might build a matrix, but above ~140 digits it's better to have a bit of oversieving for a faster matrix phase. |
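A sketch of that restart procedure; the snapshot file name and the numbers are hypothetical, but handing the snapshot file back to cado-nfs.py is the documented way to resume an interrupted job:
[code]
# bump rels_wanted by ~10% in the latest snapshot, then resume from it
cp c140.parameters_snapshot.5 c140.parameters_snapshot.5.bak
sed -i 's/rels_wanted = 32000000/rels_wanted = 35200000/' c140.parameters_snapshot.5
./cado-nfs.py c140.parameters_snapshot.5
[/code]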