mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Factoring

Reply
 
Thread Tools
Old 2018-04-27, 04:26   #1
Dubslow
Basketry That Evening!
 
Dubslow's Avatar
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

1C3516 Posts
Default CADO-NFS vs GGNFS and other NFS considerations [split from New Factoring Algorithm]

Quote:
Originally Posted by CRGreathouse View Post


The state-of-the-art algorithm for factoring large numbers (after preprocessing to remove small factors) is the number field sieve. I'm not sure what the best implementation is but something like msieve for GPU poly select, GGNFS for sieving, and maybe msieve for the linear algebra. Most of the time is spent in GGNFS, potentially across many client machines.

Prolly CADO on CPU is competitive with Msieve on CUDA for polyselect. I've heard rumors that its sievers are also superior to GGNFS (not to mention actively developed).

As far as I know, msieve is still king of post processing by a fair margin.
Dubslow is offline   Reply With Quote
Old 2018-04-27, 12:41   #2
VictordeHolland
 
VictordeHolland's Avatar
 
"Victor de Hollander"
Aug 2011
the Netherlands

49816 Posts
Default

Quote:
Originally Posted by Dubslow View Post
Prolly CADO on CPU is competitive with Msieve on CUDA for polyselect. I've heard rumors that its sievers are also superior to GGNFS (not to mention actively developed).

As far as I know, msieve is still king of post processing by a fair margin.
I assume you're referring to EdH factoring a C150 on ~170 cores with CADO-NFS vs. GGNFS+Msieve.
in these two posts:
http://mersenneforum.org/showpost.ph...&postcount=169
http://mersenneforum.org/showpost.ph...&postcount=172

GGNFS+Mieve :
Quote:
170 cores of various flavors were used for the msieve/ggnfs run. A weak dual core laptop failed to run CADO-NFS. This should only be a very minor loss.

The msieve/ggnfs run started poly selection at 10:53:46.
Sieving started at 11:01:27.
Matrix solving began at 19:01:45
LA began at 19:01:46.
Square root started at 00:57:58
The three factors were logged at 01:45:00
CADO-NFS:
Quote:
And, the results are in:
Code:
Info:Polynomial Selection (size optimized): Aggregate statistics:
Info:Polynomial Selection (size optimized): potential collisions: 56795
Info:Polynomial Selection (size optimized): raw lognorm (nr/min/av/max/std): 57461/46.070/54.041/58.840/0.891
Info:Polynomial Selection (size optimized): optimized lognorm (nr/min/av/max/std): 57461/44.100/48.552/54.860/1.446
Info:Polynomial Selection (size optimized): Total time: 91122.7
Info:Polynomial Selection (root optimized): Aggregate statistics:
Info:Polynomial Selection (root optimized): Total time: 8478.5
Info:Polynomial Selection (root optimized): Rootsieve time: 8477.14
Info:Generate Factor Base: Total cpu/real time for makefb: 36.18/8.17465
Info:Generate Free Relations: Total cpu/real time for freerel: 624.23/82.0787
Info:Lattice Sieving: Aggregate statistics:
Info:Lattice Sieving: Total number of relations: 53833827
Info:Lattice Sieving: Average J: 3798.46 for 2342851 special-q, max bucket fill: 0.697496
Info:Lattice Sieving: Total CPU time: 6.52511e+06s
Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for dup1: 165.89/578.793
Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1: 577.6000000000001s
Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for dup2: 1273.34/1043.39
Info:Filtering - Singleton removal: Total cpu/real time for purge: 1078.85/1214.3
Info:Filtering - Merging: Total cpu/real time for merge: 3665.72/3278.51
Info:Filtering - Merging: Total cpu/real time for replay: 230.56/198.062
Info:Linear Algebra: Total cpu/real time for bwc: 945170/0.000172853
Info:Linear Algebra: Aggregate statistics:
Info:Linear Algebra: Krylov: WCT time 78981.07
Info:Linear Algebra: Lingen CPU time 2012.76, WCT time 306.77
Info:Linear Algebra: Mksol: WCT time 42773.62
Info:Quadratic Characters: Total cpu/real time for characters: 205.42/64.8022
Info:Square Root: Total cpu/real time for sqrt: 11575.4/1607.64
Info:HTTP server: Shutting down HTTP server
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 7.58874e+06/171384
Info:root: Cleaning up computation data in /tmp/cado.vzafj00h
35912223503197268109418424875344813700437442706880566137398291217213  22947545427314151445011966017377  584458412373341050558641690854880477452541557046361
It was finished in just less than two days - 47h 36m 24s.
So GGNFS+Mieve vs. CADO-NFS C150
Poly select: 7:39 vs. 9:46 (99,601 cpusec /170 cores=~586 sec realtime)
Sieving: 8:00:28 vs. 10:39:43 (6.52511e+06s cpusec /170 cores =~38,383 sec realtime)
LA: 5:56:13 vs. 33:54:21 (78,981.07 + 306.77 + 42,773.62 = 122,061 sec)
SQR: 47:02 vs. 26:47 (1607.64 sec)

Poly and Sieving seem to be comparable, but what a big difference in the post-sieving times (where MSieve shines). CADO-NFS probably required more relations, so that explains why it sieved 2 hours more. But since this was a C150, GGNFS probably used the 14e siever??? I don't know how the comparison changes with larger composites which use the 15e/16e/16f sievers???
VictordeHolland is offline   Reply With Quote
Old 2018-04-27, 13:32   #3
CRGreathouse
 
CRGreathouse's Avatar
 
Aug 2006

596110 Posts
Default

Quote:
Originally Posted by Dubslow View Post
Prolly CADO on CPU is competitive with Msieve on CUDA for polyselect. I've heard rumors that its sievers are also superior to GGNFS (not to mention actively developed).
How can that be, though, when GGNFS is so far ahead of CADO-NFS? Don't get me wrong, CADO-NFS has made huge strides to catch up, and for small numbers it's essentially at parity. And definitely it has development at a rate far beyond that of GGNFS. But unless I've missed something (and I may have) it's not faster than GGNFS at any size which almost surely means it sieves more slowly (since the bulk of the work is sieving).
CRGreathouse is offline   Reply With Quote
Old 2018-04-27, 13:40   #4
EdH
 
EdH's Avatar
 
"Ed Hall"
Dec 2009
Adirondack Mtns

67218 Posts
Default

polyselect for msieve was hard stopped by "poly_deadline=300" since I was running so many machines. I don't now what msieve would have wanted, but it might have been considerably more time if it had chosen.

14e was the chosen siever:
Code:
Tue Apr 17 11:01:27 2018 -> factmsieve.py (v0.86)
Tue Apr 17 11:01:27 2018 -> This is client 1 of 40
Tue Apr 17 11:01:27 2018 -> Running on 4 Cores with 2 hyper-threads per Core
Tue Apr 17 11:01:27 2018 -> Working with NAME = comp
Tue Apr 17 11:01:27 2018 -> Selected lattice siever: gnfs-lasieve4I14e
Tue Apr 17 11:01:27 2018 -> Creating param file to detect parameter changes...
Tue Apr 17 11:01:27 2018 -> Running setup ...
Tue Apr 17 11:01:27 2018 -> Estimated minimum relations needed: 4.55551e+07
EdH is offline   Reply With Quote
Old 2018-04-27, 15:25   #5
VBCurtis
 
VBCurtis's Avatar
 
"Curtis"
Feb 2005
Riverside, CA

3·5·307 Posts
Default

Quote:
Originally Posted by CRGreathouse View Post
How can that be, though, when GGNFS is so far ahead of CADO-NFS? Don't get me wrong, CADO-NFS has made huge strides to catch up, and for small numbers it's essentially at parity. And definitely it has development at a rate far beyond that of GGNFS. But unless I've missed something (and I may have) it's not faster than GGNFS at any size which almost surely means it sieves more slowly (since the bulk of the work is sieving).
Data comparing sieve time spent for CADO jobs vs msieve jobs show 25-50% longer times, but CADO requires 25-60% more relations for a particular LP choice than msieve. The seconds per relation time for CADO sievers are comparable, and sometimes up to 10% faster, than GGNFS in the 13e to small 14e ranges that I've tested.

CADO development is just about at the point where it may be worth adapting the factmsieve python wrapper to call the CADO siever rather than GGNFS.
VBCurtis is offline   Reply With Quote
Old 2018-04-27, 15:39   #6
CRGreathouse
 
CRGreathouse's Avatar
 
Aug 2006

3·1,987 Posts
Default

Quote:
Originally Posted by VBCurtis View Post
Data comparing sieve time spent for CADO jobs vs msieve jobs show 25-50% longer times, but CADO requires 25-60% more relations for a particular LP choice than msieve.
Very interesting. Why is it that CADO needs more relations?
CRGreathouse is offline   Reply With Quote
Old 2018-04-27, 19:56   #7
VBCurtis
 
VBCurtis's Avatar
 
"Curtis"
Feb 2005
Riverside, CA

3·5·307 Posts
Default

Two reasons, that I can tell:
1. The CADO equivalent of target density is set very high by default.
2. Even when set lower, CADO filtering fails to produce a matrix with a similar number of relations that work for msieve.

I hope that future development addresses (2), as that holds out the possibility that the overall CADO package could catch up to GGNFS/msieve in elapsed time. Obviously, (1) is trivial to adjust and I have done so for the params files c130 and lower that I've submitted to the CADO group. Lowering density from default 170 to 145 or 150 dropped the number of relations needed without adding meaningful time to matrix-solve-time (I noted no increases on average).
VBCurtis is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
New Factoring Algorithm GreasyScooby Factoring 4 2018-04-27 13:48
CADO-NFS and GGNFS sieving ray10may Msieve 1 2017-04-14 14:19
RAID 1 set-up, USB vs. NAS, and other considerations Uncwilly Hardware 15 2016-09-25 13:39
Factoring with GGNFS VolMike Factoring 19 2007-10-22 18:12

All times are UTC. The time now is 22:28.

Sun Jan 17 22:28:27 UTC 2021 up 45 days, 18:39, 0 users, load averages: 1.51, 1.65, 1.68

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.