![]() |
RSA-240 and RSA-250 Factored!!
[url]https://lists.gforge.inria.fr/pipermail/cado-nfs-discuss/2019-December/001139.html[/url]
|
900 core-years computation time (800 sieve, 100 matrix) on 2.1ghz Xeons gold. They observe this job ran 3x faster than would be expected from an extrapolation from RSA-768, and in fact would have been 25% faster on identical hardware than RSA-768 was.
I'd love a more detailed list of parameters! Perhaps a future CADO release will include them in the c240.params default file. :) For comparison, we ran a C207 Cunningham number 2,2330L in about 60 core-years sieve, which scales really roughly to an estimate of 3840 core-years sieve (6 doublings at 5.5 digits per doubling). The CADO group found a *massive* improvement in sieve speed for large problems! 4 times faster, wowee. Edit: Their job is so fast that RSA-250 is easily within their reach. Which means that C251 from Euclid-Mullen is within reach, theoretically. I mean, imagine if all NFS work over 200 digits is suddenly twice as fast..... |
[QUOTE=VBCurtis;531861]900 core-years computation time (800 sieve, 100 matrix) on 2.1ghz Xeons gold. They observe this job ran 3x faster than would be expected from an extrapolation from RSA-768, and in fact would have been 25% faster on identical hardware than RSA-768 was.
I'd love a more detailed list of parameters! Perhaps a future CADO release will include them in the c240.params default file. :) For comparison, we ran a C207 Cunningham number 2,2330L in about 60 core-years sieve, which scales really roughly to an estimate of 3840 core-years sieve (6 doublings at 5.5 digits per doubling). The CADO group found a *massive* improvement in sieve speed for large problems! 4 times faster, wowee. Edit: Their job is so fast that RSA-250 is easily within their reach. Which means that C251 from Euclid-Mullen is within reach, theoretically. I mean, imagine if all NFS work over 200 digits is suddenly twice as fast.....[/QUOTE] Truly excellent work. And as you say doubling the speed adds 5-6 digits. If going after a ~C250, I think 2,1139+ is a better target. It's been waiting for nearly 60 years to be factored. The Euclid-Mullen cofactor is a relative newcomer. Of course I am biased towards Cunningham numbers. I'd love to hear the implementation/parameterization details that resulted in their terrific speed improvement. |
New parameter file posted for C240
Here is a link to the new C240 parameter file, posted by Paul Zimmermann about 9 hours ago. [URL="https://gforge.inria.fr/scm/browser.php?group_id=2065"]https://gforge.inria.fr/scm/browser.php?group_id=2065 :smile:
[/URL] |
[QUOTE=Robert_JD;531867]Here is a link to the new C240 parameter file, posted by Paul Zimmermann about 9 hours ago. [URL="https://gforge.inria.fr/scm/browser.php?group_id=2065"]https://gforge.inria.fr/scm/browser.php?group_id=2065 :smile:
[/URL][/QUOTE] Nice. But reading the file requires knowledge of some app specific syntax. |
For those curious, but not curious enough to git:
Poly select used some different parameters, notably incr of 110880 (similar to tests Gimarel has run with msieve) and admax (that is, c6 max) of 2e12. CADO-specific params: P 20 million, nq 1296. I bet they'd have better-yet performance with nq of 7776 and admax around 3e11, for the same search time. sizeopt-effort was set to 20; I've not seen this set on any previous params file. sieve params: factor base bounds of 1.8e9 and 2.1e9. LP bounds of 36/37, mfb 72 and 111 (exactly 2x and 3x LP, so 3LP on one side) lambda values specified at 2.0 and 3.0, respectively Q from 800 million up. CADO-specific: ncurves0 of *one hundred*. Typical previously was 25 or so. ncurves1 of 35, typical previous was 15 or so. tasks.A = 32; this is a new setting not in my ~Sep'19 git version of CADO. This "A" setting, combined with much higher ncurves, appears to be where the new speed is found. Matrix: target density 200 (170 was prior standard). This number is about 50 higher than msieve's version of this setting, so this corresponds to target density of ~150. Not that crazy for a C240. I checked the params.c90 file, which is where the CADO team explains the meaning of each setting. No mention of "A". However, there is a new setting: tasks.sieve.adjust_strategy = 0 # this may be 0, 1, 2. 0 is default. They note that 1 or 2 may require more work than 0, but give more relations. I checked a handful of other params files in today's git clone, no other use of "A". EDIT: Aha! tasks.I is not set in the new c240 file; I is the equivalent of siever size (e.g. I=16). tasks.A is set instead. I imagine these are related. |
[QUOTE=VBCurtis;531881]For those curious, but not curious enough to git:
Poly select used some different parameters, notably incr of 110880 (similar to tests Gimarel has run with msieve) and admax (that is, c6 max) of 2e12. CADO-specific params: P 20 million, nq 1296. I bet they'd have better-yet performance with nq of 7776 and admax around 3e11, for the same search time. sizeopt-effort was set to 20; I've not seen this set on any previous params file. sieve params: factor base bounds of 1.8e9 and 2.1e9. LP bounds of 36/37, mfb 72 and 111 (exactly 2x and 3x LP, so 3LP on one side) [/QUOTE] mfb? Bound for cofactors? i.e. try to factor cofactors only if less than these bounds? [quote] lambda values specified at 2.0 and 3.0, respectively [/quote] Sieve thresholds? i.e. 2*log(max element in fbase) respectively. 3x? I presume the rational side is 2 and the algebraic 3? [quote] Q from 800 million up. CADO-specific: ncurves0 of *one hundred*. Typical previously was 25 or so. ncurves1 of 35, typical previous was 15 or so. [/quote] What were B1/B2? I presume 100 curves tried at one B1/B2 then 35 more at a higher B1/B2? [quote] tasks.A = 32; this is a new setting not in my ~Sep'19 git version of CADO. This "A" setting, combined with much higher ncurves, appears to be where the new speed is found. [/quote] It would be nice to know the meaning. Also, what was the sieve area per Q? Was it constant? Or larger for smaller Q? Did they consider trying smaller Q than 800M? How many total Q? What was the average yield per Q? |
[QUOTE=R.D. Silverman;531883]What were B1/B2? I presume 100 curves tried at one B1/B2 then 35 more at a higher
B1/B2?[/QUOTE] 100 curves tried on the rational side (rational = side 0, as you presumed). 35 curves tried on the algebraic side; I believe the logic is that many of the cofactors won't split into 3 factors, so one doesn't try as hard to split the 3LP side. I believe the ECM bounds increase each trial, but the details are not documented anywhere I've seen; perhaps one would need to review the code (or ask the mailing list?) to discover these details. mfb has the same meaning as it does for GGNFS jobs. |
[QUOTE=VBCurtis;531885]
mfb has the same meaning as it does for GGNFS jobs.[/QUOTE] The reply assumes that one knows what it means for GGNFS...... May I ask for a mathematical definition? |
[QUOTE=R.D. Silverman;531886]The reply assumes that one knows what it means for GGNFS......
May I ask for a mathematical definition?[/QUOTE] It would also be nice to know how much RAM was required per thread (or process) |
Sure, but I must leave it to someone more well-versed in NFS to reply.
I believe it's the size in bits of cofactors to be fed to ECM to be split. I believe CADO uses lambda to also control the size of cofactors, as lambda * LP size. This provides finer-grained control over cofactor size; however, I am not sure about this. I tried today's git on a small job with tasks.I = 12 replaced with tasks.A = 24 (or 32), and got an error that tasks.I was not specified. This means the params.c240 file posted today by PaulZ would not actually run, I think. I was hoping/speculating that tasks.A was a measure of sieve area, and that CADO might now vary the dimensions of the area intelligently. I = 16 corresponds to 2^16 by 2^15 for the sieve region, akin to 16e GGNFS. Alas, a mystery we await an answer for! |
When calling the siever directly 2^A is the sieve region. A=31 defaults to I=16 A=29 to I=15 etc. A=2*I-1
I think that A=32 will be 2^16 by 2^16. It is twice the region of A=31 in any case. A=32 is more manageable than I=17 memory wise so it is an option for low q sieving to get more yield. sieve.adjust_strategy is different strategies for selection of I and J in 2^I by 2^J given A=I+J. It is described in las -h [QUOTE] -adjust-strategy strategy used to adapt the sieving range to the q-lattice basis (0 = logI constant, J so that boundary is capped; 1 = logI constant, (a,b) plane norm capped; 2 = logI dynamic, skewed basis; 3 = combine 2 and then 0) ; default=0[/QUOTE] 3 can be better sometimes but can use more memory. It is a potentially useful option for someone with a low yield or if you want to focus sieving on less qs. I would suggest some experimentation with this may be worthwhile. It may speed up some sizes for some q(which q might be an unanswered research question) Is it getting to the point where NFS@Home should be looking at switching to the CADO siever? |
This post to the CADO-NFS list seems very slightly relevant.
[quote]It is persisting. ./cado-nfs.py <309 digit RSA number> -t 60 I cannot provide the exact number per my work agreement. Best Regards, -- Justin Granger [/quote] Someone trying polynomial searching on a kilobit composite. |
[QUOTE=R.D. Silverman;531875]Nice. But reading the file requires knowledge of some app specific syntax.[/QUOTE]
Knowledge not required, just a dose of enough common sense logic that others in this thread would inevitably grasp :hello: |
[QUOTE=VBCurtis;531861]900 core-years computation time (800 sieve, 100 matrix) on 2.1ghz Xeons gold. They observe this job ran 3x faster than would be expected from an extrapolation from RSA-768, and in fact would have been 25% faster on identical hardware than RSA-768 was.
I'd love a more detailed list of parameters! Perhaps a future CADO release will include them in the c240.params default file. :) For comparison, we ran a C207 Cunningham number 2,2330L in about 60 core-years sieve, which scales really roughly to an estimate of 3840 core-years sieve (6 doublings at 5.5 digits per doubling). The CADO group found a *massive* improvement in sieve speed for large problems! 4 times faster, wowee. Edit: Their job is so fast that RSA-250 is easily within their reach. Which means that C251 from Euclid-Mullen is within reach, theoretically. I mean, imagine if all NFS work over 200 digits is suddenly twice as fast.....[/QUOTE] [QUOTE]...imagine if all NFS work over 200 digits is suddenly twice as fast.....[/QUOTE] I certainly wouldn't mind re-factoring RSA-200 again, which took a little less than 7 months - IF utilizing similar parameter upgrades would possibly double the speed. :devil: |
The link just went down. I'm guessing it's due to high traffic.
|
A copy of the announcement has been saved in the Internet Archive: [url]http://web.archive.org/web/20191203150058/https://lists.gforge.inria.fr/pipermail/cado-nfs-discuss/2019-December/001139.html[/url]
|
Some additional details about the RSA-240 factorization, as well as the discrete log done at the same time can be found at:
[url]https://eprint.iacr.org/2020/697[/url] |
[QUOTE=Branger;553115]Some additional details about the RSA-240 factorization, as well as the discrete log done at the same time can be found at:
[url]https://eprint.iacr.org/2020/697[/url][/QUOTE] This paper also exhibits the factors of RSA-250!! Parameters were nearly the same as for RSA-240, except for increasing sieve region from A=32 to A=33 (a doubling of sieve area, equivalent to using a mythical 17e on GGNFS). Still 2LP on one side, 3 on the other. Lim's were 2^31. Only 8.7G raw relations were needed, 6.1G unique!! They cite 2450 Xeon-Gold-2.1Ghz core-years sieving, 250 core-years matrix for 405M matrix size. |
[QUOTE=VBCurtis;553137]Parameters were nearly the same as for RSA-240, except for increasing sieve region from A=32 to A=33 (a doubling of sieve area, equivalent to using a mythical 17e on GGNFS).[/QUOTE]
Tried this out - a single instance of las with their parameters uses 38GB of memory :max: |
That makes you think how big is the LA machine. Only a few people around here can accommodate a 40M matrix let alone a 405M matrix!!
|
Same way Greg does- the supercomputing grids used by the CADO team for these factorizations can handle jobs such as a matrix distributed over many nodes. The paper includes a summary of the number of nodes used for each step of the RSA-240 matrix.
I'm not aware of filtering being split over multiple nodes, so that is the part that needs the largest-memory machine, and that likely fit in 256GB (perhaps 384). |
The filtering does take a ton of memory and a previous CADO paper showed how to split the merge phase across many threads of a single machine; per the report the filtering machine had 1.5TB of memory.
The other high memory step is the middle part of block Wiedemann, and the more clusters you split the LA across the more memory that middle part uses. The square root does not need high memory if you use Montgomery's very complex square root algorithm, or you can throw memory at the problem and use the much simpler algorithm that CADO-NFS does. |
Assuming that it isn't for huge time periods, 1.5TB isn't unthinkable for a university to have access to in one machine. I had access to 1TB at once for a while based off Ivy Bridge hardware. It will be easier/cheaper now than that.
|
| All times are UTC. The time now is 20:18. |
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.