mersenneforum.org  

Old 2019-07-26, 19:19   #12
bsquared

Quote:
Originally Posted by jasonp
How do you handle the filtering for datasets that include TLP relations?
It does a full merge using the greedy combining process described in Section 4 of Dodson and Lenstra's paper.

The resulting matrix for the C135 wasn't very large, but it was pretty dense. Some of that can be improved by rejecting cycles that are huge, which I didn't do. Even so, single-thread Lanczos only took an hour or so to run on the dense matrix.
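For anyone curious what the combining boils down to: the number of independent cycles among partial relations can be tracked with a union-find structure over the large primes, using the standard counting identity (cycles = relations + components - distinct primes). The sketch below is a generic illustration of that identity, not yafu's actual code, and the toy relations are made up:

```python
# Sketch: counting cycles among partial relations with union-find.
# Each relation's large primes form a hyperedge; the number of
# independent cycles is  E + C - V  (relations + components - primes).

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:      # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def count_cycles(relations):
    """relations: iterable of tuples of large primes (1-3 primes each)."""
    uf = UnionFind()
    nrels = 0
    for primes in relations:
        nrels += 1
        uf.find(primes[0])                 # register singleton primes too
        for p in primes[1:]:
            uf.union(primes[0], p)
    nprimes = len(uf.parent)
    ncomponents = len({uf.find(p) for p in list(uf.parent)})
    return nrels + ncomponents - nprimes
```

Filtering then extracts those cycles and merges relations greedily so that every large prime appears an even number of times in each combined row.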
Old 2019-07-26, 19:25   #13
jasonp

Ok, I was asking because if you want to tune that part of the process, you can convert relations into the intermediate format that Msieve's NFS filtering uses and then run all the filtering as if it were an NFS job. I can almost guarantee the resulting matrix would wind up much better, because you can leverage the algorithm research that goes into NFS filtering.

Of course that takes a backseat to tuning the sieving.
Old 2019-07-26, 19:30   #14
xilman

Quote:
Originally Posted by bsquared
More concerning to me is the fact that the TLP doesn't seem any better than DLP at the moment. It's very possible that I'm still doing something wrong, some bug or oversight that is limiting the TLP speed, or maybe it is the parameters.
I suspect the latter. Before the C135 computation I'd been running my TMPQS implementation for a year or so on perhaps 20 different numbers ranging from C85 to C110 (these may not be accurate figures, because I don't have access to the contemporaneous hand-written records, which are still in the UK, but they are in the right ballpark), tweaking the parameters to minimize sieving time. Our implementations are so different that the optimal parameter sets should also be expected to differ.

Back in the day there were still many composites of interest that were best done by QS, not least because GNFS implementations were not easily available and, even if you could get hold of one, it was bleeding-edge code that needed skill and patience to nurse through a factorization. Sic transit gloria mundi.
Old 2019-07-26, 19:36   #15
bsquared

Quote:
Originally Posted by xilman
I'm sure Ben is aware of this but others may not be: in our paper we stated that GNFS was a better option than TMPQS for numbers of this size. Our experiment was purely to investigate the cycle explosion behaviour and to characterize the phase transition when it happened.
Yes, indeed.

An NFS run on the same number and the same computer took about 8 hours for polynomial selection and sieving. The matrix was larger, of course, but used Jason's multi-threaded and better-optimized Lanczos. The matrix + sqrt steps took about another hour (on a different machine, to which those tasks are better suited). In round numbers, QS at 75 hours versus 9 hours for NFS ran about 8 times slower. The NFS in question is yafu's automated msieve+ggnfs version.
Old 2019-07-26, 19:39   #16
xilman

Quote:
Originally Posted by Till
Sorry, but taking Moore's law into account your result does not look too impressive.

Original computation time (2001): 231 days ~= 5544 hours.
New Computation time (2019): 75 hours


Assuming that computing speed doubles every 2 years we have
Original computation time/2^9 = 10.8 hours
APPLE .NE. ORANGE

The hardware architectures are radically different. Back in the day, memory could almost keep up with the CPU; that hasn't been true for a long time now. Our computation ran for a large part on MIPS and SPARC processors. That which did use x86 ran on 486 or early (inevitably 32-bit) Pentium CPUs.

You also appear to be comparing elapsed times. A fairer comparison would be to use CPU-core time.

Old 2019-07-26, 19:42   #17
bsquared

Quote:
Originally Posted by jasonp
Ok, I was asking because if you want to tune that part of the process, you can convert relations into the intermediate format that Msieve's NFS filtering uses and then run all the filtering as if it were an NFS job. I can almost guarantee the resulting matrix would wind up much better, because you can leverage the algorithm research that goes into NFS filtering.

Of course that takes a backseat to tuning the sieving.
I wanted to roll my own at first because I wanted to understand more about how many-large-prime merging works. I'm happy I was able to understand it enough to get something up and running that works fairly well; this has been a nagging open question/task for me for probably close to a decade, and I can finally check it off. Now, however, yes... using the better techniques that you and others have extensively developed is on the to-do list.
Old 2019-07-26, 19:52   #18
jasonp

I've been remiss in not adding my congrats: getting end-to-end QS with three large primes working is a major, major undertaking.
Old 2019-07-26, 21:00   #19
xilman

Quote:
Originally Posted by jasonp
I've been remiss in not adding my congrats: getting end-to-end QS with three large primes working is a major, major undertaking.
I'll take that as a compliment to Arjen and myself.

Old 2019-07-27, 15:57   #20
chris2be8

Quote:
Originally Posted by Till
Sorry, but taking Moore's law into account your result does not look too impressive.

Original computation time (2001): 231 days ~= 5544 hours.
New Computation time (2019): 75 hours


Assuming that computing speed doubles every 2 years we have
Original computation time/2^9 = 10.8 hours
Hardware speed doubled every 2 years up to about 2005. Then CPUs hit the thermal limit, so clock rates haven't increased much since then, although instructions per clock cycle and the number of cores have increased. So compare CPU core-hours * clock rate for each calculation for a fairer comparison.
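To illustrate that metric (the machine figures below are hypothetical, chosen only to show the normalization; they are not the actual 2001 or 2019 numbers):

```python
# Hypothetical normalization of effort by core-hours * clock rate.
# Neither row is a real figure from either computation.

def normalized_effort(cores, hours, ghz):
    """Crude work metric: core-hours scaled by clock rate (GHz-core-hours)."""
    return cores * hours * ghz

old = normalized_effort(cores=16, hours=5544, ghz=0.2)  # 1990s-era machines
new = normalized_effort(cores=64, hours=75, ghz=1.5)    # modern many-core CPU

print(old, new, old / new)
```

Even this ignores instructions per clock and memory bandwidth, so it is only a first-order correction.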

Chris
Old 2019-07-27, 17:45   #21
Till

Quote:
Originally Posted by xilman
APPLE .NE. ORANGE

The hardware architectures are radically different. Back in the day, memory could almost keep up with the CPU; that hasn't been true for a long time now. Our computation ran for a large part on MIPS and SPARC processors. That which did use x86 ran on 486 or early (inevitably 32-bit) Pentium CPUs.

You also appear to be comparing elapsed times. A fairer comparison would be to use CPU-core time.
Quote:
Originally Posted by chris2be8
Hardware speed doubled every 2 years up to about 2005. Then CPUs hit the thermal limit, so clock rates haven't increased much since then, although instructions per clock cycle and the number of cores have increased. So compare CPU core-hours * clock rate for each calculation for a fairer comparison.

Chris
You are both certainly right, but I only did a quick back-of-the-envelope calculation. No serious criticism intended, by the way.

That said, I might be able to deduce the parameters of Ben's hardware from other threads, but https://infoscience.epfl.ch/record/1...es/NPDF-28.pdf does not seem to contain the necessary information.

Anyway, whoever wants more accurate numbers is free to compute them :-)
Old 2019-07-29, 19:10   #22
bsquared

Quote:
Originally Posted by Till
Actually I am impressed too (despite my calculation above).

Do you know Thorsten Kleinjung's SIQS variant?
http://www.ams.org/journals/mcom/201...-2015-03058-0/

He seems to have made an algorithmic improvement to SIQS.

There are some numbers for inputs of up to 130 digits (measured in core-years).
I'm trying to weigh whether or not to try out the ideas in the paper.

The paper has timings for the main computational steps of the class group computations for imaginary quadratic fields with d-digit discriminants. They mention that the sieving step of this computation is the same as that used during SIQS integer factorization, but I don't know much more than that about class group computations. Anyway, the table reports a time of 2.23 core-years spent in the sieving step for a C130 input. Their core is a 1.9 GHz Opteron core.

I used a 7210 KNL processor that has 256 virtual cores (64 physical cores, each with 4 hyperthreads) running at 1.5 GHz.

The C135 sieving took 74 wall-clock hours * 256 cores = 18,944 core-hours ~ 2.16 core-years. I'm not sure how to handle the hyperthreads, though: if you count only physical cores (performance certainly does not scale linearly past 64 cores on this processor), it spent 0.54 core-years. Call it double that, 1.08 core-years, to account for the diminishing return of the hyperthreads.

Scaling the ratio of core-years by the ratio of clock frequencies and by the ratio of sieving effort for the two input sizes, I get:
2.23 / 1.08 * 1.9 / 1.5 * exp(sqrt(ln C135 * ln ln C135)) / exp(sqrt(ln C130 * ln ln C130))
which, if I did that right, means about 6.6 times the effort for equivalent job sizes.
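A quick numerical check of the pieces of that estimate, using 10^135 and 10^130 as stand-ins for the actual inputs (a sketch only, not tied to either implementation):

```python
import math

def L(digits):
    """QS effort proxy exp(sqrt(ln n * ln ln n)) for an n = 10^digits input."""
    ln_n = digits * math.log(10)
    return math.exp(math.sqrt(ln_n * math.log(ln_n)))

# 74 wall-clock hours on 64 physical cores, doubled for the hyperthreads.
core_years = 2 * 64 * 74 / (24 * 365)

size_factor = L(135) / L(130)   # extra sieving work for a C135 vs. a C130

print(f"core-years ~ {core_years:.2f}, size factor ~ {size_factor:.2f}")
```

The remaining factors (their 2.23 core-years and the 1.9/1.5 clock ratio) then scale this into the overall comparison.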

This still isn't right, because the sievers are probably memory-bound, not CPU-bound, and the two implementations use very different instruction sets (and AVX-512 could perhaps also accelerate their ideas). But it's enough to tell me that it might not be worth the effort to implement it.

Even so, I do need to read more and try to understand it better.