mersenneforum.org > Factoring Projects > CADO-NFS
2021-09-23, 18:19   #67
bur
I did the matrix building to see how many uniques were required, but yes, it's faster to do that just once and otherwise only use remdups. So I'll do just that, or is there some merit in using msieve instead of remdups?



Would it be helpful to sieve at q > 108M? A range of 1M takes slightly more than 3 hours, so a few M can quickly be added overnight.
2021-09-23, 18:24   #68
VBCurtis
I don't think we can answer that until we see the data from 15 vs 12 vs 10. I'm expecting 50-60% duplicates in 12-15, and worse in 10-12.

If you do wish to take more data, I think Q=8-10M would tell us more about the optimal starting Q than 108+. Let's see what we learn with the data you have, first.
2021-10-07, 16:52   #69
bur
Range     | Total       | Uniques     | Duplicates | Uniques ratio
10-100M   | 214,512,998 | 153,030,781 | 61,482,217 | 71.34%
11-105M   | 219,448,738 | 158,516,513 | 60,932,225 | 72.23%
12-110M   | 224,369,197 | 163,859,554 | 60,509,643 | 73.03%
13-110M   | 220,590,893 | 162,609,603 | 57,981,290 | 73.72%
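The ratio column is just uniques divided by total raw relations; a quick Python check (not from the thread) reproduces the figures above:

```python
# (total raw relations, unique relations) for each sieved range,
# taken from the table above.
ranges = {
    "10-100M": (214_512_998, 153_030_781),
    "11-105M": (219_448_738, 158_516_513),
    "12-110M": (224_369_197, 163_859_554),
    "13-110M": (220_590_893, 162_609_603),
}

for name, (total, uniques) in ranges.items():
    duplicates = total - uniques
    ratio = 100.0 * uniques / total
    print(f"{name}: {duplicates:,} duplicates, {ratio:.2f}% uniques")
```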


I also did several steps in between, but ran into a problem. Is it possible that the dimension parameter for remdups4 strongly influences the number of uniques found? Initially I did the remdup manually with dim close to the required maximum value. For the batch file I just used 650 because it worked for all ranges.

But I found that with dim=350 I got 85,319,847 uniques for 10-50M, while with dim=650 it was only 83,992,074. Was that just a coincidence and did something else go wrong (the total number of relations was the same, though), or does the dimension influence it? I can't really imagine it, but who knows.

What I was planning to do is find the number of uniques for various ranges in 5M or 1M steps. That way it'd be possible to find the ideal q-min that reaches the required 152M uniques within the shortest range. And we'd see if that happens when q-max/q-min = 8.




I don't know if sieving smaller q's makes sense; already at 10M it's much less efficient than at the larger q-mins. Or do you want to see something specific from it?
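The search described above, for each candidate q-min find the smallest q-max that reaches the 152M-unique target, can be sketched in Python. This is a rough outline only: `uniques_for_range` is a made-up placeholder yield model, where the real thing would run remdups4 over the relation files.

```python
TARGET = 152_000_000  # uniques msieve needed to build a matrix

def uniques_for_range(q_min, q_max):
    """Placeholder: in practice this would run remdups4 over the
    relations sieved in [q_min, q_max] and return the unique count.
    Here a crude linear yield model stands in, for illustration only."""
    return int((q_max - q_min) * 1.85)  # ~1.85 uniques per unit of q (invented)

def shortest_range(q_mins, q_max_limit, step=1_000_000):
    """For each q-min, find the smallest q-max reaching TARGET uniques,
    then return the (q_min, q_max) pair with the smallest span."""
    best = None
    for q_min in q_mins:
        for q_max in range(q_min + step, q_max_limit + 1, step):
            if uniques_for_range(q_min, q_max) >= TARGET:
                if best is None or q_max - q_min < best[1] - best[0]:
                    best = (q_min, q_max)
                break
    return best

print(shortest_range(range(5_000_000, 20_000_001, 1_000_000), 125_000_000))
```

With real remdups4 counts plugged in, the returned pair directly answers the "shortest range" question and gives the q-max/q-min ratio to compare against 8.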

Last fiddled with by bur on 2021-10-07 at 17:01
2021-10-07, 17:38   #70
VBCurtis
I believe a too-small dimension setting will let some duplicate relations sneak through; a reasonable price to pay to control memory use on really big problems. For these normal-sized jobs, I set dim to 3000 so that it is not a factor.
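A toy illustration of that effect (Python; this is not remdups4's actual algorithm, just a bounded-memory analogy): if the filter only remembers a limited window of relations, a duplicate that arrives after its twin has been evicted gets counted as unique.

```python
from collections import OrderedDict

def count_uniques(relations, table_size):
    """Bounded-memory dedup: remember at most `table_size` recently seen
    relations, evicting the oldest. Duplicates that reappear after their
    twin was evicted are wrongly passed through as 'unique'."""
    seen = OrderedDict()
    uniques = 0
    for rel in relations:
        if rel in seen:
            continue  # true duplicate caught
        uniques += 1  # counted as unique (possibly wrongly)
        seen[rel] = None
        if len(seen) > table_size:
            seen.popitem(last=False)  # evict oldest entry
    return uniques

# 100 distinct relations, each repeated once far apart in the stream.
stream = list(range(100)) + list(range(100))
print(count_uniques(stream, table_size=1000))  # ample table: 100 uniques
print(count_uniques(stream, table_size=10))    # tiny table: every repeat leaks
```

This is the direction of the dim=350 vs dim=650 puzzle too: a smaller table should, if anything, report more "uniques", never fewer.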

As for Q-min selection, it's a more complicated problem than you think it is. Sec/rel at small Q is typically 50-60% of the time at the ending Q. So, we can tolerate a quite-large duplicate ratio at small Q because the relations are found so quickly. Q=10M might be 75% faster than Q=110M, but yield 45% duplicates vs 15% at high Q. I made up those numbers, but they're typical in the data I gathered. If that's the data, is Q=10-11M worth sieving?

So, if you want to more accurately solve for the best min-Q on this job, you'd need:
- relations per second at Q=10M
- the duplicate rate for 10-11M, found by filtering 10-110M and 11-110M to determine how many uniques are added by sieving 10-11M, then dividing by the total raw relation count for 10-11M
- relations per second at Q=110M
- the duplicate rate for 109-110M or 110-111M
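The duplicate-rate step in that list is just a difference of two filter runs. A sketch with hypothetical counts (the real numbers would come from remdups4):

```python
# Hypothetical unique/raw counts, for illustration only.
uniques_10_110 = 165_000_000   # uniques after filtering Q=10-110M
uniques_11_110 = 162_500_000   # uniques after filtering Q=11-110M
raw_10_11      =   4_600_000   # raw relations sieved in Q=10-11M

added = uniques_10_110 - uniques_11_110   # uniques contributed by 10-11M
dup_rate = 1 - added / raw_10_11          # fraction of 10-11M that was wasted
print(f"10-11M adds {added:,} uniques -> duplicate rate {dup_rate:.0%}")
```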

Even then, you'll have data for just one job, and this data varies from job to job. I suggest that Qmax/Qmin = 8 is "good enough" for our purposes.

Edit: Let's look at your last two lines of data, since they have the same ending Q:
12-13M has 3,778,304 total relations, 1,249,951 unique. That's a duplicate ratio of 67%, higher than I expected.
If you have data for duplicate ratio above 105M, we could then convert the sieve speeds of each range into a "uniques sec/rel" speed and presto! An answer for 12-13 vs 108-110.
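The "uniques sec/rel" conversion suggested here would look something like this in Python. Only the 12-13M duplicate ratio comes from the thread's data; the sec/rel timings below are invented placeholders.

```python
def sec_per_unique(sec_per_rel, dup_ratio):
    """Effective cost of one *unique* relation: raw sieving cost divided
    by the fraction of relations that survive deduplication."""
    return sec_per_rel / (1 - dup_ratio)

# Duplicate ratio for 12-13M from the differenced counts above:
# 1 - 1,249,951 / 3,778,304, roughly 0.67.
dup_12_13 = 1 - 1_249_951 / 3_778_304

# The timings are placeholders, chosen per the "50-60% of ending-Q cost" remark.
low_q  = sec_per_unique(sec_per_rel=0.6, dup_ratio=dup_12_13)
high_q = sec_per_unique(sec_per_rel=1.0, dup_ratio=0.15)
print(f"12-13M:   {low_q:.2f} s per unique relation")
print(f"108-110M: {high_q:.2f} s per unique relation")
# Whichever figure is smaller marks the better range to sieve.
```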

Last fiddled with by VBCurtis on 2021-10-07 at 17:45
2021-10-07, 17:56   #71
charybdis

The way I tested this was to look at the CPU-time stats in the logfile ('stats_total_cpu_time') to find ranges that took almost exactly the same length of time to sieve, and then see which of these ranges produced the most unique relations.
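A sketch of that approach in Python. The exact logfile line format is an assumption here, so the regex may need adjusting to your CADO version's output:

```python
import re

# Assumed line shape, e.g.:  "# stats_total_cpu_time 12345.6"
STAT_RE = re.compile(r"stats_total_cpu_time\s+([0-9.]+)")

def total_cpu_time(logfile):
    """Sum every stats_total_cpu_time entry found in a las logfile."""
    total = 0.0
    with open(logfile) as f:
        for line in f:
            m = STAT_RE.search(line)
            if m:
                total += float(m.group(1))
    return total
```

Run it over the logs of two candidate Q-ranges; the range that produced more unique relations for the same total CPU time is the one worth sieving.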
2021-10-22, 17:17   #72
bur
I finished the uniques-ratio runs for a large set of different q-min/q-max settings. The data is attached as an xlsx file, zipped since for some reason the forum software didn't like the plain file. Explanations are included as comments in the file.


A brief summary:


I sieved a C167 (AL1992:1644) in the q-range of 5M to 125M. Then I used remdups4 to determine the uniques for different q-ranges with q-min of 5M to 20M in 1M steps and q-max of 50M to 125M (step size 1M or 5M).

msieve required about 152M uniques to build a matrix. As expected, depending on the choice of q-min this took a larger or smaller q-range. Minimum q-range to yield 152M uniques was q-min = 16M; q-max = 105M.

What I found interesting is that, at fixed q-min, the uniques ratio at first decreased with increasing q-range, then began to increase again, and after a while decreased once more. It seems to oscillate.

This is just one job, but here the optimal q-min/max ratio was 6.6. It might be interesting to perform a similar series of tests for a different number of similar size to see if it will be that low as well. Anyway, q-min = 16M is very close to your initial choice of 17M, so whatever that was based on, it was a good choice.
Attached Files: Uniques ratio for C167.zip (53.5 KB)
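For the record, the 6.6 figure quoted above is just the endpoint quotient of the shortest successful range:

```python
# Shortest range that reached the 152M-unique target (from the data above).
q_min, q_max = 16_000_000, 105_000_000
ratio = q_max / q_min
print(f"q-max/q-min = {ratio:.2f}")  # 6.56, i.e. about 6.6
```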
2021-10-22, 23:33   #73
charybdis

Quote: Originally Posted by bur
What I found interesting is that, at fixed q-min, the uniques ratio at first decreased with increasing q-range, then began to increase again, and after a while decreased once more. It seems to oscillate.
This will be a consequence of CADO sieving below the FB bound. As soon as Q goes above lim1, the probability that a new relation is a duplicate of a previous one drops: for example, if you're using 2LP, a relation with 2 algebraic large primes and a Q value above lim1 cannot be found with any Q below lim1. So the uniques ratio goes up for a while above lim1.

There are probably some cases where the optimal Q "range" leaves a gap below lim1. This adds yet another variable to the testing, and of course CADO doesn't (yet) let you do this without manual intervention.
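The mechanism can be illustrated with a toy check in Python. This is a strong simplification of CADO's actual cofactorization logic, and the primes are hypothetical: a relation is findable at a special-q dividing its algebraic side only if, after dividing the special-q out, at most two factors above lim1 remain (the 2LP bound). A relation with special-q above lim1 plus two algebraic large primes therefore has three primes above lim1 in total, and no special-q below lim1 can find it.

```python
def found_at(q, alg_primes, lim1, max_large_primes=2):
    """A relation is found at special-q `q` if q divides its algebraic
    side and, after dividing q out, at most `max_large_primes` factors
    above lim1 remain (simplified 2LP cofactorization bound)."""
    if q not in alg_primes:
        return False
    remaining_large = [p for p in alg_primes if p > lim1 and p != q]
    return len(remaining_large) <= max_large_primes

lim1 = 100_000_000  # algebraic factor-base bound (illustrative)

# Hypothetical relation: a special-q above lim1, two more algebraic large
# primes above lim1, and some ordinary factor-base primes below lim1.
alg_primes = [101_000_003, 120_000_007, 130_000_011, 17, 5003]

print(found_at(101_000_003, alg_primes, lim1))  # True: found at Q > lim1
print(found_at(5003, alg_primes, lim1))         # False: 3 large primes remain
```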

Last fiddled with by charybdis on 2021-10-22 at 23:34