Using Several Instances of Aliqueit for a large gnfs job
 2011-12-12, 22:52 #2 bsquared     "Ben" Feb 2007 329810 Posts If the relations were "no good" then you would have either had scads of error messages during filtering or perhaps a much higher than usual share of duplicate relations (30% or more, say). If you saw neither of these situations, then the relations were probably fine and the 113% is just due to a low initial guess or something. As long as you are doing things manually on linux anyway, that doesn't sound any easier than just working directly with gnfs-lasieve*. Although I suppose it avoids figuring out the starting Q and min rels figures.
 Originally Posted by EdH ... or if this is just due to an underestimate in the 64-bit machine's factmsieve.py script?
"That's a bingo. Is that the way you say it?"

 2011-12-13, 03:12 #4 EdH     "Ed Hall" Dec 2009 Adirondack Mtns 31×109 Posts Thanks Guys, I saw no error messages and these are the duplicate removals:
Code:
Mon Dec 12 06:30:50 2011 found 3884965 hash collisions in 23001181 relations
Mon Dec 12 06:31:29 2011 added 36 free relations
Mon Dec 12 06:31:29 2011 commencing duplicate removal, pass 2
Mon Dec 12 06:33:09 2011 found 3653999 duplicates and 19347218 unique
I guess that's about 16% duplication? I'm going to evaluate the ease of both methods and "maybe" write some steps to take to add machines. That way I'll know where to remind myself how and maybe someone else can find it useful.
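For anyone who wants to check the 16% figure, here is a minimal sketch that pulls the duplicate count out of the log text above and divides it by the total. The function name duplicate_rate is made up for illustration, and the regular expression assumes the summary line is worded exactly as in the log quoted above.

```python
import re

# Log lines copied from the post above.
LOG = """\
Mon Dec 12 06:30:50 2011 found 3884965 hash collisions in 23001181 relations
Mon Dec 12 06:31:29 2011 added 36 free relations
Mon Dec 12 06:31:29 2011 commencing duplicate removal, pass 2
Mon Dec 12 06:33:09 2011 found 3653999 duplicates and 19347218 unique
"""


def duplicate_rate(log_text):
    """Return duplicates / (duplicates + unique) from the summary line.

    Hypothetical helper: assumes the line reads
    'found N duplicates and M unique' as in the log above.
    """
    m = re.search(r"found (\d+) duplicates and (\d+) unique", log_text)
    if m is None:
        raise ValueError("no duplicate-removal summary line found")
    dups, unique = int(m.group(1)), int(m.group(2))
    return dups / (dups + unique)


rate = duplicate_rate(LOG)
print(f"duplicate rate: {rate:.1%}")  # prints: duplicate rate: 15.9%
```

That is well under the 30% or so that bsquared gave as a warning sign.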
 2011-12-13, 03:33 #5 Batalov     "Serge" Mar 2008 Phi(4,2^7658614+1)/2 23×7×163 Posts Seriously speaking, there's also a possibility that you evaluated the necessary Q-range on the assumption that the relation yield is a constant. But it isn't, and it is not easy to guesstimate it. Generally, it goes down as Q goes up, but the question is - how fast. One way (frequently used before launching large projects) is a dense set of spot-checking runs (with many starting Qs and a span of 2000 or 1000), followed by a spline fit (or better yet, with normalization by the number of reported special_q's), and a guesstimate from experience with similar runs of what the redundancy is going to be.
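The spot-check idea can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain linear interpolation between spot checks instead of a spline, it ignores redundancy (duplicates), and the sample yields are hypothetical placeholders, not measurements from any real run. The names yield_at, estimate_relations and q_end_for_target are made up for this sketch.

```python
def yield_at(samples, q):
    """Linearly interpolate the yield (relations per unit of q) at q.

    samples: list of (q, yield) pairs from spot checks, e.g. relations
    found in a short span divided by the span width.
    Outside the sampled range the nearest sample is used.
    """
    pts = sorted(samples)
    if q <= pts[0][0]:
        return pts[0][1]
    if q >= pts[-1][0]:
        return pts[-1][1]
    for (q0, y0), (q1, y1) in zip(pts, pts[1:]):
        if q0 <= q <= q1:
            return y0 + (y1 - y0) * (q - q0) / (q1 - q0)


def estimate_relations(samples, q_start, q_end, step=10_000):
    """Integrate the interpolated yield over [q_start, q_end] (trapezoid rule)."""
    total = 0.0
    q = q_start
    while q < q_end:
        nxt = min(q + step, q_end)
        total += 0.5 * (yield_at(samples, q) + yield_at(samples, nxt)) * (nxt - q)
        q = nxt
    return total


def q_end_for_target(samples, q_start, target, step=10_000, q_limit=10**10):
    """Walk q upward until the estimated raw relation count reaches target."""
    total = 0.0
    q = q_start
    while total < target:
        if q >= q_limit:
            raise ValueError("target not reached below q_limit")
        nxt = q + step
        total += 0.5 * (yield_at(samples, q) + yield_at(samples, nxt)) * step
        q = nxt
    return q


# HYPOTHETICAL spot checks: (starting q, relations per unit of q).
spot_checks = [(20_000_000, 1.20), (30_000_000, 1.05),
               (40_000_000, 0.90), (50_000_000, 0.80)]

print(estimate_relations(spot_checks, 20_000_000, 40_000_000))
print(q_end_for_target(spot_checks, 20_000_000, 23_000_000))
```

A constant-yield guess would use the first sample everywhere; the integrated estimate comes out lower because the yield falls as Q rises, which is the effect described above.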
 Originally Posted by Batalov Seriously speaking, there's also a possibility that you evaluated the necessary Q-range on the assumption that the relation yield is a constant. But it isn't, and it is not easy to guesstimate it. Generally, it goes down as Q goes up, but the question is - how fast. One way (frequently used before launching large projects) is a dense set of spot-checking runs (with many starting Qs and a span of 2000 or 1000), followed by a spline fit (or better yet, with normalization by the number of reported special_q's), and a guesstimate from experience with similar runs of what the redundancy is going to be.
I actually follow all this, but haven't the experience to make use of it. I therefore made use of the following logic:

For example, let's say q is going up by 1M each time and relations are growing at a rate of 5% for each 1M. And, it started at 20M. 100% (in a perfect world) would place the top at 40M. So, let's start machine 2 at 40M. I'm hoping that the relations turned up by machine 2 will offset the 40M top of machine 1 downward more so than the diminishing relations will affect the overall count. The trickier part is figuring out the starting points for machines 3, 4, 5, etc. I don't want any overlap there either, but the further away from the machine 1 range, the less return.
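The arithmetic in that example can be written out as a small sketch. It assumes the constant-yield "perfect world" described above, and the function names projected_top and split_range are hypothetical. The split into equal contiguous blocks is just one simple way to guarantee no overlap between machines; it is not a claim about how aliqueit or factmsieve.py divide the work.

```python
def projected_top(q_start, pct_per_step, q_step):
    """Project where q ends if every q_step adds pct_per_step percent.

    Constant-yield assumption: 100 / pct_per_step steps are needed.
    """
    steps_needed = 100 / pct_per_step
    return q_start + round(steps_needed * q_step)


def split_range(q_start, q_end, machines):
    """Cut [q_start, q_end) into contiguous, non-overlapping blocks.

    The last block absorbs any remainder from the integer division.
    """
    width = (q_end - q_start) // machines
    ranges = []
    for i in range(machines):
        lo = q_start + i * width
        hi = q_end if i == machines - 1 else lo + width
        ranges.append((lo, hi))
    return ranges


top = projected_top(20_000_000, 5, 1_000_000)
print(top)  # 40000000, matching the 40M in the example

# Machine 1 takes the projected range; extra machines start at the top,
# so their ranges cannot overlap machine 1 or each other.
for lo, hi in split_range(top, top + 15_000_000, 3):
    print(lo, hi)
```

Because the yield drops as q rises, the blocks furthest from machine 1's range return the least, so the 15M of extra range used here is only a placeholder.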

 2011-12-13, 18:58 #7 bchaffin   Sep 2010 Portland, OR 7×53 Posts Note that factmsieve.py already has some support for running multiple threads on each of multiple machines. It doesn't hook directly into aliqueit, but, as bsquared said, if you're doing a bunch of manual work anyway, it may be simpler just to throw the number to factmsieve and then pass the answer back to aliqueit.

