mersenneforum.org Filtering

 2011-07-11, 07:05 #1 Sleepy   May 2011 23 Posts Filtering Hi, I have a question. In the filtering step, is the maximum first bound 2^32, since the buckets cover [0, 2^32] and uint32 is used for filtmin_a and filtmin_r? If so, isn't that limit a little low when larger numbers are used? Or did I misunderstand the code? Thanks!
 2011-07-11, 10:27 #2 fivemack (loop (#_fork))     Feb 2006 Cambridge, England 3³·239 Posts The filtmin parameter serves to divide ideals into 'small' and 'large'; these roughly correspond to the small- and large-prime bounds in sieving, though it's not an exact match. Even for RSA768 the sieving was done with 'small'=2^30. So this particular limit is not a serious restriction at present.
 2011-07-11, 10:50 #3 jasonp Tribal Bullet     Oct 2004 5×709 Posts This is one of several limitations in the filtering that prevent it from being realistically used for extremely large problems. A more pressing limitation is that relation numbers max out at 2^32, so a factorization must have fewer than 4 billion relations. We're already 25% of the way to that limit for the largest Msieve jobs, but getting further will require an update to the sieving tools that people use.
 2011-07-11, 12:23 #4 Sleepy   May 2011 23 Posts I see.. Other than the filtering bound and the limit on the number of relations, are there other limitations that prevent the filtering step from being realistically used for extremely large problems?
 2011-07-11, 17:17 #5 R.D. Silverman     Nov 2003 2²·5·373 Posts Quote: Originally Posted by Sleepy I see.. Other than the filtering bound and the limit on the number of relations, are there other limitations that prevent the filtering step from being realistically used for extremely large problems? Memory requirements; one needs a big machine.

 2011-07-11, 19:11 #6 firejuggler     "Vincent" Apr 2010 Over the rainbow 2×3×11×43 Posts According to this, M1061 will require 1.5 to 2 billion raw relations, and it is a C320. Does this mean that we are limited to C325-C330?
 2011-07-11, 19:57 #7 jasonp Tribal Bullet     Oct 2004 5×709 Posts Besides the limit on the number of relations, there are only a few things stopping, say, filtering on RSA768-size numbers, i.e. many billions of relations:
- clique removal needs to be possible from disk files
- duplicate and singleton removal needs to be MPI-enabled, otherwise the size of internal hashtables will exceed the memory of the machine
- in-memory clique removal has an internal 16GB limit on the size of an intermediate structure
With all of these done, and a large-memory machine for the merge phase, we'd be good to go. There's a parallel problem: the sieving tools would need to be comfortable with large primes > 2^33.
 2011-07-12, 14:34 #8 xilman Bamboozled!     May 2003 Down not across 2²×3×941 Posts

Quote:
 Originally Posted by jasonp Besides the limit on the number of relations, there are only a few things stopping, say, filtering on RSA768-size numbers, i.e. many billions of relations: ... - duplicate and singleton removal needs to be MPI-enabled, otherwise the size of internal hashtables will exceed the memory of the machine
It's not clear to me why MPI is needed.

Back in the days of NFSNET we (Don Leclair and I, IIRC) wrote duplicate and singleton removal software for a single-threaded, single-process environment. It wasn't perhaps as fast as it would have been had more memory been available, but it worked well.

Perhaps I should dig up the remdup.c software and then take this discussion off-line.

Paul

Last fiddled with by xilman on 2011-07-12 at 14:34

 2011-07-12, 17:51 #9 jasonp Tribal Bullet     Oct 2004 5×709 Posts MPI isn't strictly needed, but when there's a terabyte of relations, you have to account somehow for intermediate lists of things (primes, ideals, relation numbers or coordinates, etc.) exceeding the memory of the filtering machine. That becomes much more manageable given a cluster of average-size machines all pulling information out of the same dataset and boiling it down by themselves, with a global pass at the end to reconcile all the results. For singleton removal, you at least need a hashtable that maps primes or ideals to a unique number, so that relations can be assigned an ideal list. Right now the size of that hashtable is the limiting factor in the memory use of the filtering. (Yes, I know there are loads of space-efficient hashtable implementations out there.)
 2011-07-13, 06:00 #10 Sleepy   May 2011 23 Posts I see.. okie! So upgrading the filtering step should not be that complex, but because of the current sieving limitations there is not much incentive to cater for factorizations of larger numbers (>768) yet, right? Sorry, one more question: I am stuck on understanding how the first bound is chosen. What is the role of bin_max, bin_min and hits_per_prime in the duplicate.c source code? Anyway, thanks for all the help you guys provided! Really appreciate it! =)
 2011-07-13, 11:21 #11 jasonp Tribal Bullet     Oct 2004 110111011001₂ Posts

Quote:
 Originally Posted by Sleepy What is the role of bin_max, bin_min and hits_per_prime in the duplicate.c source code?
You want to choose the filtering bound as the point where large primes start to occur rarely enough that it would be useful for the filtering to see them. We've found by experiment that this point occurs when primes start appearing in fewer than ~20 relations; filtering can deal with a higher frequency than that, but most of the singletons are visible at the 20-per-prime mark. So we need to find the point where the number of times a prime appears is less than 40 (~20 for the rational side and ~20 for the algebraic side).

We have a histogram of the number of times a prime occurs, which was computed essentially for free when the first duplicate removal pass ran. Every prime < 2^32 encountered in every relation (including the duplicates :)) increments the count in one 128K-size bucket. Then the average number of times a prime appears in bucket i is approximately

(count of hits in bucket i) / (number of primes in bucket i)

and the desired filtering bound is the middle of the first bucket (working from 2^32 backwards) where this number exceeds 40.0.

We don't need to count the exact number of primes in bucket i, instead we compute

(number of primes less than (start of bucket i+1)) -
(number of primes less than (start of bucket i))

using the approximation that the number of primes less than x is about x / log(x). These are what bin_min and bin_max are used for, and hits_per_prime is the computed average.

If you haven't seen it already, read this for more detail on Msieve's NFS filtering. Most of it is still accurate today.

Last fiddled with by jasonp on 2011-07-13 at 11:29

