mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Msieve

Reply
 
Thread Tools
Old 2011-07-11, 07:05   #1
Sleepy
 
May 2011

23 Posts
Default Filtering

Hi,

I have a question to ask. Based on the filtering step, is the 1st maximum bound 2^32 since the bucket is [0,2^32] and uint32 is used for filtmin_a and filtmin_r?
If so, isn't the bound limit a little low if larger numbers are used? Or did i misunderstood the code. Thanks!
Sleepy is offline   Reply With Quote
Old 2011-07-11, 10:27   #2
fivemack
(loop (#_fork))
 
fivemack's Avatar
 
Feb 2006
Cambridge, England

2·7·461 Posts
Default

The filtmin parameter serves to divide ideals into 'small' and 'large'; these roughly correspond to the small- and large-prime bounds in sieving, though it's not an exact match. Even for RSA768 the sieving was done with 'small'=2^30. So this particular limit is not a serious restriction at present.
fivemack is offline   Reply With Quote
Old 2011-07-11, 10:50   #3
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

5×709 Posts
Default

This is one of several limitations in the filtering that prevents it being realistically used for extremely large problems. A more pressing limitation is that relation numbers max out at 2^32 so that a factorization must have less than 4 billion relations. We're already 25% of the way to that limit for the largest Msieve jobs, but getting further will require an update to the sieving tools that people use.
jasonp is offline   Reply With Quote
Old 2011-07-11, 12:23   #4
Sleepy
 
May 2011

101112 Posts
Default

I see.. Other than the filtering bound and number of relations limitations, are there other limitations that prevents the filtering step from being realistically used for extremely large problems?
Sleepy is offline   Reply With Quote
Old 2011-07-11, 17:17   #5
R.D. Silverman
 
R.D. Silverman's Avatar
 
"Bob Silverman"
Nov 2003
North of Boston

23×3×311 Posts
Default

Quote:
Originally Posted by Sleepy View Post
I see.. Other than the filtering bound and number of relations limitations, are there other limitations that prevents the filtering step from being realistically used for extremely large problems?
Memory requirements; One needs a big machine.
R.D. Silverman is offline   Reply With Quote
Old 2011-07-11, 19:11   #6
firejuggler
 
firejuggler's Avatar
 
"Vincent"
Apr 2010
Over the rainbow

5·569 Posts
Default

according to this M1061 will require 1.5 to 2 billion raw relation, and it is a C320. does this mean that we are limited to C325-C330?
firejuggler is offline   Reply With Quote
Old 2011-07-11, 19:57   #7
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

5×709 Posts
Default

Besides the limit on the number of relations, there are only a few things stopping, say, filtering on RSA768-size numbers, i.e. many billions of relations:

- clique removal needs to be possible from disk files
- duplicate and singleton removal needs to be MPI-enabled, otherwise the size of internal hashtables will exceed the memory of the machine
- in-memory clique removal has an internal 16GB limit on the size of an intermediate structure

With all of these done and a large-memory machine for the merge phase, we'd be good to go. There's a parallel problem that the sieving tools would need to be comfortable with large primes > 2^33
jasonp is offline   Reply With Quote
Old 2011-07-12, 14:34   #8
xilman
Bamboozled!
 
xilman's Avatar
 
"๐’‰บ๐’ŒŒ๐’‡ท๐’†ท๐’€ญ"
May 2003
Down not across

11·1,039 Posts
Default

Quote:
Originally Posted by jasonp View Post
Besides the limit on the number of relations, there are only a few things stopping, say, filtering on RSA768-size numbers, i.e. many billions of relations:
...

- duplicate and singleton removal needs to be MPI-enabled, otherwise the size of internal hashtables will exceed the memory of the machine
It's not clear to me why MPI is needed.

Back in the days of NFSNET we (I and Don Leclair, IIRC) wrote duplicate and singleton removal software for a single threaded single process environment. It wasn't perhaps as fast as it would have been had more memory been available but it worked well.

Perhaps I should dig up the remdup.c software and then take this discussion off-line.

Paul

Last fiddled with by xilman on 2011-07-12 at 14:34 Reason: Fix /tag]
xilman is offline   Reply With Quote
Old 2011-07-12, 17:51   #9
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

5·709 Posts
Default

MPI isn't strictly needed, but when there's a terabyte of relations then you have to account somehow for intermediate lists of things (primes, ideals, relation numbers or coordinates, etc) exceeding the memory of the filtering machine. That becomes much more manageable given a cluster of average size machines all pulling information out of the same dataset and boiling it down by themselves, with a global pass at the end to reconcile all the results.

For singleton removal, you at least have to have a hashtable that maps primes or ideals to a unique number, so that relations can be assigned an ideal list. Right now the size of that hashtable is the limiting factor in the memory use of the filtering. (Yes I know there are loads of space-efficient hashtable implementations out there)
jasonp is offline   Reply With Quote
Old 2011-07-13, 06:00   #10
Sleepy
 
May 2011

23 Posts
Default

I see.. okie! So upgrading the filtering steps should not be that complex but rather because of the current sieving limitations, there is not much incentives to cater for large numbers (>768) factorization yet right?

Sorry one more question, i am stuck with understanding how the first bound is chosen.
What is the role of bin_max, bin_min and hits_per_prime in the duplicate.c source code?

Anyway, thanks guys for all the help you guys provided! Really appreciate it! =)
Sleepy is offline   Reply With Quote
Old 2011-07-13, 11:21   #11
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

5·709 Posts
Default

Quote:
Originally Posted by Sleepy View Post
What is the role of bin_max, bin_min and hits_per_prime in the duplicate.c source code?
You want to choose the filtering bound as the point where large primes start to occur rarely enough that it would be useful for the filtering to see them. We've found by experiment that that point occurs when primes start appearing in less than ~20 relations; filtering can deal with a higher frequency than that but most of the singletons are visible at the 20-per-prime mark. So we need to find the point where the number of times a prime appears is less than 40 (~20 for the rational and ~20 for the algebraic version).

We have a histogram of the number of times a prime occurs, which was computed essentially for free when the first duplicate removal pass ran. Every prime < 2^32 encountered in every relation (including the duplicates :) increments the count in one 128k-size bucket. Then the average number of times a prime appears in bucket i is approximately

(count of hits in bucket i) / (number of primes in bucket i)

and the desired filtering bound is the middle of the first bucket (working from 2^32 backwards) where this number exceeds 40.0.

We don't need to count the exact number of primes in bucket i, instead we compute

(number of primes less than (start of bucket i+1)) -
(number of primes less than (start of bucket i))

using the approximation that the number of primes less than x is about x / log(x). These are what bin_min and bin_max are used for, and hits_per_prime is the computed average.

If you haven't seen it already, read this for more detail on Msieve's NFS filtering. Most of it is still accurate today.

Last fiddled with by jasonp on 2011-07-13 at 11:29
jasonp is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
NFS filtering in the era of big data jasonp Msieve 36 2018-05-07 19:55
NFS filtering error... Stargate38 YAFU 4 2016-04-20 16:53
The big filtering bug strikes again (I think) Dubslow Msieve 20 2016-02-05 14:00
Filtering Error richs Msieve 8 2015-01-18 17:40
Filtering R.D. Silverman Cunningham Tables 14 2010-08-05 08:30

All times are UTC. The time now is 08:20.


Fri Aug 19 08:20:35 UTC 2022 up 1 day, 5:49, 0 users, load averages: 1.13, 1.02, 1.05

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2022, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

โ‰  ยฑ โˆ“ รท ร— ยท โˆ’ โˆš โ€ฐ โŠ— โŠ• โŠ– โŠ˜ โŠ™ โ‰ค โ‰ฅ โ‰ฆ โ‰ง โ‰จ โ‰ฉ โ‰บ โ‰ป โ‰ผ โ‰ฝ โŠ โŠ โŠ‘ โŠ’ ยฒ ยณ ยฐ
โˆ  โˆŸ ยฐ โ‰… ~ โ€– โŸ‚ โซ›
โ‰ก โ‰œ โ‰ˆ โˆ โˆž โ‰ช โ‰ซ โŒŠโŒ‹ โŒˆโŒ‰ โˆ˜ โˆ โˆ โˆ‘ โˆง โˆจ โˆฉ โˆช โจ€ โŠ• โŠ— ๐–• ๐–– ๐–— โŠฒ โŠณ
โˆ… โˆ– โˆ โ†ฆ โ†ฃ โˆฉ โˆช โŠ† โŠ‚ โŠ„ โŠŠ โŠ‡ โŠƒ โŠ… โŠ‹ โŠ– โˆˆ โˆ‰ โˆ‹ โˆŒ โ„• โ„ค โ„š โ„ โ„‚ โ„ต โ„ถ โ„ท โ„ธ ๐“Ÿ
ยฌ โˆจ โˆง โŠ• โ†’ โ† โ‡’ โ‡ โ‡” โˆ€ โˆƒ โˆ„ โˆด โˆต โŠค โŠฅ โŠข โŠจ โซค โŠฃ โ€ฆ โ‹ฏ โ‹ฎ โ‹ฐ โ‹ฑ
โˆซ โˆฌ โˆญ โˆฎ โˆฏ โˆฐ โˆ‡ โˆ† ฮด โˆ‚ โ„ฑ โ„’ โ„“
๐›ข๐›ผ ๐›ฃ๐›ฝ ๐›ค๐›พ ๐›ฅ๐›ฟ ๐›ฆ๐œ€๐œ– ๐›ง๐œ ๐›จ๐œ‚ ๐›ฉ๐œƒ๐œ— ๐›ช๐œ„ ๐›ซ๐œ… ๐›ฌ๐œ† ๐›ญ๐œ‡ ๐›ฎ๐œˆ ๐›ฏ๐œ‰ ๐›ฐ๐œŠ ๐›ฑ๐œ‹ ๐›ฒ๐œŒ ๐›ด๐œŽ๐œ ๐›ต๐œ ๐›ถ๐œ ๐›ท๐œ™๐œ‘ ๐›ธ๐œ’ ๐›น๐œ“ ๐›บ๐œ”