mersenneforum.org > Factoring Projects > Msieve
2013-03-13, 11:05   #12
jasonp
The man page for mkdir states that the mode you supply for the new directory gets modified by the umask for your account; on my test box, the created directory does indeed use 755.
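The umask interaction is easy to see for yourself; a minimal sketch (the directory name is hypothetical, any POSIX system):

```python
import os
import stat

os.umask(0o022)                # typical default umask
os.mkdir("demo_dir", 0o777)    # request rwxrwxrwx, as mkdir(2) would

# The umask bits are cleared from the requested mode: 0o777 & ~0o022 == 0o755
mode = stat.S_IMODE(os.stat("demo_dir").st_mode)
print(oct(mode))               # prints 0o755
```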

Looks like your BDB is pickier about function inputs, or perhaps was compiled with diagnostics turned on. I'll investigate tonight.
2013-03-13, 20:26   #13
debrouxl

After your tweaks, SVN r860 says:
Code:
DB bulk write 3 failed: Invalid argument
2013-03-14, 01:54   #14
jasonp

What version of BDB are you using? I don't see the error on Linux, but I've always been testing against BDB 5.3, since I've had to build from source everywhere...
2013-03-14, 10:18   #15
debrouxl

That computer is using BDB 5.1.x, as provided in Debian testing amd64.
2013-03-15, 14:47   #16
jasonp

Still not seeing it on Linux; could I trouble you to compile a recent BDB version from source? It will install in /usr/local/BerkeleyDB.X.Y by default, and you can change the Msieve compile line to

Code:
make all CC="gcc -L/usr/local/BerkeleyDB.X.Y/lib -I/usr/local/BerkeleyDB.X.Y/include"
Then, when running, point to the new shared library like so:
Code:
LD_LIBRARY_PATH=/usr/local/BerkeleyDB.X.Y/lib msieve ...
At this point the only thing I can think of is that the system default BDB was compiled with compression turned off, though I don't know why anyone would want to do that.

I have some more changes in the pipeline after running with a larger dataset (56M relations). The complete filtering for this takes about 30 minutes on the test machine, and duplicate removal takes 38 minutes using BDB with the caveats above. It really is amazing how painful it is to minimize the I/O from this step while still moving all the data. Note that on Linux you cannot bypass the disk cache without patching the BDB source; the next best thing is to tune your system's swappiness, as the next round of changes will explain.
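For reference, the swappiness knob can be inspected from any process; a Linux-only sketch (lowering it requires root, e.g. `sysctl -w vm.swappiness=0`):

```python
# Linux-only: read the current vm.swappiness (the default is often 60).
# Lower values tell the kernel to prefer keeping application pages resident
# rather than growing the page cache, which matters when BDB is streaming
# large temporary files through the filesystem.
with open("/proc/sys/vm/swappiness") as f:
    swappiness = int(f.read())
print(swappiness)
```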
2013-03-15, 15:23   #17
fivemack

Quote:
Ideally we will eventually be able to specify a maximum amount of memory that filtering will be allowed to take, and any requirements for memory above that will be converted to manipulating disk files. Thus a given machine will be able to handle problems much larger than the available memory.
I'm really not sure that this is a sensible thing to do.

Disk files are two orders of magnitude slower than memory in the streaming case and six orders of magnitude slower in the seeking case; filtering is fast, but not so fast that slowing it by two orders of magnitude would be anything less than a disaster. It's not obvious to me that you don't have enough space, on a cluster big enough to run the matrix, to keep the whole thing in the memories of the cluster nodes.

I can see quite a strong argument for sharding the data across many files - I think I'd rather deal with 120 4GB files, one for each residue class of [x,y] mod 11, each stored on /tmp of a separate cluster node, than with a single half-terabyte file; I've written the code to do that for de-dupe and naive-singleton-removal - but I'm really not convinced that a conventional database is the right tool for doing that.
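The sharding idea amounts to a simple mapping; a toy sketch (the function name is hypothetical, and (x mod 11, y mod 11) gives 11² = 121 classes, roughly the file count mentioned above):

```python
def shard_of(x: int, y: int, m: int = 11) -> int:
    """Assign relation (x, y) to one of m*m shards by its residue class
    (x mod m, y mod m); each shard could live in its own file on a
    node-local disk instead of one half-terabyte monolith."""
    return (x % m) * m + (y % m)

# Any relation lands in exactly one of the 121 shards.
print(shard_of(22, 33))          # both residues 0 -> shard 0
print(shard_of(123456789, 42))   # residues (5, 9) -> shard 64
```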

I'm probably biased by having a personally-owned 64GB machine, on which memory size is not a restriction for any job small enough to complete within the lifetime of the machine. But I would expect that the kind of centres that host HPC clusters would have access to machines with 512GB main RAM: my local supplier, who is not particularly low-price, will sell me one for £8000, which compares very well with 23 days of depreciation on 576 cluster nodes.
2013-03-15, 15:37   #18
chalsall

Quote:
Originally Posted by fivemack View Post
...I think I'd rather deal with 120 4GB files, one for each residue class of [x,y] mod 11, each stored on /tmp of a separate cluster node...
My apologies if I'm telling you how to chew gum with the below, but I would caution against hard-coding /tmp/ as the file location.

On many *nix systems, /tmp/ is a separate partition specifically so that the world-writable (ugo+w) state of the directory can't be exploited by an attacker who gains access to the machine (or by a careless programmer) to fill up the / file system, which would cause a DoS (affecting, for example, /var/ and /home/ if they're not separate partitions).

Quite often /tmp/ is relatively small.
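It's quick to check whether /tmp is its own filesystem and how much room it actually has; a standard-library sketch:

```python
import os
import shutil

# True when /tmp is a mount point of its own, i.e. a separate partition.
print("separate mount:", os.path.ismount("/tmp"))

# Total and free space, so you know whether 4GB shard files would even fit.
u = shutil.disk_usage("/tmp")
print("total GiB:", u.total // 2**30, " free GiB:", u.free // 2**30)
```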
2013-03-15, 15:55   #19
tServo

Quote:
Originally Posted by fivemack View Post
I'm really not sure that this is a sensible thing to do.

Disk files are two orders of magnitude slower than memory in the streaming case and six orders of magnitude slower in the seeking case; filtering is fast, but not fast enough that slowing it by two orders of magnitude would not be a disaster. It's not obvious to me that you don't have enough space, on a cluster big enough to run the matrix, to keep the whole thing in the memories of cluster nodes.
Are the speeds you're quoting for rotating or SSD drives? SSDs have been coming down in price over the last 4 months at a jaw-dropping rate. An SSD RAID array is incredibly fast; at work, we have been replacing our rotating RAID arrays with SSDs on our more heavily used servers, with terrific results.
2013-03-15, 15:58   #20
fivemack

Quote:
Originally Posted by tServo View Post
Are the speeds you're quoting for rotating or SSD drives? SSDs have been coming down in price in the last 4 months at a jaw-dropping rate. A SSD raid array is incredibly fast; at work, we have been replacing our rotating raid arrays with SSDs on our more heavily used servers with terrific results.
Rotating; I agree that SSDs solve the seek problem, but even several ganged-up SATA links are an awful lot slower than two-channel DDR.
2013-03-15, 16:34   #21
jasonp

All of this is really an experiment in what's possible. I foresee a day in the near future when NFS@Home will have 10TB of relations for a single job, and TeraGrid lined up for the matrix, but no way to bridge the two, because filtering will require 1TB of memory with the current code and nobody has the time / smarts to run the memory-intensive parts of the filtering across a cluster. Even if we had such a machine, the hashtable data structures we use don't scale up enough to use it effectively.

Currently, for very big jobs more than half of the filtering time is stuck in clique removal, because with a ton of excess relations we need a ton of in-memory passes to remove most of the cliques. It would be hugely faster to just do the clique removal in place on the data while it's on disk, but that requires breadth-first search on an out-of-core dataset, which sucks.

I'm not proposing doing the merge phase out of core, but it's currently not the limiting factor anyway. The limiting factor is anything we use a hashtable for that requires the complete dataset, i.e. the duplicate removal, singleton removal, and matrix build (and to a lesser extent the clique removal).
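Singleton removal is essentially iterated counting: any relation containing an ideal that occurs in no other relation is useless and is dropped, which can create new singletons. A toy in-memory sketch of the idea (hypothetical representation: each relation as a set of ideal ids; not msieve's actual data structures):

```python
from collections import Counter

def remove_singletons(relations):
    """Repeatedly drop relations containing an ideal seen only once,
    until the set of relations stabilizes."""
    relations = list(relations)
    while True:
        counts = Counter(i for rel in relations for i in rel)
        kept = [rel for rel in relations if all(counts[i] > 1 for i in rel)]
        if len(kept) == len(relations):
            return kept
        relations = kept

rels = [{1, 2}, {2, 3}, {3, 1}, {4, 5}]
print(remove_singletons(rels))   # the {4, 5} relation drops out
```

The hashtable trouble described above comes from `counts` needing an entry for every distinct ideal in the whole dataset, which is why this pass is memory-bound at scale.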

(Paul's group did run the dataset for RSA704 on a machine with 512GB of memory; I've been interested for a while in answering questions like whether access to such a machine is mandatory or optional.)

Last fiddled with by jasonp on 2013-03-15 at 16:41
2013-03-18, 11:48   #22
debrouxl

When building under Debian sid against libdb5.3-dev (which is not the default yet, AFAICS), msieve-db works.

The dataset (for a 512-bit semiprime) was uncompressed and remdups'ed:
Code:
commencing relation filtering
setting target matrix density to 85.0
estimated available RAM is 966.0 MB
commencing duplicate removal, pass 1
Make sure to do 'echo 0 > /proc/sys/vm/swappiness'
read 10M relations
read 20M relations
read 30M relations
read 40M relations
read 50M relations
found 55905073 relations
commencing duplicate removal, pass 2
merging 8 temporary DB files
Make sure to do 'echo 0 > /proc/sys/vm/swappiness'
wrote 0 duplicates and 55905073 unique relations
reading ideals above 1114112
commencing singleton removal
Make sure to do 'echo 0 > /proc/sys/vm/swappiness'
commencing ideal list build, pass 2
Make sure to do 'echo 0 > /proc/sys/vm/swappiness'
found 48265306 unique ideals
RelProcTime: 3007
elapsed time 00:50:09
The file timestamps indicate that making tmpdb.* took ~14 minutes, making relations.db took ~19 minutes, and making ideals.db took ~17 minutes. That was on a Core i7-2670QM @ 2.2 GHz with 4 GB RAM, the processor otherwise idle.


BTW, possibly silly question: would other key/value database stores/engines work for the purpose of filtering NFS relations? Nowadays there's a huge variety of key/value database software.

Last fiddled with by debrouxl on 2013-03-18 at 12:08