#12
Tribal Bullet
Oct 2004
3,541 Posts
The man page for mkdir states that the mode you supply for the new directory gets modified by the umask for your account; on my test box, the created directory does indeed use 755.
Looks like your BDB is pickier about function inputs, or perhaps was compiled with diagnostics turned on. I'll investigate tonight.
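A quick way to see the umask interaction for yourself (a minimal sketch; `demo_dir` is a throwaway name, and `stat -c` is the GNU coreutils form):

```shell
# Plain mkdir requests mode 0777, which the kernel masks with the
# process umask: 0777 & ~0022 = 0755.
umask 022
mkdir demo_dir
stat -c '%a' demo_dir    # prints 755
rmdir demo_dir
```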
#13
|
Sep 2009
977 Posts
After your tweaks, SVN r860 says
Code:
DB bulk write 3 failed: Invalid argument
#14
Tribal Bullet
Oct 2004
3,541 Posts
What version of BDB are you using? I don't see the error on Linux, but I've always tested against BDB 5.3, since I've had to build it from source everywhere...
#15
Sep 2009
977 Posts
That computer is using BDB 5.1.x, as provided in Debian testing amd64.
#16
Tribal Bullet
Oct 2004
3,541 Posts
Still not seeing it on Linux; could I trouble you to compile a recent BDB version from source? It will install into /usr/local/BerkeleyDB.X.Y by default, and you can change the Msieve compile line to
Code:
make all CC="gcc -L/usr/local/BerkeleyDB.X.Y/lib -I/usr/local/BerkeleyDB.X.Y/include"
and run with
Code:
LD_LIBRARY_PATH=/usr/local/BerkeleyDB.X.Y/lib msieve ...
I have some more changes in the pipeline after running with a larger dataset (56M relations). The complete filtering for this takes about 30 minutes on the test machine, and duplicate removal takes 38 minutes using BDB with the caveats above. It is remarkable how hard a problem it is to minimize the I/O from this step while still moving all the data. Note that on Linux you cannot bypass the disk cache without patching the BDB source; the next best thing is to tune the swappiness of your system, as the next round of changes will explain.
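For reference, a sketch of the usual from-source build (the version number 5.3.28 is just an example, and the prefix is whatever `X.Y` you downloaded; Berkeley DB configures from inside `build_unix`, not the top level):

```shell
# Hypothetical tarball name; substitute the release you actually fetched.
tar xzf db-5.3.28.tar.gz
cd db-5.3.28/build_unix
../dist/configure --prefix=/usr/local/BerkeleyDB.5.3
make
sudo make install
```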
#17
(loop (#_fork))
Feb 2006
Cambridge, England
6,419 Posts
Quote:
Disk files are two orders of magnitude slower than memory in the streaming case and six orders of magnitude slower in the seeking case; filtering is fast, but not fast enough that slowing it by two orders of magnitude would not be a disaster.

It's not obvious to me that you don't have enough space, on a cluster big enough to run the matrix, to keep the whole thing in the memories of cluster nodes. I can see quite a strong argument for sharding the data across many files - I think I'd rather deal with 120 4GB files, one for each residue class of [x,y] mod 11, each stored on /tmp of a separate cluster node, than with a single half-terabyte file; I've written the code to do that for de-dupe and naive singleton removal - but I'm really not convinced that a conventional database is the right tool for doing that.

I'm probably biased by having a personally-owned 64GB machine, on which memory size is not a restriction for any job small enough to complete within the lifetime of the machine. But I would expect that the kind of centres that host HPC clusters would have access to machines with 512GB of main RAM: my local supplier, who is not particularly low-price, will sell me one for £8000, which compares very well with 23 days of depreciation on 576 cluster nodes.
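That sharding scheme can be sketched in a few lines of shell (the file name `rels.txt` and the `a,b:factors` line format are assumptions here, not fivemack's actual code; relations with gcd(a,b) = 1 never fall in class (0,0), which is why 121 residue classes give only 120 non-empty shards):

```shell
# Two sample relations in "a,b:factors" form (made-up data).
printf -- '-3,5:deadbeef\n12,3:cafe\n' > rels.txt

# Bucket each relation by (a mod 11, b mod 11); the "+ 11" keeps the
# residue non-negative for negative a, since awk's % follows C.
awk -F'[,:]' '{
    m = (($1 % 11) + 11) % 11;
    n = (($2 % 11) + 11) % 11;
    print > ("shard_" (m * 11 + n) ".txt")
}' rels.txt
```

Each shard file can then live on the /tmp of its own cluster node and be de-duplicated independently.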
#18
If I May
"Chris Halsall"
Sep 2002
Barbados
9,731 Posts
Quote:
On many *nix systems, /tmp/ is a separate partition specifically so that the ugo+w permissions on the directory can't be used by an attacker who gains access to the machine (or by a careless programmer) to fill up the / file system, causing a DoS (affecting, for example, /var/ and /home/ if they're not separate partitions). Quite often /tmp/ is relatively small.
#19
"Marv"
May 2009
near the Tannhäuser Gate
654 Posts
Quote:
#20
(loop (#_fork))
Feb 2006
Cambridge, England
6,419 Posts
Quote:
#21
Tribal Bullet
Oct 2004
3,541 Posts
All of this is really an experiment in what's possible. I foresee a day in the near future when NFS@Home will have 10TB of relations for a single job, and Teragrid lined up for the matrix, but no way to bridge the two because filtering will require 1TB of memory with the current code and nobody has the time / smarts to run the memory-intensive parts of the filtering across a cluster. Even if we had such a machine the hashtable data structures we use don't scale up enough to use it effectively.
Currently, for very big jobs more than half of the filtering time is stuck in clique removal, because with a ton of excess relations we need a ton of in-memory passes to remove most of the cliques. It would be hugely faster to just do the clique removal in place on the data while it's on disk, but that requires breadth-first search on an out-of-core dataset, which sucks.

I'm not proposing doing the merge phase out of core, but it's currently not the limiting factor anyway. The limiting factor is anything we use a hashtable for that requires the complete dataset, i.e. the duplicate removal, singleton removal, and matrix build (and to a lesser extent the clique removal).

(Paul's group did run the dataset for RSA704 on a machine with 512GB of memory; I've been interested for a while in answering questions like whether access to such a machine is mandatory or optional)

Last fiddled with by jasonp on 2013-03-15 at 16:41
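For comparison, the classic out-of-core answer to the duplicate-removal piece alone is an external merge sort, which streams instead of seeking (a sketch only, not what msieve does; the file names are made up, and `-S`/`-T` are GNU sort options):

```shell
# Sample relation file containing one duplicate line (made-up data).
printf '12,3:cafe\n1,2:beef\n12,3:cafe\n' > rels.txt

# sort(1) runs an external merge sort: -S caps the in-memory buffer,
# -T chooses the spill directory, -u emits each distinct line once.
sort -u -S 64M -T /tmp rels.txt > rels.uniq
wc -l < rels.uniq    # prints 2
```

Sorting doesn't help the singleton/clique passes, though, which is exactly the breadth-first-search-on-disk problem described above.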
#22
Sep 2009
977 Posts
When building under Debian sid, against libdb5.3-dev (which is not default yet, AFAICS), msieve-db works
The dataset (for a 512-bit semi-prime) was uncompressed, remdups'ed.
Code:
commencing relation filtering
setting target matrix density to 85.0
estimated available RAM is 966.0 MB
commencing duplicate removal, pass 1
Make sure to do 'echo 0 > /proc/sys/vm/swappiness'
read 10M relations
read 20M relations
read 30M relations
read 40M relations
read 50M relations
found 55905073 relations
commencing duplicate removal, pass 2
merging 8 temporary DB files
Make sure to do 'echo 0 > /proc/sys/vm/swappiness'
wrote 0 duplicates and 55905073 unique relations
reading ideals above 1114112
commencing singleton removal
Make sure to do 'echo 0 > /proc/sys/vm/swappiness'
commencing ideal list build, pass 2
Make sure to do 'echo 0 > /proc/sys/vm/swappiness'
found 48265306 unique ideals
RelProcTime: 3007
elapsed time 00:50:09
BTW, possibly silly question: would other key/value database stores/engines work for the purpose of filtering NFS relations? Nowadays there's a huge variety of key/value database software.

Last fiddled with by debrouxl on 2013-03-18 at 12:08
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| NFS filtering error... | Stargate38 | YAFU | 4 | 2016-04-20 16:53 |
| The big filtering bug strikes again (I think) | Dubslow | Msieve | 20 | 2016-02-05 14:00 |
| Filtering | Sleepy | Msieve | 25 | 2011-08-04 15:05 |
| Filtering | R.D. Silverman | Cunningham Tables | 14 | 2010-08-05 08:30 |
| Filtering Phenomenon | R.D. Silverman | NFSNET Discussion | 2 | 2005-09-16 04:00 |