mersenneforum.org › Msieve: NFS filtering in the era of big data (https://www.mersenneforum.org/showthread.php?t=17926)

jasonp 2014-09-22 15:22

This branch of the code was an experiment in doing NFS filtering out-of-core, for datasets too large to fit in memory all at once until most of the data had been deleted. All of the jobs we can handle today will fit in memory at every stage, given a nice enough machine. I've seen Msieve run on machines with 512GB of memory! But eventually we will need to think about factorizations so large that the only way to do NFS filtering will be either to throw a cluster at it or to work out-of-core.
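
To make "out-of-core" concrete, here is a minimal Python sketch of one classic filtering step, singleton removal, done in streaming passes so that only a table of ideal counts stays in memory while the relations themselves stay on disk. It is not Msieve's filtering code. The file format (one relation per line, ideals written as whitespace-separated tokens) and the function names are hypothetical. Deleting a relation can turn another ideal into a singleton, which is why the loop repeats until a pass removes nothing.

```python
import os
import tempfile
from collections import Counter


def count_ideals(path):
    """One streaming pass: count how many relations contain each ideal."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            # set(): an ideal repeated inside one relation is counted once
            counts.update(set(line.split()))
    return counts


def remove_singletons(path):
    """Repeat streaming passes until no surviving relation contains an ideal
    that appears in only one relation. The input file is left untouched;
    returns the path of a temp file holding the survivors (or the input
    path itself if nothing was ever removed)."""
    original = path
    while True:
        counts = count_ideals(path)
        fd, out_path = tempfile.mkstemp(suffix=".rels", text=True)
        removed = 0
        with open(path) as src, os.fdopen(fd, "w") as dst:
            for line in src:
                if all(counts[i] > 1 for i in line.split()):
                    dst.write(line)
                else:
                    removed += 1  # deleting it may create new singletons
        if removed == 0:
            os.remove(out_path)
            return path
        if path != original:
            os.remove(path)  # previous intermediate pass is no longer needed
        path = out_path
```

With relations `1 2`, `2 3`, `3 4`, `1 2`, the first pass drops `3 4` (ideal 4 is a singleton), the second drops `2 3` (ideal 3 has become one), and the third pass removes nothing, leaving the two copies of `1 2`.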

Berkeley DB was a convenient abstraction for this: no need to decide on file formats, parse data on disk, handle on-the-fly compression and caching, etc. I didn't choose a database because I needed the flexibility of SQL, but only because it was convenient and let me focus on things like out-of-core sorting of the dataset using several criteria.
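
Out-of-core sorting is usually done as an external merge sort: sort memory-sized chunks, spill each sorted run to disk, then stream a k-way merge of the runs. A minimal Python sketch, assuming records are newline-free text lines and a caller-supplied key function (a tuple key gives sorting "using several criteria"). The function name and parameters are hypothetical, and this is not taken from the msieve-db branch.

```python
import heapq
import os
import tempfile


def external_sort(records, key, chunk_size):
    """Sort an iterable of newline-free text records too large for memory.
    Phase 1: sort chunk_size records at a time in memory and spill each
    sorted run to a temp file.
    Phase 2: k-way merge the runs with heapq.merge, yielding results lazily."""
    run_paths = []
    chunk = []

    def spill():
        chunk.sort(key=key)
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(rec + "\n" for rec in chunk)
        run_paths.append(path)
        chunk.clear()

    for rec in records:
        chunk.append(rec)
        if len(chunk) >= chunk_size:
            spill()
    if chunk:
        spill()

    files = [open(p) for p in run_paths]
    try:
        # one lazy line stream per sorted run
        runs = [(line.rstrip("\n") for line in f) for f in files]
        yield from heapq.merge(*runs, key=key)
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

Memory use is roughly bounded by `chunk_size` records in phase 1 and one buffered line per run in phase 2. For example, sorting `"17,-3"`, `"5,2"`, `"11,7"`, `"-2,9"`, `"5,-1"` with a key that parses each record into an `(a, b)` integer tuple yields `"-2,9"`, `"5,-1"`, `"5,2"`, `"11,7"`, `"17,-3"`.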

Other DBs are indeed faster, but this is not a typical database task. Like everything else in Msieve, I've run out of time to play with the idea.

xilman 2014-09-22 16:24

[QUOTE=skan;383658]I'm also interested in finding a good database for scientific computing.[/QUOTE]I'm not competent to comment on your earlier questions. I can provide some input on this one though.

If by "scientific computing" you include bioinformatics, PostgreSQL has some big fans. It underlies FlyBase for instance. I learned about it when working for FlyBase and have subsequently used it to store factoring data for the homogeneous Cullen and Woodall tables.

Paul

debrouxl 2018-05-07 15:49

Putting an old topic back on top :smile:
I recently learned about LMDB, whose API ([URL]http://www.lmdb.tech/doc/[/URL]) is inspired by that of Berkeley DB and which boasts fantastic performance. It supports integer keys.
I had a quick look at the code of the msieve-db branch, but I'm not qualified to judge the filtering algorithm...

jasonp 2018-05-07 19:55

Sorry, but what I said about running out of time four years ago hasn't changed. I wish it were otherwise.


All times are UTC.