mersenneforum.org


em99010pepe 2010-07-31 19:37

Msieve v1.46 feedback
 
jasonp,

I have a stupid question: will msieve scale better on single-socket quad-core machines?

jasonp 2010-07-31 19:45

Multithreading has been added to the vector-vector operations, so you should see a 5-10% speedup when using multiple threads, even without using MPI. Compiling with MPI and running on only one CPU adds a little overhead; it seems to be about 10% slower in Greg's tests.

em99010pepe 2010-07-31 19:48

Another thing I don't understand is the following. When I run the LA for a 29-bit integer on a quad-core machine, CPU usage does not peak at 99%; it stays at about 85%. But when I run the LA for 31-bit integers, CPU usage peaks at 99%. I see this behavior on different quad-core machines with different OSes (Win XP 64-bit and Win 7), different clock speeds, and different types of memory (one DDR2 and the other DDR3).

Thank you in advance.

em99010pepe 2010-07-31 19:52

[quote=jasonp;223488]Multithreading has been added to the vector-vector operations, so you should see a 5-10% speedup when using multiple threads, even without using MPI. Compiling with MPI and running on only one CPU adds a little overhead; it seems to be about 10% slower in Greg's tests.[/quote]

I see... On a quad-core I should run the new version without using MPI.

Greg, could you post your benchmarks here?

jasonp 2010-07-31 20:02

My guess is that matrices from factorizations that use 29-bit large primes are not that large, all things considered, so with large enough caches on your machine the bottleneck would be the cache controller trying to satisfy four hungry threads at the same time. Matrices derived from 31-bit large primes are going to have dimension 2-3x larger, and this would mean much more pressure on the DRAM controller, so multiple threads can keep up with the load in that case.

I should also mention that you can try compiling with LARGEBLOCKS=1; this turns on some optimizations due to Serge Batalov, that increase the memory use but make the LA a great deal faster (he reports 25% faster for large problems).

em99010pepe 2010-08-01 08:44

[quote=jasonp;223491]

I should also mention that you can try compiling with LARGEBLOCKS=1; this turns on some optimizations due to Serge Batalov, that increase the memory use but make the LA a great deal faster (he reports 25% faster for large problems).[/quote]

Large problems: what matrix sizes?
Thank you.

jasonp 2010-08-01 13:55

The changes definitely help for NFS@home size matrices, and possibly for matrices smaller than that. Serge can provide more detail.

Batalov 2010-08-01 19:44

a few words about LARGEBLOCKS and ZLIB
 
LARGEBLOCKS=1 needs testing (two runs with a standard and the experimental binary, comparison, repeat...). I didn't try it on small matrices, but it had been used on [I]very large[/I] ones with good results. Note that L2/L3 cache size will matter; some CPUs will do better than others (I only tested K8/K10 AMD CPUs).

Similarly to [URL="http://mersenneforum.org/showthread.php?t=13493"]increased TARGET_DENSITY[/URL], this patch may not help small projects (see Tom's results for a smaller project).
_____________

The ZLIB patch is another thing that needs testing. I have tested it extensively in different scenarios and straightened out some wrinkles. In a nutshell, the .dat file can now be gzipped... [B][I]or[/I][/B] plain. Right now, for most users, there will be no perceptible difference (if the .dat file is plain, msieve reads it and appends to it plainly), but adventurous users may want to modify the perl and python scripts to create and maintain a gzipped data file (when it is first created, with a single "N ...." line in it, gzip it; then run lasieve with the option -z, and concatenate the gzipped relations >> test.dat.gz; msieve will read the file and will append gzipped free relations). When using msieve in QS mode, curious users will find that the .dat file is internally gzipped as well (I've tested that QS can be interrupted and continued; a QS job that was previously sieved with an earlier msieve can be continued as well, and in that case the file will remain plain).
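
The append-by-concatenation workflow above relies on a property of the gzip format: several gzip members written back to back form one valid gzip file. Here is a minimal Python sketch of that idea, not msieve's or lasieve's own code. The function names, the file name, and the "N 91" and "rel-a"/"rel-b" lines are placeholders for illustration only; real relation lines look different.

```python
import gzip
import os
import tempfile


def create_dat(path, n_line):
    """Create a gzipped data file that holds only the initial "N ..." line."""
    with gzip.open(path, "wt", encoding="ascii") as f:
        f.write(n_line + "\n")


def append_relations(path, relations):
    """Append lines as a new gzip member, similar to `cat rels.gz >> test.dat.gz`."""
    with gzip.open(path, "at", encoding="ascii") as f:
        for rel in relations:
            f.write(rel + "\n")


def read_lines(path):
    """Read every line back; the reader walks through all gzip members in order."""
    with gzip.open(path, "rt", encoding="ascii") as f:
        return [line.rstrip("\n") for line in f]


if __name__ == "__main__":
    # Placeholder content: "N 91" stands in for the real "N ...." line.
    path = os.path.join(tempfile.mkdtemp(), "test.dat.gz")
    create_dat(path, "N 91")
    append_relations(path, ["rel-a", "rel-b"])
    print(read_lines(path))
```

Each append writes a separate gzip member, so nothing that is already in the file has to be decompressed or rewritten.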

Some platforms don't have ZLIB. For these (or if you simply don't want this feature), use NO_ZLIB=1 when building with the Makefile.

debrouxl 2010-08-02 08:38

I think that you already have access to all you need in the way of large datasets - but should you need the RSALS datasets for experimenting with LARGEBLOCKS=1, we'll be happy to provide them :smile:

What about adding a switch to msieve for setting the value of TARGET_DENSITY at run-time, rejecting values outside a sane range? There are only two occurrences of TARGET_DENSITY in msieve, its definition and its use, both in common/filter/merge.c.
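
To make the suggestion concrete, here is a rough Python sketch of a run-time switch with a range check. It is not msieve code (msieve is written in C). The option name `--target-density`, the bounds, and the default are all assumptions made up for illustration; the real sane range would have to come from someone who knows the filtering code.

```python
import argparse

# Hypothetical values; nothing in this thread specifies them.
DENSITY_MIN = 50.0
DENSITY_MAX = 150.0
DENSITY_DEFAULT = 70.0


def target_density(text):
    """Convert a command-line string to a density, rejecting values out of range."""
    try:
        value = float(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a number: {text!r}")
    # NaN fails this comparison too, so it is rejected along with out-of-range values.
    if not (DENSITY_MIN <= value <= DENSITY_MAX):
        raise argparse.ArgumentTypeError(
            f"{value} is outside the allowed range [{DENSITY_MIN}, {DENSITY_MAX}]"
        )
    return value


def build_parser():
    """Build a parser with a single, hypothetical --target-density option."""
    parser = argparse.ArgumentParser(prog="density-sketch")
    parser.add_argument(
        "--target-density", type=target_density, default=DENSITY_DEFAULT
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--target-density", "90"])
    print(args.target_density)
```

With this shape, a bad value is refused before any filtering work starts, and leaving the switch off keeps the compiled-in default.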

Batalov 2010-08-02 09:19

I think the command line can easily get filled with tons of parameters, so the .fb file (msieve doesn't read the .poly file, you know) may be a better place. TARGET_DENSITY could go there, along with some other tunable parameters, e.g. MAX_IDEALS_WEIGHT_START 40 (which is another frequently tweaked parameter*). Jasonp will be a better person to decide.

______
*[SIZE=1]for computers with enough memory this defaults to 200 when filtering is done in memory; but when it doesn't fit, (old style) the program loops this value from 20 in steps of 5 with occasional "matrix can improve, retrying". With a good hunch, one can start from a value they like, e.g. 40 or 45; as of now, this requires recompilation.[/SIZE]

jasonp 2010-08-02 12:42

The FB file is a good place to put extra options, although if you use msieve for sieving then the FB file will be huge. Right now modifiers to the command line options are handled in a pretty silly way: there are nfs_lower and nfs_upper fields, which let all NFS-related command line options share two 64-bit numerical options. A much better idea would be to give each of the NFS phases a string that contains phase-specific options, and let each phase parse its own options. This would allow things like
[code]
msieve -np1 "-degree 5 -coeff_min 1000 -coeff_max 10000" -np2 "-target_evalue 8e-7"
[/code]
with sensible defaults chosen for everything. Doing things this way would make msieve.exe act like a wrapper around a group of different binaries that are standalone projects in other NFS suites, much like openssl.exe wraps 50 different programs into one blob. I was reluctant to add all the bells and whistles that control NFS filtering, because the experience here shows that it's very easy to choose parameters that make the result 5% better but chew up 10x as much memory. But many people here also have their own custom msieve binaries with the bells and whistles changed, so that's an academic concern.
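
The per-phase option strings proposed above could be parsed along the following lines. This is a Python sketch of the idea only, not how msieve parses anything today. The function names are invented for illustration, and the flags `-np1` and `-np2` with their sub-options are taken from the proposed command line above rather than from an existing interface.

```python
import shlex


def parse_phase_options(option_string):
    """Turn '-degree 5 -coeff_min 1000' into {'degree': '5', 'coeff_min': '1000'}."""
    tokens = shlex.split(option_string)
    options = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if not token.startswith("-") or len(token) < 2:
            raise ValueError(f"expected an option name, got {token!r}")
        if i + 1 >= len(tokens):
            raise ValueError(f"option {token!r} is missing a value")
        # Values stay as strings; each phase would convert and default them itself.
        options[token[1:]] = tokens[i + 1]
        i += 2
    return options


def split_phases(argv, phase_flags=("-np1", "-np2")):
    """Map each phase flag to the options parsed from the string that follows it."""
    phases = {}
    i = 0
    while i < len(argv):
        flag = argv[i]
        if flag not in phase_flags:
            raise ValueError(f"unknown argument {flag!r}")
        if i + 1 >= len(argv):
            raise ValueError(f"phase flag {flag!r} is missing its option string")
        phases[flag.lstrip("-")] = parse_phase_options(argv[i + 1])
        i += 2
    return phases


if __name__ == "__main__":
    line = '-np1 "-degree 5 -coeff_min 1000 -coeff_max 10000" -np2 "-target_evalue 8e-7"'
    print(split_phases(shlex.split(line)))
```

The top level only has to know the phase flags; each phase owns the meaning of its own string, which is what lets one binary act as a wrapper around several tools.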

