mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Msieve (https://www.mersenneforum.org/forumdisplay.php?f=83)
-   -   Msieve v1.49 feedback (https://www.mersenneforum.org/showthread.php?t=15678)

jasonp 2012-01-12 14:57

My main development machine dates from 2007, so no :)

The linear algebra uses a block size of 64, and the modifications I've tried (at least twice now, including once with MPI) increase that to 128, 192 or 256 depending on compile options. I also changed the use of MMX to SSE2, since that's a natural fit at the 128 or 256 block size. Scaling up the block size means reducing the number of matrix multiplies by a similar factor, with each multiply dealing with a scaled-up amount of data. The big question is how much longer the multiply takes.

On all the machines that both I and Greg have tried the modified code on, the best we could ever get was an approximately equal runtime between size 64 and size 128. Greg also tried the larger block size code on some Teragrid nodes last year, with the same result.

I still have the old code lying around, and it should be a drop-in replacement for the Lanczos code in use now (the changes are really messy but limited to the LA source only), so I can dig it out if anyone wants. New performance measurements will be a bit tricky, since the mainline has gotten better since those changes were written.

Edit: Henry, bsquared is now the authority on high-performance QS. I don't think AVX would make QS appreciably faster

bsquared 2012-01-12 15:59

[QUOTE=jasonp;286054]I don't think AVX would make QS appreciably faster[/QUOTE]

I don't readily see it either. The most obvious place it *could* help is in the large prime bucket sieve, which is 90% assembly code at this point. In there, I compute the updates to 4 roots at a time (snippet below):

[CODE][FONT=Consolas][SIZE=2][COLOR=#0000ff]
[SIZE=2][FONT=Consolas][COLOR=#0000ff][SIZE=2][FONT=Consolas][COLOR=#0000ff]#define[/COLOR][/FONT][/SIZE][/COLOR][/FONT][/SIZE][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] COMPUTE_4_PROOTS(j) \[/SIZE][/FONT]
[SIZE=2][FONT=Consolas]ASM_G ( \[/FONT][/SIZE]
[/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "movdqa (%%rax), %%xmm3 \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* xmm3 = next 4 values of rootupdates */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT]
[/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "movdqa (%%rcx), %%xmm1 \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* xmm1 = next 4 values of root1 */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT]
[/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "psubd %%xmm3, %%xmm1 \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* root1 -= ptr */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT]
[/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "movdqa (%%rdx), %%xmm2 \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* xmm2 = next 4 values of root2 */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT]
[/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "psubd %%xmm3, %%xmm2 \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* root2 -= ptr */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT]
[/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "pxor %%xmm4, %%xmm4 \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* zero xmm4 */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT]
[/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "pxor %%xmm5, %%xmm5 \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* zero xmm5 */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT]
[/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "movdqa (%%rbx), %%xmm0 \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* xmm0 = next 4 primes */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT]
[/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "pcmpgtd %%xmm1, %%xmm4 \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* signed comparison: 0 > root1? [/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2]\[/SIZE][/FONT]
[/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "pcmpgtd %%xmm2, %%xmm5 \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* signed comparison: 0 > root2? [/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2]\[/SIZE][/FONT]
[/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "pand %%xmm0, %%xmm4 \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* copy prime to overflow locations */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT]
[/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "pand %%xmm0, %%xmm5 \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* copy prime to overflow locations */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT]
[/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "paddd %%xmm4, %%xmm1 \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* selectively add back prime (modular subtract) */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT]
[/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "movdqa %%xmm1, (%%rcx) \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* save new root1 values */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT]
[/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "paddd %%xmm5, %%xmm2 \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* selectively add back prime (modular subtract) */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT]
[/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515][FONT=Consolas][SIZE=2][COLOR=#a31515] "movdqa %%xmm2, (%%rdx) \n\t"[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000][FONT=Consolas][SIZE=2][COLOR=#008000]/* save new root2 values */[/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][/COLOR][/SIZE][/FONT][FONT=Consolas][SIZE=2][FONT=Consolas][SIZE=2] \[/SIZE][/FONT]
[/SIZE][/FONT][/CODE]

Where AVX *could* provide direct benefit is in doing 8 roots at a time instead of 4, using the same instructions. But for whatever reason, AVX doesn't extend the padd/pxor/pand instructions to 256 bits. They get 3 operand variants, but not 256 bit variants. So it's not immediately clear what benefit they will be.

Even if there were the correct instructions, there simply isn't any way to speed up the writes to random memory locations more than I already have (and others before me, by inventing things like the bucket sieve), at least as far I can see. So at best AVX could only provide a small percentage increase in speed, rather than the 2x increases that some other code (with different requirements) may be able realize.

debrouxl 2012-01-24 08:30

I need to perform polynomial selection for the remaining C151 cofactor of 439^89-1 (the full-sized number is 236 digits)... so I fired up msieve trunk r706 on a GT 540M.
I ran -np1 and -np2 in two different processes. I killed msieve -np1 after ~23h of wall clock time for -np1, it produced ~48e6 bytes of hits for the 12-5120 range of a5 (!). So far, msieve -np2 has taken ~25 hours of CPU time, on one hyperthread of Core i7-2670QM @ 2.2 GHz, and boiled down hits for 12-2000, producing several polynomials with e ~4.9e-12, and an outlier at e ~5.25e-12. Perhaps more to come
This means that for a C151:
* the GPU polynomial selection stage 1 is so fast, even on a GPU significantly less powerful than the top of the line, that the CPU stage 2 cannot digest the stage 1 output in real time (on a single thread) :smile:
* the best e values are somewhat higher than the estimation printed by msieve at the beginning, which was, IIRC, ~4.5e-12.

For fun, I think I should try running 24h of GPU-based polynomial selection on one of the TI-Z80/TI-68k 512-bit RSA signing keys, and compare the output with the polynomial we used for sieving on RSALS...

pinhodecarlos 2012-01-30 10:58

I can't find Batalov's comment about this but I know I red it somewhere. It's about how to resume msieve jobs with less disk transfers. For RSALS post-processing I usually have two batch files, one that starts the job and other that resumes the job in case something happens and the machine is rebooted.

First batch is

[code]
start /low /min msieve.exe -i 383_89_minus1.ini -s 383_89_minus1.dat -nf 383_89_minus1.fb -v -nc -t 4
[/code]Second batch (resume work) is

[code]
start /low /min msieve.exe -i 383_89_minus1.ini -s 383_89_minus1.dat -nf 383_89_minus1.fb -nc3 -ncr -v -t 4
[/code]

So are the flags correct for the second batch? I keep always all files in the folder except the gzip one.

Thank you in advance,

Carlos

Batalov 2012-03-01 22:31

[QUOTE=Batalov;277194]Clarification: I did link to 1.2.3 and observed Lionel's effect, and I also did link to 1.2.5 as well as used NO_ZLIB=1. Ran a not-so-small filtering job 5 times for each scenario. (If you are using a NAS drive, do that too, to remove variability.) Ran valgrind (callgrind) for a while only to see that gzread was called an inordinate amount of times (see the quoted [URL="http://stackoverflow.com/questions/2832485/zlib-gzgets-extremely-slow"]answer[/URL], it is indeed one gzread call per byte! He's right!)

Result: 1.2.3 is bad (about as bad as Lionel described - almost 2 times slower), but 1.2.5 has the same speed as NO_ZLIB=1 for plain files and negligible overhead on gzipped .dat file (which on systems may actually be a negative overhead, not to mention smaller footprint). Use zlib 1.2.5.[/QUOTE]
[URL="http://zlib.net/"]zlib[/URL] 1.2.6 is out.

Version 1.2.6 has many changes over 1.2.5, including [URL="http://zlib.net/"]improvements[/URL].

debrouxl 2012-03-02 08:28

Yup. And this time, unlike what occurred for zlib 1.2.5, zlib 1.2.6 immediately made its way to Debian unstable, from where it was pushed into testing :smile:

chris2be8 2012-03-23 18:27

Hello,

I've been factoring small SNFS composites from factordb with my old P4 (all it can usefully cope with) and found a few had had very little ECM done against them. This can cause msieve problems as in the log below:

Fri Mar 23 18:05:57 2012 Msieve v. 1.49 (SVN 18sep2011)
Fri Mar 23 18:05:57 2012 random seeds: ef4ea53f 50c26a13
Fri Mar 23 18:05:57 2012 factoring 392897621444950534760946692014491292757040653381539546673010579841680740738641371664069 (87 digits)
Fri Mar 23 18:05:58 2012 searching for 15-digit factors
Fri Mar 23 18:05:59 2012 commencing number field sieve (81-digit input)
Fri Mar 23 18:05:59 2012 warning: NFS input not found in factor base file
Fri Mar 23 18:05:59 2012 p7 factor: 1453411
Fri Mar 23 18:05:59 2012 c81 factor: 270327953651754758124815824301929249714664780562098089716543069951776022569418679
Fri Mar 23 18:05:59 2012 elapsed time 00:00:02

Apart from "don't do that then" is there a way to make msieve work in this case?

Chris

jasonp 2012-03-23 21:43

Not as written; small factors are always pulled out of any input, and I don't know what would happen if you forced NFS to happen anyway. Fortunately even using QS you should be done with a c81 in ~10 minutes (even faster with YAFU's QS).

chris2be8 2012-03-24 17:22

The thing that's bitten me a few times was that factMsieve.pl didn't notice the messages, so kept sieving relations and calling msieve expecting it to create the cols file. Which wasted several hours overnight since I wasn't watching the screen.

I'll probably update the script calling factMsieve.pl to run ecm to 20 digits first. That should stop it.

I've also had a few cases where it didn't always find the factor by ECM. I've had to re-run the square root stage for a few numbers until ECM failed and the square root worked.

I've also had cases where after removing all the algebraic and small factors from the composite provided by factordb I was left with a prime number. But that's not msieve's fault.

Chris

chris2be8 2012-05-01 16:20

linear algebra loop.
 
I've just come home and found the following screen output:

[code] =>nice -n 19 "/home/chris/ggnfs/bin/msieve" -s d1532_34_2.dat -l ggnfs.log -i d1532_34_2.ini -v -nf d1532_34_2.fb -t 2 -nc2


Msieve v. 1.49 (SVN 18sep2011)
Tue May 1 12:48:30 2012
random seeds: a5a730ef e813bf12
factoring 75535181078074845018437161937882151367685849773415089372524470237498845846659369667108907186263191136699 (104 digits)
searching for 15-digit factors
commencing number field sieve (104-digit input)
R0: -30343810840966799284043776
R1: 1
A0: 1
A1: 0
A2: 766
A3: 0
A4: 1173512
skew 0.03, size 5.861e-12, alpha 1.196, combined = 8.786e-08 rroots = 0

commencing linear algebra
read 61231 cycles
cycles contain 186631 unique relations
read 186631 relations
using 20 quadratic characters above 33539634
building initial matrix
memory use: 21.7 MB
read 61231 cycles
matrix is 61053 x 61231 (18.6 MB) with weight 5560889 (90.82/col)
sparse part has weight 4193873 (68.49/col)
filtering completed in 2 passes
matrix is 61050 x 61228 (18.6 MB) with weight 5560797 (90.82/col)
sparse part has weight 4193853 (68.50/col)
matrix starts at (0, 0)
matrix is 61050 x 61228 (18.6 MB) with weight 5560797 (90.82/col)
sparse part has weight 4193853 (68.50/col)
saving the first 48 matrix rows for later
matrix includes 64 packed rows
matrix is 61002 x 61228 (17.3 MB) with weight 4389705 (71.69/col)
sparse part has weight 3921569 (64.05/col)

using block size 24400 for processor cache size 1024 kB
commencing Lanczos iteration
memory use: 12.7 MB
linear algebra at 78.7%, ETA 0h 0m61228 dimensions (78.7%, ETA 0h 0m)
linear algebra completed 21442763 of 61228 dimensions (35021.2%, ETA 925h 2m)
[/code]
Notice completed dimensions is much larger than total dimensions. I've killed it since it was going nowhere, but saved all the files it created.

Any idea what was wrong?

Chris

PS. The SVN 18sep2011 is because I don't have SVN set up on this system, I tweaked the makefile to make it say 18sep2011 because I downloaded msieve then.

jasonp 2012-05-01 16:39

Looks like the LA is not converging.

It's a small enough dataset that I can download it if you post it somewhere, but offhand I'm wondering how stable the machine is. A small matrix like this does not have error checking enabled, so if it's a transient glitch you can try rerunning.


All times are UTC. The time now is 04:52.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.