mersenneforum.org


fivemack 2018-08-04 12:40

Long vectors work well for me
 
After spending most of yesterday shaving yaks (mainly building gcc-8.2.0, since gcc-5.4 doesn't generate vector instructions for the VBITS=256 code in msieve-lacuda), I have 256-bit-wide vectors working on my Skylake Xeon box, and they're substantially faster than 64-bit-wide vectors: about 40 hours rather than about 60 hours for a 13M matrix.

pinhodecarlos 2018-08-04 13:27

Windows binaries in here:

[url]http://www.mersenneforum.org/showpost.php?p=479462&postcount=61[/url]

I think I’m using the 256-bit version as well.

wombatman 2018-08-04 19:25

Would a 128-bit or 256-bit build of msieve help on an Ivy Bridge?

ATH 2018-08-04 20:16

[QUOTE=pinhodecarlos;493128]Windows binaries in here:

[url]http://www.mersenneforum.org/showpost.php?p=479462&postcount=61[/url]

I think I’m using the 256-bit version as well.[/QUOTE]

Are you sure those work properly? They were compiled with GCC 7.3.0, not GCC 8.2.0. Will these 256-bit vectors only work on Skylake processors?

pinhodecarlos 2018-08-04 20:29

My last 10 msieve post-processing runs were done with it, with no issues. The laptop has an Ivy Bridge processor.

fivemack 2018-08-04 20:30

The makefile does -march=native so a build on a Skylake Xeon might well not work on IVB or even on a Skylake non-Xeon. I’ll try doing a build on my IVB machine tomorrow and see if I can produce some timings.
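The portability concern with -march=native can be checked before copying a binary between machines. The following is a minimal sketch, assuming x86 Linux where /proc/cpuinfo has a `flags` line; the function names and the two sample flag lists are hypothetical and heavily abbreviated, not taken from the machines in this thread.

```python
# Sketch: which vector ISA extensions does the build machine have that the
# target machine lacks? A -march=native binary may use any of them.
# Assumes the x86 Linux /proc/cpuinfo layout ("flags : ...").

VECTOR_FLAGS = ("sse2", "avx", "avx2", "avx512f")


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_on_target(build_flags, target_flags):
    """Vector extensions present on the build machine but absent on the target."""
    return [f for f in VECTOR_FLAGS if f in build_flags and f not in target_flags]


if __name__ == "__main__":
    # Hypothetical, abbreviated flag lines for illustration only.
    build = cpu_flags("flags : fpu sse2 avx avx2 avx512f\n")
    target = cpu_flags("flags : fpu sse2 avx\n")
    print(missing_on_target(build, target))
```

An empty list does not prove the binary will run, since -march=native also enables non-vector extensions, but a non-empty list is a strong hint that it may crash with an illegal-instruction error.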

pinhodecarlos 2018-08-04 20:39

Apologies, I was typing on my iPhone. I've just checked my folder and confirmed that the msieve version I've been using lately is msieve-svn1018-vbits128-sandybridge. At least, that's the only one I have unzipped in a separate folder, although I have all three versions in the root.

ATH 2018-08-04 21:58

[QUOTE=pinhodecarlos;493174]Apologies, I was typing on my iPhone. I've just checked my folder and confirmed that the msieve version I've been using lately is msieve-svn1018-vbits128-sandybridge. At least, that's the only one I have unzipped in a separate folder, although I have all three versions in the root.[/QUOTE]

What I meant was: do the 128-bit vectors give any improvement over 64-bit when compiled with GCC 7.3.0?

pinhodecarlos 2018-08-04 22:41

[QUOTE=ATH;493180]What I meant was: do the 128-bit vectors give any improvement over 64-bit when compiled with GCC 7.3.0?[/QUOTE]

This is what I recall:

[url]http://www.mersenneforum.org/showpost.php?p=479752&postcount=14[/url]

fivemack 2018-08-05 14:08

I fixed up the code so that VBITS=512 compiled and successfully started the linear algebra, but (on a machine with AVX512) it is less than half the speed of VBITS=256; my suspicion is that all the time is now spent in the B*N * N*B matrix multiply, which probably can be improved but would be a bit of work.
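To illustrate why the B*N by N*B product could start to dominate as the vector width B grows, here is a naive plain-Python sketch of that product over GF(2). It is not msieve's implementation (which this thread does not show), and the function name is hypothetical. Each of the N rows can trigger up to B XORs of B-bit words, so the bit-level work per product grows like N·B², whereas a sparse matrix-vector step does one B-bit XOR per nonzero and so grows only linearly in B.

```python
def gf2_inner_product(x, y, b):
    """Return X^T * Y over GF(2) as a list of b integers (one b-bit row each).

    x and y are lists of N integers; each integer holds one b-bit row of an
    N x b bit matrix. Row i of the result is the XOR of every y[k] for which
    bit i of x[k] is set.
    """
    mask = (1 << b) - 1
    result = [0] * b
    for xk, yk in zip(x, y):
        xk &= mask          # ignore any stray bits above the block width
        i = 0
        while xk:
            if xk & 1:
                result[i] ^= yk & mask
            xk >>= 1
            i += 1
    return result


if __name__ == "__main__":
    # Tiny 3 x 2 example; real blocks would be 64 to 512 bits wide.
    print(gf2_inner_product([0b01, 0b11, 0b10], [0b10, 0b01, 0b11], 2))
```

Production code typically replaces the inner bit loop with byte-indexed lookup tables, which is the kind of work that "can be improved" would involve.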

Ah, doing an objdump on the executable indicates that it's using pairs of ymm registers rather than single zmm registers; will need to fiddle around more with compilation options. Simply using -march=skylake-avx512 doesn't help: still nothing in 'objdump -d msieve | grep zmm'
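The grep check above can be extended to count how often each register class appears, which makes a "pairs of ymm instead of zmm" situation easy to spot. This is a small sketch: `objdump -d` is the real command used above, while the helper names and the sample disassembly lines are made up for illustration.

```python
import re
import subprocess

# Matches xmm0..xmm31, ymm0..ymm31, zmm0..zmm31 in AT&T or Intel syntax.
_REGISTER = re.compile(r"\b([xyz]mm)\d+\b")


def count_vector_registers(disassembly):
    """Count mentions of xmm, ymm and zmm registers in disassembly text."""
    counts = {"xmm": 0, "ymm": 0, "zmm": 0}
    for match in _REGISTER.finditer(disassembly):
        counts[match.group(1)] += 1
    return counts


def disassemble(path):
    """Run 'objdump -d' on a binary and return its text output."""
    done = subprocess.run(["objdump", "-d", path],
                          capture_output=True, text=True, check=True)
    return done.stdout


if __name__ == "__main__":
    # Made-up sample lines; replace with disassemble("msieve") on a real build.
    sample = "vpxor %ymm1,%ymm0,%ymm0\nvpxorq %zmm1,%zmm0,%zmm0\nmovdqa %xmm2,%xmm3\n"
    print(count_vector_registers(sample))
```

A zmm count of zero in a VBITS=512 build would match the observation above.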

fivemack 2018-08-06 10:18

Here are some timings on an Ivy Bridge machine (six-core, running -t6, with a 13.52M matrix from C234_138_83 because that's what I had available):

[code]
SIZE  PREFETCH  Mdim/day
64    N         3.35
128   N         4.07
256   N         3.86
64    Y         3.41
128   Y         4.15
256   Y         4.00
[/code]

So on this machine, running msieve-lacuda SVN1022 compiled with gcc-8.2.0 -march=native, MANUAL_PREFETCH is a small but consistent plus (roughly 2-4%) and 128-bit vectors are better than 64-bit, but 256-bit vectors are slower than 128-bit.
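For anyone who wants the relative numbers, a short sketch that turns the table into ratios; the figures are copied from the table, while the variable and function names are just illustrative.

```python
# (VBITS, MANUAL_PREFETCH) -> Mdim/day, copied from the Ivy Bridge table.
TIMINGS = {
    (64, False): 3.35, (128, False): 4.07, (256, False): 3.86,
    (64, True): 3.41, (128, True): 4.15, (256, True): 4.00,
}


def speedup_vs_baseline(timings, baseline=(64, False)):
    """Throughput of every configuration relative to the baseline one."""
    base = timings[baseline]
    return {config: rate / base for config, rate in timings.items()}


def prefetch_gain(timings):
    """Throughput ratio prefetch-on / prefetch-off for each vector width."""
    widths = sorted({bits for bits, _ in timings})
    return {bits: timings[(bits, True)] / timings[(bits, False)] for bits in widths}


if __name__ == "__main__":
    for config, ratio in sorted(speedup_vs_baseline(TIMINGS).items()):
        print(config, f"{ratio:.3f}")
    for bits, ratio in prefetch_gain(TIMINGS).items():
        print(bits, f"{ratio:.3f}")
```

The best configuration here, 128-bit with prefetch, comes out about 24% faster than 64-bit without prefetch.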

