Long vectors work well for me
After spending most of yesterday shaving yaks (mainly building gcc-8.2.0, since gcc-5.4 doesn't generate vector instructions for the VBITS=256 code in msieve-lacuda), I have 256-bit-wide vectors working on my Skylake Xeon box, and they're substantially faster than 64-bit-wide vectors: about 40 hours rather than about 60 hours for a 13M matrix.
|
Windows binaries in here:
[url]http://www.mersenneforum.org/showpost.php?p=479462&postcount=61[/url] Think I’m using the 256 bits version as well. |
Would a 128-bit or 256-bit build of msieve help on an Ivy Bridge?
|
[QUOTE=pinhodecarlos;493128]Windows binaries in here:
[url]http://www.mersenneforum.org/showpost.php?p=479462&postcount=61[/url] Think I’m using the 256 bits version as well.[/QUOTE] Are you sure those work properly? They were not compiled with GCC 8.2.0 but with GCC 7.3.0. Will these 256 bit vectors only work on Skylake processors? |
My last 10 msieve post-processing runs were done with it, no issues. The laptop has an Ivy Bridge processor.
|
The makefile does -march=native so a build on a Skylake Xeon might well not work on IVB or even on a Skylake non-Xeon. I’ll try doing a build on my IVB machine tomorrow and see if I can produce some timings.
|
Apologies, I was typing through my iphone. I've just confirmed checking my folder that the msieve version I've been using lately is msieve-svn1018-vbits128-sandybridge. At least that's the only one I have unzipped in a separate folder although I have on the root the three versions.
|
[QUOTE=pinhodecarlos;493174]Apologies, I was typing through my iphone. I've just confirmed checking my folder that the msieve version I've been using lately is msieve-svn1018-vbits128-sandybridge. At least that's the only one I have unzipped in a separate folder although I have on the root the three versions.[/QUOTE]
What I meant was: do the 128-bit vectors give any improvement over 64-bit when compiled with GCC 7.3.0? |
[QUOTE=ATH;493180]What I meant was: do the 128-bit vectors give any improvement over 64-bit when compiled with GCC 7.3.0?[/QUOTE]
This is what I recall: [url]http://www.mersenneforum.org/showpost.php?p=479752&postcount=14[/url] |
I fixed up the code so that VBITS=512 compiled and successfully started the linear algebra, but (on a machine with AVX512) it is less than half the speed of VBITS=256; my suspicion is that all the time is now spent in the B*N * N*B matrix multiply, which probably can be improved but would be a bit of work.
Ah, doing an objdump on the executable indicates that it's using pairs of ymm registers rather than single zmm registers; will need to fiddle around more with compilation options. Simply using -march=skylake-avx512 doesn't help: still nothing in 'objdump -d msieve | grep zmm' |
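For anyone wanting a bit more detail than 'objdump -d msieve | grep zmm' gives, here is a rough Python sketch that tallies how many disassembled instructions mention xmm, ymm or zmm registers. The function name `count_vector_registers` and the sample listing are made up for illustration (they are not from msieve), and the sample assumes AT&T-syntax output as objdump prints by default.

```python
import re
from collections import Counter

# Matches x86 SIMD register names such as xmm0, ymm15 or zmm31.
REGISTER = re.compile(r"\b([xyz]mm)\d+\b")


def count_vector_registers(disassembly: str) -> Counter:
    """Count the lines of a disassembly that mention each register class.

    A line using several ymm registers still counts once for "ymm".
    """
    counts = Counter({"xmm": 0, "ymm": 0, "zmm": 0})
    for line in disassembly.splitlines():
        for kind in set(REGISTER.findall(line)):
            counts[kind] += 1
    return counts


# Hypothetical listing, shaped like `objdump -d --no-show-raw-insn` output.
SAMPLE = """
  401000:\tvpxor  %ymm1,%ymm0,%ymm0
  401004:\tvpxor  %ymm3,%ymm1,%ymm2
  401008:\tvpxor  %xmm1,%xmm0,%xmm0
"""

sample_counts = count_vector_registers(SAMPLE)
```

To run it on a real binary, the text could come from `subprocess.run(["objdump", "-d", "msieve"], capture_output=True, text=True).stdout`. A zmm count of zero alongside a large ymm count would match what was described above: the 512-bit operations being carried out as pairs of 256-bit ones.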
Here are some timings on an Ivy Bridge machine (six-core machine, running -t6, 13.52M matrix from C234_138_83 because that's what I had available)
[code]
SIZE  PREFETCH  Mdim/day
64    N         3.35
128   N         4.07
256   N         3.86
64    Y         3.41
128   Y         4.15
256   Y         4.00
[/code]
So on this machine, running msieve-lacuda SVN1022 compiled with gcc-8.2.0 -march=native, MANUAL_PREFETCH is a noticeable plus, 128-bit vectors are better than 64-bit, but 256-bit vectors are less good than 128-bit. |
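To put numbers on those comparisons, a small Python sketch can turn the Ivy Bridge table into ratios. The rates are copied from the table above; the names `RATES`, `speedup` and `prefetch_gain` are just illustrative, and the choice of 64-bit without prefetch as the baseline is an assumption.

```python
# Throughput in Mdim/day from the Ivy Bridge table, keyed by
# (vector width in bits, MANUAL_PREFETCH enabled).
RATES = {
    (64, False): 3.35,
    (128, False): 4.07,
    (256, False): 3.86,
    (64, True): 3.41,
    (128, True): 4.15,
    (256, True): 4.00,
}


def speedup(width, prefetch, base=(64, False)):
    """Throughput of one configuration relative to a baseline configuration."""
    return RATES[(width, prefetch)] / RATES[base]


def prefetch_gain(width):
    """Ratio of prefetch-on to prefetch-off throughput at a given width."""
    return RATES[(width, True)] / RATES[(width, False)]


# Configuration with the highest measured throughput.
best = max(RATES, key=RATES.get)

for (width, prefetch), rate in sorted(RATES.items()):
    label = "Y" if prefetch else "N"
    print(f"{width:>4} {label} {rate:.2f} Mdim/day  x{speedup(width, prefetch):.3f}")
```

On these figures the prefetch gain is a few percent at each width, while going from 64-bit to 128-bit vectors is worth roughly a fifth more throughput.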