#1 |
(loop (#_fork))
Feb 2006
Cambridge, England
14423₈ Posts |
After spending most of yesterday shaving yaks (mainly building gcc-8.2.0, since gcc-5.4 doesn't generate vector instructions for the VBITS=256 code in msieve-lacuda), I have 256-bit-wide vectors working on my Skylake Xeon box, and they're substantially faster than 64-bit-wide vectors: about 40 hours rather than about 60 hours for a 13M matrix.
#2 |
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
3·17·97 Posts |
Windows binaries in here:
http://www.mersenneforum.org/showpos...2&postcount=61
I think I’m using the 256-bit version as well.
#3 |
I moo ablest echo power!
May 2013
29×61 Posts |
Would using a 128-bit or 256-bit build of msieve help on an Ivy Bridge?
#4 | |
Einyen
Dec 2003
Denmark
3⁵·13 Posts |
Quote:
Last fiddled with by ATH on 2018-08-04 at 20:17 |
#5 |
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
3·17·97 Posts |
My last 10 msieve post-processing runs were done with it, with no issues. The laptop has an Ivy Bridge processor.
#6 |
(loop (#_fork))
Feb 2006
Cambridge, England
7²·131 Posts |
The makefile does -march=native so a build on a Skylake Xeon might well not work on IVB or even on a Skylake non-Xeon. I’ll try doing a build on my IVB machine tomorrow and see if I can produce some timings.
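A rough way to see why a `-march=native` binary may not carry over to another machine is to compare the instruction-set flags the build assumes with the flags the target CPU reports (on Linux, the `flags` line of `/proc/cpuinfo`). The sketch below does that comparison. The function name and the abbreviated flag lists are hypothetical and chosen for illustration; nothing here is taken from the msieve makefile.

```python
def missing_features(required, cpuinfo_text):
    """Return the required CPU flags that are absent from /proc/cpuinfo-style text."""
    host_flags = set()
    for line in cpuinfo_text.splitlines():
        # Linux reports features on lines such as "flags : fpu sse2 avx ..."
        if line.startswith("flags") and ":" in line:
            host_flags.update(line.split(":", 1)[1].split())
    return sorted(set(required) - host_flags)


# Abbreviated, illustrative flag lists (assumptions, not real cpuinfo dumps).
SKYLAKE_XEON_BUILD = ["avx", "avx2", "avx512f"]
IVY_BRIDGE_HOST = "flags\t\t: fpu sse2 ssse3 sse4_1 sse4_2 avx f16c rdrand"

# Features the build would assume but the Ivy Bridge host does not report.
print(missing_features(SKYLAKE_XEON_BUILD, IVY_BRIDGE_HOST))
```

On a real machine the second argument would be the contents of `/proc/cpuinfo`. A non-empty result suggests the binary could hit an illegal-instruction fault on that host, which is why a per-architecture build is the safer option.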
#7 |
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
3×17×97 Posts |
Apologies, I was typing on my iPhone. I've just checked my folder and confirmed that the msieve version I've been using lately is msieve-svn1018-vbits128-sandybridge. At least, that's the only one I have unzipped in a separate folder, although I have all three versions in the root.
Last fiddled with by pinhodecarlos on 2018-08-04 at 20:41 |
#8 | |
Einyen
Dec 2003
Denmark
6127₈ Posts |
Quote:
#9 | |
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
4947₁₀ Posts |
Quote:
http://www.mersenneforum.org/showpos...2&postcount=14 |
#10 |
(loop (#_fork))
Feb 2006
Cambridge, England
7²×131 Posts |
I fixed up the code so that VBITS=512 compiled and successfully started the linear algebra, but (on a machine with AVX512) it is less than half the speed of VBITS=256; my suspicion is that all the time is now spent in the B*N * N*B matrix multiply, which probably can be improved but would be a bit of work.
Ah, doing an objdump on the executable indicates that it's using pairs of ymm registers rather than single zmm registers; will need to fiddle around more with compilation options. Simply using -march=skylake-avx512 doesn't help: still nothing in 'objdump -d msieve | grep zmm'
Last fiddled with by fivemack on 2018-08-05 at 16:18
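The `objdump -d msieve | grep zmm` check can be made a little more informative by tallying how often each vector register class appears in the disassembly. The following is a minimal sketch of that idea; the function name and the hand-written sample disassembly are illustrative assumptions, not output from an actual msieve build.

```python
import re
from collections import Counter


def count_vector_registers(disassembly):
    """Count references to xmm/ymm/zmm registers in objdump-style text."""
    # Matches names such as %ymm0 or zmm31 and keeps only the class (xmm/ymm/zmm).
    return Counter(m.group(1) for m in re.finditer(r"\b([xyz]mm)\d+\b", disassembly))


# Hand-written sample in AT&T syntax (an assumption for illustration only).
SAMPLE = """
vmovdqa (%rdi),%ymm0
vpxor   (%rsi),%ymm0,%ymm0
vmovdqa %ymm0,(%rdx)
"""

counts = count_vector_registers(SAMPLE)
print("ymm:", counts["ymm"], "zmm:", counts["zmm"])
```

For a real binary the text could come from `subprocess.run(["objdump", "-d", "msieve"], capture_output=True, text=True).stdout`. A zmm count of zero alongside a large ymm count would match the observation above that the compiler is emitting 256-bit operations.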
#11 |
(loop (#_fork))
Feb 2006
Cambridge, England
1913₁₆ Posts |
Here are some timings on an Ivy Bridge machine (six-core machine, running -t6, 13.52M matrix from C234_138_83 because that's what I had available)
Code:
SIZE  PREFETCH  Mdim/day
 64   N         3.35
128   N         4.07
256   N         3.86
 64   Y         3.41
128   Y         4.15
256   Y         4.00
Last fiddled with by fivemack on 2018-08-06 at 10:19
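To make the Ivy Bridge timings easier to compare, the sketch below expresses each rate relative to the 64-bit run with the same prefetch setting. It reads the SIZE column as the vector width in bits, which is an interpretation rather than something the post states, and the function name is made up for this example.

```python
# Mdim/day figures copied from the timings above: (vector width in bits, prefetch) -> rate
RATES = {
    (64, False): 3.35, (128, False): 4.07, (256, False): 3.86,
    (64, True): 3.41, (128, True): 4.15, (256, True): 4.00,
}


def speedup_over_64(rates):
    """Return each rate relative to the 64-bit run with the same prefetch setting."""
    return {key: round(rate / rates[(64, key[1])], 2) for key, rate in rates.items()}


for (bits, prefetch), ratio in sorted(speedup_over_64(RATES).items()):
    print(f"SIZE={bits:<3} prefetch={'Y' if prefetch else 'N'}  {ratio:.2f}x")
```

On these numbers 128-bit vectors come out slightly ahead of 256-bit on Ivy Bridge (roughly 1.2x versus 1.15x to 1.17x over 64-bit), in contrast to the roughly 60/40 = 1.5x gain reported for 256-bit vectors on the Skylake Xeon in the first post.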