mersenneforum.org  

2018-08-04, 12:40   #1
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
7²·131 Posts

Long vectors work well for me

After spending most of yesterday shaving yaks (mainly building gcc-8.2.0, since gcc-5.4 doesn't generate vector instructions for the VBITS=256 code in msieve-lacuda), I have 256-bit-wide vectors working on my Skylake Xeon box, and they're substantially faster than 64-bit-wide vectors: about 40 hours rather than about 60 hours for a 13M matrix.
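For readers wondering why the vector width matters: msieve's linear algebra is block Lanczos over GF(2), which repeatedly multiplies a sparse 0/1 matrix by a block of vectors packed one bit per vector into each word. The toy sketch below only illustrates that idea. The function name and the use of Python integers as stand-ins for VBITS-wide words are assumptions for illustration, not msieve's actual C kernels.

```python
# Toy sparse GF(2) matrix times a block of packed vectors.
# Each Python int stands in for one VBITS-wide word (illustrative assumption).

def block_matvec(rows, x, vbits):
    """rows[i] lists the columns holding a 1 in row i; x[j] packs vbits bits."""
    mask = (1 << vbits) - 1
    y = []
    for cols in rows:
        acc = 0
        for j in cols:
            acc ^= x[j]  # a single XOR updates all vbits vectors at once
        y.append(acc & mask)
    return y

rows = [[0, 2], [1], [0, 1, 2]]          # hypothetical 3x3 sparse matrix
x64 = [0b1010, 0b0110, 0b0001]           # 64-bit-wide block (only low bits set)
y64 = block_matvec(rows, x64, 64)

# The same matrix pass with 256-bit words carries four times as many vectors.
x256 = [v | (v << 192) for v in x64]
y256 = block_matvec(rows, x256, 256)
```

The matrix is walked once per product regardless of width, so a wider word amortises the memory traffic over more vectors, provided the hardware can XOR that width efficiently.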

2018-08-04, 13:27   #2
pinhodecarlos
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
1353₁₆ Posts

Windows binaries in here:

http://www.mersenneforum.org/showpos...2&postcount=61

Think I’m using the 256 bits version as well.

2018-08-04, 19:25   #3
wombatman
I moo ablest echo power!
May 2013
29×61 Posts

Would using a 128- or 256-bit build of msieve help on an Ivy Bridge?

2018-08-04, 20:16   #4
ATH
Einyen
Dec 2003
Denmark
6127₈ Posts

Quote:
Originally Posted by pinhodecarlos
Windows binaries in here:

http://www.mersenneforum.org/showpos...2&postcount=61

Think I’m using the 256 bits version as well.

Are you sure those work properly? They were compiled with GCC 7.3.0, not GCC 8.2.0. Will these 256-bit vectors only work on Skylake processors?

Last fiddled with by ATH on 2018-08-04 at 20:17

2018-08-04, 20:29   #5
pinhodecarlos
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
3·17·97 Posts

My last 10 msieve post-processing runs were done with it, with no issues. The laptop has an Ivy Bridge processor.

2018-08-04, 20:30   #6
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
7²·131 Posts

The makefile does -march=native so a build on a Skylake Xeon might well not work on IVB or even on a Skylake non-Xeon. I’ll try doing a build on my IVB machine tomorrow and see if I can produce some timings.
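One way to see in advance whether a binary built elsewhere is likely to run is to compare the SIMD features the target CPU reports. The helper below is a rough sketch that parses `/proc/cpuinfo`-style text on Linux. The function name and the shortened sample flags line are made up for illustration, and the flag list is a small subset chosen for this thread.

```python
def simd_flags(cpuinfo_text):
    """Report which of a few SIMD feature flags appear in /proc/cpuinfo-style text."""
    wanted = ("sse2", "avx", "avx2", "avx512f")
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Everything after the first colon is a space-separated flag list.
            present = set(line.partition(":")[2].split())
            return {name: name in present for name in wanted}
    return {name: False for name in wanted}

# Hypothetical, heavily shortened flags line for illustration only.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx\n"
result = simd_flags(sample)
```

On a real Linux machine the input would be `open("/proc/cpuinfo").read()`. A binary compiled with -march=native on a machine reporting avx512f can contain instructions that a CPU without that flag will reject.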

2018-08-04, 20:39   #7
pinhodecarlos
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
1001101010011₂ Posts

Apologies, I was typing on my iPhone. I've just checked my folder and confirmed that the msieve version I've been using lately is msieve-svn1018-vbits128-sandybridge. At least, that's the only one I have unzipped in a separate folder, although I have all three versions in the root.

Last fiddled with by pinhodecarlos on 2018-08-04 at 20:41

2018-08-04, 21:58   #8
ATH
Einyen
Dec 2003
Denmark
3⁵×13 Posts

Quote:
Originally Posted by pinhodecarlos
Apologies, I was typing on my iPhone. I've just checked my folder and confirmed that the msieve version I've been using lately is msieve-svn1018-vbits128-sandybridge. At least, that's the only one I have unzipped in a separate folder, although I have all three versions in the root.

What I meant was: do the 128-bit vectors give any improvement over 64-bit when compiled with GCC 7.3.0?

2018-08-04, 22:41   #9
pinhodecarlos
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
3×17×97 Posts

Quote:
Originally Posted by ATH
What I meant was: do the 128-bit vectors give any improvement over 64-bit when compiled with GCC 7.3.0?

This is what I recall:

http://www.mersenneforum.org/showpos...2&postcount=14

2018-08-05, 14:08   #10
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
7²×131 Posts

I fixed up the code so that VBITS=512 compiled and successfully started the linear algebra, but (on a machine with AVX512) it is less than half the speed of VBITS=256; my suspicion is that all the time is now spent in the B*N * N*B matrix multiply, which probably can be improved but would be a bit of work.

Ah, doing an objdump on the executable indicates that it's using pairs of ymm registers rather than single zmm registers; will need to fiddle around more with compilation options. Simply using -march=skylake-avx512 doesn't help: still nothing in 'objdump -d msieve | grep zmm'
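The `objdump -d msieve | grep zmm` check can be made a little more informative by counting how often each register family appears. The sketch below works on disassembly text. The function name and the two sample disassembly lines are invented for illustration and are not output from the real msieve binary.

```python
import re
from collections import Counter

def count_vector_registers(disassembly):
    """Count mentions of xmm/ymm/zmm registers in objdump -d style text."""
    counts = Counter({"xmm": 0, "ymm": 0, "zmm": 0})
    for kind in re.findall(r"\b([xyz]mm)\d+\b", disassembly):
        counts[kind] += 1
    return counts

# Two made-up lines in the AT&T syntax that objdump prints by default.
sample = (
    "  401000:\tc5 fd ef c1\tvpxor  %ymm1,%ymm0,%ymm0\n"
    "  401004:\tc5 f9 ef c1\tvpxor  %xmm1,%xmm0,%xmm0\n"
)
counts = count_vector_registers(sample)
```

To run it on a real binary, the text could come from `subprocess.run(["objdump", "-d", "msieve"], capture_output=True, text=True).stdout`. A zmm count of zero alongside a large ymm count would match the symptom described above.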

Last fiddled with by fivemack on 2018-08-05 at 16:18

2018-08-06, 10:18   #11
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
14423₈ Posts

Here are some timings on an Ivy Bridge machine (six-core machine, running -t6, 13.52M matrix from C234_138_83 because that's what I had available)

Code:
VBITS	PREFETCH	Mdim/day
64	N		3.35
128	N		4.07
256	N		3.86
64	Y		3.41
128	Y		4.15
256	Y		4.00

So on this machine, running msieve-lacuda SVN1022 compiled with gcc-8.2.0 -march=native, MANUAL_PREFETCH is a noticeable plus, 128-bit vectors are better than 64-bit, but 256-bit vectors are less good than 128-bit.
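To put those rates in relative terms, the figures can be turned into speedups over the 64-bit, no-prefetch build. The snippet below is only arithmetic on the table above. The estimated run times assume the rate stays constant and that total time is simply the 13.52M matrix size divided by Mdim/day, which is an assumption on my part and not something measured in this thread.

```python
# (vector bits, manual prefetch) -> Mdim/day, copied from the table above.
timings = {
    (64, False): 3.35, (128, False): 4.07, (256, False): 3.86,
    (64, True): 3.41, (128, True): 4.15, (256, True): 4.00,
}

baseline = timings[(64, False)]
speedup = {cfg: rate / baseline for cfg, rate in timings.items()}
best = max(timings, key=timings.get)

# Assumed model: days = matrix size in Mdim / rate in Mdim per day.
matrix_mdim = 13.52
est_days = {cfg: matrix_mdim / rate for cfg, rate in timings.items()}

for cfg in sorted(timings):
    print(cfg, f"{speedup[cfg]:.2f}x", f"{est_days[cfg]:.2f} days")
```

Under that model, the best configuration here (128-bit with prefetch) is roughly 24% faster than the 64-bit baseline, while 256-bit with prefetch is roughly 19% faster.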

Last fiddled with by fivemack on 2018-08-06 at 10:19

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.