mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2018-06-21, 15:58   #56
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

173E16 Posts
Default

To this point I have only been using extended FPU and SSE routines within the sieving code. Starting with code written by Ernst, I have written AVX routines that will improve performance on the CPU. I estimate between 30% and 50% faster sieving. Now that I understand AVX much better, AVX512 is a possibility, but I don't have access to a CPU with AVX512 support.

It will take time for me to integrate into the various sieves and some sieves will not be a good candidate for the AVX routines. One example of this is cksieve. I will need to evaluate that separately.
rogue is offline   Reply With Quote
Old 2018-06-25, 20:36   #57
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

2×52×7×17 Posts
Default

I have released mtsieve 1.6. Here are the changes:

Code:
   Fixed an error with factor rate calculation when less than 1 per second.
   Fixed an issue with gfndsieve when continuing a sieve and k < n.
   For kbbsieve, added some checks for algebraic factorizations.
   Added gcwsieve for Cullens and Woodalls.  This sieve is GPU enabled.   
   Renamed all ASM routines to easily distinguish FPU/SSE/AVX.

   Added AVX asm code for use by the Worker classes.
   Added a mini-chunk mode that can be used when the worker classes
   handles primes in chunks, such as AVX mode, which is chunks of 16 primes.

   gcwsieve supports AVX.  The CPU-only code is about 30% faster than Geoff Reynold's version.
   xyyxsieve supports AVX.  The CPU-only code is about 2.5x faster than the previous version.
You can get the latest code with Windows builds here.

I expect the AVX routines to fail on non-Windows OSes. If they do then I know what I need to fix. It is a matter of finding the time. I've been doing some refactoring with the hope that using this framework becomes easier for others once that is done.

If anyone is truly interested in helping me, I need help adding AVX support to the other sieves. If interested, please contact me via PM or e-mail.
rogue is offline   Reply With Quote
Old 2018-06-26, 08:25   #58
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

22×1,433 Posts
Default

I need to give you my code for b_1^{n_1}b_2^{n_2}+c at some point. I need to work out whether it is TestPrimeChunk or BuildBNRemainders taking more time. If it is BuildBNRemainders then I will probably add a third variable base and n. I do want to at least add in support for a fixed multiplier.

It also needs to be converted to use SSE2/AVX where possible.
henryzz is offline   Reply With Quote
Old 2018-06-26, 16:36   #59
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

173E16 Posts
Default

Quote:
Originally Posted by henryzz View Post
I need to give you my code for b_1^{n_1}b_2^{n_2}+c at some point. I need to work out whether it is TestPrimeChunk or BuildBNRemainders taking more time. If it is BuildBNRemainders then I will probably add a third variable base and n. I do want to at least add in support for a fixed multiplier.

It also needs to be converted to use SSE2/AVX where possible.
Good to know that someone is trying to use my framework for their own benefit.

You need to call CpuSupportsAvx() to determine if the CPU has AVX support. This is declared in Worker.h. If it returns false then you need to code with the FPU or SSE routines.

I suggest that you a look at avx-asm-x86.h as well as CullenWoodallWorker.cpp to help familiarize yourself with the AVX routines.

One other caveat is that you cannot use the AVX code and have data of type double (double * is fine). This is because the AVX routines don't save the xmm registers upon entry and restore them upon exit. I'll address that limitation in an upcoming release.
rogue is offline   Reply With Quote
Old 2018-07-21, 21:44   #60
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

10111001111102 Posts
Default

I have posted mtsieve 1.7 to my website. Here are the changes:

Code:
Added a timestamp to liens written to the log.
Canged usage of some registers in the AVX code to avoid ymm0-ymm3 being passed between calls to AVX routines.
Added psieve for primorials.

psieve supports AVX and is about 30% faster than fpsieve.
rogue is offline   Reply With Quote
Old 2018-07-25, 08:34   #61
pepi37
 
pepi37's Avatar
 
Dec 2011
After milion nines:)

22×337 Posts
Default

fbncsieve v1.3.1, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k
Sieve started: 92230982980247 < p < 122230982980247 with 33803 terms
p=92471161196761, 25.35M p/sec, 1 factors found at 308 sec per factor, 0.8% done. ETC 2018-07-25 21:07


So simple math show that speed of this sieve is not 25350000 but much higher 952380952


Or 25.35M p/sec means something different?

Last fiddled with by pepi37 on 2018-07-25 at 08:35
pepi37 is online now   Reply With Quote
Old 2018-07-25, 08:51   #62
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

22·1,433 Posts
Default

Quote:
Originally Posted by pepi37 View Post
fbncsieve v1.3.1, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k
Sieve started: 92230982980247 < p < 122230982980247 with 33803 terms
p=92471161196761, 25.35M p/sec, 1 factors found at 308 sec per factor, 0.8% done. ETC 2018-07-25 21:07


So simple math show that speed of this sieve is not 25350000 but much higher 952380952


Or 25.35M p/sec means something different?
Primes vs range of p?
henryzz is offline   Reply With Quote
Old 2018-07-25, 09:40   #63
axn
 
axn's Avatar
 
Jun 2003

5×23×41 Posts
Default

Quote:
Originally Posted by henryzz View Post
Primes vs range of p?
Sure sounds like it. log(92471161196761) ~= 32. So there will be a factor of 32 between the prime range method and the prime count method. But OP's calculation has a factor of 37 -- don't know how.
(122230982980247 - 92230982980247 )*0.8% / 308 ~= 779m, not 952m.
axn is offline   Reply With Quote
Old 2018-07-25, 10:21   #64
ATH
Einyen
 
ATH's Avatar
 
Dec 2003
Denmark

3·23·43 Posts
Default

2 minor cosmetic errors with fkbnsieve:
The variable showing the number of terms it is about to sieve is a signed 32 bit variable, but it still works even if the c interval is above 2^31 and above 2^32.
The -c and -C are also called kmin and kmax instead of cmin and cmax.
ATH is offline   Reply With Quote
Old 2018-07-25, 16:18   #65
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

2×52×7×17 Posts
Default

Quote:
Originally Posted by henryzz View Post
Primes vs range of p?
Correct. "range of p" is really primes between -p and -P. The software only tests primes in that range, so counting primes that were tested makes a lot more sense.
rogue is offline   Reply With Quote
Old 2018-07-25, 16:23   #66
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

2×52×7×17 Posts
Default

Quote:
Originally Posted by ATH View Post
2 minor cosmetic errors with fkbnsieve:
The variable showing the number of terms it is about to sieve is a signed 32 bit variable, but it still works even if the c interval is above 2^31 and above 2^32.
The -c and -C are also called kmin and kmax instead of cmin and cmax.
Thanks for the heads-up. I'll fix in the next release.
rogue is offline   Reply With Quote
Reply

Thread Tools


All times are UTC. The time now is 04:35.

Mon Oct 26 04:35:17 UTC 2020 up 46 days, 1:46, 0 users, load averages: 2.39, 2.62, 2.65

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.