mersenneforum.org mtsieve
 Register FAQ Search Today's Posts Mark Forums Read

 2018-06-21, 15:58 #56 rogue     "Mark" Apr 2003 Between here and the 173E16 Posts To this point I have only been using extended FPU and SSE routines within the sieving code. Starting with code written by Ernst, I have written AVX routines that will improve performance on the CPU. I estimate between 30% and 50% faster sieving. Now that I understand AVX much better, AVX512 is a possibility, but I don't have access to a CPU with AVX512 support. It will take time for me to integrate into the various sieves and some sieves will not be a good candidate for the AVX routines. One example of this is cksieve. I will need to evaluate that separately.
 2018-06-25, 20:36 #57 rogue     "Mark" Apr 2003 Between here and the 2×52×7×17 Posts I have released mtsieve 1.6. Here are the changes: Code:  Fixed an error with factor rate calculation when less than 1 per second. Fixed an issue with gfndsieve when continuing a sieve and k < n. For kbbsieve, added some checks for algebraic factorizations. Added gcwsieve for Cullens and Woodalls. This sieve is GPU enabled. Renamed all ASM routines to easily distinguish FPU/SSE/AVX. Added AVX asm code for use by the Worker classes. Added a mini-chunk mode that can be used when the worker classes handles primes in chunks, such as AVX mode, which is chunks of 16 primes. gcwsieve supports AVX. The CPU-only code is about 30% faster than Geoff Reynold's version. xyyxsieve supports AVX. The CPU-only code is about 2.5x faster than the previous version. You can get the latest code with Windows builds here. I expect the AVX routines to fail on non-Windows OSes. If they do then I know what I need to fix. It is a matter of finding the time. I've been doing some refactoring with the hope that using this framework becomes easier for others once that is done. If anyone is truly interested in helping me, I need help adding AVX support to the other sieves. If interested, please contact me via PM or e-mail.
 2018-06-26, 08:25 #58 henryzz Just call me Henry     "David" Sep 2007 Cambridge (GMT/BST) 22×1,433 Posts I need to give you my code for $b_1^{n_1}b_2^{n_2}+c$ at some point. I need to work out whether it is TestPrimeChunk or BuildBNRemainders taking more time. If it is BuildBNRemainders then I will probably add a third variable base and n. I do want to at least add in support for a fixed multiplier. It also needs to be converted to use SSE2/AVX where possible.
2018-06-26, 16:36   #59
rogue

"Mark"
Apr 2003
Between here and the

173E16 Posts

Quote:
 Originally Posted by henryzz I need to give you my code for $b_1^{n_1}b_2^{n_2}+c$ at some point. I need to work out whether it is TestPrimeChunk or BuildBNRemainders taking more time. If it is BuildBNRemainders then I will probably add a third variable base and n. I do want to at least add in support for a fixed multiplier. It also needs to be converted to use SSE2/AVX where possible.
Good to know that someone is trying to use my framework for their own benefit.

You need to call CpuSupportsAvx() to determine if the CPU has AVX support. This is declared in Worker.h. If it returns false then you need to code with the FPU or SSE routines.

I suggest that you a look at avx-asm-x86.h as well as CullenWoodallWorker.cpp to help familiarize yourself with the AVX routines.

One other caveat is that you cannot use the AVX code and have data of type double (double * is fine). This is because the AVX routines don't save the xmm registers upon entry and restore them upon exit. I'll address that limitation in an upcoming release.

 2018-07-21, 21:44 #60 rogue     "Mark" Apr 2003 Between here and the 10111001111102 Posts I have posted mtsieve 1.7 to my website. Here are the changes: Code: Added a timestamp to liens written to the log. Canged usage of some registers in the AVX code to avoid ymm0-ymm3 being passed between calls to AVX routines. Added psieve for primorials. psieve supports AVX and is about 30% faster than fpsieve.
 2018-07-25, 08:34 #61 pepi37     Dec 2011 After milion nines:) 22×337 Posts fbncsieve v1.3.1, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k Sieve started: 92230982980247 < p < 122230982980247 with 33803 terms p=92471161196761, 25.35M p/sec, 1 factors found at 308 sec per factor, 0.8% done. ETC 2018-07-25 21:07 So simple math show that speed of this sieve is not 25350000 but much higher 952380952 Or 25.35M p/sec means something different? Last fiddled with by pepi37 on 2018-07-25 at 08:35
2018-07-25, 08:51   #62
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

22·1,433 Posts

Quote:
 Originally Posted by pepi37 fbncsieve v1.3.1, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k Sieve started: 92230982980247 < p < 122230982980247 with 33803 terms p=92471161196761, 25.35M p/sec, 1 factors found at 308 sec per factor, 0.8% done. ETC 2018-07-25 21:07 So simple math show that speed of this sieve is not 25350000 but much higher 952380952 Or 25.35M p/sec means something different?
Primes vs range of p?

2018-07-25, 09:40   #63
axn

Jun 2003

5×23×41 Posts

Quote:
 Originally Posted by henryzz Primes vs range of p?
Sure sounds like it. log(92471161196761) ~= 32. So there will be a factor of 32 between the prime range method and the prime count method. But OP's calculation has a factor of 37 -- don't know how.
(122230982980247 - 92230982980247 )*0.8% / 308 ~= 779m, not 952m.

 2018-07-25, 10:21 #64 ATH Einyen     Dec 2003 Denmark 3·23·43 Posts 2 minor cosmetic errors with fkbnsieve: The variable showing the number of terms it is about to sieve is a signed 32 bit variable, but it still works even if the c interval is above 2^31 and above 2^32. The -c and -C are also called kmin and kmax instead of cmin and cmax.
2018-07-25, 16:18   #65
rogue

"Mark"
Apr 2003
Between here and the

2×52×7×17 Posts

Quote:
 Originally Posted by henryzz Primes vs range of p?
Correct. "range of p" is really primes between -p and -P. The software only tests primes in that range, so counting primes that were tested makes a lot more sense.

2018-07-25, 16:23   #66
rogue

"Mark"
Apr 2003
Between here and the

2×52×7×17 Posts

Quote:
 Originally Posted by ATH 2 minor cosmetic errors with fkbnsieve: The variable showing the number of terms it is about to sieve is a signed 32 bit variable, but it still works even if the c interval is above 2^31 and above 2^32. The -c and -C are also called kmin and kmax instead of cmin and cmax.
Thanks for the heads-up. I'll fix in the next release.