2018-06-21, 15:58  #56 
"Mark"
Apr 2003
Between here and the
173E_{16} Posts 
To this point I have only been using extended FPU and SSE routines within the sieving code. Starting with code written by Ernst, I have written AVX routines that will improve performance on the CPU. I estimate between 30% and 50% faster sieving. Now that I understand AVX much better, AVX-512 is a possibility, but I don't have access to a CPU with AVX-512 support.
It will take time for me to integrate them into the various sieves, and some sieves will not be good candidates for the AVX routines. One example of this is cksieve. I will need to evaluate that separately. 
2018-06-25, 20:36  #57 
"Mark"
Apr 2003
Between here and the
2×5^{2}×7×17 Posts 
I have released mtsieve 1.6. Here are the changes:
Code:
Fixed an error with factor rate calculation when less than 1 per second.
Fixed an issue with gfndsieve when continuing a sieve and k < n.
For kbbsieve, added some checks for algebraic factorizations.
Added gcwsieve for Cullens and Woodalls. This sieve is GPU enabled.
Renamed all ASM routines to easily distinguish FPU/SSE/AVX.
Added AVX asm code for use by the Worker classes.
Added a minichunk mode that can be used when the worker classes handle primes in chunks, such as AVX mode, which is chunks of 16 primes.
gcwsieve supports AVX. The CPU-only code is about 30% faster than Geoff Reynolds' version.
xyyxsieve supports AVX. The CPU-only code is about 2.5x faster than the previous version.

I expect the AVX routines to fail on non-Windows OSes. If they do then I know what I need to fix. It is a matter of finding the time. I've been doing some refactoring with the hope that using this framework becomes easier for others once that is done. If anyone is truly interested in helping me, I need help adding AVX support to the other sieves. If interested, please contact me via PM or email. 
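To picture what "handling primes in chunks of 16" means in the changelog above, here is a rough Python sketch. It is not taken from mtsieve (which is C++ with ASM) and does not describe how minichunk mode works internally. The function name `split_into_chunks` is hypothetical, and sending leftover primes to a non-AVX path is an assumption, not something stated in the post.

```python
CHUNK_SIZE = 16  # chunk size quoted in the changelog for AVX mode


def split_into_chunks(primes, chunk_size=CHUNK_SIZE):
    """Split a list of primes into full chunks plus a leftover tail.

    Hypothetical helper: full chunks would go to a chunked (AVX-style)
    routine, and the tail (fewer than chunk_size primes) is assumed to
    be handled one prime at a time.
    """
    full_len = len(primes) - len(primes) % chunk_size
    chunks = [primes[i:i + chunk_size] for i in range(0, full_len, chunk_size)]
    tail = primes[full_len:]
    return chunks, tail


# Stand-in values, not real primes: 40 items give 2 full chunks and 8 left over.
example_chunks, example_tail = split_into_chunks(list(range(40)))
print(len(example_chunks), len(example_tail))
```
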
2018-06-26, 08:25  #58 
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2^{2}×1,433 Posts 
I need to give you my code for at some point. I need to work out whether it is TestPrimeChunk or BuildBNRemainders taking more time. If it is BuildBNRemainders then I will probably add a third variable base and n. I do want to at least add in support for a fixed multiplier.
It also needs to be converted to use SSE2/AVX where possible. 
2018-06-26, 16:36  #59  
"Mark"
Apr 2003
Between here and the
173E_{16} Posts 
Quote:
You need to call CpuSupportsAvx() to determine if the CPU has AVX support. This is declared in Worker.h. If it returns false then you need to code with the FPU or SSE routines. I suggest that you take a look at avxasmx86.h as well as CullenWoodallWorker.cpp to help familiarize yourself with the AVX routines. One other caveat is that you cannot use the AVX code and have data of type double (double * is fine). This is because the AVX routines don't save the xmm registers upon entry and restore them upon exit. I'll address that limitation in an upcoming release. 
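The control flow described above (check for AVX once, otherwise fall back to the FPU/SSE routines) can be sketched as follows. This is only an illustration in Python; the real code is C++ and calls CpuSupportsAvx() from Worker.h. The names `pick_routine`, `avx_routine` and `fpu_sse_routine` are hypothetical, and the AVX check is passed in as a plain flag rather than detected.

```python
def fpu_sse_routine(primes):
    # Hypothetical stand-in for a worker path built on the FPU/SSE routines.
    return ("fpu/sse", len(primes))


def avx_routine(primes):
    # Hypothetical stand-in for a worker path built on the AVX routines.
    return ("avx", len(primes))


def pick_routine(cpu_supports_avx):
    """Choose the worker routine once, based on the capability check.

    cpu_supports_avx plays the role of the value returned by
    CpuSupportsAvx(); here it is supplied by the caller.
    """
    return avx_routine if cpu_supports_avx else fpu_sse_routine


routine = pick_routine(cpu_supports_avx=False)
print(routine([3, 5, 7]))
```

Choosing the routine once up front keeps the capability check out of the inner sieving loop.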

2018-07-21, 21:44  #60 
"Mark"
Apr 2003
Between here and the
1011100111110_{2} Posts 
I have posted mtsieve 1.7 to my website. Here are the changes:
Code:
Added a timestamp to lines written to the log.
Changed usage of some registers in the AVX code to avoid ymm0-ymm3 being passed between calls to AVX routines.
Added psieve for primorials. psieve supports AVX and is about 30% faster than fpsieve. 
2018-07-25, 08:34  #61 
Dec 2011
After milion nines:)
2^{2}×337 Posts 
fbncsieve v1.3.1, a program to find factors of k*b^n+c numbers for fixed b, n, and c and variable k
Sieve started: 92230982980247 < p < 122230982980247 with 33803 terms
p=92471161196761, 25.35M p/sec, 1 factors found at 308 sec per factor, 0.8% done. ETC 2018-07-25 21:07

So simple math shows that the speed of this sieve is not 25350000 but much higher: 952380952. Or does 25.35M p/sec mean something different?
Last fiddled with by pepi37 on 2018-07-25 at 08:35 
2018-07-25, 08:51  #62  
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2^{2}·1,433 Posts 
Quote:


2018-07-25, 09:40  #63 
Jun 2003
5×23×41 Posts 
Sure sounds like it. log(92471161196761) ~= 32. So there will be a factor of 32 between the prime range method and the prime count method. But OP's calculation has a factor of 37 - don't know how.
(122230982980247 - 92230982980247) * 0.8% / 308 ~= 779M, not 952M. 
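The two readings of "p/sec" above can be checked with a few lines of arithmetic. This is a rough sketch, assuming about 308 seconds had elapsed (one factor at 308 sec per factor) and using the prime number theorem's density of roughly 1/ln(p) primes near p. The variable names are illustrative and nothing here comes from the mtsieve source.

```python
import math

# Figures copied from the sieve status line quoted in the thread.
p_min = 92230982980247
p_max = 122230982980247
p_now = 92471161196761

# Assumption: about 308 seconds elapsed (one factor at 308 sec per factor).
elapsed_sec = 308

fraction_done = (p_now - p_min) / (p_max - p_min)  # share of the p range covered
range_rate = (p_now - p_min) / elapsed_sec         # integers of p covered per second

# Prime number theorem: near p, roughly 1 in ln(p) integers is prime.
ln_p = math.log(p_now)
prime_rate = range_rate / ln_p                     # estimated primes tested per second

print(f"fraction done : {fraction_done:.2%}")
print(f"range rate    : {range_rate / 1e6:.0f}M per sec")
print(f"ln(p)         : {ln_p:.1f}")
print(f"prime rate    : {prime_rate / 1e6:.1f}M primes per sec")
```

Under these assumptions the range rate is close to 780M per second, and dividing by ln(p) gives roughly 24M primes per second, within a few percent of the reported 25.35M p/sec. That is consistent with p/sec counting primes tested rather than the width of the p range covered.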
2018-07-25, 10:21  #64 
Einyen
Dec 2003
Denmark
3·23·43 Posts 
2 minor cosmetic errors with fkbnsieve:
The variable showing the number of terms it is about to sieve is a signed 32-bit variable, but it still works even if the c interval is above 2^31 and above 2^32. The c and C are also called kmin and kmax instead of cmin and cmax. 
2018-07-25, 16:18  #65 
"Mark"
Apr 2003
Between here and the
2×5^{2}×7×17 Posts 

2018-07-25, 16:23  #66  
"Mark"
Apr 2003
Between here and the
2×5^{2}×7×17 Posts 
Quote:

