[QUOTE=henryzz;570419]Is that comparison without sr1sieve using a Legendre symbol cache? As far as I can tell srsieve2 with sr1sieve logic is spending around 30% of its time calculating legendre symbols. I get the following message if I try to turn it on "[COLOR="Red"]Ingoring [/COLOR]-L option since Legendre tables cannot be used"
Also I get a seg fault after running "./srsieve2 -P 1e9 -n 1 -N 100000 -s "19920911*2^n+1"" This is using r95 of the code on Sourceforge.[/QUOTE]
-L isn't supported (yet). By default it will create a Legendre table, and you can use -l to disable that, but I haven't actually verified that it works correctly. I found the error behind the seg fault and committed a change to sourceforge. I have updated srsieve2.7z over at sourceforge as well. |
The current SVN version fails to run on Kubuntu 20.04:
[code]$ ./srsieve2 -W "3" -n "50e3" -N "230e3" -P "1e9" -o 't17_b2.prp' -f B -s "37803*2^n-1"
srsieve2 v1.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e9 with 180001 terms (50000 < n < 230000, k*2^n+c) (expecting 170458 factors)
Sieving one sequence where abs(c) = 1 for p >= 37803
Split 1 base 2 sequence into 94 base 2^180 sequences.
malloc(): corrupted top size
Aborted (core dumped)
[/code] |
[QUOTE=Happy5214;570707]The current SVN version fails to run on Kubuntu 20.04:
[code]$ ./srsieve2 -W "3" -n "50e3" -N "230e3" -P "1e9" -o 't17_b2.prp' -f B -s "37803*2^n-1"
srsieve2 v1.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e9 with 180001 terms (50000 < n < 230000, k*2^n+c) (expecting 170458 factors)
Sieving one sequence where abs(c) = 1 for p >= 37803
Split 1 base 2 sequence into 94 base 2^180 sequences.
malloc(): corrupted top size
Aborted (core dumped)
[/code][/QUOTE]
It crashes on Windows as well, so it shouldn't be too hard to track down and fix. |
I found and fixed the problem. The changes are committed to sourceforge.
|
There seems to be an issue with the Legendre lookup table. If you do not use -l, then it will miss factors. It should be easy to track down, but one never knows. Note that -l disables the building of the Legendre lookup tables. It is enabled by default.
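For readers unfamiliar with the idea, here is a minimal sketch of what a Legendre lookup table can look like. It is not taken from the srsieve2 source; the function names and the table layout are assumptions made for illustration. It relies on two standard facts: for fixed k > 0 the symbol (k/p) depends only on p mod 4k, and for k*2^n-1 with even n a prime p can only divide a term if k is a quadratic residue mod p.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd positive n; equals the Legendre symbol when n is prime."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a  # quadratic reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def legendre_euler(a, p):
    """Legendre symbol (a/p) for an odd prime p via Euler's criterion (one modpow per prime)."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def build_table(k):
    """Hypothetical table: entry r holds (k/p) for every odd prime p with p % (4*k) == r."""
    return [jacobi(k, r) if r % 2 == 1 else 0 for r in range(4 * k)]


def lookup(table, k, p):
    """Replace the per-prime modpow with a single array read."""
    return table[p % (4 * k)]


def may_divide_even_n(table, k, p):
    """For k*2^n-1 with even n: p can only be a factor if k is not a non-residue mod p."""
    return lookup(table, k, p) != -1


if __name__ == "__main__":
    k = 37803
    table = build_table(k)
    p = 1000003  # an odd prime
    print(lookup(table, k, p) == legendre_euler(k, p))
```

The table costs 4k entries per sequence, which is the memory-for-time trade that -l turns off.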
|
[QUOTE=rogue;570755]There seems to be an issue with the Legendre lookup table. If you do not use -l, then it will miss factors. It should be easy to track down, but one never knows. Note that -l disables the building of the Legendre lookup tables. It is enabled by default.[/QUOTE]
This is now fixed. |
BTW, now with this change the speed of srsieve2 (for CisOne logic) is within 5% of the speed of sr1sieve (with x86 asm) and about 10% faster than the speed of sr1sieve (with no x86 asm). By "within" I mean that sometimes it is faster and sometimes it is slower. The speed difference appears to be one of cache usage and CPU load on the machine overall. Note this was only tested with a single sequence so it is possible that other sequences will yield different results.
I will have to play around with unrolling some of the loops in srsieve2 to see if I can do better, but right now I'm pleased to see that it is performing so well, considering it didn't look so good earlier this week. My intention is to post a build after I track down the issue with the CisOne logic in srsieve2cl. |
Great news! I have tracked down and squashed the known bugs in srsieve2 and srsieve2cl. I have some benchmarks to share.
The CPU is an Intel i78-8550H at 2.6 GHz and the GPU is an NVIDIA Quadro P3200. I was running no other CPU/GPU intensive processes during this test. All runs yielded the same set of factors. I sieved 37803*2^n-1 for n from 5e4 to 25e4 up to 1e6. I then ran the file through srsieve2, srsieve2cl, and sr1sieve, taking the average of 5 runs. Here are the results:
[code]
srsieve2 -i b2_n.in -P1e10                            504
srsieve2 -i b2_n.in -P1e10 -l                         647
srsieve2cl -i b2_n.in -P1e10                          355
srsieve2cl -i b2_n.in -P1e10 -l                       353
srsieve2cl -i b2_n.in -P1e10 -g100                    221
srsieve2cl -i b2_n.in -P1e10 -g100 -1                 210
srsieve2cl -i b2_n.in -P1e10 -g1000                   184
srsieve2cl -i b2_n.in -P1e10 -g1000 -l                183
sr1sieve -i b2_n.in -P1e10 -ffact.out (asm)           460
sr1sieve -i b2_n.in -P1e10 -ffact.out -x (asm)        562
sr1sieve -i b2_n.in -P1e10 -ffact.out (no asm)        455
sr1sieve -i b2_n.in -P1e10 -ffact.out -x (no asm)     549
[/code]
As a reminder -l with srsieve2/srsieve2cl means "do not use Legendre lookup tables". This corresponds to -x from sr1sieve. The OpenCL code in srsieve2cl supports Legendre lookup tables, but you can see that it doesn't provide any benefit for this k.
srsieve2cl with -g1000 clearly beats out everything else. With -g1000 it uses less than 500 MB of GPU memory (per Windows Task Manager). It will be interesting to see this run on lower-end GPUs to see how they compare.
So with this report, mtsieve 2.1.6 is now released. Here are the changes:
[code]
framework:
   Add largestPrimeTested parameter to NotifyAppToRebuild() as the app cannot
   rely on accurately determining that value.

srsieve2, srsieve2cl: version 1.5
   Fixed remaining known issues with CisOne logic (sequences where abs(c) = 1)
   for a single CisOne sequence (sr1sieve).
   Added OpenCL code for CisOne logic.
   Added Legendre table lookups for CisOne logic.
[/code] |
[QUOTE=rogue;570815]
[code] srsieve2cl -i b2_n.in -P1e10 -g100 -1 210 [/code]
As a reminder -l with srsieve2/srsieve2cl means "do not use Legendre lookup tables". [/QUOTE]
And what does the "-1" mean? :razz: OTOH, good job! |
Does srsieve2cl with -g1000 kill srsieve1 in speed?
|
[QUOTE=pepi37;570838]Does srsieve2cl with -g1000 kill srsieve1 in speed?[/QUOTE]
Based upon the single sequence I tested, given the hardware specs I provided, srsieve2cl with -g1000 is more than twice as fast as sr1sieve. With -g100 it is slightly more than twice as fast as sr1sieve. With a higher value for -g, it could possibly be 3x faster, but that is on this hardware. |
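To make the ratios concrete, here is a small sketch that recomputes them from the benchmark figures posted earlier in the thread. The dictionary keys are labels chosen for this illustration; the numbers are copied from the table, in whatever unit it reports, and the baseline is assumed to be the sr1sieve asm run with Legendre tables.

```python
# Averages copied from the posted benchmark table (single sequence, one machine).
times = {
    "srsieve2": 504,
    "srsieve2cl": 355,
    "srsieve2cl -g100": 221,
    "srsieve2cl -g1000": 184,
    "sr1sieve (asm)": 460,
}


def speedup(baseline, candidate):
    """How many times faster `candidate` ran than `baseline` (>1 means faster)."""
    return times[baseline] / times[candidate]


if __name__ == "__main__":
    for name in ("srsieve2", "srsieve2cl", "srsieve2cl -g100", "srsieve2cl -g1000"):
        print(f"{name}: {speedup('sr1sieve (asm)', name):.2f}x vs sr1sieve (asm)")
```

On these figures -g1000 comes out at 2.5x and -g100 at about 2.08x, which matches the wording above; other sequences or hardware may give different ratios.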