mersenneforum.org  

Old 2021-01-29, 18:09   #518
rogue
 
"Mark"
Apr 2003
Between here and the

3·7·13·23 Posts
Default

Quote:
Originally Posted by henryzz View Post
Is that comparison without sr1sieve using a Legendre symbol cache? As far as I can tell srsieve2 with sr1sieve logic is spending around 30% of its time calculating legendre symbols. I get the following message if I try to turn it on "Ingoring -L option since Legendre tables cannot be used"

Also I get a seg fault after running "./srsieve2 -P 1e9 -n 1 -N 100000 -s "19920911*2^n+1""

This is using r95 of the code on Sourceforge.
-L isn't supported (yet). By default srsieve2 builds a Legendre table, and you can use -l to disable it, but I haven't actually verified that this is working correctly.

I found the error and committed a change to sourceforge. I have updated srsieve2.7z over at sourceforge as well.

Last fiddled with by rogue on 2021-01-29 at 18:37
Old 2021-02-02, 08:59   #519
Happy5214
 
"Alexander"
Nov 2008
The Alamo City

2·3·5·19 Posts
Default

The current SVN version fails to run on Kubuntu 20.04:

Code:
$ ./srsieve2 -W "3" -n "50e3" -N "230e3" -P "1e9" -o 't17_b2.prp' -f B -s "37803*2^n-1"
srsieve2 v1.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e9 with 180001 terms (50000 < n < 230000, k*2^n+c) (expecting 170458 factors)
Sieving one sequence where abs(c) = 1 for p >= 37803
Split 1 base 2 sequence into 94 base 2^180 sequences.
malloc(): corrupted top size
Aborted (core dumped)
Old 2021-02-02, 13:21   #520
rogue
 
"Mark"
Apr 2003
Between here and the

1100010000111₂ Posts
Default

Quote:
Originally Posted by Happy5214 View Post
The current SVN version fails to run on Kubuntu 20.04:

Code:
$ ./srsieve2 -W "3" -n "50e3" -N "230e3" -P "1e9" -o 't17_b2.prp' -f B -s "37803*2^n-1"
srsieve2 v1.5, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e9 with 180001 terms (50000 < n < 230000, k*2^n+c) (expecting 170458 factors)
Sieving one sequence where abs(c) = 1 for p >= 37803
Split 1 base 2 sequence into 94 base 2^180 sequences.
malloc(): corrupted top size
Aborted (core dumped)
It crashes on Windows as well, so it shouldn't be too hard to track down and fix.
Old 2021-02-02, 15:47   #521
rogue
 
"Mark"
Apr 2003
Between here and the

1887₁₆ Posts
Default

I found and fixed the problem. The changes are committed to sourceforge.
Old 2021-02-02, 19:24   #522
rogue
 
"Mark"
Apr 2003
Between here and the

1100010000111₂ Posts
Default

There seems to be an issue with the Legendre lookup table. If you do not use -l, then it will miss factors. It should be easy to track down, but one never knows. Note that -l disables the building of the Legendre lookup tables. It is enabled by default.
Old 2021-02-03, 14:07   #523
rogue
 
"Mark"
Apr 2003
Between here and the

14207₈ Posts
Default

Quote:
Originally Posted by rogue View Post
There seems to be an issue with the Legendre lookup table. If you do not use -l, then it will miss factors. It should be easy to track down, but one never knows. Note that -l disables the building of the Legendre lookup tables. It is enabled by default.
This is now fixed.
Old 2021-02-03, 15:30   #524
rogue
 
"Mark"
Apr 2003
Between here and the

3·7·13·23 Posts
Default

BTW, now with this change the speed of srsieve2 (for CisOne logic) is within 5% of the speed of sr1sieve (with x86 asm) and about 10% faster than the speed of sr1sieve (with no x86 asm). By "within" I mean that sometimes it is faster and sometimes it is slower. The speed difference appears to be one of cache usage and CPU load on the machine overall. Note this was only tested with a single sequence so it is possible that other sequences will yield different results.

I will have to play around with unrolling some of the loops in srsieve2 to see if I can do better, but right now I'm pleased to see that it is performing so well, considering it didn't look so good earlier this week.

My intention is to post a build after I track down the issue with the CisOne logic in srsieve2cl.
Old 2021-02-04, 01:40   #525
rogue
 
"Mark"
Apr 2003
Between here and the

3·7·13·23 Posts
Default

Great news! I have tracked down and squashed the known bugs in srsieve2 and srsieve2cl. I have some benchmarks to share.

The CPU is an Intel i78-8550H at 2.6 GHz and the GPU is an NVIDIA Quadro P3200. I was running no other CPU/GPU intensive processes during this test. All runs yielded the same set of factors.

I sieved 37803*2^n-1 for n from 5e4 to 25e4 up to p = 1e6. I then ran the resulting file through sr1sieve, srsieve2, and srsieve2cl, taking the average of 5 runs. Here are the results:

Code:
srsieve2 -i b2_n.in -P1e10                         504
srsieve2 -i b2_n.in -P1e10 -l                      647

srsieve2cl -i b2_n.in -P1e10                       355
srsieve2cl -i b2_n.in -P1e10 -l                    353

srsieve2cl -i b2_n.in -P1e10 -g100                 221
srsieve2cl -i b2_n.in -P1e10 -g100 -1              210

srsieve2cl -i b2_n.in -P1e10 -g1000                184
srsieve2cl -i b2_n.in -P1e10 -g1000 -l             183

sr1sieve -i b2_n.in -P1e10 -ffact.out     (asm)    460
sr1sieve -i b2_n.in -P1e10 -ffact.out  -x (asm)    562

sr1sieve -i b2_n.in -P1e10 -ffact.out     (no asm) 455
sr1sieve -i b2_n.in -P1e10 -ffact.out -x  (no asm) 549
As a reminder -l with srsieve2/srsieve2cl means "do not use Legendre lookup tables". This corresponds to -x from sr1sieve. The OpenCL code in srsieve2cl supports Legendre lookup tables, but you can see that it doesn't provide any benefit for this k.
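
For readers unfamiliar with why Legendre symbols help here, the following is a minimal Python sketch of the textbook parity filter for k*b^n+c with abs(c) = 1. It is not taken from the srsieve2 source, and the thread does not describe how srsieve2 lays out its lookup tables; the function names `legendre` and `parities_to_sieve` are made up for illustration. The idea: if an odd prime p divides k*b^n+c, then -c*k (for even n) or -c*k*b (for odd n) must be a quadratic residue mod p, so a prime failing both tests can be skipped for that sequence.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def parities_to_sieve(k: int, b: int, c: int, p: int) -> set:
    """Parities of n (0 = even, 1 = odd) for which p could divide k*b^n + c.

    Assumes abs(c) == 1 and p is an odd prime (illustrative helper, not
    srsieve2 code). From k*b^n = -c (mod p):
      n even: (k*b^(n/2))^2     = -c*k    (mod p)
      n odd:  (k*b^((n+1)/2))^2 = -c*k*b  (mod p)
    so the right-hand side must be a quadratic residue for a factor to exist.
    """
    allowed = set()
    if legendre(-c * k, p) == 1:
        allowed.add(0)
    if legendre(-c * k * b, p) == 1:
        allowed.add(1)
    return allowed


if __name__ == "__main__":
    # The sequence from the benchmark: 37803*2^n-1, so k=37803, b=2, c=-1.
    for p in (7, 11, 13, 17, 19, 23):
        print(p, sorted(parities_to_sieve(37803, 2, -1, p)))
```

A lookup table would precompute these symbols instead of calling `pow` for every prime. Whether that pays off depends on the sequence, which is consistent with the observation above that the tables give no benefit on the GPU for this k.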

It is clear that srsieve2cl with -g1000 beats out everything else. With -g1000 it uses less than 500 MB of GPU memory (per Windows Task Manager).

It will be interesting to see this run on lower-end GPUs to see how they compare.

So with this report, mtsieve 2.1.6 is now released. Here are the changes:

Code:
   framework:
      Add largestPrimeTested parameter to NotifyAppToRebuild() as the app cannot rely
      on accurately determining that value.
   
   srsieve2, srsieve2cl:  version 1.5
      Fixed remaining known issues with CisOne logic (sequences where abs(c) = 1) for
      a single CisOne sequence (sr1sieve).
      Added OpenCL code for CisOne logic.
      Added Legendre table lookups for CisOne logic.
Old 2021-02-04, 02:05   #526
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

3^3×347 Posts
Default

Quote:
Originally Posted by rogue View Post
Code:
srsieve2cl -i b2_n.in -P1e10 -g100 -1              210
As a reminder -l with srsieve2/srsieve2cl means "do not use Legendre lookup tables".
And what does the "-1" mean?
OTOH, good job!
Old 2021-02-04, 12:25   #527
pepi37
 
Dec 2011
After milion nines:)

10110001011₂ Posts
Default

Does srsieve2cl with -g1000 kill srsieve1 in speed?
Old 2021-02-04, 13:23   #528
rogue
 
"Mark"
Apr 2003
Between here and the

3×7×13×23 Posts
Default

Quote:
Originally Posted by pepi37 View Post
Does srsieve2cl with -g1000 kill srsieve1 in speed?
Based upon the single sequence I tested, given the hardware specs I provided, srsieve2cl with -g1000 is more than twice as fast as sr1sieve. With -g100 it is slightly more than twice as fast as sr1sieve. With a higher value for -g, it could possibly be 3x faster, but that is on this hardware.
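
The ratios can be checked against the benchmark table posted earlier in the thread. The short Python sketch below copies a few of those figures and divides them, assuming they are elapsed times where lower is better (the post does not state the unit). The `timings` dictionary and the `speedup` helper are illustrative names, not part of mtsieve.

```python
# Figures copied from the benchmark table in post #525 (unit not stated there;
# assumed to be elapsed time, lower is better).
timings = {
    "srsieve2": 504,
    "srsieve2cl -g100": 221,
    "srsieve2cl -g1000": 184,
    "sr1sieve (asm)": 460,
    "sr1sieve (no asm)": 455,
}


def speedup(baseline: str, candidate: str) -> float:
    """How many times faster `candidate` is than `baseline`."""
    return timings[baseline] / timings[candidate]


if __name__ == "__main__":
    for name in ("srsieve2cl -g100", "srsieve2cl -g1000"):
        ratio = speedup("sr1sieve (asm)", name)
        print(f"{name}: {ratio:.2f}x vs sr1sieve (asm)")
```

These ratios apply only to the single sequence and hardware described in that post.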

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.