mersenneforum.org  

Old 2021-01-08, 04:35   #100
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

5²×7×53 Posts

To clarify, srsieve is a "general" tool, while the srXsieve programs are more "specific" tools. Of course the latter are faster, but they can't handle all the cases the former can.

But what you (Mark) could do, from my "lazy man" perspective here, is to call sr1sieve or sr2sieve directly from srsieve and srsieve2 if the conditions suffice (i.e. only one or a few k's, sufficiently high n, etc.). Also, consider pre-calling srfile if the format does not match (I need to use it sometimes, as sr2sieve supports only a limited number of formats for the input file).
Old 2021-01-08, 13:04   #101
rogue
 
"Mark"
Apr 2003
Between here and the

1868₁₆ Posts

One could write a simple script to do as you suggest.

My long-term goal is for srsieve2(cl) to be "one-stop shopping". It will start with the generic case, then switch to the specialized cases (sr2sieve logic, sr1sieve logic, OpenCL logic) when the correct conditions are met. This way you won't need to run the multiple programs you currently have to run before PRP testing. The framework supports this. I just have to code the sr1sieve/sr2sieve logic, but that is a lot easier said than done: they rely on global variables and the code is really difficult to navigate. I just need enough motivation to complete the task. The main motivator will be believing that I can make srsieve2 faster than sr1sieve/sr2sieve, as that is what will get people to switch. The other half is finding enough time to dedicate to the task rather than "an hour here" and "an hour there". It takes larger chunks of my time to do the work and those are hard to come by.
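
As an illustration of the "start generic, then switch" idea described above, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the function name `choose_code_path`, the `switch_p` threshold, and the rule that a single remaining sequence maps to sr1sieve-style logic while several map to sr2sieve-style logic. It is not how srsieve2 actually decides.

```python
def choose_code_path(sequence_count, current_p, switch_p=1_000_000,
                     gpu_available=False):
    """Pick a sieving code path (hypothetical rules, for illustration only).

    sequence_count -- number of sequences still being sieved
    current_p      -- the prime the sieve has reached
    switch_p       -- placeholder threshold below which the generic path is kept
    gpu_available  -- whether an OpenCL device may be used
    """
    if sequence_count < 1:
        raise ValueError("nothing left to sieve")
    if current_p < switch_p:
        # Early in the sieve, stay on the general-purpose logic.
        return "generic"
    if sequence_count == 1:
        # One sequence left: the single-sequence logic applies.
        return "opencl" if gpu_available else "sr1sieve-style"
    # Several sequences left: the multi-sequence logic applies.
    return "sr2sieve-style"
```

A driver loop would call a function like this whenever the set of remaining sequences or the current prime changes, so the user never has to start a second program by hand.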
Old 2021-01-09, 10:46   #102
KEP
Quasi Admin Thing
 
May 2005

312 Posts

Quote:
Originally Posted by rogue
The other half is finding enough time to dedicate to the task rather than "an hour here" and "an hour there". It takes larger chunks of my time to do the work and those are hard to come by.
Just take your time. What you did with srsieve2 in practice almost doubled our sieve speed compared to srsieve, and that has been a huge motivator on the 1000+ k conjectures. Also, the multithreading that srsieve2 supports and handles well was a great motivator for most Windows users to switch to your excellent program.

As Ian said, he would rather do a slower and correct job than a fast and faulty job. Those were not his exact words, but the point is that you should keep working the way you do now instead of rushing and risking the release of a bad program that we might forever believe is doing a great job. PrimeGrid can confirm the price paid when a sieve program misses factors and a lot of work has to be redone. The point being: keep doing a great job, even if it means the rest of us have to wait a little longer for the end product.

Take care, stay safe and stay around for a long time. WE really need you and your skills.
Old 2021-01-19, 18:30   #103
rogue
 
"Mark"
Apr 2003
Between here and the

14150₈ Posts

I have finally taken the time and now have a build of srsieve2 with sr1sieve logic. There are a couple of TODOs left before I can start testing. There was a lot of refactoring of srsieve2 to support what sr1sieve does, which means that I have to retest the "generic" code paths. I don't expect that to turn up any problems that I have introduced, but one never knows.

sr1sieve and sr2sieve have a lot of conditionally compiled code, which makes figuring out the "normal" case hard. The task was made harder by the fact that they have a lot of global variables without meaningful names, so understanding their usage is frustratingly hard, especially since I go to great lengths to avoid global variables in the framework. There is also "vectorized" code for mulmods which I'm not supporting at this time, because I'm using Montgomery mulmod logic, which supports larger p than sr1sieve.
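
For readers unfamiliar with the term, here is a minimal sketch of Montgomery modular multiplication in Python, assuming R = 2^64 and an odd modulus p < 2^64. The function names (`mont_setup`, `redc`, `to_mont`, `mont_mul`, `from_mont`) are made up for this example and this is not the mtsieve code; a C or OpenCL version would use 64-bit words and a 128-bit product instead of Python's big integers.

```python
R_BITS = 64
R = 1 << R_BITS
MASK = R - 1

def mont_setup(p):
    """Precompute the constants for an odd modulus p < 2^64."""
    if p % 2 == 0 or not (1 < p < R):
        raise ValueError("p must be odd and satisfy 1 < p < 2^64")
    p_inv_neg = (-pow(p, -1, R)) % R   # -p^(-1) mod R (needs Python 3.8+)
    r2 = (R * R) % p                   # R^2 mod p, used to enter Montgomery form
    return p_inv_neg, r2

def redc(t, p, p_inv_neg):
    """Montgomery reduction: return t * R^(-1) mod p for 0 <= t < p * R."""
    m = ((t & MASK) * p_inv_neg) & MASK   # chosen so that t + m*p is divisible by R
    u = (t + m * p) >> R_BITS             # exact division by R, no division by p
    return u - p if u >= p else u

def to_mont(a, p, p_inv_neg, r2):
    """Convert a to Montgomery form a * R mod p."""
    return redc((a % p) * r2, p, p_inv_neg)

def mont_mul(a_m, b_m, p, p_inv_neg):
    """Multiply two values that are already in Montgomery form."""
    return redc(a_m * b_m, p, p_inv_neg)

def from_mont(a_m, p, p_inv_neg):
    """Convert back from Montgomery form to an ordinary residue."""
    return redc(a_m, p, p_inv_neg)
```

The appeal for a sieve is that, once the operands are in Montgomery form, each multiplication needs only multiplies, adds, shifts and one conditional subtraction, all of which can be written in portable C or OpenCL.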

I do not expect it to be as fast as sr1sieve. I'm expecting it to be about 10% slower out of the box, but I won't know until I get to that point, and I'm sure that there are many bugs in the code that have to be squashed before I even get that far.

When I get this working, the next step is to support the sr1sieve logic in OpenCL. There is one four-dimensional array in sr1sieve that has to be flattened to three dimensions before I can use it in the GPU. That shouldn't be too hard. I also have to be cognizant of memory usage in the GPU, but I'm feeling confident that it won't be an issue. Based upon what I have seen with the generic sieving logic in OpenCL, I would expect a 5x or 6x speed bump, but I won't make any promises for that.

The step following that is sr2sieve logic. I have the pieces in place (due to the refactoring) to add support for that. It should go a lot faster than the sr1sieve logic, but I'm uncertain about putting that in the GPU without spending more time in the sr2sieve code.
Old 2021-01-19, 23:08   #104
Happy5214
 
"Alexander"
Nov 2008
The Alamo City

5×101 Posts

Quote:
Originally Posted by rogue
I do not expect it to be as fast as sr1sieve. I'm expecting about 10% slower out of the box, but I won't know until I get to the point and I'm sure that there are many bugs in the code that have to be squashed before I even get that far.
Do you know what's keeping it from reaching the speed of sr1sieve?
Old 2021-01-19, 23:18   #105
rogue
 
"Mark"
Apr 2003
Between here and the

2³×11×71 Posts

Quote:
Originally Posted by Happy5214
Do you know what's keeping it from reaching the speed of sr1sieve?
I don't know what speed it will get. I haven't gotten that far in testing. I have made some changes that will either help the speed or hurt the speed and those changes might help for some sequences, but hurt for others. I won't know for a while yet. If I do get within 10% with the current code, I believe that I can get it as fast or faster than sr1sieve (in the CPU), but that will take some experimentation.

The 10% is based upon the Montgomery mulmod that sr1sieve uses for the non-x86 code path. I am using Montgomery mulmod for srsieve2 because it requires no asm code.

Note that this means that I have the tools I need to port many of the sievers built upon the framework to ARM.

Last fiddled with by rogue on 2021-01-19 at 23:20
Old 2021-01-20, 12:13   #106
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

3×1,601 Posts

Quote:
Originally Posted by rogue

Note that this means that I have the tools I need to port many of the sievers built upon the framework to ARM.
Old 2021-01-21, 22:00   #107
pepi37
 
Dec 2011
After milion nines:)

2⁷×11 Posts

Rogue, excuse me for the stupid question, but I must ask.
I enjoy using your tools from mtsieve, especially twinsieve (I use it most of the time).
I have also read about your efforts on srsieve2.
Question: since you could make twinsieve as fast as it is now, can you build a sieve that will replace sr1sieve at the same speed? I don't know the math, so I don't know whether, or by what percentage, the speed changes when a sieve searches for fixed n rather than fixed k.
If the answer is srsieve2, then forget what I asked; I will wait :)

Last fiddled with by pepi37 on 2021-01-21 at 22:00
Old 2021-01-21, 22:46   #108
rogue
 
"Mark"
Apr 2003
Between here and the

1868₁₆ Posts

twinsieve is "fixed n and variable k" and the srsieve programs are "fixed k and variable n". They require completely different logic.

In theory if one has a very small range of n, then twinsieve could be modified to handle multiple n and could be faster than sr1sieve. I don't know how small the range of n would need to be, but I'm guessing less than 1000 between n_min and n_max.
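
To show why the two cases need different logic, here is a small sketch in Python. It assumes terms of the form k*b^n+1 and a prime p that does not divide b; the function names are made up for the example and neither function is taken from twinsieve or the srsieve programs. With n fixed, one modular inverse per prime gives every k that p eliminates. With k fixed, the sieve has to find the exponents n with b^n ≡ -1/k (mod p), which is a discrete logarithm problem; the sketch simply walks n across the range, whereas a real siever would use something far better than a linear walk (baby-step giant-step, as far as I understand the srsieve family).

```python
def fixed_n_hits(b, n, p, k_min, k_max):
    """All k in [k_min, k_max] with p | k*b^n + 1 (n fixed, k varies).

    Requires p prime and p not dividing b, so that b^n is invertible mod p.
    """
    k0 = (-pow(pow(b, n, p), -1, p)) % p      # k0 = -(b^n)^(-1) mod p
    first = k_min + ((k0 - k_min) % p)        # smallest k >= k_min with k = k0 (mod p)
    return list(range(first, k_max + 1, p))   # every p-th k after that is also hit

def fixed_k_hits(k, b, p, n_min, n_max):
    """All n in [n_min, n_max] with p | k*b^n + 1 (k fixed, n varies).

    Naive linear walk over n, for illustration only.
    """
    hits = []
    term = (k * pow(b, n_min, p)) % p         # k*b^n mod p at n = n_min
    for n in range(n_min, n_max + 1):
        if (term + 1) % p == 0:
            hits.append(n)
        term = (term * b) % p                 # advance to k*b^(n+1) mod p
    return hits
```

For example, with b = 2, n = 3 and p = 7 the first function finds k = 6, 13, 20 in the range 1 to 20, because 6*8+1 = 49, 13*8+1 = 105 and 20*8+1 = 161 are all multiples of 7. The cost of the linear walk in the second function grows with the width of the n range, which fits the remark above that a fixed-n style sieve could only compete when n_max - n_min is small.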
Old 2021-02-01, 18:59   #109
rogue
 
"Mark"
Apr 2003
Between here and the

2³·11·71 Posts

I have committed my first cut at the GPU code for the CisOne logic (sr1sieve). Preliminary testing shows that it is missing some factors. On the plus side it isn't reporting any invalid factors, so the mistake might be trivial. Also on the plus side is that it appears to be about 4x faster than the CPU code, which means that it is about 3x faster than sr1sieve (on the laptop I am testing on). This could change when the bug causing the missing factors is found, but I do not know if it will make it faster or slower than it is now.
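
One generic way to triage a bug of this kind is to diff the factors from the new code against those from a trusted run and to re-check every reported factor independently. The sketch below is an assumption-laden illustration in Python: the tuple layout `(p, k, b, n, c)` for "p divides k*b^n+c" and both function names are hypothetical, and it does not read any real factor file format.

```python
def divides_term(p, k, b, n, c):
    """True if p divides k*b^n + c, checked with modular exponentiation."""
    return (k * pow(b, n, p) + c) % p == 0

def compare_factor_sets(trusted, candidate):
    """Compare two collections of (p, k, b, n, c) tuples.

    Returns (missing, invalid):
      missing -- factors in the trusted run that the candidate run did not report
      invalid -- factors the candidate reported that do not actually divide the term
    """
    trusted, candidate = set(trusted), set(candidate)
    missing = sorted(trusted - candidate)
    invalid = sorted(f for f in candidate if not divides_term(*f))
    return missing, invalid
```

An empty `invalid` list together with a non-empty `missing` list matches the symptom described above, and the primes in `missing` give concrete inputs to step through in both the CPU and GPU code.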

To support this I had to do some refactoring of the CisOne logic to replace a multi-dimensional array with a single-dimensional array which is used by both the CPU and GPU logic. If you are brave enough to look at the sources you will see how similar the CPU and GPU code are. This should make it easier for me to track down the problem. If anyone is willing to take a look at the code in an effort to see where I have gone amiss, I would appreciate it. I just don't know how long it will take to track down and squash the bug on my own.
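
The usual way to replace a multi-dimensional array with a single-dimensional one is row-major index arithmetic, sketched below in Python for the four-dimensional case. The names `flat_index` and `flatten4` and the shape tuple `dims` are assumptions for the example; the thread does not say what the real array holds or how the CisOne code lays it out.

```python
def flat_index(i, j, k, l, dims):
    """Row-major offset of element [i][j][k][l] in a 4-D array of shape dims."""
    d0, d1, d2, d3 = dims
    if not (0 <= i < d0 and 0 <= j < d1 and 0 <= k < d2 and 0 <= l < d3):
        raise IndexError("index out of range")
    return ((i * d1 + j) * d2 + k) * d3 + l

def flatten4(nested, dims):
    """Copy a nested 4-D list into one flat list using flat_index."""
    d0, d1, d2, d3 = dims
    flat = [0] * (d0 * d1 * d2 * d3)
    for i in range(d0):
        for j in range(d1):
            for k in range(d2):
                for l in range(d3):
                    flat[flat_index(i, j, k, l, dims)] = nested[i][j][k][l]
    return flat
```

Because the same offset formula can be written in both host code and an OpenCL kernel, one flat buffer can be shared by the CPU and GPU paths, which is consistent with the point above about how similar the two are.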
Old 2021-02-17, 17:17   #110
rogue
 
"Mark"
Apr 2003
Between here and the

6248₁₀ Posts

I have committed some changes to gfndsieve/gfndsievecl to support ppsieve type sieving. Although I can only compare against an old version of the OpenCL enabled ppsieve (version cl-0.2.3e), gfndsievecl is about 3x faster than that version. I do not have any way to compare its speed against the speed of the CUDA enabled ppsieve. If someone has a current version of the OpenCL enabled ppsieve, please share.

Is anyone interested in doing some testing to compare the speed of either the CUDA or OpenCL ppsieve with gfndsievecl?

I posted a Windows build of gfndsievecl at sourceforge. Use the -r option to bypass building of the bitmap.

I have some more changes to boost the speed further, but I want to see where gfndsievecl stacks up first.

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.