The Multi-threaded Sieve Framework


The Multi-threaded Sieve Framework (called mtsieve) is a framework that I have been using for many of the sieving programs that I have written over the years.  Although those programs reference the framework, it was never promoted for other people to take advantage of.  The main idea of this framework is to help optimize the search for factors of a collection of numbers that follow a well-defined pattern.  Some examples are:


This framework can also be used to do other functions with primes, such as determining if each prime is a Wieferich Prime.

You can d/l mtsieve and all sieving programs using that framework from here.

Open and Portable

I cannot tell you how many times I wanted to search for primes on my Mac only to find that before I could do that I would have to sieve the range on a Windows OS.  Two of the most notorious examples are newpgen and fermfact.  Although the newpgen source is available, that source is a complete mess.  I understand that it was great during its heyday on 32-bit CPUs, there was nothing in it to take advantage of 64-bit CPUs.  On the other hand fermfact source is not available despite my requests to the original author.  This also has 32-bit limitations and the command line arguments are confusing at best after one has used other programs that share the same command line arguments.  And even though I am the author of MultiSieve, it was 32-bit and Windows GUI only.  mtsieve replaces fermfact in its entirely, but only covers some of the functionality of newpgen and MultiSieve.  If I get requests, I will provide work on extending mtsieve to replace other functions from newpgen and MultiSieve.

Since mtsieve is also 64-bit the memory limitations of the old 32-bit programs are gone.

With mtsieve, we now have a framework that can built and run on Windows, Linux, and Mac OS X.  For Windows I build with mingw64 so that I can use many of the standard libraries that many "old time programmers" are familiar with rather than doing something the "Visual Studio" way.  For Mac and Linux, the software should build out of the box with the given makefile.

What about the GPU?

I have already moved the GPU code into afsieve and others are to follow.  For sieves that using a discrete log, I will not create an OpenCL version.  There are two things about discrete logs that make them unsuitable to the GPU.  The first is memory.  Each "thread" requires a memory to build the hashtables, typically more than was a single thread has access to in local memory.  The second is that the GPU is very well suited to work units that can execute in the same amount of time.  For a discrete log, each prime can take a wildly different amount of time to find the solution and the GPU can only return when the prime that needs the most time to find a solutions gets that solution and while that is happening, the CPU is waiting for the results and can't do anything.

What about ARM?

Some people have expressed an interest in support ARM CPUs.  That would require ARM specific assembly routines and I don't know ARM assembly.  Fortunately mtsieve does not rely on inline assembler, so the routines that would need to be replaced are all found in the x86_64_asm folder, none of which are overly lengthy.  If someone wants to work with me to extend mtsieve to run on ARM, I would greatly appreciate it.

What about srsieve/srfile/sr1sieve/sr2sieve?

As the one who maintains these programs, they are definitely on my "to do" list.  I cannot guarantee that all features of those programs will be kept when I port them to this framework.

What about <xxx>?

It depends upon what <xxx> is.  If it is something I wrote, I'll get around to it.  If you are a developer, I'll help you use the framework to create your application.  If not, send an e-mail to me at rogue_at_wi.rr.com (change _at_ to @) and we'll discuss it.

What about 32-bit?

All of the current programs using the framework create 64-bit builds.  Nobody has asked me for a 32-bit build in years, so I have no intention of supporting one.

What the Framework Provides

The main goal of the framework is to abstract Windows, Linux, and OS X functionality from the developer using the framework.  For example, if you are a Windows software developer, you don't need to know how to create a mutex on Linux.  Likewise if you are a Mac software developer, you don't need to know how Windows threading works.  As a developer you only need to know the interface (the .h file), and not the execution (the .cpp file).  The sources are there so that you can see how it is done on other platforms, but you won't need to understand the details, you just have to call the correct functions.

The Source

Classes in core

The classes in this directory are the ones shared by all of the program using the framework.  These classes include:

Please refer to App.h and WorkerThread.h to see which method are abstract.  These are the methods that must be written by any concrete class that extends App or WorkerThread.

Classes in opencl

The classes in this directory are the ones shared by all of the program using the framework that rely on GPU.  These classes include:

Please refer to App.h and WorkerThread.h to see which method are abstract.  These are the methods that must be written by any concrete class that extends App or WorkerThread.

Classes in sieve

This is the sieveing source from primesieve 6.3, a library whose sole purpose is to generate a list of primes as quickly as possible.  Per the primsieve license, no changes should be made to this code.

Runtime Options

mtsieve has a few command line options that are available to all programs using the framework.  These are:

Programs with GPU support have some additional options available to them:

Programs extending FactorApp (instead of App) have some additional options available to them:

Each program using this framework, whether it extends App or FactorApp, can add its own command line options, but it cannot replace any that are built into the framework.

Software Using the mtsieve Framework

Here is a list of programs I've written the use this framework.  All are bundled with mtsieve and are available from the download link.

afsieve

This program searches for factors of Alternating Factorials.  This program supports these additional parameters:

mfsieve

This program searches for factors of MultiFactorials.  This program supports these additional parameters:

cksieve

This program searches for factors of Carol / Kynea numbers. These numbers a form of Near Square numbers with the form (b^n-1)^2-2 and (b^n+1)^2-2.  This program supports these additional parameters:

pixsieve

This program searches for factors of numbers that are a substring of a long decimal string where each successive term adds on decimal digit to the end of the previous decimal term.  This program supports these additional parameters:

fbncsieve

This program searches for factors of numbers in the form k*b^n+1 and k*b^n-1.  It is specifically designed to replace similar functionality in newpgen.  This program supports these additional parameters:

fkbnsieve

This program searches for factors of the form k*b^n+c for fixed k, b, and n and variable c.   This program supports these additional parameters:

gfndsieve

This program searches for factors of numbers in the form k*2^n+1 for a range of k and a range of k.  It is specifically designed to replace fermfact.  The output from this sieve should be used with pfgw and the -gxo switch to find GFN divisors. This program supports these additional parameters:

kbbsieve

This program searches for factors of numbers of the form k*b^b+1 or k*b^b-1 for fixed k and variable b.  This program supports these additional parameters:

xyyxsieve

This program searches for factors of x^y+y^x and x^y-y^x numbers.  This program supports these additional parameters:

gcwsieve

This program searches for factors of Cullen and Woodall numbers.  This program supports these additional parameters:

psieve

This program searches for factors of primorials.  This program supports these additional parameters:

twinsieve

This program searches for factors of twin numbers of the form k*b^n+1 and k*b^n-1.  This program supports these additional parameters:

dmdsieve

This program searches for factors of number of the form 2*k*(2^p-1)+1.  Numbers that are not removed from the sieve are potential divisors of Double Mersenne numbers.  This program supports these additional parameters:

How To Write Your Own Sieve Using the Framework

The best way to create your own sieve is to start with an existing sieve that is close to what you need.  This will save you a lot of time.

1)  Copy the folder for a sieve similar to what you want.
2)  Rename the folder to something meaningful (no spaces).
3)  Inside that folder, rename the .h and .cpp files, but keep the App.h, App.cpp, Worker.h, Worker.cpp portions of the name.
4)  Using NotePad++ open those files.
5)  Use the "find and replace all open files" to rename the class names to the same as the file names.
6)  Update the makefile and create a new object list for your sieve, using the App and Worker names, but with the .o suffix instead of .cpp.
7)  Update the makefile and add an entry for your program near the end.  This will tell make what objects are needed for the executable.

For example, let's say that you want a new sieve similar to cksieve.  Let's call it mcsieve, short for "my custom sieve".
1)  Copy the carol_kynea folder and rename as my_custom
2)  Rename CarolKyneaApp.h to MyCustomApp.h.
3)  Rename CarolKyneaApp.cpp to MyCustomApp.cpp.
4)  Rename CarolKyneaWorker.h to MyCustomWorker.h.
5)  Rename CarolKyneaWorker.cpp to MyCustomWorker.cpp.
6)  Edit those four files in NotePad++.
7)  Use "find and replace in all files" to  change "CarolKynea" to "MyCustom" and save.
8)  Add MC_OBJS to makefile setting to "my_custom/MyCustomApp.o my_custom/MyCustomWorker.o"
9)  Copy the entry for "cksieve" and name as "mcsieve".
10)  For mcsieve, change CK_OBJS to MC_OBJS
11)  Type "make mcsieve" from the command line and it should build without errors.

Now for the fun part, writing your custom code.

MyApp constructor

Call SetBanner() to set the banner, which is printed, when the program runs.  Call SetLogFileNam() to set the name of the log file used by this program.  Set the variables specific to the program that can.

MyApp::Help()

Call FactorApp::ParentHelp() first, then print help for each of the program specific command line options.

MyApp::AddCommandLineOptions(string &shortOpts, struct option *longOpts)

Call FactorApp::ParentAddCommandLineOptions first, then append shortOpts as necessary. For each short option, add a colon after the option to signify that the option takes a parameter.  Before returning call AppendLongOpt to add long options that will be supported by the command line.

MyApp::ParseOption(int opt, char *arg, const char *source)

Call FactorApp::ParentParseOption() first and if it returns P_UNSUPPORTED, then parse the argument for the commnad line option.  For numeric options, use Parser::Parse passing the argument along with the min and min values for the option and the variable to hold the value that is parsed. Return the status for parsing the option.

MyApp::ValidateOptions(void)

Use this method to apply additional validations to the input numbers, such as verifying that one input is less than another input.  It can also set or adjust some command line options if they were not specified on the command line.  Call FatalError() if the value for a parameter is invalid.
This method is responsible for creating the bit map representing candidates used throughout the application.
Call FactorApp::ParentValidateOptions() before returning.

MyApp::CreateWorker(uint32_t id, bool gpuWorker)

This method will create an instance of the correct worker class and return it to the caller.  The gpuWorker flag indicates if this method should create a GPU worker instead of a CPU worker.  GPU worker can only be true if ib_SupportsGPU was set to true in the constructor.

MyApp::ProcessInputTermsFile(bool haveBitMap) 

This method is called twice.  The first time it is called to determine the min/max of the candidates in the source file.  The second time it will apply the candidates from the source file to the bit map created by ValidateOptions().

MyApp::WriteOutputTermsFile(uint64_t largestPrime)

This method will create a file for the program that does the PRP testing.  This method must lock ip_FactorAppLock while accessing the list of candidates then release ip_FactorAppLock before returning.  The largestPrime is typically written to the first line of the output file and will be used as the starting prime if sieving is resumed from the file.

MyApp::ApplyFactor(const char *term)

This method is called at start up to apply factors from an input file.  The input term represents a candidate number, i.e. the actual string representing the number tested by the PRP program.  Parse this string to find the candidate then remove that candidate from the from the bitmap.  Decrement il_TermCount for each candidate removed.

MyApp::ReportFactor()

Each sieving application will need a version of this method.  In other words each application will call this method with a different list of parameters.  This method will take the inputs and turn of the bit in the bitmap.  For each found factor, it must decrement il_TermCount and increment il_FactorCount.  This method must lock ip_FactorAppLock while accessing the list of candidates then release ip_FactorAppLock before returning.
This method should return a boolean indicating if the term was removed from the bitmap.

MyWorker constructor

Allocate memory or resourced needed by this worker.  This can also be used to set instance variables from MyApp, such as the min/max for the candidates being tested.

MyWorker::TestPrimeChunk(uint64_t &largestPrimeTested, uint64_t &primesTested)

This is where the heavy lifting goes.  The code in this method will iterate thru the it_Primes vector to find factors for the candidates being sieved.  It must set largestPrimeTested and primesTested before returning to the caller.

MyWorker::CleanUp(void)

This method will free memory and other resources allocated or created by the constructor.

MyWorker::VerifyFactor()

Each sieving application will need a version of this method.  In other words each application will call this method with a different list of parameters.  If ReportFactor() returns true, then this function will use a non-optimized method to verify that the factor truly devides the candidates.  It will call FatalError() if the prime does not divide the candidate.

Version History

1.0.0 - Feburary 10, 2018

Initial Release

1.1.0 - Feburary 21, 2018

Add an internal flag that guarantee that suspends all but one Worker when processing the first chunk of primes. This is used to improve performance when there is a high factor density for low primes. This will also suppress any on screen reporting or checkpointing until that chunk is processed.
Fix issue in computing CPU utilization.
Changed -c (chunksize) option to -w (worksize).
Change output to use shorter notation for min and max primes.
cksieve - Fixed.
gfndsieve - Enable the flag mentioned above.
fbncsieve - Enable the flag mentioned above.
fkbnsieve - Added, but not tested.

1.2.0 - Feburary 23, 2018

fkbnsieve is now working.
Modify cksieve to detect candidates that are prime and to log them.
Fixed an asm bug that at worst causes factors to be missed by fbncsieve and gfndsieve. It will nor result in invalid factors and if it did, they would be caught at runtime due to built-in factor checking that relies on completely different code.
Added -A option to apply factors (or reformat candidate file) and exit immdiately without sieving.

Added GPU classes. This adds the following command line options:
-D - to select the GPU platform
-d - to select the GPU device
-G - to specify the number of GPU workers
-g - to set multiple of workgroupsize which is used to compute the number of primes per GPU worker

Added GPU workers to afsieve.

1.3.0 - March 11, 2018

Ensure that "ENABLE_GPU=no" in makefile builds all programs without error. cksieve no longer gives a fatal error if the computed root is not an actual root. This condition rarely happens, but is okay when it does.
Overriding -p from the command line should now work when starting with an input file.
Added GPU workers to xyyxsieve. When using GPU workers, an overflow with collecting factors can cause xyyxsieve to crash. If that happens override -S and/or -g or sieve more deeply with the CPU before adding GPU workers. This will be addressed in a future release.
Added GPU workers to pixsieve. It has not been tested yet.

1.4.0 - April 9, 2018

Some common functionality for GPU sieving has been moved to Worker.cpp.
All GPU workers validate factors found by the GPU.
The xyyxsieve GPU sieving issue has been resolved.
The pixsieve GPU sieving code has been tested.
GPU sieving has been added to mfsieve. It has been tested.
GPU sieving has been added to gfndsieve. It has been tested.
Add kbbsieve, for the form k*b^b+/-1 for fixed k and variable b. It has been partially tested.

1.5.0 - April 10, 2018

kbbsieve is more fully tested. It now uses a powmod that is limited to 52 bits (switching from extended floating point to SSE), which should be at least 10% faster.

1.6.0 - June 25, 2018

Fixed an error with factor rate calculation when less than 1 per second.
Fixed an issue with gfndsieve when continuing a sieve and k is less than n.
For kbbsieve, added some checks for algebraic factorizations.
Added gcwsieve for Cullens and Woodalls. This sieve is GPU enabled.
Renamed all ASM routines to easily distinguish FPU/SSE/AVX.
Added AVX asm code for use by the Worker classes.
Added a mini-chunk mode that can be used when the worker classes handles primes in chunks, such as AVX mode, which is chunks of 16 primes.
gcwsieve supports AVX. The CPU-only code is about 30% faster than Geoff Reynold's version.
xyyxsieve supports AVX. The CPU-only code is about 2.5x faster than the previous version.

1.7.0 - July 4, 2018

Added a timestamp to lines written to the log.
Changed usage of some registers in AVX code to avoid ymm0-ymm3 being passed between calls to AVX routines.
Added psieve for primorials.
psieve supports AVX and is about 30% faster than fpsieve.

1.7.1 - July 25, 2018

Fixed output for number of terms remaaining to support values > 2^32.
Modified psieve to support an input file created by fpsieve.
Fixed a bug in the non-AVX primorial ASM code that causes it to crash.

1.7.2 - July 27, 2018

Not released - fixes absorbed into 1.7.3
Fixed issue with reading ABCD files as input lines with 1 character would be ignored.
Fixed a crash upon exit of fbncsieve.
Allow override with -p when starting fbncsieve and fkbnsieve from an input file.

1.7.3 - August 3, 2018

Fixed a memory exception that affects GPU workers for all sieves.
Re-enable AVX support with psieve as accidentally disabled in 1.7.1.
Do not output factor rate if no factors found.
Fixed another issue in non-AVX psieve code that causes it to crash.