Old 2021-09-07, 09:06   #23
danaj ("Dana Jacobsen", Feb 2011, Bangkok, TH)

Quote:
Originally Posted by MJansen View Post
I have been looking at Dana's Perl code, but not sure this is the correct 64 bit code: https://metacpan.org/dist/Math-Prime...ime/Util/PP.pm
No, that is fall-back code if using big numbers without GMP. It works but it's both ugly and slow. I admit some of the C code is ugly as well, and the whole thing kind of grew organically from a tiny project.

Quote:
And it seems like he uses a small wheel (mod 30) and a Miller-Rabin base-2 test followed by a full Lucas test, so in effect a full BPSW test? Is this a correct assumption?
That's mostly true, but not for inputs > 120 bits. The part about BPSW is true. For 64-bit inputs the results are deterministic (as in most math libraries/packages). For larger inputs, the results have passed an extra-strong BPSW test.


64-bit:

64-bit next_prime: https://github.com/danaj/Math-Prime-...b8/util.c#L127

64-bit prev_prime: https://github.com/danaj/Math-Prime-...b8/util.c#L155

The algorithm is:
1. If the value is tiny, use a fixed data set.
2. If the value is in our small saved sieve, use that.
3. Skip forward in a mod-30 wheel doing a primality test.

Steps 1 and 2 are useful for the many calls done with small values.
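
Roughly, step 3 amounts to the following (a simplified sketch, not the code at the link; is_prime() stands in for the primality test described next):

Code:
/* Sketch of the mod-30 wheel skip (not the library's actual code).
 * is_prime() is assumed to be a deterministic 64-bit test (BPSW).
 * No overflow handling near UINT64_MAX in this sketch. */
#include <stdint.h>

extern int is_prime(uint64_t n);

/* distance from each residue class mod 30 to the next residue coprime to 30 */
static const uint8_t wheel_advance[30] = {
  1,6,5,4,3,2,1,4,3,2,1,2,1,4,3,2,1,2,1,4,3,2,1,6,5,4,3,2,1,2
};

uint64_t next_prime_sketch(uint64_t n)
{
  if (n < 7)                       /* step 1: tiny values from fixed data */
    return (n < 2) ? 2 : (n < 3) ? 3 : (n < 5) ? 5 : 7;
  n += wheel_advance[n % 30];      /* step 3: advance to next value coprime to 30 */
  while (!is_prime(n))
    n += wheel_advance[n % 30];
  return n;
}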

The primality test: https://github.com/danaj/Math-Prime-...mality.c#L1274

That's hard-coded tiny trial division followed by "AES" BPSW, which is deterministic for 64-bit inputs: the MR base-2 test followed by a fast Lucas test.

On an x86 for native inputs, there seems to be no benefit in running a Fermat test rather than a Miller-Rabin test. For that matter, even with GMP I wasn't able to measure any advantage.

For the full test, a hashed Miller-Rabin set (2 or 3 total tests needed) is slightly faster for base-2 pseudoprimes, but not obviously faster for primes. On platforms without fast Montgomery math, it may be different.
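
For reference, the M-R base 2 half is only a few lines once a mulmod is available. This is a generic sketch using unsigned __int128 rather than Montgomery math, and it leaves out the Lucas half of BPSW entirely:

Code:
/* Generic strong probable-prime (Miller-Rabin) test to base 2, for odd n > 2.
 * Sketch only: uses unsigned __int128 for the 64x64 mulmod instead of the
 * Montgomery arithmetic used in the linked code. */
#include <stdint.h>

static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n) {
  return (uint64_t)(((unsigned __int128)a * b) % n);
}

static uint64_t powmod(uint64_t a, uint64_t d, uint64_t n) {
  uint64_t r = 1;
  a %= n;
  while (d) {
    if (d & 1) r = mulmod(r, a, n);
    a = mulmod(a, a, n);
    d >>= 1;
  }
  return r;
}

int is_strong_psp_base2(uint64_t n)      /* n odd, n > 2 */
{
  uint64_t d = n - 1;
  int s = 0;
  while (!(d & 1)) { d >>= 1; s++; }     /* n-1 = d * 2^s, d odd */
  uint64_t x = powmod(2, d, n);
  if (x == 1 || x == n - 1) return 1;
  while (--s > 0) {
    x = mulmod(x, x, n);
    if (x == n - 1) return 1;
  }
  return 0;                              /* composite */
}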


Larger than 64-bit:

next_prime: https://github.com/danaj/Math-Prime-...mp_main.c#L332

prev_prime: https://github.com/danaj/Math-Prime-...mp_main.c#L360

surround_primes: https://github.com/danaj/Math-Prime-...mp_main.c#L389

For next and prev prime, with values less than 121 bits we use a very similar simple algorithm. Skip forward using a mod-30 wheel, do trivial trial division, then call the primality test.

primality test: https://github.com/danaj/Math-Prime-...mality.c#L1471

which is pretests (trial division), then ES BPSW (base-2 Miller-Rabin then extra-strong Lucas test).

pretests: https://github.com/danaj/Math-Prime-...mp_main.c#L108

which is an overly complicated amount of trial division. This includes cached large GCDs (because GMP does those very quickly), plus a choice between simple trial division and a treesieve routine (D. Bernstein describes the concept at a high level, and Jens K. Andersen has a nice detailed description; my implementation may or may not be particularly efficient, but it's much faster than one-at-a-time trial division for large enough inputs).
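
The cached-GCD part amounts to something like this sketch (not the linked code): multiply a lot of small primes together once, then a single mpz_gcd against that cached product rejects most composites far faster than dividing by the primes one at a time. The 100000 limit here is just an illustrative value:

Code:
/* Sketch of the cached large-GCD pretest (not the linked implementation). */
#include <gmp.h>

static mpz_t prime_product;
static int   product_ready = 0;

static void init_prime_product(unsigned long limit) {
  mpz_t p;
  mpz_init_set_ui(p, 2);
  mpz_init_set_ui(prime_product, 1);
  while (mpz_cmp_ui(p, limit) <= 0) {   /* product of all primes <= limit */
    mpz_mul(prime_product, prime_product, p);
    mpz_nextprime(p, p);
  }
  mpz_clear(p);
  product_ready = 1;
}

/* Returns 0 if n certainly has a factor <= limit (and n isn't that prime
 * itself), 1 if n survives and should go on to the BPSW test. */
int gcd_pretest(const mpz_t n) {
  mpz_t g;
  int survives;
  if (!product_ready) init_prime_product(100000);
  mpz_init(g);
  mpz_gcd(g, n, prime_product);
  survives = (mpz_cmp_ui(g, 1) == 0) || (mpz_cmp(g, n) == 0);
  mpz_clear(g);
  return survives;
}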


This leaves numbers larger than 120 bits, as well as surround_primes, which is meant for prime gaps. Here we do a partial sieve with a width of about 30 merits and a convoluted "educated guess" as to a depth that will give us the best performance. After the sieve, the ES BPSW test is done on candidates in order. It repeats the process if no result is found of course, though at 30 merits this isn't common.

surround_primes sieves 20 merits on either side of the given input, then tests candidates. If one or both endpoints are not found, it repeats with 40, 80 merits, etc. for whichever side (or both) is still missing.
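
The partial sieve itself is conceptually simple; a sketch (not the linked code) of sieving a window of offsets above N, with surround_primes doing the same below N:

Code:
/* Mark composite[j] when some sieving prime divides N + j, so only the
 * surviving offsets need the expensive ES BPSW test.  Sketch only. */
#include <gmp.h>
#include <string.h>

void partial_sieve(const mpz_t N, unsigned char *composite, unsigned long width,
                   const unsigned long *primes, unsigned long nprimes)
{
  memset(composite, 0, width);
  for (unsigned long i = 0; i < nprimes; i++) {
    unsigned long p = primes[i];
    unsigned long r = mpz_fdiv_ui(N, p);        /* N mod p */
    unsigned long off = (r == 0) ? 0 : p - r;   /* smallest off with p | N+off */
    for (unsigned long j = off; j < width; j += p)
      composite[j] = 1;
  }
  /* caller: test N + j for each j with composite[j] == 0, in order */
}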


I have had on my todo list for a long time, going over the method in https://arxiv.org/abs/2012.03771, which is claimed to be faster.

Old 2021-09-07, 22:10   #24
SethTro ("Seth", Apr 2019)

Quote:
Originally Posted by danaj View Post
I have had on my todo list for a long time, going over the method in https://arxiv.org/abs/2012.03771, which is claimed to be faster.
That's my paper :)

Nothing in it helps with very small primorials or single calls to surround_primes. But it has a number of useful speedups for running many sequential surround_primes.

I did recently add a GPU tester to my repository that uses CGBN, which I've found to be very useful for doing math on 256-2048 bit numbers. The CGBN repository had a pre-written Miller-Rabin implementation I made use of; on my 1080 Ti it's much faster than using all 12 cores of my Ryzen 3900X.

Old 2021-09-08, 06:35   #25
CraigLo (Mar 2021)

Quote:
Originally Posted by MJansen View Post
I bought a laptop with a GPU (Geforce 1650 mobile, 1024 cuda cores, 16 SM's of 64 cores each) for testing.
Nice. I started with an old laptop that had a 940MX GPU (384 cores, I think).

Quote:
Originally Posted by SethTro View Post
I did recently add a GPU tester to my repository that uses CGBN which I've found to be very useful for doing math on 256-2048 bit numbers.
I haven't had a chance to look at the CGBN library yet. It wasn't available when I started a few years ago. One of the big improvements I made to my 65-bit code was a 65-bit multiplier. Instead of representing the 65-bit number as 3 32-bit numbers which requires 9 multiplies

Code:
(x1 x2 x3) * (y1 y2 y3) = x1y1 x1y2+x2y1 x1y3+x2y2+x3y1 x2y3+x3y2 x3y3
I use

Code:
(1 x2 x3) * (1 y2 y3) = 1 x2+y2 x2y2+x3+y3 x2y3+x3y2 x3y3
which requires 4 multiplies and 4 additions. For squaring, 6 multiplies becomes 3 multiplies. This gives me about a factor of 2 improvement. Do you know if CGBN does this or if it would be possible to add it?
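
In plain C the trick looks like this (a sketch with 64-bit words and unsigned __int128 rather than the 32-bit limbs and CUDA intrinsics I actually use); the single 64x64 product here corresponds to the 4 multiplies above:

Code:
#include <stdint.h>

/* x = 2^64 + a and y = 2^64 + b are 65-bit values whose top limb is
 * implicitly 1, so x*y = 2^128 + (a+b)*2^64 + a*b needs one full multiply
 * plus additions.  Result (at most 130 bits) goes in r[0..2], r[0] lowest. */
void mul65(uint64_t a, uint64_t b, uint64_t r[3])
{
  unsigned __int128 ab  = (unsigned __int128)a * b;   /* the only multiply */
  unsigned __int128 sum = (unsigned __int128)a + b;   /* up to 65 bits */

  r[0] = (uint64_t)ab;
  unsigned __int128 mid = (unsigned __int128)(uint64_t)(ab >> 64) + (uint64_t)sum;
  r[1] = (uint64_t)mid;
  r[2] = 1 + (uint64_t)(sum >> 64) + (uint64_t)(mid >> 64);  /* implicit 1 + carries */
}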

Old 2021-09-08, 17:13   #26
SethTro ("Seth", Apr 2019)

Quote:
Originally Posted by CraigLo View Post
which requires 4 multiplies and 4 additions. For squaring, 6 multiplies becomes 3 multiplies. This gives me about a factor of 2 improvement. Do you know if CGBN does this or if it would be possible to add it?
CGBN doesn't have BITS+1 optimizations, and I doubt the author will want to add them. It's much more oriented toward larger input numbers.

Old 2021-09-09, 07:37   #27
MJansen (Jan 2018)

I have not been looking at the forum too regularly lately, lots of posts:

Quote:
Originally Posted by danaj View Post
No, that is fall-back code if using big numbers without GMP. It works but it's both ugly and slow. I admit some of the C code is ugly as well, and the whole thing kind of grew organically from a tiny project.
....
Thanks Dana, appreciate the answer and the pointers, will look into them!

@Dana/@Craig I have a question regarding the bignum library in C++ under Windows; let me elaborate a little first:
The fastest library for bignum calculations is GMP, which is native under Unix/Linux but not under Windows. Perl has a GMP library incorporated in the Strawberry Perl core files, so under Perl it is relatively easy to call GMP functions.

C++ itself does not seem to have a bignum library incorporated (that I could find; maybe I did not search in the right place?). @Craig: how did you solve this? What bignum library are you using in CUDA C++?

If one would like to use GMP with C++ under Windows, Google will show some links about MinGW and using GMP from that environment. Does anybody have any experience with setting that up? Help would be appreciated!
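
For reference, the kind of thing I want to compile is only a few lines of GMP (a minimal sketch, assuming gmp.h and the GMP library can be found by the compiler; it links with -lgmp):

Code:
/* Minimal GMP example: next probable prime above a big number.
 * Build with something like: gcc next.c -lgmp */
#include <stdio.h>
#include <gmp.h>

int main(void)
{
  mpz_t n;
  mpz_init_set_str(n, "1000000000000000000000000000057", 10);
  mpz_nextprime(n, n);                 /* next (probable) prime above n */
  gmp_printf("next prime: %Zd\n", n);
  mpz_clear(n);
  return 0;
}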

Kind regards
Michiel

Old 2021-09-09, 17:44   #28
CraigLo (Mar 2021)

I use GMP for my CPU code but I've only used Linux. I have my laptop set up so I can boot into either Linux or Windows. I'm not sure if that will work for you.

For the GPU stuff I have written all my own bignum routines. It looks like CGBN is a good option now but it didn't exist when I started. I haven't tried it yet. There might be some instances where a specialized routine will be faster (e.g. 65-bit numbers).

Old 2021-09-09, 17:49   #29
robert44444uk (Jun 2003, Oxford, UK)

Quote:
Originally Posted by MJansen View Post
....
If one would like to use GMP with C++ under Windows, Google will show some links about MinGW and using GMP from that environment. Does anybody have any experience with setting that up? Help would be appreciated!
I made the migration to Linux under the Windows 10 operating system, using Ubuntu 20.04 LTS. I don't really understand what I'm doing, but with help from others I can now run Linux codes.

Old 2021-09-09, 21:14   #30
MJansen (Jan 2018)

Thanks Craig. I was wondering: after installing Strawberry Perl, a gmp.h file is installed on the laptop in the directory C:\Strawberry\c\include. The C++ code seems to accept a reference to gmp.h (#include <gmp.h>). Could this work? I will try later with some test code, but what are your thoughts? Worth a try? Or does the GPU not work well with GMP in your experience?

I spotted this thread online: https://stackoverflow.com/questions/...ng-gmp-library

If GMP and CUDA do not mix, I am interested in your homebrew code ;-)

I found some references to other libraries for GPU:
CUMP https://github.com/skystar0227/CUMP
XMP https://github.com/NVlabs/xmp
Campary https://homepages.laas.fr/mmjoldes/campary/

Any thoughts on those?

And an article regarding GMP implementation:
http://individual.utoronto.ca/haojun...724_Report.pdf

I get a feeling programming for a GPU will require a lot of programming and research ...

Kind regards
Michiel



Quote:
Originally Posted by CraigLo View Post
I use GMP for my CPU code but I've only used Linux. ....

Old 2021-09-10, 06:39   #31
CraigLo (Mar 2021)

I think you will be better off with a library that is designed to work well with the GPU architecture instead of trying to run GMP in the GPU.

CUMP looks like it is for floating point numbers.
XMP looks like an old version of CGBN (CGBN is essentially XMP 2.0).
Campary doesn't have any documentation that I could find.

I would start with CGBN. It is written by NVIDIA and well tested. It will handle larger numbers than my code at the moment and it is a more complete library.

You will probably still want GMP for your CPU code. I haven't used MinGW or run GMP/CUDA on Windows so I can't help there.

Old 2021-09-10, 06:47   #32
CraigLo (Mar 2021)

Quote:
Originally Posted by SethTro View Post
CGBN doesn't have BITS+1 optimizations, and I doubt the author will want to add them. It's much more oriented toward larger input numbers.
I was reading through some of the CGBN documentation.

It says CGBN currently requires 4, 8, 16 or 32 threads per CGBN group. What is a CGBN group? Is it a single bignum?

Also, the size must be evenly divisible by 32. Does this mean it won't work for a 65 bit number? Or can you treat it as a 96 bit number with leading 0s and everything will work correctly?

Old 2021-09-10, 07:46   #33
SethTro ("Seth", Apr 2019)

Quote:
Originally Posted by CraigLo View Post
It says CGBN currently requires 4, 8, 16 or 32 threads per CGBN group. What is a CGBN group? Is it a single bignum?
It's a single bignum (made up of 32-bit limbs).

Quote:
Originally Posted by CraigLo View Post
Also, the size must be evenly divisible by 32. Does this mean it won't work for a 65 bit number? Or can you treat it as a 96 bit number with leading 0s and everything will work correctly?
IMO CGBN isn't going to be of much use for 65-bit numbers; maybe for 96-bit and definitely for 128-bit numbers, but not 65 bits.