mersenneforum.org  

2021-01-10, 04:35   #485
rogue

Quote:
Originally Posted by MisterBitcoin View Post
Code:
  p=24662984657, 428.2K p/sec, 2771 factors found at 13.13 sec per factor, 98.6% done. ETC 2021-01-10 03:06
Sieve completed at p=25000000013.
Processor time: 9358.14 sec. (15.48 sieving) (3.87 cores)
I wonder how it came up with 15.48 for sieving. Maybe it's just an error; I used srsieve v 1.1 with the -W 4 flag.

This is the first time sieving goes above 1 for me :D
The framework differentiates the time used to generate a list of primes for testing from the rest of the execution time. You can see that only a tiny percentage of the time is needed for sieving compared to the function that looks for factors given a list of primes.
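The split described above can be read straight off the summary line. Here is a minimal Python sketch, assuming the three numbers in the "Processor time" line are total CPU seconds, seconds spent generating primes, and the average number of cores in use. The last interpretation and the derived wall-clock estimate are assumptions, and `parse_processor_time` is a hypothetical helper, not part of the framework.

```python
import re

# Pattern inferred from the single summary line quoted in this thread.
SUMMARY_RE = re.compile(
    r"Processor time: ([\d.]+) sec\. \(([\d.]+) sieving\) \(([\d.]+) cores\)"
)

def parse_processor_time(line):
    """Return the timing split reported in a 'Processor time' summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("unrecognized summary line")
    total, sieving, cores = (float(g) for g in match.groups())
    return {
        "total_sec": total,
        "sieving_sec": sieving,
        # Share of CPU time spent generating the list of primes.
        "sieving_pct": 100.0 * sieving / total,
        # Assumption: "cores" is CPU time divided by elapsed time.
        "est_wall_sec": total / cores,
    }

stats = parse_processor_time(
    "Processor time: 9358.14 sec. (15.48 sieving) (3.87 cores)"
)
print(f"sieving share: {stats['sieving_pct']:.2f}% of CPU time")
print(f"estimated elapsed time: {stats['est_wall_sec']:.0f} sec")
```

For the run quoted above, prime generation comes to roughly 0.17% of the CPU time, which is why 15.48 seconds is not an error.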

2021-01-10, 19:04   #486
rogue

I have released 2.1.4. Here are the changes:

Code:
   framework:
      Fixed an issue with creating GPU kernels on OS X.

   srsieve2cl:  new release
      Finally an OpenCL version of srsieve2.  srsieve2cl is at least 3x faster than srsieve2.
      On my GPU it is limited to about 5000 sequences due to GPU memory limitations.  I do not
      know what the limits are for other GPUs.  It will switch to the GPU at p>1e6.
On an older GPU, srsieve2cl struggles with 1000 sequences causing significant lag in the display. But that GPU is also much slower, so it isn't worth running on it.

2021-01-10, 19:30   #487
rebirther

Quote:
Originally Posted by rogue View Post
I have released 2.1.4. Here are the changes:

Code:
   framework:
      Fixed an issue with creating GPU kernels on OS X.

   srsieve2cl:  new release
      Finally an OpenCL version of srsieve2.  srsieve2cl is at least 3x faster than srsieve2.
      On my GPU it is limited to about 5000 sequences due to GPU memory limitations.  I do not
      know what the limits are for other GPUs.  It will switch to the GPU at p>1e6.
On an older GPU, srsieve2cl struggles with 1000 sequences causing significant lag in the display. But that GPU is also much slower, so it isn't worth running on it.
How much VRAM is used for 5000 sequences and 80000?

2021-01-10, 19:43   #488
rogue

Quote:
Originally Posted by rebirther View Post
How much VRAM is used for 5000 sequences and 80000?
3257 sequences (9383 subsequences) using the GPU takes about 37 MB of RAM in the CPU and about 6 GB dedicated memory in the GPU (per Task Manager).

I do not recall how much CPU memory was used with 80000 sequences, but I thought it was around 2 GB.

Last fiddled with by rogue on 2021-01-10 at 19:46
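
For a rough answer to the VRAM question, the one data point above (3257 sequences using about 6 GB on the GPU) can be extrapolated. This is only a sketch: it assumes GPU memory grows linearly with the number of sequences, which the thread does not confirm, and it ignores the effect of options such as -g. `estimate_gpu_gb` is a hypothetical helper.

```python
# Single observation reported in this thread (per Task Manager).
OBSERVED_SEQUENCES = 3257
OBSERVED_GPU_GB = 6.0

def estimate_gpu_gb(sequences):
    """Linear extrapolation from one data point (an assumption, not a measurement)."""
    return OBSERVED_GPU_GB * sequences / OBSERVED_SEQUENCES

for count in (5000, 80000):
    print(f"{count} sequences: roughly {estimate_gpu_gb(count):.1f} GB")
```

Under that assumption, 5000 sequences would need about 9 GB and 80000 sequences about 147 GB, so the larger case would not fit on any single consumer card. Actual usage may scale differently.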

2021-01-11, 01:35   #489
Citrix

Quote:
Originally Posted by rogue View Post
I have released 2.1.4. Here are the changes:

Code:
   framework:
      Fixed an issue with creating GPU kernels on OS X.

   srsieve2cl:  new release
      Finally an OpenCL version of srsieve2.  srsieve2cl is at least 3x faster than srsieve2.
      On my GPU it is limited to about 5000 sequences due to GPU memory limitations.  I do not
      know what the limits are for other GPUs.  It will switch to the GPU at p>1e6.
On an older GPU, srsieve2cl struggles with 1000 sequences causing significant lag in the display. But that GPU is also much slower, so it isn't worth running on it.
I am getting a speed of 4kp/sec for 11 sequences from n=1M to 20M. Sr2sieve and srsieve2 are both significantly faster. Is this what is expected?

2021-01-11, 03:24   #490
rogue

Quote:
Originally Posted by Citrix View Post
I am getting a speed of 4kp/sec for 11 sequences from n=1M to 20M. Sr2sieve and srsieve2 are both significantly faster. Is this what is expected?
I do not look at p/sec as it is calculated differently. I look at factors per second, which is far more accurate. Nevertheless, srsieve2 and sr2sieve can be faster if your GPU isn't particularly fast.

2021-01-11, 19:00   #491
Dylan14

Might it be possible to update the primesieve code used by mtsieve to version 7.6? It seems to provide some improvements over 7.3 which is currently used:
  • improved caching of primes
  • improved switch statement in EratSmall and EratMedium
  • improved cache size detection on Linux and on Apple Silicon CPUs (which could be useful for compiling this for ARM)

2021-01-11, 19:04   #492
rebirther

Quote:
Originally Posted by rogue View Post
3257 sequences (9383 subsequences) using the GPU takes about 37 MB of RAM in the CPU and about 6 GB dedicated memory in the GPU (per Task Manager).

I do not recall how much CPU memory was used with 80000 sequences, but I thought it was around 2 GB.

I have now tried the cl version on an RX 5500 XT with 8 GB of RAM but hit the limit: there was a driver timeout because too much RAM was used, I think 7.4 GB.

srsieve2cl.exe -n2501 -N10000 -P1e9 -M 15000 -spl_remain.txt -fB

2021-01-11 19:57:22: Sieve completed at p=1000071173. Primes tested 50772480. Found 87459308 factors. 16098192 terms remaining. Time 239.43 seconds

The speed is awesome. I am still running this on 16 cores with srsieve2 to compare. It could be much better on faster cards with 16-24 GB of RAM.

2021-01-11, 19:36   #493
rogue

Quote:
Originally Posted by rebirther View Post
I have now tried the cl version on an RX 5500 XT with 8 GB of RAM but hit the limit: there was a driver timeout because too much RAM was used, I think 7.4 GB.

srsieve2cl.exe -n2501 -N10000 -P1e9 -M 15000 -spl_remain.txt -fB

2021-01-11 19:57:22: Sieve completed at p=1000071173. Primes tested 50772480. Found 87459308 factors. 16098192 terms remaining. Time 239.43 seconds

The speed is awesome. I am still running this on 16 cores with srsieve2 to compare. It could be much better on faster cards with 16-24 GB of RAM.
Try using a lower value for -g (10 is the default). That should reduce some of the GPU memory usage.

2021-01-11, 19:38   #494
rogue

Quote:
Originally Posted by Dylan14 View Post
Might it be possible to update the primesieve code used by mtsieve to version 7.6? It seems to provide some improvements over 7.3 which is currently used:
  • improved caching of primes
  • improved switch statement in EratSmall and EratMedium
  • improved cache size detection on Linux and on Apple Silicon CPUs (which could be useful for compiling this for ARM)
That shouldn't be too hard to do.

2021-01-11, 19:54   #495
rebirther

Quote:
Originally Posted by rebirther View Post
I have now tried the cl version on an RX 5500 XT with 8 GB of RAM but hit the limit: there was a driver timeout because too much RAM was used, I think 7.4 GB.

srsieve2cl.exe -n2501 -N10000 -P1e9 -M 15000 -spl_remain.txt -fB

2021-01-11 19:57:22: Sieve completed at p=1000071173. Primes tested 50772480. Found 87459308 factors. 16098192 terms remaining. Time 239.43 seconds

The speed is awesome. I am still running this on 16 cores with srsieve2 to compare. It could be much better on faster cards with 16-24 GB of RAM.
Versus a Ryzen 3950X with 16 cores:

srsieve2 -n2501 -N10000 -P1e9 -W16 -spl_remain.txt -fB

2021-01-11 20:50:35: Sieve completed at p=1000000007. Primes tested 50847420. Found 92827983 factors. 10729517 terms remaining. Time 4990.80 seconds

The CPU run reduces the sieve file a bit more than the GPU run: it found 5368675 more factors, leaving 10729517 terms instead of 16098192.
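
The two completion lines above can be compared by factors per second, the measure suggested earlier in the thread, rather than by p/sec. Here is a minimal Python sketch, assuming the "Time" field is elapsed seconds and that the line format is exactly as logged here. `parse_completion` is a hypothetical helper, not part of mtsieve.

```python
import re

# Pattern inferred from the two "Sieve completed" lines posted in this thread.
COMPLETION_RE = re.compile(
    r"Sieve completed at p=(\d+)\. Primes tested (\d+)\. "
    r"Found (\d+) factors\. (\d+) terms remaining\. Time ([\d.]+) seconds"
)

def parse_completion(line):
    """Extract the counters from a 'Sieve completed' log line."""
    match = COMPLETION_RE.search(line)
    if match is None:
        raise ValueError("unrecognized completion line")
    max_p, primes, factors, remaining = (int(g) for g in match.groups()[:4])
    seconds = float(match.group(5))
    return {
        "max_p": max_p,
        "primes_tested": primes,
        "factors": factors,
        "terms_remaining": remaining,
        "seconds": seconds,
        "factors_per_sec": factors / seconds,
    }

gpu = parse_completion(
    "2021-01-11 19:57:22: Sieve completed at p=1000071173. Primes tested 50772480. "
    "Found 87459308 factors. 16098192 terms remaining. Time 239.43 seconds"
)
cpu = parse_completion(
    "2021-01-11 20:50:35: Sieve completed at p=1000000007. Primes tested 50847420. "
    "Found 92827983 factors. 10729517 terms remaining. Time 4990.80 seconds"
)

print(f"GPU: {gpu['factors_per_sec']:.0f} factors/sec")
print(f"CPU: {cpu['factors_per_sec']:.0f} factors/sec")
print(f"time ratio (CPU/GPU): {cpu['seconds'] / gpu['seconds']:.1f}x")
print(f"extra terms left by GPU run: {gpu['terms_remaining'] - cpu['terms_remaining']}")
```

On these two lines the GPU run is about 20 times faster by elapsed time, while the difference in remaining terms equals the difference in factors found. The runs are not strictly like for like, since they stopped at slightly different values of p and used different options.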

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.