mersenneforum.org  

2021-08-06, 21:07   #3499
SethTro
"Seth"
Apr 2019
3×11² Posts

Math is hard


Code:
$ cat worktodo.txt
 Factor=N/A,960477823,66,67
Factor=N/A,960477823,67,68


 $ ./mfaktc.exe
 mfaktc v0.21 (64bit built)

...

got assignment: exp=960477823 bit_min=66 bit_max=67 (0.02 GHz-days)
Starting trial factoring M960477823 from 2^66 to 2^67 (0.02 GHz-days)
 k_min =  38411594760
 k_max =  76823196254
Using GPU kernel "barrett76_mul32_gs"
M960477823 has a factor: 147602823780943516039
found 1 factor for M960477823 from 2^66 to 2^67 [mfaktc 0.21 barrett76_mul32_gs]

WARNING: ignoring line 1 in "worktodo.txt"! Reason: doesn't begin with Factor=
WARNING: ignoring line 2 in "worktodo.txt"! Reason: doesn't begin with Factor=
got assignment: exp=960477823 bit_min=67 bit_max=68 (0.03 GHz-days)
Starting trial factoring M960477823 from 2^67 to 2^68 (0.03 GHz-days)
 k_min =  76823194140
 k_max =  153646392509
M960477823 has a factor: 147602823780943516039
found 1 factor for M960477823 from 2^67 to 2^68 [mfaktc 0.21 barrett76_mul32_gs]

$ python -c 'import math; print(math.log2(147602823780943516039))'
67.00028221952357

2021-08-07, 00:47   #3500
Viliam Furik
"Viliam Furík"
Jul 2018
Martin, Slovakia
1246₈ Posts

Yes, there is an overlap in the k_max (76823196254) of the 67-bit range and k_min (76823194140) of the 68-bit range. So the k of this composite factor, being 76838225853, can be found in both ranges.
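
For anyone who wants to check the numbers, here is a minimal Python sketch that uses only values printed in the log above (the variable names are made up for this sketch). It recovers k from the reported factor, compares it with the printed k_min/k_max of both assignments, and confirms that the factor is the product of two smaller candidates of the form 2*k*p+1. Note that k comes out somewhat above the k_max printed for the 66-67 run, so that run evidently tested candidates past the printed bound.

```python
import math

p = 960477823                      # exponent from worktodo.txt
f = 147602823780943516039          # factor reported by mfaktc

# Every factor of M_p has the form 2*k*p + 1, so k is recovered exactly.
k, remainder = divmod(f - 1, 2 * p)
print(k, remainder)                # 76838225853 0

# Bit level of the factor: just above 2^67.
print(math.log2(f))

# k_min / k_max as printed by mfaktc for the two assignments.
ranges = {"66-67": (38411594760, 76823196254),
          "67-68": (76823194140, 153646392509)}
for name, (k_min, k_max) in ranges.items():
    print(name, k_min <= k <= k_max)

# The factor is composite: it is the product of the two smaller candidates
# 2*k1*p + 1 and 2*k2*p + 1 with k1 = 5 and k2 = 8.
k1, k2 = 5, 8
q1, q2 = 2 * k1 * p + 1, 2 * k2 * p + 1
print(q1 * q2 == f)                        # True
print(k == 2 * k1 * k2 * p + k1 + k2)      # True
```

The last line is just the product expanded: (2*k1*p+1)*(2*k2*p+1) = 2*p*(2*k1*k2*p + k1 + k2) + 1.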

2021-08-12, 15:45   #3501
Siegmund
Mar 2014
2²×13 Posts

Attempting to set up new system

I have just received a shiny new laptop, and am having some trouble getting it set up.


I have run mfaktc and mfakto, once each, on previous machines, and remember it mostly being a matter of having drivers up to date and picking the right one of mfaktc or mfakto. This time has been harder. Would appreciate some advice:


Windows 10, i7-10750H CPU @ 2.60GHz. 32GB RAM. NVIDIA GeForce GTX 1660 Ti video card.


I downloaded and unzipped mfaktc-0.21.win_cuda11.2-2047.zip.
I downloaded and installed NVIDIA's latest set of tools (cuda_11.4.1_471.41_win10.exe), and when that didn't work, uninstalled it and tried again with cuda_11.2.0_460.89_win10.exe.
I grabbed cudart64_110.dll off the web (I think off this forum!) and tried placing it in various places - in the system32 directory, in the mfaktc directory, in the same place as the other NVIDIA dlls.


Each time the self-test exits with error 209: no kernel image is available for execution on the device.


Any suggestions on what to try next are welcome. Is there a cudart64_112.dll I need? (I didn't run across one on the web.) A different directory I need to place the cudart file in? Something else obvious I did wrong?



Complete self-test result pasted below:


D:\grb\math\mfaktc>mfaktc-win-64 -st
mfaktc v0.21 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled

Runtime options
  SievePrimes               25000
  SievePrimesAdjust         1
  SievePrimesMin            5000
  SievePrimesMax            100000
  NumStreams                3
  CPUStreams                3
  GridSize                  3
  GPU Sieving               enabled
  GPUSievePrimes            82486
  GPUSieveSize              2047Mi bits
  GPUSieveProcessSize       16Ki bits
  Checkpoints               enabled
  CheckpointDelay           30s
  WorkFileAddDelay          600s
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 full
  V5UserID                  (none)
  ComputerID                (none)
  AllowSleep                no
  TimeStampInResults        no

CUDA version info
  binary compiled for CUDA  11.20
  CUDA runtime version      11.20
  CUDA driver version       11.20

CUDA device info
  name                      GeForce GTX 1660 Ti
  compute capability        7.5
  max threads per block     1024
  max shared memory per MP  65536 byte
  number of multiprocessors 24
  clock rate (CUDA cores)   1590MHz
  memory clock rate:        6001MHz
  memory bus width:         192 bit

Automatic parameters
  threads per grid          786432
  GPUSievePrimes (adjusted) 82486
  GPUsieve minimum exponent 1055144

########## testcase 1/2867 ##########
Starting trial factoring M50804297 from 2^67 to 2^68 (0.59 GHz-days)
Using GPU kernel "75bit_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Aug 12 09:26 |  3387  0.1% |  0.001    n.a. |      n.a.    82485    n.a.%
ERROR: cudaGetLastError() returned 209: no kernel image is available for execution on the device

D:\grb\math\mfaktc>

2021-08-12, 16:12   #3502
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
3,499 Posts

Quote:
Originally Posted by Siegmund
I downloaded and unzipped mfaktc-0.21.win_cuda11.2-2047.zip
Someone who knows more than I will probably correct me, but I believe the CUDA 11.2 builds are only for RTX-30xx series.
I would perhaps try mfaktc-0.21.win_cuda100-2047
Easy downloads for mfaktc (and CUDA DLLs if you need them, which should be in the mfaktc directory).
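
Error 209 generally means the executable carries no GPU code that the card can run. A rough Python sketch of the matching rule, as I understand it from NVIDIA's documentation: a compiled (SASS) image runs only on a device with the same major compute capability and an equal or higher minor, while embedded PTX can be JIT-compiled on any device with an equal or newer compute capability. The function name and the architecture lists below are hypothetical; I don't know which targets the actual mfaktc builds were compiled for.

```python
def can_run(device_cc, sass_targets, ptx_targets=()):
    """Return True if a binary has a kernel image usable on the device.

    device_cc and every target are (major, minor) compute capability tuples.
    """
    major, minor = device_cc
    # Compiled SASS: same major version, target minor <= device minor.
    for t_major, t_minor in sass_targets:
        if t_major == major and t_minor <= minor:
            return True
    # Embedded PTX: JIT-compilable on any device at or above the PTX version.
    return any(target <= device_cc for target in ptx_targets)


gtx_1660_ti = (7, 5)   # "compute capability 7.5" in the self-test output

# Hypothetical build aimed only at Ampere, with no PTX fallback.
print(can_run(gtx_1660_ti, sass_targets=[(8, 0), (8, 6)]))   # False
# Hypothetical build that includes a Turing image.
print(can_run(gtx_1660_ti, sass_targets=[(6, 1), (7, 5)]))   # True
```

If that rule is what is happening here, no amount of moving cudart DLLs around will help; only a build that contains an image for compute capability 7.5 (or suitable PTX) will run.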

2021-08-12, 16:12   #3503
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
29×199 Posts

Quote:
Originally Posted by Siegmund
I have just received a shiny new laptop, and am having some trouble getting it set up.

I have run mfaktc and mfakto, once each, on previous machines, and remember it mostly being a matter of having drivers up to date and picking the right one of mfaktc or mfakto. This time has been harder. Would appreciate some advice:

Windows 10, i7-10750H CPU @ 2.60GHz. 32GB RAM. NVIDIA GeForce GTX 1660 Ti video card.

I downloaded and unzipped mfaktc-0.21.win_cuda11.2-2047.zip.
I downloaded and installed NVIDIA's latest set of tools (cuda_11.4.1_471.41_win10.exe), and when that didn't work, uninstalled it and tried again with cuda_11.2.0_460.89_win10.exe.
I grabbed cudart64_110.dll off the web (I think off this forum!) and tried placing it in various places - in the system32 directory, in the mfaktc directory, in the same place as the other NVIDIA dlls.

Each time the self-test exits with error 209: no kernel image is available for execution on the device.

Any suggestions on what to try next are welcome.
Use a CUDA10 version of mfaktc.
Use the reference info, such as
https://www.mersenneforum.org/showpo...18&postcount=1
https://www.mersenneforum.org/showpo...1&postcount=11
to better understand compatibility requirements. Good luck.
Code:
mfaktc v0.21 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled

Runtime options
  SievePrimes               25000
  SievePrimesAdjust         1
  SievePrimesMin            5000
  SievePrimesMax            100000
  NumStreams                4
  CPUStreams                3
  GridSize                  3
  GPUSievePrimes            92000
  GPUSieveSize              2047Mi bits
  GPUSieveProcessSize       32Ki bits
  Checkpoints               enabled
  CheckpointDelay           600s
  WorkFileAddDelay          3600s
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 full
  V5UserID                  kriesel
  ComputerID                asrock-gtx1650Super
  ProgressHeader            "Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait"
  ProgressFormat            "%d %T | %C %p%% | %t  %e |   %g  %s  %W%%"
  AllowSleep                yes
  TimeStampInResults        yes

CUDA version info
  binary compiled for CUDA  10.0
  CUDA runtime version      10.0
  CUDA driver version       11.0

CUDA device info
  name                      GeForce GTX 1650 SUPER
  compute capability        7.5
  max threads per block     1024
  max shared memory per MP  65536 byte
  number of multiprocessors 20
  clock rate (CUDA cores)   1740MHz
  memory clock rate:        6001MHz
  memory bus width:         128 bit

Automatic parameters
  threads per grid          655360
  random selftest offset    11535
  GPUSievePrimes (adjusted) 92726
  GPUsieve minimum exponent 1197042

running a simple selftest...
Selftest statistics
  number of tests           107
  successfull tests         107

selftest PASSED!

2021-08-12, 16:26   #3504
Siegmund
Mar 2014
2²×13 Posts

Thanks, James and kriesel.


I am up and running with the CUDA 10 version.


GRB

2021-08-27, 21:54   #3505
SethTro
"Seth"
Apr 2019
3×11² Posts

Quote:
Originally Posted by TheJudger
Not really "hard" but a lot of work. The exponent is represented in a single 32bit unsigned integer in mfaktc. Main task in the host code: extend the per-class sieve initializations for bigger exponents. This part is not really performance critical as long as the job is "big enough" (runtime per class greater than a few seconds). Simple approach: use libgmp. The GPU code "just needs bigger numbers", too. The bad news is that you have to rewrite the complete set of functions (add, sub, mul, div, mod, ...) for bigger numbers.



Simple approach: use libgmp on the CPU to check if factors are composite or prime. Cons:
  • mfaktc depends on (another) external library
  • makes it even harder (impossible?) to release a closed source version of mfaktc (with automated primenet support)

Currently I say: no fix planned.

Oliver
Quote:
Originally Posted by SethTro
I have some knowledge here (I found one of the factors of 110393069).

It's easy to detect if a factor is composite. All factors are of the form (2*k*p+1), so a composite factor must be of the form (2*k_1*p+1)*(2*k_2*p+1), which restricts k_1 and k_2 to be very small. It is therefore enough to check whether the returned factor is divisible by (2*i*p+1) for i <= 1000. I asked about adding this to mfaktc, but mfaktc doesn't have good client-side big int support, so I never coded it up.

I think this was asked about in https://www.mersenneforum.org/showpo...postcount=1148
I had a good insight today: I can do the check for a small factor WITHOUT gmp.
I want to check whether <factor> mod (2 * k * exponent + 1) == 0, where <factor> doesn't fit in an int64.
I break factor up into its base-10 representation (which is what I have in the char*): digit_1 * 10^0 + digit_2 * 10^1 + digit_3 * 10^2 + digit_4 * 10^3 + ... Then I sum each digit_n * (10^(n-1) mod (2*k*exponent+1)) to get a congruent sum.

I only need to check a handful of divisions to remove 99% of composites.
I tested with -st and -st2 and also verified that a bunch of previously found "factors" are no longer found.

Let me know how I can help get this committed.
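
A minimal Python sketch of the digit-sum idea, for illustration only (the attached patch is C; the function names here are made up for this sketch). It walks the decimal string from the least significant digit and reduces both the running sum and the power of ten modulo the candidate divisor, so no intermediate value grows much beyond the modulus.

```python
def decimal_string_mod(factor: str, modulus: int) -> int:
    """Compute int(factor) % modulus without building the big integer."""
    total = 0
    power = 1 % modulus            # 10^0 mod modulus
    for ch in reversed(factor):    # least significant digit first
        total = (total + int(ch) * power) % modulus
        power = (power * 10) % modulus
    return total


def divisible_by_candidate(factor: str, exponent: int, k: int) -> bool:
    """True if 2*k*exponent + 1 divides the factor given as a decimal string."""
    return decimal_string_mod(factor, 2 * k * exponent + 1) == 0


factor = "147602823780943516039"   # reported earlier in this thread for M960477823
print(divisible_by_candidate(factor, 960477823, 5))   # True
print(divisible_by_candidate(factor, 960477823, 3))   # False
```

In C the same loop stays inside uint64_t as long as ten times the modulus fits, which holds comfortably for a 32-bit exponent and k <= 10000 (the modulus is then below 2^47).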
Attached Files
File Type: txt 0001-Try-to-avoid-composite-factors.txt (5.1 KB, 33 views)

2021-08-28, 03:53   #3506
SethTro
"Seth"
Apr 2019
3×11² Posts

I tested this over a wider range of assignments and found a mistake: I had wrongly assumed k had to be odd.

My patch needs this tiny change:

--- a/src/output.c
+++ b/src/output.c
@@ -403,8 +403,7 @@ int is_small_composite(uint64_t exponent, char *factor)
      * composites > (4 * 10^8 * exponent^2) can pass, but require much high bitlevels.
      */
     int len = strlen(factor);
-
-    for (uint64_t k = 1; k <= 10000; k += 2)
+    for (uint64_t k = 1; k <= 10000; k++)
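
To see why stepping k by 2 misses cases, here is a small Python sketch (not the patch itself; the function name is hypothetical). The test number is a synthetic product of two candidates that both have even k, chosen purely for the arithmetic. It is not claimed to be a factor of any Mersenne number.

```python
def smallest_cofactor_k(exponent, factor, k_limit=10000, step=1):
    """Return the smallest tested k with (2*k*exponent + 1) dividing factor,
    or None if no tested candidate below factor divides it."""
    for k in range(1, k_limit + 1, step):
        q = 2 * k * exponent + 1
        if q >= factor:            # only proper divisors count
            break
        if factor % q == 0:
            return k
    return None


p = 960477823
synthetic = (2 * 2 * p + 1) * (2 * 4 * p + 1)   # both k values are even

print(smallest_cofactor_k(p, synthetic, step=2))  # odd k only: None
print(smallest_cofactor_k(p, synthetic, step=1))  # every k: 2
```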

2021-09-18, 00:04   #3507
Xyzzy
Aug 2002
20160₈ Posts
Quote:
Originally Posted by Xyzzy
2 - Install cuda.

sudo dnf install cuda-nvcc-11-3

We chose 11.3 because it is the newest available.
This works better for step 2 because you get more supporting packages:

sudo dnf install cuda-11-4

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.