grmfaktc: a CUDA program for generalized repunits prefactoring
Hi,
I have finally completed the generalized repunit version of mfaktc.

Changes compared to mfaktc 0.21:
- Implemented factoring of generalized repunits
- Removed the Barrett and 72-bit kernels
- Removed Wagstaff-related code
- Added 64-bit kernels
- Compiling with the moreclasses flag seems to be slightly faster, so it is switched on
- All bases >= 2 are allowed; the program might crash if the base is larger than roughly 100,000
- Implemented special cases for bases 2, 3, 5, 6, 7, 8, 10, 11, 12
- Dropped the lower limit for exponents from 100,000 to 50,000

The zip file contains the source code and executables for Linux and Windows (both 64 bit). First check that it runs correctly:
[CODE]./grmfaktc.exe -st[/CODE]
This takes a few minutes and should give output similar to this at the end:
[CODE]
Selftest statistics
number of tests    31127
successfull tests  31127

kernel           success  fail
UNKNOWN kernel         0     0
64bit_mul32         4633     0
75bit_mul32         5712     0
95bit_mul32         5918     0
64bit_mul32_gs      4190     0
75bit_mul32_gs      5248     0
95bit_mul32_gs      5426     0

selftest PASSED!
[/CODE]
Running
[CODE]./grmfaktc.exe -tf 23 3300019 1 60[/CODE]
Example output:
[CODE]
got assignment: base=23 exp=3300019 bit_min=1 bit_max=60 (0.05 GHzdays)
Starting trial factoring R[23]3300019 from 2^1 to 2^60 (0.05 GHzdays)
k_min = 0
k_max = 174684070698
Using GPU kernel "64bit_mul32_gs"
Date Time     class Pct     time  ETA   GHzd/day Sieve  Exp     Base bitrange
Oct 31 21:57  6     0.1%    0.009 n.a.  232.71   22837  3300019 23   1:60
R[23]3300019 has a factor: 39600229
Oct 31 21:57  1347  29.1%   0.008 n.a.  261.80   22837  3300019 23   1:60
R[23]3300019 has a factor: 1021252834106707
Oct 31 21:57  4619  100.0%  0.011 n.a.  190.40   22837  3300019 23   1:60
found 2 factors for R[23]3300019 from 2^ 1 to 2^60 [mfaktc 0.21 64bit_mul32_gs]
tf(): total time spent: 19.370s
[/CODE]
When run without parameters, the program reads its assignments from the worktodo.txt file:
[CODE]
Factor=bla,66362159,64,68
Factor=bla,base=17,1055167,1,64
[/CODE]
The bla string is optional. The first line gives no base, so it defaults to base=10.

I attached the compiled versions of grmfaktc for Linux and Windows (both 64 bit). The executables are compiled with
[CODE]
NVCCFLAGS += --generate-code arch=compute_50,code=sm_50 # CC 5.x GPUs will use this code
NVCCFLAGS += --generate-code arch=compute_60,code=sm_60 # CC 6.0 GPUs will use this code
NVCCFLAGS += --generate-code arch=compute_61,code=sm_61 # CC 6.1 GPUs will use this code
NVCCFLAGS += --generate-code arch=compute_70,code=sm_70 # CC 7.x GPUs will use this code
NVCCFLAGS += --generate-code arch=compute_75,code=sm_75 # CC 7.5 GPUs will use this code
[/CODE]
I am using grmfaktc to find factors of base 10 repunits as the presieving step for the PRP tests. Recently the search reached the 4,000,000-digit milestone, but so far no new prime has been found (after R270343). Help is always welcome; PM me if you want to join the search.

Let me know if there are any issues. Have fun finding new factors.
Cheers, Danilo
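For readers who want to generate or check worktodo.txt lines with a script, here is a minimal Python sketch of the line format described above. It is not the parser used by grmfaktc (that one is C code in the source archive); the names `Assignment` and `parse_worktodo_line` are made up for illustration, and the only rules assumed are the ones stated in the post: an optional label, an optional `base=` field, and a default of base 10.

```python
from typing import NamedTuple, Optional

DEFAULT_BASE = 10  # per the post: lines without base= default to base 10


class Assignment(NamedTuple):
    """Hypothetical container for one Factor= line (illustration only)."""
    label: Optional[str]
    base: int
    exponent: int
    bit_min: int
    bit_max: int


def parse_worktodo_line(line: str) -> Assignment:
    """Parse 'Factor=[label,][base=B,]exponent,bit_min,bit_max'."""
    key, sep, rest = line.strip().partition("=")
    if key != "Factor" or not sep:
        raise ValueError(f"not a Factor line: {line!r}")
    fields = [field.strip() for field in rest.split(",")]
    if len(fields) < 3:
        raise ValueError(f"too few fields: {line!r}")
    # The last three fields are always exponent, bit_min, bit_max.
    exponent, bit_min, bit_max = (int(field) for field in fields[-3:])
    label, base = None, DEFAULT_BASE
    # Anything before them is either base=B or a free-form label.
    for field in fields[:-3]:
        if field.startswith("base="):
            base = int(field[len("base="):])
        else:
            label = field
    return Assignment(label, base, exponent, bit_min, bit_max)


if __name__ == "__main__":
    for text in ("Factor=bla,66362159,64,68",
                 "Factor=bla,base=17,1055167,1,64"):
        print(parse_worktodo_line(text))
```

Whether grmfaktc accepts the fields in any other order is not stated in the thread, so the sketch only claims to handle the forms shown in the examples.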
Hi,
the win64 version of grmfaktc does not work: all selftests failed.
[QUOTE=lalera;529392]hi,
the win64 version of grmfaktc does not work: all selftests failed.[/QUOTE] What GPU do you use? On my own system (GeForce GTX 1080, compute capability 6.1, CUDA 10.1) it works without issues. I know that a GTX 1660 had problems running the previous mfaktcrepunit version, which used exactly the same compilation settings as grmfaktc; I have not found the cause yet. There might be an issue with my compilation setup or with the drivers for compute capabilities > 6.1. The same GPU could run mfaktc ([URL]https://download.mersenne.ca/mfaktc/mfaktc0.21/mfaktc0.21.win.cuda100.zip[/URL]) without issues. Maybe somebody else could try to compile it for Windows and upload the binary here? I have no idea what the issue could be, since I cannot test it myself on Turing cards. 
Hi,
I use Win10 x64 (1903) with a GTX 1660 Ti, driver 426.00, and CUDA 10.1.
Hi,
I tried another machine with Win10 x64, a GTX 1050 Ti, and NVIDIA driver 419.67: all selftests passed.
[QUOTE=lalera;529437]hi,
I tried another machine with Win10 x64, a GTX 1050 Ti, and NVIDIA driver 419.67: all selftests passed.[/QUOTE] Okay, so there seems to be a pattern. Let's wait a bit; maybe we can isolate the issue with more feedback. 
I tried this on two machines. An older one running Windows 7 Pro x64 with a GTX 750Ti and a newer one running Windows 10 Pro x64, v1903, using a GTX 1080.
Using the version of mfaktc I had, the 1080 would run around 1050 GHzd/day. This one ran at about 730 GHzd/day. This was with base 2. It seems that specifying the base in the [I]worktodo[/I] file would be problematic for [I]PrimeNet[/I] and [I]GPUto72[/I]. Perhaps it would be better to specify the base in the configuration file? I would not think many would be changing the base very often, or at all, given this project's goal of finding Mersenne prime numbers. :two cents: 
Three questions:
1) Were you able to figure out the Legendre/Jacobi symbols to filter primes for all bases?
2) Is there a reason the app would crash for large bases (>100,000)?
3) Are negative bases also supported, for example 10^n+1?
Thanks. 
[QUOTE=storm5510;529500]I tried this on two machines. An older one running Windows 7 Pro x64 with a GTX 750Ti and a newer one running Windows 10 Pro x64, v1903, using a GTX 1080.
Using the version of mfaktc I had, the 1080 would run around 1050 GHzd/day. This one ran at about 730 GHzd/day. This was with base 2. It seems that specifying the base in the [I]worktodo[/I] file would be problematic for [I]PrimeNet[/I] and [I]GPUto72[/I]. Perhaps it would be better to specify the base in the configuration file? I would not think many would be changing the base very often, or at all, given this project's goal of finding Mersenne prime numbers. :two cents:[/QUOTE] When I started the fork of mfaktc I was only considering base 10 repunits, so I had to 'deoptimize' some code. I removed the Barrett kernels because they seemed unsuited to base 10 and to more general bases. I also needed to generalize some methods that were using the faster shl instruction (optional_mul). The 64-bit kernel was implemented to speed up lower exponents; Mersenne numbers are already factored far beyond this point. That considered, the current version is definitely not optimal for factoring Mersenne numbers, but focuses on other bases and smaller exponents. Reading the default base from the configuration file is a good idea; I will implement this soon. However, I am not sure whether grmfaktc should be a complete replacement for mfaktc (which is still TheJudger's project), or whether it should be thought of as an orthogonal project that focuses on general repunits. I could certainly start from scratch again and cherry-pick my changes while leaving the Mersenne and Wagstaff code mostly untouched and just adding functionality. Maybe TheJudger has some thoughts about this... 
[QUOTE=Citrix;529522]
1) Were you able to figure out the Legendre/Jacobi symbols to filter primes for all bases?[/QUOTE] I figured out the symbols for bases 3, 5, 6, 7, 8, 10, 11 and 12. For all other bases, all remaining candidate classes are simply tested. Look for the methods 'class_needed_<base>' in mfaktc.c. I can certainly try to write it up here in this wiki if it is wished for. [QUOTE=Citrix;529522] 2) Is there a reason the app would crash for large bases (>100,000)? [/QUOTE] I have not checked in detail yet; I will do this when I have a bit more time. [QUOTE=Citrix;529522] 3) Are negative bases also supported, for example 10^n+1? [/QUOTE] Not yet. I have to look into the Wagstaff code and try to generalize it. Hopefully this is not too complicated, except maybe for the Legendre/Jacobi symbols. 
[QUOTE=MrRepunit;529531]When I started the fork of mfaktc I was only considering base 10 repunits, so I had to 'deoptimize' some code. I removed the Barrett kernels because they seemed unsuited to base 10 and to more general bases. I also needed to generalize some methods that were using the faster shl instruction (optional_mul). The 64-bit kernel was implemented to speed up lower exponents; Mersenne numbers are already factored far beyond this point. That considered, the current version is definitely not optimal for factoring Mersenne numbers, but focuses on other bases and smaller exponents.
[U][I]Reading the default base from the configuration file is a good idea; I will implement this soon.[/I][/U] However, I am not sure whether grmfaktc should be a complete replacement for mfaktc (which is still TheJudger's project), or whether it should be thought of as an orthogonal project that focuses on general repunits. I could certainly start from scratch again and cherry-pick my changes while leaving the Mersenne and Wagstaff code mostly untouched and just adding functionality. Maybe TheJudger has some thoughts about this...[/QUOTE] Thank you for the reply! If anyone wants to run base 2, having the setting in the configuration file will remove any ambiguity, and assignments as presented by [I]PrimeNet[/I] would run without any modifications. Your project goes in a different direction, so I would not fret much over base 2. I find the possibility of being able to run different bases quite interesting. :smile: 
[QUOTE=storm5510;529500]Using the version of mfaktc I had, the 1080 would run around 1050 GHzd/day. This one was 730 GHzd/day., more or less. This was with base 2.
[/QUOTE]So, the ~30% performance loss in grmfaktc relative to mfaktc means Mersenne factorers should stick with the mainstream mfaktc. 
[QUOTE=kriesel;529549]So, the ~30% performance loss in grmfaktc relative to mfaktc means Mersenne factorers should stick with the mainstream mfaktc.[/QUOTE]
For now yes. At a later point I might create a version that uses the original code path for Mersenne primes, probably once I have included negative bases... 
Something I was thinking about late yesterday evening: Below is a line from your work example file in the archive.
[CODE]Factor=base=10,1055167,1,64[/CODE]Question: Are the start and end bits of the same power, as in 10[SUP]1[/SUP] to 10[SUP]64[/SUP]? 
[QUOTE]Question: Are the start and end bits of the same power, as in 10[SUP]1[/SUP] to 10[SUP]64[/SUP]?[/QUOTE]
The search boundaries are always powers of 2, regardless of the base. The 1,64 in that line means factors are searched from 2^1 to 2^64. 
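To make the bit-range semantics concrete, here is a small CPU-only Python sketch of the kind of search the assignment describes. It is only an illustration under stated assumptions, not how grmfaktc works internally (the real program sieves candidates, splits them into classes, and runs the modular exponentiation on the GPU). It assumes a prime exponent p, so that candidate factors have the form q = 2kp + 1, and the function name `trial_factor` is made up for this example.

```python
from math import gcd


def trial_factor(base: int, exponent: int, bit_min: int, bit_max: int) -> list:
    """Find q = 2*k*exponent + 1 with 2^bit_min <= q < 2^bit_max dividing
    the repunit (base^exponent - 1) / (base - 1).

    Plain-Python illustration: no sieving, no classes, no GPU.
    """
    step = 2 * exponent
    lower, upper = 1 << bit_min, 1 << bit_max
    factors = []
    q = step + 1  # k = 1
    while q < upper:
        # gcd(q, base - 1) == 1 guarantees that q | base^exponent - 1
        # really means q divides the repunit, not just the (base - 1) part.
        if q >= lower and gcd(q, base - 1) == 1 and pow(base, exponent, q) == 1:
            factors.append(q)
        q += step
    return factors


if __name__ == "__main__":
    # R[2]11 = 2^11 - 1 = 2047 = 23 * 89, searched from 2^1 to 2^7
    print(trial_factor(2, 11, 1, 7))
    # Negative base: R[-10]3 = (10^3 + 1) / 11 = 91 = 7 * 13
    print(trial_factor(-10, 3, 1, 4))
```

The number of candidates grows like 2^bit_max / (2p), which appears consistent with the k_max value printed in the example log for exponent 3300019 and bit_max=60 (about 1.7 * 10^11). Like the real tool, this sketch does not check that a reported q is prime.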
Hi,
I tried out grmfaktc 0.21 with base=6, n=600k to 1000k on a GTX 1050 Ti, and the performance is good.
[QUOTE=lalera;530497]hi,
I tried out grmfaktc 0.21 with base=6, n=600k to 1000k on a GTX 1050 Ti, and the performance is good.[/QUOTE] Up to 2^64 the performance should be really good because there is a special 64-bit GPU kernel. To get optimal performance you should disable the stages in mfaktc.ini: [CODE]Stages=0[/CODE] If the desktop becomes too unresponsive, also lower the GPUSieveSize: [CODE]GPUSieveSize=8[/CODE] 
Hi,
I think I need Stages=1 because I use StopAfterFactor=2 in mfaktc.ini.
[QUOTE=lalera;530514]hi,
i think that i need stages=1 because i do use StopAfterFactor=2 in mfaktc.ini[/QUOTE] I have used "Stages" set to zero with "StopAfterFactor" set to 2 on some tests, like the example below. [CODE]Factor=N/A,96751147,73,75[/CODE] There is no separation, so if it finds a factor, it is done regardless of where it is in the process. 
[QUOTE=MrRepunit;529384]...[COLOR=Gray]finally I completed the generalized repunit version of mfaktc.
Changes compared to mfaktc0.21:  implemented factoring of generalized repunits  Removed Barrett and 72 bit kernels  Removed Wagstaff related stuff  Added 64 bit kernels  Compiling with moreclasses flag seem to be slightly faster, thus it is switched on  allowed are all bases >= 2, program might crash if base is larger than roughly 100,000  implemented special cases for bases 2, 3, 5, 6, 7, 8, 10, 11, 12[/COLOR] [B] dropped lower limit for exponents from 100,000 to 50,000 [/B] [/QUOTE] [U]Question[/U]: Would you be willing to do a custom build for a single individual? 
[QUOTE=storm5510;532794][U]Question[/U]: Would you be willing to do a custom build for a single individual?[/QUOTE]
Hi. Yes, I can do it if the requested changes are doable in a reasonably short time. My guess is that you want to lower the minimum exponent limit. I can do that, but then I cannot promise it still works in all cases. Also, the program will waste more time because the presieving depth is affected. But anyway, let me know what you need... 
[QUOTE=MrRepunit;532944]Hi. Yes, I can do it if the requested changes are doable in a reasonably short time. My guess is that you want to lower the minimum exponent limit. I can do that, but then I cannot promise it still works in all cases. Also, the program will waste more time because the presieving depth is affected.
But anyway, let me know what you need...[/QUOTE] Thank you for the reply. This was just a passing thought. As time has gone by, the bottom end seems to have crept up on everything. Many are 100,000, leaving the smaller exponents with ECM and not much more. Then, on only a few programs. The downside is the time required to run the small ones to higher bit sizes. An example: I have a very old factoring program called Factor5. It uses the CPU only. A while back, I gave it a small exponent, M1619 I believe it was. Start and end bits in the mid 60's. It stayed at 0.000% for 20 minutes or so before it changed. I did the math to 100%. Hundreds of years, or maybe thousands. So, it would not be anywhere near practical to do anything like this, even with a GPU. 
[QUOTE=storm5510;532948]An example: I have a very old factoring program called Factor5. It uses the CPU only.[/QUOTE]Mfactor, another CPU-based program, is faster.

[QUOTE=MrRepunit;529384]Hi,
finally I completed the generalized repunit version of mfaktc. Changes compared to mfaktc0.21:  implemented factoring of generalized repunits  Removed Barrett and 72 bit kernels  Removed Wagstaff related stuff  Added 64 bit kernels  Compiling with moreclasses flag seem to be slightly faster, thus it is switched on  allowed are all bases >= 2, program might crash if base is larger than roughly 100,000  implemented special cases for bases 2, 3, 5, 6, 7, 8, 10, 11, 12  dropped lower limit for exponents from 100,000 to 50,000 [/QUOTE]Nice work. Presumably this has the same 32-bit exponent limit as mfaktc. If you have any plans to take that higher, a 67-bit limit would be useful for a couple of exponents I've been trying to factor lately. (I'm currently using Mfactor for those. Mmff is not suitable for them since they are not double Mersennes.) Since there would be a performance hit, it's probably best to keep the 32-bit-exponent version available. 
Good news: I was finally able to implement negative bases.
Also, the problem with the 1660 card should be fixed now. I attached the source code and 64-bit binaries for Linux and Windows. As usual, first check that all selftests pass with
[CODE]./grmfaktc.exe -st[/CODE]
After some minutes and many lines of output it should print
[CODE]
Selftest statistics
number of tests    49113
successfull tests  49113

kernel           success  fail
UNKNOWN kernel         0     0
64bit_mul32         8631     0
75bit_mul32         9710     0
95bit_mul32         9915     0
64bit_mul32_gs      6188     0
75bit_mul32_gs      7246     0
95bit_mul32_gs      7423     0

selftest PASSED!
[/CODE]
Running from the command line looks like
[CODE]./grmfaktc.exe -tf 97 4956227 1 64[/CODE]
If you want to use the worktodo.txt file, fill it with lines like
[CODE]
Factor=4763923,60,61
Factor=base=127,1055167,1,64
Factor=base=97,1055167,1,64
Factor=base=17,1055167,1,64
Factor=base=10,1055167,1,64
Factor=4763923,60,61
[/CODE]
If no base is given, the default is base 10.

Some additional notes: I wrote a Mathematica notebook that calculates the allowed remainders for any base. The script's source code can be extracted from the file allowedremaindersdata.c. Here are some results:
[CODE]
base -> {{<remainder list>}, <modulo value>}

-13 -> {{1, 7, 9, 11, 15, 17, 19, 25, 29, 31, 47, 49}, 52}
-12 -> {{1, 7, 13, 19}, 24}
-11 -> {{1, 3, 5, 9, 15, 23, 25, 27, 31, 37}, 44}
-10 -> {{1, 7, 9, 11, 13, 19, 23, 37}, 40}
 -2 -> {{1, 3}, 8}
  2 -> {{1, 7}, 8}
 10 -> {{1, 3, 9, 13, 27, 31, 37, 39}, 40}
 11 -> {{1, 5, 7, 9, 19, 25, 35, 37, 39, 43}, 44}
 12 -> {{1, 11, 13, 23}, 24}
 13 -> {{1, 3, 4, 9, 10, 12}, 13}
[/CODE]
Unfortunately, due to the specific CUDA implementation, not all relations can be used in grmfaktc.

Have fun.
Cheers, Danilo
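For anyone without Mathematica, tables like the one above can be reproduced with a few lines of Python. This is a brute-force sketch, not the notebook's method (which is not shown in the thread). It relies on one fact: if an odd prime q divides a repunit with odd exponent p, then b^p ≡ 1 (mod q), so b ≡ b^(p+1) is a square mod q, meaning the base must be a quadratic residue mod q. The function names and the `prime_limit` parameter are made up for illustration; the limit only needs to be large enough that every allowed residue class contains at least one prime below it.

```python
def odd_primes_below(limit: int) -> list:
    """Sieve of Eratosthenes; returns the odd primes below limit (limit >= 2)."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [i for i in range(3, limit) if sieve[i]]


def allowed_remainders(base: int, modulus: int, prime_limit: int = 5000) -> list:
    """Collect q mod modulus over odd primes q for which base is a
    quadratic residue mod q (Euler's criterion)."""
    seen = set()
    for q in odd_primes_below(prime_limit):
        if base % q == 0:
            continue  # q divides the base, so it cannot divide the repunit
        if pow(base % q, (q - 1) // 2, q) == 1:
            seen.add(q % modulus)
    return sorted(seen)


if __name__ == "__main__":
    for base, modulus in ((-12, 24), (-2, 8), (2, 8), (10, 40), (13, 13)):
        print(base, "->", allowed_remainders(base, modulus))
```

The modulus has to be supplied by the caller here (the table uses values such as 8, 24, 40 and 13); choosing the smallest modulus that works for a given base is what the Legendre/Jacobi analysis provides.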
I found a problem.
[I]grmfaktc 0.21[/I] reports a factor, but when I run [I]mprime 30.3[/I] on the same exponent I don't get a factor. Sample:

grmfaktc 0.21
[CODE]
R[10]211584161 has a factor: 11109304798164647139787 [TF:73:74:mfaktc 0.21 75bit_mul32_gs]
found 1 factor for R[10]211584161 from 2^73 to 2^74 [mfaktc 0.21 75bit_mul32_gs]
[/CODE]
mprime 30.3
[CODE]M211584161 no factor from 2^73 to 2^74, Wh8: bla, AID: bla[/CODE]
When I run [I]./grmfaktc.exe -st[/I] all tests pass. I have Ubuntu 20.04. Is this an error in [I]grmfaktc[/I], or do the settings need to be changed? 
[QUOTE=9970;562695]
[CODE]R[10]211584161 has a factor: 11109304798164647139787 [TF:73:74:mfaktc 0.21 75bit_mul32_gs] M211584161 no factor from 2^73 to 2^74, Wh8: bla, AID: bla[/CODE][/QUOTE] There is no contradiction here. R[SUB]10[/SUB]211584161 is shorthand for (10^211584161-1)/9. That's 211584161 "ones" in decimal notation. M211584161 is shorthand for 2^211584161-1. That's 211584161 "ones" in binary notation (and a much smaller number). These are two different numbers: one has the factor F and the other does not. You can test this using Pari/GP. [C]F=11109304798164647139787; print(Mod(10,F)^211584161-1)[/C] Download gp, start gp, and run these two statements. The result is 0, which confirms that F does divide R[SUB]10[/SUB]211584161. 
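The same kind of check can be scripted in Python, which may be handy for verifying many reported factors at once. This is a sketch with a made-up function name, `divides_repunit`. Instead of testing b^n ≡ 1 (mod q) directly, it reduces modulo q·|b-1| so that the division by (b-1) is exact; that way it also gives the right answer when q shares a factor with b-1, and it works for negative bases (where the number is (|b|^n+1)/(|b|+1) for odd n).

```python
def divides_repunit(q: int, base: int, exponent: int) -> bool:
    """Return True if q divides R = (base^exponent - 1) / (base - 1).

    Assumes q >= 2 and |base| >= 2. Works modulo q * |base - 1| so the
    exact division by (base - 1) can be carried out on the residue.
    """
    m = abs(base - 1)
    # t is congruent to (base - 1) * R modulo q * m, hence a multiple of m.
    t = (pow(base, exponent, q * m) - 1) % (q * m)
    # t // m is congruent to +R or -R modulo q; either way zero iff q | R.
    return (t // m) % q == 0


if __name__ == "__main__":
    # Small cases that are easy to check by hand:
    print(divides_repunit(37, 10, 3))   # R[10]3 = 111 = 3 * 37
    print(divides_repunit(7, 10, 3))    # 7 does not divide 111
    print(divides_repunit(13, -10, 3))  # (10^3 + 1) / 11 = 91 = 7 * 13
    # The factor discussed in the posts above, for base 10 and for base 2:
    F = 11109304798164647139787
    print(divides_repunit(F, 10, 211584161), divides_repunit(F, 2, 211584161))
```

Python's built-in three-argument `pow` performs the modular exponentiation, so even the 74-bit example runs instantly.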
Thank you, it worked.
I changed the assignment line in [I]worktodo.txt[/I] to [CODE]Factor=bla,[B]base=2[/B],211584161,71,72[/CODE] That is, I added [B]base=2[/B]. 
Then you turned it into [C]mfaktc[/C] (which is its parent program).
Trouble is that more universal programs need extra registers to hold variables (that are in the stricter program a constant), and the class selection/enumeration code is probably more involved than in its parent [C]mfaktc[/C]. Are the registers going to be used better or worse when you are compiling a program that does more? Have you run timing tests? So it is unclear if this is simply slower than to run strict [C]mfaktc[/C] (where base=2 as a constant throughout the code, by definition). 
[QUOTE=Batalov;562866]Then you turned it into [C]mfaktc[/C] (which is its parent program).
Trouble is that more universal programs need extra registers to hold variables (that are in the stricter program a constant), and the class selection/enumeration code is probably more involved than in its parent [C]mfaktc[/C]. Are the registers going to be used better or worse when you are compiling a program that does more? Have you run timing tests? So it is unclear if this is simply slower than to run strict [C]mfaktc[/C] (where base=2 as a constant throughout the code, by definition).[/QUOTE] grmfaktc with [c]base=2[/c] is slower than vanilla mfaktc because the former currently uses a different code path for Mersenne numbers: [QUOTE=MrRepunit;529554]For now yes. At a later point I might create a version that uses the original code path for Mersenne primes, probably once I have included negative bases...[/QUOTE] 
[QUOTE=9970;562796]Thank you, it worked. I changed the line with the assignment in [I]worktodo.txt to[/I] Added [B]base=2[/B][/QUOTE]This is not the program you want for Mersenne factoring. Please use the normal [url=https://download.mersenne.ca/mfaktc/mfaktc0.21]mfaktc v0.21[/url].

[QUOTE=9970;562796]
I changed the line with the assignment in [I]worktodo.txt to[/I] [CODE]Factor=bla,[B]base=2[/B],211584161,71,72[/CODE]Added [B]base=2[/B][/QUOTE][URL]https://www.mersenne.org/report_exponent/?exp_lo=211584161&exp_hi=&full=1[/URL] shows it was already factored to 74 bits, days before the quoted post. 
[QUOTE=kriesel;562927][M]M211584161[/M] shows it was already factored to 74 bits, days before the quoted post.[/QUOTE]Actually, the record of factoring 73-74 is from the same user ([url=https://www.mersenneforum.org/member.php?u=16557]b9970[/url]). I have had some discussions with him, and he promises to use the normal mfaktc now for his Mersenne TF work.

Could you compile a generalized version that works for any base (positive or negative), with no limit on the size of the base as long as it fits in 32 bits? It would be fine if it does not calculate the remainders (Legendre/Jacobi symbols).
Thanks. 
[QUOTE=Citrix;595625]Can you compile a generalized version that can work for any base (positive or negative) with no limits on the size of the base (<32 bits). If it does not calculate the remainders (Legendre/Jacobi symbols) that would be fine.
Thanks.[/QUOTE] Hi, there is no quick way to do so, since the fast GPU assembler routines only use 32-bit integer multiplication. To extend this to 64-bit exponents one would have to rewrite all the routines that use the exponent, e.g. mod_192_96() in tf_96bit.cu. I'll look into this but won't promise anything. 
[QUOTE=MrRepunit;596116]Hi,
there is no quick way to do so, since the fast GPU assembler routines only use 32-bit integer multiplication. To extend this to 64-bit exponents one would have to rewrite all the routines that use the exponent, e.g. mod_192_96() in tf_96bit.cu. I'll look into this but won't promise anything.[/QUOTE] I appreciate your efforts. I am looking to work on larger 32-bit bases, not larger 64-bit exponents. For example, the current program cannot handle factoring 10000001^999431+1. I just wanted to clarify, to make sure we are talking about the same thing. 
[QUOTE=Citrix;596198]I appreciate your efforts. I am looking to work on larger 32 bit bases and not larger 64 bit exponents.
For example, the current program cannot handle factoring 10000001^999431+1. I just wanted to clarify, to make sure we are talking about the same thing.[/QUOTE] Okay, your example helped me understand what the problem was: there was a buffer overflow caused by the overly long checkpoint file name generated from the base and exponent. I fixed this; see the attached checkpoint.c file. Just replace it and recompile. If you need a Linux 64-bit executable I can provide you with one, but chances are low that it runs on your system anyway. I hope this helps. 
I attached the executable for Linux 64 bit (CUDA 11.20).
[CODE]
$ ./grmfaktc.exe -tf 10000001 999431 1 60
mfaktc (generalized repunit edition) v0.21 (64bit built)
Compiled on Dec 25 2021
<snip>
CUDA version info
binary compiled for CUDA 11.20
CUDA runtime version 11.20
CUDA driver version 11.20
<snip>
got assignment: base=10000001 exp=999431 bit_min=1 bit_max=60 (0.15 GHzdays)
Starting trial factoring R[10000001]999431 from 2^1 to 2^60 (0.15 GHzdays)
INFO: No known remainders for base 10000001, falling back to simple trial factoring.
INFO: Testing 1920 out of 4620 classes.
k_min = 0
k_max = 576788945213
Using GPU kernel "64bit_mul32_gs"
Date Time     class Pct     time  ETA   GHzd/day Sieve  Exp    Base     bitrange
Dec 26 22:51  4618  100.0%  0.020 n.a.  345.77   78133  999431 10000001 1:60
no factor for R[10000001]999431 from 2^1 to 2^60 [mfaktc 0.21 64bit_mul32_gs]
tf(): total time spent: 40.149s
[/CODE] 
Hi,
What's the link for the latest grmfaktc source code, if I want to compile it myself for Linux? Thanks! 
I keep a mirror of sorts at [url]https://download.mersenne.ca/mfaktcgr[/url] but the most recent update I have is from over a year ago.

Here is the newest source code, with a Linux 64-bit binary included. Currently I cannot build a Windows binary. Not much has changed since last year, only the recent bugfix and some minor additional output.
