[QUOTE=Prime95;321489]I minor tweak had a typo.[/QUOTE]
*snirk* |
[QUOTE=Prime95;321477]Minor update -- v 0.27:
As always, previous savefiles will not work with 0.27 unless the -nocheck argument is used. [/QUOTE] And it resumes with the same-level savefile -- at least with Linux v.0.26 and below. (Haven't checked 0.27 yet.) |
[QUOTE=Batalov;320508]CUDA is not [I]supported[/I] with GCC >= 4.6, but it doesn't mean that it doesn't work. It means that if you write to them with a bug report, they will not take it. I use gcc version 4.7.1 20120723 [gcc-4_7-branch revision 189773] (SUSE Linux) and ...[/QUOTE]
Exactly my setup. I remember using a compiler override switch during install. Seems to work but (as you say) don't bother with a trouble report. |
Thank you, we will!
And when I PM you a link to the mmff-gfn source, would you repeat the previous exercise? TIA! I will then post to the mmff-gfn folder. 5 binaries, like last time. |
[QUOTE=Batalov;321529]Thank you, we will!
And when I PM you a link to the mmff-gfn source, would you repeat the previous exercise? TIA! I will then post to the mmff-gfn folder. 5 binaries, like last time.[/QUOTE] I'm ready ;)
Edit: Just did some testing; .27 fails with this error:
[CODE]got assignment: k*2^101+1, k range 3334000000000 to 3335000000000 (143-bit factors)
Starting trial factoring of k*2^101+1 in k range: 3334G to 3335G (143-bit factors)
k_min = 3334000000000
k_max = 3335000000000
Using GPU kernel "mfaktc_barrett152_F96_127gs"
[B]ERROR: Exponentiation failure[/B][/CODE]
With this still left in the worktodo file:
[CODE]FermatFactor=101,3334e9,3335e9
FermatFactor=111,141,142
FermatFactor=120,3e9,4e9
FermatFactor=135,880e8,881e8
FermatFactor=148,173,174
FermatFactor=149,175,176[/CODE]
[B].26 works[/B]
Edit2: It's only the [B]mfaktc_barrett152_F96_127gs[/B] kernel that fails; the rest work fine so far. These lines won't complete:
[CODE]FermatFactor=101,3334e9,3335e9
FermatFactor=111,141,142
FermatFactor=120,3e9,4e9[/CODE] |
[QUOTE=Prime95;321477]Minor update -- v 0.27:
3) With Batalov's help, the next set of 32 n values in k*2^n+1 Fermat factor testing is available. [/QUOTE] ...and now the max range is...? |
~175
|
[QUOTE=Batalov;321561]~175[/QUOTE]
:bow: |
And still "limited" to 188-bit TF?
|
[QUOTE=flashjh;321530]Edit2: It's only the [B]mfaktc_barrett152_F96_127gs[/B] kernel that fails[/QUOTE]
Thanks for the QA I was too lazy to do. I've uploaded the fix. |
[QUOTE=firejuggler;321566]and still "limited" to 188 bits tf?[/QUOTE]
220 bits is the new limit on factor size. |
In the new high range N~=170, the existing search limit was k=6e12 (>2^42), so mmff-0.27 cannot contribute to the range N>=178 (because f would be > 2^220), but for the other values below 178, I have currently reached these limits (in e12 units):
[CODE]form        old limit  reached
k*2^170+1   6          70.368744177663
k*2^171+1   6          70.368744177663
k*2^172+1   6          70.368744177663
k*2^173+1   6          70.368744177663 will max out at next bit
k*2^174+1   6          70.368744177663e12 (maxed out)
k*2^175+1   6          35.184372088831e12 (maxed out)
k*2^176+1   6          17.592186044415e12 (maxed out)
k*2^177+1   6          8.796093022207e12 (maxed out)
[/CODE]
(and I was able to finish the 140<=N<=149 ancient reservation comfortably) However, I will finish those 174<=N<=179 leftovers conventionally (srsieve|newpgen + pfgw|llr sort of thing). When the 180<=N<=199 k-limit was raised to 40e12, my motivation to continue the code extension significantly lessened (k~=6e12 was rather promising; sadly I didn't get lucky there). There is a large speed difference between N~=180 and N~=40. When I search for new candidate ranges, I integrate probabilities (weighted by speed of computation); roughly speaking, k*N^2 is a reasonable estimate. |
[QUOTE=Prime95;321585]Thanks for the QA I was too lazy to do.
I've uploaded the fix.[/QUOTE] Newest mmff27 for Windows 32 and 64 attached. CUDA 4.2 dll files are [URL="http://sourceforge.net/projects/cudalucas/files/CUDA%20Libs/CUDA-4.2-Libs-Windows.7z/download"]here[/URL], if you need them. Just put them in the mmff directory. See [URL="http://www.mersenneforum.org/showthread.php?p=312519#post312519"]here[/URL] for an example worktodo to use for testing. Can a mod delete post 295? |
Am I correct in thinking that mmff/mmff-gfn currently follows this procedure?[LIST][*]Sieves a range of k up to the optimal level on the GPU[*]Tests whether each surviving candidate is a factor of any of the possible numbers.[/LIST]Everything is also limited to 220 bits, right?
What is the approximate sieve depth reached? How many candidates can be checked per second (excluding the sieving)? What proportion of candidates are actually prime? How long would it take the GPU to do a PRP test on a candidate before using it? I realize these answers will change with the size of k and n. I am basically thinking about near-worst-case scenarios (large n and k, so that factor candidates are near 220 bits). I am wondering whether adding a PRP test before using a candidate, or adding more factoring methods (rho, ECM, Fermat), would be a sensible idea. |
[QUOTE=henryzz;322503]I am wondering whether adding a PRP test before using a candidate, or adding more factoring methods (rho, ECM, Fermat), would be a sensible idea.[/QUOTE]
Fermat divisibility test takes the same time as a prp test (a bit faster, really). More factoring methods could possibly help if they are tiny (will fit in GPU code) and faster than the main test. Adding a prp test [I]after[/I] finding a divisor was discussed - but because it is all too easy to do with external tools it wasn't high priority. It is very easy to add to the validator routine (not GPU code). |
[QUOTE=Batalov;322549]Fermat divisibility test takes the same time as a prp test (a bit faster, really).
More factoring methods could possibly help if they are tiny (will fit in GPU code) and faster than the main test. Adding a prp test [I]after[/I] finding a divisor was discussed - but because it is all too easy to do with external tools it wasn't high priority. It is very easy to add to the validator routine (not GPU code).[/QUOTE] Ok thanks :smile: |
I discovered a bug in mfakto/mfaktc ([URL="http://mersenneforum.org/showpost.php?p=344999&postcount=852"]http://mersenneforum.org/showpost.ph...&postcount=852[/URL]) when GPUSieveProcessSize=24.
I have not used mmff so far. Is that a valid configuration for mmff? If so, then there's a chance of skipping some factor candidates (FCs) during the test, if the code is the same as in mfakto and mfaktc. |
Next MMs
George, I wonder if you have any plans to develop mmff further to cover some of the MMs bigger than MM127? For instance MM521, MM607, MM1279, MM2203 and MM2281?
|
Sorry, there are no further development plans for mmff
I stopped at MM127 because the grammar-school multiplication used in mmff is probably not the best choice for MM521; Karatsuba is likely better. Also, register pressure will be pretty severe. This makes further development non-trivial. |
Thank you George for the information. I really hope that you, in a few years time, will consider extending mmff if and when you have any spare time. It's a great program. I appreciate it very much!
|
[QUOTE=aketilander;351255]Thank you George for the information. I really hope that you, in a few years time, will consider extending mmff if and when you have any spare time. It's a great program. I appreciate it very much![/QUOTE]
Meanwhile, you can use Mark's gmp-doublemersenne... :wink: Luigi |
mmff for fermatfactor < 29
George,
As per Luigi's request, I am starting to compile some programs for the Mac, beginning with mmff. What is the reason for the lower FermatFactor limit of 29? Is it just because there is some coding work still to be done, or is there a technical or mathematical reason? Just wondering. Thanks for the great program, BTW. -Marv |
I'm not positive. I probably felt ECM is a better choice for small Fermats.
|
[QUOTE=Prime95;367998]I'm not positive. I probably felt ECM is a better choice for small Fermats.[/QUOTE]
So then it's just a matter of adding the code to handle those ranges, right? |
[QUOTE=tServo;368133]So then it's just a matter of adding the code to handle those ranges, right?[/QUOTE]
Yes. |
mmff v. 0.28 source
[B]Minor update -- v 0.28:[/B]
What's new: The next set of 32 N values in k*2^N+1 Fermat factor testing is available. The highest testable N is now 223, and the highest bitlevel is 252. Practically, because k<=2^45 are already tested, the highest usable N is 207 (was 175 in version 0.27), but double-checking [I]may [/I]find something (known Fermat factors for N=217 (Suyama, 1980) and N=207 (Keller, 1984) are recovered as one of the QC tests). As always, previous savefiles will not work with 0.28 unless the -nocheck argument is used. All seven new kernels are thoroughly tested, but let me know if you get errors anyway. [I]Keep the factors coming![/I] |
[QUOTE=Batalov;376423][B]Minor update -- v 0.28:[/B]
[...] [I]Keep the factors coming![/I][/QUOTE] Thank you Serge, I'm updating FermatSearch with this new executable. 2 questions:
- What do you mean by "[COLOR="Black"]The highest testable N is now 223, and the highest bitlevel is 252. Practically, because k<=2^45 are already tested, the highest usable N is 207[/COLOR]"? The previous limit of mmff-0.27 was N<174; is the current limit N<208?
- Our previous version was limited to k>2[sup]24[/sup]. Does this limit still stand?
Luigi |
[QUOTE=Batalov;376423][B]Minor update -- v 0.28:[/B]
[...] [I]Keep the factors coming![/I][/QUOTE] As I currently lack a GPU system, may I ask for some volunteer effort? Our site would benefit from executable versions of mmff-0.28 for the main OSes. Thank you :smile: Luigi |
[QUOTE=ET_;376531]- What do you mean by "[COLOR="Black"]The highest testable N is now 223, and the highest bitlevel is 252. Practically, because k<=2^45 are already tested, the highest usable N is 207[/COLOR]"? The previous limit of mmff-0.27 was N<174; is the current limit N<208?[/QUOTE]
I mean exactly what I wrote. The previous (0.27) nominal limit was N<192: you could test N=191 for k's up to 29 bits in size. The current nominal limit is N<224: you could test N=223 for k's up to 29 bits in size.
[QUOTE=ET_;376531] - Our previous version was limited to k>2[sup]24[/sup]. Does this limit still stand?[/QUOTE]
Maybe it was for very low values of N. For high values of N, 2[sup]14[/sup]<k<2[sup]63[/sup] are allowed, and indeed this input line
[CODE]FermatFactor=217,16384,32767 ; Suyama (1980)[/CODE]
works ...and recovers the factor.

I think I want to modify the code a little bit to report factors in normalized form. (Admittedly, it is not too hard to do by hand. But doing it all the time for GFN factors is getting a bit old for me, and the output looks horrendous, even if awe-inspiring.) E.g.
[CODE]GF(181,3) has a factor: 2269880559811350882108268285448756599351611328777559153780432306177 [TF:220:221:mmff-gfn3 0.28 mfaktc_barrett224_F160_191gs]
GF(199,5) has a factor: 5240660087353831676253871671260590507538335119049430450977887836562259969 [TF:241:242:mmff-gfn5 0.28 mfaktc_barrett247_F192_223gs]
GF(206,12) has a factor: 26264920974787795159990461827258813827760860586148343034700355283076513793 [TF:243:244:mmff-gfn12 0.28 mfaktc_barrett247_F192_223gs]
[/CODE]
will become
[CODE]GF(181,3) has a factor: 370291543969*2^182+1 [TF:220:221:mmff-gfn3 0.28 mfaktc_barrett224_F160_191gs]
GF(199,5) has a factor: 407658847371*2^203+1 [TF:241:242:mmff-gfn5 0.28 mfaktc_barrett247_F192_223gs]
GF(206,12) has a factor: 15961621533*2^210+1 [TF:243:244:mmff-gfn12 0.28 mfaktc_barrett247_F192_223gs]
[/CODE] |
Here is the patch. Some people need the output to be exactly as it was before, so they should not apply it.
[CODE]*** ../../mmff-0.28/src/output.c	2012-10-22 14:00:18.000000000 -0400
--- output.c	2014-06-23 14:20:52.416938792 -0400
***************
*** 438,454 ****
  
    if(factor_number < 10)
    {
      if(mystuff->mode != MODE_SELFTEST_SHORT)
      {
        if(mystuff->printmode == 1 && factor_number == 0)printf("\n");
!       printf("%s has a factor: %s\n", exponent_string, factor);
      }
      if(mystuff->mode == MODE_NORMAL)
      {
  #ifndef MORE_CLASSES
!       fprintf(resultfile, "%s%s has a factor: %s [TF:%d:%d%s:mmff %s %s]\n", UID, exponent_string, factor, mystuff->bit_min, mystuff->bit_max_stage, ((mystuff->stopafterfactor == 2) && (mystuff->stats.class_counter < 96)) ? "*" : "" , MFAKTC_VERSION, mystuff->stats.kernelname);
  #else
!       fprintf(resultfile, "%s%s has a factor: %s [TF:%d:%d%s:mmff %s %s]\n", UID, exponent_string, factor, mystuff->bit_min, mystuff->bit_max_stage, ((mystuff->stopafterfactor == 2) && (mystuff->stats.class_counter < 960)) ? "*" : "" , MFAKTC_VERSION, mystuff->stats.kernelname);
  #endif
      }
    }
--- 438,473 ----
  
    if(factor_number < 10)
    {
+     char k[155]; int carry, i, l, N;
+ 
+     /* SB: don't want to mess with lower functions; I will simply do the k calculation here on a decimal string */
+     /* factors are extremely rare, anyway */
+ 
+     l = strlen(factor)-1;
+     memcpy(k, factor,l+2);
+     for(N=0; N==0 || (k[l]%2)==0; N++) { /* factor = "k*2^N+1"; disregard last odd digit once */
+       for(i=carry=0; i<=l; i++) {
+         int d = k[i] - '0' + 10 * carry;
+         carry = d & 1;
+         k[i] = d / 2 + '0';
+       }
+     }
+     l++; /* now it is strlen */
+     for(i=0; k[i]=='0'; i++); /* squeeze leading zeroes */
+     if(i) { l -= i; memmove(k, k+i, l); }
+     sprintf(k+l, "*2^%d+1", N);
+ 
      if(mystuff->mode != MODE_SELFTEST_SHORT)
      {
        if(mystuff->printmode == 1 && factor_number == 0)printf("\n");
!       printf("%s has a factor: %s\n", exponent_string, k);
      }
      if(mystuff->mode == MODE_NORMAL)
      {
  #ifndef MORE_CLASSES
!       fprintf(resultfile, "%s%s has a factor: %s = %s [TF:%d:%d%s:mmff %s %s]\n", UID, exponent_string, k, factor, mystuff->bit_min, mystuff->bit_max_stage, ((mystuff->stopafterfactor == 2) && (mystuff->stats.class_counter < 96)) ? "*" : "" , MFAKTC_VERSION, mystuff->stats.kernelname);
  #else
!       fprintf(resultfile, "%s%s has a factor: %s = %s [TF:%d:%d%s:mmff %s %s]\n", UID, exponent_string, k, factor, mystuff->bit_min, mystuff->bit_max_stage, ((mystuff->stopafterfactor == 2) && (mystuff->stats.class_counter < 960)) ? "*" : "" , MFAKTC_VERSION, mystuff->stats.kernelname);
  #endif
      }
    }
[/CODE] |
Attached is the mmff-gfn v.0.28 source. I've tested it thoroughly on all new bit ranges, but let me know if you encounter errors.
When building the binaries, build five times (each time after editing the "BASE = ..." row of the Makefile), and run 'make clean' between builds. Jerry will probably help us build all the Windows binaries as before. Good hunting to all! |
[QUOTE=ET_;376533]As I currently lack a GPU system, may I ask for some volunteer effort? Our site would benefit from executable versions of mmff-0.28 for the main OSes.
Thank you :smile: Luigi[/QUOTE] Luigi, until yesterday I could only build and test on Linux. I can build and post the linux64 binaries with compute capabilities 2.0, 2.1, 3.0 and 3.5 enabled, based on CUDA 5.0. I cannot build distributable Windows binaries (I tried - it sort of builds...); we'll have to ask Jerry. I will have to dig deep back into the PMs where Xyzzy sent me instructions on how to FTP binaries to the site. It's been 2 years - I don't remember any passwords at all. |
mmff and mmff-gfn 0.28 Windows binaries x86 and x64 posted to:
[URL]http://mersenneforum.org/mmff/[/URL] and [URL]http://mersenneforum.org/mmff-gfn/[/URL] Everything is CUDA 6.0, sm_20, 21, 30, 32, 35 and 50. If you need anything else, let me know. |
Many thanks! ;-)
|
[QUOTE=Batalov;377843]Many thanks! ;-)[/QUOTE]
No problem :smile: |
[QUOTE=flashjh;377841]mmff and mmff-gfn 0.28 Windows binaries x86 and x64 posted to:
[URL]http://mersenneforum.org/mmff/[/URL] and [URL]http://mersenneforum.org/mmff-gfn/[/URL] Everything is CUDA 6.0, sm_20, 21, 30, 32, 35 and 50. If you need anything else, let me know.[/QUOTE] Thank you! :bow: I will soon update FermatSearch with your executable. Now, only Linux and Mac executables are missing... :help: Luigi |
New CUDA version, new mmff required.
A friend of ours asked for an mmff version compiled for his GTX 1060 and CUDA 8. I have no Nvidia SDK or boards on my PC at the moment, so I can provide neither Linux nor Windows builds. I hope I will have a Windows version soon (thanks to Jerry). Thank you!! Luigi |
I have compiled mmff-gfn with CUDA 8.0 on Linux with sm_30, 35, 50, 52, 61. The binary is mmff-gfn3.exe. Hopefully, this is helpful.
The source of the file is [url=https://www.mersenneforum.org/showpost.php?p=376931&postcount=322]here[/url]. |
[QUOTE=RichD;497116]I have compiled mmff-gfn with CUDA 8.0 on Linux with sm_30, 35, 50, 52, 61. The binary is mmff-gfn3.exe. Hopefully, this is helpful.
The source of the file is [url=https://www.mersenneforum.org/showpost.php?p=376931&postcount=322]here[/url].[/QUOTE] That friend of mine tried to compile the mmff 0.28 source under Windows for a GTX 1060, with no success. The source is available at mersenneforum/mmff, download.mersenne.ca and doublemersennes.org. It is the executable for Windows (and Linux) for CUDA 8 that I am looking for, as I have no CUDA platform available at this time. :smile: |
Attached is a Linux build (with .exe) using CUDA 8.0 and sm_20, 30, 35, 50, 52, 61.
|
[QUOTE=RichD;497151]Attached is a Linux build (with .exe) using CUDA 8.0 and sm_20, 30, 35, 50, 52, 61.[/QUOTE]
Thanks RichD! :smile: :tu: |
I remembered this from the early days. You might be able to run mmff without installing the entire CUDA suite by pointing it at the bundled dynamic library. I added a "lib" folder to the previous package. It can be run with:
[CODE]LD_LIBRARY_PATH=./lib ./mmff.exe[/CODE]
See this [url=https://www.mersenneforum.org/showpost.php?p=410994&postcount=3]post[/url] for reference. For Linux x86_64 & CUDA 8.0. |
[QUOTE=ET_;497050]New CUDA version, new mmff required.
A friend of ours asked for an mmff version compiled for his GTX 1060 and CUDA 8. I have no Nvidia SDK or boards on my PC at the moment, so I can provide neither Linux nor Windows builds. I hope I will have a Windows version soon (thanks to Jerry). Thank you!! Luigi[/QUOTE] :bump2: :bump: :bump2: |
Attached is a Windows 64bit executable and source for mmff set for CUDA 10. I am posting this here because this executable still gives me the error "Class problems. Factor divisible by 2, 3, 5, 7, or 11", even though the makefile "Makefile.win" is set to produce code for CC 3.0 and above (including 6.1, which covers Pascal cards like mine). I want to see if others can replicate the issue and provide me a fix, as Luigi was unable to help. Note that you will need the CUDA 10 cudart DLL; it can be found here: [URL]https://www.mersenneforum.org/mfaktc/mfaktc-0.21/mfaktc-0.21.win.cuda100.zip[/URL].
|
[QUOTE=Dylan14;505723]Attached is a Windows 64bit executable and source for mmff set for CUDA 10. [...][/QUOTE]
Can't test here, since all installations are CUDA 8 or lower. (Lots of old model cards in the fleet that CUDA 10-aware drivers don't support.) |
Errors in the mmff code?
I downloaded the mmff-0.28 source code.
In mfaktc.c, there are lines like this (lines 166 to 168):
[CODE]  if (exp == 31) {
    unsigned int exp_mod8, exp_mod3, exp_mod5, exp_mod7, exp_mod11;
    exp_mod8 = 7; exp_mod3 = 1; exp_mod5 = 2; exp_mod7 = 1; exp_mod11 = 1;
[/CODE]
But aren't those values incorrect? Actually 31 mod n for n={8,3,5,7,11} = 7 1 1 3 9, not 7 1 2 1 1. Similarly for exp==61, exp==89 and exp==107. But exp==127 does seem to use the right numbers. This seems like a misguided optimization attempt. The code for exp == 31, 61, 89, 107, 127 could be removed, leaving just the default code (the code that explicitly uses [c]exp % 8[/c], [c]exp % 3[/c], etc.). This is the code that selects classes. If it doesn't select the right classes, it won't find factors. |
[QUOTE=GP2;520111]I downloaded the mmff-0.28 source code.
[...][/QUOTE] mmff was written by George Woltman from a concept by Oliver Weihe. I think it's hard to believe they made such a mistake years ago and nobody noticed it... |
[QUOTE=ET_;520124]mmff was written by George Woltman from a concept by Oliver Weihe. I think it's hard to believe they made such a mistake years ago and nobody noticed it...[/QUOTE]
Yes, I made an elementary mistake. Obviously, it's not 31 mod n, it's 2^31−1 mod n that has to be calculated, and then the values mod 8, 3, 5, 7, 11 are 7, 1, 2, 1, 1 as expected. |
[QUOTE=ET_;520124]I think it's hard to believe they made such a mistake years ago and nobody noticed it...[/QUOTE]
Well, the evidence is quite good that they are guilty of choosing a poor name for a variable. |
hi,
please can someone compile mmff for Nvidia Turing cards for Win64? |
[QUOTE=lalera;520275]hi,
please can someone compile mmff for Nvidia Turing cards for Win64?[/QUOTE] I have tried to compile it on Windows 10 64-bit with CUDA 10 (which supports Turing-class Nvidia cards), to no avail: I had the class problems issue, which is usually averted by compiling with the correct CUDA version (8+ for a Pascal card). Then I talked to Luigi, who created [URL="https://www.mersenneforum.org/showthread.php?t=23989"]this thread[/URL]. In that thread, George Woltman suggested contacting TheJudger (who maintains mfaktc) to look at it, and he replicated the issue I had on several different classes of card and CUDA versions. On 5/20, after Jerry (flashjh) had set up his CUDA environment again, I sent him a PM pointing him to the source, since he was the one who provided the earlier Windows executables. Unfortunately, I haven't heard anything since then. |
[QUOTE=Dylan14;520279]I have tried to compile it on Windows 10 64-bit with CUDA 10 [...][/QUOTE] hi, thank you for the information |
I succeeded in compiling mmff with the CUDA 10.1 toolkit and Microsoft Visual Studio 2012 on Windows 10, without class problems. Thanks to nomead for the clues! See: [url]https://www.mersenneforum.org/showpost.php?p=527991&postcount=39[/url]
Can anyone help me do further tests to confirm the reliability of the Windows executable? Thanks! :smile: |
I made this worktodo.txt long ago as an easy way to test mmff on these 41 known Fermat factors within its limits:
[CODE]FermatFactor=36,2e10,3e10 # F28: 25709319373 * 2^36 + 1
FermatFactor=33,546e10,547e10 # F31: 5463561471303 * 2^33 + 1
FermatFactor=39,69,70 # F37: 1275438465 * 2^39 + 1
FermatFactor=41,286492e10,286493e10 # F39: 2864929972774011 * 2^41 + 1
FermatFactor=45,11131e10,11132e10 # F42: 111318179143061 * 2^45 + 1
FermatFactor=45,21e10,22e10 # F43: 212675402445 * 2^45 + 1
FermatFactor=50,213e10,214e10 # F48: 2139543641769 * 2^50 + 1
FermatFactor=54,66,67 # F52: 4119 * 2^54 + 1
FermatFactor=54,78,79 # F52: 21626655 * 2^54 + 1
FermatFactor=54,8190e10,8191e10 # F52: 81909357657279 * 2^54 + 1
FermatFactor=61,67,68 # F58: 95 * 2^61 + 1
FermatFactor=68,121089e10,121090e10 # F65: 1210895760431083 * 2^68 + 1
FermatFactor=74,100,101 # F72: 76432329 * 2^74 + 1
FermatFactor=77,98,99 # F75: 3447431 * 2^77 + 1
FermatFactor=79,5e9,6e9 # F77: 5940341195 * 2^79 + 1
FermatFactor=87,1595e9,1596e9 # F83: 1595863660157 * 2^87 + 1
FermatFactor=88,20018e9,20019e9 # F86: 20018578522347 * 2^88 + 1
FermatFactor=90,119e9,120e9 # F88: 119942751127 * 2^90 + 1
FermatFactor=92,198e9,199e9 # F90: 198922467387 * 2^92 + 1
FermatFactor=93,103,104 # F91: 1421 * 2^93 + 1
FermatFactor=97,482e9,483e9 # F94: 482524552001 * 2^97 + 1
FermatFactor=101,3334e9,3335e9 # F96: 3334131633063 * 2^101 + 1
FermatFactor=111,141,142 # F107: 1289179925 * 2^111 + 1
FermatFactor=120,3e9,4e9 # F116: 3433149787 * 2^120 + 1
FermatFactor=124,146,147 # F122: 5234775 * 2^124 + 1
FermatFactor=127,129,130 # F125: 5 * 2^127 + 1
FermatFactor=135,88e9,89e9 # F133: 88075576149 * 2^135 + 1
FermatFactor=145,167,168 # F142: 8152599 * 2^145 + 1
FermatFactor=148,173,174 # F146: 37092477 * 2^148 + 1
FermatFactor=149,160,161 # F147: 3125 * 2^149 + 1
FermatFactor=149,175,176 # F147: 124567335 * 2^149 + 1
FermatFactor=154,166,167 # F150: 5439 * 2^154 + 1
FermatFactor=157,167,168 # F150: 1575 * 2^157 + 1
FermatFactor=167,197,198 # F164: 1835601567 * 2^167 + 1
FermatFactor=171,2674e9,2675e9 # F166: 2674670937447 * 2^171 + 1
FermatFactor=174,20e9,21e9 # F172: 20569603303 * 2^174 + 1
FermatFactor=180,3e8,4e8 # F178: 313047661 * 2^180 + 1
FermatFactor=187,213,214 # F184: 117012935 * 2^187 + 1
FermatFactor=197,48594e9,48596e9 # F195: 48595346636925 * 2^197 + 1
FermatFactor=207,224,227 # F205: 232905 * 2^207 + 1
FermatFactor=217,231,232 # F215: 32111 * 2^217 + 1
[/CODE]
I tested on my very old GTX Titan with compute capability 3.5; hopefully I can soon test on my 2080, which I'm waiting to get repaired or replaced under warranty.

Your version found 32 of the 41 factors. These 6 failed with "[B]ERROR: Class problems. Factor divisible by 2, 3, 5, 7, or 11[/B]", probably because they have very small k values, which mmff-0.28 might not be meant to handle:
[CODE]FermatFactor=54,66,67 # F52: 4119 * 2^54 + 1
FermatFactor=61,67,68 # F58: 95 * 2^61 + 1
FermatFactor=93,103,104 # F91: 1421 * 2^93 + 1
FermatFactor=149,160,161 # F147: 3125 * 2^149 + 1
FermatFactor=154,166,167 # F150: 5439 * 2^154 + 1
FermatFactor=157,167,168 # F150: 1575 * 2^157 + 1[/CODE]
They also failed in an old mmff-0.28 build compiled with CUDA 6, but they work in the even older mmff-0.27 with CUDA 4.2.

These 3 failed with "[B]ERROR: Exponentiation failure[/B]"; it might be that my old card can't handle these large exponents / k values:
[CODE]FermatFactor=97,482e9,483e9 # F94: 482524552001 * 2^97 + 1
FermatFactor=207,224,227 # F205: 232905 * 2^207 + 1
FermatFactor=217,231,232 # F215: 32111 * 2^217 + 1[/CODE]
They also failed in 0.28 with CUDA 6; the first one worked in 0.27 with CUDA 4.2, while the last 2 were out of range for version 0.27. |
[QUOTE=ATH;528016]I made this worktodo.txt long ago as an easy way to test mmff on these 41 known Fermat factors within its limits:
[...][/QUOTE] Well, it seems acceptable, since these problems are also known to occur in the old Windows executables and no new fatal errors were found. Thank you! Does anyone have ideas about whether further tests are needed to confirm that the newly compiled version is not flaky? |
My RTX 2080 found 33 of the 41 factors with your version. It also missed all the ones with very small k-values, which are probably not meant to work in mmff-0.28, since those have been searched with other programs anyway.
[CODE]FermatFactor=54,66,67 # F52: 4119 * 2^54 + 1
FermatFactor=61,67,68 # F58: 95 * 2^61 + 1
FermatFactor=93,103,104 # F91: 1421 * 2^93 + 1
FermatFactor=149,160,161 # F147: 3125 * 2^149 + 1
FermatFactor=154,166,167 # F150: 5439 * 2^154 + 1
FermatFactor=157,167,168 # F150: 1575 * 2^157 + 1
FermatFactor=207,224,227 # F205: 232905 * 2^207 + 1
FermatFactor=217,231,232 # F215: 32111 * 2^217 + 1[/CODE] |
F205 and F215 -- I'm not sure mmff can factor Fermat numbers that large.
|
[QUOTE=Prime95;528631]F205 and F215 -- I'm not sure mmff can factor Fermat numbers that large.[/QUOTE]
According to the [URL="http://www.doublemersennes.org/download.php"] double mersennes download page[/URL] mmff should be able to handle up to F223. I haven't tried to go that high with it though - usually I'd use Feromant_CUDA for those. |
[QUOTE=Dylan14;528647]According to the [URL="http://www.doublemersennes.org/download.php"] double mersennes download page[/URL] mmff should be able to handle up to F223. I haven't tried to go that high with it though - usually I'd use Feromant_CUDA for those.[/QUOTE]
Thanks for the correction. I remembered the upper limit was somewhere near F200. |
Actually, for the last two, my cards exit immediately with errors.
|
Yeah, F223 is the maximum:
[QUOTE]WARNING: Exponents >= 224 are not supported in Fermat factoring! [/QUOTE]
But factors between 2^251 and 2^252 are also the largest it can handle, so for F223 that means only k <= 536,870,911:
[QUOTE]WARNING: bit range isn't supported! Ignoring TF exponent 223 from 2^252 to 2^253![/QUOTE] |
Hi,
many thanks to Fan Ming (and nomead) for the new Win x64 CUDA 10.1 executable of mmff v0.28. |
Maximum limits for mmff-0.28 for Fermat factoring. Tested on the Windows CUDA 10.1 version built by Fan Ming:
[url]https://www.mersenneforum.org/showpost.php?p=527991&postcount=39[/url]
The ultimate limit is k < 2[SUP]64[/SUP], but for some exponents the limit is lower than that.
[CODE]
28 <= n <= 223

n=28-119:  k*2[SUP]n[/SUP]+1 < 2[SUP]n+64[/SUP] (92-183)   k<2[SUP]64[/SUP]
n=120-127: k*2[SUP]n[/SUP]+1 < 2[SUP]183[/SUP]             k<2[SUP]63[/SUP] to k<2[SUP]56[/SUP]
n=128-151: k*2[SUP]n[/SUP]+1 < 2[SUP]n+64[/SUP] (192-215)  k<2[SUP]64[/SUP]
n=152-159: k*2[SUP]n[/SUP]+1 < 2[SUP]215[/SUP]             k<2[SUP]63[/SUP] to k<2[SUP]56[/SUP]
n=160-183: k*2[SUP]n[/SUP]+1 < 2[SUP]n+64[/SUP] (224-247)  k<2[SUP]64[/SUP]
n=184-191: k*2[SUP]n[/SUP]+1 < 2[SUP]247[/SUP]             k<2[SUP]63[/SUP] to k<2[SUP]56[/SUP]
n=192-223: k*2[SUP]n[/SUP]+1 < 2[SUP]252[/SUP]             k<2[SUP]60[/SUP] to k<2[SUP]29[/SUP]
[/CODE] |
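As a sketch only, the limits table above can be turned into a small lookup that returns the exponent e such that k < 2^e for a given n. The name `max_k_exponent` is hypothetical, and the numbers are the ones measured above on one Windows build, not a documented specification. Where a factor-size cap 2^cap applies, k*2^n + 1 < 2^cap is equivalent to k < 2^(cap - n).

```python
def max_k_exponent(n: int) -> int:
    """Return e such that mmff-0.28 accepts k < 2**e for FermatFactor=n,...

    Transcribed from the measured limits table; treat as observations
    from one build, not as a specification.
    """
    if not 28 <= n <= 223:
        raise ValueError(f"n={n} is outside the supported range 28..223")
    # In these ranges only the k < 2^64 limit applies.
    if 28 <= n <= 119 or 128 <= n <= 151 or 160 <= n <= 183:
        return 64
    # Otherwise the largest supported factor size 2^cap is the binding limit:
    # k*2^n + 1 < 2^cap  <=>  k < 2^(cap - n).
    for lo, hi, cap in ((120, 127, 183), (152, 159, 215),
                        (184, 191, 247), (192, 223, 252)):
        if lo <= n <= hi:
            return cap - n
    raise AssertionError("unreachable: every n in 28..223 is covered above")


if __name__ == "__main__":
    for n in (36, 120, 127, 192, 207, 223):
        e = max_k_exponent(n)
        print(f"n={n}: k < 2^{e} (largest k = {(1 << e) - 1})")
```

For n = 223 this gives k < 2^29, i.e. k <= 536,870,911, the same bound quoted for F223 earlier in the thread.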
I found it curious that Andreas got errors when trying to verify the factors of F205 and F215, since Serge said he verified these factors when he released version 0.28 ([URL]https://www.mersenneforum.org/showpost.php?p=376423&postcount=317[/URL]). So I did some testing.
I confirmed that Andreas's FermatFactor=207,224,225 range dies with ERROR: Exponentiation failure. The next smaller bit range is not supported, while higher bit ranges run to completion. I then tried testing individual values of K in the 225-bit factor range, and found that 207,232905,232905 correctly finds the factor of F205. However, about 30% of individual K values die with ERROR: Exponentiation failure.
[CODE]
// Trying bit ranges
//FermatFactor=207,223,224 // WARNING: bit range isn't supported!
//FermatFactor=207,224,225 // ERROR: Exponentiation failure: k range: 131072 to 262143 (225-bit factors)
//FermatFactor=207,225,226 // Runs: k range: 262144 to 524287 (226-bit factors)
//FermatFactor=207,226,227 // Runs: k range: 524288 to 1048575 (227-bit factors)
//FermatFactor=207,227,228 // Runs: k range: 1048576 to 2097151 (228-bit factors)
// Trying individual values of K in the 225 bit factor range
//FermatFactor=207,232885,232885 // Runs
//FermatFactor=207,232887,232887 // ERROR: Exponentiation failure
//FermatFactor=207,232889,232889 // Runs
//FermatFactor=207,232891,232891 // Runs
//FermatFactor=207,232893,232893 // Runs
//FermatFactor=207,232895,232895 // Runs
//FermatFactor=207,232897,232897 // Runs
//FermatFactor=207,232899,232899 // Runs
//FermatFactor=207,232901,232901 // ERROR: Exponentiation failure
//FermatFactor=207,232903,232903 // Runs
//FermatFactor=207,232905,232905 // Runs, finds F205 factor
//FermatFactor=207,232907,232907 // ERROR: Exponentiation failure
//FermatFactor=207,232909,232909 // Runs
//FermatFactor=207,232911,232911 // Runs
//FermatFactor=207,232913,232913 // Runs
//FermatFactor=207,232915,232915 // Runs
//FermatFactor=207,232917,232917 // Runs
//FermatFactor=207,232919,232919 // ERROR: Exponentiation failure
//FermatFactor=207,232921,232921 // Runs
//FermatFactor=207,232923,232923 // Runs
//FermatFactor=207,232925,232925 // ERROR: Exponentiation failure
[/CODE]
For the 226-bit factor range, while the full 207,262144,524287 range runs without error, about 30% of individual K values still die with ERROR: Exponentiation failure. I also found that any range of K that contains a failing K also fails, until the range contains more than about 160000 K, at which point mmff runs to completion without error.
[CODE]
// Trying individual values of K in the 226 bit factor range
//FermatFactor=207,419987,419987 // ERROR: Exponentiation failure
//FermatFactor=207,419989,419989 // Runs
//FermatFactor=207,419991,419991 // ERROR: Exponentiation failure
//FermatFactor=207,419993,419993 // Runs
//FermatFactor=207,419995,419995 // Runs
//FermatFactor=207,419997,419997 // ERROR: Exponentiation failure
//FermatFactor=207,419999,419999 // Runs
//FermatFactor=207,420001,420001 // Runs
//FermatFactor=207,420003,420003 // Runs
//FermatFactor=207,420005,420005 // Runs
//FermatFactor=207,420007,420007 // Runs
//FermatFactor=207,420009,420009 // ERROR: Exponentiation failure
// Trying ranges of K in the 226 bit factor range
//FermatFactor=207,419999,420007 // Runs
//FermatFactor=207,419997,420007 // ERROR: Exponentiation failure
//FermatFactor=207,410000,420000 // ERROR: Exponentiation failure
//FermatFactor=207,300000,420000 // ERROR: Exponentiation failure
//FermatFactor=207,300000,450000 // ERROR: Exponentiation failure
//FermatFactor=207,300000,460000 // Runs
//FermatFactor=207,262144,524287 // Runs
[/CODE]
I see the same thing happening in recent "production" search ranges. In Andreas's recent range, individual K values or small K ranges die with either ERROR: Exponentiation failure or ERROR: Class problems. Factor divisible by ..., and the error varies randomly when the same test is repeated.
[CODE]
// Trying Andreas's full range *****
//FermatFactor=205,130000000000000,140737488355327 // Runs
// Trying individual values of K in the 252 bit factor range
//FermatFactor=205,130000000000001,130000000000001 // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000003,130000000000003 // Runs
//FermatFactor=205,130000000000005,130000000000005 // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000007,130000000000007 // Runs
//FermatFactor=205,130000000000009,130000000000009 // Runs
//FermatFactor=205,130000000000011,130000000000011 // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000013,130000000000013 // Runs
//FermatFactor=205,130000000000015,130000000000015 // Runs
//FermatFactor=205,130000000000017,130000000000017 // Runs
//FermatFactor=205,130000000000019,130000000000019 // Exp failure OR Factor divisible (random)
// Trying ranges of K in the 252 bit factor range
//FermatFactor=205,130000000000000,130000000000100 // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000000,130000000001000 // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000000,130000000010000 // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000000,130000000180000 // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000000,130000000190000 // Runs
[/CODE]
The same happens in Peter's recent range with 171-bit factors:
[CODE]
// Trying ranges of K in the 171 bit factor range ***
//FermatFactor=120,1527888802614000,1527888802615000 // ERROR: Exponentiation failure
//FermatFactor=120,1527888802600000,1527888802700000 // ERROR: Exponentiation failure
//FermatFactor=120,1527888802500000,1527888802700000 // Runs, finds F118 factor
[/CODE]
Of course, this might be a problem with my system. I am running Ubuntu 18.04 LTS with CUDA 10.1 on an RTX 2080. Could someone else verify some of the results above? (Just uncomment individual lines.)
If it persists, hopefully this is an mmff problem that only affects small ranges of K. But looking at the source, it appears that only a tiny fraction of K values are checked for accuracy by calling validate_exponentiation(), for obvious performance reasons. So is it possible, if highly unlikely, that undetected errors are occurring for larger ranges of K? George or Serge, would one of you have time to investigate this?
For hardware validation, here is an expanded version of Andreas's worktodo file that uses single K values and adds some recent factors. It should verify 41 known Fermat factors:
[CODE]
// Check the known Fermat factors within the ranges of mmff
// Ranges supported: 28 <= exp <= 223; 64 bit <= factor size <= 252 bit; K min/max vary with exp
// K min/max < 1000 are interpreted as factor bit size min/max, >= 1000 as K min/max
FermatFactor=36,2e10,3e10 // F28: 25709319373 * 2^36 + 1
FermatFactor=33,546e10,547e10 // F31: 5463561471303 * 2^33 + 1
FermatFactor=39,69,70 // F37: 1275438465 * 2^39 + 1
FermatFactor=41,286492e10,286493e10 // F39: 2864929972774011 * 2^41 + 1
FermatFactor=45,11131e10,11132e10 // F42: 111318179143061 * 2^45 + 1
FermatFactor=45,21e10,22e10 // F43: 212675402445 * 2^45 + 1
FermatFactor=50,213e10,214e10 // F48: 2139543641769 * 2^50 + 1
FermatFactor=54,4119,4119 // F52: 4119 * 2^54 + 1
FermatFactor=54,78,79 // F52: 21626655 * 2^54 + 1
FermatFactor=54,8190e10,8191e10 // F52: 81909357657279 * 2^54 + 1
//FermatFactor=61,67,68 // F58: 95 * 2^61 + 1 ***No way to specify
FermatFactor=68,121089e10,121090e10 // F65: 1210895760431083 * 2^68 + 1
FermatFactor=74,100,101 // F72: 76432329 * 2^74 + 1
FermatFactor=77,98,99 // F75: 3447431 * 2^77 + 1
FermatFactor=79,5e9,6e9 // F77: 5940341195 * 2^79 + 1
FermatFactor=87,1595e9,1596e9 // F83: 1595863660157 * 2^87 + 1
FermatFactor=88,20018e9,20019e9 // F86: 20018578522347 * 2^88 + 1
FermatFactor=90,119e9,120e9 // F88: 119942751127 * 2^90 + 1
FermatFactor=92,198e9,199e9 // F90: 198922467387 * 2^92 + 1
FermatFactor=93,1421,1421 // F91: 1421 * 2^93 + 1
FermatFactor=97,482e9,483e9 // F94: 482524552001 * 2^97 + 1
FermatFactor=101,3334e9,3335e9 // F96: 3334131633063 * 2^101 + 1
FermatFactor=111,141,142 // F107: 1289179925 * 2^111 + 1
FermatFactor=120,3e9,4e9 // F116: 3433149787 * 2^120 + 1
FermatFactor=120,1527888802500000,1527888802700000 // F118: 1527888802614951 * 2^120 + 1
FermatFactor=124,146,147 // F122: 5234775 * 2^124 + 1
//FermatFactor=127,129,130 // F125: 5 * 2^127 + 1 ***No way to specify
FermatFactor=135,1075441212600000,1075441212800000 // F132: 1075441212722595 * 2^135 + 1
FermatFactor=135,88e9,89e9 // F133: 88075576149 * 2^135 + 1
FermatFactor=145,167,168 // F142: 8152599 * 2^145 + 1
FermatFactor=148,173,174 // F146: 37092477 * 2^148 + 1
FermatFactor=149,3125,3125 // F147: 3125 * 2^149 + 1
FermatFactor=149,175,176 // F147: 124567335 * 2^149 + 1
FermatFactor=157,1575,1575 // F150: 1575 * 2^157 + 1
FermatFactor=154,5439,5439 // F150: 5439 * 2^154 + 1
FermatFactor=167,197,198 // F164: 1835601567 * 2^167 + 1
FermatFactor=171,2674e9,2675e9 // F166: 2674670937447 * 2^171 + 1
FermatFactor=174,20e9,21e9 // F172: 20569603303 * 2^174 + 1
FermatFactor=180,3e8,4e8 // F178: 313047661 * 2^180 + 1
FermatFactor=187,213,214 // F184: 117012935 * 2^187 + 1
FermatFactor=197,48594e9,48596e9 // F195: 48595346636925 * 2^197 + 1
FermatFactor=207,232905,232905 // F205: 232905 * 2^207 + 1
FermatFactor=217,32111,32111 // F215: 32111 * 2^217 + 1
[/CODE] |
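The comment at the top of the worktodo above says bounds below 1000 are read as factor bit sizes and bounds of 1000 or more as K values. The Python sketch below models that rule, plus the splitting into one range per factor bit size that shows up in the log lines (e.g. "k range: 131072 to 262143 (225-bit factors)"). Both helpers, `k_range` and `split_by_bits`, are hypothetical reconstructions from the comments and logs in this thread, not mmff's actual parser.

```python
from decimal import Decimal


def k_range(n, lo, hi):
    """Hypothetical helper: map the bounds of FermatFactor=n,lo,hi to (kmin, kmax).

    Assumed rule (from the worktodo comment): bounds < 1000 are factor bit
    sizes, bounds >= 1000 are k values written like "2e10" or "3334e9".
    """
    lo_d, hi_d = Decimal(lo), Decimal(hi)
    if lo_d < 1000 and hi_d < 1000:
        # Bits b1,b2 mean factors in [2^b1, 2^b2), i.e. k in [2^(b1-n), 2^(b2-n)).
        return 1 << (int(lo_d) - n), (1 << (int(hi_d) - n)) - 1
    if lo_d >= 1000 and hi_d >= 1000:
        return int(lo_d), int(hi_d)
    raise ValueError("mixing a bit size with a k value is not modelled here")


def split_by_bits(n, kmin, kmax):
    """Split [kmin, kmax] into pieces whose factors k*2^n + 1 share one bit length."""
    pieces = []
    k = kmin
    while k <= kmax:
        j = k.bit_length()              # 2^(j-1) <= k < 2^j
        end = min(kmax, (1 << j) - 1)
        pieces.append((k, end, j + n))  # k*2^n + 1 then has exactly j+n bits
        k = end + 1
    return pieces


if __name__ == "__main__":
    for n, lo, hi in ((207, "224", "227"), (36, "2e10", "3e10"), (61, "67", "68")):
        kmin, kmax = k_range(n, lo, hi)
        print(n, lo, hi, split_by_bits(n, kmin, kmax))
```

For example, `k_range(61, "67", "68")` gives (64, 127), which contains k = 95, so the F58 factor 95 * 2^61 + 1 can still be reached through the bit range 61,67,68 even though 95 cannot be entered as a single K value under this rule.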
My guess is that when the k-range is very small then sieving might remove all candidates and there is no candidate left to do the exponentiation.
In your single k tests the ones that work are probably the ones without any small factors. I might check later but I do not have time right now. |
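To make ATH's guess concrete: a candidate k*2^n + 1 is discarded by the sieve as soon as one of the sieving primes divides it, so a range holding only one or two K values can easily end up with no survivors at all. The toy Python sieve below shows both cases. It is a sketch: `sieve_survivors` and its five-prime list are assumptions, whereas mmff's GPU sieve works per class with far more primes (GPUSievePrimes), so real survivor counts will differ.

```python
def sieve_survivors(n, kmin, kmax, primes=(3, 5, 7, 11, 13)):
    """Return the k in [kmin, kmax] whose candidate k*2^n + 1 has no factor in primes.

    Toy model of the sieve step only, not mmff's class-based GPU sieve.
    """
    survivors = []
    for k in range(kmin, kmax + 1):
        q = k * (1 << n) + 1
        # Skip primes that are not smaller than q so a prime q is not sieved out by itself.
        if all(q % p != 0 for p in primes if p < q):
            survivors.append(k)
    return survivors


if __name__ == "__main__":
    print(sieve_survivors(8, 1, 6))   # [1, 3, 6]: 513, 1025 and 1281 are removed
    print(sieve_survivors(8, 2, 2))   # []: the single candidate 513 = 3^3 * 19
    print(sieve_survivors(207, 232905, 232905))  # the F205 factor survives
```

With zero survivors there is nothing left to exponentiate, which is exactly the situation the next post traces in the kernel.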
[QUOTE=ATH;531841]My guess is that when the k-range is very small then sieving might remove all candidates and there is no candidate left to do the exponentiation.
In your single k tests the ones that work are probably the ones without any small factors. I might check later but I do not have time right now.[/QUOTE]
Looks like you are right! When running small ranges, each time before an error occurs, the number of factors surviving the sieve is zero (total_bit_count = 0 in the tf_*.h kernel). This causes the kernel to skip the calculations entirely, but it still copies the factor and final remainder for one value of K to the results array (RES) for validation. Since the factor and final remainder are function-local variables that are never written, they contain garbage values. This explains why repeating the same test produces a varying mix of Factor divisible and Exponentiation failure errors. So the mystery is solved, and none of this raises any doubts about mmff correctness for large ranges of K (which I hoped and expected all along).
I modified the kernels to set a flag in the results validation array (datalen = 0) when zero factors survive the sieve. Then in tf_validate.h the validation checks are skipped if datalen is zero. Hopefully this will eliminate the following errors on correctly working hardware:
ERROR: Class problems. Factor divisible by 2, 3, 5, 7, or 11
ERROR: GPU sieve problems. Factor divisible by <int>
ERROR: Exponentiation failure
With these changes, all 43 known factors within the range of mmff can be verified using the following worktodo.txt file:
[CODE]
// Check the known Fermat factors within the ranges of mmff
// Ranges supported: 28 <= exp <= 223; 64 bit <= factor size <= 252 bit; K min/max vary with exp
// K min/max < 1000 are interpreted as factor bit size min/max, >= 1000 as K min/max
FermatFactor=36,2e10,3e10 // F28: 25709319373 * 2^36 + 1
FermatFactor=33,546e10,547e10 // F31: 5463561471303 * 2^33 + 1
FermatFactor=39,69,70 // F37: 1275438465 * 2^39 + 1
FermatFactor=41,286492e10,286493e10 // F39: 2864929972774011 * 2^41 + 1
FermatFactor=45,11131e10,11132e10 // F42: 111318179143061 * 2^45 + 1
FermatFactor=45,21e10,22e10 // F43: 212675402445 * 2^45 + 1
FermatFactor=50,213e10,214e10 // F48: 2139543641769 * 2^50 + 1
FermatFactor=54,66,67 // F52: 4119 * 2^54 + 1
FermatFactor=54,78,79 // F52: 21626655 * 2^54 + 1
FermatFactor=54,8190e10,8191e10 // F52: 81909357657279 * 2^54 + 1
FermatFactor=61,67,68 // F58: 95 * 2^61 + 1
FermatFactor=68,121089e10,121090e10 // F65: 1210895760431083 * 2^68 + 1
FermatFactor=74,100,101 // F72: 76432329 * 2^74 + 1
FermatFactor=77,98,99 // F75: 3447431 * 2^77 + 1
FermatFactor=79,5e9,6e9 // F77: 5940341195 * 2^79 + 1
FermatFactor=87,1595e9,1596e9 // F83: 1595863660157 * 2^87 + 1
FermatFactor=88,20018e9,20019e9 // F86: 20018578522347 * 2^88 + 1
FermatFactor=90,119e9,120e9 // F88: 119942751127 * 2^90 + 1
FermatFactor=92,198e9,199e9 // F90: 198922467387 * 2^92 + 1
FermatFactor=93,103,104 // F91: 1421 * 2^93 + 1
FermatFactor=97,482e9,483e9 // F94: 482524552001 * 2^97 + 1
FermatFactor=101,3334e9,3335e9 // F96: 3334131633063 * 2^101 + 1
FermatFactor=111,141,142 // F107: 1289179925 * 2^111 + 1
FermatFactor=120,3e9,4e9 // F116: 3433149787 * 2^120 + 1
FermatFactor=120,1527888e9,1527889e9 // F118: 1527888802614951 * 2^120 + 1
FermatFactor=124,146,147 // F122: 5234775 * 2^124 + 1
FermatFactor=127,129,130 // F125: 5 * 2^127 + 1
FermatFactor=135,1075441e9,1075442e9 // F132: 1075441212722595 * 2^135 + 1
FermatFactor=135,88e9,89e9 // F133: 88075576149 * 2^135 + 1
FermatFactor=145,167,168 // F142: 8152599 * 2^145 + 1
FermatFactor=148,173,174 // F146: 37092477 * 2^148 + 1
FermatFactor=149,160,161 // F147: 3125 * 2^149 + 1
FermatFactor=149,175,176 // F147: 124567335 * 2^149 + 1
FermatFactor=157,167,168 // F150: 1575 * 2^157 + 1
FermatFactor=154,166,167 // F150: 5439 * 2^154 + 1
FermatFactor=167,197,198 // F164: 1835601567 * 2^167 + 1
FermatFactor=171,2674e9,2675e9 // F166: 2674670937447 * 2^171 + 1
FermatFactor=174,20e9,21e9 // F172: 20569603303 * 2^174 + 1
FermatFactor=180,3e8,4e8 // F178: 313047661 * 2^180 + 1
FermatFactor=187,213,214 // F184: 117012935 * 2^187 + 1
FermatFactor=197,48594e9,48596e9 // F195: 48595346636925 * 2^197 + 1
FermatFactor=207,224,225 // F205: 232905 * 2^207 + 1
FermatFactor=217,231,232 // F215: 32111 * 2^217 + 1
[/CODE]
Here is the source with these changes and a CUDA 10.1 Linux binary that will hopefully run on Kepler or later (--gpu-architecture=compute_30). I included Serge's patch to print found factors in K*2^N+1 form; if you want factors in the old format, use output.c from the 0.28 release. I also fixed a few other miscellaneous things and changed the version to 0.28.1 to identify this binary. I am not sure who the current owner of mmff is, but if I changed anything in a "bad" way, please feel free to fix it and re-post. |
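A minimal Python model of the fix described above, under stated assumptions: `kernel_block` stands in for one GPU block and `host_validate` for the checks in tf_validate.h, and the stored "remainder" is a placeholder (2^(2^(n-2)) mod q) rather than mmff's actual validation value. What it illustrates is the control flow: when nothing survives the sieve, the block reports datalen = 0 and the host skips validation, whereas arbitrary leftover values in the slot trip either the small-prime check or the remainder check.

```python
def kernel_block(n, kmin, kmax, primes=(3, 5, 7, 11)):
    """Toy stand-in for one GPU block filling its validation slot.

    Assumption: the 'remainder' is modelled as 2^(2^(n-2)) mod q, a
    placeholder for whatever the real kernel stores for validation.
    """
    survivors = [k for k in range(kmin, kmax + 1)
                 if all((k * (1 << n) + 1) % p for p in primes)]
    if not survivors:
        # The fix: flag "nothing computed" (datalen = 0) instead of copying
        # never-written local variables into the results array.
        return {"datalen": 0}
    q = survivors[0] * (1 << n) + 1
    return {"datalen": 1, "factor": q, "remainder": pow(2, 1 << (n - 2), q)}


def host_validate(res, n):
    """Host-side check; skipped entirely when datalen == 0."""
    if res["datalen"] == 0:
        return "skipped"
    q = res["factor"]
    if any(q % p == 0 for p in (2, 3, 5, 7, 11)):
        return "class problem"
    if pow(2, 1 << (n - 2), q) != res["remainder"]:
        return "exponentiation failure"
    return "ok"


if __name__ == "__main__":
    print(host_validate(kernel_block(8, 2, 2), 8))  # skipped: nothing survived
    print(host_validate(kernel_block(8, 1, 6), 8))  # ok
    # What an unfixed kernel could hand back: arbitrary leftover values.
    print(host_validate({"datalen": 1, "factor": 1536, "remainder": 0}, 8))  # class problem
```

Because the leftover values are arbitrary, which of the two errors appears depends on what happens to be in memory, consistent with the random alternation reported earlier in the thread.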
@Gary: The original v0.28 version was posted by Serge ([URL]https://mersenneforum.org/showpost.php?p=376423&postcount=317[/URL]), so I would presume he is the current maintainer.
Do note, it has been 5 years since it was posted. |
@Gary: I think it is a case of "you touch it, you own it". Congratulations.
|
Thanks for the clues provided by Andreas!
The class problems and exp failure problems are indeed solved for mmff now. I'm posting the source code here because I also made some other minor changes, and there are still some problems with the Windows binary. The attached file contains a CUDA 10.1 binary compiled for 64-bit Linux, plus the source code. The code is based on version 0.28, and the compiled binary can be used on Google Colab. Note that I made the tf changes that fix the class problems before I saw the source files posted by Gary (I haven't checked those yet), so let me know if any of my changes are flaky or bad.
Minor changes:
(1) Fixed the class problems & exp failure caused by tf validation: if no candidate survives, set RES[RESULTS_ARRAY_VALIDATION_OFFSET] = 0 and do not copy the other values. If RES[RESULTS_ARRAY_VALIDATION_OFFSET] == 0, the validate function is simply not called. I also think the "ERROR: Exponentiation failure" message is somewhat unclear, so I changed it to: "ERROR: Verifying on CPU failed. Remainder didn't match. Possible problems exist." Please let me know if my understanding is incorrect.
(2) Replaced all deprecated cudaThreadSynchronize() calls with cudaDeviceSynchronize(), in case the old function stops being supported in the future.
(3) In gpusieve.cu, the launch bounds for many functions are:
[CODE]__global__ static void __launch_bounds__(256,6) blablabla....[/CODE]
However, the maximum number of threads per streaming multiprocessor on [B]Turing cards (CC 7.5)[/B] is [B]1024[/B] instead of the [B]2048[/B] of all previous cards. Since the second parameter is a [B]lower bound[/B] on resident blocks per multiprocessor, 6 blocks of 256 threads (1536 threads) cannot fit on Turing, so NVCC ignores that setting when compiling for the Turing CC 7.5 architecture. I don't know whether this lower-bound setting is necessary, but I changed all these launch bounds settings to:
[CODE]#if __CUDA_ARCH__ < 750
__global__ static void __launch_bounds__(256,6) blablabla...
#else
__global__ static void __launch_bounds__(256,3) blablabla...
#endif[/CODE]
Let me know if this change is incorrect.
(4) Fixed some minor format-reading problems.
The compiled Linux binary passed all 41 test cases provided by ATH:
[CODE]FermatFactor=36,2e10,3e10 # F28: 25709319373 * 2^36 + 1
FermatFactor=33,546e10,547e10 # F31: 5463561471303 * 2^33 + 1
FermatFactor=39,69,70 # F37: 1275438465 * 2^39 + 1
FermatFactor=41,286492e10,286493e10 # F39: 2864929972774011 * 2^41 + 1
FermatFactor=45,11131e10,11132e10 # F42: 111318179143061 * 2^45 + 1
FermatFactor=45,21e10,22e10 # F43: 212675402445 * 2^45 + 1
FermatFactor=50,213e10,214e10 # F48: 2139543641769 * 2^50 + 1
FermatFactor=54,66,67 # F52: 4119 * 2^54 + 1
FermatFactor=54,78,79 # F52: 21626655 * 2^54 + 1
FermatFactor=54,8190e10,8191e10 # F52: 81909357657279 * 2^54 + 1
FermatFactor=61,67,68 # F58: 95 * 2^61 + 1
FermatFactor=68,121089e10,121090e10 # F65: 1210895760431083 * 2^68 + 1
FermatFactor=74,100,101 # F72: 76432329 * 2^74 + 1
FermatFactor=77,98,99 # F75: 3447431 * 2^77 + 1
FermatFactor=79,5e9,6e9 # F77: 5940341195 * 2^79 + 1
FermatFactor=87,1595e9,1596e9 # F83: 1595863660157 * 2^87 + 1
FermatFactor=88,20018e9,20019e9 # F86: 20018578522347 * 2^88 + 1
FermatFactor=90,119e9,120e9 # F88: 119942751127 * 2^90 + 1
FermatFactor=92,198e9,199e9 # F90: 198922467387 * 2^92 + 1
FermatFactor=93,103,104 # F91: 1421 * 2^93 + 1
FermatFactor=97,482e9,483e9 # F94: 482524552001 * 2^97 + 1
FermatFactor=101,3334e9,3335e9 # F96: 3334131633063 * 2^101 + 1
FermatFactor=111,141,142 # F107: 1289179925 * 2^111 + 1
FermatFactor=120,3e9,4e9 # F116: 3433149787 * 2^120 + 1
FermatFactor=124,146,147 # F122: 5234775 * 2^124 + 1
FermatFactor=127,129,130 # F125: 5 * 2^127 + 1
FermatFactor=135,88e9,89e9 # F133: 88075576149 * 2^135 + 1
FermatFactor=145,167,168 # F142: 8152599 * 2^145 + 1
FermatFactor=148,173,174 # F146: 37092477 * 2^148 + 1
FermatFactor=149,160,161 # F147: 3125 * 2^149 + 1
FermatFactor=149,175,176 # F147: 124567335 * 2^149 + 1
FermatFactor=154,166,167 # F150: 5439 * 2^154 + 1
FermatFactor=157,167,168 # F150: 1575 * 2^157 + 1
FermatFactor=167,197,198 # F164: 1835601567 * 2^167 + 1
FermatFactor=171,2674e9,2675e9 # F166: 2674670937447 * 2^171 + 1
FermatFactor=174,20e9,21e9 # F172: 20569603303 * 2^174 + 1
FermatFactor=180,3e8,4e8 # F178: 313047661 * 2^180 + 1
FermatFactor=187,213,214 # F184: 117012935 * 2^187 + 1
FermatFactor=197,48594e9,48596e9 # F195: 48595346636925 * 2^197 + 1
FermatFactor=207,224,227 # F205: 232905 * 2^207 + 1
FermatFactor=217,231,232 # F215: 32111 * 2^217 + 1[/CODE]
Result:
[CODE]F28 has a factor: 1766730974551267606529 [TF:70:71*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^36+1 in k range: 20G to 30G (71-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F31 has a factor: 46931635677864055013377 [TF:75:76*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^33+1 in k range: 5460G to 5470G (76-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F37 has a factor: 701179711390136401921 [TF:69:70*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^39+1 in k range: 1073741824 to 2147483647 (70-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F39 has a factor: 6300047635658008393597059073 [TF:92:93*:mmff 0.28 mfaktc_barrett96_F32_63gs]
found 1 factor for k*2^41+1 in k range: 2864920G to 2864930G (93-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett96_F32_63gs]
F42 has a factor: 3916660235220715932328394753 [TF:91:92*:mmff 0.28 mfaktc_barrett96_F32_63gs]
found 1 factor for k*2^45+1 in k range: 111310G to 111320G (92-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett96_F32_63gs]
F43 has a factor: 7482850493766970889994241 [TF:82:83*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^45+1 in k range: 210G to 220G (83-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F48 has a factor: 2408911986953445595315961857 [TF:90:91*:mmff 0.28 mfaktc_barrett96_F32_63gs]
found 1 factor for k*2^50+1 in k range: 2130G to 2140G (91-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett96_F32_63gs]
F52 has a factor: 74201307460556292097 [TF:66:67*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^54+1 in k range: 4096 to 8191 (67-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F52 has a factor: 389591181597081096683521 [TF:78:79*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^54+1 in k range: 16777216 to 33554431 (79-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F52 has a factor: 1475547810493913550438096961537 [TF:100:101*:mmff 0.28 mfaktc_barrett108_F32_63gs]
found 1 factor for k*2^54+1 in k range: 81900G to 81910G (101-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett108_F32_63gs]
F58 has a factor: 219055085875300925441 [TF:67:68*:mmff 0.28 mfaktc_barrett89_F32_63gs]
found 1 factor for k*2^61+1 in k range: 64 to 127 (68-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs]
F65 has a factor: 357393347081793620781479724788482049 [TF:118:119*:mmff 0.28 mfaktc_barrett120_F64_95gs]
found 1 factor for k*2^68+1 in k range: 1210890G to 1210900G (119-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett120_F64_95gs]
F72 has a factor: 1443765874709062348345951911937 [TF:100:101*:mmff 0.28 mfaktc_barrett108_F64_95gs]
found 1 factor for k*2^74+1 in k range: 67108864 to 134217727 (101-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett108_F64_95gs]
F75 has a factor: 520961043404985083798310879233 [TF:98:99*:mmff 0.28 mfaktc_barrett108_F64_95gs]
found 1 factor for k*2^77+1 in k range: 2097152 to 4194303 (99-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett108_F64_95gs]
F77 has a factor: 3590715923977960355577974656860161 [TF:111:112*:mmff 0.28 mfaktc_barrett120_F64_95gs]
found 1 factor for k*2^79+1 in k range: 5G to 6G (112-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett120_F64_95gs]
F83 has a factor: 246947940268608417020015902258307792897 [TF:127:128*:mmff 0.28 mfaktc_barrett128_F64_95gs]
found 1 factor for k*2^87+1 in k range: 1595G to 1596G (128-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett128_F64_95gs]
F86 has a factor: 6195449970597928748332522715641578258433 [TF:132:133*:mmff 0.28 mfaktc_barrett140_F64_95gs]
found 1 factor for k*2^88+1 in k range: 20018G to 20019G (133-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett140_F64_95gs]
F88 has a factor: 148481934042154969241780501829489000449 [TF:126:127*:mmff 0.28 mfaktc_barrett128_F64_95gs]
found 1 factor for k*2^90+1 in k range: 119G to 120G (127-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett128_F64_95gs]
F90 has a factor: 985016348367230226078056532654006730753 [TF:129:130*:mmff 0.28 mfaktc_barrett140_F64_95gs]
found 1 factor for k*2^92+1 in k range: 198G to 199G (130-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett140_F64_95gs]
F91 has a factor: 14072902366596202965053244178433 [TF:103:104*:mmff 0.28 mfaktc_barrett108_F64_95gs]
found 1 factor for k*2^93+1 in k range: 1024 to 2047 (104-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett108_F64_95gs]
F94 has a factor: 76459067246115642538831634131564386844673 [TF:135:136*:mmff 0.28 mfaktc_barrett140_F96_127gs]
found 1 factor for k*2^97+1 in k range: 482G to 483G (136-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett140_F96_127gs]
F96 has a factor: 8453027931784477309850388309101819121893377 [TF:142:143*:mmff 0.28 mfaktc_barrett152_F96_127gs]
found 1 factor for k*2^101+1 in k range: 3334G to 3335G (143-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett152_F96_127gs]
F107 has a factor: 3346902437331832346018436558958369334886401 [TF:141:142*:mmff 0.28 mfaktc_barrett152_F96_127gs]
found 1 factor for k*2^111+1 in k range: 1073741824 to 2147483647 (142-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett152_F96_127gs]
F116 has a factor: 4563438810603420826872624280490561141381005313 [TF:151:152*:mmff 0.28 mfaktc_barrett152_F96_127gs]
found 1 factor for k*2^120+1 in k range: 3G to 4G (152-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett152_F96_127gs]
F122 has a factor: 111331351706159727817280425663664652445286401 [TF:146:147*:mmff 0.28 mfaktc_barrett152_F96_127gs]
found 1 factor for k*2^124+1 in k range: 4194304 to 8388607 (147-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett152_F96_127gs]
F125 has a factor: 850705917302346158658436518579420528641 [TF:129:130*:mmff 0.28 mfaktc_barrett140_F96_127gs]
found 1 factor for k*2^127+1 in k range: 4 to 7 (130-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett140_F96_127gs]
F133 has a factor: 3836232386548105510567872577199319351015739156856833 [TF:171:172*:mmff 0.28 mfaktc_barrett172_F128_159gs]
found 1 factor for k*2^135+1 in k range: 88G to 89G (172-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs]
F142 has a factor: 363618066009591119386121910507749518730588867002369 [TF:167:168*:mmff 0.28 mfaktc_barrett172_F128_159gs]
found 1 factor for k*2^145+1 in k range: 4194304 to 8388607 (168-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs]
F146 has a factor: 13235038053749721162769301995307025251972223086886913 [TF:173:174*:mmff 0.28 mfaktc_barrett183_F128_159gs]
found 1 factor for k*2^148+1 in k range: 33554432 to 67108863 (174-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett183_F128_159gs]
F147 has a factor: 2230074519853062314153571827264836150598041600001 [TF:160:161*:mmff 0.28 mfaktc_barrett172_F128_159gs]
found 1 factor for k*2^149+1 in k range: 2048 to 4095 (161-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs]
F147 has a factor: 88894220732640180500173831441107513117330143465963521 [TF:175:176*:mmff 0.28 mfaktc_barrett183_F128_159gs]
found 1 factor for k*2^149+1 in k range: 67108864 to 134217727 (176-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett183_F128_159gs]
F150 has a factor: 124204803210043452689216278205372864748572142206977 [TF:166:167*:mmff 0.28 mfaktc_barrett172_F128_159gs]
found 1 factor for k*2^154+1 in k range: 4096 to 8191 (167-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs]
F150 has a factor: 287733134849521512021350451441018219494761719398401 [TF:167:168*:mmff 0.28 mfaktc_barrett172_F128_159gs]
found 1 factor for k*2^157+1 in k range: 1024 to 2047 (168-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs]
F164 has a factor: 343390041044181900054983258125842173093877961821829176754177 [TF:197:198*:mmff 0.28 mfaktc_barrett204_F160_191gs]
found 1 factor for k*2^167+1 in k range: 1073741824 to 2147483647 (198-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett204_F160_191gs]
F166 has a factor: 8005705634611551271269985633916919970948098093294822472135213057 [TF:212:213*:mmff 0.28 mfaktc_barrett215_F160_191gs]
found 1 factor for k*2^171+1 in k range: 2674G to 2675G (213-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett215_F160_191gs]
F172 has a factor: 492544145925433733451855533863925475950550777193174123310743553 [TF:208:209*:mmff 0.28 mfaktc_barrett215_F160_191gs]
found 1 factor for k*2^174+1 in k range: 20G to 21G (209-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett215_F160_191gs]
F178 has a factor: 479744144560996421795040836675707785358665797968769873751310337 [TF:208:209*:mmff 0.28 mfaktc_barrett215_F160_191gs]
found 1 factor for k*2^180+1 in k range: 300M to 400M (209-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett215_F160_191gs]
F184 has a factor: 22953190542224652377639611826608942557783370967811443134226759681 [TF:213:214*:mmff 0.28 mfaktc_barrett215_F160_191gs]
found 1 factor for k*2^187+1 in k range: 67108864 to 134217727 (214-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett215_F160_191gs]
F195 has a factor: 9761213910603494986281795830720869047027739722070601061612088452553113601 [TF:242:243*:mmff 0.28 mfaktc_barrett247_F192_223gs]
found 1 factor for k*2^197+1 in k range: 48594G to 48596G (243-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett247_F192_223gs]
F205 has a factor: 47905779865361936656012887182939964920375512098173614759150973091841 [TF:224:225*:mmff 0.28 mfaktc_barrett236_F192_223gs]
found 1 factor for k*2^207+1 in k range: 131072 to 262143 (225-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett236_F192_223gs]
F215 has a factor: 6763365995538079644113691573900682504384080816814065022974359599316993 [TF:231:232*:mmff 0.28 mfaktc_barrett236_F192_223gs]
found 1 factor for k*2^217+1 in k range: 16384 to 32767 (232-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett236_F192_223gs][/CODE]
Some double Mersenne test cases:
[CODE]MMFactor=31,64,65
MMFactor=61,549e9,550e9
MMFactor=31,56e9,57e9
MMFactor=31,54e9,55e9
MMFactor=31,414.5e11,415e11
MMFactor=31,414e11,415e11
MMFactor=31,416e11,417e11[/CODE]
The results are as expected, with no problems:
[CODE]no factor for MM31 in k range: 4294967298 to 8589934595 (65-bit factors) [mmff 0.28 mfaktc_barrett89_M31gs]
no factor for MM61 in k range: 549000000000 to 549755813887 (101-bit factors) [mmff 0.28 mfaktc_barrett108_M61gs]
no factor for MM61 in k range: 549755813888 to 550000000000 (102-bit factors) [mmff 0.28 mfaktc_barrett108_M61gs]
MM31 has a factor: 242557615644693265201 [TF:67:68*:mmff 0.28 mfaktc_barrett89_M31gs]
found 1 factor for MM31 in k range: 56G to 57G (68-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_M31gs]
no factor for MM31 in k range: 54G to 55G (68-bit factors) [mmff 0.28 mfaktc_barrett89_M31gs]
no factor for MM31 in k range: 41450G to 41500G (78-bit factors) [mmff 0.28 mfaktc_barrett89_M31gs]
MM31 has a factor: 178021379228511215367151 [TF:77:78*:mmff 0.28 mfaktc_barrett89_M31gs]
found 1 factor for MM31 in k range: 41400G to 41500G (78-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_M31gs]
no factor for MM31 in k range: 41600G to 41700G (78-bit factors) [mmff 0.28 mfaktc_barrett89_M31gs][/CODE]
However, when compiled for Windows using Visual Studio 2019, it still failed to run (though not because of class problems, etc.):
[CODE]mmff v0.28 (64bit built)
Compiletime options
  THREADS_PER_BLOCK 256
  MORE_CLASSES enabled
Runtime options
  GPU Sieving enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
  GPUSievePrimes depends on worktodo entry
  GPUSieveSize 16M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
  GPUSieveProcessSize 8K bits
  WorkFile worktodo.txt
  Checkpoints enabled
  CheckpointDelay 30s
  StopAfterFactor class
  PrintMode full
  V5UserID (none)
  ComputerID (none)
WARNING, no ProgressFormat specified in mmff.ini, using default
  TimeStampInResults no
CUDA version info
  binary compiled for CUDA 10.10
  CUDA runtime version 10.10
  CUDA driver version 10.20
CUDA device info
  name GeForce GTX 1660
  compute capability 7.5
  maximum threads per block 1024
  number of mutliprocessors 22 (unknown number of shader cores)
  clock rate 1800MHz
got assignment: MM127, k range 116500000000000000 to 117000000000000000 (185-bit factors)
Starting trial factoring of MM127 in k range: 116500T to 117000T (185-bit factors)
 k_min = 116500000000000000
 k_max = 117000000000000000
Using GPU kernel "mfaktc_barrett185_M127gs"
class | candidates | time | ETA | raw rate | SievePrimes | CPU wait
5/4620 | 108.23G | 11.263s | 3h00m | 9608.91M/s | 810549
[B][COLOR="Red"]ERROR: cudaGetLastError() returned 98: invalid device function[/COLOR][/B][/CODE]
The [COLOR="Red"][B]invalid device function[/B][/COLOR] error usually means that a kernel was not compiled for the correct CC architecture or doesn't exist. I tried to get the attributes of the target kernel, but that also returned an error. So the problem is [B]not[/B] that the kernels were compiled for the wrong CC architecture; as far as the program is concerned, they [B]don't exist[/B]. I wrote a test kernel and it raised the same problem.
Yes, the program failed to recognize it, simply thought it didn't exist (and will not be executed). I don't know what went wrong for MSVC 2019 compiler to cause the programs can't recongnize the existance of any kernels, since the file size are normal. The older [B]Visual Studio 2012[/B] version [B]should work[/B], but I haven't use it now since I already uninstalled it. However, some problems must existed since all newer versions of MSVC compiler (2017 or later, I don't know about 2013 or 2015) can cause the problem. I really have no idea about that... The compiling process for Windows binary using Visual Studio follows the post by TheJudger somewhere in the forum. I haven't test the normal CUDA compiling process using Visual Studio, since it needs to adjust some including relations of header files in many source files of mmff, which is a little bit unconvenient. |
[QUOTE=Fan Ming;535756]
The [COLOR="Red"][B]invalid device function[/B][/COLOR] error is usually problems...[/QUOTE] It seems this problem can occur on Linux too, on Tesla P100 instances on Google Colab. However, I'm not sure about this, and it's much harder to get a P100 assigned on Google Colab now. Can anyone confirm this? |
[QUOTE=Fan Ming;535989]It seems this problem can occur on Linux too, on Tesla P100 instances on Google Colab. However, I'm not sure about this, and it's much harder to get a P100 assigned on Google Colab now. Can anyone confirm this?[/QUOTE] I got a P100 instance successfully. It's [B]not[/B] the "invalid device function" error (which can cause an Exponentiation failure if a kernel fails to execute and the garbage values left in memory happen to satisfy certain conditions), but a [B][COLOR="Red"]real[/COLOR] Exponentiation failure[/B] error. For an unknown reason the "-v 3" option didn't work on Colab (mmff raised ERROR: can't parse -v option), so I changed the default verbosity level to 3. I ran it several times, and here is the error output: [CODE]mmff v0.28 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 128M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait" GPUProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s | %W%%" TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.10 CUDA device info name Tesla P100-PCIE-16GB compute capability 6.0 maximum threads per block 1024 number of mutliprocessors 56 (unknown number of shader cores) clock rate 1328MHz got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945244016114007668591781862075984047752025015141633 = 
4926629721325240139649429581548920523512559095913937 ERROR: Verifying on CPU failed. Remainder didn't match. Possible problems exist.[/CODE] [CODE]mmff v0.28 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 128M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait" GPUProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s | %W%%" TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.10 CUDA device info name Tesla P100-PCIE-16GB compute capability 6.0 maximum threads per block 1024 number of mutliprocessors 56 (unknown number of shader cores) clock rate 1328MHz got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945243918643526487758168387626961494996338526257873 = 11812001209279499151039916953333557370062855661257534 ERROR: Verifying on CPU failed. Remainder didn't match. 
Possible problems exist.[/CODE] [CODE]mmff v0.28 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 128M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait" GPUProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s | %W%%" TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.10 CUDA device info name Tesla P100-PCIE-16GB compute capability 6.0 maximum threads per block 1024 number of mutliprocessors 56 (unknown number of shader cores) clock rate 1328MHz got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945244016114007668591781862075984047752025015141633 = 4926629721325240139649429581548920523512559095913937 ERROR: Verifying on CPU failed. Remainder didn't match. Possible problems exist.[/CODE] [CODE]got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945244006681380457543367654871239929743410193636753 = 20826885465921148439067402367610686467153380117365399 ERROR: Verifying on CPU failed. Remainder didn't match. 
Possible problems exist.[/CODE] [CODE]got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945243923359840093282375491229333554000645937010313 = 18376582414064778318809558114847430298939300967906033 ERROR: Verifying on CPU failed. Remainder didn't match. Possible problems exist.[/CODE] [CODE]got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945244006681380457543367654871239929743410193636753 = 20826885465921148439067402367610686467153380117365399 ERROR: Verifying on CPU failed. Remainder didn't match. Possible problems exist.[/CODE]
Note that the message "ERROR: Verifying on CPU failed. Remainder didn't match. Possible problems exist." is actually the "[B]ERROR: Exponentiation failure[/B]" error; I changed the description of this error (see post #360, which I posted several days ago). The candidate factors all seem to be legal values, for example, 23945244016114007668591781862075984047752025015141633, 23945243918643526487758168387626961494996338526257873, 23945243923359840093282375491229333554000645937010313, 23945244006681380457543367654871239929743410193636753. They are all legal 2kp+1 values. However, all of the [B]remainder values[/B] are [COLOR="Red"][B]indeed wrong[/B][/COLOR], and for the same factor value the wrong remainder is always the same. This problem [B]also exists[/B] in the previous mmff 0.28 version (before the class problems were solved, so it is not caused by my changes; I haven't checked earlier versions yet), so there must be some bug. I don't know why... |
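For anyone who wants to double-check such remainders independently, the CPU-side test can be reproduced with Python's built-in three-argument pow(). This is a minimal sketch, not mmff's code, and the function names are hypothetical. It assumes the "Verifying (2^(2^p)) % q" line means mmff computes 2^(2^p) mod q, which equals 2 exactly when q divides MM_p = 2^(M_p) - 1 (with M_p = 2^p - 1), and that a legal candidate has the form q = 2k·M_p + 1. The same idea covers the FermatFactor runs earlier in the thread: q divides F_m = 2^(2^m) + 1 exactly when 2^(2^m) mod q equals q - 1.

```python
def mersenne(p: int) -> int:
    """M_p = 2^p - 1."""
    return (1 << p) - 1


def is_legal_candidate(q: int, p: int) -> bool:
    """True if q has the form 2*k*M_p + 1 with k >= 1 (hypothetical helper)."""
    m2 = 2 * mersenne(p)
    return q > m2 and (q - 1) % m2 == 0


def mm_remainder(q: int, p: int) -> int:
    """2^(2^p) mod q, the quantity printed on the 'Verifying' lines."""
    # pow() with a modulus does fast modular exponentiation; the exponent 2^p
    # is only p+1 bits long, so this costs about p modular squarings.
    return pow(2, 1 << p, q)


def divides_double_mersenne(q: int, p: int) -> bool:
    """q | 2^(M_p) - 1  <=>  2^(2^p) = 2 * 2^(M_p) == 2 (mod q), for odd q > 2."""
    return q > 2 and q % 2 == 1 and mm_remainder(q, p) == 2


def divides_fermat(q: int, m: int) -> bool:
    """q | F_m = 2^(2^m) + 1  <=>  2^(2^m) == -1 (mod q)."""
    return q > 2 and pow(2, 1 << m, q) == q - 1


if __name__ == "__main__":
    # Smallest known factor of MM31: k = 68745.
    q = 2 * 68745 * mersenne(31) + 1
    print(q, is_legal_candidate(q, 31), divides_double_mersenne(q, 31))
    # Factor of MM31 reported in the log above (k range 56G to 57G).
    print(divides_double_mersenne(242557615644693265201, 31))
    # F150 factor from the FermatFactor log above: 1575*2^157 + 1.
    print(divides_fermat(1575 * 2**157 + 1, 150))
```

Running the sketch on one of the MM127 candidates above, e.g. `mm_remainder(23945244016114007668591781862075984047752025015141633, 127)`, gives the correct remainder to compare against the value printed by the GPU.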
1 Attachment(s)
Using Gary's source, the errors still occurred (I ran it several times):
[CODE]mmff v0.28.1 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 128M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait" WARNING, no ProgressFormat specified in mmff.ini, using default ProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s" TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.10 CUDA device info name Tesla P100-PCIE-16GB compute capability 6.0 maximum threads per block 1024 number of mutliprocessors 56 (unknown number of shader cores) clock rate 1328MHz got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945243937508780909854996802036449731013568169267633 = 7606706320838621808794870660151320699229326362771323 ERROR: Exponentiation failure[/CODE] [CODE]mmff v0.28.1 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 128M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) 
GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait" WARNING, no ProgressFormat specified in mmff.ini, using default ProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s" TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.10 CUDA device info name Tesla P100-PCIE-16GB compute capability 6.0 maximum threads per block 1024 number of mutliprocessors 56 (unknown number of shader cores) clock rate 1328MHz got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945243937508780909854996802036449731013568169267633 = 7606706320838621808794870660151320699229326362771323 ERROR: Exponentiation failure[/CODE] [CODE]got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945243956374035331951825216445937967030797812277393 = 21049357416014847908393584649762608127534186076535180 ERROR: Exponentiation failure[/CODE] [CODE]got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945243956374035331951825216445937967030797812277393 = 21049357416014847908393584649762608127534186076535180 ERROR: Exponentiation failure[/CODE] [CODE]got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 
in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945243956374035331951825216445937967030797812277393 = 21049357416014847908393584649762608127534186076535180 ERROR: Exponentiation failure[/CODE] However, other numbers seem to work correctly (the logs are too large to post; see the attached logs.txt, part 1). Some other test cases (too large to post in full, so only a sample here; see the attached logs.txt, part 2): [CODE]/content/drive/My Drive/mmff-0.28.1 mmff v0.28.1 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 128M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait" WARNING, no ProgressFormat specified in mmff.ini, using default ProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s" TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.10 CUDA device info name Tesla P100-PCIE-16GB compute capability 6.0 maximum threads per block 1024 number of mutliprocessors 56 (unknown number of shader cores) clock rate 1328MHz got assignment: MM31, k range 4294967298 to 8589934595 (65-bit factors) Starting trial factoring of MM31 in k range: 4294967298 to 8589934595 (65-bit factors) k_min = 4294967298 k_max = 8589934595 Using GPU kernel "mfaktc_barrett89_M31gs" Verifying (2^(2^31)) % 18455732847550407041 = 18041335883521486051 class | candidates | time | ETA | raw rate | SievePrimes | CPU wait 2/4620 | 0.93M | 0.001s | n.a. 
| 933.89M/s | 90677 Verifying (2^(2^31)) % 18455474908994598577 = 13210018195264925476 6/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18450117401151801329 = 7139557165896038944 14/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18450375361182446263 = 9057953753314217069 15/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18455732950629622097 = 988841009176436615 26/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18449601545515020871 = 15586616874725587374 27/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18454185233395425433 = 11040915153769198707 30/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18451883495998061423 = 14953888264990787734 35/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18455236950626641801 = 6124863183292633277 42/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453351910956141671 = 15189020512858988414 47/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18450018342026132513 = 1741686722528884267 50/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18451208911254996607 = 1117800095954824569 51/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18457876109244557039 = 7096891942730331470 59/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18455594206006156721 = 4096068541967968090 62/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18454899726974586097 = 9373744610566902525 66/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453887768255610287 = 83107776338795315 71/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453034547232853423 = 7489924297197587194 75/4620 | 0.93M | 0.001s | n.a. 
| 933.89M/s | 90677 Verifying (2^(2^31)) % 18453490977702154097 = 1069386786885538204 86/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18454879987304902873 = 13223985354672910196 90/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18448252547827582999 = 16324629731528481613 99/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453292640407484471 = 16327008864618939019 class | candidates | time | ETA | raw rate | SievePrimes | CPU wait 107/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453868093010436473 = 14672522872021511977 110/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18456844509640145767 = 158751210797949868 WARNING: Factor divisible by 293. Only occasionally should GPU sieve let small factors slip through 111/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18456030969820218169 = 13243929036377631083 114/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18459761428087931279 = 928694628223646081 119/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18451864026911317721 = 12732358327742052958 122/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18456308819844401617 = 414574268522609115 126/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18455495301499310489 = 9211324510596461997 134/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18457836750164274823 = 16899488420486930904 135/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18451705387999346537 = 17081498552114410952 146/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18455812841316257791 = 11683251650899530624 147/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18451427606694639793 = 13103389033054754948 150/4620 | 0.93M | 0.001s | n.a. 
| 933.89M/s | 90677 Verifying (2^(2^31)) % 18453590487799388783 = 11499864167607720722 155/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18450733149137905639 = 12859935701888440148 159/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18454027058339922001 = 11659533504391022171 162/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18452677772889675431 = 10744843596348203562 167/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18454900173651184673 = 1859362368207149326 170/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453610399267763767 = 8701203538784565069 171/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18454324751113003729 = 6213104066269045180 174/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18451685699869270841 = 6495362410738542899 182/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18464305726823282567 = 5979809582257939535 class | candidates | time | ETA | raw rate | SievePrimes | CPU wait 191/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18456805180624634609 = 12467334056667722874 194/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453848615333758183 = 14209312759826643484 195/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453451807600432817 = 10388588196196914677 206/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18452122360604117233 = 10349137720694434220 210/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18455614705885050983 = 167229099570259508 215/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453471719068807801 = 14507868372907181248 222/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453610639785932231 = 17813826982514021073 227/4620 | 0.93M | 0.001s | n.a. 
| 933.89M/s | 90677 Verifying (2^(2^31)) % 18456011629582493287 = 1627973603262381094 231/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18449781019313335249 = 12125103717347465058 234/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677[/CODE] All seem to [B]work properly[/B]. However, once I switched to [B]MM127[/B], the errors [B]occurred again[/B]: [CODE]/content/drive/My Drive/mmff-0.28.1 mmff v0.28.1 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 128M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait" WARNING, no ProgressFormat specified in mmff.ini, using default ProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s" TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.10 CUDA device info name Tesla P100-PCIE-16GB compute capability 6.0 maximum threads per block 1024 number of mutliprocessors 56 (unknown number of shader cores) clock rate 1328MHz got assignment: MM127, k range 562949953421312 to 1125899906842623 (178-bit factors) Starting trial factoring of MM127 in k range: 562949953421312 to 1125899906842623 (178-bit factors) k_min = 562949953421312 k_max = 1125899906842623 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 191561943147467962859727723905659853364042304328803289 = 25168583490388808698318691898045119457541087143113062 ERROR: Exponentiation failure[/CODE] Other mmff 0.28 versions behave the same way (including the original version with some class problems unsolved, and the version I posted). There may be bugs in the code. |
The same problem occurs with [B]MM107[/B], but MM89 is normal:
[CODE]/content/drive/My Drive/mmff-test mmff v0.28 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 128M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait" GPUProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s | %W%%" TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.10 CUDA device info name Tesla P100-PCIE-16GB compute capability 6.0 maximum threads per block 1024 number of mutliprocessors 56 (unknown number of shader cores) clock rate 1328MHz got assignment: MM107, k range 41400000000000 to 41500000000000 (154-bit factors) Starting trial factoring of MM107 in k range: 41400G to 41500G (154-bit factors) k_min = 41400000000000 k_max = 41500000000000 Using GPU kernel "mfaktc_barrett160_M107gs" Verifying (2^(2^107)) % 13435069371854815219033511685499715361952762321 = 974520303404695347505301237807931102140431668099 ERROR: Verifying on CPU failed. Remainder didn't match. 
Possible problems exist.[/CODE] MM89 works properly: [CODE]/content/drive/My Drive/mmff-test mmff v0.28 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 128M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait" GPUProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s | %W%%" TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.10 CUDA device info name Tesla P100-PCIE-16GB compute capability 6.0 maximum threads per block 1024 number of mutliprocessors 56 (unknown number of shader cores) clock rate 1328MHz got assignment: MM89, k range 41400000000000 to 41500000000000 (136-bit factors) Starting trial factoring of MM89 in k range: 41400G to 41500G (136-bit factors) k_min = 41400000000000 k_max = 41500000000000 Using GPU kernel "mfaktc_barrett140_M89gs" Verifying (2^(2^89)) % 51250722476366711691515168579592911982721 = 37671549122511752130292866601915335328068 class | candidates | time | ETA | raw rate | SievePrimes | CPU wait 0/4620 | 21.65M | 0.029s | n.a. | 746.60M/s | 649781 | n.a.% Verifying (2^(2^89)) % 51250720280168236496304157387929107838071 = 35746096159163930640949829473693574340078 5/4620 | 21.65M | 0.029s | n.a. | 746.60M/s | 649781 | n.a.% Verifying (2^(2^89)) % 51250719954174058311049257317093331713479 = 22759295645343611258946139802672470959760 9/4620 | 21.65M | 0.029s | n.a. 
| 746.60M/s | 649781 | n.a.% Verifying (2^(2^89)) % 51250720852115103746739125095447985265401 = 41842644712508723081556126612950349320116 20/4620 | 21.65M | 0.028s | n.a. | 773.27M/s | 649781 | n.a.% Verifying (2^(2^89)) % 51250721744324486800537682201019693669463 = 13062456361537928045073778273891658192745 21/4620 | 21.65M | 0.028s | n.a. | 773.27M/s | 649781 | n.a.% Verifying (2^(2^89)) % 51250722110368501136753204925391936624199 = 11766302253559315831356912138896967481965 29/4620 | 21.65M | 0.028s | n.a. | 773.27M/s | 649781 | n.a.% Verifying (2^(2^89)) % 51250720709149122429788413288172826239287 = 14816860850408810792926573186880149802296 33/4620 | 21.65M | 0.028s | n.a. | 773.27M/s | 649781 | n.a.% Verifying (2^(2^89)) % 51250721052309815139813681631034757950353 = 41152310359413274585223328516751757168125 36/4620 | 21.65M | 0.028s | n.a. | 773.27M/s | 649781 | n.a.% Verifying (2^(2^89)) % 51250721263933188975570868864490245452809 = 44183317763900802218115380969512121058940 44/4620 | 21.65M | 0.027s | n.a. | 801.91M/s | 649781 | n.a.% Verifying (2^(2^89)) % 51250721378323800365697147786268920062497 = 18692344536121868666837048177982998180467 48/4620 | 21.65M | 0.027s | n.a. | 801.91M/s | 649781 | n.a.% Verifying (2^(2^89)) % 51250721395487839010388945297745277400527 = 3166578919721146857552725561773689514712 53/4620 | 21.65M | 0.026s | n.a. | 832.75M/s | 649781 | n.a.% Verifying (2^(2^89)) % 51250722287699697944266073163866784053033 = 37430319078903975242289720426417282202568 56/4620 | 21.65M | 0.027s | n.a. | 801.91M/s | 649781 | n.a.% Verifying (2^(2^89)) % 51250721481285749313140796010178879854681 = 13236591153213340344689881456839734478969 60/4620 | 21.65M | 0.026s | n.a. | 832.75M/s | 649781 | n.a.% Verifying (2^(2^89)) % 51250721086661413289943698879210555986631 = 29373416315097083858424261053021555658515 65/4620 | 21.65M | 0.026s | n.a. 
| 832.75M/s | 649781 | n.a.% Verifying (2^(2^89)) % 51250720960840901517095503879288267435217 = 28658988341202110234172669662839524833844 68/4620 | 21.65M | 0.026s | n.a. | 832.75M/s | 649781 | n.a.% ...[/CODE] |
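To compare the candidate sizes involved in the passing and failing runs: the bit sizes mmff prints in these logs can be reproduced from p and the top of the k range, since the largest candidate in a range is q = 2·k_max·M_p + 1. A small sketch (the helper name is hypothetical) with the (p, k_max) pairs copied from the logs above; it only restates the sizes and does not by itself locate the bug.

```python
def factor_bits(p: int, k_max: int) -> int:
    """Bit length of the largest candidate q = 2*k_max*(2^p - 1) + 1 in a k range."""
    return (2 * k_max * ((1 << p) - 1) + 1).bit_length()


# (label, p, k_max) taken from the "k_max = ..." lines in the logs above.
runs = [
    ("MM31 (works)", 31, 8589934595),
    ("MM89 (works)", 89, 41500000000000),
    ("MM107 (fails)", 107, 41500000000000),
    ("MM127 (fails)", 127, 140737488355327),
    ("MM127 (fails)", 127, 1125899906842623),
]

for label, p, k_max in runs:
    print(f"{label:14s} p={p:3d} k_max={k_max} -> {factor_bits(p, k_max)}-bit factors")
```

The printed sizes (65, 136, 154, 175 and 178 bits) match the "(N-bit factors)" labels in the corresponding logs, so the failures here involve the kernels handling 154-bit and larger candidates.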
GPUSieveSize limit
Various builds of mmff v0.28 have been posted. Do any of these support GPUSieveSize from 128 to 2047, like the recent increase in mfaktc? There seems to be an advantage all the way up to 128, with a bit of underutilization still left there on a GTX1650, and likely on other fast GPUs too.
Win 7 x64, GTX1650, mmff tune on MM127, 120000T to 120500T:
GPUSievePrimes 810549 GPUSieveSize 16 GpuSieveProcessSize 32 367.75 66W 95% utilization
GPUSievePrimes 810549 GPUSieveSize 32 GpuSieveProcessSize 32 380.41
GPUSievePrimes 810549 GPUSieveSize 64 GpuSieveProcessSize 32 387.10 99%
GPUSievePrimes 810549 GPUSieveSize 128 GpuSieveProcessSize 32 389.59 * 66W 99%
GPUSievePrimes 810549 GPUSieveSize 256 GpuSieveProcessSize 32 GPUSieveSize capped at 128 |
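To put the GTX1650 tuning figures above in relative terms, here is a small sketch computing each step's gain over the previous GPUSieveSize and over the 16 Mbit baseline. The numbers are copied from that run; their units are whatever the tuning output reports, so only the ratios matter here.

```python
# Throughput figures from the GTX1650 MM127 tuning run above, keyed by GPUSieveSize (Mbits).
rates = {16: 367.75, 32: 380.41, 64: 387.10, 128: 389.59}


def gain_pct(new: float, old: float) -> float:
    """Relative throughput gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


sizes = sorted(rates)
base = rates[sizes[0]]
for prev, cur in zip(sizes, sizes[1:]):
    step = gain_pct(rates[cur], rates[prev])
    total = gain_pct(rates[cur], base)
    print(f"{prev:>4} -> {cur:<4} step {step:5.2f}%  vs 16: {total:5.2f}%")
```

The per-doubling gain shrinks from about 3.4% (16 to 32) to about 0.6% (64 to 128), for roughly 5.9% overall from 16 to 128.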
[QUOTE=kriesel;538323]Various builds of mmff v0.28 have been posted. Do any of these support GPUSieveSize from 128 to 2047, like the recent increase in mfaktc? There seems to be an advantage all the way up to 128, with a bit of underutilization still left there on a GTX1650, and likely on other fast GPUs too.
[/QUOTE] I once tried raising the upper limit to 2047; however, the speed gain didn't seem significant. I tested it on a Colab T4. |
1 Attachment(s)
[QUOTE=Fan Ming;538340]I once tried raising the upper limit to 2047; however, the speed gain didn't seem significant. I tested it on a Colab T4.[/QUOTE]
Thanks for your response. Please post any T4 throughput data versus GPUSieveSize that you have collected. After graphing the GTX1650 data I've collected, it appears to me that a higher limit would offer about 0.6% additional throughput on that GPU model, or 2 to 2.5 days per year, depending on whether the revised limit is 2047 or 4095. Based on mfaktc experience, the effect is likely larger on faster GPUs, and there are considerably faster ones than the GTX1650, such as the RTX 2080 and similar, or the Tesla T4. |
[QUOTE=kriesel;538364]Thanks for your response. Please post any T4 throughput data versus GPUSieveSize that you have collected.
After graphing the GTX1650 data I've collected, it appears to me that a higher limit would offer about 0.6% additional throughput on that GPU model, or 2 to 2.5 days per year, depending on whether the revised limit is 2047 or 4095. Based on mfaktc experience, the effect is likely larger on faster GPUs, and there are considerably faster ones than the GTX1650, such as the RTX 2080 and similar, or the Tesla T4.[/QUOTE] Sorry, I didn't keep the detailed data. I tested MM89, and the raw rate was about 1340 (I'm not sure of the exact figure) with GPUSieveSize 128, and still ~1340 with GPUSieveSize 2047. Since the change wasn't significant, it didn't impress me and I didn't keep the data. |
2047 GPUSieveSize limit Windows build requested
Please make and post a Windows 7 x64 through Windows 10 x64 CUDA 10.x compatible build allowing GPUSieveSize up to 2047. Switching to unsigned int for 4095 would be more work.
|
1 Attachment(s)
Here is the fixed mmff 0.28 (from this post: [url]https://www.mersenneforum.org/showpost.php?p=535756&postcount=360[/url]), compiled as a CUDA 10.1 build for Windows 64-bit using Microsoft Visual Studio 2012. This time all test cases should pass (though the exponentiation failure problem described in this post: [url]https://www.mersenneforum.org/showpost.php?p=535994&postcount=362[/url] remains unsolved for a specific card). The 2047 version will be posted later.
|
1 Attachment(s)
Here is the fixed mmff 0.28 for Windows 64-bit with the GPUSieveSize maximum enlarged to 2047. It seems some code in gpusieve.cu needs to negate GPUSieveSize and does signed 32-bit integer arithmetic, so I didn't go further to 4095. Only the 2047 version is here.
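Why 2047 and 4095 specifically: negating a value only makes sense in a signed type, and a signed 32-bit int tops out at 2^31 - 1. A minimal sketch of the arithmetic, assuming "M bits" for GPUSieveSize means 2^20 bits (an assumption, though it is what makes the 2047 and 4095 ceilings line up). I haven't traced which expression in gpusieve.cu actually overflows; the helper names are hypothetical.

```python
# Assumption: "M bits" for GPUSieveSize means 2**20 bits; the 2047 and 4095
# ceilings discussed here line up with that reading.
MBIT = 1 << 20
INT32_MAX = (1 << 31) - 1
UINT32_MAX = (1 << 32) - 1


def sieve_bits(size_mbits):
    """Total sieve size in bits for a GPUSieveSize given in M bits."""
    return size_mbits * MBIT


def fits_signed32(size_mbits):
    # If bits <= INT32_MAX, then -bits >= -INT32_MAX, so negating it is also safe.
    return sieve_bits(size_mbits) <= INT32_MAX


def fits_unsigned32(size_mbits):
    return sieve_bits(size_mbits) <= UINT32_MAX


def largest_size(fits):
    """Largest GPUSieveSize (in M bits) that the predicate still accepts."""
    size = 1
    while fits(size + 1):
        size += 1
    return size


if __name__ == "__main__":
    print("signed 32-bit limit:", largest_size(fits_signed32))
    print("unsigned 32-bit limit:", largest_size(fits_unsigned32))
```

2048 M bits would be exactly 2^31 bits, one past the signed range, while an unsigned int would allow up to 4095 but can't be negated.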
|
Going to 2047
Thanks for the builds, Fan Ming!
As before, Win7x64, GTX1650, etc. 128-2047 variation tune feb 28: [CODE]GPUSievePrimes 810549 GPUSieveSize 128 GpuSieveProcessSize 32 384.14 62W/75 99%
GPUSievePrimes 810549 GPUSieveSize 256 GpuSieveProcessSize 32 386.14 66w 100%
GPUSievePrimes 810549 GPUSieveSize 512 GpuSieveProcessSize 32 386.24 65w 100%
GPUSievePrimes 810549 GPUSieveSize 1024 GpuSieveProcessSize 32 386.65 63w 100%
GPUSievePrimes 810549 GPUSieveSize 2047 GpuSieveProcessSize 32 386.66 *

386.66/384.14= 1.00656 gain from 2047 over 128 GPUSieveSize[/CODE] I would expect somewhat more gain than that ratio on faster gpus. |
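To turn a throughput ratio like that into the "days per year" framing used earlier in the thread, multiply the fractional gain by 365, assuming the gpu runs around the clock all year. A small sketch (function names are hypothetical) using the 2047 vs 128 rates from the table:

```python
def fractional_gain(new_rate, old_rate):
    """Relative throughput gain, e.g. 0.01 for 1% faster."""
    return new_rate / old_rate - 1.0


def extra_days_per_year(gain, days_per_year=365.0):
    """Equivalent extra days of work per year of continuous running."""
    return gain * days_per_year


if __name__ == "__main__":
    # GPUSieveSize 2047 vs 128 on the GTX1650, from the tune above.
    g = fractional_gain(386.66, 384.14)
    print(f"ratio {1 + g:.5f}, gain {100 * g:.3f}%, about {extra_days_per_year(g):.2f} days/year")
```

This reproduces the 1.00656 ratio and lands at roughly 2.4 days per year, inside the 2 to 2.5 day estimate.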
Build request
For mmff v0.28, I see here:
CUDA ? OS? source only? [URL]https://mersenneforum.org/showpost.php?p=376423&postcount=317[/URL]
CUDA 6 Win x86 and x64 [URL]https://mersenneforum.org/mmff/[/URL]
CUDA 8.0 linux [URL]https://mersenneforum.org/showpost.php?p=497116&postcount=329[/URL]
CUDA 8.0 linux [URL]https://mersenneforum.org/showpost.php?p=497151&postcount=331[/URL]
CUDA 8.0 linux x64 [URL]https://mersenneforum.org/showpost.php?p=497231&postcount=333[/URL]
CUDA 10. win 64 [URL]https://mersenneforum.org/showpost.php?p=505723&postcount=335[/URL]
CUDA 10.1 linux [URL]https://mersenneforum.org/showpost.php?p=535756&postcount=360[/URL]
CUDA 10.1 Win [URL]https://mersenneforum.org/showpost.php?p=538430&postcount=370[/URL]
CUDA 10.1 GpuSieveSize 2047 max Win [URL]https://mersenneforum.org/showpost.php?p=538431&postcount=371[/URL]
Could we also get a CUDA 8.0 Win 64 build with GpuSieveSize 2047 max, posted here? That would suit GTX10xx. |
2 Attachment(s)
Attached are two builds of mmff v0.28.1 (Gary's source), compiled on Ubuntu 20.04 with CUDA 10.1 and sm_61 (good for Pascal cards, i.e. GTX10xx). The first build uses the default max sieve size, the other uses max sieve size 2047. These run the worktodo_check file with no issues; however, MM107 still doesn't work:
[CODE]dylan@dylan-G11CD:~/Desktop/mmff-0.28.1$ ./mmff.exe -v 3
mmff v0.28.1 (64bit built)
Compiletime options
THREADS_PER_BLOCK 256
MORE_CLASSES enabled
Runtime options
GPU Sieving enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes depends on worktodo entry
GPUSieveSize 128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize 8K bits
WorkFile worktodo.txt
Checkpoints enabled
CheckpointDelay 300s
StopAfterFactor disabled
PrintMode full
V5UserID (none)
ComputerID (none)
GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait"
WARNING, no ProgressFormat specified in mmff.ini, using default
ProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s"
TimeStampInResults no
CUDA version info
binary compiled for CUDA 10.10
CUDA runtime version 10.10
CUDA driver version 10.20
CUDA device info
name GeForce GTX 1060 6GB
compute capability 6.1
maximum threads per block 1024
number of mutliprocessors 10 (unknown number of shader cores)
clock rate 1708MHz
got assignment: MM107, k range 41400000000000 to 41500000000000 (154-bit factors)
Starting trial factoring of MM107 in k range: 41400G to 41500G (154-bit factors)
k_min = 41400000000000
k_max = 41500000000000
Using GPU kernel "mfaktc_barrett160_M107gs"
Verifying (2^(2^107)) % 13435069353863506604210333952641545581205240561 = 549163915026848401193023077053146353871994535742
ERROR: Exponentiation failure[/CODE]It even persists with a leading edge range, which uses a different kernel than what Fan Ming used in [URL="https://mersenneforum.org/showpost.php?p=535997&postcount=364"]post 364[/URL]: [CODE]dylan@dylan-G11CD:~/Desktop/mmff-0.28.1$ ./mmff.exe -v 3
mmff v0.28.1 (64bit built)
Compiletime options
THREADS_PER_BLOCK 256
MORE_CLASSES enabled
Runtime options
GPU Sieving enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes depends on worktodo entry
GPUSieveSize 128M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
GPUSieveProcessSize 8K bits
WorkFile worktodo.txt
Checkpoints enabled
CheckpointDelay 300s
StopAfterFactor disabled
PrintMode full
V5UserID (none)
ComputerID (none)
GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait"
WARNING, no ProgressFormat specified in mmff.ini, using default
ProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s"
TimeStampInResults no
CUDA version info
binary compiled for CUDA 10.10
CUDA runtime version 10.10
CUDA driver version 10.20
CUDA device info
name GeForce GTX 1060 6GB
compute capability 6.1
maximum threads per block 1024
number of mutliprocessors 10 (unknown number of shader cores)
clock rate 1708MHz
got assignment: MM107, k range 10000000000000000 to 12000000000000000 (162-bit factors)
Starting trial factoring of MM107 in k range: 10P to 12P (162-bit factors)
k_min = 10000000000000000
k_max = 12000000000000000
Using GPU kernel "mfaktc_barrett172_M107gs"
Verifying (2^(2^107)) % 3245185537408870535270390810652173364064364295271 = 249933689397060655837985681873552465902105993031524
ERROR: Exponentiation failure[/CODE] |
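I don't know exactly how mmff decides on "Exponentiation failure", but each Verifying line prints a modulus q and a claimed value of 2^(2^p) mod q, and that can be checked independently on the CPU. Note that in both CUDA 10.1 logs above, the printed value has more digits than the modulus, so it can't be a fully reduced remainder at all. A minimal sketch using Python's big integers (function names are hypothetical): it squares 2 p times mod q, and also shows the matching divisibility test for MM_p = 2^(2^p - 1) - 1.

```python
def pow2_pow2_mod(p, q):
    """Compute 2^(2^p) mod q with p modular squarings, starting from 2."""
    x = 2 % q
    for _ in range(p):
        x = x * x % q  # after i squarings, x == 2^(2^i) mod q
    return x


def divides_mm(p, q):
    """True if odd q divides MM_p = 2^(2^p - 1) - 1.

    For odd q, 2^(2^p - 1) == 1 (mod q) is equivalent to 2^(2^p) == 2 (mod q).
    """
    return pow2_pow2_mod(p, q) == 2 % q


def plausible_residue(r, q):
    """A remainder mod q must satisfy 0 <= r < q."""
    return 0 <= r < q


if __name__ == "__main__":
    # Modulus and residue copied from the first failing CUDA 10.1 log above.
    q = 13435069353863506604210333952641545581205240561
    printed = 549163915026848401193023077053146353871994535742
    print("printed residue in range:", plausible_residue(printed, q))
    print("CPU value of 2^(2^107) mod q:", pow2_pow2_mod(107, q))
```

Python's built-in `pow(2, 1 << p, q)` gives the same result; the explicit loop just mirrors the "square p times" structure. As a sanity check, `divides_mm(13, 338193759479)` is true for the known factor of MM13.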
3 Attachment(s)
Attached is a CUDA 11.2 binary of mmff compiled on an Arch Linux system, using the cleaned up source posted by Fan Ming [url=https://mersenneforum.org/showpost.php?p=535756&postcount=360]here.[/url] These should work on any Linux system with the latest Nvidia drivers.
This one does seem to work with MM107 and MM127, using the examples posted by Fan Ming in posts 363 and 364 with the flag -v 3. MM127: [code]mmff v0.28 (64bit built)
Compiletime options
THREADS_PER_BLOCK 256
MORE_CLASSES enabled
Runtime options
GPU Sieving enabled
GPUSievePrimes 500000
GPUSieveSize 32M bits
GPUSieveProcessSize 16K bits
WorkFile worktodo.txt
Checkpoints enabled
CheckpointDelay 120s
StopAfterFactor disabled
PrintMode full
V5UserID (none)
ComputerID (none)
WARNING, no GPUProgressHeader specified in mmff.ini, using default
GPUProgressHeader " class | raw cand. | time | ETA | raw rate | SievePrimes"
WARNING, no GPUProgressFormat specified in mmff.ini, using default
GPUProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s"
TimeStampInResults no
CUDA version info
binary compiled for CUDA 11.20
CUDA runtime version 11.20
CUDA driver version 11.20
CUDA device info
name GeForce GTX 1660 Ti
compute capability 7.5
maximum threads per block 1024
number of mutliprocessors 24 (unknown number of shader cores)
clock rate 1590MHz
got assignment: MM127, k range 562949953421312 to 1125899906842623 (178-bit factors)
Starting trial factoring of MM127 in k range: 562949953421312 to 1125899906842623 (178-bit factors)
k_min = 562949953421312
k_max = 1125899906842623
Using GPU kernel "mfaktc_barrett183_M127gs"
Verifying (2^(2^127)) % 191561944857917697129840166812120120096271125295021529 = 158757927754760480688654173499199469295287057656270356
Verifying (2^(2^127)) % 191614694258348779445950559282708892489982390750176689 = 33662559093375555778002927546058307399215184129861713
Verifying (2^(2^127)) % 191667446858012590842648103696906711574948849832403649 = 58322051460264670631592692291826830098619657940589851
Verifying (2^(2^127)) % 191720197063361195168223175182300315372061913389308569 = 161044083194471348086645110435896183576905312890737395
Verifying (2^(2^127)) % 191772947100494614230101526639209315731354679296043129 = 62574822488929725322867766087082605619226720248185889
Verifying (2^(2^127)) % 191825698154779667550033876773029890149243076788387249 = 178143699218529778276359107673564439346000111582072990
Verifying (2^(2^127)) % 191878449278237320417654597759685254765861316305100489 = 167970576150375862734607277121394055064519674969245756
Verifying (2^(2^127)) % 191931200154874561262841813657816481627920801325769369 = 68989798066545733249246754983396020869954214592518662
Verifying (2^(2^127)) % 191983951926039282622453643539197609014463925252484369 = 97958328059656999804568015951825574551513310180672117
Verifying (2^(2^127)) % 192036702536990857023110525934395209885947426134099129 = 57502989442863596951607305398169962920918382075438075
received signal "SIGINT"
mmff will exit once the current class is finished.
press ^C again to exit immediately
mmff will exit NOW!
[/code] MM107: [code]mmff v0.28 (64bit built)
Compiletime options
THREADS_PER_BLOCK 256
MORE_CLASSES enabled
Runtime options
GPU Sieving enabled
GPUSievePrimes 500000
GPUSieveSize 32M bits
GPUSieveProcessSize 16K bits
WorkFile worktodo.txt
Checkpoints enabled
CheckpointDelay 120s
StopAfterFactor disabled
PrintMode full
V5UserID (none)
ComputerID (none)
WARNING, no GPUProgressHeader specified in mmff.ini, using default
GPUProgressHeader " class | raw cand. | time | ETA | raw rate | SievePrimes"
WARNING, no GPUProgressFormat specified in mmff.ini, using default
GPUProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s"
TimeStampInResults no
CUDA version info
binary compiled for CUDA 11.20
CUDA runtime version 11.20
CUDA driver version 11.20
CUDA device info
name GeForce GTX 1660 Ti
compute capability 7.5
maximum threads per block 1024
number of mutliprocessors 24 (unknown number of shader cores)
clock rate 1590MHz
WARNING: ignoring line 1 in "worktodo.txt"! Reason: doesn't begin with Factor=
got assignment: MM107, k range 41400000000000 to 41500000000000 (154-bit factors)
Starting trial factoring of MM107 in k range: 41400G to 41500G (154-bit factors)
k_min = 41400000000000
k_max = 41500000000000
Using GPU kernel "mfaktc_barrett160_M107gs"
Verifying (2^(2^107)) % 13435068670193779240929580104031093912799413681 = 11943755078920637255837466212346786801214623286
class | raw cand. | time | ETA | raw rate | SievePrimes
0/4620 | 21.66M | 0.031s | n.a. | 698.70M/s | 500277
Verifying (2^(2^107)) % 13435068674693228987403666670879552138089175391 = 10351997845221972775324276802600874943890505684
5/4620 | 21.66M | 0.031s | n.a. | 698.70M/s | 500277
...
[/code] I have also attached the full logs from both runs. I'm not sure why the executables built with CUDA 10.1 fail. |
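As far as I can tell from the MM107 progress lines above, the "raw rate" column is consistent with raw candidates divided by the class time. That is my inference from the numbers, not from mmff's source. A quick check (hypothetical helper name):

```python
def raw_rate_m_per_s(raw_candidates_m, class_seconds):
    """Raw rate in M/s, assuming it is raw candidates (in millions) per second."""
    return raw_candidates_m / class_seconds


if __name__ == "__main__":
    # Class 0/4620 from the CUDA 11.2 MM107 log: 21.66M candidates in 0.031s.
    print(f"{raw_rate_m_per_s(21.66, 0.031):.2f}M/s")
```

The result agrees with the logged 698.70M/s up to the rounding of the displayed candidate count and time.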
[QUOTE=Dylan14;574855]...got assignment: MM127, k range 562949953421312 to 1125899906842623 (178-bit factors)
Starting trial factoring of MM127 in k range: 562949953421312 to 1125899906842623 (178-bit factors)
k_min = 562949953421312
k_max = 1125899906842623
Using GPU kernel "mfaktc_barrett[B]183[/B]_M127gs" ...[/code][/QUOTE]Cool. Did the posted Arch build include the expanded 2047M GpuSieveSize? You might want to aim higher in your test runs, choosing k ranges (and therefore kernels) likely to be run in the future, since MM127 TF to 185 bits was completed months ago.
[QUOTE=kriesel;556075][Fri Sep 04 18:24:46 2020] UID: kriesel/emu/gtx1650, no factor for MM127 in k range: 140000000000000000 to 144115188075855871 (185-bit factors) [mmff 0.28 mfaktc_barrett[B]185[/B]_M127gs] 145P ETA <7 days[/QUOTE]
[QUOTE=kriesel;556721][Thu Sep 10 22:15:10 2020] UID: kriesel/emu/gtx1650, no factor for MM127 in k range: 144115188075855872 to 145000000000000000 (186-bit factors) [mmff 0.28 mfaktc_barrett[B]188[/B]_M127gs][/QUOTE]
My info header was [CODE]mmff v0.28 (64bit built)
Compiletime options
THREADS_PER_BLOCK 256
MORE_CLASSES enabled
Runtime options
GPU Sieving enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
GPUSievePrimes depends on worktodo entry
GPUSieveSize [B]2047M bits[/B]
GPUSieveProcessSize 16K bits
WorkFile worktodo.txt
Checkpoints enabled
CheckpointDelay 300s
StopAfterFactor disabled
PrintMode full
V5UserID kriesel
ComputerID emu/gtx1650
TimeStampInResults yes
CUDA version info
binary compiled for CUDA 10.10
CUDA runtime version 10.10
CUDA driver version 10.20
CUDA device info
name GeForce GTX 1650
compute capability 7.5
maximum threads per block 1024
number of mu[B][COLOR=Red]tl[/COLOR][/B]iprocessors 14 (unknown number of shader cores)
clock rate 1710MHz[/CODE]Edut: maibe ficks tha mi[COLOR=Red]ps[/COLOR]elling tu. |
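Factor candidates of MM_p have the form q = 2kM_p + 1, with M_p = 2^p - 1. The "(N-bit factors)" labels in the logs and results quoted above all appear to match the bit length of q at k_max, so a sketch like this (helper names are hypothetical) can show which bit level a planned test range would reach before picking it. How mmff then chooses a kernel for that bit level isn't modeled here.

```python
def mersenne(p):
    """M_p = 2^p - 1."""
    return (1 << p) - 1


def factor_bits(p, k):
    """Bit length of the factor candidate q = 2*k*M_p + 1 of MM_p."""
    return (2 * k * mersenne(p) + 1).bit_length()


if __name__ == "__main__":
    # k_max values taken from the logs and results quoted in this thread.
    for p, k_max in [(107, 41500000000000), (107, 12000000000000000),
                     (127, 1125899906842623), (127, 144115188075855871),
                     (127, 145000000000000000)]:
        print(f"MM{p}, k_max {k_max}: {factor_bits(p, k_max)}-bit factors")
```

For example, k just past 2^57 is where MM127 candidates cross from 185 to 186 bits, matching the two quoted result lines.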
[QUOTE=kriesel;574860]Cool. Did the posted Arch build include the expanded 2047M GpuSieveSize?[/QUOTE]
No, it is limited to 128M bits. I could easily fix that and post an updated build. |
[QUOTE=Dylan14;574863]No, it is limited to 128M bits. I could easily fix that and post an updated build.[/QUOTE]
Would you mind sharing the updated source code with the FermatSearch community (or at least with me)? :smile: I have my code happily running with Ubuntu and the 11.1 drivers, but no PrimeGaps speedup ... |
Please update [URL]http://www.doublemersennes.org/download.php[/URL] with the newer binaries posted recently in this thread.
If I read it correctly, this thread has CUDA 10.1 and 11.2 builds, while doublemersennes has only up to CUDA 8 and no enlarged GPUSieveSize. |
[QUOTE=kriesel;578097]Please update [URL]http://www.doublemersennes.org/download.php[/URL] with the newer binaries posted recently in this thread.
If I read it correctly, this thread has CUDA 10.1 and 11.2 builds, while doublemersennes has only up to CUDA 8 and no enlarged GPUSieveSize.[/QUOTE] I will. There are very few participants in this subproject, and no one complained (hard) until now... :smile: |