Hi,
many thanks to Fan Ming (and nomead) for the new Win x64 CUDA 10.1 executable of mmff v0.28. |
Maximum limits for mmff-0.28 for Fermat factoring. Tested on the Windows CUDA 10.1 version built by Fan Ming:
[url]https://www.mersenneforum.org/showpost.php?p=527991&postcount=39[/url]

The ultimate limit is k < 2[SUP]64[/SUP], but for some exponents the limit is lower than that.
[CODE]
28 <= n <= 223

n=28-119:   k*2[SUP]n[/SUP]+1 < 2[SUP]n+64[/SUP] (92-183)    k<2[SUP]64[/SUP]
n=120-127:  k*2[SUP]n[/SUP]+1 < 2[SUP]183[/SUP]              k<2[SUP]63[/SUP] to k<2[SUP]56[/SUP]

n=128-151:  k*2[SUP]n[/SUP]+1 < 2[SUP]n+64[/SUP] (192-215)   k<2[SUP]64[/SUP]
n=152-159:  k*2[SUP]n[/SUP]+1 < 2[SUP]215[/SUP]              k<2[SUP]63[/SUP] to k<2[SUP]56[/SUP]

n=160-183:  k*2[SUP]n[/SUP]+1 < 2[SUP]n+64[/SUP] (224-247)   k<2[SUP]64[/SUP]
n=184-191:  k*2[SUP]n[/SUP]+1 < 2[SUP]247[/SUP]              k<2[SUP]63[/SUP] to k<2[SUP]56[/SUP]

n=192-223:  k*2[SUP]n[/SUP]+1 < 2[SUP]252[/SUP]              k<2[SUP]60[/SUP] to k<2[SUP]29[/SUP]
[/CODE] |
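The table above can be read as one rule: k is limited to 64 bits, and the factor k*2^n+1 is additionally limited by a per-range size cap. The following is only a sketch of that reading; the name `max_k_bits` and the cap list are illustrative and are not taken from the mmff source.

```python
# Factor-size caps (in bits) per exponent range, transcribed from the table above.
# Each tuple is (lowest n, highest n, cap on the bit length of k*2^n+1).
# Assumption: the four caps 183/215/247/252 apply to the whole range shown.
FACTOR_BIT_CAPS = [
    (28, 127, 183),
    (128, 159, 215),
    (160, 191, 247),
    (192, 223, 252),
]


def max_k_bits(n):
    """Return b such that the table allows k < 2**b for exponent n."""
    for n_low, n_high, cap in FACTOR_BIT_CAPS:
        if n_low <= n <= n_high:
            # k is limited by the global 64-bit limit and by the factor-size cap.
            return min(64, cap - n)
    raise ValueError(f"n={n} is outside the supported range 28..223")


if __name__ == "__main__":
    for n in (28, 119, 120, 127, 128, 151, 152, 159, 160, 183, 184, 191, 192, 223):
        print(f"n={n:3d}: k < 2^{max_k_bits(n)}")
```

Running it prints the k limit at each boundary of the table, which makes it easy to check a planned worktodo entry against the supported range before starting mmff.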
I found it curious that Andreas got errors when trying to verify the factors of F205 and F215, since Serge said he verified these factors when he released version 0.28 ([URL]https://www.mersenneforum.org/showpost.php?p=376423&postcount=317[/URL]). So I did some testing.
I confirmed that Andreas's FermatFactor=207,224,225 range dies with ERROR: Exponentiation failure. The next smaller bit range is not supported, while higher bit ranges run to completion. I then tried testing individual values of K in the 225-bit factor range, and found that 207,232905,232905 correctly finds the factor of F205. However, about 30% of individual K values die with ERROR: Exponentiation failure.
[CODE]
// Trying bit ranges
//FermatFactor=207,223,224 // WARNING: bit range isn't supported!
//FermatFactor=207,224,225 // ERROR: Exponentiation failure: k range: 131072 to 262143 (225-bit factors)
//FermatFactor=207,225,226 // Runs: k range: 262144 to 524287 (226-bit factors)
//FermatFactor=207,226,227 // Runs: k range: 524288 to 1048575 (227-bit factors)
//FermatFactor=207,227,228 // Runs: k range: 1048576 to 2097151 (228-bit factors)

// Trying individual values of K in the 225 bit factor range
//FermatFactor=207,232885,232885 // Runs
//FermatFactor=207,232887,232887 // ERROR: Exponentiation failure
//FermatFactor=207,232889,232889 // Runs
//FermatFactor=207,232891,232891 // Runs
//FermatFactor=207,232893,232893 // Runs
//FermatFactor=207,232895,232895 // Runs
//FermatFactor=207,232897,232897 // Runs
//FermatFactor=207,232899,232899 // Runs
//FermatFactor=207,232901,232901 // ERROR: Exponentiation failure
//FermatFactor=207,232903,232903 // Runs
//FermatFactor=207,232905,232905 // Runs, finds F205 factor
//FermatFactor=207,232907,232907 // ERROR: Exponentiation failure
//FermatFactor=207,232909,232909 // Runs
//FermatFactor=207,232911,232911 // Runs
//FermatFactor=207,232913,232913 // Runs
//FermatFactor=207,232915,232915 // Runs
//FermatFactor=207,232917,232917 // Runs
//FermatFactor=207,232919,232919 // ERROR: Exponentiation failure
//FermatFactor=207,232921,232921 // Runs
//FermatFactor=207,232923,232923 // Runs
//FermatFactor=207,232925,232925 // ERROR: Exponentiation failure
[/CODE]
For the 226-bit factor range, while the full 207,262144,524287 range runs without error, about 30% of individual K values continue to die with ERROR: Exponentiation failure. I also found that any range of K that contains a failing K also fails, until the range contains more than about 160000 K values, at which point mmff runs to completion without error.
[CODE]
// Trying individual values of K in the 226 bit factor range
//FermatFactor=207,419987,419987 // ERROR: Exponentiation failure
//FermatFactor=207,419989,419989 // Runs
//FermatFactor=207,419991,419991 // ERROR: Exponentiation failure
//FermatFactor=207,419993,419993 // Runs
//FermatFactor=207,419995,419995 // Runs
//FermatFactor=207,419997,419997 // ERROR: Exponentiation failure
//FermatFactor=207,419999,419999 // Runs
//FermatFactor=207,420001,420001 // Runs
//FermatFactor=207,420003,420003 // Runs
//FermatFactor=207,420005,420005 // Runs
//FermatFactor=207,420007,420007 // Runs
//FermatFactor=207,420009,420009 // ERROR: Exponentiation failure

// Trying ranges of K in the 226 bit factor range
//FermatFactor=207,419999,420007 // Runs
//FermatFactor=207,419997,420007 // ERROR: Exponentiation failure
//FermatFactor=207,410000,420000 // ERROR: Exponentiation failure
//FermatFactor=207,300000,420000 // ERROR: Exponentiation failure
//FermatFactor=207,300000,450000 // ERROR: Exponentiation failure
//FermatFactor=207,300000,460000 // Runs
//FermatFactor=207,262144,524287 // Runs
[/CODE]
I see the same thing happening in recent "production" search ranges. In Andreas's recent range, individual K or small K ranges die with either ERROR: Exponentiation failure or ERROR: Class problems Factor divisible by ..., and the error varies randomly when the same test is repeated.
[CODE]
// Trying Andreas's full range *****
//FermatFactor=205,130000000000000,140737488355327 // Runs

// Trying individual values of K in the 252 bit factor range
//FermatFactor=205,130000000000001,130000000000001 // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000003,130000000000003 // Runs
//FermatFactor=205,130000000000005,130000000000005 // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000007,130000000000007 // Runs
//FermatFactor=205,130000000000009,130000000000009 // Runs
//FermatFactor=205,130000000000011,130000000000011 // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000013,130000000000013 // Runs
//FermatFactor=205,130000000000015,130000000000015 // Runs
//FermatFactor=205,130000000000017,130000000000017 // Runs
//FermatFactor=205,130000000000019,130000000000019 // Exp failure OR Factor divisible (random)

// Trying ranges of K in the 252 bit factor range
//FermatFactor=205,130000000000000,130000000000100 // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000000,130000000001000 // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000000,130000000010000 // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000000,130000000180000 // Exp failure OR Factor divisible (random)
//FermatFactor=205,130000000000000,130000000190000 // Runs
[/CODE]
The same happens in Peter's recent range with 171-bit factors.
[CODE]
// Trying ranges of K in the 171 bit factor range ***
//FermatFactor=120,1527888802614000,1527888802615000 // ERROR: Exponentiation failure
//FermatFactor=120,1527888802600000,1527888802700000 // ERROR: Exponentiation failure
//FermatFactor=120,1527888802500000,1527888802700000 // Runs, finds F118 factor
[/CODE]
Of course this might be a problem with my system. I am running Ubuntu 18.04 LTS with CUDA 10.1 on an RTX 2080. Could someone else verify some of the results above (just uncomment individual lines)?
If it persists, hopefully this is an mmff problem that only affects small ranges of K. But looking at the source, it appears that only a tiny fraction of K values are checked for accuracy by calling validate_exponentiation(), for obvious performance reasons. So is it possible, if highly unlikely, that undetected errors are occurring for larger ranges of K? George or Serge, would one of you have time to investigate this?

For hardware validation, by using single K values and adding some recent factors, here is an expanded version of Andreas's worktodo file that should verify 41 known Fermat factors.
[CODE]
// Check the known Fermat factors within the ranges of mmff
// Ranges supported: 28 <= exp <= 223; 64 bit <= factor size <= 252 bit; K min/max vary with exp
// K min/max < 1000 are interpreted as factor bit size min/max, >= 1000 as K min/max
FermatFactor=36,2e10,3e10 // F28: 25709319373 * 2^36 + 1
FermatFactor=33,546e10,547e10 // F31: 5463561471303 * 2^33 + 1
FermatFactor=39,69,70 // F37: 1275438465 * 2^39 + 1
FermatFactor=41,286492e10,286493e10 // F39: 2864929972774011 * 2^41 + 1
FermatFactor=45,11131e10,11132e10 // F42: 111318179143061 * 2^45 + 1
FermatFactor=45,21e10,22e10 // F43: 212675402445 * 2^45 + 1
FermatFactor=50,213e10,214e10 // F48: 2139543641769 * 2^50 + 1
FermatFactor=54,4119,4119 // F52: 4119 * 2^54 + 1
FermatFactor=54,78,79 // F52: 21626655 * 2^54 + 1
FermatFactor=54,8190e10,8191e10 // F52: 81909357657279 * 2^54 + 1
//FermatFactor=61,67,68 // F58: 95 * 2^61 + 1 ***No way to specify
FermatFactor=68,121089e10,121090e10 // F65: 1210895760431083 * 2^68 + 1
FermatFactor=74,100,101 // F72: 76432329 * 2^74 + 1
FermatFactor=77,98,99 // F75: 3447431 * 2^77 + 1
FermatFactor=79,5e9,6e9 // F77: 5940341195 * 2^79 + 1
FermatFactor=87,1595e9,1596e9 // F83: 1595863660157 * 2^87 + 1
FermatFactor=88,20018e9,20019e9 // F86: 20018578522347 * 2^88 + 1
FermatFactor=90,119e9,120e9 // F88: 119942751127 * 2^90 + 1
FermatFactor=92,198e9,199e9 // F90: 198922467387 * 2^92 + 1
FermatFactor=93,1421,1421 // F91: 1421 * 2^93 + 1
FermatFactor=97,482e9,483e9 // F94: 482524552001 * 2^97 + 1
FermatFactor=101,3334e9,3335e9 // F96: 3334131633063 * 2^101 + 1
FermatFactor=111,141,142 // F107: 1289179925 * 2^111 + 1
FermatFactor=120,3e9,4e9 // F116: 3433149787 * 2^120 + 1
FermatFactor=120,1527888802500000,1527888802700000 // F118: 1527888802614951 * 2^120 + 1
FermatFactor=124,146,147 // F122: 5234775 * 2^124 + 1
//FermatFactor=127,129,130 // F125: 5 * 2^127 + 1 ***No way to specify
FermatFactor=135,1075441212600000,1075441212800000 // F132: 1075441212722595 * 2^135 + 1
FermatFactor=135,88e9,89e9 // F133: 88075576149 * 2^135 + 1
FermatFactor=145,167,168 // F142: 8152599 * 2^145 + 1
FermatFactor=148,173,174 // F146: 37092477 * 2^148 + 1
FermatFactor=149,3125,3125 // F147: 3125 * 2^149 + 1
FermatFactor=149,175,176 // F147: 124567335 * 2^149 + 1
FermatFactor=157,1575,1575 // F150: 1575 * 2^157 + 1
FermatFactor=154,5439,5439 // F150: 5439 * 2^154 + 1
FermatFactor=167,197,198 // F164: 1835601567 * 2^167 + 1
FermatFactor=171,2674e9,2675e9 // F166: 2674670937447 * 2^171 + 1
FermatFactor=174,20e9,21e9 // F172: 20569603303 * 2^174 + 1
FermatFactor=180,3e8,4e8 // F178: 313047661 * 2^180 + 1
FermatFactor=187,213,214 // F184: 117012935 * 2^187 + 1
FermatFactor=197,48594e9,48596e9 // F195: 48595346636925 * 2^197 + 1
FermatFactor=207,232905,232905 // F205: 232905 * 2^207 + 1
FermatFactor=217,32111,32111 // F215: 32111 * 2^217 + 1
[/CODE] |
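Each comment in the worktodo file above has the form "Fm: k * 2^n + 1". Independently of mmff, such a claim can be checked on the CPU: p = k*2^n+1 divides F_m = 2^(2^m)+1 exactly when 2^(2^m) ≡ -1 (mod p). A minimal sketch using Python's built-in modular `pow`; the function name `divides_fermat` is just illustrative:

```python
def divides_fermat(m, k, n):
    """Return True if p = k*2^n + 1 divides the Fermat number F_m = 2^(2^m) + 1."""
    p = k * (1 << n) + 1
    # 2^(2^m) mod p must equal p - 1, i.e. -1 mod p.
    return pow(2, 1 << m, p) == p - 1


if __name__ == "__main__":
    # (m, k, n) triples: two classical factors, then three entries copied
    # from the worktodo comments above.
    checks = [
        (5, 5, 7),              # 641 divides F5
        (6, 1071, 8),           # 274177 divides F6
        (28, 25709319373, 36),
        (205, 232905, 207),
        (215, 32111, 217),
    ]
    for m, k, n in checks:
        print(f"F{m}: {k} * 2^{n} + 1 ->", divides_fermat(m, k, n))
```

This only confirms that a listed number is a factor. It says nothing about whether a given GPU or mmff build finds it, which is what the worktodo file is for.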
My guess is that when the k-range is very small then sieving might remove all candidates and there is no candidate left to do the exponentiation.
In your single k tests the ones that work are probably the ones without any small factors. I might check later but I do not have time right now. |
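One rough way to explore this guess without reading the mmff source is to check, for a single k, whether k*2^n+1 has a small prime divisor. The sketch below uses plain trial division with an arbitrary bound of 1000. That is an assumption for illustration only, since the actual mmff sieve works with classes and a configurable number of sieve primes.

```python
def small_prime_divisor(k, n, limit=1000):
    """Return the smallest odd divisor q <= limit of p = k*2^n + 1 (with q < p), else None.

    p is odd for n >= 1, so the first divisor found in ascending order is prime.
    """
    p = k * (1 << n) + 1
    for q in range(3, limit + 1, 2):
        if q >= p:
            break
        if p % q == 0:
            return q
    return None


if __name__ == "__main__":
    # A few single-k values from the 225-bit tests reported earlier in the thread.
    for k in (232885, 232887, 232901, 232905, 232907):
        print(k, small_prime_divisor(k, 207))
```

If the guess is right, the k values that error out should be the ones for which this returns a prime, and the ones that run should return None.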
1 Attachment(s)
[QUOTE=ATH;531841]My guess is that when the k-range is very small then sieving might remove all candidates and there is no candidate left to do the exponentiation.
In your single k tests the ones that work are probably the ones without any small factors. I might check later but I do not have time right now.[/QUOTE]
Looks like you are right! When running small ranges, each time before an error occurs the number of factors surviving the sieve is zero (total_bit_count = 0 in the tf_*.h kernel). This causes the kernel to skip the calculations entirely, but it still copies the factor and final remainder for one value of K to the results array (RES) for validation. Since the factor and final remainder are function-local variables that are never written, they contain garbage values. This explains why running the same test repeatedly produces various Factor divisible and Exponentiation failure errors.

So the mystery is solved, and none of this raises any doubts about mmff correctness for large ranges of K (which I hoped and expected all along).

I modified the kernels to set a flag in the results validation array (datalen = 0) when zero factors survive the sieve. Then in tf_validate.h the validation checks are skipped if datalen is zero. Hopefully this will eliminate the following errors for correctly working hardware:

ERROR: Class problems. Factor divisible by 2, 3, 5, 7, or 11
ERROR: GPU sieve problems. Factor divisible by <int>
ERROR: Exponentiation failure

With these changes, all 43 known factors within the range of mmff can be verified using the following worktodo.txt file:
[CODE]
// Check the known Fermat factors within the ranges of mmff
// Ranges supported: 28 <= exp <= 223; 64 bit <= factor size <= 252 bit; K min/max vary with exp
// K min/max < 1000 are interpreted as factor bit size min/max, >= 1000 as K min/max
FermatFactor=36,2e10,3e10 // F28: 25709319373 * 2^36 + 1
FermatFactor=33,546e10,547e10 // F31: 5463561471303 * 2^33 + 1
FermatFactor=39,69,70 // F37: 1275438465 * 2^39 + 1
FermatFactor=41,286492e10,286493e10 // F39: 2864929972774011 * 2^41 + 1
FermatFactor=45,11131e10,11132e10 // F42: 111318179143061 * 2^45 + 1
FermatFactor=45,21e10,22e10 // F43: 212675402445 * 2^45 + 1
FermatFactor=50,213e10,214e10 // F48: 2139543641769 * 2^50 + 1
FermatFactor=54,66,67 // F52: 4119 * 2^54 + 1
FermatFactor=54,78,79 // F52: 21626655 * 2^54 + 1
FermatFactor=54,8190e10,8191e10 // F52: 81909357657279 * 2^54 + 1
FermatFactor=61,67,68 // F58: 95 * 2^61 + 1
FermatFactor=68,121089e10,121090e10 // F65: 1210895760431083 * 2^68 + 1
FermatFactor=74,100,101 // F72: 76432329 * 2^74 + 1
FermatFactor=77,98,99 // F75: 3447431 * 2^77 + 1
FermatFactor=79,5e9,6e9 // F77: 5940341195 * 2^79 + 1
FermatFactor=87,1595e9,1596e9 // F83: 1595863660157 * 2^87 + 1
FermatFactor=88,20018e9,20019e9 // F86: 20018578522347 * 2^88 + 1
FermatFactor=90,119e9,120e9 // F88: 119942751127 * 2^90 + 1
FermatFactor=92,198e9,199e9 // F90: 198922467387 * 2^92 + 1
FermatFactor=93,103,104 // F91: 1421 * 2^93 + 1
FermatFactor=97,482e9,483e9 // F94: 482524552001 * 2^97 + 1
FermatFactor=101,3334e9,3335e9 // F96: 3334131633063 * 2^101 + 1
FermatFactor=111,141,142 // F107: 1289179925 * 2^111 + 1
FermatFactor=120,3e9,4e9 // F116: 3433149787 * 2^120 + 1
FermatFactor=120,1527888e9,1527889e9 // F118: 1527888802614951 * 2^120 + 1
FermatFactor=124,146,147 // F122: 5234775 * 2^124 + 1
FermatFactor=127,129,130 // F125: 5 * 2^127 + 1
FermatFactor=135,1075441e9,1075442e9 // F132: 1075441212722595 * 2^135 + 1
FermatFactor=135,88e9,89e9 // F133: 88075576149 * 2^135 + 1
FermatFactor=145,167,168 // F142: 8152599 * 2^145 + 1
FermatFactor=148,173,174 // F146: 37092477 * 2^148 + 1
FermatFactor=149,160,161 // F147: 3125 * 2^149 + 1
FermatFactor=149,175,176 // F147: 124567335 * 2^149 + 1
FermatFactor=157,167,168 // F150: 1575 * 2^157 + 1
FermatFactor=154,166,167 // F150: 5439 * 2^154 + 1
FermatFactor=167,197,198 // F164: 1835601567 * 2^167 + 1
FermatFactor=171,2674e9,2675e9 // F166: 2674670937447 * 2^171 + 1
FermatFactor=174,20e9,21e9 // F172: 20569603303 * 2^174 + 1
FermatFactor=180,3e8,4e8 // F178: 313047661 * 2^180 + 1
FermatFactor=187,213,214 // F184: 117012935 * 2^187 + 1
FermatFactor=197,48594e9,48596e9 // F195: 48595346636925 * 2^197 + 1
FermatFactor=207,224,225 // F205: 232905 * 2^207 + 1
FermatFactor=217,231,232 // F215: 32111 * 2^217 + 1
[/CODE]
Here is the source with these changes and a CUDA 10.1 Linux binary that will hopefully run on Kepler or later (--gpu-architecture=compute_30). I included Serge's patch to print factors found in K*2^N+1 form. If you want factors in the old format, use output.c from the 0.28 release. I also fixed a few other miscellaneous things, and changed the version to 0.28.1 to identify this binary.

I am not sure who the current owner of mmff is, but if I changed anything in a "bad" way please feel free to fix it and re-post. |
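The idea of the fix described in the post above can be modelled in a few lines. This is a toy CPU model only: the name `datalen` comes from the post, while `run_kernel`, `validate`, and the dictionary layout are hypothetical stand-ins and do not reflect the real RES array or the code in tf_validate.h.

```python
def fermat_remainder(m, k, n):
    """CPU reference value: 2^(2^m) mod (k*2^n + 1)."""
    return pow(2, 1 << m, k * (1 << n) + 1)


def run_kernel(surviving_ks, m, n):
    """Toy stand-in for one kernel launch; returns a small results record."""
    if not surviving_ks:
        # Nothing survived the sieve: flag this instead of reporting
        # uninitialised (garbage) values for validation.
        return {"datalen": 0}
    k = surviving_ks[0]
    return {"datalen": 1, "k": k, "remainder": fermat_remainder(m, k, n)}


def validate(res, m, n):
    """Skip validation when the kernel reported that it tested nothing."""
    if res["datalen"] == 0:
        return "skipped"
    expected = fermat_remainder(m, res["k"], n)
    return "ok" if res["remainder"] == expected else "ERROR: Exponentiation failure"


if __name__ == "__main__":
    print(validate(run_kernel([], 205, 207), 205, 207))
    print(validate(run_kernel([232905], 205, 207), 205, 207))
```

The point of the flag is that "no candidates" becomes an explicit state the validator can recognise, rather than something it has to infer from whatever happens to be in memory.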
@Gary: The original v0.28 version was posted by Serge ([URL]https://mersenneforum.org/showpost.php?p=376423&postcount=317[/URL]), so I would presume he is the current maintainer.
Do note, it has been 5 years since that was posted. |
@Gary: I think it is a case of "you touch it, you own it". Congratulations.
|
1 Attachment(s)
Thanks for clues provided by Andreas!
The class problems and exp failure problems are indeed solved for mmff now. I am posting the source code here because I also made some other minor changes, and there are still some problems with the Windows binary.

The attached file contains a CUDA 10.1 binary compiled for 64-bit Linux and the source code. The code is based on version 0.28, and the compiled binary can be used on Google Colab. Note that my changes to the tf code to fix the class problems were made before I saw the source files posted by Gary (I haven't checked them yet), so let me know if I made any flaky/bad changes.

Minor changes:

(1) Fixed the class problems and exp failure caused by tf validate by setting RES[RESULTS_ARRAY_VALIDATION_OFFSET] = 0 and not copying other values if no candidate survives. If RES[RESULTS_ARRAY_VALIDATION_OFFSET] == 0, the validate function is simply not called. I think the "ERROR: Exponentiation failure" error message is somewhat unclear, so I changed it to: "ERROR: Verifying on CPU failed. Remainder didn\'t match. Possible problems exist." Please let me know if my understanding is incorrect.

(2) Replaced all deprecated cudaThreadSynchronize() calls with cudaDeviceSynchronize() calls in case the former are not supported in the future.

(3) In gpusieve.cu, the launch bounds for many functions are:
[CODE]__global__ static void __launch_bounds__(256,6) blablabla....[/CODE]
However, the maximum number of threads per streaming multiprocessor for [B]Turing cards (CC 7.5)[/B] is [B]1024[/B] instead of the [B]2048[/B] of all previous cards. Since this is a [B]lower bound[/B] setting, it would overflow on Turing cards, so the second parameter is ignored when compiling for the Turing CC 7.5 architecture with NVCC. I don't know if this lower bound setting is necessary, but I still changed all these launch bounds settings to:
[CODE]#if __CUDA_ARCH__ < 750
__global__ static void __launch_bounds__(256,6) blablabla...
#else
__global__ static void __launch_bounds__(256,3) blablabla...
#endif[/CODE]
Let me know if this change is incorrect.

(4) Minor fixes for format reading problems.

The compiled binary for Linux passed all 41 test cases provided by ATH:
[CODE]FermatFactor=36,2e10,3e10 # F28: 25709319373 * 2^36 + 1
FermatFactor=33,546e10,547e10 # F31: 5463561471303 * 2^33 + 1
FermatFactor=39,69,70 # F37: 1275438465 * 2^39 + 1
FermatFactor=41,286492e10,286493e10 # F39: 2864929972774011 * 2^41 + 1
FermatFactor=45,11131e10,11132e10 # F42: 111318179143061 * 2^45 + 1
FermatFactor=45,21e10,22e10 # F43: 212675402445 * 2^45 + 1
FermatFactor=50,213e10,214e10 # F48: 2139543641769 * 2^50 + 1
FermatFactor=54,66,67 # F52: 4119 * 2^54 + 1
FermatFactor=54,78,79 # F52: 21626655 * 2^54 + 1
FermatFactor=54,8190e10,8191e10 # F52: 81909357657279 * 2^54 + 1
FermatFactor=61,67,68 # F58: 95 * 2^61 + 1
FermatFactor=68,121089e10,121090e10 # F65: 1210895760431083 * 2^68 + 1
FermatFactor=74,100,101 # F72: 76432329 * 2^74 + 1
FermatFactor=77,98,99 # F75: 3447431 * 2^77 + 1
FermatFactor=79,5e9,6e9 # F77: 5940341195 * 2^79 + 1
FermatFactor=87,1595e9,1596e9 # F83: 1595863660157 * 2^87 + 1
FermatFactor=88,20018e9,20019e9 # F86: 20018578522347 * 2^88 + 1
FermatFactor=90,119e9,120e9 # F88: 119942751127 * 2^90 + 1
FermatFactor=92,198e9,199e9 # F90: 198922467387 * 2^92 + 1
FermatFactor=93,103,104 # F91: 1421 * 2^93 + 1
FermatFactor=97,482e9,483e9 # F94: 482524552001 * 2^97 + 1
FermatFactor=101,3334e9,3335e9 # F96: 3334131633063 * 2^101 + 1
FermatFactor=111,141,142 # F107: 1289179925 * 2^111 + 1
FermatFactor=120,3e9,4e9 # F116: 3433149787 * 2^120 + 1
FermatFactor=124,146,147 # F122: 5234775 * 2^124 + 1
FermatFactor=127,129,130 # F125: 5 * 2^127 + 1
FermatFactor=135,88e9,89e9 # F133: 88075576149 * 2^135 + 1
FermatFactor=145,167,168 # F142: 8152599 * 2^145 + 1
FermatFactor=148,173,174 # F146: 37092477 * 2^148 + 1
FermatFactor=149,160,161 # F147: 3125 * 2^149 + 1
FermatFactor=149,175,176 # F147: 124567335 * 2^149 + 1
FermatFactor=154,166,167 # F150: 5439 * 2^154 + 1 FermatFactor=157,167,168 # F150: 1575 * 2^157 + 1 FermatFactor=167,197,198 # F164: 1835601567 * 2^167 + 1 FermatFactor=171,2674e9,2675e9 # F166: 2674670937447 * 2^171 + 1 FermatFactor=174,20e9,21e9 # F172: 20569603303 * 2^174 + 1 FermatFactor=180,3e8,4e8 # F178: 313047661 * 2^180 + 1 FermatFactor=187,213,214 # F184: 117012935 * 2^187 + 1 FermatFactor=197,48594e9,48596e9 # F195: 48595346636925 * 2^197 + 1 FermatFactor=207,224,227 # F205: 232905 * 2^207 + 1 FermatFactor=217,231,232 # F215: 32111 * 2^217 + 1[/CODE] Result: [CODE]F28 has a factor: 1766730974551267606529 [TF:70:71*:mmff 0.28 mfaktc_barrett89_F32_63gs] found 1 factor for k*2^36+1 in k range: 20G to 30G (71-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs] F31 has a factor: 46931635677864055013377 [TF:75:76*:mmff 0.28 mfaktc_barrett89_F32_63gs] found 1 factor for k*2^33+1 in k range: 5460G to 5470G (76-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs] F37 has a factor: 701179711390136401921 [TF:69:70*:mmff 0.28 mfaktc_barrett89_F32_63gs] found 1 factor for k*2^39+1 in k range: 1073741824 to 2147483647 (70-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs] F39 has a factor: 6300047635658008393597059073 [TF:92:93*:mmff 0.28 mfaktc_barrett96_F32_63gs] found 1 factor for k*2^41+1 in k range: 2864920G to 2864930G (93-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett96_F32_63gs] F42 has a factor: 3916660235220715932328394753 [TF:91:92*:mmff 0.28 mfaktc_barrett96_F32_63gs] found 1 factor for k*2^45+1 in k range: 111310G to 111320G (92-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett96_F32_63gs] F43 has a factor: 7482850493766970889994241 [TF:82:83*:mmff 0.28 mfaktc_barrett89_F32_63gs] found 1 factor for k*2^45+1 in k range: 210G to 220G (83-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs] F48 has a factor: 2408911986953445595315961857 [TF:90:91*:mmff 0.28 
mfaktc_barrett96_F32_63gs] found 1 factor for k*2^50+1 in k range: 2130G to 2140G (91-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett96_F32_63gs] F52 has a factor: 74201307460556292097 [TF:66:67*:mmff 0.28 mfaktc_barrett89_F32_63gs] found 1 factor for k*2^54+1 in k range: 4096 to 8191 (67-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs] F52 has a factor: 389591181597081096683521 [TF:78:79*:mmff 0.28 mfaktc_barrett89_F32_63gs] found 1 factor for k*2^54+1 in k range: 16777216 to 33554431 (79-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs] F52 has a factor: 1475547810493913550438096961537 [TF:100:101*:mmff 0.28 mfaktc_barrett108_F32_63gs] found 1 factor for k*2^54+1 in k range: 81900G to 81910G (101-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett108_F32_63gs] F58 has a factor: 219055085875300925441 [TF:67:68*:mmff 0.28 mfaktc_barrett89_F32_63gs] found 1 factor for k*2^61+1 in k range: 64 to 127 (68-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_F32_63gs] F65 has a factor: 357393347081793620781479724788482049 [TF:118:119*:mmff 0.28 mfaktc_barrett120_F64_95gs] found 1 factor for k*2^68+1 in k range: 1210890G to 1210900G (119-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett120_F64_95gs] F72 has a factor: 1443765874709062348345951911937 [TF:100:101*:mmff 0.28 mfaktc_barrett108_F64_95gs] found 1 factor for k*2^74+1 in k range: 67108864 to 134217727 (101-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett108_F64_95gs] F75 has a factor: 520961043404985083798310879233 [TF:98:99*:mmff 0.28 mfaktc_barrett108_F64_95gs] found 1 factor for k*2^77+1 in k range: 2097152 to 4194303 (99-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett108_F64_95gs] F77 has a factor: 3590715923977960355577974656860161 [TF:111:112*:mmff 0.28 mfaktc_barrett120_F64_95gs] found 1 factor for k*2^79+1 in k range: 5G to 6G (112-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett120_F64_95gs] F83 has a 
factor: 246947940268608417020015902258307792897 [TF:127:128*:mmff 0.28 mfaktc_barrett128_F64_95gs] found 1 factor for k*2^87+1 in k range: 1595G to 1596G (128-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett128_F64_95gs] F86 has a factor: 6195449970597928748332522715641578258433 [TF:132:133*:mmff 0.28 mfaktc_barrett140_F64_95gs] found 1 factor for k*2^88+1 in k range: 20018G to 20019G (133-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett140_F64_95gs] F88 has a factor: 148481934042154969241780501829489000449 [TF:126:127*:mmff 0.28 mfaktc_barrett128_F64_95gs] found 1 factor for k*2^90+1 in k range: 119G to 120G (127-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett128_F64_95gs] F90 has a factor: 985016348367230226078056532654006730753 [TF:129:130*:mmff 0.28 mfaktc_barrett140_F64_95gs] found 1 factor for k*2^92+1 in k range: 198G to 199G (130-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett140_F64_95gs] F91 has a factor: 14072902366596202965053244178433 [TF:103:104*:mmff 0.28 mfaktc_barrett108_F64_95gs] found 1 factor for k*2^93+1 in k range: 1024 to 2047 (104-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett108_F64_95gs] F94 has a factor: 76459067246115642538831634131564386844673 [TF:135:136*:mmff 0.28 mfaktc_barrett140_F96_127gs] found 1 factor for k*2^97+1 in k range: 482G to 483G (136-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett140_F96_127gs] F96 has a factor: 8453027931784477309850388309101819121893377 [TF:142:143*:mmff 0.28 mfaktc_barrett152_F96_127gs] found 1 factor for k*2^101+1 in k range: 3334G to 3335G (143-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett152_F96_127gs] F107 has a factor: 3346902437331832346018436558958369334886401 [TF:141:142*:mmff 0.28 mfaktc_barrett152_F96_127gs] found 1 factor for k*2^111+1 in k range: 1073741824 to 2147483647 (142-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett152_F96_127gs] F116 has a factor: 4563438810603420826872624280490561141381005313 
[TF:151:152*:mmff 0.28 mfaktc_barrett152_F96_127gs] found 1 factor for k*2^120+1 in k range: 3G to 4G (152-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett152_F96_127gs] F122 has a factor: 111331351706159727817280425663664652445286401 [TF:146:147*:mmff 0.28 mfaktc_barrett152_F96_127gs] found 1 factor for k*2^124+1 in k range: 4194304 to 8388607 (147-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett152_F96_127gs] F125 has a factor: 850705917302346158658436518579420528641 [TF:129:130*:mmff 0.28 mfaktc_barrett140_F96_127gs] found 1 factor for k*2^127+1 in k range: 4 to 7 (130-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett140_F96_127gs] F133 has a factor: 3836232386548105510567872577199319351015739156856833 [TF:171:172*:mmff 0.28 mfaktc_barrett172_F128_159gs] found 1 factor for k*2^135+1 in k range: 88G to 89G (172-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs] F142 has a factor: 363618066009591119386121910507749518730588867002369 [TF:167:168*:mmff 0.28 mfaktc_barrett172_F128_159gs] found 1 factor for k*2^145+1 in k range: 4194304 to 8388607 (168-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs] F146 has a factor: 13235038053749721162769301995307025251972223086886913 [TF:173:174*:mmff 0.28 mfaktc_barrett183_F128_159gs] found 1 factor for k*2^148+1 in k range: 33554432 to 67108863 (174-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett183_F128_159gs] F147 has a factor: 2230074519853062314153571827264836150598041600001 [TF:160:161*:mmff 0.28 mfaktc_barrett172_F128_159gs] found 1 factor for k*2^149+1 in k range: 2048 to 4095 (161-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs] F147 has a factor: 88894220732640180500173831441107513117330143465963521 [TF:175:176*:mmff 0.28 mfaktc_barrett183_F128_159gs] found 1 factor for k*2^149+1 in k range: 67108864 to 134217727 (176-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett183_F128_159gs] F150 has a factor: 
124204803210043452689216278205372864748572142206977 [TF:166:167*:mmff 0.28 mfaktc_barrett172_F128_159gs] found 1 factor for k*2^154+1 in k range: 4096 to 8191 (167-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs] F150 has a factor: 287733134849521512021350451441018219494761719398401 [TF:167:168*:mmff 0.28 mfaktc_barrett172_F128_159gs] found 1 factor for k*2^157+1 in k range: 1024 to 2047 (168-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett172_F128_159gs] F164 has a factor: 343390041044181900054983258125842173093877961821829176754177 [TF:197:198*:mmff 0.28 mfaktc_barrett204_F160_191gs] found 1 factor for k*2^167+1 in k range: 1073741824 to 2147483647 (198-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett204_F160_191gs] F166 has a factor: 8005705634611551271269985633916919970948098093294822472135213057 [TF:212:213*:mmff 0.28 mfaktc_barrett215_F160_191gs] found 1 factor for k*2^171+1 in k range: 2674G to 2675G (213-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett215_F160_191gs] F172 has a factor: 492544145925433733451855533863925475950550777193174123310743553 [TF:208:209*:mmff 0.28 mfaktc_barrett215_F160_191gs] found 1 factor for k*2^174+1 in k range: 20G to 21G (209-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett215_F160_191gs] F178 has a factor: 479744144560996421795040836675707785358665797968769873751310337 [TF:208:209*:mmff 0.28 mfaktc_barrett215_F160_191gs] found 1 factor for k*2^180+1 in k range: 300M to 400M (209-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett215_F160_191gs] F184 has a factor: 22953190542224652377639611826608942557783370967811443134226759681 [TF:213:214*:mmff 0.28 mfaktc_barrett215_F160_191gs] found 1 factor for k*2^187+1 in k range: 67108864 to 134217727 (214-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett215_F160_191gs] F195 has a factor: 9761213910603494986281795830720869047027739722070601061612088452553113601 [TF:242:243*:mmff 0.28 mfaktc_barrett247_F192_223gs] 
found 1 factor for k*2^197+1 in k range: 48594G to 48596G (243-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett247_F192_223gs] F205 has a factor: 47905779865361936656012887182939964920375512098173614759150973091841 [TF:224:225*:mmff 0.28 mfaktc_barrett236_F192_223gs] found 1 factor for k*2^207+1 in k range: 131072 to 262143 (225-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett236_F192_223gs] F215 has a factor: 6763365995538079644113691573900682504384080816814065022974359599316993 [TF:231:232*:mmff 0.28 mfaktc_barrett236_F192_223gs] found 1 factor for k*2^217+1 in k range: 16384 to 32767 (232-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett236_F192_223gs][/CODE] Some double mersennes test cases: [CODE]MMFactor=31,64,65 MMFactor=61,549e9,550e9 MMFactor=31,56e9,57e9 MMFactor=31,54e9,55e9 MMFactor=31,414.5e11,415e11 MMFactor=31,414e11,415e11 MMFactor=31,416e11,417e11[/CODE] The results are as expected without problems: [CODE]no factor for MM31 in k range: 4294967298 to 8589934595 (65-bit factors) [mmff 0.28 mfaktc_barrett89_M31gs] no factor for MM61 in k range: 549000000000 to 549755813887 (101-bit factors) [mmff 0.28 mfaktc_barrett108_M61gs] no factor for MM61 in k range: 549755813888 to 550000000000 (102-bit factors) [mmff 0.28 mfaktc_barrett108_M61gs] MM31 has a factor: 242557615644693265201 [TF:67:68*:mmff 0.28 mfaktc_barrett89_M31gs] found 1 factor for MM31 in k range: 56G to 57G (68-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_M31gs] no factor for MM31 in k range: 54G to 55G (68-bit factors) [mmff 0.28 mfaktc_barrett89_M31gs] no factor for MM31 in k range: 41450G to 41500G (78-bit factors) [mmff 0.28 mfaktc_barrett89_M31gs] MM31 has a factor: 178021379228511215367151 [TF:77:78*:mmff 0.28 mfaktc_barrett89_M31gs] found 1 factor for MM31 in k range: 41400G to 41500G (78-bit factors) (partially tested) [mmff 0.28 mfaktc_barrett89_M31gs] no factor for MM31 in k range: 41600G to 41700G (78-bit factors) [mmff 0.28 
mfaktc_barrett89_M31gs][/CODE] However, when it was compiled for Windows using Visual Studio 2019 it still failed to run (but was not class problems, etc.): [CODE]mmff v0.28 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 16M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) WARNING, no ProgressFormat specified in mmff.ini, using default TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.20 CUDA device info name GeForce GTX 1660 compute capability 7.5 maximum threads per block 1024 number of mutliprocessors 22 (unknown number of shader cores) clock rate 1800MHz got assignment: MM127, k range 116500000000000000 to 117000000000000000 (185-bit factors) Starting trial factoring of MM127 in k range: 116500T to 117000T (185-bit factor s) k_min = 116500000000000000 k_max = 117000000000000000 Using GPU kernel "mfaktc_barrett185_M127gs" class | candidates | time | ETA | raw rate | SievePrimes | CPU wait 5/4620 | 108.23G | 11.263s | 3h00m | 9608.91M/s | 810549 [B][COLOR="Red"]ERROR: cudaGetLastError() returned 98: invalid device function[/COLOR][/B][/CODE] The [COLOR="Red"][B]invalid device function[/B][/COLOR] error is usually problems when a kernel was not compiled for correct CC architecture or didn't exist. I tried to get the attribute for target kernel but it also returns error. So that's [B]not[/B] because kernels are not compiled with correct CC architecture, but [B]didn't exist[/B]. I wrote a test kernel and it also raised the same problem. 
Yes, the program failed to recognize it and simply treated it as nonexistent (so it is never executed). I don't know what goes wrong with the MSVC 2019 compiler that leaves the program unable to recognize the existence of any kernels, since the file sizes are normal. The older [B]Visual Studio 2012[/B] version [B]should work[/B], but I can't try it now since I have already uninstalled it. Still, something must be wrong, since all newer versions of the MSVC compiler (2017 or later; I don't know about 2013 or 2015) can cause the problem. I really have no idea about that... The build process for the Windows binary using Visual Studio follows the post by TheJudger somewhere in the forum. I haven't tested the normal CUDA build process in Visual Studio, since it requires adjusting the header include relationships in many of mmff's source files, which is a little inconvenient. |
[QUOTE=Fan Ming;535756]
The [COLOR="Red"][B]invalid device function[/B][/COLOR] error is usually problems...[/QUOTE] It seems this problem can also occur on Linux, on Tesla P100 instances on Google Colab. However, I'm not sure about this, and it's much harder to get a P100 assigned on Google Colab now. Can anyone confirm this? |
[QUOTE=Fan Ming;535989]It seems this problem can occur at linux too.
Tesla P100 instances on Google colab. However, I'm not sure about this, and it's much harder to have P100 assigned on Google colab now. Can anyone confirm this?[/QUOTE] Got a P100 instance successfully. It's [B]not[/B] the "invalid device function" error (which can cause an Exponentiation failure if the kernel fails to execute and the garbage value in memory happens to satisfy some conditions), but a [B][COLOR="Red"]real[/COLOR] Exponentiation failure[/B] error. For some unknown reason the "-v 3" option didn't work on Colab (mmff raised ERROR: can't parse -v option), so I changed the default verbosity level to 3. I ran it several times and here is the error output: [CODE]mmff v0.28 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 128M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait" GPUProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s | %W%%" TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.10 CUDA device info name Tesla P100-PCIE-16GB compute capability 6.0 maximum threads per block 1024 number of mutliprocessors 56 (unknown number of shader cores) clock rate 1328MHz got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945244016114007668591781862075984047752025015141633 = 
4926629721325240139649429581548920523512559095913937 ERROR: Verifying on CPU failed. Remainder didn't match. Possible problems exist.[/CODE] [CODE]mmff v0.28 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 128M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait" GPUProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s | %W%%" TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.10 CUDA device info name Tesla P100-PCIE-16GB compute capability 6.0 maximum threads per block 1024 number of mutliprocessors 56 (unknown number of shader cores) clock rate 1328MHz got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945243918643526487758168387626961494996338526257873 = 11812001209279499151039916953333557370062855661257534 ERROR: Verifying on CPU failed. Remainder didn't match. 
Possible problems exist.[/CODE] [CODE]mmff v0.28 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 128M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait" GPUProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s | %W%%" TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.10 CUDA device info name Tesla P100-PCIE-16GB compute capability 6.0 maximum threads per block 1024 number of mutliprocessors 56 (unknown number of shader cores) clock rate 1328MHz got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945244016114007668591781862075984047752025015141633 = 4926629721325240139649429581548920523512559095913937 ERROR: Verifying on CPU failed. Remainder didn't match. Possible problems exist.[/CODE] [CODE]got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945244006681380457543367654871239929743410193636753 = 20826885465921148439067402367610686467153380117365399 ERROR: Verifying on CPU failed. Remainder didn't match. 
Possible problems exist.[/CODE] [CODE]got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945243923359840093282375491229333554000645937010313 = 18376582414064778318809558114847430298939300967906033 ERROR: Verifying on CPU failed. Remainder didn't match. Possible problems exist.[/CODE] [CODE]got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945244006681380457543367654871239929743410193636753 = 20826885465921148439067402367610686467153380117365399 ERROR: Verifying on CPU failed. Remainder didn't match. Possible problems exist.[/CODE] Note that the "ERROR: Verifying on CPU failed. Remainder didn't match. Possible problems exist." message is actually "[B]ERROR: Exponentiation failure[/B]". I changed the wording of this error; see post #360, which I posted several days ago. The factor values all seem to be legal, for example 23945244016114007668591781862075984047752025015141633, 23945243918643526487758168387626961494996338526257873, 23945243923359840093282375491229333554000645937010313, 23945244006681380457543367654871239929743410193636753. They are all legal 2kp+1 values. However, all [B]remainder values[/B] are [COLOR="Red"][B]indeed wrong[/B][/COLOR], and for the same factor value the wrong remainder is always the same. This problem [B]also exists[/B] in the previous mmff 0.28 version (from before the class problems were fixed, so it is not caused by my changes; I haven't checked earlier versions yet), so something must be wrong. I don't know why... |
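The "legal 2kp+1" claim can be checked independently on the CPU. Below is a minimal Python sketch, not taken from mmff; the helper name `mm_factor_k` is hypothetical. It assumes that a candidate factor f of MM(p) must have the form f = 2*k*(2^p - 1) + 1, and it returns k, or None when f does not have that form.

```python
def mm_factor_k(f, p):
    """Return k if f == 2*k*(2**p - 1) + 1 for some integer k >= 1, else None.

    Hypothetical helper for illustration; not part of the mmff source.
    """
    mp = 2**p - 1                      # the Mersenne number M(p)
    k, rem = divmod(f - 1, 2 * mp)     # f - 1 must be a multiple of 2*M(p)
    return k if rem == 0 and k >= 1 else None


if __name__ == "__main__":
    # One of the candidate values from the MM127 logs above.
    f = 23945244016114007668591781862075984047752025015141633
    k = mm_factor_k(f, 127)
    print("k =", k)
    if k is not None:
        # k range of the first bit level of the assignment in the logs.
        print("k in range:", 70368744177664 <= k <= 140737488355327)
```

If a candidate passes this check but its reported remainder is still wrong, the sieve side is probably fine and the exponentiation itself is the more likely suspect.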
1 Attachment(s)
Using Gary's source, errors still occurred (ran several times):
[CODE]mmff v0.28.1 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 128M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait" WARNING, no ProgressFormat specified in mmff.ini, using default ProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s" TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.10 CUDA device info name Tesla P100-PCIE-16GB compute capability 6.0 maximum threads per block 1024 number of mutliprocessors 56 (unknown number of shader cores) clock rate 1328MHz got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945243937508780909854996802036449731013568169267633 = 7606706320838621808794870660151320699229326362771323 ERROR: Exponentiation failure[/CODE] [CODE]mmff v0.28.1 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 128M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) 
GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait" WARNING, no ProgressFormat specified in mmff.ini, using default ProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s" TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.10 CUDA device info name Tesla P100-PCIE-16GB compute capability 6.0 maximum threads per block 1024 number of mutliprocessors 56 (unknown number of shader cores) clock rate 1328MHz got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945243937508780909854996802036449731013568169267633 = 7606706320838621808794870660151320699229326362771323 ERROR: Exponentiation failure[/CODE] [CODE]got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945243956374035331951825216445937967030797812277393 = 21049357416014847908393584649762608127534186076535180 ERROR: Exponentiation failure[/CODE] [CODE]got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945243956374035331951825216445937967030797812277393 = 21049357416014847908393584649762608127534186076535180 ERROR: Exponentiation failure[/CODE] [CODE]got assignment: MM127, k range 70368744177664 to 500000000000000 (175 to 177 bit factors) Starting trial factoring of MM127 
in k range: 70368744177664 to 140737488355327 (175-bit factors) k_min = 70368744177664 k_max = 140737488355327 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 23945243956374035331951825216445937967030797812277393 = 21049357416014847908393584649762608127534186076535180 ERROR: Exponentiation failure[/CODE] However, other numbers seem to work correctly (too large to post; see attached logs.txt, part 1): [too large, see attached logs.txt] Some other test cases (too large, only a sample here; see attached logs.txt, part 2): [CODE]/content/drive/My Drive/mmff-0.28.1 mmff v0.28.1 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 128M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait" WARNING, no ProgressFormat specified in mmff.ini, using default ProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s" TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.10 CUDA device info name Tesla P100-PCIE-16GB compute capability 6.0 maximum threads per block 1024 number of mutliprocessors 56 (unknown number of shader cores) clock rate 1328MHz got assignment: MM31, k range 4294967298 to 8589934595 (65-bit factors) Starting trial factoring of MM31 in k range: 4294967298 to 8589934595 (65-bit factors) k_min = 4294967298 k_max = 8589934595 Using GPU kernel "mfaktc_barrett89_M31gs" Verifying (2^(2^31)) % 18455732847550407041 = 18041335883521486051 class | candidates | time | ETA | raw rate | SievePrimes | CPU wait 2/4620 | 0.93M | 0.001s | n.a. 
| 933.89M/s | 90677 Verifying (2^(2^31)) % 18455474908994598577 = 13210018195264925476 6/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18450117401151801329 = 7139557165896038944 14/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18450375361182446263 = 9057953753314217069 15/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18455732950629622097 = 988841009176436615 26/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18449601545515020871 = 15586616874725587374 27/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18454185233395425433 = 11040915153769198707 30/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18451883495998061423 = 14953888264990787734 35/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18455236950626641801 = 6124863183292633277 42/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453351910956141671 = 15189020512858988414 47/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18450018342026132513 = 1741686722528884267 50/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18451208911254996607 = 1117800095954824569 51/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18457876109244557039 = 7096891942730331470 59/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18455594206006156721 = 4096068541967968090 62/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18454899726974586097 = 9373744610566902525 66/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453887768255610287 = 83107776338795315 71/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453034547232853423 = 7489924297197587194 75/4620 | 0.93M | 0.001s | n.a. 
| 933.89M/s | 90677 Verifying (2^(2^31)) % 18453490977702154097 = 1069386786885538204 86/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18454879987304902873 = 13223985354672910196 90/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18448252547827582999 = 16324629731528481613 99/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453292640407484471 = 16327008864618939019 class | candidates | time | ETA | raw rate | SievePrimes | CPU wait 107/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453868093010436473 = 14672522872021511977 110/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18456844509640145767 = 158751210797949868 WARNING: Factor divisible by 293. Only occasionally should GPU sieve let small factors slip through 111/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18456030969820218169 = 13243929036377631083 114/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18459761428087931279 = 928694628223646081 119/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18451864026911317721 = 12732358327742052958 122/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18456308819844401617 = 414574268522609115 126/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18455495301499310489 = 9211324510596461997 134/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18457836750164274823 = 16899488420486930904 135/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18451705387999346537 = 17081498552114410952 146/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18455812841316257791 = 11683251650899530624 147/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18451427606694639793 = 13103389033054754948 150/4620 | 0.93M | 0.001s | n.a. 
| 933.89M/s | 90677 Verifying (2^(2^31)) % 18453590487799388783 = 11499864167607720722 155/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18450733149137905639 = 12859935701888440148 159/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18454027058339922001 = 11659533504391022171 162/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18452677772889675431 = 10744843596348203562 167/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18454900173651184673 = 1859362368207149326 170/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453610399267763767 = 8701203538784565069 171/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18454324751113003729 = 6213104066269045180 174/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18451685699869270841 = 6495362410738542899 182/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18464305726823282567 = 5979809582257939535 class | candidates | time | ETA | raw rate | SievePrimes | CPU wait 191/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18456805180624634609 = 12467334056667722874 194/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453848615333758183 = 14209312759826643484 195/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453451807600432817 = 10388588196196914677 206/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18452122360604117233 = 10349137720694434220 210/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18455614705885050983 = 167229099570259508 215/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453471719068807801 = 14507868372907181248 222/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18453610639785932231 = 17813826982514021073 227/4620 | 0.93M | 0.001s | n.a. 
| 933.89M/s | 90677 Verifying (2^(2^31)) % 18456011629582493287 = 1627973603262381094 231/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677 Verifying (2^(2^31)) % 18449781019313335249 = 12125103717347465058 234/4620 | 0.93M | 0.001s | n.a. | 933.89M/s | 90677[/CODE] Everything seems to [B]work properly[/B]. However, once I changed to [B]MM127[/B], errors [B]occurred again[/B]: [CODE]/content/drive/My Drive/mmff-0.28.1 mmff v0.28.1 (64bit built) Compiletime options THREADS_PER_BLOCK 256 MORE_CLASSES enabled Runtime options GPU Sieving enabled WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486) GPUSievePrimes depends on worktodo entry GPUSieveSize 128M bits WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8) GPUSieveProcessSize 8K bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s StopAfterFactor class PrintMode full V5UserID (none) ComputerID (none) GPUProgressHeader " class | candidates | time | ETA | raw rate | SievePrimes | CPU wait" WARNING, no ProgressFormat specified in mmff.ini, using default ProgressFormat "%C/4620 | %n | %ts | %e | %rM/s | %s" TimeStampInResults no CUDA version info binary compiled for CUDA 10.10 CUDA runtime version 10.10 CUDA driver version 10.10 CUDA device info name Tesla P100-PCIE-16GB compute capability 6.0 maximum threads per block 1024 number of mutliprocessors 56 (unknown number of shader cores) clock rate 1328MHz got assignment: MM127, k range 562949953421312 to 1125899906842623 (178-bit factors) Starting trial factoring of MM127 in k range: 562949953421312 to 1125899906842623 (178-bit factors) k_min = 562949953421312 k_max = 1125899906842623 Using GPU kernel "mfaktc_barrett183_M127gs" Verifying (2^(2^127)) % 191561943147467962859727723905659853364042304328803289 = 25168583490388808698318691898045119457541087143113062 ERROR: Exponentiation failure[/CODE] The other mmff 0.28 versions behave the same (including the original version with some class problems unsolved and the version I posted). There is possibly a bug. |
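One way to narrow this down is to recompute the printed remainders on the CPU straight from the log text. The following is a minimal Python sketch under an assumption: that a log line of the form `Verifying (2^(2^p)) % N = R` is meant to satisfy R = 2^(2^p) mod N. That is my reading of the message text and is not confirmed from the mmff source. The function name `check_verifying_lines` is hypothetical.

```python
import re

# Matches log entries such as: Verifying (2^(2^127)) % <modulus> = <remainder>
# Whitespace is matched loosely because pasted logs are often re-wrapped.
VERIFY_RE = re.compile(
    r"Verifying\s*\(2\^\(2\^(\d+)\)\)\s*%\s*(\d+)\s*=\s*(\d+)"
)


def check_verifying_lines(log_text):
    """Yield (p, modulus, reported, expected, ok) for each 'Verifying' entry.

    Assumption (not confirmed from mmff): reported should equal
    2^(2^p) mod modulus.
    """
    for m in VERIFY_RE.finditer(log_text):
        p, modulus, reported = (int(g) for g in m.groups())
        expected = pow(2, 2**p, modulus)   # built-in modular exponentiation
        yield p, modulus, reported, expected, reported == expected


if __name__ == "__main__":
    # The failing entry from the MM127 run above.
    sample = (
        "Verifying (2^(2^127)) % "
        "191561943147467962859727723905659853364042304328803289 = "
        "25168583490388808698318691898045119457541087143113062"
    )
    for p, n, reported, expected, ok in check_verifying_lines(sample):
        status = "OK" if ok else f"MISMATCH (CPU gives {expected})"
        print(f"p={p} N={n}: {status}")
```

Running the same check over the MM31 part of logs.txt and over the MM127 part would show whether mismatches are confined to the MM127 kernel, assuming the interpretation of the log line above is right.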