![]() |
![]() |
#12 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2·29·127 Posts |
![]()
(so no basis for the following claim)
Quote:
For comparison, mfaktc on an RTX 2080 GPU running at reduced power, and more_classes (960 of 4620 classes) logging default console format to a rotating disk, does all TF candidates in 1-67 bits for exponent M86000063 in under a minute. That's a space of ~2^67/2/86000063/2 ~ 428,993,733,963 k values 1 or 7 mod 8. (Estimated throughput corresponds to ~0.14 nanosecond per k value, pessimistically assuming 60 seconds total duration. That's pessimistic by a factor of something less than 3, possibly as low as 6/5. Times per class indicate run time ~23 seconds.) Code:
Starting trial factoring M86000063 from 2^1 to 2^67 (0.35 GHz-days) k_min = 0 k_max = 857987467925 Using GPU kernel "75bit_mul32_gs" Date Time | class Pct | time ETA | GHz-d/day Sieve Wait Jul 17 07:03 | 0 0.1% | 0.024 n.a. | 1295.66 106037 n.a.% Jul 17 07:03 | 1 0.2% | 0.024 n.a. | 1295.66 106037 n.a.% Jul 17 07:03 | 13 0.3% | 0.024 n.a. | 1295.66 106037 n.a.% ... Jul 17 07:03 | 4605 99.8% | 0.023 n.a. | 1352.00 106037 n.a.% Jul 17 07:03 | 4612 99.9% | 0.023 n.a. | 1352.00 106037 n.a.% Jul 17 07:03 | 4617 100.0% | 0.023 n.a. | 1352.00 106037 n.a.% results.txt contents: [Sun Jul 17 07:03:29 2022] UID: Kriesel/asr3-rtx2080, M86000063 has a factor: 61920045361 [TF:1:67:mfaktc 0.21 75bit_mul32_gs] [Sun Jul 17 07:03:50 2022] UID: Kriesel/asr3-rtx2080, found 1 factor for M86000063 from 2^ 1 to 2^67 [mfaktc 0.21 75bit_mul32_gs] [Sun Jul 17 08:04:06 2022] UID: Kriesel/asr3-rtx2080, no factor for M86000063 from 2^67 to 2^68 [mfaktc 0.21 barrett76_mul32_gs] [Sun Jul 17 08:04:29 2022] UID: Kriesel/asr3-rtx2080, no factor for M86000063 from 2^68 to 2^69 [mfaktc 0.21 barrett76_mul32_gs] [Sun Jul 17 08:05:13 2022] UID: Kriesel/asr3-rtx2080, no factor for M86000063 from 2^69 to 2^70 [mfaktc 0.21 barrett76_mul32_gs] (In reality the latency of each trial factor candidate takes longer, most are sieved out and so not tried, and there are many being tried in parallel.) Last fiddled with by kriesel on 2022-07-17 at 13:20 |
|
![]() |
![]() |
![]() |
#13 |
Jul 2022
3×52 Posts |
![]()
Thanks for the information I am not able to make this type of comparison, surely my algorithm is slower.
I just wanted to know if the algorithm could be optimized to make it as fast as the other method but apparently not. The only comparison I can make is using a search for the numbers 1 + 8*p*k. When I did some tests I used a parallel search for the numbers 1 + 8*p*k and 1 + 2*p+8 p* k if (P // 6)% 2 == 1 or 1 + 6*p + 8*p*k if (p // 6)% 2 == 0. This code takes less than 30 seconds to find the factor: Code:
int main() { uint64_t P=(uint64_t)86000063; uint64_t D=(uint64_t)1+8*P; for (;D<70000000000;D+=(uint64_t)8*P) if (is_pow_onemod(P,D)) { std::cout <<"factor: "<< D << std::endl; break; } return 0; } Last fiddled with by User140242 on 2022-07-17 at 14:03 |
![]() |
![]() |
![]() |
#14 |
Jul 2022
4B16 Posts |
![]()
Of course, using the wheel sieve described here https://gist.github.com/user140242 takes a few seconds:
Code:
#include <iostream> #include <cmath> #include <algorithm> #include <vector> #include <cstdlib> #include <stdint.h> int64_t Euclidean_Diophantine( int64_t coeff_a, int64_t coeff_b) { // find y in Diophantine equation coeff_a x + coeff_b y = 1 int64_t k=1; std::vector<int64_t> div_t; std::vector<int64_t> rem_t; std::vector<int64_t> coeff_t; div_t.push_back(coeff_a); rem_t.push_back(coeff_b); coeff_t.push_back((int64_t)0); div_t.push_back((int64_t)div_t[0]/rem_t[0]); rem_t.push_back((int64_t)div_t[0]%rem_t[0]); coeff_t.push_back((int64_t)0); while (rem_t[k]>1) { k=k+1; div_t.push_back((int64_t)rem_t[k-2]/rem_t[k-1]); rem_t.push_back((int64_t)rem_t[k-2]%rem_t[k-1]); coeff_t.push_back((int64_t)0); } k=k-1; coeff_t[k]=-div_t[k+1]; if (k>0) coeff_t[k-1]=(int64_t)1; while (k > 1) { k=k-1; coeff_t[k-1]=coeff_t[k+1]; coeff_t[k]+=(int64_t)(coeff_t[k+1]*(-div_t[k+1])); } if (k==1) return (int64_t)(coeff_t[k-1]+coeff_t[k]*(-div_t[k])); else return (int64_t)(coeff_t[0]); } bool is_pow_onemod(uint64_t P,uint64_t D) { uint64_t D_2=D/(uint64_t)2; int b = 0; uint64_t N=D; while (N) { b++; N >>= 1; } uint64_t imax=(uint64_t)P-b; uint64_t R=(uint64_t)1; for (uint64_t i=0;i<=imax;i++) R=(R%2==1)?(uint64_t)1+R/(uint64_t)2+D_2:R/(uint64_t)2; if (R==(uint64_t)1 <<(b-1)) return true; return false; } int main() { uint64_t P=(uint64_t)86000063; uint64_t D=(uint64_t)1+8*P; int64_t bW=(int64_t)(8*P); int64_t dim_primes=std::min(bW,(int64_t)524288); int64_t dim_step=(int64_t)32768; std::vector<char> Primes_f(dim_primes + 1,true); std::vector<char> Primes_t(dim_step + 1,true); int64_t r_t,C1,C2,m, mmin; for (int64_t p=3;p*p<=dim_primes;p+=2) if (Primes_f[p]) for (m=p*p;m<=dim_primes;m+=p) Primes_f[m]=false; for (int64_t p=3; p<dim_primes;p+=2) { if (Primes_f[p]) { r_t=(int64_t)(Euclidean_Diophantine(bW, p))%bW; if (r_t>0) r_t-=bW; C2=(int64_t)((-bW+p)*r_t/bW); C1=r_t-bW+p; mmin=(int64_t)(bW+C1+C2); for (m =mmin; m <= dim_step && m>=0; m += p) Primes_t[m] = false; } } int64_t k=1; for (;D<70000000000 && k<=dim_step ;D+=(uint64_t)8*P ,k++) if (Primes_t[k]) { if (is_pow_onemod(P,D)) { std::cout <<"factor: "<< D << std::endl; break; } } return 0; } |
![]() |
![]() |
![]() |
#15 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×29×127 Posts |
![]()
Sure, to apparently a mere D<70000000000 (~36 bits). TF effort is proportional to 2^bits_depth, so this is ~2^31 fold (2,147,483,648:1) smaller work than to 2^67 used in the earliest mfaktc example posted in this thread.
Also the code in the previous post appears to me to be skipping all the 7 mod 8 D values, for another factor of 2:1. Code:
uint64_t D=(uint64_t)1+8*P; D+=(uint64_t)8*P Last fiddled with by kriesel on 2022-07-18 at 09:12 |
![]() |
![]() |
![]() |
#16 | ||
Jul 2022
10010112 Posts |
![]() Quote:
We agree that the algorithm is slower. Quote:
I can explain how the algorithm works but I am unable to optimize the code. What I posted is an example to show how it works. Last fiddled with by User140242 on 2022-07-18 at 09:16 |
||
![]() |
![]() |
![]() |
#17 |
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
22·5·503 Posts |
![]()
There is nothing that needs explaining. You are implementing exponentiation by millions of multiplications.
Instead, this method exists which is > 1,000,000 times faster for current run-of-the-mill worktask units. (The current range of testing is around P ~ 110,000,000) This is a longer explanation to the algorithm that was already given to you for reading > 3 times - https://www.mersenne.org/various/math.php (3rd section, "Trial Factoring") |
![]() |
![]() |
![]() |
#18 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×29×127 Posts |
![]() Quote:
The timings you give for your code are for unspecified hardware so don't mean a lot. Although they do serve to show that it may be several orders of magnitude slower per candidate factor. Or run time is dominated by setup of the small-primes table etc. p=86000063 is ~26.36 bits, so for D<36 bits, k< D/2/p ~ 400. Only ~20 TF candidates will be prime and tried. That's too few for your program's timings to be meaningful. It would help to add time outputs to your code at start, at completion of setup, and at exit, and perhaps output kmin, kmax. If you haven't yet, I suggest you review the source code of prime95 TF or mfactor and of mfaktc or mfakto for some ideas of how to make TF code fast. As another point of comparison, prime95 V30.7b9 on i5-1035G1 (4 core & HT), Factor=86000063,48,56 Code:
[Jul 18 04:36:14] Starting trial factoring of M86000063 to 2^56 [Jul 18 04:36:15] M86000063 has a factor: 61920045361 (TF:48-56) [Jul 18 04:36:15] Trial factoring M86000063 to 2^50. [Jul 18 04:36:15] Trial factoring M86000063 to 2^51. [Jul 18 04:36:18] Trial factoring M86000063 to 2^52. [Jul 18 04:36:21] Trial factoring M86000063 to 2^53. [Jul 18 04:36:24] Trial factoring M86000063 to 2^54. [Jul 18 04:36:28] Trial factoring M86000063 to 2^55. [Jul 18 04:36:32] Trial factoring M86000063 to 2^56. [Jul 18 04:36:35] M86000063 no factor from 2^48 to 2^56, Wh4: 39743C51 Factor=86000063,56,64 Code:
[Jul 18 04:42:15] Starting trial factoring of M86000063 to 2^64 [Jul 18 04:42:15] Trial factoring M86000063 to 2^57. [Jul 18 04:42:18] Trial factoring M86000063 to 2^58. [Jul 18 04:42:21] Trial factoring M86000063 to 2^59. [Jul 18 04:42:24] Trial factoring M86000063 to 2^60. [Jul 18 04:42:27] Trial factoring M86000063 to 2^61. [Jul 18 04:42:34] Trial factoring M86000063 to 2^62. [Jul 18 04:42:44] Trial factoring M86000063 to 2^63. [Jul 18 04:43:02] M86000063 no factor from 2^56 to 2^63, Wh4: 39743C51 [Jul 18 04:43:02] Trial factoring M86000063 to 2^64. [Jul 18 04:43:38] M86000063 no factor from 2^63 to 2^64, Wh4: 39743C51 AFAIK, all of Mn, 50M< n < 1G have been TF past 2^73 already. https://www.mersenne.org/report_fact...i=&tftobits=72 Last fiddled with by kriesel on 2022-07-18 at 10:28 |
|
![]() |
![]() |
![]() |
#19 | ||
Jul 2022
3×52 Posts |
![]() Quote:
So I explained: Quote:
1+2*p+8 p*k or or 1+6*p+8*p*k they are numbers 7 mod 8 I am not an expert as I said I am not able to do what you ask me. |
||
![]() |
![]() |
![]() |
#20 |
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
22·5·503 Posts |
![]()
How about this, straight from Wikipedia (with pseudocode changed into python):
Code:
def is_pow_onemod_fromWiki(P,D): # return true if 2^Mp = 1 (mod D) x=2 y=1 while (P > 1): if (P%2 == 1): y = (x * y) %D x = (x * x) %D P >>= 1 return (((x*y)%D) == 1) print is_pow_onemod_fromWiki(11,23) print is_pow_onemod_fromWiki(23,47) print is_pow_onemod_fromWiki(410143, 4101431) print is_pow_onemod_fromWiki(9650023,231600553) print is_pow_onemod_fromWiki(96500219,3088007009) True True True True True real 0m0.009s You will need to implement more for (x*y)%D when D > 2^32, but this works in the same domain as your code. |
![]() |
![]() |
![]() |
#21 |
Jul 2022
3·52 Posts |
![]()
Thanks but we had already made it clear that it was much slower.
The algorithm you posted is faster: Code:
#include <iostream> #include <cmath> #include <algorithm> #include <vector> #include <cstdlib> #include <stdint.h> #include <time.h> int64_t Euclidean_Diophantine( int64_t coeff_a, int64_t coeff_b) { // return y in Diophantine equation coeff_a x + coeff_b y = 1 int64_t k=1; std::vector<int64_t> div_t; std::vector<int64_t> rem_t; std::vector<int64_t> coeff_t; div_t.push_back(coeff_a); rem_t.push_back(coeff_b); coeff_t.push_back((int64_t)0); div_t.push_back((int64_t)div_t[0]/rem_t[0]); rem_t.push_back((int64_t)div_t[0]%rem_t[0]); coeff_t.push_back((int64_t)0); while (rem_t[k]>1) { k=k+1; div_t.push_back((int64_t)rem_t[k-2]/rem_t[k-1]); rem_t.push_back((int64_t)rem_t[k-2]%rem_t[k-1]); coeff_t.push_back((int64_t)0); } k=k-1; coeff_t[k]=-div_t[k+1]; if (k>0) coeff_t[k-1]=(int64_t)1; while (k > 1) { k=k-1; coeff_t[k-1]=coeff_t[k+1]; coeff_t[k]+=(int64_t)(coeff_t[k+1]*(-div_t[k+1])); } if (k==1) return (int64_t)(coeff_t[k-1]+coeff_t[k]*(-div_t[k])); else return (int64_t)(coeff_t[0]); } bool is_2pow_onemod(uint64_t P,uint64_t D) { // return true if 2^P = 1 (mod D) uint64_t x=2; uint64_t y=1; while (P > (uint64_t)1) { if (P%2 == 1) y = (x * y) %D; x = (x * x) %D; P >>= 1; } return (((x*y)%D) == 1); } uint64_t mersenne_first_factor(uint64_t P) { // P>61 returns, if it exists, the first factor of (2^P - 1) less than 1+dim_step*8*P int64_t bW=(int64_t)(8*P); // set the maximum value for factor D == 1+dim_step*bW int64_t dim_step=(int64_t)2097152; // set the maximum value small primes for sieve int64_t dim_primes=std::min(bW,(int64_t)524288); uint64_t D1mod8=(uint64_t)1; uint64_t D7mod8=(uint64_t)1+2*P; if ((P/(uint64_t)6)%2==0) D7mod8+=(uint64_t)4*P; // vector for find small primes for sieve std::vector<char> Primes_s(dim_primes + 1,true); // vector for find factor primes 1 mod 8 p_k= 1+8*P*k for k>0 std::vector<char> Primes_1mod8(dim_step + 1,true); // vector for find factor 7 mod 8 p_k= 1+2*p+8*P*k or p_k= 1+6*p+8*P*k for k>=0 std::vector<char> Primes_7mod8(dim_step + 1,true); int64_t r_D,r_t,C1,C2,m, mmin; // find small primes for sieve for (int64_t p=3;p*p<=dim_primes;p+=2) if (Primes_s[p]) for (m=p*p;m<=dim_primes;m+=p) Primes_s[m]=false; if ((int64_t)P<=dim_primes) Primes_s[(int64_t)P]=false; // delete multiples of small primes in Primes_1mod8 for (int64_t p=3; p<dim_primes;p+=2) { if (Primes_s[p]) { r_t=(int64_t)(Euclidean_Diophantine(bW, p))%bW; if (r_t>0) r_t-=bW; C2=(int64_t)((-bW+p)*r_t/bW); C1=r_t-bW+p; mmin=(int64_t)(bW+C1+C2); for (m =mmin; m <= dim_step && m>=0; m += p) Primes_1mod8[m] = false; } } // delete multiples of small primes in Primes_7mod8 r_D=(int64_t)D7mod8; for (int64_t p=3; p<dim_primes;p+=2) { if (Primes_s[p]) { r_t=(int64_t)((Euclidean_Diophantine(bW, p))*r_D)%bW; if (r_t>0) r_t-=bW; C2=(int64_t)((-bW+p)*r_t/bW); C1=r_t-bW+p; mmin=(int64_t)(bW+C1+C2); for (m =mmin; m <= dim_step && m>=0; m += p) Primes_7mod8[m] = false; } } Primes_1mod8[0]=false; Primes_7mod8[0]=true; uint64_t D=(uint64_t)0; for (int64_t k=0; k<=dim_step ;D1mod8+=(uint64_t)8*P,D7mod8+=(uint64_t)8*P ,k++) { if (Primes_7mod8[k]) { if (is_2pow_onemod(P,D7mod8)) { D=D7mod8; break; } } if (Primes_1mod8[k]) { if (is_2pow_onemod(P,D1mod8)) { D=D1mod8; break; } } } return D; } int main() { clock_t begin = clock(); std::cout <<"factor: "<< mersenne_first_factor(96500219)<< std::endl; clock_t end = clock(); double time_spent = (double)(end - begin) / CLOCKS_PER_SEC; std::cout << " t: s "<< time_spent << std::endl; return 0; } Last fiddled with by User140242 on 2022-07-19 at 10:30 |
![]() |
![]() |
![]() |
#22 |
Jul 2022
4B16 Posts |
![]()
I add a note, of course the code in the previous post serves as an example in the specific case 96500219. Because I simply replaced the is_pow_onemod function but the new function can be used for D <2^32. I can't edit anymore but it only works for dim_step< 2^32 /8/P. It also always checks 1 + 2P or 1 + 6P even if they are multiples of the small primes sieve.
Last fiddled with by User140242 on 2022-07-19 at 12:28 |
![]() |
![]() |