mersenneforum.org Mp mod D
 Register FAQ Search Today's Posts Mark Forums Read

2022-07-17, 13:17   #12
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·29·127 Posts

Quote:
 Originally Posted by User140242 though I don't have a comparison with the other method
(so no basis for the following claim)
Quote:
 it's fast.
The known factor 61920045361 is ~36 bits. When databases for mersenne.org or mersenne.ca are set up or expanded, they are typically initialized by running TF up to 57 bits or so on every exponent entry. Because exhaustively TF up to that shallow a depth is straightforward.

For comparison, mfaktc on an RTX 2080 GPU running at reduced power, and more_classes (960 of 4620 classes) logging default console format to a rotating disk, does all TF candidates in 1-67 bits for exponent M86000063 in under a minute. That's a space of ~2^67/2/86000063/2 ~ 428,993,733,963 k values 1 or 7 mod 8. (Estimated throughput corresponds to ~0.14 nanosecond per k value, pessimistically assuming 60 seconds total duration. That's pessimistic by a factor of something less than 3, possibly as low as 6/5. Times per class indicate run time ~23 seconds.)

Code:
Starting trial factoring M86000063 from 2^1 to 2^67 (0.35 GHz-days)
k_min =  0
k_max =  857987467925
Using GPU kernel "75bit_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jul 17 07:03 |    0   0.1% |  0.024    n.a. |   1295.66   106037    n.a.%
Jul 17 07:03 |    1   0.2% |  0.024    n.a. |   1295.66   106037    n.a.%
Jul 17 07:03 |   13   0.3% |  0.024    n.a. |   1295.66   106037    n.a.%
...
Jul 17 07:03 | 4605  99.8% |  0.023    n.a. |   1352.00   106037    n.a.%
Jul 17 07:03 | 4612  99.9% |  0.023    n.a. |   1352.00   106037    n.a.%
Jul 17 07:03 | 4617 100.0% |  0.023    n.a. |   1352.00   106037    n.a.%

results.txt contents:

[Sun Jul 17 07:03:29 2022]
UID: Kriesel/asr3-rtx2080, M86000063 has a factor: 61920045361 [TF:1:67:mfaktc 0.21 75bit_mul32_gs]
[Sun Jul 17 07:03:50 2022]
UID: Kriesel/asr3-rtx2080, found 1 factor for M86000063 from 2^ 1 to 2^67 [mfaktc 0.21 75bit_mul32_gs]

[Sun Jul 17 08:04:06 2022]
UID: Kriesel/asr3-rtx2080, no factor for M86000063 from 2^67 to 2^68 [mfaktc 0.21 barrett76_mul32_gs]
[Sun Jul 17 08:04:29 2022]
UID: Kriesel/asr3-rtx2080, no factor for M86000063 from 2^68 to 2^69 [mfaktc 0.21 barrett76_mul32_gs]
[Sun Jul 17 08:05:13 2022]
UID: Kriesel/asr3-rtx2080, no factor for M86000063 from 2^69 to 2^70 [mfaktc 0.21 barrett76_mul32_gs]
The last 3 results are from a single worktodo entry, so the second reflects ~269-268 ~268 covered in 23 seconds; the third ~269 covered in 44 seconds. That last is ~2^69/2/86000063/2 / 44 ~ 39.E9 candidates/sec or an effective rate ~25.6 picoseconds / candidate.
(In reality the latency of each trial factor candidate takes longer, most are sieved out and so not tried, and there are many being tried in parallel.)

Last fiddled with by kriesel on 2022-07-17 at 13:20

 2022-07-17, 13:29 #13 User140242   Jul 2022 3×52 Posts Thanks for the information I am not able to make this type of comparison, surely my algorithm is slower. I just wanted to know if the algorithm could be optimized to make it as fast as the other method but apparently not. The only comparison I can make is using a search for the numbers 1 + 8*p*k. When I did some tests I used a parallel search for the numbers 1 + 8*p*k and 1 + 2*p+8 p* k if (P // 6)% 2 == 1 or 1 + 6*p + 8*p*k if (p // 6)% 2 == 0. This code takes less than 30 seconds to find the factor: Code: int main() { uint64_t P=(uint64_t)86000063; uint64_t D=(uint64_t)1+8*P; for (;D<70000000000;D+=(uint64_t)8*P) if (is_pow_onemod(P,D)) { std::cout <<"factor: "<< D << std::endl; break; } return 0; } Last fiddled with by User140242 on 2022-07-17 at 14:03
 2022-07-18, 08:21 #14 User140242   Jul 2022 4B16 Posts Of course, using the wheel sieve described here https://gist.github.com/user140242 takes a few seconds: Code: #include #include #include #include #include #include int64_t Euclidean_Diophantine( int64_t coeff_a, int64_t coeff_b) { // find y in Diophantine equation coeff_a x + coeff_b y = 1 int64_t k=1; std::vector div_t; std::vector rem_t; std::vector coeff_t; div_t.push_back(coeff_a); rem_t.push_back(coeff_b); coeff_t.push_back((int64_t)0); div_t.push_back((int64_t)div_t[0]/rem_t[0]); rem_t.push_back((int64_t)div_t[0]%rem_t[0]); coeff_t.push_back((int64_t)0); while (rem_t[k]>1) { k=k+1; div_t.push_back((int64_t)rem_t[k-2]/rem_t[k-1]); rem_t.push_back((int64_t)rem_t[k-2]%rem_t[k-1]); coeff_t.push_back((int64_t)0); } k=k-1; coeff_t[k]=-div_t[k+1]; if (k>0) coeff_t[k-1]=(int64_t)1; while (k > 1) { k=k-1; coeff_t[k-1]=coeff_t[k+1]; coeff_t[k]+=(int64_t)(coeff_t[k+1]*(-div_t[k+1])); } if (k==1) return (int64_t)(coeff_t[k-1]+coeff_t[k]*(-div_t[k])); else return (int64_t)(coeff_t[0]); } bool is_pow_onemod(uint64_t P,uint64_t D) { uint64_t D_2=D/(uint64_t)2; int b = 0; uint64_t N=D; while (N) { b++; N >>= 1; } uint64_t imax=(uint64_t)P-b; uint64_t R=(uint64_t)1; for (uint64_t i=0;i<=imax;i++) R=(R%2==1)?(uint64_t)1+R/(uint64_t)2+D_2:R/(uint64_t)2; if (R==(uint64_t)1 <<(b-1)) return true; return false; } int main() { uint64_t P=(uint64_t)86000063; uint64_t D=(uint64_t)1+8*P; int64_t bW=(int64_t)(8*P); int64_t dim_primes=std::min(bW,(int64_t)524288); int64_t dim_step=(int64_t)32768; std::vector Primes_f(dim_primes + 1,true); std::vector Primes_t(dim_step + 1,true); int64_t r_t,C1,C2,m, mmin; for (int64_t p=3;p*p<=dim_primes;p+=2) if (Primes_f[p]) for (m=p*p;m<=dim_primes;m+=p) Primes_f[m]=false; for (int64_t p=3; p0) r_t-=bW; C2=(int64_t)((-bW+p)*r_t/bW); C1=r_t-bW+p; mmin=(int64_t)(bW+C1+C2); for (m =mmin; m <= dim_step && m>=0; m += p) Primes_t[m] = false; } } int64_t k=1; for (;D<70000000000 && k<=dim_step ;D+=(uint64_t)8*P ,k++) if (Primes_t[k]) { if (is_pow_onemod(P,D)) { std::cout <<"factor: "<< D << std::endl; break; } } return 0; }
 2022-07-18, 08:58 #15 kriesel     "TF79LL86GIMPS96gpu17" Mar 2017 US midwest 2×29×127 Posts Sure, to apparently a mere D<70000000000 (~36 bits). TF effort is proportional to 2^bits_depth, so this is ~2^31 fold (2,147,483,648:1) smaller work than to 2^67 used in the earliest mfaktc example posted in this thread. Also the code in the previous post appears to me to be skipping all the 7 mod 8 D values, for another factor of 2:1. Code: uint64_t D=(uint64_t)1+8*P; D+=(uint64_t)8*P See for example the larger known factor of https://www.mersenne.ca/exponent/86000069; 65.3 bits and 7 mod 8. Or the smaller of https://www.mersenne.ca/exponent/86000143; 43.9 bits and 7 mod 8. Last fiddled with by kriesel on 2022-07-18 at 09:12
2022-07-18, 09:05   #16
User140242

Jul 2022

10010112 Posts

Quote:
 Originally Posted by kriesel Sure, to apparently a mere D<70000000000 (~36 bits). Effort is proportional to 2^bits, so this is ~2^31 fold (2,147,483,648:1) smaller work than to 2^67 used in the earliest mfaktc example posted in this thread.

We agree that the algorithm is slower.

Quote:
 Also it appears to me to be skipping all the 7 mod 8 D values.
Yes, as mentioned before, a second block should be made to work in parallel for the numbers 7 mod 8.

I can explain how the algorithm works but I am unable to optimize the code. What I posted is an example to show how it works.

Last fiddled with by User140242 on 2022-07-18 at 09:16

 2022-07-18, 10:06 #17 Batalov     "Serge" Mar 2008 Phi(4,2^7658614+1)/2 22·5·503 Posts There is nothing that needs explaining. You are implementing exponentiation by millions of multiplications. Instead, this method exists which is > 1,000,000 times faster for current run-of-the-mill worktask units. (The current range of testing is around P ~ 110,000,000) This is a longer explanation to the algorithm that was already given to you for reading > 3 times - https://www.mersenne.org/various/math.php (3rd section, "Trial Factoring")
2022-07-18, 10:16   #18
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×29×127 Posts

Quote:
 Originally Posted by User140242 We agree that the algorithm is slower. Yes, as mentioned before, a second block should be made to work in parallel for the numbers 7 mod 8. I can explain how the algorithm works but I am unable to optimize the code. What I posted is an example to show how it works.
I do not see any mention in your posts preceding the quoted one, regarding 7 mod 8.
The timings you give for your code are for unspecified hardware so don't mean a lot. Although they do serve to show that it may be several orders of magnitude slower per candidate factor. Or run time is dominated by setup of the small-primes table etc.
p=86000063 is ~26.36 bits, so for D<36 bits, k< D/2/p ~ 400. Only ~20 TF candidates will be prime and tried. That's too few for your program's timings to be meaningful. It would help to add time outputs to your code at start, at completion of setup, and at exit, and perhaps output kmin, kmax.

If you haven't yet, I suggest you review the source code of prime95 TF or mfactor and of mfaktc or mfakto for some ideas of how to make TF code fast.
As another point of comparison, prime95 V30.7b9 on i5-1035G1 (4 core & HT),
Factor=86000063,48,56
Code:
[Jul 18 04:36:14] Starting trial factoring of M86000063 to 2^56
[Jul 18 04:36:15] M86000063 has a factor: 61920045361 (TF:48-56)
[Jul 18 04:36:15] Trial factoring M86000063 to 2^50.
[Jul 18 04:36:15] Trial factoring M86000063 to 2^51.
[Jul 18 04:36:18] Trial factoring M86000063 to 2^52.
[Jul 18 04:36:21] Trial factoring M86000063 to 2^53.
[Jul 18 04:36:24] Trial factoring M86000063 to 2^54.
[Jul 18 04:36:28] Trial factoring M86000063 to 2^55.
[Jul 18 04:36:32] Trial factoring M86000063 to 2^56.
[Jul 18 04:36:35] M86000063 no factor from 2^48 to 2^56, Wh4: 39743C51
The preceding shows bit level timings almost invariant with depth, appearing dominated by setup time. Going deeper,

Factor=86000063,56,64
Code:
[Jul 18 04:42:15] Starting trial factoring of M86000063 to 2^64
[Jul 18 04:42:15] Trial factoring M86000063 to 2^57.
[Jul 18 04:42:18] Trial factoring M86000063 to 2^58.
[Jul 18 04:42:21] Trial factoring M86000063 to 2^59.
[Jul 18 04:42:24] Trial factoring M86000063 to 2^60.
[Jul 18 04:42:27] Trial factoring M86000063 to 2^61.
[Jul 18 04:42:34] Trial factoring M86000063 to 2^62.
[Jul 18 04:42:44] Trial factoring M86000063 to 2^63.
[Jul 18 04:43:02] M86000063 no factor from 2^56 to 2^63, Wh4: 39743C51
[Jul 18 04:43:02] Trial factoring M86000063 to 2^64.
[Jul 18 04:43:38] M86000063 no factor from 2^63 to 2^64, Wh4: 39743C51
It's mostly setup time per bit level of ~3 seconds until 2^62. TF from 64 bits to 65 ran in 60 seconds. Prime95 appears to be running 8 threads of TF, using 90% of the logical CPU core cycles available (and there were numerous other processes running on that CPU at the time, including many tabs open in this web browser session, taking the other 10%)

AFAIK, all of Mn, 50M< n < 1G have been TF past 2^73 already. https://www.mersenne.org/report_fact...i=&tftobits=72

Last fiddled with by kriesel on 2022-07-18 at 10:28

2022-07-18, 10:27   #19
User140242

Jul 2022

3×52 Posts

Quote:
 Originally Posted by kriesel I do not see any mention in your posts preceding the quoted one, regarding 7 mod 8.

So I explained:

Quote:
 When I did some tests I used a parallel search for the numbers 1 + 8*p*k and 1 + 2*p+8 p* k if (P // 6)% 2 == 1 or 1 + 6*p + 8*p*k if (p // 6)% 2 == 0.

1+2*p+8 p*k or or 1+6*p+8*p*k they are numbers 7 mod 8

I am not an expert as I said I am not able to do what you ask me.

2022-07-19, 06:18   #20
Batalov

"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

22·5·503 Posts

Quote:
 Originally Posted by User140242 example is_pow_onemod(11,23) returns True

Code:
def is_pow_onemod_fromWiki(P,D):
# return true if 2^Mp = 1 (mod D)
x=2
y=1
while (P > 1):
if (P%2 == 1):
y = (x * y) %D
x = (x * x) %D
P >>= 1
return (((x*y)%D) == 1)

print is_pow_onemod_fromWiki(11,23)
print is_pow_onemod_fromWiki(23,47)
print is_pow_onemod_fromWiki(410143, 4101431)
print is_pow_onemod_fromWiki(9650023,231600553)
print is_pow_onemod_fromWiki(96500219,3088007009)
\$ time python ex.py
True
True
True
True
True

real 0m0.009s

You will need to implement more for (x*y)%D when D > 2^32, but this works in the same domain as your code.

 2022-07-19, 10:10 #21 User140242   Jul 2022 3·52 Posts Thanks but we had already made it clear that it was much slower. The algorithm you posted is faster: Code: #include #include #include #include #include #include #include int64_t Euclidean_Diophantine( int64_t coeff_a, int64_t coeff_b) { // return y in Diophantine equation coeff_a x + coeff_b y = 1 int64_t k=1; std::vector div_t; std::vector rem_t; std::vector coeff_t; div_t.push_back(coeff_a); rem_t.push_back(coeff_b); coeff_t.push_back((int64_t)0); div_t.push_back((int64_t)div_t[0]/rem_t[0]); rem_t.push_back((int64_t)div_t[0]%rem_t[0]); coeff_t.push_back((int64_t)0); while (rem_t[k]>1) { k=k+1; div_t.push_back((int64_t)rem_t[k-2]/rem_t[k-1]); rem_t.push_back((int64_t)rem_t[k-2]%rem_t[k-1]); coeff_t.push_back((int64_t)0); } k=k-1; coeff_t[k]=-div_t[k+1]; if (k>0) coeff_t[k-1]=(int64_t)1; while (k > 1) { k=k-1; coeff_t[k-1]=coeff_t[k+1]; coeff_t[k]+=(int64_t)(coeff_t[k+1]*(-div_t[k+1])); } if (k==1) return (int64_t)(coeff_t[k-1]+coeff_t[k]*(-div_t[k])); else return (int64_t)(coeff_t[0]); } bool is_2pow_onemod(uint64_t P,uint64_t D) { // return true if 2^P = 1 (mod D) uint64_t x=2; uint64_t y=1; while (P > (uint64_t)1) { if (P%2 == 1) y = (x * y) %D; x = (x * x) %D; P >>= 1; } return (((x*y)%D) == 1); } uint64_t mersenne_first_factor(uint64_t P) { // P>61 returns, if it exists, the first factor of (2^P - 1) less than 1+dim_step*8*P int64_t bW=(int64_t)(8*P); // set the maximum value for factor D == 1+dim_step*bW int64_t dim_step=(int64_t)2097152; // set the maximum value small primes for sieve int64_t dim_primes=std::min(bW,(int64_t)524288); uint64_t D1mod8=(uint64_t)1; uint64_t D7mod8=(uint64_t)1+2*P; if ((P/(uint64_t)6)%2==0) D7mod8+=(uint64_t)4*P; // vector for find small primes for sieve std::vector Primes_s(dim_primes + 1,true); // vector for find factor primes 1 mod 8 p_k= 1+8*P*k for k>0 std::vector Primes_1mod8(dim_step + 1,true); // vector for find factor 7 mod 8 p_k= 1+2*p+8*P*k or p_k= 1+6*p+8*P*k for k>=0 std::vector Primes_7mod8(dim_step + 1,true); int64_t r_D,r_t,C1,C2,m, mmin; // find small primes for sieve for (int64_t p=3;p*p<=dim_primes;p+=2) if (Primes_s[p]) for (m=p*p;m<=dim_primes;m+=p) Primes_s[m]=false; if ((int64_t)P<=dim_primes) Primes_s[(int64_t)P]=false; // delete multiples of small primes in Primes_1mod8 for (int64_t p=3; p0) r_t-=bW; C2=(int64_t)((-bW+p)*r_t/bW); C1=r_t-bW+p; mmin=(int64_t)(bW+C1+C2); for (m =mmin; m <= dim_step && m>=0; m += p) Primes_1mod8[m] = false; } } // delete multiples of small primes in Primes_7mod8 r_D=(int64_t)D7mod8; for (int64_t p=3; p0) r_t-=bW; C2=(int64_t)((-bW+p)*r_t/bW); C1=r_t-bW+p; mmin=(int64_t)(bW+C1+C2); for (m =mmin; m <= dim_step && m>=0; m += p) Primes_7mod8[m] = false; } } Primes_1mod8[0]=false; Primes_7mod8[0]=true; uint64_t D=(uint64_t)0; for (int64_t k=0; k<=dim_step ;D1mod8+=(uint64_t)8*P,D7mod8+=(uint64_t)8*P ,k++) { if (Primes_7mod8[k]) { if (is_2pow_onemod(P,D7mod8)) { D=D7mod8; break; } } if (Primes_1mod8[k]) { if (is_2pow_onemod(P,D1mod8)) { D=D1mod8; break; } } } return D; } int main() { clock_t begin = clock(); std::cout <<"factor: "<< mersenne_first_factor(96500219)<< std::endl; clock_t end = clock(); double time_spent = (double)(end - begin) / CLOCKS_PER_SEC; std::cout << " t: s "<< time_spent << std::endl; return 0; } Last fiddled with by User140242 on 2022-07-19 at 10:30
 2022-07-19, 12:27 #22 User140242   Jul 2022 4B16 Posts I add a note, of course the code in the previous post serves as an example in the specific case 96500219. Because I simply replaced the is_pow_onemod function but the new function can be used for D <2^32. I can't edit anymore but it only works for dim_step< 2^32 /8/P. It also always checks 1 + 2P or 1 + 6P even if they are multiples of the small primes sieve. Last fiddled with by User140242 on 2022-07-19 at 12:28

All times are UTC. The time now is 16:21.

Sat Feb 4 16:21:30 UTC 2023 up 170 days, 13:50, 1 user, load averages: 1.59, 1.15, 0.94