mersenneforum.org

Thinking out loud about getting under 20M unfactored exponents (https://www.mersenneforum.org/showthread.php?t=22476)

petrw1 2020-12-31 05:56

Is a roundoff error ok?
 
[Dec 30 17:46] P-1 on M43012451 with B1=815000, B2=TBD
[Dec 30 17:46] Setting affinity to run helper thread 1 on CPU core #2
[Dec 30 17:46] Using AVX FFT length 2240K, Pass1=448, Pass2=5K, clm=4, 2 threads
[Dec 30 20:54] M43012451 stage 1 complete. 2351934 transforms. Time: 11283968.671 ms.
[Dec 30 20:54] Stage 1 GCD complete. Time: 12759.703 ms.
[Dec 30 20:54] Available memory is 3977MB.
[Dec 30 20:54] With trial factoring done to 2^74, optimal B2 is 46*B1 = 37490000. If no prior P-1, chance of a new factor is 4.7%
[Dec 30 20:54] D: 210, relative primes: 219, stage 2 primes: 2224734, pair%=84.02
[Dec 30 20:54] Using 3966MB of memory.
[Dec 30 20:54] Stage 2 init complete. 2391 transforms. Time: 23210.622 ms.
[Dec 30 21:43] M43012451 stage 2 is 12.802% complete. Time: 2913325.276 ms.
[Dec 30 22:06] Restarting worker with new memory settings.
[Dec 30 22:06]
[Dec 30 22:06] P-1 on M43012451 with B1=815000, B2=TBD
[Dec 30 22:06] Using AVX FFT length 2240K, Pass1=448, Pass2=5K, clm=4, 2 threads
[Dec 30 22:06] Setting affinity to run helper thread 1 on CPU core #2
[Dec 30 22:06] Resuming P-1 in stage 2 with B2=37490000
[Dec 30 22:06] Available memory is 2006MB.
[Dec 30 22:06] D: 210, relative primes: 108, stage 2 primes: 1812724, pair%=70.58
[Dec 30 22:06] Using 1994MB of memory.
[Dec 30 22:06] Stage 2 init complete. 1324 transforms. Time: 11045.366 ms.
[Dec 30 22:06] M43012451 stage 2 is 37.871% complete.
[Dec 30 22:55] M43012451 stage 2 is 49.781% complete. Time: 2918571.851 ms.
[Dec 30 23:32] [B][U]Possible roundoff error (0.5)[/U][/B], backtracking to last save file.
[Dec 30 23:32] Setting affinity to run helper thread 1 on CPU core #2
[Dec 30 23:32] Using AVX FFT length 2240K, Pass1=448, Pass2=5K, clm=4, 2 threads
[Dec 30 23:32] Resuming P-1 in stage 2 with B2=37490000
[Dec 30 23:32] Available memory is 2006MB.
[Dec 30 23:32] Using 1994MB of memory.
[Dec 30 23:32] Stage 2 init complete. 1385 transforms. Time: 9182.739 ms.
[Dec 30 23:32] M43012451 stage 2 is 53.929% complete.

Prime95 2020-12-31 08:09

[QUOTE=petrw1;567786]Ok maybe that's a little dramatic but I did crash 30.4 [/QUOTE]

Excellent. I'll have a fix soon.

[QUOTE=axn;567795]Omitting B-S simplifies the code somewhat, but yields only minor speedup. The main speed up comes from better prime pairing. Previously, the prime pairing was like 10-15%. Now it is more like 85-95%. I don't have much insight into how this is achieved, but I'm guessing the increased number of temps (much higher than the relprimes(D)) is somehow involved.[/QUOTE]

The idea came from Mihai Preda. If D=30=2*3*5, the four relative primes 1,7,11,13 cover the candidates around a particular multiple of D between B1 and B2, each written as Dmultiple +/- relprime. If both Dmultiple - relprime and Dmultiple + relprime are prime, then a pairing occurs, requiring half the work.

In the old code, if we had more memory we would change D to 210=2*3*5*7, which increased speed in two ways. One, it is faster to step from B1 to B2 by 210 rather than 30. Two, the prime pairing chances go up a little.

Mihai's idea is that instead of using extra memory to increase D, we use more than the minimum number of relative primes. For D=30, if we allocate 8 relative primes then each prime between B1 and B2 can be represented by two different Dmultiples +/- relprime. Prime95 now has two chances to pair a prime instead of one. We lose some speed by stepping by a smaller increment, but gain much more from better pairing.
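A toy Python sketch of the pairing effect described above. This is not Prime95's stage 2 code; the function names `primes_between` and `pair_fraction` are made up for illustration, the bounds are tiny, and pairing is done greedily from the smallest prime upward. It only counts how many primes between B1 and B2 can be matched as Dmultiple - relprime and Dmultiple + relprime for a given relprime list.

```python
def primes_between(lo, hi):
    """Return all primes p with lo < p <= hi, using a simple sieve."""
    sieve = bytearray([1]) * (hi + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(hi ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, hi + 1, i)))
    return [p for p in range(lo + 1, hi + 1) if sieve[p]]


def pair_fraction(b1, b2, d, relprimes):
    """Fraction of primes in (b1, b2] paired as k*d - r and k*d + r."""
    primes = primes_between(b1, b2)
    unpaired = set(primes)
    paired = 0
    for p in primes:  # ascending order, so only look for a partner above p
        if p not in unpaired:
            continue
        for r in relprimes:
            if (p + r) % d == 0:   # p = k*d - r for some multiple k*d
                q = p + 2 * r      # mirror image k*d + r
                if q in unpaired:
                    unpaired -= {p, q}
                    paired += 2
                    break
    return paired / len(primes)


if __name__ == "__main__":
    minimal = [1, 7, 11, 13]
    doubled = [1, 7, 11, 13, 17, 19, 23, 29]
    for name, rels in (("4 relprimes", minimal), ("8 relprimes", doubled)):
        pct = 100 * pair_fraction(1000, 100000, 30, rels)
        print(f"D=30, {name}: pair% = {pct:.2f}")
```

On a very small range, B1=10 and B2=50 with D=30, the four-relprime list pairs 8 of the 11 primes, and the eight-relprime list pairs 10 of 11, because 13 and 47 can now be matched around the multiple 30 using relprime 17.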

I'm not a fan of your idea to leave out the larger unpaired primes. The idea works, but it just seems "untidy".

nordi 2020-12-31 14:20

The new code seems to have a race condition during startup that segfaults mprime roughly 1 out of 10 times on my system. I have
[LIST][*]32 workers, configured to run [I]ECM2=N/A,1,2,1277,-1,100000000,1000000000,1,[/I][*]StaggerStarts=0 in prime.txt to make all threads launch at about the same time[*]"rm e*" before each start to remove the old state[/LIST]When the segfault happens, it is immediately after starting mprime. The last thing to be printed on the screen is

[Worker #32 Dec 31 15:09] Setting affinity to run worker on CPU core #16

Kernel log shows

mprime[22843]: segfault at 10 ip 00007f3c4a9cae40 sp 00007f3c3d99c7e8 error 4 in libpthread-2.26.so[7f3c4a9c0000+19000]


I tried the same setup with version 29.8. In 40 runs, it did not segfault a single time.

nordi 2020-12-31 14:37

While investigating the segfault issue, I found that when doing

ECM2=N/A,1,2,11,-1,100000000,1000000000,1

the factor is found and then mprime goes to waiting mode. When I stop it with CTRL+C, I get a lot of

[Main thread Dec 31 15:22] In write_gwnum, unexpected len == 0 failure

messages. Version 29.8 does not print this error.

However, while trying that out I also segfaulted version 29.8 a few times, so maybe the race condition is not in the new code after all.



And all this because I wanted to run some more benchmarks :lol:

nordi 2020-12-31 16:01

I finally got the benchmarks done. For

ECM2=N/A,1,2,1619,-1,10000000,100000000,1,

which uses FMA3 FFT length 96 on my Ryzen 3950X with 16 cores, I had these results:

16 workers:
[LIST][*]version 29.8: 44 seconds[*]version 30.4b3: 46 seconds[/LIST]32 workers:
[LIST][*]version 29.8: 62 seconds[*]version 30.4b3: 65 seconds[/LIST]So the new version is slightly slower for stage 1 in this setup. Also, hyperthreading gives ~40% more stage 1 throughput in both versions.
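For reference, the hyperthreading figure can be reproduced from the four timings above. A small Python sketch, assuming every worker does the same amount of stage 1 work in the quoted time, so that throughput is simply workers divided by seconds (the helper names are made up):

```python
# Seconds for stage 1, keyed by (version, number of workers), from the post.
timings = {
    ("29.8", 16): 44,
    ("30.4b3", 16): 46,
    ("29.8", 32): 62,
    ("30.4b3", 32): 65,
}


def throughput(version, workers):
    """Units of stage 1 work finished per second (one unit per worker)."""
    return workers / timings[(version, workers)]


def ht_gain(version):
    """Relative throughput gain of 32 workers over 16 workers."""
    return throughput(version, 32) / throughput(version, 16) - 1.0


for version in ("29.8", "30.4b3"):
    print(f"{version}: {100 * ht_gain(version):.1f}% more throughput with 32 workers")
```

Under that assumption both versions come out at roughly 41 to 42% more throughput with 32 workers than with 16.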

masser 2020-12-31 20:02

[QUOTE=petrw1;567792]
Woot Woot[/QUOTE]

Indeed. Version 30.4 will shave [B]months[/B] off of my effort to have fewer than 2000 unfactored exponents in the 14.0M range. Thanks, George!

It took me a little while to get good apples-to-apples comparisons. Note that with v. 30.4 I have flexibility to improve both the factoring odds and the runtimes. Here are my benchmarks:
[B]First Machine[/B]: i5-4690s, with 32 GB 1600 MHz DDR4 RAM (only 7GB allocated to mprime)

[B]PM1[/B]
Exponent, B1, B2, runtime_30.3(min:sec), runtime_30.4(min:sec)
72713617, 104771, 2619275, 70:32, 49:50
95675581, 114357, 3316353, 111:55, 78:35
102001051, 37123, 928075, 39:46, 28:04

[B]ECM[/B]
Exponent, NumCurves, B1, B2, runtime_30.3(min:sec), runtime_30.4(min:sec)
4312787, 5, 50000, 6650000, 70:50, 46:25
5094979, 6, 50000, 6350000, 81:19, 61:00

[B]14.0M factoring tasks[/B] (times are hour:min:sec)
ECM: 6 t25 (B1=50k,B2=100B1+1) curves; with 30.3: 5:17:13; with 30.4: 4:05:26
P-1: B1=5M, B2=135M; with 30.3: 7:47:37; with 30.4: 4:40:50


[B]Second Machine[/B]: i7-6700, with 8 GB 2400 MHz DDR4 RAM (only 3GB allocated to mprime)

[B]PM1[/B]
Exponent, B1, B2, runtime_30.3(min:sec), runtime_30.4(min:sec)
50077721, 280000, 280000, 25:47, 26:20 <---- notice this is a stage one only run
72713617, 104771, 2304962, 42:32, 32:57
95675581, 114357, 2973282, 68:48, 53:48
102001051, 37123, 816706, 24:51, 19:53

[B]ECM[/B]
Exponent, NumCurves, B1, B2, runtime_30.3(min:sec), runtime_30.4(min:sec)
4312787, 5, 50000, 5250000, 42:03, 27:31
5094979, 6, 50000, 5450000, 53:01, 41:48

[B]14.0M factoring tasks[/B] (times are hour:min:sec)
ECM: 6 t25 (B1=50k,B2=100B1+1) curves; with 30.3: 3:21:26; with 30.4: 2:37:30
P-1: B1=7M, B2=210M; with 30.3: 6:59:36; with 30.4: 5:05:28
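To turn the runtimes above into percentages, here is a small Python sketch. It uses only the first machine's rows, copied from the tables; `to_seconds` and `time_saved` are hypothetical helper names, not anything from mprime.

```python
def to_seconds(stamp):
    """Convert 'min:sec' or 'hour:min:sec' into seconds."""
    total = 0
    for part in stamp.split(":"):
        total = total * 60 + int(part)
    return total


def time_saved(old, new):
    """Fraction of the old runtime that the new version saves."""
    return 1.0 - to_seconds(new) / to_seconds(old)


# (label, runtime with 30.3, runtime with 30.4) for the first machine.
rows = [
    ("P-1 M72713617", "70:32", "49:50"),
    ("P-1 M95675581", "111:55", "78:35"),
    ("P-1 M102001051", "39:46", "28:04"),
    ("ECM M4312787", "70:50", "46:25"),
    ("ECM M5094979", "81:19", "61:00"),
    ("14.0M ECM task", "5:17:13", "4:05:26"),
    ("14.0M P-1 task", "7:47:37", "4:40:50"),
]

for label, old, new in rows:
    print(f"{label}: {100 * time_saved(old, new):.1f}% less time with 30.4")
```

The same helpers work for the second machine's rows, including the stage 1 only run, where the saving comes out slightly negative.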

Prime95 2020-12-31 23:09

[QUOTE=nordi;567846]I finally got the benchmarks done. For
ECM2=N/A,1,2,1619,-1,10000000,100000000,1,
which uses FMA3 FFT length 96 on my Ryzen 3950X with 16 cores, I had these results:
<snip>
So the new version is slightly slower for step 1 in this setup.[/QUOTE]

In 30.4 try reducing PracSearch. Here is the explanation from the next build's undoc.txt:

In ECM stage 1, the program examines several different Lucas-chains looking for the shortest.
For ECM on very small numbers, it may be beneficial to reduce the search effort as the
work saved is pretty small. For ECM on larger numbers, it might pay to increase the search
effort. I have not studied the optimal search effort, so the current default of 7 is a
complete guess. To change the search effort, add this to prime.txt:
PracSearch=n (default is 7)
Values from 1 to 50 are supported.

petrw1 2021-01-01 03:59

Looks like 30.4 Missed a P1 Factor
 
Interestingly this is the same exponent I noted in the RoundOff error a few posts back.
As well, my PrimeNet appeared to have crashed last night; it was not running this morning.
When I restarted it, stage 2 started at 0.00% and completed without crashing, so I assumed all was well until I noticed it did NOT report a factor.

This says with the bounds used below it should have been found.
[url]https://www.mersenne.ca/exponent/43012451[/url]

I'm going to run it again with the same parms and then once more with specific B1/B2.

[CODE][Main thread Dec 31 10:12] Mersenne number primality test program version 30.4
[Main thread Dec 31 10:12] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 4x256 KB, L3 cache size: 6 MB
[Main thread Dec 31 10:12] Starting workers.
[Comm thread Dec 31 17:35] Sending result to server: UID: petrw1/Rocky, M43012451 completed P-1, B1=815000, B2=81500000, Wi4: 5C397974
[Comm thread Dec 31 17:35]
[Comm thread Dec 31 17:35] PrimeNet error 40: No assignment
[Comm thread Dec 31 17:35] P-1 result for M43012451 was not needed
[Comm thread Dec 31 17:35] Done communicating with server.[/CODE]

[CODE][Dec 31 10:12] Worker starting
[Dec 31 10:12] Setting affinity to run worker on CPU core #1
[Dec 31 10:12]
[Dec 31 10:12] P-1 on M43012451 with B1=815000, B2=TBD
[Dec 31 10:12] Using AVX FFT length 2240K, Pass1=448, Pass2=5K, clm=4, 2 threads
[Dec 31 10:12] Setting affinity to run helper thread 1 on CPU core #2
[Dec 31 10:12] Available memory is 2000MB.
[Dec 31 10:12] D: 210, relative primes: 108, stage 2 primes: 2462241, pair%=59.97
[Dec 31 10:12] Using 1994MB of memory.
[Dec 31 10:13] Stage 2 init complete. 1332 transforms. Time: 39860.484 ms.
[Dec 31 10:13] M43012451 stage 2 is 0.000% complete.
[Dec 31 17:35] M43012451 stage 2 complete. 4286150 transforms. Time: 26552061.456 ms.
[Dec 31 17:35] Stage 2 GCD complete. Time: 12513.374 ms.
[Dec 31 17:35] M43012451 completed P-1, B1=815000, B2=81500000, Wi4: 5C397974
[/CODE]

petrw1 2021-01-01 16:36

Worked this time....
 
[QUOTE=petrw1;567921]Interestingly this is the same exponent I noted in the RoundOff error a few posts back.

I'm going to run it again with the same parms and then once more with specific B1/B2.
Same parms rerun found the factor this time,
Second rerun with specific B1/B2 was ignored.
[/QUOTE]

What I didn't notice when I posted that error yesterday is that it lists B2=81500000=100xB1.
But when that same run had the roundoff error and then crashed overnight, it had B2=37490000=46xB1.
It seems that after the crash it changed the B2 itself and, in a confused state, missed the factor.
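A quick arithmetic check of the B2 values that show up in the logs for this exponent, as a Python sketch. The multipliers and B2 values are copied from the log lines in this thread; nothing here comes from the program itself.

```python
B1 = 815_000

# (where the B2 appears, multiplier of B1, B2 printed in the log)
cases = [
    ("result line after the crash and restart", 100, 81_500_000),
    ("run with the roundoff error", 46, 37_490_000),
    ("rerun", 41, 33_415_000),
]

for where, multiplier, logged_b2 in cases:
    status = "matches" if multiplier * B1 == logged_b2 else "does NOT match"
    print(f"{where}: {multiplier} x {B1} = {multiplier * B1} ({status} the log)")
```

All three are exact multiples of B1, so the three runs really did use three different stage 2 bounds.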

On the rerun

[CODE][Main thread Dec 31 22:04] Mersenne number primality test program version 30.4
[Main thread Dec 31 22:04] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 4x256 KB, L3 cache size: 6 MB
[Comm thread Jan 1 06:07] Sending result to server: UID: petrw1/Rocky, M43012451 has a factor: 772533645156306046237663 (P-1, B1=815000, B2=33415000)
[Comm thread Jan 1 06:07]
[Comm thread Jan 1 06:07] PrimeNet error 40: No assignment
[Comm thread Jan 1 06:07] Factoring result for M43012451 was not needed
[Comm thread Jan 1 06:07] Done communicating with server.
[/CODE]

[CODE][Dec 31 22:04] Worker starting
[Dec 31 22:04] Setting affinity to run worker on CPU core #1
[Dec 31 22:04]
[Dec 31 22:04] P-1 on M43012451 with B1=815000, B2=TBD
[Dec 31 22:04] Using AVX FFT length 2240K, Pass1=448, Pass2=5K, clm=4, 2 threads
[Dec 31 22:04] Setting affinity to run helper thread 1 on CPU core #2
[Jan 1 00:45] M43012451 stage 1 complete. 2351934 transforms. Time: 9604623.566 ms.
[Jan 1 00:45] Stage 1 GCD complete. Time: 12391.663 ms.
[Jan 1 00:45] Available memory is 2000MB.
[Jan 1 00:45] With trial factoring done to 2^74, optimal B2 is 41*B1 = 33415000. If no prior P-1, chance of a new factor is 4.61%
[Jan 1 00:45] D: 210, relative primes: 108, stage 2 primes: 1990631, pair%=69.23
[Jan 1 00:45] Using 1994MB of memory.
[Jan 1 00:45] Stage 2 init complete. 1296 transforms. Time: 12671.548 ms.
[Jan 1 06:07] M43012451 stage 2 complete. 3181698 transforms. Time: 19318016.339 ms.
[Jan 1 06:07] Stage 2 GCD complete. Time: 11868.497 ms.
[Jan 1 06:07] P-1 found a factor in stage #2, B1=815000, B2=33415000.
[Jan 1 06:07] M43012451 has a factor: 772533645156306046237663 (P-1, B1=815000, B2=33415000)
[Jan 1 06:07]
[Jan 1 06:07] P-1 on M43012451 with B1=815000, B2=17300000
[Jan 1 06:07] Setting affinity to run helper thread 1 on CPU core #2
[Jan 1 06:07] Using AVX FFT length 2240K, Pass1=448, Pass2=5K, clm=4, 2 threads
[Jan 1 06:07] M43012451 already tested to B1=815000 and B2=33415000.[/CODE]

petrw1 2021-01-01 17:30

BTW
 
I wouldn't rule out hardware problems on my side.
This PC has a reputation of freezing or crashing a few times a year.

masser 2021-01-01 17:38

I confirmed that mprime finds that factor with a clean run on one of my systems:

[CODE][Fri Jan 1 10:21:21 2021]
P-1 found a factor in stage #2, B1=1000000, B2=18000000.
M43012451 has a factor: 772533645156306046237663 (P-1, B1=1000000, B2=18000000)[/CODE]
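The factor can also be checked independently of mprime. A minimal Python sketch, using only the exponent and factor from the logs above: it tests that 2^p leaves remainder 1 modulo the factor, then writes the factor as 2*k*p + 1 and factors k by trial division. The trial division is only practical because P-1 finding the factor suggests k is fairly smooth; that is an assumption of this sketch, and `prime_factors` is a made-up helper.

```python
p = 43012451
f = 772533645156306046237663  # factor reported in the logs

# f divides 2^p - 1 exactly when 2^p leaves remainder 1 modulo f.
divides = pow(2, p, f) == 1


def prime_factors(n):
    """Factor n by trial division (slow unless n is smooth)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


# Any factor of M_p with p an odd prime has the form 2*k*p + 1.
k, remainder = divmod(f - 1, 2 * p)
k_factors = prime_factors(k)

print("f divides M43012451:", divides)
print("remainder of (f - 1) / (2 * p):", remainder)
print("prime factors of k:", k_factors)
```

Comparing the two largest prime factors of k with B1 and B2 shows which bounds are enough for P-1 to find this factor.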


All times are UTC.