mersenneforum.org  

2021-07-21, 00:58   #23
Viliam Furik ("Viliam Furík", Jul 2018, Martin, Slovakia)

Quote:
Originally Posted by masser View Post
Are you referring to the P-1 column here?

I tried looking at the poorly P-1'ed exponents and the remaining exponents in that range appear to have B1 >= 200K.

I'm guessing those stalled exponents in the P-1 column must be some kind of server hiccup?
According to this report from mersenne.ca, there really do seem to be exponents which don't have P-1 done. That explains the 100M to 104M ranges, but it doesn't explain the 94M to 99M ranges. Those may be caused by bounds that were too small, so the server says they need more P-1.
2021-07-21, 02:16   #24
masser (Jul 2003, wear a mask)

Quote:
Originally Posted by Viliam Furik View Post
According to this report from mersenne.ca, there really do seem to be exponents which don't have P-1 done. That explains the 100M to 104M ranges, but it doesn't explain the 94M to 99M ranges. Those may be caused by bounds that were too small, so the server says they need more P-1.
Ah, that's a little helpful. I found this slice in 100M:
Attached thumbnail: Screen Shot 2021-07-20 at 8.15.16 PM.png (41.7 KB)
2021-07-21, 08:39   #25
drkirkby ("David Kirkby", Jan 2021, Althorne, Essex, UK)

Quote:
Originally Posted by kriesel View Post
Accuracy matters. I believe "all" is false, contrary to what we see and measure and document in actual operation. The numbers I used in the example were from memory on a 64GB ram dual-8core-Xeon system when it was configured for 4 workers for an extended period. 300GB to one 103M P-1 stage2 is a sign of probable misconfiguration, given how little the minimum and desirable values are documented to be, and how far out on diminishing returns that 300GB/103M is likely to be.
I don't believe the 103M P-1 stage 2 use of 300 GB is a sign of misconfiguration. P-1 factoring was changed in version 30.6, and it will now use a lot more RAM. The mprime 30.6b4 text-based interface allows me to set the maximum RAM anywhere between 0.000000 and 338.915039 GB - see below.
Code:
Daytime P-1/P+1/ECM stage 2 memory in GB (0.300000): 380
Please enter a value between 0.000000 and 338.915039. 338
Nighttime P-1/P+1/ECM stage 2 memory in GB (0.300000): 380
Please enter a value between 0.000000 and 338.915039. 338
I contacted George to ask if it was possible to set a value above 338.915039 GB. He wrote:

Yes, simply edit local.txt and restart mprime.

The restriction is there to prevent users from creating a RAM thrashing situation.

I'm sure George would not have told me how to set a limit higher than 338.915039 GB if doing so would misconfigure the software.
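For anyone wanting to try the same, George's suggestion amounts to editing the memory line in local.txt by hand; the plain Memory=<MB> form also appears later in this thread. A hedged sketch only - the value below is an arbitrary example (350 GB expressed in MB), not a recommendation, and you should check the readme for your version before relying on the exact syntax:

```
Memory=358400
```

Restart mprime after saving the file for the new limit to take effect.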
2021-07-21, 13:22   #26
axn (Jun 2003)

Quote:
Originally Posted by kriesel View Post
The more I look at it, the less sure I am of what you meant. I ran a test with no per-worker availability limits, only an overall limit. When the second worker started P-1 stage 2, the first stopped and reset itself to use less memory, BUT did not lose its previous 9% progress.
I did not mean to imply that it would lose the progress. But, phew, I wasn't imagining things. That's a relief.

Quote:
Originally Posted by kriesel View Post
At 56000M, one worker is doing 0.608%/99.5 seconds ~ 0.00611%/second; ~16367 seconds to perform.
48 seconds are consumed in the transition of the first worker to smaller allowed memory, and setup of the second worker's stage 2.
At 28000M, the same worker is doing 0.573%/98 seconds ~ 0.00585%/second; ~17094 seconds to perform.
The difference is 0.000263%/second.
I am not sure the projections are accurate. For the first one, if we look at:
Code:
[Jul 20 13:27:30] Stage 2 init complete. 12157 transforms. Time: 98.042 sec.
...
[Jul 20 13:52:27] M105207229 stage 2 is 9.08% complete. Time: 98.176 sec.
24:57 for 9.08% gives 16486 seconds for the whole.
Then:
Code:
[Jul 20 13:53:16] M105207229 stage 2 is 9.09% complete.
[Jul 20 13:59:47] M105207229 stage 2 is 11.39% complete. Time: 98.387 sec.
6:31 for 2.30% = 17000 sec.

But let's say your numbers are accurate. You say that the other worker, which did the whole of stage 2 with reduced memory, would finish in 16600. Did it by any chance use a smaller B2? If not, then you should consider the increased projected time for the first one as a penalty for restarting with reduced memory -- a penalty you wouldn't have paid had you run the whole test at reduced memory. Also, a quick sanity check: 50% at a 16300 rate and 50% at a 17100 rate is very roughly a 16700 rate. So I'm not sure how you can say that you will gain time by allowing it to run at full memory for half the time.
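For anyone wanting to double-check the arithmetic, here is a small Python sketch of the extrapolations above. All figures are the ones quoted in this post; nothing new is measured:

```python
# Rough sanity check of the stage 2 projections quoted above.
def project_total_seconds(elapsed_s, percent_done):
    """Extrapolate total stage 2 time from partial progress."""
    return elapsed_s / (percent_done / 100.0)

# 24:57 elapsed for 9.08% complete (full memory)
full_mem = project_total_seconds(24 * 60 + 57, 9.08)    # ~16487 s
# 6:31 elapsed for 11.39% - 9.09% = 2.30% complete (reduced memory)
reduced_mem = project_total_seconds(6 * 60 + 31, 2.30)  # ~17000 s

# Half the work at each rate averages to roughly the midpoint, so
# running at full memory for half the time gains little overall.
blended = 0.5 * 16300 + 0.5 * 17100                     # 16700 s
print(round(full_mem), round(reduced_mem), blended)
```

The two projections straddle kriesel's 16367 and 17094 figures, which is consistent with both sets of logs.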

Anyway, I stand by my original point. If you're going to allocate all the memory to a single worker, you should ensure that no other worker enters stage 2. Otherwise, limit the per-worker memory. That way, you don't have to worry about all the penalties and stuff.
2021-07-21, 14:38   #27
drkirkby ("David Kirkby", Jan 2021, Althorne, Essex, UK)

Quote:
Originally Posted by axn View Post
Anyway, I stand by my original point. If you're going to allocate all the memory to a single worker, you should ensure that no other worker enters stage 2. Otherwise, limit the per-worker memory. That way, you don't have to worry about all the penalties and stuff.
I'm sure that's good advice. If systems start running out of RAM, they will probably swap and grind to a halt, or terminate the program. But given that
  1. I have a fair amount of RAM, and
  2. I like to experiment,
I thought I'd see what happens if I ignored that advice. I have set Memory=376832 in local.txt (368 GB), with no limits on individual workers. Then I reserved 4 exponents which don't have P-1 done. So now I'm running PRP tests on M105213601, M105212323, M105212539 and M105212563. All have started stage 1, but none has entered stage 2.
Code:
[Worker #1 Jul 21 15:09] Optimal P-1 factoring of M105213601 using up to 376832MB of memory.
[Worker #1 Jul 21 15:09] Assuming no factors below 2^76 and 1 primality test saved if a factor is found.
[Worker #1 Jul 21 15:09] Optimal bounds are B1=434000, B2=21338000
[Worker #1 Jul 21 15:09] Chance of finding a factor is an estimated 3.6%

[Worker #2 Jul 21 15:09] Optimal P-1 factoring of M105212323 using up to 376832MB of memory.
[Worker #2 Jul 21 15:09] Assuming no factors below 2^76 and 1 primality test saved if a factor is found.
[Worker #2 Jul 21 15:09] Optimal bounds are B1=434000, B2=21338000
[Worker #2 Jul 21 15:09] Chance of finding a factor is an estimated 3.6%

[Worker #3 Jul 21 15:09] Optimal P-1 factoring of M105212539 using up to 376832MB of memory.
[Worker #3 Jul 21 15:09] Assuming no factors below 2^76 and 1 primality test saved if a factor is found.
[Worker #3 Jul 21 15:09] Optimal bounds are B1=434000, B2=21338000
[Worker #3 Jul 21 15:09] Chance of finding a factor is an estimated 3.6%

[Worker #4 Jul 21 15:09] Optimal P-1 factoring of M105212563 using up to 376832MB of memory.
[Worker #4 Jul 21 15:09] Assuming no factors below 2^76 and 1 primality test saved if a factor is found.
[Worker #4 Jul 21 15:09] Optimal bounds are B1=434000, B2=21338000
[Worker #4 Jul 21 15:09] Chance of finding a factor is an estimated 3.6%
Of course, if all 4 workers try to use 368 GB of RAM, that would exhaust my 384 GB of RAM many times over. The exponents are fairly similar in value and were given identical B1 and B2 bounds, so I would expect all 4 workers to try to enter stage 2 at about the same time.

Any thoughts about what will happen when they enter stage 2? I will let you know later!

Last fiddled with by drkirkby on 2021-07-21 at 14:40
2021-07-21, 14:49   #28
axn (Jun 2003)

Quote:
Originally Posted by drkirkby View Post
Any thoughts about what will happen when they enter stage 2? I will let you know later!
The first worker (even if by mere seconds) will start with all the memory. Then, as the second worker comes along, the first will stop and restart with reduced memory; the second will grab the remainder. When the third one comes ... OK, I don't know. Maybe the 1st and 2nd will stop and restart with one third each?

Where's the popcorn smiley?
2021-07-21, 15:58   #29
kriesel ("TF79LL86GIMPS96gpu17", Mar 2017, US midwest)

Quote:
Originally Posted by drkirkby View Post
Code:
[Worker #4 Jul 21 15:09] Optimal P-1 factoring of M105212563 using up to 376832MB of memory.
[Worker #4 Jul 21 15:09] Assuming no factors below 2^76 and 1 primality test saved if a factor is found.
[Worker #4 Jul 21 15:09] Optimal bounds are B1=434000, B2=21338000
[Worker #4 Jul 21 15:09] Chance of finding a factor is an estimated 3.6%
Those are astoundingly modest bounds for so much memory available! Neither the selected B1 nor B2 meets either the PrimeNet or the GPU72 row recommendations. https://www.mersenne.ca/exponent/105212563

Re George's response on how to set >90% of RAM as available:
IIRC it's improper to quote a PM without consent, and SOP to indicate in the same post that consent was given.
George gives the user the benefit of the doubt as to knowing what they are doing, or learning, or supporting experimentation.
He also supports, and codes for, many usages that are not optimal.
That is, I believe, a project-level optimization: happier users are more likely to remain engaged and apply more cycles of more hardware to overall project progress.


Quote:
Originally Posted by axn View Post
The first worker (even if by mere seconds) will start with frequently nearly all the memory. Then, as the second worker comes along, first will stop and restart with reduced memory; second will grab usually nearly all the remaining.
FTFY.
I just conducted the following experiment in prime95 v30.6b4: remove all per-worker memory caps from local.txt, then restart prime95 with only its global memory cap of 110 GB. Only one of 4 workers is running P-1.
Code:
[Jul 21 11:04] Waiting 15 seconds to stagger worker starts.
[Jul 21 11:04] Worker starting
[Jul 21 11:04] Setting affinity to run worker on CPU core #13
[Jul 21 11:04] Optimal P-1 factoring of M105198883 using up to 112640MB of memory.
[Jul 21 11:04] Assuming no factors below 2^76 and 2 primality tests saved if a factor is found.
[Jul 21 11:04] Optimal bounds are B1=882000, B2=50252000
[Jul 21 11:04] Chance of finding a factor is an estimated 4.61%
[Jul 21 11:04] 
[Jul 21 11:04] Setting affinity to run helper thread 1 on CPU core #14
[Jul 21 11:04] Setting affinity to run helper thread 2 on CPU core #15
[Jul 21 11:04] Using AVX FFT length 5600K, Pass1=896, Pass2=6400, clm=1, 4 threads
[Jul 21 11:04] Setting affinity to run helper thread 3 on CPU core #16
[Jul 21 11:04] Ignoring suggested B1 value, using B1=857000 from the save file
[Jul 21 11:04] Ignoring suggested B2 value, using B2=44190000 from the save file
[Jul 21 11:04] Available memory is 111776MB.
[Jul 21 11:04] Using 32763MB of memory.
Note that in this case the per-worker RAM usage is the same as before, and so are the bounds. That's a good thing, since that stage 2 is ~87% complete at the change; we would not want to lose the previous progress. IIRC it also works in the other direction, when RAM per worker gets reduced.
CUDAPm1 and GpuOwl also resist bounds changes midstream.

Getting exactly optimal bounds, initially or later, is nice but not very important. It's important to remember that
a) the differing bounds for considerably different memory allocations are not that far apart on a log scale,
b) the slopes (partial derivatives of run time, probability of a factor found, and probable net time savings with respect to B1 or B2) are small near a given optimum for given parameters, and
c) it only affects stage 2 of P-1, which is ~1/60 of a P-1/PRP combination.
So the memory change effect is ~ (small delta)^3.
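As a back-of-the-envelope illustration of the "(small delta)^3" point, with made-up magnitudes (only the 1/60 figure comes from the post above; the other two numbers are assumptions chosen purely to show the scale of the product):

```python
# Hypothetical magnitudes illustrating why three small factors
# multiply out to a negligible effect on total run time.
bounds_delta = 0.20    # assumption: bounds off by ~20% on a log scale
slope = 0.10           # assumption: ~10% efficiency penalty per unit of that delta
stage2_share = 1 / 60  # stage 2's share of a P-1/PRP combination (from the post)

net = bounds_delta * slope * stage2_share
print(f"net effect ~ {net:.4%} of total run time")
```

Even with generous deltas, the product lands around a few hundredths of a percent of the combined P-1/PRP run time.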

I have a heterogeneous 100M / 100M-digit combination queued up for test.

Last fiddled with by kriesel on 2021-07-21 at 16:50
2021-07-21, 16:32   #30
drkirkby ("David Kirkby", Jan 2021, Althorne, Essex, UK)

Okay, this is what happened... Worker 2 completed stage 1 first, and grabbed 207828 MB (203 GB) of RAM
Code:
[Worker #2 Jul 21 15:42] Stage 1 GCD complete. Time: 46.141 sec.
[Worker #2 Jul 21 15:42] Available memory is 376632MB.
[Worker #2 Jul 21 15:42] D: 2310, relative primes: 4743, stage 2 primes: 1313481, pair%=94.34
[Worker #2 Jul 21 15:42] Using 207828MB of memory.
I've known workers to use about 300 GB of RAM before, but those probably had B2 around 50 million, as the server assignment said the P-1 would save 2 tests. Here I assumed 1 test would be saved, which possibly explains why less RAM was used.

Next, worker 1 grabbed 168867 MB (165 GB) of RAM, which is 99.98% of the remaining available memory.
Code:
[Worker #1 Jul 21 15:42] Stage 1 GCD complete. Time: 45.777 sec.
[Worker #1 Jul 21 15:42] Available memory is 168900MB.
[Worker #1 Jul 21 15:42] D: 1470, relative primes: 3853, stage 2 primes: 1313481, pair%=95.43
[Worker #1 Jul 21 15:42] Using 168867MB of memory.
Worker 2, followed by worker 1, restarted with new (reduced) memory settings
Code:
[Worker #2 Jul 21 15:43] Restarting worker with new memory settings.
[Worker #1 Jul 21 15:43] Restarting worker with new memory settings.
So the final memory usage of the workers is
Code:
[Worker #2 Jul 21 15:43] Using 82273MB of memory.
[Worker #3 Jul 21 15:43] Using 125569MB of memory.
[Worker #1 Jul 21 15:43] Using 70802MB of memory.
[Worker #4 Jul 21 15:43] Using 98163MB of memory.
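As a quick consistency check on the log above, the four final allocations sum to just under the Memory=376832 global cap, which suggests the restart logic divides the global limit among the workers rather than over-committing:

```python
# Final per-worker allocations from the log above, in MB.
allocations = {1: 70802, 2: 82273, 3: 125569, 4: 98163}

total = sum(allocations.values())
print(total)            # 376807 MB
print(total <= 376832)  # True: just under the Memory= cap in local.txt
```

The sum comes to 376807 MB, only 25 MB below the cap.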
Unfortunately, none of the workers reported the percentage of work completed until after they had fought over the RAM, so I cannot determine whether restarting with the new memory settings caused much loss of work. I will need to repeat this, next time letting a third worker try to enter stage 2 only after two workers have already grabbed all the available memory and reported the percentage of stage 2 they have completed.
Code:
[Worker #2 Jul 21 16:43] M105212323 stage 2 is 81.24% complete. Time: 576.928 sec.
[Worker #1 Jul 21 16:47] M105213601 stage 2 is 80.33% complete. Time: 613.766 sec.
[Worker #3 Jul 21 16:49] M105212539 stage 2 is 96.66% complete. Time: 534.885 sec.
[Worker #4 Jul 21 16:49] M105212563 stage 2 is 95.41% complete. Time: 539.106 sec.
Note that after the RAM fights, worker 3 ended up with the most RAM and worker 1 the least. Near the end of stage 2, one can see that the percentage completed is highest for the worker with the most RAM and lowest for the worker with the least RAM.
Code:
[Worker #3 Jul 21 16:52] Stage 2 GCD complete. Time: 47.464 sec.
[Worker #4 Jul 21 16:53] Stage 2 GCD complete. Time: 46.851 sec.
[Worker #2 Jul 21 16:57] Stage 2 GCD complete. Time: 48.068 sec.
[Worker #1 Jul 21 17:02] Stage 2 GCD complete. Time: 47.487 sec.
Finally, the P-1 finished, with the worker with the most RAM taking 10 minutes less than the worker with the least RAM. So not really much difference, when one considers that the PRP test will take about 4 days in total.

All workers had the same number of cores (13).

Last fiddled with by drkirkby on 2021-07-21 at 16:37
2021-07-21, 17:57   #31
drkirkby ("David Kirkby", Jan 2021, Althorne, Essex, UK)

Quote:
Originally Posted by kriesel View Post
Those are astoundingly modest bounds for so much memory available! Neither the selected B1 nor B2 meet either PrimeNet or GPU72 row recommendations. https://www.mersenne.ca/exponent/105212563
I did not specify the bounds! I only specified that one primality test would be saved; mprime calculated the bounds. The P-1 factorings show in my results as 9.0809 GHz-days of credit. George has said that the P-1 factoring changed significantly in version 30.6, so I'm not sure how much one can trust the online calculators.

Looking at the data from https://www.mersenne.ca/exponent/105212563, I would intuitively think it is better to spend 9.3765 GHz-days to get a 5.3973% chance of finding a factor, rather than 9.0809 GHz-days for the 3.6% chance reported by mprime. However, mprime has changed the P-1 factoring a lot in the latest version, so I don't know if it is still valid to compare with the data at mersenne.ca.
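One crude way to compare the two options quoted above is cost per percentage point of factoring chance. This is only a rough sketch - it ignores that GHz-days of credit are not quite the same thing as actual run time, and that the "right" comparison is really expected tests saved:

```python
# GHz-days of credit divided by % chance of finding a factor,
# using the two data points quoted above.
mersenne_ca_cost = 9.3765 / 5.3973  # ~1.74 GHz-days per % chance
mprime_cost = 9.0809 / 3.6          # ~2.52 GHz-days per % chance
print(round(mersenne_ca_cost, 2), round(mprime_cost, 2))
```

By this crude measure, the mersenne.ca bounds buy each percentage point of factoring chance for roughly two thirds of the cost of mprime's chosen bounds.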
2021-07-21, 18:59   #32
kriesel ("TF79LL86GIMPS96gpu17", Mar 2017, US midwest)

Quote:
Originally Posted by drkirkby View Post
I did not specify the bounds!
I did not claim that. I only commented that the bounds seemed quite low to me for all that RAM available.
2021-07-21, 19:49   #33
chalsall ("Chris Halsall", Sep 2002, Barbados)

Quote:
Originally Posted by Viliam Furik View Post
According to this report from mersenne.ca here, there seem to really be exponents which don't have P-1 done. It does explain the 100M to 104M ranges, but it doesn't explain the 94M to 99M ranges. These may be caused by too small bounds used, thus the server says they need more.
Some empirical data, if I may...

I've had my CPU Colab workers clearing out candidates which didn't have a P-1 done before the FTC. A few recent examples are 104510041, 104509267 and 104180299.

Interestingly, these were all "cleared" by way of C-PRP. But, at the same time, the column showing "Available: P-1" for 104M hasn't changed for days. So, empirically, the candidates I'm doing P-1 on are not the ones included in that column.

Any insight the Primenet "gods" might be able to provide would be appreciated. My OCD doesn't like having values in that column before the Cat 0 wavefront...