Calculating optimal P-1 memory
I searched around a while back and did not see an answer for this...
I know that the more memory P-1 stage 2 can be given, the better. What I was trying to find is a formula or tool that calculates how many relative primes P-1 will process for a given set of parameters. On one machine I noticed that when I went from one memory setting to another, the number of relative primes jumped disproportionately. I would love to find out where the break points are for various assignments, so that I could tune memory settings to be both livable and productive. Also, if I knew that adding 50 or 100 MB to the night setting would give a boost in the number of relative primes, that would be great.
From [url]http://www.mersennewiki.org/index.php/BrentSuyama_extension[/url], it seems to me that the number of relative primes is due to the Brent-Suyama extension e used, but I don't know exactly what memory equates to what e value.

I have been puzzled by this, too. Until recently, I had been allowing P95 to use 27000MB. Rather than doing 480 RP in a single pass, it would mostly do multipass on 960 RP. A change in memory usage led me to reduce the P95 allocation to 25500MB. Now it only does 960 RP occasionally, but it is still doing multipass at 480 RP. It seems that somewhere in there it would find it possible to do Stage 2 in a single pass.

[QUOTE=MiniGeek;340272]From [url]http://www.mersennewiki.org/index.php/BrentSuyama_extension[/url],[/quote]
That page was taken (with my permission) verbatim, or nearly so, from a post I made here some months ago. Frankly, I'm surprised it hasn't been substantially revised by someone who knows more than I do. I am very far from being the most qualified person to have written it.

[quote]it seems to me that the number of relative primes is due to the Brent-Suyama extension e used, but I don't know exactly what memory equates to what e value.[/QUOTE]
Yes and no. No because, as the wiki page says, the number of relative primes is computed from a different parameter, denoted by d, not from the e value. Yes because the program tries to optimise the choice of parameters, including d and e, subject to the maximum memory available. For fixed d, a higher value of e requires slightly more memory, so it can happen that there is enough memory available for a higher e, or a larger d, but not both.

[QUOTE=kladner;340287]I have been puzzled by this, too. Until recently, I had been allowing P95 to use 27000MB. Rather than doing 480 RP in a single pass, it would mostly do multipass on 960 RP.[/quote]
As is perhaps obvious, the explanation I gave for how the number of relative primes is calculated is a simplification, partly because my purpose was to explain the Brent-Suyama extension, not to go into P95 internals, and partly because, although I know what the program does, I don't understand why.

Specifically, what the program does is choose d to be a small primorial (30, 210 or 2310) or a multiple of one of these values smaller than the next primorial. The number of relative primes is then 8, 48, or 480, or the corresponding multiple of one of these. But I don't understand why it would ever choose a multiple (other than the next primorial up). Apparently if it can manage to do so, then there is a slight improvement in speed, but I cannot for the life of me think of a reason why this should be so.

[quote]A change in memory usage led me to reduce the P95 allocation to 25500MB.
Now it only does 960 RP occasionally, but it is still doing multipass at 480 RP. It seems that somewhere in there it would find it possible to do Stage 2 in a single pass.[/QUOTE]
For exponents in the current range for P-1 assignments, I'm using 12107MB to do P-1 stage 2 in one pass with 432 relative primes, E=12.
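
The counts quoted in this post (8, 48 and 480 for d = 30, 210 and 2310) are the number of residues below d that share no factor with d. Below is a minimal Python sketch that tabulates that count for each primorial and its multiples below the next primorial. It assumes that the program's "relative primes" figure is exactly this count; that matches the numbers in this thread but is not taken from the P95 source. The function names are made up for illustration.

```python
from math import gcd

# Small primorials mentioned in the thread, plus the next one as an upper bound.
PRIMORIALS = [30, 210, 2310, 30030]

def relative_primes(d):
    """Count residues r in [1, d) with gcd(r, d) == 1 (Euler's totient of d)."""
    return sum(1 for r in range(1, d) if gcd(r, d) == 1)

def candidate_d_values(base, next_primorial):
    """Multiples of a primorial that stay below the next primorial."""
    return list(range(base, next_primorial, base))

if __name__ == "__main__":
    for base, nxt in zip(PRIMORIALS, PRIMORIALS[1:]):
        for d in candidate_d_values(base, nxt):
            print(f"d = {d:5d}  relative primes = {relative_primes(d)}")
```

Under that assumption, a count of 432 would correspond to d = 1890 = 9 * 210, and 960 to d = 4620 = 2 * 2310, although the thread does not say which d the program actually picked.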
[QUOTE=Mr. P1;340612]
But I don't understand why it would ever choose a multiple (other than the next primorial up). Apparently if it can manage to do so, then there is a slight improvement in speed, but I cannot for the life of me think of a reason why this should be so. [/QUOTE]
In each pass, it is computing E^((k*d)^e) - E^(rp^e) for all of the relative primes that it's processing, then increasing k by 2 and repeating until k*d gets up to B2. It needs 2 * e transforms for each change of k, so a larger d results in fewer "change of base" operations per pass.

This gain is balanced against the possible increase in the number of passes, and the associated initialization costs, that a larger d might require. The next primorial gives a rather dramatic increase in the number of relative primes, so sometimes the initialization costs for the increased number of passes outweigh the other advantages of the larger primorial. Clear as mud?
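
To put rough numbers on that trade-off, here is a toy Python model. It counts only the 2 * e "change of base" transforms described above, assumes k*d advances from B1 to B2 in steps of 2*d, and assumes every pass repeats the full sweep. The B1, B2 and per-pass capacity values are invented for illustration, and the model ignores initialization costs and the per-relative-prime work, so treat it as a sketch and not as what P95 actually computes.

```python
from math import ceil, gcd

def relative_primes(d):
    """Count residues r in [1, d) with gcd(r, d) == 1."""
    return sum(1 for r in range(1, d) if gcd(r, d) == 1)

def change_of_base_transforms(d, e, b1, b2, rp_per_pass):
    """Return (passes, total change-of-base transforms) under the toy model.

    rp_per_pass is a hypothetical stand-in for how many relative primes
    fit in the available memory at once.
    """
    steps_per_pass = ceil((b2 - b1) / (2 * d))   # k increases by 2 each step
    passes = ceil(relative_primes(d) / rp_per_pass)
    return passes, passes * steps_per_pass * 2 * e

if __name__ == "__main__":
    # Hypothetical bounds and memory capacity, chosen only to show the effect.
    for d in (1890, 2310):
        passes, transforms = change_of_base_transforms(
            d, e=12, b1=1_000_000, b2=20_000_000, rp_per_pass=450)
        print(f"d = {d}: {passes} pass(es), {transforms} transforms")
```

With these made-up inputs, d = 1890 fits in one pass while d = 2310 needs two, so the smaller d comes out ahead even though each of its passes takes more steps.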
Thanks to Mr. P1 and owftheevil for the information. I get at least a vague sense of what's going on. I can see that I left out a key variable in my previous account: the number of HighMem workers. At the moment this is five, which means that any particular run never comes close to having 12 gigabytes. They rarely exceed 6 GB. I do consistently come in at E=12, however.
Even with a total of 32 GB I can't really lock in 12 GB per worker without having to intervene once in a while to let the Stage 2's catch up, and I'm too lazy to mess with things that much. 