2021-11-05, 06:36  #12  
Jun 2003
2×3^{2}×293 Posts 
Quote:
The option is viable with as little as 8 temps = 5 bits per multiplication.
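A sketch of sliding-window exponentiation, which (as an assumption) is the scheme the "8 temps = 5 bits per multiplication" remark refers to: the 8 temporaries would be the precomputed odd powers g^1, g^3, ..., g^15 needed for window width w = 4, and for random-looking exponent bits the average gap between true multiplies comes out to roughly w + 1 = 5 bits. This is a minimal illustration, not code from any GIMPS client:

```python
def slide_window_pow(g, e, n, w=4):
    """Left-to-right sliding-window g**e % n; returns (result, true multiplies)."""
    g2 = g * g % n
    odd = [g % n]                          # odd[k] = g^(2k+1)
    for _ in range(2 ** (w - 1) - 1):      # 2^(w-1) temporaries in total
        odd.append(odd[-1] * g2 % n)

    bits = bin(e)[2:]
    result, i, muls = 1, 0, 0
    while i < len(bits):
        if bits[i] == "0":
            result = result * result % n   # a squaring; no stored temp needed
            i += 1
        else:
            j = min(i + w, len(bits))      # longest window of <= w bits...
            while bits[j - 1] == "0":      # ...that ends in a 1 bit
                j -= 1
            for _ in range(j - i):
                result = result * result % n
            result = result * odd[int(bits[i:j], 2) // 2] % n
            muls += 1                      # one true multiply per window
            i = j
    return result, muls

e = 3 ** 2500                              # an exponent with random-looking bits
r, muls = slide_window_pow(5, e, 2 ** 127 - 1)
assert r == pow(5, e, 2 ** 127 - 1)
print(f"{e.bit_length()} bits, {muls} multiplies, "
      f"{e.bit_length() / muls:.1f} bits per multiply")
```

The squarings are "free" in the sense that they happen regardless of window width; only the true multiplies consume a stored temporary, which is why more temps buy more bits per multiplication.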


2021-11-05, 14:41  #13  
"Lisander Viaene"
Oct 2020
Belgium
89 Posts 
Thank you for bringing this up, techn1ciaN! And thank you, kriesel, for your insightful calculations. Although I'm having difficulty fully wrapping my head around your second post in this thread, I'll simply accept your title
Quote:
On the subject of P-1 work, I've been learning a lot about best practices for this workload and I'm still trying to find the best configuration for my system:

OS, software: Win64/Win11 and Prime95 v30.3b6
Processor: Intel Core i5-8400 (6 cores, 6 threads), currently set to 6 workers
Memory: 4x8 GB 2400 MHz (32 GB)

I have daytime and nighttime stage 2/ECM limits at 13.2 and 18 GB respectively. (My OS plus usual programs tend to use up to 10 GB, and that leaves 8.8 GB of "games-reserved" memory (or for other memory-intensive programs) during the day, with the nighttime stage 2/ECM limit taking 4.8 GB out of the "spare" memory.)

I have run 5760K FFT timing benchmarks with all possible worker variations of 6 cores and came up with the following values:

6 cores, 6 workers: anywhere between 121 and 132 iter/sec
6 cores, 3 workers: anywhere between 122 and 131 iter/sec
6 cores, 2 workers: anywhere between 121 and 132 iter/sec
6 cores, 1 worker: anywhere between 118 and 131 iter/sec

TL;DR: I have six workers, one core/thread each, doing P-1 with at least 2.2 GB of memory allocated each and a maximum of 18 GB allocated in total. Every P-1 assignment takes approx. 39 hrs to complete. Phew. That was a mouthful.

In readme.txt I found these values for RAM usage with P-1 under "Daytime and nighttime P-1/ECM stage 2 memory" (perhaps these are deprecated, or certainly no longer correct for a different tests_saved value, given that a lower tests_saved means lower bounds and thus lower RAM usage(?) with 1 test saved versus 2 tests saved):

Exponent    Minimum  Reasonable  Desirable
100000000   0.2GB    0.7GB       1.1GB
333000000   0.7GB    2.1GB       3.5GB

Based on this table I decided that a 107M exponent would certainly not require more than 2.2 GB of memory for a "desirable" P-1 test (this was before accounting for the tests_saved value of 1 instead of 2).
I have read before that higher RAM allocation speeds up the P-1 process, but I do not know the relation (or rather, the difference in speed) between a higher core count per worker and higher RAM usage. My questions are the following:

1: Is my P-1 workload understanding wrong in any of the above statements? (Or are there any other mistakes present?)
2: Did I miss any important factors (pun intended) of doing P-1? Are there other things I haven't accounted for that would impact my system's ability to do 'optimal' P-1?
3: Do I have too little minimum stage 2 memory allocated (2.2 GB) for first-time-check (FTC) wavefront P-1?
4: Would it be better for me to have 1 worker with 6 cores take up all of the stage 2 memory? And also, why? (I've been stuck on the reasoning for choosing 1, 2, 3 or 6 workers.)

Last fiddled with by lisanderke on 2021-11-05 at 14:54. Reason: added OS and Prime95 build info.
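On question 4, a back-of-the-envelope sketch may help (assumptions: the benchmark midpoints from the post above, total iter/s split evenly among workers, and a fixed iteration count per assignment, scaled from the stated ~39 h at 6 workers). It shows why total throughput barely moves while per-assignment latency changes a lot:

```python
# Latency vs. worker count, from the benchmark table midpoints above.
# Assumptions: total iter/s is shared evenly among workers, and one
# assignment needs a fixed iteration count (scaled from ~39 h at 6 workers).

total_iters = (126.5 / 6) * 39 * 3600     # implied by ~39 h per assignment

configs = {6: 126.5, 3: 126.5, 2: 126.5, 1: 124.5}  # workers -> total iter/s
for workers, total in sorted(configs.items(), reverse=True):
    hours = total_iters / (total / workers) / 3600
    print(f"{workers} workers: {total / workers:6.1f} iter/s each, "
          f"~{hours:4.1f} h per assignment, {total:.1f} iter/s total")
```

Under these assumptions a single worker finishes one assignment in well under 7 hours at essentially the same total throughput, which is the latency-vs-throughput trade-off behind the worker-count choice.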

2021-11-05, 14:51  #14  
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2·5·13·47 Posts 
Quote:
128 or 256 temporaries are multiple gigabytes. Gpuowl v7.x would fill a GPU's 8 or 16 GiB memory about as full as we let it with them. https://mersenneforum.org/showpost.p...30&postcount=2

Actually it looks like they are larger than p bits: https://mersenneforum.org/showpost.p...52&postcount=7 gives an example of 22 MB/buffer (176 Mbits) for a 100M exponent.

Mihai on stage 2 efficiency vs. low or high memory amount: https://mersenneforum.org/showpost.p...6&postcount=31 That references "big" buffers of 44 MB each at the wavefront at that time.

...oh, you're perhaps referring to how many bits of the P-1 stage 1 power can be accomplished per very wide multiplication of saved big temporaries.

Last fiddled with by kriesel on 2021-11-05 at 15:37
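The "larger than p bits" point is easy to sanity-check with rough arithmetic. The figures below assume a 5.5M-word transform length with 4-byte words for a ~100M exponent, which are plausible but not taken from the linked posts:

```python
# Sanity check of the 22 MB/buffer (176 Mbits) figure quoted above for a
# ~100M exponent. Assumption: a 5.5M-word FFT length with residues held as
# 4-byte words, compared against the ~p raw bits of the residue itself.

p = 100_000_000
fft_words = 5_632 * 1024        # 5.5M-word FFT (assumed plausible length)
bytes_per_word = 4

buffer_mb = fft_words * bytes_per_word / 2**20
raw_mb = p / 8 / 2**20

print(f"buffer: {buffer_mb:.0f} MB = {buffer_mb * 8:.0f} Mbits")
print(f"raw residue: {raw_mb:.1f} MB -> buffer is {buffer_mb / raw_mb:.2f}x larger")
```

The transform-ready buffer stores far fewer payload bits per word than a packed residue, which is why each temporary is noticeably larger than p bits.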

2021-11-05, 15:34  #15  
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
17DE_{16} Posts 
Quote:
Given your benchmark summary table, I'd be inclined to run 2 workers. It apparently costs little or no throughput, gives much lower latency than 6 workers, and reduces the required disk space for work in progress. Also, for P-1 work it can allow more RAM per worker for the same total allowance when memory-hungry stage 2 runs overlap in time, which will help P-1 effectiveness.

Next, if the daytime and nighttime allowances were equalized, that would avoid restarts on the allowance change twice daily, for a bit more efficiency. (And in some versions, a restart does not continue from the already-completed point, but restarts stage 2 from the beginning, which can be a big net throughput loss.)

If your system is not fully populated with RAM, you may gain some more performance by adding more; prime95 is typically memory-bandwidth bound, and using all the available memory channels helps.

Use the reference info for additional background.

Don't let concern about optimization get in the way of having fun. (When near the optimum, modest deltas on the independent variables have little or no effect on the dependent variable being optimized. Layman's version: at the optimal point the partial derivative is zero, and near there it's still small.)

Last fiddled with by kriesel on 2021-11-05 at 15:45
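The "more RAM per worker" point is plain division, using the 18 GB nighttime allowance mentioned earlier in the thread and assuming the worst case where every worker's stage 2 overlaps at once:

```python
# Worst-case stage 2 RAM available per worker when all workers' stage 2
# phases overlap, for the 18 GB nighttime allowance from earlier in the thread.
total_gb = 18
for workers in (6, 3, 2, 1):
    print(f"{workers} workers: up to {total_gb / workers:.1f} GB each")
```

With 2 workers each stage 2 can use up to 9 GB even in full overlap, versus 3 GB each with 6 workers, and stage 2 effectiveness improves with more buffers.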

2021-11-05, 16:53  #16 
Jun 2003
2×3^{2}×293 Posts 

2021-11-05, 18:35  #17 
May 2011
Orange Park, FL
2^{2}×3^{2}×5^{2} Posts 

2021-11-05, 19:14  #18 
"Lisander Viaene"
Oct 2020
Belgium
1011001_{2} Posts 

2021-11-05, 19:26  #19 
If I May
"Chris Halsall"
Sep 2002
Barbados
3×43×79 Posts 
I would like to support this idea. Also, could we please be allowed to set the value beyond 9.0?
/Some/ of us are doing things with mprime which "don't make sense". But it's our compute. This wouldn't be as much of an issue if (for example) Pminus1=1,2,15458087,1,2000000,28800000,72 lines worked with the PrimeNet API. But they don't... 
2021-11-05, 19:29  #20  
"Lisander Viaene"
Oct 2020
Belgium
89 Posts 
Quote:


2021-11-05, 20:34  #21 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1011111011110_{2} Posts 

2021-11-05, 20:48  #22 
Mar 2019
USA
73 Posts 
This has been a very interesting thread, even though the majority of it is out of my realm of comprehension. I've read the description of the tests_saved parameter, but I still don't quite understand how setting the number of future primality tests saved if a factor is found corresponds to the P-1 test completing faster. Is there a layman's answer to that?
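The thread above already hints at the answer ("lower tests_saved equals lower bounds"). A toy model of the trade-off, with made-up probabilities and costs purely to show the shape, not the real Prime95 formula: the client picks bounds that maximize expected test time saved minus P-1 time spent, so a smaller payoff per found factor pushes the optimum toward smaller, faster bounds.

```python
# Toy model of bound selection (assumption: the real optimizer maximizes
# expected primality-test time saved minus P-1 time spent; the probability
# and cost functions here are invented just to show the shape).
import math

def net_benefit(b1, tests_saved, test_cost=100.0):
    prob_factor = 0.02 * math.log10(b1)  # toy: bigger B1 finds more factors
    p1_cost = b1 / 1e5                   # toy: P-1 cost grows with B1
    return prob_factor * tests_saved * test_cost - p1_cost

def best_b1(tests_saved):
    return max(range(100_000, 5_000_001, 100_000),
               key=lambda b1: net_benefit(b1, tests_saved))

# Fewer tests saved -> smaller payoff per factor -> smaller optimal bounds,
# hence a quicker (and less memory-hungry) P-1 run.
print(best_b1(1), best_b1(2))
```

Only the direction of the effect carries over to the real client: tests_saved=1 means a found factor retires one future test instead of two, so less P-1 effort is worth spending.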
