|
|
#12 |
|
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
3·5·313 Posts |
I have a couple of PCs running P-1 on all 4 cores. The better-architected one (i5-750) handles 3 high-memory workers (HMWs) quite well. The older one (Q9550) not so well; I generally limit it to 2 HMWs, occasionally 3 to try to limit the backlog, but then it slows noticeably. Because stage 2 takes about 50% longer than stage 1, with 3 out of 4 HMWs I have yet to see a worker have to skip ahead to stage 1 on more than 1 extra exponent before an HMW slot becomes available.
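A rough way to see why 3 HMWs keep up and 2 do not is to estimate how many workers want to be in stage 2 at any moment. This is only a sketch under a simplifying assumption: every worker spends a fixed share of its time in stage 2 (1.5 parts stage 2 to 1 part stage 1, per the post), and it ignores that stage 2 itself slows down when RAM is shared. The function name is made up for illustration.

```python
def average_hmw_demand(workers, stage2_to_stage1_ratio):
    """Average number of workers that would like to be in stage 2 at once."""
    # Share of each exponent's run time that is spent in stage 2.
    stage2_fraction = stage2_to_stage1_ratio / (1.0 + stage2_to_stage1_ratio)
    return workers * stage2_fraction


# 4 workers, stage 2 about 50% longer than stage 1 (figures from the post).
demand = average_hmw_demand(workers=4, stage2_to_stage1_ratio=1.5)

for limit in (3, 2):
    verdict = "keeps up" if limit >= demand else "falls behind"
    print(f"HMW limit {limit}: average demand {demand:.1f} -> stage 2 {verdict}")
```

Under that assumption the average demand sits between 2 and 3 slots, which would match the backlog appearing only when the limit is 2.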
First observation: With only 2 out of 4 HMWs it's an entirely different story. Stage 2 does fall behind, but what I find curious (and curious means an opportunity for future versions to tweak the algorithm) is that the falling behind is NOT evenly distributed. I find that workers 3 and 4 fall behind more than workers 1 and 2. In fact, right now on my Q9550 worker 4 is doing stage 1 on the 6th exponent in worktodo.txt while the other 3 workers are all caught up.

Second, related observation (curiosity): When I tried varying the number of HMWs based on time of day (i.e. 2 during the day and 3 at night), every day when "DAY" time was met one of the workers had to be stopped. It seemed to be fairly random which HMW was stopped, but when "NIGHT" time was met it favored restarting the lower worker numbers first; so again workers 3 and 4 fell behind.

Third observation: I no longer use the Memory= parm in local.txt for each worker because I don't want to leave RAM unused when there are fewer HMWs than expected. In this situation another "dance of the HighMemWorkers" happens. More like a Salsa.
1. Worker 1 completes stage 1 and starts stage 2. It takes all 2400MB and processes about 100 relative primes.
2. Worker 2 completes stage 1 and starts stage 2. Prime95 takes half of the RAM from worker 1, so each worker gets 1200MB and processes 50 relative primes...so far so good.
3. Worker 3 completes stage 1 but cannot start stage 2 because HMW=2, so it goes on to stage 1 on exponent 2.
4. Worker 1 gets to the last 30 relative primes and releases RAM; worker 2 grabs it (but not immediately, see the fifth observation below) and now processes 70 relative primes.
5. Worker 1 completes stage 2 and moves on to stage 1 of exponent 2.
6. Worker 3 immediately restarts and goes back to do stage 2 of exponent 1, but only has enough RAM available to do 30 relative primes. Prime95 does NOT stop worker 2 to redistribute the RAM.
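The six steps can be replayed with a toy model to show where the imbalance comes from. This is a sketch, not how Prime95 actually allocates memory: it assumes a flat 24MB per relative prime (2400MB for about 100 relative primes, from step 1), and the function and variable names are hypothetical.

```python
TOTAL_MB = 2400      # RAM available to stage 2, from the post
MB_PER_PRIME = 24    # assumption: 2400MB / about 100 relative primes


def primes_for(mb):
    """Relative primes that fit in the given amount of RAM (toy model)."""
    return mb // MB_PER_PRIME


def replay_dance():
    """Replay steps 1-6 and record each worker's RAM after every change."""
    history = []

    alloc = {1: TOTAL_MB}                          # step 1: worker 1 takes it all
    history.append(("step 1", dict(alloc)))

    alloc = {1: TOTAL_MB // 2, 2: TOTAL_MB // 2}   # step 2: split in half
    history.append(("step 2", dict(alloc)))

    alloc[1] = 30 * MB_PER_PRIME                   # step 4: worker 1 keeps only
    alloc[2] = TOTAL_MB - alloc[1]                 # what its last 30 primes need
    history.append(("step 4", dict(alloc)))

    alloc[3] = alloc.pop(1)                        # steps 5-6: worker 3 inherits
    history.append(("step 6", dict(alloc)))        # only what worker 1 freed
    return history


for step, alloc in replay_dance():
    summary = ", ".join(
        f"worker {w}: {mb}MB = {primes_for(mb)} primes" for w, mb in sorted(alloc.items())
    )
    print(f"{step}: {summary}")
```

In this model worker 3 ends up with 30 relative primes against worker 2's 70 simply because nothing is taken back from worker 2 when the new worker starts.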
Note how quickly the RAM allocation imbalance can happen. When I have had 3 HMWs I have noticed 1 worker drop to as low as 8 relative primes while another may be processing 50 or more.

And why might we care? The term "knee-of-the-curve" comes to mind here. I'm not sure where the knee is but I notice it exists. What I am trying to say is that with more RAM a worker can process more "relative primes" at a time. It is very apparent that the more that are processed at one time, the less overall time stage 2 takes; it is NOT linear. For example, right now my two HMWs are as follows:
1. Processing 56 relative primes (out of 480): 136 minutes. 480/56*136/60 = 19.43 hours (with several events that will impact the time)
2. Processing 20 relative primes (out of 480): 64 minutes. 480/20*64/60 = 25.6 hours.

Fourth observation: When I have had 2 HMWs during the day and 3 at night, the minute "DAY" time was met a worker was stopped. However, the RAM freed up by this worker was NOT immediately grabbed by one of the remaining 2 HMWs even though it is known that doing so will speed up that worker. (See fifth below.)

Fifth observation: When "NIGHT" time was met it did NOT immediately start the 3rd HMW up again; I recall it was not until one of the HMWs completed a batch of relative primes. Then it was stopped and half of its RAM given to the 3rd HMW in waiting.

Last fiddled with by petrw1 on 2011-12-14 at 17:12 Reason: spelling |
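The stage 2 estimates in the post above scale one batch up to all 480 relative primes. A minimal sketch of that arithmetic follows; the function name is made up, and it assumes every batch takes as long as the one measured, which the post itself says will not hold exactly.

```python
def stage2_hours(total_primes, batch_size, batch_minutes):
    """Extrapolate total stage 2 time from one measured batch."""
    batches = total_primes / batch_size
    return batches * batch_minutes / 60


# The two high-memory workers described in the post.
for batch_size, batch_minutes in ((56, 136), (20, 64)):
    hours = stage2_hours(480, batch_size, batch_minutes)
    print(f"{batch_size} relative primes per batch: about {hours:.2f} hours")
```

The worker with less RAM finishes each batch sooner but needs far more batches, so its total comes out roughly 6 hours longer.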
|
|
|
|
|
#13 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts |
Keep in mind that moving the RAM around is not as easy as it seems. Just the Stage 2 initialization, for just one HMW, takes 8 minutes for me, so readjusting the RAM every hour or more might cause more delay than it's worth. In practice I don't think it would, but it's something to keep in mind.
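One way to weigh the re-initialization cost against the gain is a simple break-even estimate. This is a sketch with borrowed numbers: the 8 minute initialization is from this post, the batch timings (20 relative primes in 64 minutes versus 56 in 136 minutes) are from post #12 on a different machine, and the remaining-prime counts are made-up examples. The function name is hypothetical.

```python
def net_minutes_saved(remaining_primes, old_batch, old_minutes,
                      new_batch, new_minutes, init_minutes):
    """Minutes gained by restarting stage 2 with a bigger batch size."""
    old_rate = old_minutes / old_batch    # minutes per relative prime before
    new_rate = new_minutes / new_batch    # minutes per relative prime after
    return remaining_primes * (old_rate - new_rate) - init_minutes


# Hypothetical cases: half of the 480 relative primes left, or only 10 left.
for remaining in (240, 10):
    saved = net_minutes_saved(remaining, 20, 64, 56, 136, init_minutes=8)
    print(f"{remaining} relative primes left: net {saved:+.1f} minutes")
```

With these figures a reallocation early in stage 2 pays for itself many times over, while one near the end costs slightly more than it saves.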
|
|
|
|
|
|
#14 |
|
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
3·5·313 Posts |
Quote:
And as expected, Worker 3 started to get lost in Stage 1 land. That is, unless the workers are restarted for some reason; then Workers 1 and 2 take the Stage 2 work back.

Last fiddled with by petrw1 on 2011-12-22 at 15:09 |
|
|
|
|
|
|
#15 |
|
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
3×5×313 Posts |
Just by the luck of the draw, one P-1 Stage 2 worker got to the point of processing the last 1 of 480 relative primes, and that single one took 22 minutes. A whole stage 2 at this rate would take over a week.
This same core completed the penultimate 25 relative primes in about 75 minutes, which works out to 24 hours at this rate. |
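Putting these two timings next to the two from post #12 (20 relative primes in 64 minutes, 56 in 136 minutes) gives a rough picture of the knee. Treat it as a sketch only: the four measurements come from different workers and exponents, so they are not a controlled comparison.

```python
# (relative primes per batch, minutes for that batch), as reported in the thread
observations = [(1, 22), (20, 64), (25, 75), (56, 136)]

# Minutes spent per relative prime at each batch size.
cost_per_prime = {batch: minutes / batch for batch, minutes in observations}

for batch, cost in cost_per_prime.items():
    full_run_days = 480 * cost / 60 / 24   # all 480 relative primes at this rate
    print(f"batch of {batch:2d}: {cost:5.2f} min per prime, "
          f"about {full_run_days:.2f} days for all 480")
```

The cost per relative prime drops steeply between a batch of 1 and a batch of 20, then much more slowly after that, which is consistent with a knee somewhere at the low end.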
|
|
|
|
|
#16 |
|
Mar 2003
Melbourne
5×103 Posts |
I have 2 spare CPU cores in my farm now, so I'm converting them to P-1.
I'm not going to worry about the P-1 dance. I set up 2 workers with affinity on the spare cores and set 1.5GB of RAM for each worker, 3.5GB total. I thought it might be wise to allow a bit of headroom. -- Craig |
|
|
|