mersenneforum.org  

Old 2011-12-14, 17:10   #12
petrw1
1976 Toyota Corona years forever!
 
"Wayne"
Nov 2006
Saskatchewan, Canada

3·5·313 Posts
Default A few more observations

I have a couple of PCs running P-1 on all 4 cores. The better-architected one (i5-750) handles 3 high-memory workers (HMWs) quite well. The older one (Q9550) does not; I generally limit it to 2 HMWs, occasionally 3 to try to limit the backlog, but it slows noticeably. Because stage 2 takes about 50% longer than stage 1, with 3 out of 4 HMWs I have yet to see a worker have to skip ahead to stage 1 on more than 1 extra exponent before an HMW slot becomes available.

Observation 1: With only 2 out of 4 HMWs it's an entirely different story. Stage 2 does fall behind; but what I find curious (and curious means an opportunity for future versions to tweak the algorithm) is that the falling behind is NOT evenly distributed. I find that workers 3 and 4 fall behind more than workers 1 and 2. In fact right now on my Q9550 worker 4 is doing stage 1 on the 6th exponent in worktodo.txt while the other 3 workers are all caught up.

Another related observation (curiosity). When I tried varying the HMW based on time of day (i.e. 2 during the day and 3 at night); every day when "DAY" time was met one of the workers had to be stopped. It seemed to be fairly random which HMW was stopped but when "NIGHT" time was met it favored restarting the lower worker numbers first; so again workers 3 and 4 fell behind.

Third observation: I no longer use the Memory= parameter in local.txt for each worker because I don't want to leave RAM unused when there are fewer HMWs than expected. In this situation another "dance of the HighMemWorkers" happens. More like a Salsa.
1. Worker 1 completes stage 1 and starts stage 2. It takes all 2400MB and processes about 100 relative primes.
2. Worker 2 completes stage 1 and starts stage 2. Prime95 takes half of the RAM from worker 1, so each worker gets 1200MB and processes 50 primes...so far so good.
3. Worker 3 completes stage 1 but cannot start stage 2 because HMW=2 so it goes on to stage 1 on exponent 2.
4. Worker 1 gets to the last 30 relative primes and releases RAM; worker 2 grabs it (but not immediately - see fifth below) and now processes 70 relative primes.
5. Worker 1 completes stage 2 and moves on to stage 1 of exponent 2.
6. Worker 3 immediately restarts and goes back to do stage 2 of exponent 1. But only has enough RAM available to do 30 primes. Prime95 does NOT stop worker 2 to redistribute the RAM.
Note how quickly the RAM allocation imbalance can happen....When I have had 3 HMW I have noticed 1 worker drop to as low as 8 relative primes while another may be processing 50 or more.
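To make the imbalance easier to see, here is a small Python replay of the steps above. It is only a bookkeeping sketch, not Prime95's allocation logic: the per-step figures are the ones quoted in the list, and the linear conversion from relative primes to MB (2400MB for about 100 primes) is an assumption taken from step 1.

```python
# Assumption: memory scales linearly with relative primes, using the
# post's figure of 2400MB for about 100 relative primes.
TOTAL_MB = 2400
MB_PER_PRIME = TOTAL_MB / 100

# (step number in the list above, {worker: relative primes per batch})
TIMELINE = [
    (1, {1: 100}),         # worker 1 alone takes everything
    (2, {1: 50, 2: 50}),   # RAM split evenly when worker 2 arrives
    (4, {1: 30, 2: 70}),   # worker 1 on its last 30, worker 2 grabs the rest
    (6, {2: 70, 3: 30}),   # worker 3 only gets what worker 1 left behind
]


def imbalance(alloc):
    """Ratio between the largest and smallest share in one snapshot."""
    return max(alloc.values()) / min(alloc.values())


for step, alloc in TIMELINE:
    shares = ", ".join(
        f"worker {w}: {n} primes (~{n * MB_PER_PRIME:.0f}MB)"
        for w, n in sorted(alloc.items())
    )
    print(f"step {step}: {shares}; imbalance {imbalance(alloc):.2f}x")
```

By the same measure, the 8 versus 50 relative primes case mentioned in the note works out to 6.25x.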
And why might we care? The term "knee-of-the-curve" comes to mind here. I'm not sure where the knee is but I notice it exists. What I am trying to say is that with more RAM a worker can process more "relative primes". It is very apparent that the more that are processed at one time the less overall time stage 2 takes; it is NOT linear. For example: right now my two HMW are as follows:
1. Processing 56 relative primes (out of 480): 136 minutes. 480/56*136/60 = 19.43 hours (with several events that will impact the time)
2. Processing 20 relative primes (out of 480): 64 minutes. 480/20*64/60 = 25.6 hours.
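
The two estimates above use the same arithmetic, which can be written as a short Python helper. The function name is made up for illustration, and like the figures in the post it simply scales one observed batch up to all 480 relative primes (fractional batches included), so treat the result as a rough projection.

```python
def projected_stage2_hours(total_primes, batch_size, minutes_per_batch):
    """Scale one observed batch up to the full set of relative primes."""
    batches = total_primes / batch_size  # may be fractional, as in the post
    return batches * minutes_per_batch / 60


# Observed (relative primes per batch, minutes per batch) from the post.
for batch_size, minutes in [(56, 136), (20, 64)]:
    hours = projected_stage2_hours(480, batch_size, minutes)
    print(f"{batch_size} relative primes per batch: {hours:.2f} hours")
```

The worker with the larger batch spends more time per batch but noticeably less time overall, which is the non-linearity described above.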

Fourth observation: When I have had 2 HMWs during the day and 3 at night, the minute "DAY" time was met a worker was stopped. However, the RAM freed up by this worker was NOT immediately grabbed by one of the remaining 2 HMWs even though Prime95 knows that doing so will speed up that worker. (See fifth below)

Fifth observation: When "NIGHT" time was met it did NOT immediately start the 3rd HMW up again; I recall it was not until one of the HMWs completed a batch of relative primes: then it was stopped and half of its RAM given to the 3rd HMW in waiting.

Last fiddled with by petrw1 on 2011-12-14 at 17:12 Reason: spelling
Old 2011-12-14, 17:53   #13
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts
Default

Keep in mind that moving the RAM around is not as easy as it seems. To do just Stage 2 initialization, just one HMW, takes 8 minutes for me, so readjusting the RAM every hour or more might cause more delay than it's worth. Obviously I don't think so practically, but it's something to keep in mind.
Old 2011-12-22, 15:08   #14
petrw1
1976 Toyota Corona years forever!
 
"Wayne"
Nov 2006
Saskatchewan, Canada

3·5·313 Posts
Default

Quote:
Originally Posted by petrw1
Observation 1: With only 2 out of 4 HMWs it's an entirely different story. Stage 2 does fall behind; but what I find curious (and curious means an opportunity for future versions to tweak the algorithm) is that the falling behind is NOT evenly distributed. I find that workers 3 and 4 fall behind more than workers 1 and 2. In fact right now on my Q9550 worker 4 is doing stage 1 on the 6th exponent in worktodo.txt while the other 3 workers are all caught up.

Follow up...once Worker 4 does fairly get a Stage 2 turn (after going 9 deep in Stage 1's) it keeps it until the Stage 2 is all caught up.
And as expected Worker 3 started to get lost in Stage 1 land.

That is, unless the Workers are restarted for some reason, then Workers 1 and 2 take the Stage 2 work back.

Last fiddled with by petrw1 on 2011-12-22 at 15:09
Old 2011-12-23, 17:24   #15
petrw1
1976 Toyota Corona years forever!
 
"Wayne"
Nov 2006
Saskatchewan, Canada

3×5×313 Posts
Default The extreme on the low end....

Just by the luck of the draw one P-1 Stage 2 worker got to the point of processing the last 1 of 480 primes....and it took 22 minutes...over a week at this rate
This same core completed the penultimate 25 primes in about 75 minutes...24 hours at this rate.
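
One way to read these two data points is as a fixed cost per batch plus a cost per relative prime. The Python sketch below fits that two-parameter model to the figures in this post. The linear model itself is an assumption made for illustration (two points always fit a line exactly), not a measurement of how Prime95 actually spends its time, and the function names are hypothetical.

```python
def fit_overhead_model(point_a, point_b):
    """Fit minutes = overhead + per_prime * batch_size through two points."""
    (n1, t1), (n2, t2) = point_a, point_b
    per_prime = (t2 - t1) / (n2 - n1)
    overhead = t1 - per_prime * n1
    return overhead, per_prime


def projected_hours(total_primes, batch_size, overhead, per_prime):
    """Total stage 2 time if every batch had the given size."""
    batches = total_primes / batch_size
    return batches * (overhead + per_prime * batch_size) / 60


# (relative primes in the batch, minutes taken) from the post above.
overhead, per_prime = fit_overhead_model((1, 22), (25, 75))
print(f"assumed fixed cost per batch: {overhead:.1f} min")
print(f"assumed cost per relative prime: {per_prime:.2f} min")

for batch_size in (1, 25):
    hours = projected_hours(480, batch_size, overhead, per_prime)
    print(f"batch of {batch_size}: {hours:.0f} hours ({hours / 24:.1f} days)")
```

Under this reading, most of the 22 minutes spent on a single prime would be per-batch cost, which is why tiny batches are so expensive.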
Old 2011-12-29, 10:14   #16
nucleon
 
Mar 2003
Melbourne

5×103 Posts
Default

I have 2x spare CPU cores in my farm now.

Converting them to P-1.

Not going to worry about the P-1 dance. Setup 2x workers, affinity on the spare cores and set 1.5GB ram each worker, 3.5GB total. I thought it might be wise to allow a bit of headroom.

-- Craig

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.