mersenneforum.org  

Great Internet Mersenne Prime Search > PrimeNet
2011-04-19, 16:59   #1
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

2×7×331 Posts
Observations with MaxHighMemWorkers

I have a dual-core PC with both cores doing ECM-Fermat (ECM-F) work.
I am running 64-Bit Prime95 26.5 Build 5 on Windows 7.
Available memory: 1600 MB, day and night.
Memory=800 MB for each worker in Local.txt.
This is the first time I have tried the MaxHighMemWorkers parameter; I have it set to 1.

I have highlighted the observations I find "curious".

Both workers start ECM-F stage 1. All is well.

Eventually worker 1 gets to stage 2 while the other is still on stage 1. All is well.

Then worker 2 completes stage 1 and wants to go to stage 2 while worker 1 is still on stage 2. The message says it is looking for work that requires less memory. All is well, subject to the following:
- If the next assignment for worker 2 is ECM-F on the same exponent, it skips past it until it finds an ECM-F assignment for a different exponent.
- The workload report on the server shows the same progress for the abandoned exponent and the skipped one, i.e. C1S2 0.0%. This may correct itself, or just change to something more curious with the next update.

Worker 1 completes stage 2 and goes on to the next exponent, back on stage 1. Immediately, worker 2 stops its new assignment and goes back to stage 2 on the exponent it abandoned earlier. All is well.

Worker 1 completes stage 1 while worker 2 is still on stage 2. Worker 1 now looks for a new exponent at stage 1, with the same curiosities reported above for worker 2 looking for new work.

The PC requires a reboot. Just before the reboot:
- worker 1 is on stage 1 of a new assignment, with an earlier exponent waiting for stage 2
- worker 2 is on stage 2
After the reboot:
- worker 1 reverts to the first exponent in stage 2
- worker 2 skips forward to another exponent in stage 1
I send new expected completion dates to the server, and it reports worker 2's stage 2 exponent as having no progress.

So now I am nervous that the software forgot that worker 2 was almost done with stage 2. I stopped prime95, swapped the two workers' assignments in worktodo.txt, and restarted prime95. Now, as expected, worker 1 continues stage 2 on what was worker 2's exponent (whew, no work lost; silly me), and worker 2 continues on what was worker 1's stage 1 work. I resend completion dates and the reported progress changes again. My best summary is that work on any exponent currently active on either worker (whether or not it is the first in worktodo.txt) is reported correctly. Any others seem to be hit-and-miss; that is, I haven't determined the pattern yet.

By the way, I noticed the same unpredictable (to me, anyway) pattern of progress on the workload report in previous versions of prime95 when both cores were doing ECM-F without MaxHighMemWorkers.

NB: None of these observations point to missed work or rework on any assignment, only to the way assignments are picked (i.e. skipping past entries for the same exponent) or the way progress is reported. Oh, and whatever is reported on the server also seems to match the status window in the GUI.

Last fiddled with by petrw1 on 2011-04-19 at 17:00
2011-04-19, 19:56   #2
Mr. P-1
Jun 2003

7·167 Posts

I have also noticed similar behaviour with ECM-M. I can think of two reasons for it.

1. If a factor is found, doesn't Prime95 finish with that exponent even if additional curves are assigned? If so, it makes sense not to start another curve while the first is pending.

2. Perhaps the save file format cannot handle multiple curves.
2011-04-20, 00:25   #3
cheesehead
"Richard B. Woods"
Aug 2002
Wisconsin USA

7692₁₀ Posts

Quote:
Originally Posted by petrw1
- If the next assignment for worker 2 is ECM-F on the same exponent it skips past it until it finds an ECM-F for a different exponent.
What's wrong with that? It can't proceed with the ECM it's already started on that exponent -- why shouldn't it go to a different exponent while the first one is blocked?
Quote:
I am nervous that the software forgot that worker 2 was almost done with stage 2.
Why? There's nothing wrong.

Both workers have an unfinished pending stage 2. At the restart, prime95 looks for work for worker 1 first. It finds the unfinished stage 2 for worker 1, so it resumes that.

Then prime95 looks for work for worker 2. The first work it finds is an unfinished stage 2, but it can't start that while worker 1 is doing a stage 2, so it goes on to find some different work for worker 2.

All perfectly normal.
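The start-up order described above can be modelled in a few lines of Python. This is a hypothetical sketch of the behavior reported in this thread, not prime95's actual code: the `Assignment` class, the `pick_work` function, the strict worker-1-first scan, and the rule that a set-aside stage 2 also blocks later worktodo entries for the same exponent are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    exponent: int   # n in 2^n+1, e.g. 8388608 for F23
    stage2: bool    # True if this entry would resume a curve that is ready for stage 2


def pick_work(worktodo, max_high_mem_workers=1):
    """Hypothetical model of start-up work selection, worker 1 first.

    worktodo maps worker number -> list of Assignment in worktodo.txt order.
    Returns worker number -> chosen Assignment, or None if nothing fits.
    """
    high_mem_in_use = 0
    chosen = {}
    for worker in sorted(worktodo):
        chosen[worker] = None
        blocked = set()  # exponents whose pending stage 2 this worker had to set aside
        for a in worktodo[worker]:
            if a.exponent in blocked:
                continue  # assumed: only one curve in progress per exponent
            if a.stage2 and high_mem_in_use >= max_high_mem_workers:
                blocked.add(a.exponent)  # "looking for work that requires less memory"
                continue
            chosen[worker] = a
            if a.stage2:
                high_mem_in_use += 1
            break
    return chosen


if __name__ == "__main__":
    # Both workers have a stage 2 pending on F23 (separate curves, separate save files).
    worktodo = {
        1: [Assignment(8388608, True), Assignment(16777216, False)],
        2: [Assignment(8388608, True), Assignment(8388608, False),
            Assignment(16777216, False)],
    }
    print(pick_work(worktodo))
    # {1: Assignment(exponent=8388608, stage2=True), 2: Assignment(exponent=16777216, stage2=False)}
```

In this model, swapping the two lists in `worktodo` simply hands the single stage 2 slot to whichever pending curve is now under worker 1, and raising `max_high_mem_workers` to 2 lets both stage 2s resume at once.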

What do you expect to happen?

a) That both workers resume a stage 2, violating MaxHighMemWorkers = 1?

Then, why should it ignore your specific instruction not to do more than one high-mem job at a time?

b) That worker 2 should resume its stage 2 instead of worker 1 resuming its stage 2?

Why? They can't both do that, so one of them has to have priority, so prime95 gives priority to worker 1. What's wrong with that?

If you want worker 2 to have priority over worker 1, then just swap their worktodos so that worker 1 will do what worker 2 would have done. What's such a big deal about worker #1 being given first choice?

Quote:
So I stopped prime95 and reversed the assignments for the workers in worktodo.txt and restarted prime95. Now as expected worker 1 continues stage 2 on what was the worker 2 exponent (whew no work lost - silly me); and worker 2 continues on what was worker 1's stage 1 work.
... exactly, precisely as I suggested and is perfectly normal.

Why do you consider worker 2's unfinished stage 2 assignment more important than worker 1's unfinished stage 2 assignment?

Both workers had stage 2 work pending -- why is it important for worker 2 to resume its stage 2 work before worker 1 resumes its stage 2 work?

After the swap, why weren't you concerned that the software might have forgotten the unfinished stage 2 that was formerly assigned to worker 1 but now assigned to worker 2? It's just the mirror image of what you had before the swap -- two unfinished stage 2, only with swapped worker numbers -- so why didn't you have the mirror image of your first concern?

Last fiddled with by cheesehead on 2011-04-20 at 00:36
2011-04-20, 00:27   #4
cheesehead
"Richard B. Woods"
Aug 2002
Wisconsin USA

2²×3×641 Posts

Quote:
Originally Posted by Mr. P-1
2. Perhaps the save file format cannot handle multiple curves.
Bingo.

Why does it need to handle simultaneous saves of multiple curves on the same exponent? Why not simply finish one curve on an exponent before starting another curve on the same exponent?

If multiple curves were simultaneously in progress, and one found a factor, all the work done on the other curves becomes wasted -- unless one wants to find multiple factors, but in that case, one can simply sequentially run curves, or run them simultaneously on separate systems. What's wrong with that?

Why should prime95 start another curve's stage 1 on an exponent when it hasn't finished an earlier curve's stage 2 on that same exponent? If it did, it would make the save file logic more complicated, to no apparent advantage I can see.

Last fiddled with by cheesehead on 2011-04-20 at 00:38
2011-04-20, 01:31   #5
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

1001000011010₂ Posts

Quote:
Originally Posted by cheesehead
What's wrong with that? It can't proceed with the ECM it's already started on that exponent -- why shouldn't it go to a different exponent while the first one is blocked?
Since the server thought nothing of assigning me the same exponent multiple times for one worker, I see no reason why it couldn't have work in progress on more than one of them at the same time, obviously with different "s" values.

Quote:
Quote:
I am nervous that the software forgot that worker 2 was almost done with stage 2.
Why? There's nothing wrong. What do you expect to happen?
I thought it might remember which work each worker was doing and restart there. But what it did is perfectly fine and makes more sense. The only reason I was worried is that when I sent new dates, the server did NOT show any progress for the worker 2 exponent that I knew was in curve 2, stage 2.

Quote:
After the swap, why weren't you concerned that the software might have forgotten the unfinished stage 2 that was formerly assigned to worker 1 but now assigned to worker 2?
Since after the swap the work continued where it left off in stage 2, I suspected that it had not lost the work after all (in spite of what was reported), and that all would be well in this mirror scenario.

Quote:
Why does it need to handle simultaneous saves of multiple curves on the same exponent? Why not simply finish one curve on an exponent before starting another curve on the same exponent?

If multiple curves were simultaneously in progress, and one found a factor, all the work done on the other curves becomes wasted -- unless one wants to find multiple factors, but in that case, one can simply sequentially run curves, or run them simultaneously on separate systems. What's wrong with that?
Why should prime95 start another curve's stage 1 on an exponent when it hasn't finished an earlier curve's stage 2 on that same exponent? If it did, it would make the save file logic more complicated, to no apparent advantage I can see.
I actually do have two save files each for two different exponents, but only one from each worker. There is a _1 or _2 appended to the end of the file name of the duplicates. I am going to guess that the _1 or _2 is the worker number of the second occurrence. This makes me wonder now whether switching the worktodo entries might cause an issue, as the worker that the in-progress work now belongs to would not match the _x value.

The only potential concern I can foresee is a buildup of work files for incomplete assignments if this scenario happens too often with multiple workers.
2011-04-20, 15:56   #6
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

2·7·331 Posts
I may have an issue ... or invalidated my work

Quote:
Originally Posted by petrw1
I actually do have two save files each for two different exponents, but only one from each worker. There is a _1 or _2 appended to the end of the file name of the duplicates. I am going to guess that the _1 or _2 is the worker number of the second occurrence. This makes me wonder now whether switching the worktodo entries might cause an issue, as the worker that the in-progress work now belongs to would not match the _x value.

The only potential concern I can foresee is a buildup of work files for incomplete assignments if this scenario happens too often with multiple workers.
This is the original worktodo.txt referred to in scenario 2 below.

Code:
[Worker #1]
ECM2=xxx,1,2,8388608,1,1000000,100000000,3,"167772161"
ECM2=xxx,1,2,8388608,1,1000000,100000000,3,"167772161"
ECM2=xxx,1,2,16777216,1,1000000,100000000,1
[Worker #2]
ECM2=xxx,1,2,16777216,1,1000000,100000000,1
ECM2=xxx,1,2,8388608,1,1000000,100000000,3,"167772161"
ECM2=xxx,1,2,8388608,1,1000000,100000000,3,"167772161"
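For readers decoding the ECM2= lines above: as I understand prime95's worktodo format, the fields after the assignment ID are k,b,n,c,B1,B2,curves_to_run, optionally followed by a quoted list of known factors. So these entries are ECM on 1·2^8388608+1 = F23 (3 curves per line) and 1·2^16777216+1 = F24 (1 curve), both with B1=1,000,000 and B2=100,000,000. The Python sketch below decodes such a line and checks that the listed factor 167772161 = 5·2^25+1 really divides F23, using modular exponentiation. The helper names (`parse_ecm2`, `fermat_index`, `divides`) are hypothetical, and the parser assumes every line carries an assignment ID, as these do.

```python
import csv


def parse_ecm2(line):
    """Split an ECM2= worktodo line, assuming an assignment ID comes first."""
    _, body = line.split("=", 1)
    fields = next(csv.reader([body]))  # csv strips the quotes around the factor list
    aid, k, b, n, c, b1, b2, curves = fields[:8]
    factors = [int(f) for f in fields[8].split(",")] if len(fields) > 8 else []
    return {"aid": aid, "k": int(k), "b": int(b), "n": int(n), "c": int(c),
            "B1": int(b1), "B2": int(b2), "curves": int(curves),
            "known_factors": factors}


def fermat_index(e):
    """Return m if the entry is 1*2^n + 1 with n = 2^m (the Fermat number F_m), else None."""
    n = e["n"]
    if e["k"] == 1 and e["b"] == 2 and e["c"] == 1 and n > 0 and n & (n - 1) == 0:
        return n.bit_length() - 1
    return None


def divides(e, p):
    """True if p divides k*b^n + c; pow() with a modulus avoids building the huge number."""
    return (e["k"] * pow(e["b"], e["n"], p) + e["c"]) % p == 0


if __name__ == "__main__":
    for line in ['ECM2=xxx,1,2,8388608,1,1000000,100000000,3,"167772161"',
                 'ECM2=xxx,1,2,16777216,1,1000000,100000000,1']:
        e = parse_ecm2(line)
        print(f"F{fermat_index(e)}", e["B1"], e["B2"], e["curves"],
              [divides(e, p) for p in e["known_factors"]])
    # F23 1000000 100000000 3 [True]
    # F24 1000000 100000000 1 []
```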
With the ECM-F work for the two workers swapped, I have:
Worker 1: F24 Curve 1 Stage 1 57.23% s=197xxx140
Worker 2: F23 Curve 1 Stage 1 59.50% s=576xxx081

When I moved the work back to the original workers, I now have:
Worker 1: F23 Curve 3 Stage 2 1.30% s=397xxx891
Worker 2: F24 Curve 1 Stage 1 57.23% s=197xxx140

I think each worker is working on its first assignment, because when I do Test / Status, in both cases it shows the first assignment for each worker completing first.

F24 seems to be okay as it does the same work with either worker.
I'm less sure about F23. Could the workers be using the wrong save file when the work is reversed? Or, if the "s" value is stored in the save files, the work is probably still valid and it's just a question of which curve each worker is working on.

In my directory I have the following save files (each also with a .bu)
e8388608
e8388608_2
eG777216

In any case I guess I should stop fiddling with my worktodo.txt especially when doing ECM.

Last fiddled with by petrw1 on 2011-04-20 at 15:57 Reason: Last line
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.