mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Prime95 version 26.6 (https://www.mersenneforum.org/showthread.php?t=15504)

Xyzzy 2012-01-11 15:01

"PrimeNet=0" would work and then use the manual reporting form?

petrw1 2012-01-11 15:27

[QUOTE=Bdot;285882]Despite my best efforts, mprime keeps unreserving my GPU-to-72 assignments.

prime.txt contains
[code]
UnreserveDays=365
MaxExponents=50
DaysOfWork=1
DaysBetweenCheckins=7
[/code]Still I see this during startup:
[code]
[Main thread Jan 11 13:45] Unable to detect some of the hyperthreaded logical CPUs.
[Main thread Jan 11 13:45] See AffinityScramble2 in undoc.txt.
[/code]My worktodo now has 30 exponents (including 10 P-1 ones).
This leaves one worker with only the current 45M exponent, which is already 65% complete,~ 9 days to go.

How can I prevent this unreserving?

Is it possible that mprime discovered that my wrong adding of work exceeded some limit (but no way 365 days, it was 5 expos at most for one worker). But it did not immediately unreserve it but saved that for the next restart/reporting in? Even though the work has been rebalanced in between?

I'd like a switch to completely disable unreserving ...[/QUOTE]

I have only personally witnessed 4 situations that automatically unreserved exponents:

1. MaxExponents exceeded: You have this one covered
2. UnreserveDays exceeded: You have this one covered.
3. A brief brain freeze of Prime95 (or my PC) made it misdetect my CPU as one with far fewer GHz than it really has, thereby triggering point 2. A stop and restart (or two) of Prime95 would fix this for a while; setting [B]FixedHardwareUID=1[/B] in prime.txt made it stop happening.
4. "Twilight zone": A few times on a dual core (but, again, not in the last couple of years), at check-in time one core would unreserve all assignments (except the active one) and immediately get a new batch of similar assignments, sometimes the same ones. This hasn't happened recently, so it could have been a V25 thing.

Your situation does not seem to match any of these, though.

I do find the "Unable to find..." message curious. Might it think you have work for a non-findable CPU?
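
For anyone who wants to rule out the first three points quickly, here is a rough Python sketch. It assumes prime.txt is plain `key=value` lines with optional `[Section]` headers; the helper names `read_options` and `missing_safeguards` are made up for illustration and are not part of Prime95.

```python
def read_options(text):
    """Parse key=value lines into a dict, skipping blanks and [Section] headers."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def missing_safeguards(options, wanted=("UnreserveDays", "MaxExponents", "FixedHardwareUID")):
    """Return the option names mentioned in this thread that are not set at all."""
    return [name for name in wanted if name not in options]


# The four lines Bdot quoted from prime.txt.
sample = """UnreserveDays=365
MaxExponents=50
DaysOfWork=1
DaysBetweenCheckins=7
"""
opts = read_options(sample)
print(missing_safeguards(opts))
```

This only reports which options are absent; it says nothing about whether the values are sensible for a given machine.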

Dubslow 2012-01-11 15:47

[QUOTE=petrw1;285894]
I do find the "Unable to find..." message curious. Might it think you have work for a non-findable CPU?[/QUOTE]No, it detects the CPU fine, but it can't determine which logical cores map to which physical cores. That does vary by OS. My 2600K, in Windows, goes 01234567 in the format of AffinityScramble2, whereas in Linux, it's 04152637. I've also had this problem, where 1/3rd of the time it gets HT right, 1/3rd of the time it gives that error message above, and 1/3rd of the time it gives a variant of that error, except it then proceeds to guess. Of those, roughly half the guesses are wrong (so it might guess 04152736 or 05162734 or something not-quite-right). However, I have not experienced these mysterious unreservings that seem to happen randomly to people from time to time. Perhaps turn on debugging and see if it prints a reason next time?
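
To compare a guessed mapping against the known-good one, a small Python sketch like this might help. It assumes that each consecutive pair of characters in the string names two logical CPUs sharing one physical core, which is how the examples above read; undoc.txt is the authority on the real format. The function names are invented for illustration.

```python
def sibling_pairs(scramble):
    """Group a scramble string into sets of logical CPUs assumed to share a core."""
    if len(scramble) % 2:
        raise ValueError("expected an even number of logical CPUs")
    return {frozenset(scramble[i:i + 2]) for i in range(0, len(scramble), 2)}


def wrong_pairs(guess, actual):
    """Return the pairs in `guess` that do not occur in `actual`."""
    return sibling_pairs(guess) - sibling_pairs(actual)


linux_2600k = "04152637"  # the Linux ordering quoted above
bad = wrong_pairs("04152736", linux_2600k)
print(sorted(sorted(pair) for pair in bad))
```

With the first bad guess from above, the last two pairs come out swapped. The sketch handles single-character CPU numbers only.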

Prime95 2012-01-11 21:39

[QUOTE=Bdot;285882]
I'd like a switch to completely disable unreserving ...[/QUOTE]

Try UnreserveDays=9999

Use Advanced/Manual Communication so that you only contact the server when
the workload gets low.

ckdo 2012-01-11 22:01

The latter is the key. For me, unreservations only happen when Prime95 is freshly started (and only on worker #1, it appears). ManualComm=1 will take care of the problem. Either leave it set or set it before quitting Prime95 and unset it when your CPU runs at full power again after a fresh launch.
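
If you would rather script the two changes than edit by hand, this is a minimal Python sketch. It assumes global options in prime.txt sit above the first `[Section]` header; `set_option` is a hypothetical helper and the sample file contents are illustrative only.

```python
def set_option(text, key, value):
    """Return `text` with key=value set among the global options."""
    lines = text.splitlines()
    # Global options are assumed to end at the first "[Section]" header.
    end = next((i for i, line in enumerate(lines) if line.lstrip().startswith("[")),
               len(lines))
    for i in range(end):
        if lines[i].split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"  # replace the existing setting
            break
    else:
        lines.insert(end, f"{key}={value}")  # add it before any section
    return "\n".join(lines) + "\n"


before = "UnreserveDays=365\nMaxExponents=50\n[PrimeNet]\nDebug=0\n"
after = set_option(set_option(before, "UnreserveDays", 9999), "ManualComm", 1)
print(after)
```

It works on a string rather than the file itself, so you can inspect the result before writing anything back.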

Did I ever suggest a CommDelay=[seconds] option before? :rolleyes:

Bdot 2012-01-17 10:29

[QUOTE=Prime95;285957]Try UnreserveDays=9999

Use Advanced/Manual Communication so that you only contact the server when
the workload gets low.[/QUOTE]

I'll give that a try and monitor the situation :smile:
I usually start it as
mprime -d &
Will another mprime -m communicate with the running instance to tell it to change any settings, or do I need to restart mprime -d to let the changes take effect?

Regarding the "Unable to detect which logical CPUs are hyperthreaded." message.
The machine has 2 Intel(R) Xeon(R) CPU X5650 @ 2.67GHz, with hyperthreading disabled. In other words, 12 cores without the need to detect hyperthreading. 8 cores are dedicated to 8 mprime workers with fixed affinity; the other cores are for mfakto and some work I'm doing on this machine ...
Of course, it would be nice to have mprime detect that no hyperthreading is active, but I don't really care.

KyleAskine 2012-01-17 11:18

[QUOTE=Bdot;286557]I'll give that a try and monitor the situation :smile:
I usually start it as
mprime -d &
Will another mprime -m communicate with the running instance to tell it to change any settings, or do I need to restart mprime -d to let the changes take effect?

Regarding the "Unable to detect which logical CPUs are hyperthreaded." message.
The machine has 2 Intel(R) Xeon(R) CPU X5650 @ 2.67GHz, with hyperthreading disabled. In other words, 12 cores without the need to detect hyperthreading. 8 cores are dedicated to 8 mprime workers with fixed affinity; the other cores are for mfakto and some work I'm doing on this machine ...
Of course, it would be nice to have mprime detect that no hyperthreading is active, but I don't really care.[/QUOTE]

If you change worktodo.txt without shutting down Prime95 (or mprime), it will unreserve everything that you just added. Use worktodo.add, or shut it down and then add the new lines.
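
A rough Python sketch of the worktodo.add route, for those who script their queue. It assumes the running client picks up a file named worktodo.add in its working directory, as suggested above; `queue_work` is a made-up helper, and the assignment lines are placeholders rather than real worktodo entries.

```python
import os
import tempfile


def queue_work(work_dir, lines):
    """Append assignment lines to worktodo.add instead of editing worktodo.txt."""
    path = os.path.join(work_dir, "worktodo.add")
    with open(path, "a", encoding="ascii") as handle:
        for line in lines:
            handle.write(line.rstrip("\n") + "\n")
    return path


with tempfile.TemporaryDirectory() as demo_dir:
    # Placeholder text only; real entries would be copied verbatim
    # from wherever the assignments were issued.
    added = queue_work(demo_dir, ["<assignment line 1>", "<assignment line 2>"])
    with open(added, encoding="ascii") as handle:
        contents = handle.read()
print(contents)
```

Opening in append mode means lines queued earlier but not yet picked up are not overwritten.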

Bdot 2012-01-17 23:04

[QUOTE=KyleAskine;286560] Use worktodo.add, or shut it down and then add the new lines.[/QUOTE]
Yes, this is exactly what I did: I used worktodo.add and, as the result was not satisfactory, shut down mprime, changed worktodo.txt, and restarted mprime.

Dubslow 2012-01-19 03:35

Hurg. This blasted bug has struck me too, though I can't be sure when; it could have been any time within the last week. I had an LL test registered and at 57% when I moved it behind a ~15-20 day assignment, to resume at that assignment's completion. I just checked now and found it had been unreserved. I have not added anything anywhere since the last time I checked, yet it's suddenly gone, and I have no idea why. It had been sitting patiently behind the other assignment for at least a few days, so there is no reason for it to unreserve itself. Nothing changed; I did not add work or anything. The only thing I did was change OS, but that should not affect the outcome. The only bright side is that it didn't delete the 57%-done save file; that would have been really annoying.

Though it doesn't matter (because it was fine for a few days before randomly unreserving for no reason that I can discern), I do have [code]UnreserveDays=365
MaxExponents=50
[/code]in prime.txt. I have around 10 assignments across all three workers.
(I have since moved said LL back in front of the other assignment, with AdvancedTest, with the hopes of claiming Double Check credit. It seems a waste to throw away 57%.)

Edit: The exponent status page says it was reassigned earlier today.

And, from prime.log:
[code][Tue Jan 17 19:35:31 2012 - ver 26.6]
Updating computer information on the server
Sending expected completion date for [SPOILER]M200000033[/SPOILER]: Jan 23 2012
Sending expected completion date for [SPOILER]M45231161[/SPOILER]: Jan 30 2012
Sending expected completion date for [SPOILER]M33244193[/SPOILER]: Jan 19 2012
Sending expected completion date for [SPOILER]M25185907[/SPOILER]: Jan 23 2012
Sending expected completion date for [SPOILER]M26024893[/SPOILER]: Jan 28 2012
[Tue Jan 17 20:47:44 2012 - ver 26.6]
Updating computer information on the server
Sending expected completion date for [SPOILER]M200000033[/SPOILER]: Jan 23 2012
Sending expected completion date for [SPOILER]M45231161[/SPOILER]: Jan 30 2012
PrimeNet error 43: Invalid assignment key
ap: no such assignment key, GUID: [SPOILER]9dd3d23cd438497fcdb8372155860bc8[/SPOILER], key: [SPOILER]C01A52FA8057A6F380E7FEF9AEEA9980[/SPOILER]
Sending expected completion date for [SPOILER]M33244193[/SPOILER]: Jan 19 2012
Sending expected completion date for [SPOILER]M25185907[/SPOILER]: Jan 23 2012
Sending expected completion date for [SPOILER]M26024893[/SPOILER]: Jan 28 2012
[Tue Jan 17 22:00:00 2012 - ver 26.6]
Updating computer information on the server
Sending expected completion date for [SPOILER]M200000033[/SPOILER]: Jan 23 2012
Sending expected completion date for [SPOILER]M33244193[/SPOILER]: Jan 19 2012
Sending expected completion date for [SPOILER]M25185907[/SPOILER]: Jan 23 2012
Sending expected completion date for [SPOILER]M26024893[/SPOILER]: Jan 28 2012
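[/code]That makes it pretty clear, then, that it's not actually the same bug that Bdot and others have been having: the server rejected the assignment key (error 43) at 20:47, and the exponent is missing from the next check-in. I will set PrimeNet=0 for the duration of the AdvancedTest.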
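
For spotting this kind of thing without reading the whole log, a Python sketch along these lines could diff consecutive check-ins. It assumes the prime.log layout shown above (a bracketed timestamp header, then one "Sending expected completion date" line per exponent) and that the spoiler tags are not in the real file; `checkins` and `dropped` are hypothetical names.

```python
import re

HEADER = re.compile(r"^\[(.+?) - ver [\d.]+\]$")
EXPONENT = re.compile(r"Sending expected completion date for (M\d+)")


def checkins(log_text):
    """Split log text into one record per timestamped check-in."""
    entries = []
    for raw in log_text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            entries.append({"time": header.group(1), "exponents": [], "error43": False})
        elif entries:
            found = EXPONENT.search(line)
            if found:
                entries[-1]["exponents"].append(found.group(1))
            elif line.startswith("PrimeNet error 43"):
                entries[-1]["error43"] = True
    return entries


def dropped(log_text):
    """Yield (time, exponent) for exponents missing since the previous check-in."""
    entries = checkins(log_text)
    for previous, current in zip(entries, entries[1:]):
        for exponent in previous["exponents"]:
            if exponent not in current["exponents"]:
                yield current["time"], exponent


sample = """
[Tue Jan 17 20:47:44 2012 - ver 26.6]
Updating computer information on the server
Sending expected completion date for M200000033: Jan 23 2012
Sending expected completion date for M45231161: Jan 30 2012
PrimeNet error 43: Invalid assignment key
Sending expected completion date for M33244193: Jan 19 2012
[Tue Jan 17 22:00:00 2012 - ver 26.6]
Updating computer information on the server
Sending expected completion date for M200000033: Jan 23 2012
Sending expected completion date for M33244193: Jan 19 2012
"""
print(list(dropped(sample)))
```

An exponent also disappears from the check-ins when it finishes normally, so a hit here is only a hint to go and look at the surrounding lines.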

Mini-Geek 2012-01-20 02:29

[QUOTE=Dubslow;286669][code][Tue Jan 17 19:35:31 2012 - ver 26.6]
Updating computer information on the server
Sending expected completion date for [SPOILER][removed][/SPOILER]: Jan 23 2012
...
PrimeNet error 43: Invalid assignment key
ap: no such assignment key, GUID: [SPOILER][removed by Mini-Geek][/SPOILER], key: [SPOILER][removed by Mini-Geek][/SPOILER]
[/code][/QUOTE]Using [spoiler] doesn't really hide your assignment keys or exponents. All someone needs to do is select the text (or be a bot, which doesn't care about silly things like black-on-black text).

KyleAskine 2012-01-20 02:35

I actually just got something unreserved by mprime. I have a bunch of P-1's queued up, and lo and behold, mprime removed them all and added another random LL that I have no desire to test.

I added the P-1's back, and did a manual communication with the server with the -m switch, and it seemed happy. As soon as I actually ran it with the -d switch it immediately reserved me a new exponent again and unreserved everything for the second time.

I turned off communication with PrimeNet for the time being on my Linux box, out of immense frustration.


All times are UTC.
