mersenneforum.org  

Old 2016-07-12, 15:20   #1
rudi_m
 
Jul 2005

2×7×13 Posts
Unreserving Problem

Hi,

For many years now, some of my exponents have occasionally been unreserved. See this log for example:

Code:
[Comm thread Jul 12 09:48] Updating computer information on the server
[Comm thread Jul 12 09:48] Sending expected completion date for M69432359: Jul 15 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M69676049: Jul 20 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M69676169: Jul 25 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M70448837: Jul 30 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M70503361: Aug  4 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M70506521: Aug  9 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M71444189: Aug 14 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M812000023: Jul 12 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M813000169: Jul 15 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M814000067: Jul 17 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M815000119: Jul 20 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M816000001: Jul 22 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M817000007: Jul 25 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M818000021: Jul 27 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M819000047: Jul 30 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M821000003: Aug  1 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M822000017: Aug  3 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M823000043: Aug  6 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M824000011: Aug  8 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M825000119: Aug 11 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M826000003: Aug 13 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M827000011: Aug 16 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M828000001: Aug 18 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M829000049: Aug 20 2016
[Comm thread Jul 12 09:49] Sending expected completion date for M832000027: Aug 23 2016
[Comm thread Jul 12 09:49] Sending expected completion date for M833000083: Aug 25 2016
[Comm thread Jul 12 09:49] Sending expected completion date for M834000031: Aug 28 2016
[Comm thread Jul 12 09:49] Sending expected completion date for M835000031: Aug 30 2016
[Comm thread Jul 12 09:49] Sending expected completion date for M836000003: Sep  1 2016
[Comm thread Jul 12 09:49] Sending expected completion date for M837000161: Sep  4 2016
[Comm thread Jul 12 09:49] Sending expected completion date for M838000103: Sep  6 2016
[Comm thread Jul 12 09:49] Sending expected completion date for M841000067: Sep  9 2016
[Comm thread Jul 12 09:49] Sending expected completion date for M842000017: Sep 11 2016
[Comm thread Jul 12 09:49] Sending expected completion date for M843000007: Sep 13 2016
[Comm thread Jul 12 09:49] Sending expected completion date for M844000097: Sep 16 2016
[Comm thread Jul 12 09:49] Sending expected completion date for M845000003: Sep 18 2016
[Comm thread Jul 12 09:49] Sending expected completion date for M846000073: Sep 20 2016
[Comm thread Jul 12 09:49] Sending expected completion date for M847000097: Sep 23 2016
[Comm thread Jul 12 09:49] Sending expected completion date for M848000051: Sep 25 2016
[Comm thread Jul 12 09:49] Sending expected completion date for M849000083: Sep 27 2016
[Comm thread Jul 12 09:49] Done communicating with server.
[Worker #1 Jul 12 10:18] Iteration: 33000000 / 69432359 [47.52%], ms/iter:  5.734, ETA: 58:01:39
[Worker #2 Jul 12 11:04] Trial factoring M812000023 to 2^76 is 84.51% complete.  Time: 7951.263 sec.
[Worker #1 Jul 12 11:07] Iteration: 33500000 / 69432359 [48.24%], ms/iter:  5.811, ETA: 58:00:19
[Worker #1 Jul 12 11:55] Iteration: 34000000 / 69432359 [48.96%], ms/iter:  5.764, ETA: 56:43:54
[Worker #1 Jul 12 12:43] Iteration: 34500000 / 69432359 [49.68%], ms/iter:  5.792, ETA: 56:11:57
[Worker #2 Jul 12 13:17] Trial factoring M812000023 to 2^76 is 92.96% complete.  Time: 7936.362 sec.
[Worker #1 Jul 12 13:32] Iteration: 35000000 / 69432359 [50.40%], ms/iter:  5.822, ETA: 55:40:49
[Worker #1 Jul 12 14:21] Iteration: 35500000 / 69432359 [51.12%], ms/iter:  5.911, ETA: 55:42:58
[Worker #2 Jul 12 15:07] M812000023 no factor from 2^75 to 2^76, Wf8: 717A1B92
[Comm thread Jul 12 15:07] Sending result to server: UID: rudimeier/volos, M812000023 no factor from 2^75 to 2^76, Wf8: 717A1B92, AID: C76BF6315206B3022A26D42B6F1AC1A2
[Comm thread Jul 12 15:07]
[Worker #2 Jul 12 15:07] Starting trial factoring of M813000169 to 2^76
[Worker #2 Jul 12 15:07] Trial factoring M813000169 to 2^76.
[Comm thread Jul 12 15:07] PrimeNet success code with additional info:
[Comm thread Jul 12 15:07] CPU credit is 9.4238 GHz-days.
[Comm thread Jul 12 15:07] Done communicating with server.
[Worker #1 Jul 12 15:08] Iteration: 36000000 / 69432359 [51.84%], ms/iter:  5.715, ETA: 53:04:13
[Worker #1 Jul 12 15:56] Iteration: 36500000 / 69432359 [52.56%], ms/iter:  5.646, ETA: 51:38:41
[Main thread Jul 12 16:15] Stopping all worker threads.
[Worker #2 Jul 12 16:15] Worker stopped.
[Worker #1 Jul 12 16:15] Stopping primality test of M69432359 at iteration 36702601 [52.86%]
[Worker #1 Jul 12 16:15] Worker stopped.
[Main thread Jul 12 16:15] Execution halted.
[Main thread Jul 12 16:15] Choose Test/Continue to restart.

##  up to here is all fine
##  then I've stopped and re-started mprime.

[Main thread Jul 12 16:59] Mersenne number primality test program version 28.9
[Main thread Jul 12 16:59] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 8 MB
[Main thread Jul 12 16:59] Logical CPUs 1,5 form one physical CPU.
[Main thread Jul 12 16:59] Logical CPUs 2,6 form one physical CPU.
[Main thread Jul 12 16:59] Logical CPUs 3,7 form one physical CPU.
[Main thread Jul 12 16:59] Logical CPUs 4,8 form one physical CPU.
[Main thread Jul 12 16:59] Starting workers.
[Comm thread Jul 12 16:59] Unreserving M70448837
[Comm thread Jul 12 16:59] Unreserving M70503361
[Worker #1 Jul 12 16:59] Worker starting
[Worker #1 Jul 12 16:59] Setting affinity to run worker on any logical CPU.
[Worker #2 Jul 12 16:59] Waiting 5 seconds to stagger worker starts.
[Comm thread Jul 12 16:59] Unreserving M70506521
[Worker #1 Jul 12 16:59] Setting affinity to run helper thread 1 on any logical CPU.
[Worker #1 Jul 12 16:59] Setting affinity to run helper thread 2 on any logical CPU.
[Comm thread Jul 12 16:59] Unreserving M71444189
[Worker #1 Jul 12 16:59] Resuming primality test of M69432359 using FMA3 FFT length 3840K, Pass1=384, Pass2=10K, 3 threads
[Worker #1 Jul 12 16:59] Iteration: 36702602 / 69432359 [52.86%].
[Comm thread Jul 12 16:59] Done communicating with server.

This drives me crazy. I already have
UnreserveDays=300
in prime.txt, and any of these exponents would be done within one month.

Is it a bug or am I doing something wrong?

Actually I would like to disable any auto-unreserving completely. I NEVER want to unreserve any exponent. Is that possible?

BTW it would be nice to have a message about _why_ it unreserves an exponent.
Old 2016-07-12, 15:54   #2
ATH
Einyen
 
Dec 2003
Denmark

6574₈ Posts

All those 69M and 70M exponents are in Category 1, which means they need to finish within 30 days. To get Category 1 you must have "DaysOfWork=5" or less, and the 71M is Cat 2, requiring "DaysOfWork=10" or less. I think that is why it is unreserving them. See the rules here:
http://www.mersenne.org/thresholds/

Also check prime.txt if you have "MaxExponents=" set to anything.


Now with the Category system you have to do Category 3 or 4 exponents if you wish to queue up work for a month or more. Is the computer not always connected to the internet?

By the way it is pretty much a waste doing trial factoring on a CPU now, but it is of course your choice.
Old 2016-07-12, 16:15   #3
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2057₁₆ Posts

Quote:
Originally Posted by rudi_m View Post
For many years now, some of my exponents have occasionally been unreserved.
This drives me crazy. I already have
UnreserveDays=300
in prime.txt, and any of these exponents would be done within one month.

Is it a bug or am I doing something wrong?
It is a bug. My best guess is that at startup prime95 is somehow mis-estimating the expected completion dates.

There is no option to stop unreserving. For now, I suggest trying UnreserveDays=500000000.

The only other workaround is to start each LL test. Once a savefile exists the exponent should not be unreserved. Of course, rearranging worktodo to run each LL test for a few iterations is extremely tedious.

I'll add an option next time I update prime95.
Old 2016-07-12, 17:04   #4
GP2
 
Sep 2003

2·5·7·37 Posts

I also had problems with exponents getting unreserved. Sometimes exponents would suddenly get wacky completion dates far into the future, and that would cause the remaining lines of worktodo.txt to get unreserved. But sometimes the completion dates were normal and unreserving still happened.

I think maybe it was caused by blindly copying local.txt from an existing computer to a new one. The problem is that if the ComputerGUID line is the same on different computers, then PrimeNet thinks they are the same computer. So PrimeNet is constantly getting contradictory information about which exponents are being worked on and what the rolling average speed is. Same thing if you are using FixedHardwareUID=1, which causes a HardwareGUID line to be generated: if you end up with identical HardwareGUID lines on multiple different computers, then I think that also triggers the problem.

Go to the http://www.mersenne.org/cpus/ page in your account, and look at how many registered computers the system thinks you have. Make a list of every exponent that your fleet of computers is currently working on, and check whether all of them appear on the http://www.mersenne.org/cpus/ page, or only some of them.

If you have identical ComputerGUID lines in multiple local.txt files on different computers, then just stop mprime/Prime95, delete the ComputerGUID line from local.txt, and then restart the program. It should automatically generate a brand new unique ComputerGUID line in the local.txt file. If you have identical HardwareGUID lines on different computers, then delete them and stop using FixedHardwareUID=1.
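
If copies of each machine's local.txt are at hand, the duplicate check described above can also be scripted. What follows is only a minimal Python sketch. It assumes the GUID lines have the same plain `Key=value` form as the other local.txt lines shown in this thread, and the function names `read_guids` and `find_duplicates` are made up for illustration; they are not part of mprime/Prime95.

```python
from collections import defaultdict

# Keys mentioned in this thread; the exact line format is an assumption.
GUID_KEYS = ("ComputerGUID", "HardwareGUID")


def read_guids(text):
    """Return {key: value} for the GUID lines found in one local.txt."""
    found = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key in GUID_KEYS:
            found[key] = value.strip()
    return found


def find_duplicates(files):
    """Find GUID values that occur in more than one local.txt.

    `files` maps a label (for example a host name) to the text of that
    machine's local.txt. The result maps (key, value) to the sorted list
    of labels sharing that value.
    """
    seen = defaultdict(list)
    for label, text in files.items():
        for key, value in read_guids(text).items():
            seen[(key, value)].append(label)
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}
```

Any entry in the result is a candidate for the fix described above: stop the program on the affected machines, delete the shared line, and restart.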



When supplying an initial version of local.txt to a brand new computer, I remove almost everything, leaving only the basics:

Code:
ComputerID=<<your computer name here>>
Memory=<<nnnn>> during 7:30-23:30 else <<nnnn>>
Affinity=100
WorkerThreads=1
ThreadsPerTest=<<number of cores>>

The program and PrimeNet can create all the other lines by themselves.


However, I do keep the old local.txt file (including the old ComputerGUID line) if the new computer is completely identical to the old one and the old one has been permanently retired. For example, when running virtual machine instances in the cloud, an old instance may be terminated and a new one launched that takes over the existing worktodo and save files.

Last fiddled with by GP2 on 2016-07-12 at 17:10
Old 2016-07-12, 17:30   #5
GP2
 
Sep 2003

2×5×7×37 Posts

Quote:
Originally Posted by ATH View Post
All those 69M and 70M exponents are in Category 1, which means they need to finish within 30 days. To get Category 1 you must have "DaysOfWork=5" or less, and the 71M is Cat 2, requiring "DaysOfWork=10" or less. I think that is why it is unreserving them.
Take a closer look:

Code:
[Comm thread Jul 12 09:48] Sending expected completion date for M69432359: Jul 15 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M69676049: Jul 20 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M69676169: Jul 25 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M70448837: Jul 30 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M70503361: Aug  4 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M70506521: Aug  9 2016
[Comm thread Jul 12 09:48] Sending expected completion date for M71444189: Aug 14 2016
...
[Main thread Jul 12 16:15] Stopping all worker threads.
...
[Main thread Jul 12 16:59] Starting workers.
[Comm thread Jul 12 16:59] Unreserving M70448837
[Comm thread Jul 12 16:59] Unreserving M70503361
[Comm thread Jul 12 16:59] Unreserving M70506521
[Comm thread Jul 12 16:59] Unreserving M71444189
All of these exponents are Category 1 except the last which is Category 2.

Of the four exponents that were unreserved, three were Category 1 and one was Category 2. These categories have different expiry timelines, yet exponents of different categories got unreserved at the same time. That makes it unlikely that the categories are the right explanation.

The three Category 1 exponents were scheduled to complete in less than 30 days, and in any case the time limit for Category 1 is 90 days. Only Category 0 is 30 days. The fourth exponent was Category 2, which can take up to 180 days. So they were all within the time limits.

Another possibility: Category 1 exponents get recycled if not started within 20 days and Category 2 exponents get recycled if not started within 30 days. However, these different categories were all unreserved at the same time, so this would not be the explanation unless these exponents were assigned exactly 20 days ago and exactly 30 days ago, respectively, which would be a strange coincidence.

Probably there was a glitch that temporarily pushed the completion date of 69676169 more than 180 days into the future, which made the system think that the remaining exponents would not get started for a long time, so it unreserved them. In my own experience, I believe such glitches happened when PrimeNet thought that multiple different computers were the same computer and then continually got contradictory information from all of them about rolling average speeds. You can check your http://www.mersenne.org/cpus/ listing to see if PrimeNet thinks that you have fewer computers than you actually do, which may be caused by identical ComputerGUID or HardwareGUID lines in the local.txt files.
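
To line up the unreserved exponents against the completion dates the client last reported, the log can be scanned mechanically. This is a rough Python sketch, assuming the log lines look exactly like the ones quoted in this thread. The function names are hypothetical, and the script only reads the log; it does not reproduce whatever estimate the client computed at startup.

```python
import re
from datetime import date, datetime

# Patterns modelled on the log lines quoted in this thread (an assumption
# about the general log format, not taken from the mprime source).
ETA_RE = re.compile(
    r"Sending expected completion date for M(\d+): (\w{3}\s+\d{1,2}\s+\d{4})"
)
UNRESERVE_RE = re.compile(r"Unreserving M(\d+)")


def unreserved_with_last_eta(log_text):
    """Return [(exponent, last reported ETA or None)] in log order."""
    last_eta = {}
    result = []
    for line in log_text.splitlines():
        m = ETA_RE.search(line)
        if m:
            # Collapse the double space in dates such as "Aug  4 2016".
            last_eta[int(m.group(1))] = " ".join(m.group(2).split())
            continue
        m = UNRESERVE_RE.search(line)
        if m:
            exponent = int(m.group(1))
            result.append((exponent, last_eta.get(exponent)))
    return result


def days_until(eta_text, reference):
    """Days from `reference` (a date) to an ETA such as 'Aug 9 2016'."""
    eta = datetime.strptime(eta_text, "%b %d %Y").date()
    return (eta - reference).days
```

With `date(2016, 7, 12)` as the reference, the latest ETA among the four unreserved exponents (Aug 14 2016 for M71444189) is 33 days out, and the three Category 1 exponents are all under 30 days.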

Quote:
By the way it is pretty much a waste doing trial factoring on a CPU now, but it is of course your choice.
Agreed. For trial factoring I think CPUs are slower than GPUs by up to a factor of 100 now.

Last fiddled with by GP2 on 2016-07-12 at 18:09
Old 2016-07-12, 19:05   #6
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

2·11²·47 Posts

Quote:
Originally Posted by GP2 View Post
Take a closer look:
This is, without question, a client-side issue. If the server had recycled the candidate the message instead would have been something like "Invalid Assignment Key".
Old 2016-07-12, 23:42   #7
rudi_m
 
Jul 2005

B6₁₆ Posts

Quote:
Originally Posted by Prime95 View Post
It is a bug. My best guess is that at startup prime95 is somehow mis-estimating the expected completion dates.

There is no option to stop unreserving. For now, I suggest trying UnreserveDays=500000000.
Thanks for clarifying this. I will try UnreserveDays=500000000.

I don't know what could have triggered the bug in the particular case of the logs I posted. It may be worth noting that there was a reboot between the stop and the start. Normally I can stop and start without problems.

Most notably, I remember other cases (regarding prime.txt:MinLoad/MaxLoad) where the ETA calculation seemed to go wrong after threads had paused, which may cause unreserving too.

About the other posts mentioning "Category 1 rules, etc.": I understand that the _server_ would unassign the exponent. But _my_ client should do what I want, especially if I manually assigned the exponents and nobody else has _finished_ the same job already.

Last fiddled with by rudi_m on 2016-07-13 at 00:05
Old 2016-07-13, 00:01   #8
rudi_m
 
Jul 2005

2×7×13 Posts

Quote:
Originally Posted by ATH View Post
By the way it is pretty much a waste doing trial factoring on a CPU now, but it is of course your choice.
I know you are right.

Some years ago I started doing factoring on some CPU cores because it's less invasive for the system than doing LL on _all_ cores. Actually, nowadays I could just let one or two cores per system idle. But I still want to finish my minor project "Factoring the smallest exponent of each million range to the limit". As you can see on the "work distribution" map, I've already managed to write a column of ones into the Available/P-1 column up to 600M :)

If somebody with a GPU wants to complete my nonsense project, I can give you the work file. ;) (about 350 jobs, factoring 76 to 79 bits)

I would need 3 more years when using only 2 recent CPU cores ...

Last fiddled with by rudi_m on 2016-07-13 at 00:21
Old 2016-07-13, 00:20   #9
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/

C70₁₆ Posts

Another way I've seen assignments get unassigned is this:

start mprime, and leave running
add work to worktodo.add
run mprime -c
see the assignments get registered and added to worktodo.txt

and later worktodo.txt will empty itself. I haven't investigated why it happens.
Old 2016-07-13, 08:42   #10
GP2
 
Sep 2003

2×5×7×37 Posts

Quote:
Originally Posted by rudi_m View Post
But _my_ client should do what I want, especially if I manually assigned the exponents and nobody else has _finished_ the same job already.
If you manually assigned the exponents, you can just leave them under "Manual testing" for the time being and they're guaranteed not to get unreserved for six months.

If you have a backlog of exponents to test, then instead of listing those exponents within multiple long worktodo files where they're vulnerable to the unreserving bug, you can keep your worktodo files short and store a global centralized list of Test= or DoubleCheck= lines in some separate file, and then just create one-line worktodo.add files as necessary whenever old exponents complete.
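
The "central list plus one-line worktodo.add" approach can be automated with a small helper. The Python sketch below is only an illustration under two assumptions: that the client merges worktodo.add into worktodo.txt and removes the .add file afterwards, and that the backlog is a plain text file with one Test= or DoubleCheck= line per row. The function name `feed_next_assignment` and both file paths are hypothetical.

```python
from pathlib import Path


def feed_next_assignment(backlog_path, add_path):
    """Move the first non-blank backlog line into a one-line worktodo.add.

    Returns the line that was moved, or None if nothing was done
    (a previous worktodo.add is still present, or the backlog is empty).
    """
    backlog = Path(backlog_path)
    add_file = Path(add_path)

    # Do not overwrite an .add file the client has not picked up yet.
    if add_file.exists():
        return None

    lines = [ln for ln in backlog.read_text().splitlines() if ln.strip()]
    if not lines:
        return None

    add_file.write_text(lines[0] + "\n")
    # Rewrite the backlog without the line that was just handed out.
    backlog.write_text("".join(ln + "\n" for ln in lines[1:]))
    return lines[0]
```

You would call this whenever an old exponent completes, so that each worktodo file never holds more than a line or two that could be hit by the unreserving bug.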
Old 2016-07-13, 15:11   #11
Madpoo
Serpentine Vermin Jar
 
Jul 2014

2×13×131 Posts

Quote:
Originally Posted by rudi_m View Post
...I would need 3 more years when using only 2 recent CPU cores ...
Or, if the statement is correct that GPUs factor 100x faster than a CPU, it would only take 0.03 more years on a GPU (~ 11 days). Check my math; that seems ridiculously out of whack but probably true. LOL

Seems more power efficient to do that and just let your extra CPU cores idle for 3 years.
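
For anyone who wants to check the math, here is the same arithmetic in a few lines of Python. Both inputs are rough figures from this thread (the 3-year estimate and the "up to a factor of 100" speedup), so the result is a ballpark at best.

```python
# Figures taken from the thread; both are rough estimates.
cpu_years = 3    # rudi_m's estimate for finishing on 2 CPU cores
speedup = 100    # GP2's "up to a factor of 100" for GPU vs CPU trial factoring

gpu_years = cpu_years / speedup
gpu_days = gpu_years * 365.25

print(f"{gpu_years:.2f} years = about {gpu_days:.0f} days")
# prints "0.03 years = about 11 days"
```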
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
