mersenneforum.org  

Old 2021-04-11, 16:58   #1
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

210₁₀ Posts
Why is mprime's estimate of time to complete so wrong?

I'm seeing a large difference (a factor of about 3.6) between two estimated completion times for a PRP test. This is a typical line of standard output, which I redirected to a file:

[Worker #7 Apr 11 17:29] Iteration: 36530000 / 110825779 [32.96%], ms/iter: 2.684, ETA: 55:22:57

So that's saying 55 hours from now.
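
For reference, the ETA on that line can be reproduced from its other fields: remaining iterations multiplied by the current ms/iter. The following is a minimal sketch, assuming the log format shown above. The helper name `eta_from_log_line` and the regular expression are illustrative and are not part of mprime.

```python
import re

# Hypothetical helper (not part of mprime): recompute the ETA from one
# progress line, assuming the format shown in the post above.
LINE_RE = re.compile(r"Iteration: (\d+) / (\d+) .*ms/iter: ([\d.]+)")

def eta_from_log_line(line):
    """Return the estimated remaining time in hours."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("not a progress line")
    done, total = int(match.group(1)), int(match.group(2))
    ms_per_iter = float(match.group(3))
    remaining_ms = (total - done) * ms_per_iter
    return remaining_ms / 1000 / 3600

line = ("[Worker #7 Apr 11 17:29] Iteration: 36530000 / 110825779 "
        "[32.96%], ms/iter: 2.684, ETA: 55:22:57")
hours = eta_from_log_line(line)
print(f"{hours:.2f} hours")  # about 55.39, close to the printed 55:22:57
```

The small difference from the printed ETA is consistent with ms/iter being rounded to three decimals in the log.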

However, running mprime -m, followed by checking the status (option 3), I see

[Worker thread #7]
M110825779, PRP, Mon Apr 19 23:05 2021
M110825543, PRP, Sun May 2 05:28 2021
M110825587, PRP, Fri May 14 12:45 2021
M110825509, PRP, Wed May 26 20:02 2021
M110812549, PRP, Tue Jun 8 03:17 2021

As a rough estimate, April 19th 23:05 is 198 hours away. So why does stdout indicate 55 hours to completion, while mprime's status shows 198 hours? I've tried a manual connection to the server (option 10 in mprime) and picked the option to upload completion times to the server. The server indicates the exponent will complete on 19th April, which agrees with mprime's status but is very different from what standard output is showing.
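
The size of the gap can be checked with a few lines of date arithmetic. This is only a sketch: it assumes the status was read at about the same time as the stdout line (17:29 on April 11) and that both timestamps use the same local time zone.

```python
from datetime import datetime

# Timestamps copied from the post; both are assumed to be local time.
log_time = datetime(2021, 4, 11, 17, 29)       # time of the stdout line
status_finish = datetime(2021, 4, 19, 23, 5)   # finish time shown by the status option

status_hours = (status_finish - log_time).total_seconds() / 3600
stdout_hours = 55 + 22 / 60 + 57 / 3600        # ETA 55:22:57 from the stdout line
ratio = status_hours / stdout_hours

print(f"status estimate: {status_hours:.1f} h")  # 197.6 h
print(f"ratio: {ratio:.2f}")                     # 3.57
```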

Some of the gaps between the other estimates are unequal because some of those exponents have been partially tested on another machine, so ignore them.

The data at
https://www.mersenne.org/report_expo...exp_hi=&full=1
should not be used to estimate the time, as the exponent has not consistently been running with the same number of cores.


Dave

Last fiddled with by drkirkby on 2021-04-11 at 17:36
Old 2021-04-11, 17:28   #2
retina
Undefined
 
 
"The unspeakable one"
Jun 2006
My evil lair

6,143 Posts

You have just encountered the standard time estimation problem.

There are many things that can make estimates bad. Perhaps the most likely is that you are running other, higher-priority tasks that reduce the time slice available to the tests and make the elapsed time for each iteration vary. Anyhow, I don't know what you do with your system, but anything that uses cycles or changes the clocks on your system will mess up the estimates.
Old 2021-04-11, 17:30   #3
axn
 
 
Jun 2003

4966₁₀ Posts

ETA is accurate, but since it is based on the current iteration time, it can go up and down.
Status is based on the expected performance of the CPU. It doesn't need the test to be running (obviously). This can be wildly inaccurate if Prime95 doesn't have good data on the CPU, but it will become accurate over time as Prime95 learns the true capability of the CPU (by adjusting a fudge factor called RollingAverage). RollingAverage starts at 1000 (representing nominal CPU performance) and is adjusted twice a day based on the CPU's observed performance.

FWIW, my CPU has a RollingAverage currently of 3487, which means the observed performance is about 3.5x of expected. It still isn't showing super accurate status times, but it is pretty close.
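
To make the fudge factor concrete, here is a rough sketch of how a RollingAverage value could scale a nominal estimate, taking 1000 as nominal as described above. The function name and the exact formula are assumptions for illustration only; Prime95's real calculation may include other terms.

```python
NOMINAL_ROLLING_AVERAGE = 1000  # value described above as nominal CPU performance

def scaled_estimate_hours(nominal_hours, rolling_average):
    """Scale a nominal time estimate by observed-vs-expected performance.

    Assumed formula, for illustration only: a RollingAverage of 3487 is
    read as "about 3.5x faster than expected", so the estimate shrinks
    by that factor.
    """
    if rolling_average <= 0:
        raise ValueError("rolling_average must be positive")
    return nominal_hours * NOMINAL_ROLLING_AVERAGE / rolling_average

# Made-up example: a nominal 100-hour estimate with RollingAverage 3487.
print(round(scaled_estimate_hours(100, 3487), 1))  # 28.7
```

On this reading, a status estimate stays too pessimistic until RollingAverage has caught up with how fast the machine really is.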

Last fiddled with by axn on 2021-04-11 at 17:36
Old 2021-04-11, 17:32   #4
slandrum
 
Jan 2021
California

71 Posts

I've found that the first DC runs on a new computer get terrible time estimates on primenet. Usually super optimistic by a factor of 3 or more, but sometimes super pessimistic. By about the 3rd DC run, the time estimates start to get much better. Even so, there are a lot of things that can throw them way off.

Last fiddled with by slandrum on 2021-04-11 at 17:36
Old 2021-04-11, 17:42   #5
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×29×59 Posts

There are multiple ways of estimating completion time. One is to observe the actual throughput of mprime/prime95 over days or weeks, including the effect of other workloads, system shutdowns, etc., and extrapolate that mean throughput to task completion.
Another is to time the current iterations, perhaps while the system is not doing other things so that mprime/prime95 is achieving close to its peak possible throughput, and extrapolate that to exponent completion.
I think the status output uses the first, and the worker window of prime95 the second.
They naturally provide different estimates.
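
The two approaches can be sketched side by side. This is an illustration of the idea only, not how mprime/prime95 implements either estimate. The function names and the 100-hour history figure are made up, while the iteration counts and ms/iter come from the log line in post #1.

```python
def eta_from_mean_throughput(remaining_iters, iters_done, elapsed_hours):
    """First method: extrapolate the long-run average rate (hours).

    The average already includes idle time, shutdowns and other workloads.
    """
    mean_iters_per_hour = iters_done / elapsed_hours
    return remaining_iters / mean_iters_per_hour

def eta_from_current_rate(remaining_iters, ms_per_iter):
    """Second method: extrapolate the current iteration time (hours)."""
    return remaining_iters * ms_per_iter / 1000 / 3600

remaining = 110825779 - 36530000  # from the log line in post #1

# Hypothetical history: the first 36530000 iterations took 100 wall-clock hours.
print(f"{eta_from_mean_throughput(remaining, 36530000, 100):.1f} h")
# Current speed as printed in the log: 2.684 ms/iter.
print(f"{eta_from_current_rate(remaining, 2.684):.1f} h")
```

With a slow history and a fast current rate, the first figure comes out several times larger than the second, which is the kind of disagreement described in the first post.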

Last fiddled with by kriesel on 2021-04-11 at 17:42
Old 2021-04-11, 17:55   #6
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

2×3×5×7 Posts

I have been changing the number of cores for the worker. Currently the worker has 18 cores, with the other 8 cores of a 26-core CPU devoted to other workers. Earlier in the day it had only 3 cores.

What software computes the estimated time to completion? I assumed it was mprime, which sent its estimates to the Primenet server. But from what I am reading above, that assumption is not correct.

Dave

Last fiddled with by drkirkby on 2021-04-11 at 17:56
Old 2021-04-11, 21:22   #7
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

322₈ Posts

I set the benchmarks with a fixed FFT size of 6048 K, as that seems to be what's used with the 110 million assignments I have. The optimal setup is 2 workers and 52 cores, so both CPUs are running flat out with all 26 cores of each in use. I tried 22, 23, 24, 25 and 26 cores per worker, but 26 cores gives the best results.

I think part of my problem was trying to do a 332646233 exponent at the same time as the 110 million exponent. Obviously the former needs a much bigger FFT. It might be a bit tricky to find the best setup to use with one large exponent and a smaller one. It may be better not to do that. But I don't fancy testing another big exponent.

I've now got the time per iteration down to about 1.63 ms, and 1.60 ms on a slightly smaller exponent.

[Worker #1 Apr 11 22:12] Iteration: 41590000 / 110825779 [37.52%], ms/iter: 1.634, ETA: 31:25:27
[Worker #2 Apr 11 22:12] Iteration: 43590000 / 110274583 [39.52%], ms/iter: 1.592, ETA: 29:28:57


I'm getting more and more tempted to just forget about 332646233, even though I've done more than 44% of it. I think it causes problems with throughput for smaller exponents, even when only one core is given to that exponent.

Last fiddled with by drkirkby on 2021-04-11 at 21:22
Old 2021-04-12, 11:55   #8
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

2×3×5×7 Posts

The estimate seems to be getting more accurate. It is currently showing the following as the estimate for the date/time to complete.

M110825779, PRP, Thu Apr 15 21:47 2021

which is more than 4 days earlier than its previous estimate of

M110825779, PRP, Mon Apr 19 23:05 2021

I expect it to actually finish around 6 AM tomorrow morning (13th April), as it should finish in a little over 17 hours.

[Worker #1 Apr 12 12:52] Iteration: 73780000 / 110825779 [66.57%], ms/iter: 1.666, ETA: 17:08:48
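
The projected finish time follows from adding the ETA to the timestamp on that log line. This is a small sketch. The year is taken from the thread dates, and the log timestamp is assumed to be local wall-clock time.

```python
from datetime import datetime, timedelta

# Values copied from the log line above; the year and the local-time
# assumption are not stated in the log itself.
log_time = datetime(2021, 4, 12, 12, 52)
eta = timedelta(hours=17, minutes=8, seconds=48)

finish = log_time + eta
print(finish.isoformat(sep=" "))  # 2021-04-13 06:00:48
```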

Dave

Last fiddled with by drkirkby on 2021-04-12 at 11:57

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.