
jinydu 2006-11-15 05:12

Only Best Times for Benchmarks?
 
I may have hinted at this question before, but I would like to ask it directly here.

[quote=www.mersenne.org/bench.htm]Where more than one user has reported timings for a given CPU, anomalous timings are discarded and the best times are reported.[/quote]

This seems to be a contradiction. One part of the sentence says that anomalous times are discarded, whereas the other part says that the best times are reported. What if the best time is much faster than all of the other reported times? Wouldn't it then count as anomalous?

Also, why use the best time? Why not use the average time?

cheesehead 2006-11-15 22:54

[quote=jinydu;91551]
[quote=www.mersenne.org/bench.htm]Where more than one user has reported timings for a given CPU, anomalous timings are discarded and the best times are reported.[/quote][/quote]I think the last part should be read as "... are discarded, and [I]then[/I] the best times are reported."

[quote]Also, why use the best time? Why not use the average time?[/quote]Since the (elapsed) timings necessarily include the time needed to run any other software (OS or applications) as well as the Prime95 time, the best non-anomalous time probably includes the least non-Prime95 contribution and thus most accurately reflects Prime95 performance. (A best time that is significantly better than clustered next-best times would probably be interpreted as anomalous in the preceding step.)
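
For illustration, here is a minimal sketch (in Python) of "discard anomalous timings, then report the best". The benchmark page does not say how anomalies are detected, so the median-based cutoff and the 10% threshold below are assumptions, not the actual GIMPS procedure.

[code]
import statistics

def best_non_anomalous(timings, tolerance=0.10):
    """Drop timings more than `tolerance` (assumed 10%) away from the
    median, then return the fastest of what remains."""
    med = statistics.median(timings)
    kept = [t for t in timings if abs(t - med) <= tolerance * med]
    return min(kept)

# Example: per-iteration times (ms) reported by several users for one CPU.
# The suspiciously fast 25.0 reading is treated as anomalous and dropped,
# so 40.8 is the reported best time.
print(best_non_anomalous([41.2, 40.8, 43.5, 40.9, 25.0]))
[/code]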

Xyzzy 2006-11-15 22:59

I doubt the benchmark maintainer wants to spend a lot of time averaging stuff.

S485122 2006-11-16 06:51

I ran a lot of benchmarks in safe mode, with no software running except Prime95. There is no significant difference between running one or two instances of Prime95. But the differences between "best times" from one run to another can go up to 2% on LL tests and up to 6% on factorisation tests.

This is why I think an average of the times would be more meaningful, provided enough iterations are used for each test.

It is possible that the policy of "best times" has to do with the development of the software; see [url]ftp://mersenne.org/gimps/p4notes.doc[/url], especially the part "Trace cache and branch prediction" (some parts of the algorithm use a variable number of cycles...).

A possible compromise would be for the benchmark to output:
- best time
- worst time
- average time
- standard deviation

But this would give a lot of data to process.
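
As a sketch, computing those four figures for one test could look like this in Python, assuming the raw per-run times are kept as a list:

[code]
import statistics

def summarize(timings):
    """Best, worst, average and standard deviation for one benchmark test."""
    return {
        "best": min(timings),
        "worst": max(timings),
        "average": statistics.mean(timings),
        "stddev": statistics.stdev(timings),  # sample standard deviation
    }

# Example: times (ms) for repeated runs of the same FFT size.
print(summarize([40.8, 41.2, 41.0, 41.9, 40.9]))
[/code]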

