|
|
#1 |
|
Dec 2003
2³·3³ Posts |
There should be an option for benchmark-directed work selection. The purpose would be to select the kind of work where this particular machine performs relatively best, compared both to other machines and to its own results on other work. Here is an example. I've cut some parts of the very long benchmark, and it is still very long, but watch the 5120K and 6144K FFT sizes compared to the lower FFT sizes for 2 to 5 threads:
Code:
Dual-Core AMD Opteron(tm) Processor 8218
CPU speed: 2599.20 MHz, 16 cores
CPU features: RDTSC, CMOV, Prefetch, 3DNow!, MMX, SSE, SSE2
L1 cache size: 64 KB
L2 cache size: 1 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 512
Prime95 64-bit version 25.8, RdtscTiming=1
Best time for 768K FFT length: 28.834 ms.
Best time for 896K FFT length: 34.414 ms.
Best time for 1024K FFT length: 38.210 ms.
Best time for 1280K FFT length: 48.860 ms.
Best time for 1536K FFT length: 59.560 ms.
Best time for 1792K FFT length: 71.957 ms.
Best time for 2048K FFT length: 80.426 ms.
Best time for 2560K FFT length: 106.308 ms.
Best time for 3072K FFT length: 129.540 ms.
Best time for 3584K FFT length: 155.860 ms.
Best time for 4096K FFT length: 173.945 ms.
Best time for 5120K FFT length: 227.923 ms.
Best time for 6144K FFT length: 281.568 ms.
Best time for 7168K FFT length: 345.544 ms.
Best time for 8192K FFT length: 398.719 ms.
Timing FFTs using 2 threads.
Best time for 768K FFT length: 22.682 ms.
Best time for 896K FFT length: 27.307 ms.
Best time for 1024K FFT length: 31.386 ms.
Best time for 1280K FFT length: 43.396 ms.
Best time for 1536K FFT length: 51.444 ms.
Best time for 1792K FFT length: 60.334 ms.
Best time for 2048K FFT length: 67.747 ms.
Best time for 2560K FFT length: 89.804 ms.
Best time for 3072K FFT length: 106.367 ms.
Best time for 3584K FFT length: 126.970 ms.
Best time for 4096K FFT length: 143.489 ms.
Best time for 5120K FFT length: 138.987 ms.
Best time for 6144K FFT length: 175.762 ms.
Best time for 7168K FFT length: 219.918 ms.
Best time for 8192K FFT length: 266.881 ms.
Timing FFTs using 3 threads.
Best time for 768K FFT length: 18.719 ms.
Best time for 896K FFT length: 21.458 ms.
Best time for 1024K FFT length: 24.209 ms.
Best time for 1280K FFT length: 40.481 ms.
Best time for 1536K FFT length: 47.380 ms.
Best time for 1792K FFT length: 53.547 ms.
Best time for 2048K FFT length: 60.011 ms.
Best time for 2560K FFT length: 83.622 ms.
Best time for 3072K FFT length: 97.714 ms.
Best time for 3584K FFT length: 110.732 ms.
Best time for 4096K FFT length: 125.141 ms.
Best time for 5120K FFT length: 94.532 ms.
Best time for 6144K FFT length: 117.084 ms.
Best time for 7168K FFT length: 145.253 ms.
Best time for 8192K FFT length: 176.473 ms.
Timing FFTs using 4 threads.
Best time for 768K FFT length: 18.527 ms.
Best time for 896K FFT length: 21.075 ms.
Best time for 1024K FFT length: 23.856 ms.
Best time for 1280K FFT length: 41.502 ms.
Best time for 1536K FFT length: 48.093 ms.
Best time for 1792K FFT length: 54.171 ms.
Best time for 2048K FFT length: 60.568 ms.
Best time for 2560K FFT length: 84.294 ms.
Best time for 3072K FFT length: 98.118 ms.
Best time for 3584K FFT length: 111.096 ms.
Best time for 4096K FFT length: 126.028 ms.
Best time for 5120K FFT length: 87.296 ms.
Best time for 6144K FFT length: 116.009 ms.
Best time for 7168K FFT length: 158.566 ms.
Best time for 8192K FFT length: 208.560 ms.
Timing FFTs using 5 threads.
Best time for 768K FFT length: 17.855 ms.
Best time for 896K FFT length: 20.398 ms.
Best time for 1024K FFT length: 23.074 ms.
Best time for 1280K FFT length: 40.311 ms.
Best time for 1536K FFT length: 47.006 ms.
Best time for 1792K FFT length: 53.275 ms.
Best time for 2048K FFT length: 59.612 ms.
Best time for 2560K FFT length: 82.323 ms.
Best time for 3072K FFT length: 96.333 ms.
Best time for 3584K FFT length: 109.008 ms.
Best time for 4096K FFT length: 123.110 ms.
Best time for 5120K FFT length: 73.077 ms.
Best time for 6144K FFT length: 87.911 ms.
Best time for 7168K FFT length: 113.765 ms.
Best time for 8192K FFT length: 146.130 ms.
Timing FFTs using 6 threads.
Best time for 768K FFT length: 17.910 ms.
Best time for 896K FFT length: 20.730 ms.
Best time for 1024K FFT length: 23.852 ms.
Best time for 1280K FFT length: 41.088 ms.
Best time for 1536K FFT length: 48.035 ms.
Best time for 1792K FFT length: 54.162 ms.
Best time for 2048K FFT length: 60.760 ms.
Best time for 2560K FFT length: 83.698 ms.
Best time for 3072K FFT length: 97.699 ms.
Best time for 3584K FFT length: 110.973 ms.
Best time for 4096K FFT length: 125.351 ms.
Best time for 5120K FFT length: 72.757 ms.
Best time for 6144K FFT length: 90.876 ms.
Best time for 7168K FFT length: 123.422 ms.
Best time for 8192K FFT length: 160.236 ms.
Timing FFTs using 7 threads.
Best time for 768K FFT length: 17.193 ms.
Best time for 896K FFT length: 19.610 ms.
Best time for 1024K FFT length: 22.400 ms.
Best time for 1280K FFT length: 40.450 ms.
Best time for 1536K FFT length: 47.121 ms.
Best time for 1792K FFT length: 53.477 ms.
Best time for 2048K FFT length: 60.124 ms.
Best time for 2560K FFT length: 83.103 ms.
Best time for 3072K FFT length: 95.903 ms.
Best time for 3584K FFT length: 109.122 ms.
Best time for 4096K FFT length: 123.223 ms.
Best time for 5120K FFT length: 72.847 ms.
Best time for 6144K FFT length: 89.303 ms.
Best time for 7168K FFT length: 110.689 ms.
Best time for 8192K FFT length: 145.042 ms.
Timing FFTs using 8 threads.
Best time for 768K FFT length: 17.258 ms.
Best time for 896K FFT length: 19.897 ms.
Best time for 1024K FFT length: 22.797 ms.
Best time for 1280K FFT length: 40.709 ms.
Best time for 1536K FFT length: 47.470 ms.
Best time for 1792K FFT length: 54.496 ms.
Best time for 2048K FFT length: 61.021 ms.
Best time for 2560K FFT length: 83.619 ms.
Best time for 3072K FFT length: 97.455 ms.
Best time for 3584K FFT length: 110.362 ms.
Best time for 4096K FFT length: 124.052 ms.
Best time for 5120K FFT length: 78.607 ms.
Best time for 6144K FFT length: 104.639 ms.
Best time for 7168K FFT length: 142.611 ms.
Best time for 8192K FFT length: 188.151 ms.
Timing FFTs using 9 threads.
Best time for 768K FFT length: 16.454 ms.
Best time for 896K FFT length: 19.103 ms.
Best time for 1024K FFT length: 21.728 ms.
Best time for 1280K FFT length: 39.754 ms.
Best time for 1536K FFT length: 46.686 ms.
Best time for 1792K FFT length: 53.212 ms.
Best time for 2048K FFT length: 59.828 ms.
Best time for 2560K FFT length: 82.171 ms.
Best time for 3072K FFT length: 96.068 ms.
Best time for 3584K FFT length: 108.942 ms.
Best time for 4096K FFT length: 122.689 ms.
Best time for 5120K FFT length: 73.240 ms.
Best time for 6144K FFT length: 91.163 ms.
Best time for 7168K FFT length: 114.364 ms.
Best time for 8192K FFT length: 144.629 ms.
Timing FFTs using 10 threads.
Timing FFTs using 11 threads.
Timing FFTs using 12 threads.
Timing FFTs using 13 threads.
Timing FFTs using 14 threads.
Timing FFTs using 15 threads.
Timing FFTs using 16 threads.
Best time for 768K FFT length: 16.669 ms.
Best time for 896K FFT length: 19.120 ms.
Best time for 1024K FFT length: 21.750 ms.
Best time for 1280K FFT length: 39.157 ms.
Best time for 1536K FFT length: 46.627 ms.
Best time for 1792K FFT length: 53.512 ms.
Best time for 2048K FFT length: 60.949 ms.
Best time for 2560K FFT length: 85.695 ms.
Best time for 3072K FFT length: 100.253 ms.
Best time for 3584K FFT length: 114.100 ms.
Best time for 4096K FFT length: 128.703 ms.
Best time for 5120K FFT length: 84.452 ms.
Best time for 6144K FFT length: 110.193 ms.
Best time for 7168K FFT length: 149.797 ms.
Best time for 8192K FFT length: 196.755 ms.
Best time for 58 bit trial factors: 2.805 ms.
Best time for 59 bit trial factors: 2.861 ms.
Best time for 60 bit trial factors: 2.857 ms.
Best time for 61 bit trial factors: 3.049 ms.
Best time for 62 bit trial factors: 3.047 ms.
Best time for 63 bit trial factors: 3.640 ms.
Best time for 64 bit trial factors: 4.265 ms.
Best time for 65 bit trial factors: 5.207 ms.
Best time for 66 bit trial factors: 6.205 ms.
Best time for 67 bit trial factors: 6.169 ms.

Of course, the user should also be advised against using more than one thread per test, but if the user insists on faster results, mprime should recommend three threads and LL tests at the 5120K FFT size. It could test different affinities and see how many tests can run in parallel before the memory bus is saturated, then recommend using the remaining cores for trial factoring. I am sure that by letting mprime have a "smart work selection" option, where it invests an hour or so in benchmarking before making a guided selection of thread configuration and work type, GIMPS would get a two-digit percentage more work done. The worst thing that could happen is that more primes would be discovered out of sequence.
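To make the pattern in the benchmark easier to see, here is a minimal Python sketch that turns a few of the best times above into speedups over the single-thread time. The numbers are copied from the listing for 1 to 5 threads. The dictionary layout and the `speedup` helper are illustrative assumptions, not anything mprime provides.

```python
# Best times in ms copied from the benchmark above (Opteron 8218), keyed by
# FFT length in K and then by thread count. Only a few rows are included.
BEST_MS = {
    2048: {1: 80.426, 2: 67.747, 3: 60.011, 4: 60.568, 5: 59.612},
    4096: {1: 173.945, 2: 143.489, 3: 125.141, 4: 126.028, 5: 123.110},
    5120: {1: 227.923, 2: 138.987, 3: 94.532, 4: 87.296, 5: 73.077},
    6144: {1: 281.568, 2: 175.762, 3: 117.084, 4: 116.009, 5: 87.911},
}


def speedup(times):
    """Speedup over the single-thread time for each thread count."""
    base = times[1]
    return {n: base / t for n, t in times.items()}


# Print one row per FFT size, e.g. "5120K: 1T x1.00, 2T x1.64, ..."
for fft_k, times in BEST_MS.items():
    row = ", ".join(f"{n}T x{s:.2f}" for n, s in speedup(times).items())
    print(f"{fft_k}K: {row}")
```

On these figures the 5120K and 6144K rows scale far better with extra threads than the 2048K and 4096K rows, which is the effect the post asks readers to watch for.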
|
|
|
|
|
#2 |
|
Undefined
"The unspeakable one"
Jun 2006
My evil lair
1850₁₆ Posts |
|
|
|
|
|
|
#3 |
|
Sep 2006
Brussels, Belgium
11010100111₂ Posts |
As I already pointed out more than once ;-) the Prime95 benchmarks compute a BEST time. This is useful for George to optimise the program, but I computed that the standard deviation of those best times over a few runs can be higher than 5%. I suppose this is even worse on multi-CPU, multicore computers. At the moment the only good benchmark is a real run with actual exponents. Of course, getting the same range of results as those collected by the benchmarks would take some time, and it would be even worse if one tried different combinations of cores, threads and sizes.
I already suggested to George that he publish the AVERAGE time as well as the BEST time in the benchmarks. I would even suggest increasing the number of iterations in the benchmarks, as real life shows that even on a computer that is not used for anything else the average times fluctuate quite a bit as well: the difference between the minimum average time and the maximum average time can be as high as 3% even if the average is computed over 65536 iterations.
Jacob
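As a rough illustration of reporting an average and a spread next to the best time, the following Python sketch summarizes a list of repeated timings. The sample values are made up for the example and are not measurements from any machine in this thread. `summarize` is a hypothetical helper, not part of Prime95.

```python
import statistics


def summarize(times_ms):
    """Return best, mean, and spread statistics for a list of timings in ms."""
    best = min(times_ms)
    mean = statistics.mean(times_ms)
    stdev = statistics.stdev(times_ms)  # sample standard deviation
    return {
        "best": best,
        "mean": mean,
        "stdev": stdev,
        "stdev_pct_of_mean": 100.0 * stdev / mean,
        "mean_over_best_pct": 100.0 * (mean - best) / best,
    }


# Made-up timings (ms) for repeated runs of one FFT size; not measured data.
runs = [38.2, 38.9, 39.4, 38.3, 40.1, 38.6]
stats = summarize(runs)
print(f"best {stats['best']:.3f} ms, mean {stats['mean']:.3f} ms, "
      f"stdev {stats['stdev_pct_of_mean']:.1f}% of mean")
```

Publishing the mean and the relative standard deviation alongside the best time would let a reader judge how repeatable a given figure is.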
|
|
|
|
|
#4 | |
|
"Richard B. Woods"
Aug 2002
Wisconsin USA
2²·3·641 Posts |
Quote:
Most systems are used for something else while they are running GIMPS work! That's the reason for the GIMPS selling-points about "lowest priority", "won't noticeably affect your system", "your unused CPU time" and so on.

Without knowing what this other-than-GIMPS load is, use of average times in establishing a "benchmark" defeats the purpose of the "benchmark". That's, I think, why George uses best-times -- to minimize the effect of non-GIMPS load, rather than anything connected to fluctuations in the GIMPS load itself.

If you know the fluctuations in the GIMPS load are 5% or 3%, then just add 5% or 3%, or half of those, or whatever, to the best-time figure to get what you want.

(If prime95 could access the OS's internal records of individual task-specific time consumption, it would have a better basis than best-elapsed-time.)

Last fiddled with by cheesehead on 2008-12-23 at 12:59
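The distinction between elapsed time and task-specific CPU time can be sketched in Python with two standard-library clocks: `time.perf_counter()` for wall-clock time and `time.process_time()` for CPU time charged to the current process. This is only an illustration of the idea and says nothing about how Prime95 measures time. The `timed`, `busy`, and `busy_then_idle` names are hypothetical, and the sleep is a stand-in for time lost to other activity on the machine.

```python
import time


def timed(fn):
    """Run fn once; return (wall-clock seconds, CPU seconds of this process)."""
    wall0 = time.perf_counter()
    cpu0 = time.process_time()
    fn()
    return time.perf_counter() - wall0, time.process_time() - cpu0


def busy():
    """A small CPU-bound workload."""
    sum(i * i for i in range(200_000))


def busy_then_idle():
    """Same workload, then a pause standing in for time spent on other tasks."""
    busy()
    time.sleep(0.05)


wall, cpu = timed(busy_then_idle)
print(f"elapsed {wall * 1000:.1f} ms, CPU {cpu * 1000:.1f} ms")
```

The elapsed figure includes the pause while the CPU figure does not, which is why a per-task CPU measurement would be less sensitive to unrelated load than best elapsed time.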
|
|
|
|
|
|
#5 | |
|
Undefined
"The unspeakable one"
Jun 2006
My evil lair
6224₁₀ Posts |
Quote:
|
|
|
|
|
|
|
#6 | |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2×3³×109 Posts |
Quote:
Code:
Compare your results to other computers at http://www.mersenne.org/bench.htm
Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
CPU speed: 2405.47 MHz, 4 cores
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 32 KB
L2 cache size: 4 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 256
Prime95 32-bit version 25.7, RdtscTiming=1
Best time for 768K FFT length: 17.174 ms.
Best time for 896K FFT length: 20.532 ms.
Best time for 1024K FFT length: 23.260 ms.
Best time for 1280K FFT length: 29.176 ms.
Best time for 1536K FFT length: 35.555 ms.
Best time for 1792K FFT length: 42.402 ms.
Best time for 2048K FFT length: 47.009 ms.
Best time for 2560K FFT length: 62.070 ms.
Best time for 3072K FFT length: 76.142 ms.
Best time for 3584K FFT length: 90.302 ms.
Best time for 4096K FFT length: 100.833 ms.
Best time for 5120K FFT length: 129.338 ms.
Best time for 6144K FFT length: 156.098 ms.
Best time for 7168K FFT length: 189.703 ms.
Best time for 8192K FFT length: 208.199 ms.
Timing FFTs using 2 threads.
Best time for 768K FFT length: 9.123 ms.
Best time for 896K FFT length: 10.815 ms.
Best time for 1024K FFT length: 12.521 ms.
Best time for 1280K FFT length: 15.090 ms.
Best time for 1536K FFT length: 18.392 ms.
Best time for 1792K FFT length: 21.901 ms.
Best time for 2048K FFT length: 24.318 ms.
Best time for 2560K FFT length: 32.047 ms.
Best time for 3072K FFT length: 39.076 ms.
Best time for 3584K FFT length: 46.290 ms.
Best time for 4096K FFT length: 51.727 ms.
Best time for 5120K FFT length: 66.975 ms.
Best time for 6144K FFT length: 82.615 ms.
Best time for 7168K FFT length: 99.016 ms.
Best time for 8192K FFT length: 110.694 ms.
Timing FFTs using 3 threads.
Best time for 768K FFT length: 10.161 ms.
Best time for 896K FFT length: 11.362 ms.
Best time for 1024K FFT length: 16.671 ms.
Best time for 1280K FFT length: 12.194 ms.
Best time for 1536K FFT length: 14.688 ms.
Best time for 1792K FFT length: 17.338 ms.
Best time for 2048K FFT length: 19.348 ms.
Best time for 2560K FFT length: 25.623 ms.
Best time for 3072K FFT length: 31.289 ms.
Best time for 3584K FFT length: 36.908 ms.
Best time for 4096K FFT length: 41.500 ms.
Best time for 5120K FFT length: 52.298 ms.
Best time for 6144K FFT length: 63.825 ms.
Best time for 7168K FFT length: 77.246 ms.
Best time for 8192K FFT length: 86.574 ms.
Timing FFTs using 4 threads.
Best time for 768K FFT length: 9.042 ms.
Best time for 896K FFT length: 10.105 ms.
Best time for 1024K FFT length: 14.639 ms.
Best time for 1280K FFT length: 10.687 ms.
Best time for 1536K FFT length: 12.513 ms.
Best time for 1792K FFT length: 14.438 ms.
Best time for 2048K FFT length: 16.276 ms.
Best time for 2560K FFT length: 20.367 ms.
Best time for 3072K FFT length: 24.882 ms.
Best time for 3584K FFT length: 29.445 ms.
Best time for 4096K FFT length: 33.453 ms.
Best time for 5120K FFT length: 41.928 ms.
Best time for 6144K FFT length: 51.142 ms.
Best time for 7168K FFT length: 61.490 ms.
Best time for 8192K FFT length: 69.359 ms.
Best time for 58 bit trial factors: 4.420 ms.
Best time for 59 bit trial factors: 4.441 ms.
Best time for 60 bit trial factors: 4.422 ms.
Best time for 61 bit trial factors: 4.430 ms.
Best time for 62 bit trial factors: 7.398 ms.
Best time for 63 bit trial factors: 7.431 ms.
Best time for 64 bit trial factors: 6.886 ms.
Best time for 65 bit trial factors: 6.838 ms.
Best time for 66 bit trial factors: 6.850 ms.
Best time for 67 bit trial factors: 6.832 ms.

I think it is possibly due to the L2 caches.
|
|
|
|
|
|
#7 | ||
|
Sep 2006
Brussels, Belgium
13·131 Posts |
Quote:
What I implied, but did not write, was a suggestion to rerun the offending benchmarks a few times to see whether the results were more consistent.
Quote:
Jacob |
||
|
|
|
|
|
#8 | ||
|
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2⁴·389 Posts |
Quote:
Quote:
|
||
|
|
|
|
|
#9 | |
|
Dec 2003
2³·3³ Posts |
Quote:
This thread is taking a wrong turn. This machine is an extreme case to illustrate my point: every different configuration of CPU, cache, frequency, RAM amount, timings and frequencies, etc., performs differently. Each has some strengths and some weaknesses, compared both to itself and to other machines participating in GIMPS. The client should have an option to benchmark itself and try to find the perfect thread combination and kind of work for each machine, based on the benchmarks and the wishes of the owner.
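One way such a selection step could look is sketched below in Python. It is a hypothetical policy, not an existing mprime feature: it picks the largest thread count whose parallel efficiency (speedup divided by threads) stays above a threshold chosen by the owner, and leaves the remaining cores for trial factoring. The 5120K and 4096K times and the 16-core count come from the first benchmark in this thread. The function names and the 0.75 threshold are assumptions.

```python
# Best times in ms from the first benchmark in this thread, by thread count.
TIMES_5120K = {1: 227.923, 2: 138.987, 3: 94.532, 4: 87.296, 5: 73.077}
TIMES_4096K = {1: 173.945, 2: 143.489, 3: 125.141, 4: 126.028, 5: 123.110}


def choose_threads(times_ms, min_efficiency=0.75):
    """Largest thread count whose speedup per thread meets the threshold."""
    base = times_ms[1]
    ok = [n for n, t in times_ms.items() if (base / t) / n >= min_efficiency]
    return max(ok)  # one thread always qualifies, so ok is never empty


def plan(times_ms, total_cores, min_efficiency=0.75):
    """Split cores between one multithreaded LL test and trial factoring."""
    ll_threads = choose_threads(times_ms, min_efficiency)
    return {"ll_threads": ll_threads, "tf_workers": total_cores - ll_threads}


print(plan(TIMES_5120K, total_cores=16))
print(plan(TIMES_4096K, total_cores=16))
```

A real client would also need to measure several tests running in parallel, since memory-bus contention between workers is not captured by these single-test timings.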
|
|
|
|
|
|
#10 |
|
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2⁴·389 Posts |
Okay, so if it is not a fluke and not throttling, then what could it be? Even just saying that the cache/CPU/etc. configuration is different still does not feel right to me as a full explanation of the huge discrepancy. Perhaps I am just stubborn, but 5120K running faster than 2560K (which means moving twice the data through memory in less time) just can't be sensible. What have I forgotten/overlooked/never-knew?
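The size of the discrepancy can be put in numbers by normalizing each best time by its FFT length. The Python sketch below does this for the 2560K and 5120K rows of the first benchmark, at 1 and 5 threads. The ms-per-K measure is just a convenient yardstick assumed for this comparison, and it does not explain the cause.

```python
# Best times in ms from the first benchmark in this thread (Opteron 8218),
# keyed by thread count and then by FFT length in K.
TIMES_MS = {
    1: {2560: 106.308, 5120: 227.923},
    5: {2560: 82.323, 5120: 73.077},
}


def ms_per_k(time_ms, fft_k):
    """Iteration time divided by FFT length, in ms per K."""
    return time_ms / fft_k


for threads, row in TIMES_MS.items():
    small = ms_per_k(row[2560], 2560)
    large = ms_per_k(row[5120], 5120)
    print(f"{threads} thread(s): 2560K {small:.5f} ms/K, "
          f"5120K {large:.5f} ms/K, ratio {large / small:.2f}")
```

With one thread the per-K cost is slightly higher at 5120K, as expected. With five threads it is less than half the 2560K figure, which is the anomaly being questioned.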
|
|
|
|
|
|
#11 | |
|
Jul 2006
Calgary
5²·17 Posts |
Quote:
|
|
|
|
|