Glucas on 2xNehalem 4 cores 2.93 GHz
Glucas -O3 -xSSE4 with 8 threads on 8 cores (no hyper-threading).
Best time for 2304K FFT: 8.26 msec/iter. |
MacLucasFFTW on PS3.
2048K FFT: 0.084 sec/iter
4096K FFT: 0.194 sec/iter |
[quote=S485122;168080]It seems some AMD owners are spicing the benchmarks with fake results, for instance:
[code]AMD Athlon 64 3200+ 2210 7.13 7.96 9.90 12.07 14.89 16.80 24.55 30.21 35.81 39.69 2.05
while the other lines are more like
AMD Athlon 64 3200+ 2583 34.53 38.38 48.54 59.48 72.12 80.09 107.96 132.09 163.28 178.01 10.63[/code] [/quote]
My benchmark for a Socket 939 San Diego processor on an Asus AN8-VM mainboard, with DDR-2 RAM modules in Single Mode, under 32-bit Windows XP:
AMD Athlon 64 3700+ 2431 40.76 52.02 63.12 76.83 85.76 112.53 136.69
Maybe there is some faster hardware out there! |
I have a small question: is it supposed to look like this, or is this not what usually happens?
Benchmark was run with FullBench=1 + AllBench=1 on a Q9650 @ Stock speed of 3000 MHz. Windows 7 Ultimate (RTM) x64 4x1024 Corsair DDR2 @ 4-4-4-12 800 MHz. 2.14 volts [code] Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz CPU speed: 3000.09 MHz, 4 cores CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2, SSE4 L1 cache size: 32 KB L2 cache size: 6 MB L1 cache line size: 64 bytes L2 cache line size: 64 bytes TLBS: 256 Prime95 64-bit version 25.11, RdtscTiming=1 Best time for 4K FFT length: 0.038 ms. Best time for 5K FFT length: 0.055 ms. Best time for 6K FFT length: 0.061 ms. Best time for 7K FFT length: 0.079 ms. Best time for 8K FFT length: 0.080 ms. Best time for 10K FFT length: 0.122 ms. Best time for 12K FFT length: 0.140 ms. Best time for 14K FFT length: 0.176 ms. Best time for 16K FFT length: 0.185 ms. Best time for 20K FFT length: 0.252 ms. Best time for 24K FFT length: 0.292 ms. Best time for 28K FFT length: 0.370 ms. Best time for 32K FFT length: 0.391 ms. Best time for 40K FFT length: 0.513 ms. Best time for 48K FFT length: 0.604 ms. Best time for 56K FFT length: 0.765 ms. Best time for 64K FFT length: 0.804 ms. Best time for 80K FFT length: 1.146 ms. Best time for 96K FFT length: 1.327 ms. Best time for 112K FFT length: 1.660 ms. Best time for 128K FFT length: 1.771 ms. Best time for 160K FFT length: 2.206 ms. Best time for 192K FFT length: 2.671 ms. Best time for 224K FFT length: 3.281 ms. Best time for 256K FFT length: 3.604 ms. Best time for 320K FFT length: 4.608 ms. Best time for 384K FFT length: 5.499 ms. Best time for 448K FFT length: 6.760 ms. Best time for 512K FFT length: 7.409 ms. Best time for 640K FFT length: 9.593 ms. Best time for 768K FFT length: 11.886 ms. Best time for 896K FFT length: 15.357 ms. Best time for 1024K FFT length: 17.829 ms. Best time for 1280K FFT length: 23.684 ms. Best time for 1536K FFT length: 29.008 ms. Best time for 1792K FFT length: 35.708 ms. Best time for 2048K FFT length: 40.108 ms. 
Best time for 2560K FFT length: 52.605 ms. Best time for 3072K FFT length: 62.156 ms. Best time for 3584K FFT length: 76.636 ms. Best time for 4096K FFT length: 85.756 ms. Best time for 5120K FFT length: 108.884 ms. Best time for 6144K FFT length: 127.012 ms. Best time for 7168K FFT length: 154.952 ms. Best time for 8192K FFT length: 169.785 ms. Best time for 10240K FFT length: 219.665 ms. Best time for 12288K FFT length: 255.917 ms. Best time for 14336K FFT length: 313.237 ms. Best time for 16384K FFT length: 342.033 ms. Best time for 20480K FFT length: 499.655 ms. Best time for 24576K FFT length: 592.448 ms. Best time for 28672K FFT length: 717.238 ms. Best time for 32768K FFT length: 794.108 ms. Timing FFTs using 2 threads. Best time for 4K FFT length: 0.038 ms. Best time for 5K FFT length: 0.055 ms. Best time for 6K FFT length: 0.061 ms. Best time for 7K FFT length: 0.079 ms. Best time for 8K FFT length: 0.080 ms. Best time for 10K FFT length: 0.087 ms. Best time for 12K FFT length: 0.097 ms. Best time for 14K FFT length: 0.114 ms. Best time for 16K FFT length: 0.120 ms. Best time for 20K FFT length: 0.153 ms. Best time for 24K FFT length: 0.175 ms. Best time for 28K FFT length: 0.214 ms. Best time for 32K FFT length: 0.229 ms. Best time for 40K FFT length: 0.560 ms. Best time for 48K FFT length: 0.582 ms. Best time for 56K FFT length: 0.661 ms. Best time for 64K FFT length: 0.690 ms. Best time for 80K FFT length: 0.685 ms. Best time for 96K FFT length: 0.786 ms. Best time for 112K FFT length: 0.956 ms. Best time for 128K FFT length: 1.021 ms. Best time for 160K FFT length: 1.234 ms. Best time for 192K FFT length: 1.434 ms. Best time for 224K FFT length: 1.747 ms. Best time for 256K FFT length: 1.929 ms. Best time for 320K FFT length: 2.460 ms. Best time for 384K FFT length: 2.931 ms. Best time for 448K FFT length: 3.579 ms. Best time for 512K FFT length: 3.943 ms. Best time for 640K FFT length: 5.282 ms. Best time for 768K FFT length: 6.794 ms. 
Best time for 896K FFT length: 9.112 ms. Best time for 1024K FFT length: 11.336 ms. Best time for 1280K FFT length: 16.220 ms. Best time for 1536K FFT length: 20.281 ms. Best time for 1792K FFT length: 24.588 ms. Best time for 2048K FFT length: 28.492 ms. Best time for 2560K FFT length: 37.029 ms. Best time for 3072K FFT length: 44.779 ms. Best time for 3584K FFT length: 52.661 ms. Best time for 4096K FFT length: 60.376 ms. Best time for 5120K FFT length: 76.160 ms. Best time for 6144K FFT length: 91.910 ms. Best time for 7168K FFT length: 108.115 ms. Best time for 8192K FFT length: 122.895 ms. Best time for 10240K FFT length: 156.009 ms. Best time for 12288K FFT length: 185.455 ms. Best time for 14336K FFT length: 218.730 ms. Best time for 16384K FFT length: 249.957 ms. Best time for 20480K FFT length: 309.510 ms. Best time for 24576K FFT length: 368.779 ms. Best time for 28672K FFT length: 431.535 ms. Best time for 32768K FFT length: 498.112 ms. Timing FFTs using 3 threads. Best time for 4K FFT length: 0.038 ms. Best time for 5K FFT length: 0.055 ms. Best time for 6K FFT length: 0.061 ms. Best time for 7K FFT length: 0.079 ms. Best time for 8K FFT length: 0.080 ms. Best time for 10K FFT length: 0.158 ms. Best time for 12K FFT length: 0.173 ms. Best time for 14K FFT length: 0.191 ms. Best time for 16K FFT length: 0.207 ms. Best time for 20K FFT length: 0.246 ms. Best time for 24K FFT length: 0.288 ms. Best time for 28K FFT length: 0.336 ms. Best time for 32K FFT length: 0.380 ms. Best time for 40K FFT length: 0.837 ms. Best time for 48K FFT length: 0.911 ms. Best time for 56K FFT length: 0.989 ms. Best time for 64K FFT length: 1.034 ms. Best time for 80K FFT length: 1.251 ms. Best time for 96K FFT length: 1.411 ms. Best time for 112K FFT length: 1.504 ms. Best time for 128K FFT length: 1.725 ms. Best time for 160K FFT length: 1.804 ms. Best time for 192K FFT length: 1.597 ms. Best time for 224K FFT length: 1.851 ms. Best time for 256K FFT length: 2.047 ms. 
Best time for 320K FFT length: 2.500 ms. Best time for 384K FFT length: 2.929 ms. Best time for 448K FFT length: 3.527 ms. Best time for 512K FFT length: 4.057 ms. Best time for 640K FFT length: 8.055 ms. Best time for 768K FFT length: 9.822 ms. Best time for 896K FFT length: 11.984 ms. Best time for 1024K FFT length: 17.667 ms. Best time for 1280K FFT length: 17.163 ms. Best time for 1536K FFT length: 22.042 ms. Best time for 1792K FFT length: 27.139 ms. Best time for 2048K FFT length: 31.808 ms. Best time for 2560K FFT length: 41.841 ms. Best time for 3072K FFT length: 50.510 ms. Best time for 3584K FFT length: 60.511 ms. Best time for 4096K FFT length: 69.348 ms. Best time for 5120K FFT length: 87.050 ms. Best time for 6144K FFT length: 103.222 ms. Best time for 7168K FFT length: 123.203 ms. Best time for 8192K FFT length: 139.015 ms. Best time for 10240K FFT length: 182.308 ms. Best time for 12288K FFT length: 216.148 ms. Best time for 14336K FFT length: 258.419 ms. Best time for 16384K FFT length: 288.518 ms. Best time for 20480K FFT length: 375.468 ms. Best time for 24576K FFT length: 449.556 ms. Best time for 28672K FFT length: 526.856 ms. Best time for 32768K FFT length: 606.468 ms. Timing FFTs using 4 threads. Best time for 4K FFT length: 0.038 ms. Best time for 5K FFT length: 0.055 ms. Best time for 6K FFT length: 0.061 ms. Best time for 7K FFT length: 0.079 ms. Best time for 8K FFT length: 0.080 ms. Best time for 10K FFT length: 0.166 ms. Best time for 12K FFT length: 0.177 ms. Best time for 14K FFT length: 0.194 ms. Best time for 16K FFT length: 0.212 ms. Best time for 20K FFT length: 0.244 ms. Best time for 24K FFT length: 0.282 ms. Best time for 28K FFT length: 0.326 ms. Best time for 32K FFT length: 0.362 ms. Best time for 40K FFT length: 0.783 ms. Best time for 48K FFT length: 0.850 ms. Best time for 56K FFT length: 0.880 ms. Best time for 64K FFT length: 0.940 ms. Best time for 80K FFT length: 1.145 ms. Best time for 96K FFT length: 1.260 ms. 
Best time for 112K FFT length: 1.395 ms. Best time for 128K FFT length: 1.520 ms. Best time for 160K FFT length: 1.666 ms. Best time for 192K FFT length: 1.527 ms. Best time for 224K FFT length: 1.722 ms. Best time for 256K FFT length: 1.942 ms. Best time for 320K FFT length: 2.289 ms. Best time for 384K FFT length: 2.731 ms. Best time for 448K FFT length: 3.240 ms. Best time for 512K FFT length: 3.758 ms. Best time for 640K FFT length: 7.334 ms. Best time for 768K FFT length: 8.864 ms. Best time for 896K FFT length: 10.519 ms. Best time for 1024K FFT length: 15.633 ms. Best time for 1280K FFT length: 14.863 ms. Best time for 1536K FFT length: 19.601 ms. Best time for 1792K FFT length: 24.379 ms. Best time for 2048K FFT length: 28.085 ms. Best time for 2560K FFT length: 38.088 ms. Best time for 3072K FFT length: 46.749 ms. Best time for 3584K FFT length: 55.335 ms. Best time for 4096K FFT length: 63.823 ms. Best time for 5120K FFT length: 80.981 ms. Best time for 6144K FFT length: 97.393 ms. Best time for 7168K FFT length: 115.800 ms. Best time for 8192K FFT length: 132.176 ms. Best time for 10240K FFT length: 170.732 ms. Best time for 12288K FFT length: 207.549 ms. Best time for 14336K FFT length: 243.647 ms. Best time for 16384K FFT length: 283.567 ms. Best time for 20480K FFT length: 358.136 ms. Best time for 24576K FFT length: 433.866 ms. Best time for 28672K FFT length: 509.462 ms. Best time for 32768K FFT length: 585.658 ms. Best time for 58 bit trial factors: 2.918 ms. Best time for 59 bit trial factors: 2.863 ms. Best time for 60 bit trial factors: 2.842 ms. Best time for 61 bit trial factors: 3.230 ms. Best time for 62 bit trial factors: 3.452 ms. Best time for 63 bit trial factors: 3.878 ms. Best time for 64 bit trial factors: 5.003 ms. Best time for 65 bit trial factors: 5.326 ms. Best time for 66 bit trial factors: 5.865 ms. Best time for 67 bit trial factors: 5.763 ms. [/code] |
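One way to read a dump like the one above is to turn the timings into speedups relative to the single-thread run. The sketch below is a hypothetical helper, not part of Prime95. It assumes the log format shown in the post, with a `Timing FFTs using N threads.` header before each multi-threaded section, and it uses only eight timings copied from that post.

```python
import re
from collections import defaultdict

# Timings copied from the benchmark post above (512K and 2048K only).
# The parser is a hypothetical illustration, not a Prime95 tool.
SAMPLE = """
Best time for 512K FFT length: 7.409 ms.
Best time for 2048K FFT length: 40.108 ms.
Timing FFTs using 2 threads.
Best time for 512K FFT length: 3.943 ms.
Best time for 2048K FFT length: 28.492 ms.
Timing FFTs using 3 threads.
Best time for 512K FFT length: 4.057 ms.
Best time for 2048K FFT length: 31.808 ms.
Timing FFTs using 4 threads.
Best time for 512K FFT length: 3.758 ms.
Best time for 2048K FFT length: 28.085 ms.
"""

# One pattern for both kinds of entry, so the parser also works when the
# log has been flattened onto a single line.
TOKEN = re.compile(
    r"Timing FFTs using (\d+) threads"
    r"|Best time for (\d+)K FFT length: ([\d.]+) ms"
)

def parse_benchmark(text):
    """Return {fft_size_in_K: {threads: best_ms}}."""
    times = defaultdict(dict)
    threads = 1  # entries before the first header are single-threaded
    for match in TOKEN.finditer(text):
        if match.group(1):
            threads = int(match.group(1))
        else:
            times[int(match.group(2))][threads] = float(match.group(3))
    return dict(times)

def speedups(times):
    """Speedup of each thread count relative to the one-thread time."""
    return {
        size: {n: per_thread[1] / ms for n, ms in per_thread.items()}
        for size, per_thread in times.items()
    }

for size, row in sorted(speedups(parse_benchmark(SAMPLE)).items()):
    cells = ", ".join(f"{n} threads: {s:.2f}x" for n, s in sorted(row.items()))
    print(f"{size}K FFT -> {cells}")
```

On these figures the 512K FFT nearly doubles in speed with two threads, while the 2048K FFT gains only about 1.4x and gets nothing further from a third or fourth thread.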
[QUOTE=Shinzok;185513]I have a small question: is it supposed to look like this, or is this not what usually happens?
Benchmark was run with FullBench=1 + AllBench=1 on a Q9650 @ Stock speed of 3000 MHz. Windows 7 Ultimate (RTM) x64 4x1024 Corsair DDR2 @ 4-4-4-12 800 MHz. 2.14 volts [/QUOTE] Looks pretty normal to me. Why do you ask? |
Where can one find the benchmark applications?
:smile: |
[quote=storm5510;185616]Where can one find the benchmark applications?
:smile:[/quote] You use Prime95.

(Optional) First, close Prime95 and add FullBench=1 on a new line at the top of prime.txt (in the Prime95 folder). This makes it run on very small (down to 4K, normally 768K) and very large (up to 32M, normally 8M) FFT sizes instead of just the ones you might run into in GIMPS. (End optional)

Choose Options > Benchmark in Prime95 and wait for it to finish (it takes several minutes). You have to use Test > Stop to stop any work before running the benchmark. Needless to say, you should keep other activity, even mouse movement, to a minimum while it runs so that it gives consistent results.

I'm pretty sure that it will communicate the results to the v5 server once it completes (at the next communication, of course, which you can trigger immediately with Advanced > Manual Communication), if that Prime95 instance is set to use PrimeNet for assignments and such. If not, and you want them included, you could probably email the results.txt to :woltman: (prior to PrimeNet v5 this was the only way). |
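The optional prime.txt step can also be scripted. This is a minimal sketch, assuming prime.txt is a plain text file of `Name=value` lines and that Prime95 is closed while the file is edited. The helper name `enable_full_bench` is hypothetical, and the path to prime.txt is whatever applies on your machine.

```python
from pathlib import Path

def enable_full_bench(prime_txt):
    """Put FullBench=1 on the first line of prime.txt unless it is already set.

    Returns True if the file was changed, False if a FullBench line existed.
    Hypothetical helper; run it only while Prime95 is closed.
    """
    path = Path(prime_txt)
    lines = path.read_text().splitlines() if path.exists() else []
    if any(line.strip().startswith("FullBench=") for line in lines):
        return False  # leave an existing setting untouched
    path.write_text("\n".join(["FullBench=1", *lines]) + "\n")
    return True
```

Usage would look like `enable_full_bench(r"C:\path\to\Prime95\prime.txt")`, where the folder shown is only a placeholder.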
Thank you!
:smile: |
Well, because running 4 threads is almost useless: 2 threads will always be faster. Weird.
How come it is slower with 3 and 4 threads? :(

With only 2 workers running: I'm currently doing 2 double-checks of M21xxxxxxx exponents, and both take around 0.034 sec per iteration (FFT size 1280K). As soon as I run 2 Lucas-Lehmer tests of M43xxxxxxx exponents with FFT size 2560K on the other 2 cores, it becomes:
The 2 double-checks of M21xxxxxxx both take around 0.074 sec per iteration (FFT size 1280K).
The 2 Lucas-Lehmer tests of M43xxxxxxx both take around 0.143 sec per iteration (FFT size 2560K).
If I run only the 2 LL tests on the 3rd and 4th workers, they both run at around 0.069 sec per iteration (FFT size 2560K).

How come my 1st and 2nd threads become slower as soon as the 3rd and 4th threads are activated? My CPU should not have any problems with 4 threads. It can't be the memory bandwidth, because that is more than the bus speed of my CPU. The memory runs at a 400 MHz bus speed (DDR2-800) with 4-4-4-12 timings, which should be more than enough to feed the CPU with data, because the CPU runs on a 333 MHz bus.
Or should I compare the quad-pumped bus speed with the DDR2 speed? (In that case, yes, I'm 533 MHz short, as 800 + 533 = 1333.) |
[quote=Shinzok;186075]Well, because running 4 threads is almost useless: 2 threads will always be faster. Weird.
How come it is slower with 3 and 4 threads? :([/quote]It's because of a limitation on simultaneous memory accesses by the cores. Prime95 accesses memory faster than almost any other application -- its top speed is usually determined by how fast the memory controller can provide it with all the fetches it requests. Multicore systems are usually not designed to allow maximum memory access speed to all cores simultaneously, because it's so rare for any application to be able to use that simultaneous maximum effectively. |
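The per-iteration times reported in the question fit this explanation. As a rough check (a sketch using only the rounded figures from that post, with hypothetical variable names), each pair of workers can be expressed as a fraction of the speed it reaches when it runs alone:

```python
# Seconds per iteration as reported in the question.
dc_alone, dc_shared = 0.034, 0.074  # 1280K double-checks: 2 workers vs all 4
ll_alone, ll_shared = 0.069, 0.143  # 2560K LL tests: 2 workers vs all 4

def relative_rate(alone, shared):
    """Fraction of standalone speed kept when all four workers run."""
    return alone / shared

dc_rate = relative_rate(dc_alone, dc_shared)
ll_rate = relative_rate(ll_alone, ll_shared)
combined = dc_rate + ll_rate  # in units of "one pair running alone"

print(f"double-checks keep {dc_rate:.0%} of their standalone speed")
print(f"LL tests keep {ll_rate:.0%} of their standalone speed")
print(f"four workers together do {combined:.2f}x the work of one pair alone")
```

Each pair drops to a little under half its standalone speed, so the four workers together complete about as much work as two workers do on their own. That is the pattern one would expect when the shared memory path, not the cores, is the limit.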
[QUOTE=Shinzok;186075]The memory runs at a 400 MHz bus speed (DDR2-800) with 4-4-4-12 timings, which should be more than enough to feed the CPU with data, because the CPU runs on a 333 MHz bus.
Or should I compare the quad-pumped bus speed with the DDR2 speed? (In that case, yes, I'm 533 MHz short, as 800 + 533 = 1333.)[/QUOTE]To achieve maximum performance with your Core2Quad, you should overclock the bus speed to 400 MHz (1600 effective) and, more importantly, upgrade the memory to at least DDR2-1066. Even then you will encounter memory/bus saturation for tests involving FFT lengths greater than 256K. |
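To compare the bus and the memory on equal terms, it helps to convert both to theoretical peak bandwidth. The sketch below assumes a 64-bit (8-byte) data path for both the quad-pumped front-side bus and each DDR2 channel. Whether the memory in question runs in dual-channel mode is not stated in the thread, so both cases are shown. The function names are hypothetical, and sustained bandwidth in practice is lower than these peaks.

```python
BUS_WIDTH_BYTES = 8  # 64-bit data path, for the FSB and for one DDR2 channel

def fsb_peak_gbs(base_clock_mhz):
    """Peak GB/s of a quad-pumped front-side bus (4 transfers per clock)."""
    return base_clock_mhz * 4 * BUS_WIDTH_BYTES / 1000

def ddr2_peak_gbs(transfers_mts, channels):
    """Peak GB/s of DDR2 memory; DDR2-800 means 800 MT/s per channel."""
    return transfers_mts * BUS_WIDTH_BYTES * channels / 1000

print(f"FSB 333 MHz (1333 effective): {fsb_peak_gbs(1000 / 3):.2f} GB/s")
print(f"FSB 400 MHz (1600 effective): {fsb_peak_gbs(400):.2f} GB/s")
for name, rate in (("DDR2-800", 800), ("DDR2-1066", 1066)):
    for channels in (1, 2):  # channel mode is an assumption, so show both
        print(f"{name}, {channels} channel(s): {ddr2_peak_gbs(rate, channels):.2f} GB/s")
```

On paper, a single DDR2-800 channel falls well short of the stock 1333 bus, while two channels only just exceed it, which is in line with the advice to raise both the bus speed and the memory speed.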