mersenneforum.org

Hardware: Perpetual benchmark thread... (https://www.mersenneforum.org/showthread.php?t=59)

henryzz 2008-10-20 19:19

[quote=starrynte;145813][code]
Intel(R) Pentium(R) 4 CPU 3.00GHz
CPU speed: 3060.56 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 16 KB
L2 cache size: 1024 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 32-bit version 24.14, RdtscTiming=1
Best time for 512K FFT length: 17.214 ms.
Best time for 640K FFT length: 22.350 ms.
Best time for 768K FFT length: 27.172 ms.
Best time for 896K FFT length: 33.198 ms.
Best time for 1024K FFT length: 38.122 ms.
Best time for 1280K FFT length: 46.649 ms.
Best time for 1536K FFT length: 56.462 ms.
Best time for 1792K FFT length: 68.155 ms.
Best time for 2048K FFT length: 76.344 ms.
Best time for 2560K FFT length: 100.902 ms.
Best time for 3072K FFT length: 121.601 ms.
Best time for 3584K FFT length: 147.856 ms.
Best time for 4096K FFT length: 163.470 ms.
Best time for 58 bit trial factors: 9.453 ms.
Best time for 59 bit trial factors: 9.536 ms.
Best time for 60 bit trial factors: 9.343 ms.
Best time for 61 bit trial factors: 9.450 ms.
Best time for 62 bit trial factors: 13.165 ms.
Best time for 63 bit trial factors: 13.158 ms.
Best time for 64 bit trial factors: 15.198 ms.
Best time for 65 bit trial factors: 15.172 ms.
Best time for 66 bit trial factors: 15.356 ms.
Best time for 67 bit trial factors: 15.302 ms.
[/code]
Hyperthreading enabled, benchmark was run with affinity set to both "cores"
Memory: 3 GB[/quote]
Compared to the Core 2 architecture, clock for clock, your P4 is about half the speed.
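A rough way to check the clock-for-clock comparison is to multiply each best time by the clock speed, which gives a figure proportional to cycles per iteration. The sketch below uses the 1024K, 2048K and 4096K timings from the P4 benchmark quoted above and from Jeff Gilchrist's Q9550 benchmark (3.4 GHz, version 25.6) quoted further down this thread. The function name is made up for illustration, the Q9550 clock is taken as exactly 3400 MHz, and the two runs used different Prime95 versions (32-bit vs 64-bit), so treat the result as an approximation only.

```python
# Cycles per iteration are proportional to time (ms) * clock (MHz).
# Timings are copied from the benchmark posts in this thread.
P4_MHZ = 3060.56
Q9550_MHZ = 3400.0  # assumption: "3.4 GHz (overclocked)", exact clock not posted

p4_ms = {1024: 38.122, 2048: 76.344, 4096: 163.470}
q9550_ms = {1024: 16.414, 2048: 33.525, 4096: 72.135}


def clock_for_clock_ratio(ms_a, mhz_a, ms_b, mhz_b):
    """Return how many times more cycles CPU A needs than CPU B."""
    return (ms_a * mhz_a) / (ms_b * mhz_b)


for fft_k in sorted(p4_ms):
    ratio = clock_for_clock_ratio(p4_ms[fft_k], P4_MHZ, q9550_ms[fft_k], Q9550_MHZ)
    print(f"{fft_k}K FFT: P4 needs {ratio:.2f}x the cycles of the Q9550")
```

For these three FFT lengths the ratio comes out a little above 2, which is consistent with "about half the speed" for a single thread.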

petrw1 2008-10-20 19:34

[QUOTE=Jeff Gilchrist;144379]Processor: [B]Q9550[/B] @ 3.4 GHz (overclocked)
Motherboard: [B]ASUS Maximus II Formula[/B]
OS: [B]Vista 64-bit[/B]
Memory: [B]4GB DDR2-1066[/B]
mprime: [B]25.6 build 6 Prime95 Windows64[/B]

[CODE][Fri Oct 03 06:49:07 2008]
Best time for 1024K FFT length: 16.414 ms.
Best time for 2048K FFT length: 33.525 ms.
Best time for 4096K FFT length: 72.135 ms.
Best time for 8192K FFT length: 147.020 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 8.723 ms.
Best time for 2048K FFT length: 17.517 ms.
Best time for 4096K FFT length: 37.173 ms.
Best time for 8192K FFT length: 77.260 ms.
Timing FFTs using 3 threads.
Best time for 1024K FFT length: 11.759 ms.
Best time for 2048K FFT length: 14.379 ms.
Best time for 4096K FFT length: 31.147 ms.
Best time for 8192K FFT length: 61.572 ms.
Timing FFTs using 4 threads.
Best time for 1024K FFT length: 10.224 ms.
Best time for 2048K FFT length: 11.916 ms.
Best time for 4096K FFT length: 25.352 ms.
Best time for 8192K FFT length: 50.371 ms.
[/CODE][/QUOTE]

Others have reported that, due to overhead or other bottlenecks, you can't get 4 cores to deliver 4 times the throughput of 1. HOWEVER, it strikes me as very unexpected that in your test 3 threads are slower than 2, and even 4 threads are slower than 2, with the 1024K FFT. The other sizes I extracted show the expected improvement, I suppose.
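To put numbers on this, here is a small sketch that computes the speedup over one thread for the quoted timings and flags any thread count that is slower than the best result with fewer threads. The helper names are invented for this example; the data is copied from the quote above.

```python
# Best times in ms from the quoted Q9550 benchmark, keyed by FFT length (K),
# listed in order for 1, 2, 3 and 4 threads.
times_ms = {
    1024: [16.414, 8.723, 11.759, 10.224],
    2048: [33.525, 17.517, 14.379, 11.916],
    4096: [72.135, 37.173, 31.147, 25.352],
    8192: [147.020, 77.260, 61.572, 50.371],
}


def speedups(times):
    """Speedup of each thread count relative to the single-thread time."""
    return [times[0] / t for t in times]


def slower_than_fewer_threads(times):
    """Thread counts whose time is worse than the best time with fewer threads."""
    flagged = []
    best = times[0]
    for threads, t in enumerate(times[1:], start=2):
        if t > best:
            flagged.append(threads)
        best = min(best, t)
    return flagged


for fft_k, times in times_ms.items():
    row = ", ".join(f"{s:.2f}x" for s in speedups(times))
    print(f"{fft_k}K: {row}  anomalies at threads: {slower_than_fewer_threads(times)}")
```

Only the 1024K row is flagged (at 3 and 4 threads); the larger FFT lengths improve with every added thread, though well short of linear scaling.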

T.Rex 2008-10-23 14:43

mprime on prototype Nehalem 2.4 GHz 8 cores
 
Hi,

I've got mprime results on a (prototype) 8-core Nehalem at 2.4 GHz.

I'll not provide detailed results since I do not know yet if it is confidential or not. I'll just focus on mprime scalability.
(Note that there are some errors while allocating memory for some of the FFTs.)

At 8 MB FFT:
For 2 cores, the scalability is about 98.5% (1.97).
For 4 cores, the scalability is about 82.2% (3.29).
For 6 cores, the scalability is about 71.0% (4.26).
For 8 cores, the scalability is about 66.6% (5.33).
Not so bad.
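The percentages above appear to be the speedup (the figure in parentheses) divided by the number of cores, i.e. parallel efficiency. That reading is an assumption, but it matches the posted numbers. A minimal sketch, with a function name chosen just for this example:

```python
# Speedups reported in the post for the 8-core prototype, keyed by core count.
reported_speedup = {2: 1.97, 4: 3.29, 6: 4.26, 8: 5.33}


def scalability_percent(speedup, cores):
    """Parallel efficiency: achieved speedup as a percentage of the ideal (cores)."""
    return 100.0 * speedup / cores


for cores, speedup in reported_speedup.items():
    print(f"{cores} cores: {scalability_percent(speedup, cores):.1f}%")
```

The same formula also fits the Glucas figures posted in the next message (for example 7.49 / 8 is about 93.6%).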



George,

1) Do you plan to work on a version of Prime95 that takes full advantage of the Nehalem processor?

2) Are there limitations within Prime95 on the number of cores? Is it able to run on a machine with more than 8 cores?

T.

T.Rex 2008-10-23 14:57

Glucas on Nehalem 2.4 GHz 8 cores
 
Hi,

Here are some complementary results, dealing with the scalability of Glucas (tests done using my "personal" tricks with Glucas...).
As you can see, the scalability is ... incredible!! Much better than what I saw on our Bull Itanium2 NovaScale machine, I think.

But I still need to provide a comparison of Glucas and mprime performance at the same FFT size on 1 core. This comparison is useful because Glucas probably does NOT use the CPU caches as well as mprime does! So it is more scalable but not as powerful... Later.

T.

Exponent: 43112609.
For 2 cores, the scalability is about 99.4% (1.99).
For 3 cores, the scalability is about 98.7% (2.96).
For 4 cores, the scalability is about 98.5% (3.94).
For 5 cores, the scalability is about 95.5% (4.78).
For 6 cores, the scalability is about 98.0% (5.88).
For 7 cores, the scalability is about 92.5% (6.47).
For 8 cores, the scalability is about 93.6% (7.49).

Uncwilly 2008-10-23 22:18

[QUOTE=T.Rex;146285]I've got results of mprime on a (prototype) Nehalem 8 cores 2.4 GHz .

I'll not provide detailed results since I do not know yet if it is confidential or not. I'll just focus on mprime scalability.
(Note that there are some errors while allocating memory for some of the FFTs.)[/QUOTE]
Which version of mprime? What about parallel throughput (multiple instances/tasks)?

James Heinrich 2008-10-23 23:08

[QUOTE=petrw1;145941]it strikes me as very unexpected that in your test 3 threads is slower than 2 and even 4 is slower than 2 with the 1024 FFT. Other sizes I extracted show the expected improvement I suppose.[/QUOTE]
Take a look at my results in [url=http://www.mersenneforum.org/showthread.php?t=10208]this thread (post #5)[/url]. The attachment shows the same anomaly for the 768K, 896K and 1024K FFT sizes: 2 threads are nearly twice as fast as 1 thread, but 3 threads revert to nearly the same time as 1 thread, and 4 threads still aren't as fast as 2. Fortunately, for all FFT sizes above 1024K it gets faster (with diminishing returns, of course) as you throw more threads at it.

starrynte 2008-10-25 20:08

[quote=starrynte;145813][code]
Intel(R) Pentium(R) 4 CPU 3.00GHz
CPU speed: 3060.56 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 16 KB
L2 cache size: 1024 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 32-bit version 24.14, RdtscTiming=1
Best time for 512K FFT length: 17.214 ms.
Best time for 640K FFT length: 22.350 ms.
Best time for 768K FFT length: 27.172 ms.
Best time for 896K FFT length: 33.198 ms.
Best time for 1024K FFT length: 38.122 ms.
Best time for 1280K FFT length: 46.649 ms.
Best time for 1536K FFT length: 56.462 ms.
Best time for 1792K FFT length: 68.155 ms.
Best time for 2048K FFT length: 76.344 ms.
Best time for 2560K FFT length: 100.902 ms.
Best time for 3072K FFT length: 121.601 ms.
Best time for 3584K FFT length: 147.856 ms.
Best time for 4096K FFT length: 163.470 ms.
Best time for 58 bit trial factors: 9.453 ms.
Best time for 59 bit trial factors: 9.536 ms.
Best time for 60 bit trial factors: 9.343 ms.
Best time for 61 bit trial factors: 9.450 ms.
Best time for 62 bit trial factors: 13.165 ms.
Best time for 63 bit trial factors: 13.158 ms.
Best time for 64 bit trial factors: 15.198 ms.
Best time for 65 bit trial factors: 15.172 ms.
Best time for 66 bit trial factors: 15.356 ms.
Best time for 67 bit trial factors: 15.302 ms.
[/code]Hyperthreading enabled, benchmark was run with affinity set to both "cores"
Memory: 3 GB[/quote]
[code]
Intel(R) Pentium(R) 4 CPU 3.00GHz
CPU speed: 3060.63 MHz, with hyperthreading
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 16 KB
L2 cache size: 1 MB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 32-bit version 25.7, RdtscTiming=1
Best time for 768K FFT length: 25.065 ms.
Best time for 896K FFT length: 30.673 ms.
Best time for 1024K FFT length: 34.442 ms.
Best time for 1280K FFT length: 42.977 ms.
Best time for 1536K FFT length: 51.798 ms.
Best time for 1792K FFT length: 63.482 ms.
Best time for 2048K FFT length: 70.768 ms.
Best time for 2560K FFT length: 93.152 ms.
Best time for 3072K FFT length: 110.999 ms.
Best time for 3584K FFT length: 134.020 ms.
Best time for 4096K FFT length: 150.764 ms.
Best time for 5120K FFT length: 189.191 ms.
Best time for 6144K FFT length: 238.677 ms.
Best time for 7168K FFT length: 288.895 ms.
Best time for 8192K FFT length: 317.715 ms.
Timing FFTs using 2 threads on 1 physical CPUs.
Best time for 768K FFT length: 23.380 ms.
Best time for 896K FFT length: 28.422 ms.
Best time for 1024K FFT length: 31.196 ms.
Best time for 1280K FFT length: 40.520 ms.
Best time for 1536K FFT length: 48.719 ms.
Best time for 1792K FFT length: 59.153 ms.
Best time for 2048K FFT length: 64.755 ms.
Best time for 2560K FFT length: 85.534 ms.
Best time for 3072K FFT length: 105.290 ms.
Best time for 3584K FFT length: 127.394 ms.
Best time for 4096K FFT length: 140.563 ms.
Best time for 5120K FFT length: 184.461 ms.
Best time for 6144K FFT length: 225.321 ms.
Best time for 7168K FFT length: 275.776 ms.
Best time for 8192K FFT length: 301.549 ms.
Best time for 58 bit trial factors: 8.966 ms.
Best time for 59 bit trial factors: 9.196 ms.
Best time for 60 bit trial factors: 9.116 ms.
Best time for 61 bit trial factors: 9.153 ms.
Best time for 62 bit trial factors: 12.603 ms.
Best time for 63 bit trial factors: 12.635 ms.
Best time for 64 bit trial factors: 14.812 ms.
Best time for 65 bit trial factors: 14.605 ms.
Best time for 66 bit trial factors: 14.794 ms.
Best time for 67 bit trial factors: 14.800 ms.
[/code]
Same settings, but with Prime95 v25.7.
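For a quick comparison of the two runs, the sketch below computes the percentage reduction in time per iteration for a few FFT lengths that appear in both benchmarks, and then does the same for the 2-thread (hyperthreaded) run against the single-thread v25.7 run. The function name is illustrative only; the timings are copied from the two code blocks above.

```python
# Best times in ms on the same Pentium 4, keyed by FFT length (K).
v24_14 = {768: 27.172, 1024: 38.122, 2048: 76.344, 4096: 163.470}
v25_7 = {768: 25.065, 1024: 34.442, 2048: 70.768, 4096: 150.764}
# v25.7 again, "2 threads on 1 physical CPUs" (hyperthreading).
v25_7_ht = {768: 23.380, 1024: 31.196, 2048: 64.755, 4096: 140.563}


def percent_faster(old_ms, new_ms):
    """Reduction in time per iteration, as a percentage of the old time."""
    return 100.0 * (old_ms - new_ms) / old_ms


for fft_k in sorted(v24_14):
    by_version = percent_faster(v24_14[fft_k], v25_7[fft_k])
    by_ht = percent_faster(v25_7[fft_k], v25_7_ht[fft_k])
    print(f"{fft_k}K: v25.7 is {by_version:.1f}% faster; 2 threads add {by_ht:.1f}%")
```

On these four FFT lengths v25.7 is roughly 7 to 10% faster than v24.14, and running 2 threads on the hyperthreaded core gains a further 6 to 10%.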

Jeff Gilchrist 2008-10-26 23:30

Processor: [B]Q9550[/B] @ 3.6 GHz (overclocked, FSB 425 MHz)
Motherboard: [B]ASUS Maximus II Formula[/B]
OS: [B]Vista 64-bit[/B]
Memory: [B]4GB DDR2-1066[/B]
mprime: [B]25.7 build 3 Prime95 Windows64[/B]

[CODE][Sun Oct 26 19:25:23 2008]
Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
CPU speed: 3612.62 MHz, 4 cores
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2, SSE4
L1 cache size: 32 KB
L2 cache size: 6 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 256
Prime95 64-bit version 25.7, RdtscTiming=1
Best time for 768K FFT length: 11.349 ms.
Best time for 896K FFT length: 13.704 ms.
Best time for 1024K FFT length: 15.643 ms.
Best time for 1280K FFT length: 19.539 ms.
Best time for 1536K FFT length: 24.027 ms.
Best time for 1792K FFT length: 28.624 ms.
Best time for 2048K FFT length: 31.802 ms.
Best time for 2560K FFT length: 41.749 ms.
Best time for 3072K FFT length: 51.379 ms.
Best time for 3584K FFT length: 61.101 ms.
Best time for 4096K FFT length: 68.241 ms.
Best time for 5120K FFT length: 87.185 ms.
Best time for 6144K FFT length: 105.502 ms.
Best time for 7168K FFT length: 127.791 ms.
Best time for 8192K FFT length: 139.663 ms.
Timing FFTs using 2 threads.
Best time for 768K FFT length: 5.999 ms.
Best time for 896K FFT length: 7.189 ms.
Best time for 1024K FFT length: 8.358 ms.
Best time for 1280K FFT length: 10.238 ms.
Best time for 1536K FFT length: 12.612 ms.
Best time for 1792K FFT length: 15.073 ms.
Best time for 2048K FFT length: 16.939 ms.
Best time for 2560K FFT length: 21.967 ms.
Best time for 3072K FFT length: 27.114 ms.
Best time for 3584K FFT length: 32.141 ms.
Best time for 4096K FFT length: 36.050 ms.
Best time for 5120K FFT length: 45.961 ms.
Best time for 6144K FFT length: 56.203 ms.
Best time for 7168K FFT length: 67.575 ms.
Best time for 8192K FFT length: 74.263 ms.
Timing FFTs using 3 threads.
Best time for 768K FFT length: 6.945 ms.
Best time for 896K FFT length: 7.851 ms.
Best time for 1024K FFT length: 11.722 ms.
Best time for 1280K FFT length: 8.816 ms.
Best time for 1536K FFT length: 10.691 ms.
Best time for 1792K FFT length: 12.695 ms.
Best time for 2048K FFT length: 14.382 ms.
Best time for 2560K FFT length: 18.672 ms.
Best time for 3072K FFT length: 23.025 ms.
Best time for 3584K FFT length: 27.082 ms.
Best time for 4096K FFT length: 30.635 ms.
Best time for 5120K FFT length: 37.543 ms.
Best time for 6144K FFT length: 45.222 ms.
Best time for 7168K FFT length: 54.311 ms.
Best time for 8192K FFT length: 60.726 ms.
Timing FFTs using 4 threads.
Best time for 768K FFT length: 6.236 ms.
Best time for 896K FFT length: 7.039 ms.
Best time for 1024K FFT length: 10.274 ms.
Best time for 1280K FFT length: 7.718 ms.
Best time for 1536K FFT length: 9.064 ms.
Best time for 1792K FFT length: 10.622 ms.
Best time for 2048K FFT length: 12.070 ms.
Best time for 2560K FFT length: 15.726 ms.
Best time for 3072K FFT length: 19.174 ms.
Best time for 3584K FFT length: 22.596 ms.
Best time for 4096K FFT length: 25.721 ms.
Best time for 5120K FFT length: 31.599 ms.
Best time for 6144K FFT length: 37.974 ms.
Best time for 7168K FFT length: 45.434 ms.
Best time for 8192K FFT length: 51.364 ms.
Best time for 58 bit trial factors: 2.358 ms.
Best time for 59 bit trial factors: 2.347 ms.
Best time for 60 bit trial factors: 2.687 ms.
Best time for 61 bit trial factors: 2.834 ms.
Best time for 62 bit trial factors: 3.233 ms.
Best time for 63 bit trial factors: 4.130 ms.
Best time for 64 bit trial factors: 4.477 ms.
Best time for 65 bit trial factors: 4.932 ms.
Best time for 66 bit trial factors: 4.905 ms.
Best time for 67 bit trial factors: 4.889 ms.
[/CODE]
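For anyone who wants to compare thread counts without reading the log by eye, here is a minimal parsing sketch. It assumes the benchmark text has exactly the line formats seen in this thread ("Best time for NK FFT length: X ms." and "Timing FFTs using N threads"); the function and variable names are invented for this example, and the sample is a short excerpt of the log above.

```python
import re

FFT_LINE = re.compile(r"Best time for (\d+)K FFT length: ([\d.]+) ms\.")
THREADS_LINE = re.compile(r"Timing FFTs using (\d+) threads")


def parse_benchmark(text):
    """Return {threads: {fft_length_k: best_ms}} from Prime95 benchmark text."""
    results = {}
    threads = 1  # lines before the first "Timing FFTs" header are single-threaded
    for line in text.splitlines():
        match = THREADS_LINE.search(line)
        if match:
            threads = int(match.group(1))
            continue
        match = FFT_LINE.search(line)
        if match:  # trial-factoring lines do not match and are skipped
            results.setdefault(threads, {})[int(match.group(1))] = float(match.group(2))
    return results


sample = """\
Best time for 1024K FFT length: 15.643 ms.
Best time for 1280K FFT length: 19.539 ms.
Timing FFTs using 2 threads.
Best time for 1024K FFT length: 8.358 ms.
Best time for 1280K FFT length: 10.238 ms.
Timing FFTs using 3 threads.
Best time for 1024K FFT length: 11.722 ms.
Best time for 1280K FFT length: 8.816 ms.
Timing FFTs using 4 threads.
Best time for 1024K FFT length: 10.274 ms.
Best time for 1280K FFT length: 7.718 ms.
"""

parsed = parse_benchmark(sample)
for fft_k in (1024, 1280):
    row = ", ".join(
        f"{n}T {parsed[1][fft_k] / parsed[n][fft_k]:.2f}x" for n in sorted(parsed)
    )
    print(f"{fft_k}K: {row}")
```

In this excerpt the 1024K FFT is slower with 3 and 4 threads than with 2, while the 1280K FFT speeds up with every added thread.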

petrw1 2008-11-04 22:56

[QUOTE=fivemack;143670]Get a 64-bit operating system; Vista or 64-bit XP. There's absolutely no point, particularly if you're doing maths and so 64-bit multiplications come in handy, in running a 32-bit OS on contemporary hardware.[/QUOTE]

I'm pretty much sold on the idea of 64-bit Vista (I only shudder slightly as I type this).

If I do so, is there any noticeable GIMPS benefit to having MORE THAN 4GB of RAM (DDR2-1066)?

Oh, and does it really matter whether I get the Home / Premium / Business / Ultimate / etc. version?

cheesehead 2008-11-04 23:45

[quote=petrw1;147893]Oh and does it really matter if I get Home / Premium / Business / Ultimate / etc versions?[/quote]
Only Business and Ultimate users are allowed by Microsoft to "down"grade to XP Pro.

[URL]http://download.microsoft.com/download/5/f/4/5f4c83d3-833e-4f11-8cbd-699b0c164182/royaltyoemreferencesheet.pdf[/URL]

[URL]http://www-307.ibm.com/pc/support/site.wss/VSTA-DWNGRD.html[/URL]

[URL]http://direct2dell.com/smallbusiness/archive/2008/05/01/windows-vista-downgrade-service-amp-xp-end-of-life.aspx[/URL]

[URL]http://www.mydigitallife.info/2008/08/22/how-to-downgrade-from-windows-vista-business-or-ultimate-oem-edition-and-install-windows-xp-professional-without-extra-charge/[/URL]

[URL]http://www.engadget.com/2007/09/21/microsoft-giving-vista-business-ultimate-users-downgrade-to/[/URL]

petrw1 2008-11-05 00:50

[QUOTE=cheesehead;147899]Only Business and Ultimate users are allowed by Microsoft to "down"grade to XP Pro.
[/QUOTE]

What about from the aspect of GIMPS performance, throughput, etc?

