mersenneforum.org  

Old 2005-08-12, 21:34   #1
Knappo
 
Aug 2005
69469, Germany

3×5 Posts
Normal time per iteration? (P4 540)

I'm using my ordinary Pentium 4 3.2 GHz (model 540, Prescott core, 1 MB L2) on WinXP SP2 to take those "senseless" shots in the dark by running Lucas-Lehmer tests on large numbers. (Hey, those CPUs are quite expensive, so I want my money's worth :-).

But after looking at the benchmark page at mersenne.org, I'm wondering whether my CPU is faulty or the table is...

Version 24.13 reported the following benchmark:

Intel(R) Pentium(R) 4 CPU 3.20GHz
CPU speed: 3118.19 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 16 KB
L2 cache size: 1024 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 32-bit version 24.13, RdtscTiming=1
Best time for 512K FFT length: 27.816 ms.
Best time for 640K FFT length: 37.417 ms.
Best time for 768K FFT length: 45.254 ms.
Best time for 896K FFT length: 54.743 ms.
Best time for 1024K FFT length: 58.854 ms.
Best time for 1280K FFT length: 76.560 ms.
Best time for 1536K FFT length: 93.805 ms.
Best time for 1792K FFT length: 115.074 ms.
Best time for 2048K FFT length: 126.422 ms.
Best time for 2560K FFT length: 166.760 ms.
Best time for 3072K FFT length: 204.985 ms.
Best time for 3584K FFT length: 249.574 ms.
Best time for 4096K FFT length: 277.491 ms.
Best time for 58 bit trial factors: 12.358 ms.
Best time for 59 bit trial factors: 12.874 ms.
Best time for 60 bit trial factors: 12.880 ms.
Best time for 61 bit trial factors: 12.659 ms.
Best time for 62 bit trial factors: 20.187 ms.
Best time for 63 bit trial factors: 20.166 ms.
Best time for 64 bit trial factors: 19.544 ms.
Best time for 65 bit trial factors: 18.617 ms.
Best time for 66 bit trial factors: 19.711 ms.
Best time for 67 bit trial factors: 19.622 ms.

Compared with http://www.mersenne.org/bench.htm, my CPU is only half as fast as it should be...

It would be nice if somebody could explain the difference to me...

Tnx, Knappo
Old 2005-08-13, 04:48   #2
cheesehead
 
"Richard B. Woods"
Aug 2002
Wisconsin USA

2²·3·641 Posts

Were you, perhaps, running your L-L test at the same time as you ran the benchmark?

If so, then your timings make sense. They're similar to benchmarks reported by patrik, also for a Prescott with 1MB L2 cache, in his Feb 19 posting at http://www.mersenneforum.org/showpost.php?p=50131&postcount=166 in the "Perpetual benchmark thread..." elsewhere in this subforum.

Look at "2. Benchmark while Test=27013621,67,1 running in the background."

Code:
Intel(R) Pentium(R) 4 CPU 3.20GHz
CPU speed: 3207.83 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 16 KB
L2 cache size: 1024 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 version 23.9, RdtscTiming=1
Best time for 384K FFT length: 22.850 ms.
Best time for 448K FFT length: 27.544 ms.
Best time for 512K FFT length: 30.459 ms.
Best time for 640K FFT length: 37.607 ms.
Best time for 768K FFT length: 45.587 ms.
Best time for 896K FFT length: 54.744 ms.
Best time for 1024K FFT length: 60.321 ms.
Best time for 1280K FFT length: 80.209 ms.
Best time for 1536K FFT length: 98.357 ms.
Best time for 1792K FFT length: 118.863 ms.
Best time for 2048K FFT length: 133.361 ms.
Best time for 2560K FFT length: 168.011 ms.
Best time for 3072K FFT length: 208.208 ms.
Best time for 3584K FFT length: 253.071 ms.
Best time for 4096K FFT length: 285.369 ms.
(And the improvement in speed between v23.9 and v24.13 would be only a few percent for Prescotts, consistent with what we see here.)
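
As a rough check of the "only a few percent" remark, the two listings in this thread can be compared FFT length by FFT length. This is only a sketch: the timings are copied from the two benchmarks above, the names `V24_13`, `V23_9` and `improvement_percent` are made up for illustration, and the two machines report slightly different clock speeds (3118 vs. 3208 MHz), so the percentages are indicative at best.

```python
# Best times in ms, copied from the two benchmark listings in this thread.
# Keys are FFT lengths in K; only lengths present in both listings are compared.
V24_13 = {  # Knappo, Prime95 v24.13
    512: 27.816, 640: 37.417, 768: 45.254, 896: 54.743, 1024: 58.854,
    1280: 76.560, 1536: 93.805, 1792: 115.074, 2048: 126.422,
    2560: 166.760, 3072: 204.985, 3584: 249.574, 4096: 277.491,
}
V23_9 = {  # patrik, Prime95 v23.9, with an L-L test running in the background
    384: 22.850, 448: 27.544, 512: 30.459, 640: 37.607, 768: 45.587,
    896: 54.744, 1024: 60.321, 1280: 80.209, 1536: 98.357, 1792: 118.863,
    2048: 133.361, 2560: 168.011, 3072: 208.208, 3584: 253.071, 4096: 285.369,
}


def improvement_percent(old_ms, new_ms):
    """How much faster new_ms is than old_ms, as a percentage of old_ms."""
    return (old_ms - new_ms) / old_ms * 100.0


improvements = {
    fft_k: improvement_percent(V23_9[fft_k], V24_13[fft_k])
    for fft_k in sorted(V24_13)
    if fft_k in V23_9
}

for fft_k, pct in improvements.items():
    print(f"{fft_k:>5}K FFT: {pct:5.2f}% faster in v24.13")
```

With these numbers every common FFT length comes out faster in the v24.13 listing, and each difference stays below 10%.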

Last fiddled with by cheesehead on 2005-08-13 at 05:00
Old 2005-08-13, 08:41   #3
Knappo
 
Aug 2005
69469, Germany

3·5 Posts

Actually, I ran the benchmark with affinity set to 1 (so the process had 50% of the CPU, i.e. one complete virtual processor). In theory you could argue that there is a second processor (thanks to HyperThreading) and that the CPU can nearly double its output when affinity is set and two Prime95 instances are running.

But the question remains... Do you vote for an error in the table?
Old 2005-08-13, 09:15   #4
delta_t
 
Nov 2002
Anchorage, AK

545₈ Posts

Quote:
Originally Posted by Knappo
Actually, I ran the benchmark with affinity set to 1 (so the process had 50% of the CPU, i.e. one complete virtual processor). In theory you could argue that there is a second processor (thanks to HyperThreading) and that the CPU can nearly double its output when affinity is set and two Prime95 instances are running.

But the question remains... Do you vote for an error in the table?
There has been a lot of talk about HyperThreading on these forums already.
But to summarize: NO, you cannot argue that there is a second processor, and you will NOT double the output by setting affinity and running two instances. When you run two instances of Prime95, one on CPU0 and one on CPU1 (the two logical processors that share the one physical core), each instance runs at about half speed.

So try running only one instance, with affinity set to CPU0.

Last fiddled with by delta_t on 2005-08-13 at 09:16
Old 2005-08-13, 09:53   #5
Knappo
 
Aug 2005
69469, Germany

3·5 Posts

OK, I've tested that. There is in fact a speedup: a single instance is nearly twice as fast. But when I run two instances of Prime95, I can check two exponents in parallel, so the output rate is the same. The time per iteration drops with one instance, but the overall throughput stays the same (two instances, each at 50% of the speed of a single instance).
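
The arithmetic behind that conclusion can be sketched in a few lines. This assumes, as a simplification, that the 58.854 ms figure for the 1024K FFT from the first post was measured while the other logical CPU was busy, and that a lone instance is exactly twice as fast; the function name `throughput` is made up for illustration.

```python
def throughput(ms_per_iter, instances):
    """Total iterations per second summed over all running instances."""
    return instances * 1000.0 / ms_per_iter


shared_ms = 58.854        # 1024K FFT timing from the first post
alone_ms = shared_ms / 2  # ASSUMPTION: a lone instance is exactly twice as fast

two_instances = throughput(shared_ms, instances=2)
one_instance = throughput(alone_ms, instances=1)

print(f"two instances: {two_instances:.2f} iterations/s in total")
print(f"one instance:  {one_instance:.2f} iterations/s in total")
```

Under those assumptions both lines print the same total, which is the point of the post: the per-iteration time changes, the overall output does not.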
Old 2005-08-13, 10:12   #6
delta_t
 
Nov 2002
Anchorage, AK

165₁₆ Posts

Yes, that's right. LL-test away.
Old 2005-08-13, 11:52   #7
cheesehead
 
"Richard B. Woods"
Aug 2002
Wisconsin USA

1111000001100₂ Posts

Quote:
Originally Posted by Knappo
Actually, I ran the benchmark with affinity set to 1 (so the process had 50% of the CPU, i.e. one complete virtual processor).
So 100% of a virtual proc = 50% of a real proc (when the other virtual proc is equally loaded). When you time a virtual proc's performance with a real clock, you find that you can't fool Mother Nature.

Quote:
In theory you could argue that there is a second processor (thanks to HyperThreading) and that the CPU can nearly double its output when affinity is set and two Prime95 instances are running.
Well, it can nearly double its virtual output, but you'd have to use two virtual clocks, one for each virtual CPU, in order to see this virtual doubling. (Can't fool ...) When you use a real clock, you get a real measurement.

Quote:
But the question remains... Do you vote for an error in the table?
No. The table is for results when only the benchmark is running. There's nothing wrong with the table.

If you could time a benchmark running on one virtual CPU with a virtual clock whose rate was proportional to the percentage of real CPU that the virtual CPU received, you'd get results similar to the real clock/real CPU case. When the virtual CPU gets 50% of the real CPU and the virtual clock therefore runs 50% as fast as the real clock, your virtual timings would look similar to the real clock/real CPU case (which is what the benchmark table shows).
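
The virtual-clock idea can be written down as a one-line conversion. This is only an illustration of the reasoning above, assuming the virtual CPU gets exactly 50% of the real CPU; `virtual_ms` is a made-up name, and the 58.854 ms input is the 1024K FFT timing from the first post.

```python
def virtual_ms(real_ms, cpu_share):
    """Elapsed time on a virtual clock that ticks at cpu_share times the real rate."""
    if not 0.0 < cpu_share <= 1.0:
        raise ValueError("cpu_share must be in (0, 1]")
    return real_ms * cpu_share


real_ms = 58.854  # measured with a real clock (1024K FFT, first post)
share = 0.5       # ASSUMPTION: the virtual CPU gets half of the real CPU

print(f"{virtual_ms(real_ms, share):.3f} ms on the virtual clock")
```

With a 50% share the virtual timing is half the real one, which is the range where the poster expected the benchmark table to be.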

Quote:
OK, I've tested that. There is in fact a speedup: a single instance is nearly twice as fast.
See? When you're doing only one task with the (real) CPU, it's twice as fast as when the (real) CPU is splitting its (real) time between two virtual CPUs, and you'll see the same speedup when you run the benchmark with nothing else going.
Old 2005-08-13, 16:31   #8
Knappo
 
Aug 2005
69469, Germany

3×5 Posts

Tnx cheesehead, those points were resolved by my post from this morning.
1. The benchmark is faster (about twice as fast) when it runs without any additional task on any virtual/real processor.
2. The "time per iteration" value in "normal" Prime95 operation also drops to around 50% when running alone.
3. The Prime95 output remains the same, because it does not matter whether I run two instances at 50% of the speed or one instance at 100%.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.