mersenneforum.org  

2005-05-27, 03:59   #34
geoff
Mar 2003
New Zealand
1157₁₀ Posts

Quote:
Originally Posted by PhilF
At this point, as another sanity check, I would like to invite anyone else reading this thread who has a 128K L2 Celeron to run a benchmark and report it here.
A further note to this benchmark http://www.mersenneforum.org/showpos...0&postcount=88:
90% of the time when I start mprime on this machine it runs at the speed in this benchmark (actually faster now as I have overclocked it), but the other 10% of starts it runs much slower, sometimes 30-40% slower.

Once running it keeps running at the same speed it started at, so the first time I start mprime I check that it is running at full speed and if not I stop and then restart until it is.

Other programs I run on this machine (mainly gmp-ecm) don't display this behaviour. It is running Linux so I can't do a benchmark with the test program.
2005-05-27, 04:54   #35
PhilF
"6800 descendent"
Feb 2005
Colorado
2E2₁₆ Posts
P4 1.7Ghz / 256K L2 Cache

George,

I put a P4 1.7Ghz Northwood with 256K L2 cache in the same board that has been running the 2.8Ghz Celeron we have been discussing here. Times were better, but still below where they should be. I did some checking, and discovered Windows XP was using some driver of its own for the integrated VGA. After loading the proper VGA drivers, the iteration times improved considerably. I thought I was on to something, so I re-installed the 2.8G Celeron. Now it is even slower than it was before!

So I have given up on the 128K Celeron, but I will post the new benchmark for it if you want. The benchmark attached to this post is the one for the P4 1.7Ghz 256K cache. If you would like the FullBench=1 version for this processor, let me know.

I feel like I have been leading you on a wild goose chase. On the other hand, it is possible the memory performance of this board is so bad that any differences in cache optimization will be exaggerated, making it easier to spot.
Attached Files
File Type: txt 1_7G256K.txt (4.8 KB, 174 views)
2005-05-27, 05:06   #36
sdbardwick
Aug 2002
North San Diego County
1444₈ Posts

Benchmark results for C2200 attached
Attached Files
File Type: txt C2200_128.txt (9.4 KB, 149 views)
2005-05-27, 07:28   #37
koekie
Dec 2002
Amsterdam, Netherlands
76₁₀ Posts

Benchmarks for a P-IV 2.8GHz with 1MB L2 (HP Compaq d330 DT), 512MB RAM
Attached Files
File Type: txt results.txt (9.4 KB, 148 views)
2005-05-27, 07:37   #38
ric
Jul 2004
Milan, Ita
35 Posts
Xeon 3.0GHz (1MB L2 cache)...

... running 2003 SP1. If of any interest, I've results for the benchmark of TST2 as well (but at first sight, they look the same).
Attached Files
File Type: txt Xeon_3_1ML2_results.txt (9.4 KB, 148 views)
2005-05-27, 14:00   #39
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
41·199 Posts

Thanks everyone. I have enough data to analyze for now. I'm going to add a 10 level pass 2 option mainly for 128KB caches.

I need to understand pass 1 cache behavior better. It is not working quite as I expected.

We'll probably do this again soon.
2005-05-27, 14:29   #40
PhilF
"6800 descendent"
Feb 2005
Colorado
1342₈ Posts

George, just an FYI. I find this quite interesting.

The 2.8Ghz 128K Celeron with such lousy P95 performance will encode 30 minutes of video in 5 hours.

The 1.7Ghz 256K P4, which is much better for P95, takes 7 hours to encode that same video.
2005-05-27, 21:20   #41
TheJudger
"Oliver"
Mar 2005
Germany
10001011010₂ Posts

FFT-Size 32MB? OMFG

so we can do M79,300,000 in 4MB FFT,
so I expect we can do M600,000,000 with 32MB FFT

From ric's 3GHz Xeon benchmark I can read around 1.77s per iteration...
so one test with an exponent around 600,000,000 will last ~34 years ;)
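
The estimate above can be checked with a few lines of Python. This is only a sketch: it assumes the run is a Lucas-Lehmer test (p - 2 squarings for exponent p), that the 1.77 s per iteration read from ric's benchmark holds for the whole run, and that the usable exponent scales linearly with FFT length, which is an optimistic upper bound. The function names are made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def ll_test_years(exponent: int, seconds_per_iteration: float) -> float:
    """Rough wall-clock time, in years, for one Lucas-Lehmer test."""
    iterations = exponent - 2  # an LL test of M_p takes p - 2 squarings
    return iterations * seconds_per_iteration / SECONDS_PER_YEAR


def naive_exponent_limit(known_limit: int, known_fft: int, new_fft: int) -> int:
    """Scale a known exponent limit linearly with FFT length (optimistic)."""
    return known_limit * new_fft // known_fft


if __name__ == "__main__":
    years = ll_test_years(600_000_000, 1.77)
    print(f"about {years:.1f} years")
    # Linear scaling from 79,300,000 at 4M to a 32M FFT.
    print(naive_exponent_limit(79_300_000, 4, 32))
```

With the numbers from this thread it comes out a little under 34 years, and the linear bound for a 32M FFT is 634,400,000, so an expectation of about 600,000,000 sits below it.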
2005-05-28, 12:21   #42
ric
Jul 2004
Milan, Ita
35 Posts
32M FFT

Quote:
Originally Posted by TheJudger
From ric's 3GHz Xeon benchmark I can read around 1.77s per iteration...
so one test with an exponent around 600,000,000 will last ~34 years ;)
PRESTO!
I need to start right now, if I want to see the result (given average life duration) ;-P
2005-05-29, 12:40   #43
Dresdenboy
Apr 2003
Berlin, Germany
101101001₂ Posts

Quote:
Originally Posted by PhilF
George, just an FYI. I find this quite interesting.

The 2.8Ghz 128K Celeron with such lousy P95 performance will encode 30 minutes of video in 5 hours.

The 1.7Ghz 256K P4, which is much better for P95, takes 7 hours to encode that same video.
You found out that different algorithms, with different behaviour regarding memory accesses, calculations, or the like, can show different results on different cores with different cache configurations (size and latency).

I suppose such video encoding kernels are mostly optimized for L1 cache usage and throughput. The Northwood-based 128K Celeron has a faster but smaller L1 cache and a much higher clock speed than the Prescott-based 256k Celeron, and its core (Northwood) has a somewhat higher average performance per clock (especially with the 512kB L2 vs. 1MB L2 models).

OTOH, every access to data that is not cached causes a delay. And if one algorithm needs more cache than another, it may be hurt much more when the cache is smaller. So what you saw is a normal situation in the world of CPUs.
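
A common back-of-the-envelope model for this is average memory access time: hit time plus miss rate times miss penalty. The sketch below uses that textbook formula; every number in it is an invented placeholder chosen to show the trend, not a measurement of any CPU in this thread.

```python
def average_access_time(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Textbook AMAT: every access pays hit_time, a fraction miss_rate also pays miss_penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    # Hypothetical values in CPU cycles, for illustration only.
    cases = [
        ("working set fits in cache", 0.02),
        ("working set spills out of cache", 0.20),
    ]
    for label, miss_rate in cases:
        cycles = average_access_time(hit_time=10, miss_rate=miss_rate, miss_penalty=200)
        print(f"{label}: {cycles:.0f} cycles per access")
```

In this model a program whose working set no longer fits sees its miss rate climb, and the cost per access climbs with it, while a program that stays inside the cache barely notices the smaller size.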

BTW, the integrated video steals bandwidth from apps like Prime95. So try to use low resolutions with a low refresh rate and fewer colors to improve iteration times. Maybe that was what changed the iteration times after the driver change. Just remember that drivers included with Windows are also written by the hardware vendors, not Microsoft.
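
To get a feel for how much memory bandwidth integrated video might use just to refresh the screen, one can multiply width, height, bytes per pixel, and refresh rate. This is a simplified sketch: it ignores blanking intervals and any buffering the chipset does, and the display modes are examples, not the settings from PhilF's machine.

```python
def scanout_bytes_per_second(width: int, height: int, bits_per_pixel: int, refresh_hz: int) -> int:
    """Bytes read from the frame buffer each second to refresh the display."""
    return width * height * (bits_per_pixel // 8) * refresh_hz


if __name__ == "__main__":
    # Example modes only: (width, height, colour depth in bits, refresh rate in Hz).
    modes = [(1280, 1024, 32, 85), (800, 600, 16, 60)]
    for w, h, bpp, hz in modes:
        mb = scanout_bytes_per_second(w, h, bpp, hz) / 1e6
        print(f"{w}x{h} @ {bpp}-bit, {hz} Hz: about {mb:.0f} MB/s")
```

Under these assumptions the high-resolution mode needs several times the bandwidth of the low one, which is why dropping resolution, colour depth, and refresh rate can leave more memory bandwidth for Prime95.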
2005-05-29, 16:31   #44
PhilF
"6800 descendent"
Feb 2005
Colorado
2×3²×41 Posts

Quote:
Originally Posted by Dresdenboy
BTW, the integrated video steals bandwidth from apps like Prime95. So try to use low resolutions with a low refresh rate and fewer colors to improve iteration times. Maybe that was what changed the iteration times after the driver change. Just remember that drivers included with Windows are also written by the hardware vendors, not Microsoft.
Good points. But with that 128K Celeron, iteration times were still absolutely horrible even when booted with text-based Linux and using mprime. I think the only hope for 128K Celerons (especially when used with a motherboard with slow memory) is for George to figure out a way to better optimize the code for such a small cache.

I suppose I could try disabling the integrated video and installing a PCI video card, but I do remember trying the Celeron on another motherboard that did not have integrated video and getting about the same iteration times.

I did some more speed comparisons, and came up with the following:
---------------------------------------

2.8Ghz Celeron, 128K L2 cache:

1792K FFT iteration time: 243 ms.
5 hours to encode 30 minutes of video.

---------------------------------------

1.7Ghz P4, 256K L2 cache:

1792K FFT iteration time: 146 ms.
7.5 hours to encode 30 minutes of video.

---------------------------------------
To me, these are amazing differences.
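
One way to put these numbers side by side is to turn them into ratios and into clock cycles per FFT iteration. The sketch below uses only the figures posted above; the class and variable names are just for illustration, and cycles are taken as nominal clock speed times measured time.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    clock_ghz: float
    iteration_ms: float  # 1792K FFT iteration time
    encode_hours: float  # time to encode 30 minutes of video

    def cycles_per_iteration(self) -> float:
        """Nominal clock cycles spent on one 1792K FFT iteration."""
        return self.clock_ghz * 1e9 * self.iteration_ms / 1e3


celeron = Machine("2.8Ghz Celeron, 128K L2", 2.8, 243, 5.0)
p4 = Machine("1.7Ghz P4, 256K L2", 1.7, 146, 7.5)

if __name__ == "__main__":
    for m in (celeron, p4):
        print(f"{m.name}: {m.cycles_per_iteration() / 1e6:.0f} million cycles per iteration")
    print(f"P4 is {celeron.iteration_ms / p4.iteration_ms:.2f}x faster per iteration")
    print(f"Celeron is {p4.encode_hours / celeron.encode_hours:.2f}x faster at encoding")
```

On these figures the Celeron spends roughly 680 million clock cycles per iteration against about 248 million for the P4, yet finishes the video encode in two thirds of the time.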
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.