mersenneforum.org  

2005-05-26, 21:23   #23
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2³·1,021 Posts

PhilF, your numbers should look like this:

timer 0: 50134272
timer 1: 49263768
timer 2: 516596
timer 4: 5440508
timer 5: 4377412
timer 6: 10304916
timer 9: 8565792
timer 10: 9682704
timer 13: 4817356
timer 14: 4815880
timer 16: 7625880
timer 17: 4390924
timer 18: 6869292
timer 20: 11805536
timer 21: 11852772
timer 24: 4928024
timer 26: 27772

Timer 0 is total clock cycles spent in pass 2
Timer 1 is total clocks spent in pass 1
Timers 4-14 are clocks spent in various stages of pass 2
Timers 16-24 are clocks spent in various stages of pass 1
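
As a rough cross-check of the timer description above, the stage timers can be summed and compared against the pass totals. This is only a sketch: it assumes the stage timers are non-overlapping sub-intervals of their pass, which the post does not state, and the helper name `stage_share` is made up for illustration.

```python
# Reference timer values quoted above (clock cycles).
timers = {
    0: 50134272, 1: 49263768, 2: 516596,
    4: 5440508, 5: 4377412, 6: 10304916, 9: 8565792,
    10: 9682704, 13: 4817356, 14: 4815880,
    16: 7625880, 17: 4390924, 18: 6869292,
    20: 11805536, 21: 11852772, 24: 4928024, 26: 27772,
}

def stage_share(timers, total_id, first, last):
    """Return (sum of stage timers first..last, fraction of the total timer).

    Assumption: the stage timers do not overlap each other.
    """
    stage_sum = sum(v for k, v in timers.items() if first <= k <= last)
    return stage_sum, stage_sum / timers[total_id]

# Timers 4-14 belong to pass 2 (total in timer 0),
# timers 16-24 belong to pass 1 (total in timer 1).
pass2_sum, pass2_frac = stage_share(timers, 0, 4, 14)
pass1_sum, pass1_frac = stage_share(timers, 1, 16, 24)

print(f"pass 2 stages: {pass2_sum} clocks, {pass2_frac:.1%} of timer 0")
print(f"pass 1 stages: {pass1_sum} clocks, {pass1_frac:.1%} of timer 1")
```

With the reference numbers, the stage timers cover most of each pass total, so a stage that is far out of line should show up clearly in the pass total as well.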

Your horrific timer 4 value indicates the start of pass 2 is being read out of memory instead of the cache. Timers 5-14 are also bad, indicating that there are more L2 cache misses than there should be (I assume because prefetch instructions are tossing needed data out of the cache).

Your timer 16-24 values are also bad, but I don't know why. I think the data should fit in 128KB.

The bad news is that fixing pass 2 (writing a 10-level pass 2 instead of an 11-level one) will double the amount of memory that pass 1 must process.


Just as a sanity check, can you make sure the L2 cache is enabled in the BIOS, or run some kind of memory latency test program that verifies the size and speed of the L2 cache?

Last fiddled with by Prime95 on 2005-05-26 at 21:32

2005-05-26, 21:45   #24
PhilF
 
"6800 descendent"
Feb 2005
Colorado

2×3²×41 Posts

Running Memtest v1.51, I get the following speeds:

L1 Cache: 8K 19580 MB/s
L2 Cache: 128K 19580 MB/s
Memory: 262MB 578 MB/s

Cache is definitely enabled in the BIOS. Also, I have tried two different 128K L2 cache chips in three different motherboards, all with the same dismal P95 speed but generally fast for everything else.

2005-05-26, 23:15   #25
delta_t
 
Nov 2002
Anchorage, AK

101100101₂ Posts

How do those Pentium M numbers look?

2005-05-26, 23:57   #26
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2³·1,021 Posts

Quote:
Originally Posted by delta_t
How do those Pentium M numbers look?
Unfortunately for you, there are no numbers that are unusually high. Timer 4 and/or 16 would be high if there was a lot of data being read from main memory instead of the cache.

I think the Pentium M is slower because it is based on the inferior P3 FPU and probably has less cache bandwidth than the P4.

2005-05-27, 00:14   #27
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1FE8₁₆ Posts

PhilF, those timers are so out of whack that I am baffled. For example, timer 17 comes right after timer 16 has copied all of 16KB of FFT data to a scratch area. There is no way that 16KB scratch area's data could have been ousted from the cache, yet your machine is taking 4 times as many clocks to process that 16KB of data!!

A look at the benchmarks page shows that there are 2.0 GHz Celerons that are faster than yours.

I'm grasping at straws... Is your CPU hyper-threaded? Do you have a separate video card?


See http://www.mersenneforum.org/showpos...1&postcount=77 - 500 MHz slower yet a lot better iteration times. And http://www.mersenneforum.org/showpos...0&postcount=88 too. And http://www.mersenneforum.org/showpos...&postcount=164

Last fiddled with by Prime95 on 2005-05-27 at 00:34

2005-05-27, 00:23   #28
StickBoy
 
May 2005

216 Posts

AMD64 3200+ Winchester (OC'd to 3800+ speeds @ 2.4GHz) 512KB cache
Attached Files
File Type: txt results.txt (8.3 KB, 147 views)

2005-05-27, 00:32   #29
PhilF
 
"6800 descendent"
Feb 2005
Colorado

2×3²×41 Posts

At this point, as another sanity check, I would like to invite anyone else reading this thread who has a 128K L2 Celeron to run a benchmark and report it here.

It isn't HT; there isn't any such thing as an HT Celeron, except for maybe a Celeron D. This motherboard does not support HT anyway, and has a max FSB of 400 MHz. Also, this motherboard has built-in VGA, using the SiS chipset. The other motherboard I tried is an Intel 845WM motherboard, which has a PCI VGA card installed.

I'm grasping at straws too:

Those two benchmarks you mentioned were done with v23.5 and v23.7 of P95. Is it possible something changed in 23.8 that broke the 128K cache size optimization?

Is it possible it has anything to do with SSE2? I ask because even though P95 reports it as SSE2 capable, maybe it isn't using it properly.

I just ran a benchmark using mprime on Linux booted from a floppy (pure text mode). The 1024K best time is 131.266 ms (just as bad as P95). I don't know if that rules anything out, but I thought it might be helpful to report.

Last fiddled with by PhilF on 2005-05-27 at 00:37

2005-05-27, 00:58   #30
PhilF
 
"6800 descendent"
Feb 2005
Colorado

2×3²×41 Posts

I saw where one of those benchmarks you mentioned was run on 24.6. I tried that, and while some FFT sizes were faster, it is still way below par:

Intel(R) Celeron(R) CPU 2.80GHz
CPU speed: 2799.66 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
L1 cache size: 8 KB
L2 cache size: 128 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 version 24.6, RdtscTiming=1
Best time for 512K FFT length: 55.836 ms.
Best time for 640K FFT length: 56.837 ms.
Best time for 768K FFT length: 72.752 ms.
Best time for 896K FFT length: 78.432 ms.
Best time for 1024K FFT length: 91.140 ms.
Best time for 1280K FFT length: 141.583 ms.
Best time for 1536K FFT length: 177.466 ms.
Best time for 1792K FFT length: 229.740 ms.
Best time for 2048K FFT length: 277.671 ms.

Now I am worried the problem is on my end, but I've ruled out everything I can think of.
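
Since the timer dumps in this thread are in clock cycles while the benchmark reports milliseconds, it can help to convert the best times to clocks before comparing anything. A minimal sketch, using the CPU speed and times from the output above; the function name `ms_to_clocks` is made up, and nothing here assumes the timer dumps were taken at any particular FFT length.

```python
# Values copied from the Prime95 24.6 benchmark output above.
CPU_MHZ = 2799.66
best_ms = {
    512: 55.836, 640: 56.837, 768: 72.752, 896: 78.432, 1024: 91.140,
    1280: 141.583, 1536: 177.466, 1792: 229.740, 2048: 277.671,
}

def ms_to_clocks(ms, mhz):
    """Convert a time in milliseconds to CPU clocks (1 MHz = 1000 clocks per ms)."""
    return ms * mhz * 1e3

# FFT length in K -> approximate clocks per iteration.
clocks = {fft_k: ms_to_clocks(ms, CPU_MHZ) for fft_k, ms in best_ms.items()}

for fft_k, c in clocks.items():
    print(f"{fft_k:5d}K FFT: {c / 1e6:8.1f} million clocks per iteration")
```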

2005-05-27, 02:34   #31
Xyzzy
 
Aug 2002

205618 Posts

Quote:
Originally Posted by sdbardwick
All Celerons starting from the C300A (1998) were 128KB until the fairly recent introduction of the 256KB Celeron D, so I'd speculate that the vast majority of the Celeron user base is 128KB.
Well, there were these:

http://www.geek.com/procspec/intel/p...ualatincel.htm

2005-05-27, 02:46   #32
sdbardwick
 
Aug 2002
North San Diego County

2²·3·67 Posts

Drat! Had a sinking feeling I overstated the case (e.g. the C-M post), but couldn't remember which models bumped the cache, or even if P3 based ones ever did.

As amends, I'll hook up my C2000 and run a test for George when I get home.

2005-05-27, 03:08   #33
sdbardwick
 
Aug 2002
North San Diego County

2²×3×67 Posts

Celeron 2000 @ 2200, 128KB L2, Win2K SP4, full bench to follow

[Thu May 26 20:22:35 2005]
timer 0: 68889412
timer 1: 79066828
timer 2: 565664
timer 4: 11827912
timer 5: 5208252
timer 6: 16336508
timer 9: 10526548
timer 10: 11680360
timer 13: 6249304
timer 14: 5607952
timer 16: 13887784
timer 17: 7118996
timer 18: 8799880
timer 20: 14880708
timer 21: 18463204
timer 24: 14317144
timer 26: 48148
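
For anyone who wants to line a dump like this up against the reference numbers Prime95 posted earlier in the thread, here is a small sketch that parses both dumps and prints per-timer ratios. It assumes both dumps come from the same test and FFT size, which the thread does not confirm; `parse_timers` and `timer_ratios` are made-up helper names.

```python
import re

# Reference dump posted by Prime95 earlier in the thread.
REFERENCE = """
timer 0: 50134272
timer 1: 49263768
timer 2: 516596
timer 4: 5440508
timer 5: 4377412
timer 6: 10304916
timer 9: 8565792
timer 10: 9682704
timer 13: 4817356
timer 14: 4815880
timer 16: 7625880
timer 17: 4390924
timer 18: 6869292
timer 20: 11805536
timer 21: 11852772
timer 24: 4928024
timer 26: 27772
"""

# Celeron 2000 @ 2200 dump from this post.
CELERON_2000 = """
[Thu May 26 20:22:35 2005]
timer 0: 68889412
timer 1: 79066828
timer 2: 565664
timer 4: 11827912
timer 5: 5208252
timer 6: 16336508
timer 9: 10526548
timer 10: 11680360
timer 13: 6249304
timer 14: 5607952
timer 16: 13887784
timer 17: 7118996
timer 18: 8799880
timer 20: 14880708
timer 21: 18463204
timer 24: 14317144
timer 26: 48148
"""

def parse_timers(text):
    """Map timer number -> clock count for every 'timer N: V' line in text."""
    return {int(n): int(v) for n, v in re.findall(r"timer\s+(\d+):\s+(\d+)", text)}

def timer_ratios(measured, reference):
    """Ratio measured/reference for each timer present in both dumps."""
    return {k: measured[k] / reference[k] for k in sorted(reference) if k in measured}

ratios = timer_ratios(parse_timers(CELERON_2000), parse_timers(REFERENCE))
for timer, ratio in ratios.items():
    print(f"timer {timer:2d}: {ratio:.2f}x reference")
```

Timers whose ratio stands well above the rest (timers 4 and 24 here) are the ones worth a closer look, going by the descriptions Prime95 gave of what each timer covers.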

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
