mersenneforum.org  

Old 2012-01-28, 16:08   #1
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2×3,943 Posts
Memory settings questions

In testing 27.3, my Sandy Bridge appears to be limited by memory bandwidth. I'm not an expert in this area. Can someone confirm that my memory settings are optimal (or at least reasonable)?

I have two sticks of OCZ OCZ3P2000LV2G. The chip is an i5-2500K with a multiplier of 41 instead of the stock 33. CPU-Z says memory is running at 800 MHz, the FSB:DRAM ratio is 1:6, and the timings (CAS-RAS-etc.) are 9-9-9-20-1T.

Would adding two more sticks improve memory bandwidth?
Old 2012-01-28, 18:01   #2
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2·3,943 Posts

I just tried to up the memory to DDR3-1866. It boots, but prime95 gives a BSOD. Maybe if I up the memory voltages or play with the timings, I can get this to work.
Old 2012-01-28, 18:11   #3
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

10158₁₀ Posts

My own observations indicate that those timings are in a normal range. My partner's i7-920 runs triple channel at 8-8-8-?, I think. (Can't check at the moment.) It is all stock settings, and the RAM is rated (and running) at DDR3-1333, or 667 MHz. My Phenom II 1090T has Corsair (4x4GB) dual-channel memory, which is rated at 667 MHz but runs happily at 800. The SPD timings are 9-9-9-24-41 2T, but it's running at 1T without problems.

It didn't apply to this Phenom II setup, but on an earlier 2-channel board, going from 2 sticks to 4 forced a change from 1T to 2T. I just added two sticks to the current board and was able to keep 1T.

Unless adding 2 more sticks lets you run more channels, I don't think it would improve bandwidth.

There is probably someone around here who knows i5 setups better than I do.

EDIT: Would reducing latency at the same frequency improve your situation?
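
To put the latency question in concrete terms, here is a minimal sketch that converts the CAS figures mentioned in this thread from cycles to nanoseconds. It assumes the usual convention that CAS latency is counted in memory I/O clock cycles, and that "667 MHz" means the nominal 666.67 MHz clock of DDR3-1333. The function name is made up for illustration.

```python
def cas_latency_ns(cas_cycles: int, memory_clock_mhz: float) -> float:
    """Convert a CAS latency in memory clock cycles to nanoseconds."""
    return cas_cycles / memory_clock_mhz * 1000.0


# (label, CAS cycles, memory clock in MHz); values taken from posts in this thread.
# 2000/3 MHz is the nominal clock behind "667 MHz" (an assumption).
configs = [
    ("DDR3-1600 CL9", 9, 800.0),
    ("DDR3-1333 CL9", 9, 2000.0 / 3.0),
    ("DDR3-1333 CL8", 8, 2000.0 / 3.0),
]

for label, cas, mhz in configs:
    print(f"{label}: {cas_latency_ns(cas, mhz):.2f} ns")
```

Under these assumptions, CL9 at 800 MHz is already a shorter absolute latency than CL8 at 667 MHz, so the cycle counts alone do not tell the whole story.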

Last fiddled with by kladner on 2012-01-28 at 19:00
Old 2012-01-28, 19:07   #4
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts

What are the symptoms that make you think it's memory bandwidth? I generally don't notice anything out of the ordinary on my 2600K (typically it's at 39 multiplier equivalent) and I'm running memory at 667 MHz/DDR3-1333 settings (and I think standard 9-9-9-24 timings).
Old 2012-01-28, 20:59   #5
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2×3,943 Posts

Quote:
Originally Posted by Dubslow
What are the symptoms that make you think it's memory bandwidth? I generally don't notice anything out of the ordinary on my 2600K (typically it's at 39 multiplier equivalent) and I'm running memory at 667 MHz/DDR3-1333 settings (and I think standard 9-9-9-24 timings).

Running version 27.3, one core has a per-iteration time of about 15 ms. Running 3 cores, I get about 16 ms each. Running 4 cores, I get about 19 ms each.
Old 2012-01-28, 21:31   #6
Ralf Recker
 
Oct 2010

191 Posts

Quote:
Originally Posted by Dubslow
What are the symptoms that make you think it's memory bandwidth? I generally don't notice anything out of the ordinary on my 2600K (typically it's at 39 multiplier equivalent) and I'm running memory at 667 MHz/DDR3-1333 settings (and I think standard 9-9-9-24 timings).
Quote:
Originally Posted by Prime95
Running version 27.3, one core has a per-iteration time of about 15 ms. Running 3 cores, I get about 16 ms each. Running 4 cores, I get about 19 ms each.

I have seen this over at PrimeGrid. A non-hyperthreaded Core i7-2600K @ 4.4 GHz (8 MB L3 cache) was about 90 minutes faster (9.75 hours instead of 11.25) when running four 321-LLR tasks (LLR 3.8.x / gwnumlib 26.x) than a Core i5-2500K @ 4.4 GHz (6 MB L3 cache). With only 3 tasks running, the Core i5-2500K was as fast as the Core i7-2600K with 4 tasks running.
Old 2012-01-28, 21:33   #7
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

17316₈ Posts

Quote:
Originally Posted by Prime95
Running version 27.3, one core has a per-iteration time of about 15 ms. Running 3 cores, I get about 16 ms each. Running 4 cores, I get about 19 ms each.

With my memory running at 800 MHz, I timed a simple memory read loop and a memory read/write loop. Reads take 15 clocks per 64-byte cache line. Read/writes take about 24 clocks per cache line.
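
For a sense of scale, the clocks-per-cache-line figures can be turned into bandwidth. This is a rough sketch that assumes "clocks" means CPU core clocks on the 4.1 GHz machine described in this thread; the function name is hypothetical.

```python
CACHE_LINE_BYTES = 64
CPU_HZ = 4.1e9  # assumption: the loops were timed in core clocks at 4.1 GHz


def bandwidth_gb_s(clocks_per_line: float) -> float:
    """Cache-line data processed per second, in GB/s (1 GB = 1e9 bytes)."""
    lines_per_second = CPU_HZ / clocks_per_line
    return lines_per_second * CACHE_LINE_BYTES / 1e9


read_bw = bandwidth_gb_s(15)        # read-only loop
read_write_bw = bandwidth_gb_s(24)  # read/write loop, counting each line once

print(f"read loop:       {read_bw:.1f} GB/s")
print(f"read/write loop: {read_write_bw:.1f} GB/s of data touched")
# If every modified line is also written back to memory, the bus traffic
# in the read/write loop is roughly twice the "data touched" figure.
```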

A 2560K FFT has 20 MB of data. A 2-pass FFT reads and writes this data twice. It must also read about 8 MB of sin/cos data. So the best-case time in the 4-core case on my 4.1 GHz machine is ((20M * 2) / 64 * 24 + 8M / 64 * 15) / 4.1G * 4, or about 17 ms.

Thus, unless I can improve my memory bandwidth I won't be able to get 4 cores to run as fast as the single core case.
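
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the post: it assumes M means 2^20 bytes, that all four cores share the same memory bandwidth (so the single-core memory time is multiplied by the core count), and it uses an illustrative function name.

```python
MIB = 1 << 20           # assumption: "M" in the post means 2**20 bytes
CACHE_LINE_BYTES = 64


def best_case_iteration_ms(fft_bytes, passes, sincos_bytes,
                           rw_clocks_per_line, read_clocks_per_line,
                           cpu_hz, cores):
    """Memory-bound lower limit on per-iteration time, in milliseconds."""
    rw_lines = fft_bytes * passes / CACHE_LINE_BYTES   # FFT data, read and written
    read_lines = sincos_bytes / CACHE_LINE_BYTES       # sin/cos data, read only
    clocks = rw_lines * rw_clocks_per_line + read_lines * read_clocks_per_line
    # Every core streams its own FFT through the same memory bus.
    return clocks / cpu_hz * cores * 1000.0


estimate = best_case_iteration_ms(
    fft_bytes=20 * MIB, passes=2, sincos_bytes=8 * MIB,
    rw_clocks_per_line=24, read_clocks_per_line=15,
    cpu_hz=4.1e9, cores=4,
)
print(f"best case with 4 cores: {estimate:.1f} ms per iteration")
```

With these inputs the result is a little over 17 ms, which matches the figure in the post and sits between the measured 15 ms (1 core) and 19 ms (4 cores).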
Old 2012-01-28, 21:41   #8
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

I've always experienced something like that, though usually 4-core-times/1-core-times < 1.1, so it's not too bad. Either way, 4*.9 > 2*.95 > 1. The nature of our computer tech means that we'll generally be memory limited, unless either CPU caches get larger or memory gets significantly faster. (I've been asking myself for the last 6 months how much the extra 2 MB of L3 cache on the 2600 vs. the 2500 helps, i.e. is it worth the extra $100?)
Old 2012-01-28, 22:54   #9
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2×3,943 Posts

I can live with a 4-core/1-core timing ratio of 1.1. A 19/15 ratio of about 1.27 leaves a lot of potential performance on the table.
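
As a quick check on what those ratios mean for total throughput, here is a small sketch using the per-iteration timings reported earlier in this thread (15 ms on 1 core, 16 ms each on 3 cores, 19 ms each on 4 cores). The function names are illustrative only.

```python
def slowdown(t_n_ms: float, t_1_ms: float) -> float:
    """Per-iteration time with n busy cores relative to the 1-core time."""
    return t_n_ms / t_1_ms


def throughput(cores: int, t_n_ms: float, t_1_ms: float) -> float:
    """Aggregate throughput, measured in multiples of one unloaded core."""
    return cores * t_1_ms / t_n_ms


# cores -> per-iteration time in ms, as reported in this thread
timings_ms = {1: 15.0, 3: 16.0, 4: 19.0}
t1 = timings_ms[1]

for cores, t in timings_ms.items():
    print(f"{cores} core(s): slowdown {slowdown(t, t1):.2f}, "
          f"throughput {throughput(cores, t, t1):.2f}x")
```

Four cores still deliver more total work than three, but the fourth core adds only about a third of a core's worth of throughput.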

I need to ponder whether there are any changes I can make to the FFT that increase the number of floating-point ops but decrease memory usage. This would slow down the FFT in the 1-core case, but increase throughput in the 4-core case.

Remember I'm not optimizing for my machine, I'm trying to improve the throughput of all Sandy Bridge systems. I started this thread primarily to make sure my system is configured properly and thus represents a typical Sandy Bridge system.

P.S. I'm now running at 930 MHz with timings of 9-9-9-30 and a command rate of 2T. This is a little bit faster than the previous 800 MHz setup.
Old 2012-01-28, 22:57   #10
ldesnogu
 
Jan 2008
France

13×43 Posts

Quote:
Originally Posted by Prime95
I just tried to up the memory to DDR3-1866. It boots, but prime95 gives a BSOD. Maybe if I up the memory voltages or play with the timings, I can get this to work.

There are some answers on the OCZ forum.

As far as increasing bandwidth goes, IIRC the i5-2500K only supports dual channel, so if you already have 2 DIMMs, one for each channel, I don't think adding DIMMs will increase the bandwidth.
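
The reasoning here follows from how theoretical peak bandwidth is computed. A minimal sketch, assuming standard DDR3 (two transfers per memory clock, 64-bit channels) and using the memory clocks mentioned in this thread; the function name is hypothetical:

```python
def peak_bandwidth_gb_s(memory_clock_mhz: float, channels: int,
                        bus_bytes: int = 8) -> float:
    """Theoretical peak DDR bandwidth in GB/s (1 GB = 1e9 bytes)."""
    transfers_per_second = memory_clock_mhz * 1e6 * 2  # double data rate
    return transfers_per_second * bus_bytes * channels / 1e9


for mhz in (800.0, 930.0):
    print(f"{mhz:.0f} MHz, dual channel: "
          f"{peak_bandwidth_gb_s(mhz, channels=2):.2f} GB/s")
```

The number of DIMMs does not appear in the formula: only the clock and the channel count do, so a second pair of sticks on a dual-channel controller leaves the theoretical peak unchanged.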
Old 2012-01-29, 01:08   #11
TheJudger
 
"Oliver"
Mar 2005
Germany

2127₈ Posts

Quote:
Originally Posted by Prime95
I have two sticks of OCZ OCZ3P2000LV2G. The chip is an i5-2500K with a multiplier of 41 instead of the stock 33. CPU-Z says memory is running at 800 MHz, the FSB:DRAM ratio is 1:6, and the timings (CAS-RAS-etc.) are 9-9-9-20-1T.

Would adding two more sticks improve memory bandwidth?

Well, I guess you've installed your two modules correctly (one module per memory channel). When you add another pair of modules, you have more ranks per channel. For a fixed clock rate the theoretical bandwidth is the same, but the practical bandwidth can be a little bit higher because reads and writes to different ranks have a lower latency than a read and write to the same rank. With more ranks there is a higher chance that memory accesses are not on the same rank. On the other hand, more ranks usually means a lower clock rate, which has a much bigger penalty. As an example, here you can see the validated clock rates for unbuffered DIMMs on a Xeon 55xx series CPU. Those numbers may change for other CPUs, but generally more modules (ranks) per channel yield a lower clock rate. For best performance you usually want a single dual-rank module per channel with the highest supported/possible clock rate.
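
The "higher chance" argument can be illustrated with a toy model. This sketch assumes that consecutive accesses land on uniformly random, independent ranks, which real access patterns and address mappings do not guarantee; it is only meant to show the trend, and the function name is made up.

```python
def p_different_rank(ranks_per_channel: int) -> float:
    """Chance that two consecutive accesses hit different ranks,
    assuming each access picks a rank uniformly at random (toy model)."""
    if ranks_per_channel < 1:
        raise ValueError("need at least one rank per channel")
    return 1.0 - 1.0 / ranks_per_channel


for ranks in (1, 2, 4):
    print(f"{ranks} rank(s) per channel: "
          f"{p_different_rank(ranks):.0%} of consecutive accesses change rank")
```

In this model the gain from each extra rank shrinks quickly, which fits the point that the clock-rate penalty of heavily populated channels tends to outweigh it.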


Quote:
Originally Posted by Prime95
I can live with a 4-core/1-core timing ratio of 1.1. A 19/15 ratio of about 1.27 leaves a lot of potential performance on the table.

Keep in mind that you have to share L3 capacity and bandwidth as well. Another reason for bad scaling can be turbo mode (higher clock speed for a single core, lower clock speed when all four cores are busy), but I guess this is not the reason in your case.

Oliver
All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
