mersenneforum.org

-   -   Best settings and upgrade path for i7-2600 (https://www.mersenneforum.org/showthread.php?t=16541)

emily 2012-02-18 19:43

Best settings and upgrade path for i7-2600
 
What are the optimal worker settings for LL testing on Sandy Bridge Core i7-2600? And what would be a sensible upgrade path for me? (SNB-E or Ivy Bridge?)

I overclock to 3.9 GHz (the max for a non-K CPU). I use mprime on Debian GNU/Linux, and my CPU has 4 hyperthreaded cores for 8 threads total, with 8MB of L3 cache (95W TDP). I use 16 GB of dual-channel RAM (at 1333MHz), allowing up to 13GB for P-1, and a 500MB/s SSD for the swap partition. The PC is also used for other tasks and operates 24/7 (eating up 350W of power).

My goal is to maximize my chances of finding a prime as soon as possible while minimizing power usage and keeping my CPU temperature low. At the same time, I want to be able to use the PC for other tasks without mprime affecting me much. The other tasks include CPU-intensive graphics/video tasks but not gaming, as well as web browsing.

With 8 workers (1 thread each) doing LL testing on 8 exponents my per-iteration times are around 0.060s and my CPU temperature is up to 71-73 C with CPU usage at 100% (it's 40 C when it's 0% in use). When I use the PC for other tasks, mprime slows down a bit (iteration time 0.075).

With 4 workers (2 threads each) doing LL on 4 exponents the per-iteration time is 0.034 and CPU temp is 68 C with CPU usage at 97-99%.

With 4 workers (1 thread each) doing LL on 4 exponents the per-iteration time is 0.031 and CPU usage is 55%. This frees up 4 threads on the processor for other tasks.

With 2 workers (2 threads each) doing LL on 2 exponents the per-iteration time is 0.025.

With 2 workers (4 threads each) doing LL on 2 exps the per-iteration time is 0.018 at 86% CPU usage.

In all cases each worker runs on a different core. What settings should I use?
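The five configurations above are easier to compare as total throughput (iterations per second summed over all workers) than as per-iteration time. The following is a minimal sketch, assuming each reported per-iteration time applies equally to every worker in that configuration and that all exponents are of similar size; neither assumption is stated in the post, and the names are illustrative.

```python
# (workers, threads per worker, seconds per iteration) as reported above.
configs = [
    (8, 1, 0.060),
    (4, 2, 0.034),
    (4, 1, 0.031),
    (2, 2, 0.025),
    (2, 4, 0.018),
]

def throughput(workers, sec_per_iter):
    """Total LL iterations per second across all workers."""
    return workers / sec_per_iter

# Map each (workers, threads) configuration to its combined throughput.
results = {(w, t): throughput(w, s) for w, t, s in configs}

# Print the configurations from highest to lowest combined throughput.
for (w, t), rate in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{w} workers x {t} thread(s): {rate:6.1f} iterations/s in total")
```

Under those assumptions, 8 single-thread workers come out highest at about 133 iterations/s, and 4 single-thread workers reach about 129 iterations/s, roughly 97% of that, while leaving four logical CPUs free for other tasks.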

What upgrade path would you recommend for me? There is 4-core Sandy Bridge E (SNB-E) for increased memory bandwidth (quad-channel vs dual-channel) and more overclockability at 130W TDP (300 Euro). There's also the 6-core/12-thread SNB-E (same watts and memory) but it's too pricey (500 Euro). But if I wait I'd get an Ivy Bridge (higher IPC at 77W TDP) or wait even more for Ivy Bridge E. (I don't use Intel HD graphics).

How much would the quad-channel memory bandwidth of SNB-E help me for mprime? Is it worth getting the 4-core/quad-channel 3820, or should I go for the 6-core 3860? I'll overclock in any case. Any idea what the TDP of Ivy Bridge E is going to be and when it will be available in the EU?

I believe AMD doesn't have anything to offer to me as an upgrade for the i7, right?

Prime95 2012-02-18 20:19

First step is to go here to get v27.3: [url]http://mersenneforum.org/showthread.php?t=16535[/url]

Next up, try disabling hyperthreading in the BIOS. You will very likely run cooler, maybe a tiny bit faster, and with a little less power draw.

If you can, try running memory at 1600 MHz. (Can a non-K Sandy Bridge do that?)

As for upgrading, you have a great machine right now. A SNB-E is an expensive upgrade as you'll need a new motherboard too. Faster memory is your cheapest upgrade (if your machine can run memory faster). Ivy Bridge may not be a cost-effective upgrade as mprime may be memory bandwidth limited. The thread above is trying to figure that out. Forget about AMD.

Dubslow 2012-02-18 21:53

Doing more than one worker per physical core is usually less efficient. The thread mentioned above is mostly about memory bandwidth limitations (besides the new MPrime version), and has quite a few SB-E benchmarks comparing different settings.

Also, if you have a discrete GPU, consider running mfakto/mfaktc/CUDALucas (see the GPU subforum [url=http://www.mersenneforum.org/forumdisplay.php?f=92]here[/url]).

emily 2012-02-19 00:06

Non-SNB-E i7 CPUs officially support up to 1333 MHz RAM, but very often they can go higher. SNB-E supports 1600 MHz. Thanks for the suggestions to use the discrete GPU and mprime 27.3, I'll try both!

As for the HT, MPrime might prefer it disabled but won't this slow down other tasks and the OS? If I run 4 workers on 4 threads on the 4 cores, wouldn't it help the other tasks to have 4 more threads available?

Dubslow 2012-02-19 00:53

When I had hyperthreading active, I'd do as I just posted in the other thread:

In Linux (Ubuntu at least) logicals pair to physicals as [0,4] [1,5] [2,6] [3,7]. So I run four workers with two threads each. Worker 1 thread 1 runs on core 0, worker 1 thread 2 runs on core 4, Worker 2 thread 1 runs on core 1, etc. I get about the same throughput as with HT off. On the other hand, all logical cores are occupied, so it won't increase responsiveness. However, I've generally found everything to be responsive anyway. Various things you can try are listed in undoc.txt: in particular, I'd point you to the PauseWhileRunning, LowMemWhileRunning, and Nice settings (for the last one it'd be Nice=19).
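The [0,4] [1,5] [2,6] [3,7] pairing can differ between machines and kernels, so it may be worth checking before assigning threads. Here is a minimal sketch that reads the Linux sysfs file `thread_siblings_list` for each logical CPU; the function names are illustrative, and on a system without that sysfs tree the script simply prints nothing.

```python
from pathlib import Path

def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0,4" or "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def sibling_groups(base="/sys/devices/system/cpu"):
    """Return the distinct groups of logical CPUs that share one physical core."""
    groups = set()
    for path in Path(base).glob("cpu[0-9]*/topology/thread_siblings_list"):
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(groups)

if __name__ == "__main__":
    for group in sibling_groups():
        print("physical core shared by logical CPUs:", list(group))
```

On a machine with the pairing described above, each printed group would hold two logical CPU numbers such as 0 and 4. With hyperthreading disabled, each group holds a single number.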

emily 2012-02-19 11:08

Thanks for the useful answers. I wonder, what does the helper thread do in mprime's workers?

I know hyperthreading is about allowing two instructions to use the processor pipeline concurrently as long as they need different execution resources on the pipeline. But I thought all mprime's calculation iterations do the same thing, no?

emily 2012-02-19 12:39

By the way, with mprime 27.3, which uses AVX on my 3.9GHz i7, with 4 workers (1 thread each, HT still enabled, RAM still at 1333) the per-iteration time is 0.027-0.033s :)

Dubslow 2012-02-19 18:50

[QUOTE=emily;289954]Thanks for the useful answers. I wonder, what does the helper thread do in mprime's workers?

I know hyperthreading is about allowing two instructions to use the processor pipeline concurrently as long as they need different execution resources on the pipeline. But I thought all mprime's calculation iterations do the same thing, no?[/QUOTE]
I can't really answer these questions directly, but I can make the general comment that I personally believe the extra thread is more for the benefit of the OS's scheduler than for actually speeding up the test, at least when it comes to HT. If you have 2 threads for one worker on two cores, in other words actual multithreading, then someone else will have to answer your questions.

ldesnogu 2012-02-19 23:10

[QUOTE=emily;289954]I know hyperthreading is about allowing two instructions to use the processor pipeline concurrently as long as they need different execution resources on the pipeline. But I thought all mprime's calculation iterations do the same thing, no?[/QUOTE]
In fact, hyperthreading will switch threads when an operation stalls the processor for a rather long time, for instance when there's a need to fetch data from main memory. That's the reason why highly tuned programs such as mprime don't benefit from it.

emily 2012-02-20 18:41

Attention MSI Z68A-GD65 Gen3 motherboard and Corsair Vengeance DDR3-1600/C8 owners: BIOS 23.4 and lower won't let you run this RAM at 1600MHz, you'll need BIOS 23.6!

I have now disabled hyperthreading and set the RAM to 1600MHz (CPU still 3.9GHz). Running mprime (AVX) with 4 single-thread workers spread across the four cores, the per-iteration time is 0.021-0.024s :D (down from 0.027-0.033s when HT was enabled and RAM was 1333MHz).

Looks like I'll leave HT disabled :)

And the temperature without HT is 65 C now... cores 1 and 4 are 62-63 C and cores 2 and 3 are 65-66 C. This is, I think, a little lower than when running 4 LL threads with HT on, and a lot lower than when running 8 LL threads with HT (73 C in that case!)

Tip: to see temps on i7 under GNU/Linux use the i7z program.

fivemack 2012-02-21 00:11

[QUOTE=emily;290128]I have now disabled hyperthreading and set the RAM to 1600MHz (CPU still 3.9GHz). Running mprime (AVX) with 4 single-thread workers spread across the four cores, the per-iteration time is 0.021-0.024s :D (down from 0.027-0.033s when HT was enabled and RAM was 1333MHz).

Looks like I'll leave HT disabled :)[/QUOTE]

Umm, you have changed two things at once, and I suspect it's the faster RAM that makes most of the difference - what changes when you turn HT back on?
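To separate the two effects, each setting would have to be changed on its own. The sketch below lists all four combinations of the two settings and marks which ones have a measurement in this thread; the values are the per-iteration time ranges as reported for 4 single-thread workers on mprime 27.3, and the variable names are illustrative.

```python
from itertools import product

# Per-iteration time ranges as reported in this thread,
# keyed by (hyperthreading enabled, RAM speed in MHz).
measured = {
    (True, 1333): (0.027, 0.033),
    (False, 1600): (0.021, 0.024),
}

# Every combination of the two settings that were changed together.
all_runs = list(product([True, False], [1333, 1600]))
missing = [run for run in all_runs if run not in measured]

for ht, ram in all_runs:
    label = f"HT {'on ' if ht else 'off'} / RAM {ram}"
    if (ht, ram) in measured:
        low, high = measured[(ht, ram)]
        print(f"{label}: {low}-{high}")
    else:
        print(f"{label}: not measured yet")
```

The two unmeasured runs (HT on with RAM at 1600, and HT off with RAM at 1333) are the ones that would show how much of the gain comes from each change.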

emily 2012-02-21 00:40

I'll test it the next time I reboot (I rarely reboot).

Since HT isn't useful for mprime, the only advantage of i7 compared to i5 is the extra 2MB L3 cache (8MB vs 6MB). [B]Is the 8MB cache useful for prime?[/B] Or should I expect the same performance from an i5 at the same overclock? Since the difference between i7 and i5 is about 100 EUR, I think it probably isn't worth it...

Since the machine achieves 0.018s per-iteration time on one core, I think the 0.023s speed on 4 cores must be due to the memory bandwidth not being enough to feed all cores... So I'll try to see if I can put faster RAM on it!

The motherboard supports 2133 RAM, the highest I can find is DDR3-2000 RAM and the CPU officially goes up to 1333 (but works OK with my 1600). [B]Is there anyone here running i7-2600 (non-K/non-E/non-X) with the RAM set at 2000?[/B]

[B]Can you suggest a worthwhile RAM upgrade for prime?[/B] I have 16GB of DDR3-[B]1600[/B] at [B]8-8-8-24[/B]. I can either pay 260 EUR for the same amount of DDR3-[B]2000[/B] at [B]9-11-9-27[/B] or 240 EUR for DDR3-[B]1600[/B] at [B]7-8-7-20[/B]. What's better, higher bandwidth or lower latency?
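The bandwidth-versus-latency trade-off between these kits can be put into numbers. This is a minimal sketch using the standard DDR arithmetic: theoretical peak bandwidth is the transfer rate times 8 bytes per 64-bit channel times the number of channels, and CAS latency in nanoseconds is CL cycles at an I/O clock of half the transfer rate. It assumes a dual-channel setup as described earlier and gives theoretical figures only, not measured mprime performance; the function names are illustrative.

```python
def peak_bandwidth_gbs(mt_per_s, channels, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s for the given transfer rate (MT/s)."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9

def cas_latency_ns(mt_per_s, cl):
    """CAS latency in ns: CL cycles at an I/O clock of half the transfer rate."""
    return cl * 2000.0 / mt_per_s

# (label, transfer rate in MT/s, CAS latency in cycles) for the kits in the post.
kits = [
    ("DDR3-1600 8-8-8-24 (current)", 1600, 8),
    ("DDR3-2000 9-11-9-27", 2000, 9),
    ("DDR3-1600 7-8-7-20", 1600, 7),
]

for label, rate, cl in kits:
    bw = peak_bandwidth_gbs(rate, channels=2)
    lat = cas_latency_ns(rate, cl)
    print(f"{label}: {bw:.1f} GB/s peak (dual channel), CAS {lat:.2f} ns")
```

On paper, the DDR3-2000 kit raises peak bandwidth from 25.6 to 32.0 GB/s and lowers CAS latency from 10 ns to 9 ns, while the DDR3-1600 CL7 kit leaves bandwidth unchanged and lowers CAS latency to 8.75 ns. Whether the i7-2600 actually runs stable at 2000 remains the open question.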

When I buy a new CPU I'll keep this i7 with a cheap mobo and some RAM as a second computer, for prime, etc (to replace my aged P4-3.4). For the next CPU I can either get an i5-2500K (or the new Ivy Bridge i5) and a better cooler to overclock it as much as possible (people say it goes up to 5GHz, I think I can expect 4.6-4.8GHz with my air flow, I've got 9 fans but they're low-RPM), or wait to save money and get an SNB-E 3820 with a new mobo (which has quad-channel RAM) which I think goes up to 4GHz. [B]Would a dual-channel i5 at 4.8GHz be faster than a quad-channel i7E at 4GHz?[/B]

bcp19 2012-02-21 01:31

Here is a benchmark of a stock 2400, a stock 2500k and a 4.3GHz 2500k:

HP-NEW 2011-11-13 Intel Core i5-2400 @ 3.10GHz Windows64,Prime95,v26.6,build 3 [B]3143[/B] 10.78 14.04 17.05 20.71 23.04 29.03 35.75 42.89 47.34 3.35
Main 2011-12-01 Intel Core i5-2500K @ 3.30GHz Windows64,Prime95,v26.6,build 3 [B]3356[/B] 10.32 13.19 16.21 19.67 21.40 27.35 33.48 40.64 44.57 3.13
Main 2012-02-17 Intel Core i5-2500K @ 3.30GHz Windows,Prime95,v27.2,build 1 [B]4251[/B] 5.06 6.48 8.12 9.52 10.61 13.51 16.71 20.18 22.79 2.47

The 4.3 data is from 27.2, which was about 20% faster than 26.6, but you can see the improvement over stock.
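The size of the improvement can be read off by dividing the two 2500K rows column by column. This sketch assumes the bold number in each row is the measured clock speed in MHz and treats the remaining numbers as timings where lower is better; the post does not say what each column measures, so the columns are only compared position by position.

```python
# Timing columns copied from the two i5-2500K rows above
# (assumed: lower is better; column meanings are not given in the post).
stock_v266 = [10.32, 13.19, 16.21, 19.67, 21.40, 27.35, 33.48, 40.64, 44.57, 3.13]
oc_v272 = [5.06, 6.48, 8.12, 9.52, 10.61, 13.51, 16.71, 20.18, 22.79, 2.47]

# Bold numbers from the same rows (assumed to be the measured clock in MHz).
stock_mhz, oc_mhz = 3356, 4251

# Speedup per column: stock timing divided by overclocked timing.
ratios = [old / new for old, new in zip(stock_v266, oc_v272)]
clock_ratio = oc_mhz / stock_mhz

for i, r in enumerate(ratios, start=1):
    print(f"column {i:2d}: {r:.2f}x faster")
print(f"clock ratio: {clock_ratio:.2f}x")
```

The first nine columns come out at roughly 2x, while the last column improves by about 1.27x, which matches the clock ratio. The part of the gain beyond the clock ratio in the first nine columns would then come from the newer program version rather than the overclock, if the assumptions above hold.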

LaurV 2012-02-21 04:23

[QUOTE=emily;290128]cores 1 and 4 are 62-63 C and cores 2 and 3 are 65-66 C.[/QUOTE]
So you have a V-type heatpipe cooler, haven't you? :P (That kind is usually 2-3 degrees colder on the side where the air goes in, at the "opening" of the "V", and hotter at the "tip" of the "V".) That became my case too after I changed the standard i7 fan to a Cooler Master V6 GT; I even measured the temperature of each heatpipe, and the heatpipes at the opening of the "V" are always colder than the others.

emily 2012-02-21 11:01

It's a CoolerMaster Hyper 612S cooler with one 120mm fan on the side blowing air out. On the side there is the RAM which gets far too hot. Above the cooler there are two 140mm fans blowing hot air out.

So maybe it's hotter on the RAM's side and cooler on the side with the fans.

But I know I didn't apply the thermal grease as well as I could have. :'(

The temps depend on the room temps of course, right now it's 59 C - 62 C. But the same cores will always be hotter than the others.

emily 2012-03-01 04:28

New question: the i7 has 8MB of L3 cache shared with the Intel GPU. I use a discrete GPU, but if I switch to the Intel GPU for my Compiz GNU/Linux desktop, will this slow down MPrime because less cache is available to the CPU cores (since some would be used by the GPU)? How much L3 cache does the Sandy Bridge Intel GPU use? I assume that when using the discrete GPU, the Intel GPU doesn't eat up any cache memory, right?

Dubslow 2012-03-01 04:52

Why would you switch to the I HD G? It's crap.

Did you try Googling it?

Edit: [url]http://www.bit-tech.net/hardware/graphics/2011/01/27/intel-hd-graphics-3000-performance-review/3[/url]

emily 2012-03-01 18:42

Hey thanks for the link, it looks like L3 cache size doesn't matter much. I should google before asking...

There are some advantages of using Intel HD Graphics 2000/3000 compared to a discrete GPU:

1. Lower power consumption and less heat generated inside the chassis / better airflow
2. On GNU/Linux OS, able to use the Intel drivers.
3. On Windows OS, [B]able to use Quick Sync for video encoding[/B], which is very fast and also high-quality. This is probably the reason most people consider using Intel HD graphics.
4. Able to use the discrete GPU on another computer without spending money to buy a second discrete graphics card :)

And a few disadvantages:

1. No GPGPU tasks, at least last time I checked there wasn't a driver for OpenCL on SB.
2. Output only to two monitors instead of three or four.
3. Slow performance, barely playable 3D games
4. Must use the Intel drivers :P

As for me, I used the i7-2600's HD Graphics 2000 initially, with two monitors. But LucidLogix Virtu (which enables dual-GPU usage) doesn't work on GNU/Linux, and I wanted a third monitor, which Intel couldn't support (it will be supported in Ivy Bridge or Haswell), so I got a Sapphire HD6770 Flex, which provides output to three DVI/HDMI/VGA monitors (or four, if you use DisplayPort). I have now finished most of the work that needed three monitors, I don't play games on this machine these days, and some bugs in the open-source Radeon driver bother me a lot. I think this AMD card consumes about 10-18W of power at idle, which isn't much, but if I can save it by switching back to Intel it can only be good. After all, AMD GPUs don't do LL tests yet...

Dubslow 2012-03-01 18:44

IB IG (HD 2500, 4000) will support OpenCL.

emily 2012-03-01 19:14

Yes IB will have OpenCL.

Will GIMPSers someday be able to factor or run LL tests on it?

debrouxl 2012-03-02 08:45

We're probably closer to the day when it becomes [i]possible[/i] to run LL tests on the IGP than to the day when it becomes [i]efficient[/i] to run LL tests on the IGP :smile:
By efficient I mean LL tests on the IGP beating LL tests on the CPU in terms of wall clock time and power consumption.


All times are UTC.