mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Hardware (https://www.mersenneforum.org/forumdisplay.php?f=9)
-   -   Need help deciding between Athlon II X4 620 and i5 (https://www.mersenneforum.org/showthread.php?t=12668)

em99010pepe 2009-12-05 09:44

Good catch! I thought it was a 2x4GB kit.
The code of the memory is KHX2000C9D3T1K2/4GX, you are correct, thank you. So this means I can't have 16 GB of memory in this machine in the future unless I replace the 2 GB modules with 4 GB ones.

For a 1366-based system, the motherboard and CPU are more expensive than for 1156-based systems.

I don't want AMD; I prefer Intel processors.

Thank you.

hj47 2009-12-05 10:22

Still figuring out a dedicated crunching rig - would it be better to have slower RAM, say 1333MHz or 1600MHz with 'tighter' timings, than fast RAM (2000MHz) with loose timings?

petrw1 2009-12-05 21:47

[QUOTE=garo;196737]Core temperatures in the 54-64 range depending on ambient temperature and the core number - 1&3 are running hotter for some reason and 4 is running the coolest (it is running P-1).[/QUOTE]

Just installed CoreTemp.

Idling temp is 28-34
Full Load Stock (not-OC) temp is 51-57 (core 1 hottest also).

What is considered a safe max temp?

petrw1 2009-12-06 04:09

[QUOTE=petrw1;197938]Just installed CoreTemp.

Idling temp is 28-34
Full Load Stock (not-OC) temp is 51-57 (core 1 hottest also).

What is considered a safe max temp?[/QUOTE]

Modest OC...I just used EasyTune and for the first attempt went to 3.0 GHz. CoreTemp went to between 61 and 65.

sdbardwick 2009-12-06 04:43

[quote=garo;197365]Two things that made me choose the i5.
1. Overclocking potential. I have it Prime95-stable at 3800MHz with a good cooler and 0.1V extra.
2. There is minimal slowdown when running 4 LL tests in parallel. I would really like to see iteration times for the 620/630 when running four LLs.[/quote]
Finally able to spend a little [very little] time with the 630. Not enough time for any real experimentation. But I did get some results for 1280K FFT.
[code]
1 core: .26
2 cores: .26 (on both)
3 cores: .27 (on all 3)
4 cores: .29 (on all 4)
[/code]
I just ran the same exponent on all the cores and then deactivated each in turn to get the changes. It didn't matter which cores were active, just the number of active cores.
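
To put numbers on the scaling, here is a minimal Python sketch that turns the four timings above into a per-iteration slowdown and an aggregate throughput figure. The function names are made up for illustration, and the unit of the timings is not stated in the post, so only ratios are used.

```python
# Per-iteration times from the post above (Athlon II X4 630, 1280K FFT).
# The unit is not stated in the post; only ratios are used here.
times = {1: 0.26, 2: 0.26, 3: 0.27, 4: 0.29}


def slowdown_pct(t_loaded, t_single):
    """Extra time per iteration relative to the single-core run, in percent."""
    return (t_loaded / t_single - 1.0) * 100.0


def relative_throughput(n_cores, t_loaded, t_single):
    """Aggregate iterations per unit time, relative to one core running alone."""
    return n_cores * t_single / t_loaded


for n, t in times.items():
    print(f"{n} core(s): +{slowdown_pct(t, times[1]):.1f}% per iteration, "
          f"{relative_throughput(n, t, times[1]):.2f}x throughput")
```

The throughput figure is the one that matters for a crunching box: a small per-iteration slowdown with all four cores busy still leaves the machine doing well over three times the work of a single core.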

garo 2009-12-06 18:54

@sdbardwick. Hmm! Not bad at all. The slowdown is in the 10-15% range which is very acceptable. Is this with DDR2 or DDR3?

@petrw1: I've heard the Core i5 and i7 can tolerate pretty high temperatures. I think you should be fine into the 70s. Just do a google. The Noctua is very very good so it keeps the temps well within the limit.

petrw1 2009-12-06 19:16

[QUOTE=garo;198026]@petrw1: I've heard the Core i5 and i7 can tolerate pretty high temperatures. I think you should be fine into the 70s. Just do a google. The Noctua is very very good so it keeps the temps well within the limit.[/QUOTE]

Thanks. Just to compare, I installed CoreTemp on my Q9550, with no OC but also with the stock Intel cooler. Running GIMPS 100% on 4 cores.

Temp readings are 66 / 59 / 59 / 64....virtually the same as my i5 with Noctua cooler OC'd from 2.67 to 3.0

em99010pepe 2009-12-06 20:49

[quote=petrw1;198028]
Temp readings are 66 / 59 / 59 / 64....virtually the same as my i5 with Noctua cooler OC'd from 2.67 to 3.0[/quote]

Which Noctua cooler?
My temps are 62/58/58/58 on Q6600@2.9GHz running LLR 100 % on 4 cores, ambient temperature 15 ºC. Cooler is Arctic Cooling Freezer 7 Pro.

petrw1 2009-12-07 04:09

[QUOTE=em99010pepe;198038]Which Noctua cooler?
My temps are 62/58/58/58 on Q6600@2.9GHz running LLR 100 % on 4 cores, ambient temperature 15 ºC. Cooler is Arctic Cooling Freezer 7 Pro.[/QUOTE]

Noctua NH-U12P SE2

My ambient temp will be close to 28C. There are 4 PCs in the room and the window is painted shut.

em99010pepe 2009-12-07 09:40

[quote=petrw1;198083]Noctua NH-U12P SE2

My ambient temp will be close to 28C. There are 4 PCs in the room and the window is painted shut.[/quote]

That explains your temps: lots of heat being dissipated at a high ambient temperature.

CADavis 2009-12-07 20:10

[QUOTE=petrw1;197938]Just installed CoreTemp.

Idling temp is 28-34
Full Load Stock (not-OC) temp is 51-57 (core 1 hottest also).

What is considered a safe max temp?[/QUOTE]

For 24/7 use, the safe max core temp for an i7 is 80-85. For short benchmarking runs, maxing out at up to 100 is safe, really.

Not sure if it's the same with the i5, though.

hj47 2009-12-08 10:41

Don't want to hijack the thread, but it's essentially the same dilemma I'm facing.

I've come up with three different configs, an i5, an i7 and two Athlon II X4 systems (final price in US$):

[B]CORE i5 [/B]

Core i5-750:........................ $234
Gigabyte P55-UD3:................... $132
4G Kit DDR3 2000 G.Skill Ripjaws:... $145
Cooler Master Elite 335:............ $57
380W Antec EarthWatts:.............. $69
Scythe Ultra Kaze 120mm Fan 2000rpm: $22
Cooler Master Hyper 212+:........... $52

[COLOR=Blue][B]TOTAL:.............................. $711 (US$650)[/B][/COLOR]

[B]CORE i7[/B]

Core i7-860:........................ $335
Gigabyte P55-UD3:................... $132
4G Kit DDR3 2000 G.Skill Ripjaws:... $145
Cooler Master Elite 335:............ $57
380W Antec EarthWatts:.............. $69
Scythe Ultra Kaze 120mm Fan 2000rpm: $22
Cooler Master Hyper 212+:........... $52

[B][COLOR=Blue]TOTAL:.............................. $812 (US$742)[/COLOR][/B]

[B]AMD ATHLON II X4 620[/B]

AMD Athlon II X4 620:............... $119
2G DDR3 1600 Patriot-Signature:..... $59
Gigabyte MA 785GMT-UD2H:............ $99
Coolermaster Hyper TX3:............. $29
380W Antec EarthWatts:.............. $69
Cooler Master Elite 335:............ $57

[COLOR=Blue][B]TOTAL:.............................. $432 (US$395)
(x2):............................... $864 (US$790)[/B][/COLOR]

In terms of pure prime95 output, does anyone know what the best value is? With the two AMD systems, I would obviously be paying more for power, but I have twice the cores.

With the i7, I don't know what the most efficient way to utilise the threads would be. A single worker between 1 physical and 1 virtual thread? A single worker between 2 physical and 2 virtual threads?

The most attractive looks to be the i5 as it is a nice balance between FLOPS/cost/wattage, but the more I think about it the more I get confused :blush::cry:
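
As a starting point for the value comparison, the part lists above can be totalled and divided by core count. This is only a rough Python sketch: it uses the first (non-US$) price figure as posted, the helper names are invented for illustration, and cost per core says nothing about how fast each core actually runs Prime95.

```python
# Part prices as posted (the first figure on each line, not the US$ one).
configs = {
    "Core i5-750":         {"parts": [234, 132, 145, 57, 69, 22, 52], "systems": 1},
    "Core i7-860":         {"parts": [335, 132, 145, 57, 69, 22, 52], "systems": 1},
    "2x Athlon II X4 620": {"parts": [119, 59, 99, 29, 69, 57],       "systems": 2},
}
CORES_PER_SYSTEM = 4  # all three CPUs are quad-core


def total_cost(cfg):
    """Total price of the build(s), counting every system in the config."""
    return sum(cfg["parts"]) * cfg["systems"]


def cost_per_core(cfg):
    """Price divided by the number of physical cores across all systems."""
    return total_cost(cfg) / (CORES_PER_SYSTEM * cfg["systems"])


for name, cfg in configs.items():
    print(f"{name}: total {total_cost(cfg)}, per core {cost_per_core(cfg):.2f}")
```

To turn this into a real throughput-per-dollar figure, the per-core cost would need to be weighted by measured iteration times with all cores loaded, plus the running cost of power.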

em99010pepe 2009-12-08 11:25

hj47,

What are the specs of your current system?

My dilemma is different: my current system is 10x faster than my previous one. I went from an AMD 3000+ (2GHz) to a Q6600@2.9 GHz. If I upgrade my current system to a Core i5 it will only be 1.6x faster (overclocked). Maybe I'll wait for the Core i9, I don't know.

Carlos

garo 2009-12-08 12:10

hj47, you are not comparing like for like. The Scythe is absent from the AMD config, as is 2GB of memory. I think it is a tough call. You will probably get more throughput with 2 AMDs, so it really boils down to whether you can accommodate two machines. You should be able to get your Core i5 up to 3400-3600 but I'm not sure how well the 620 overclocks.

Also, drop the i7 because hyperthreading does not really help much if you are running four LLs in parallel.

hj47 2009-12-08 12:34

[quote=garo;198172]Also, drop the i7 because hyperthreading does not really help much if you are running four LLs in parallel.[/quote]

What about if you use 8 threads for 1 LL, would that be > running 4 LL's on 8 threads?

My configs are not the same for a couple of reasons, for example with the AMD systems I thought it may be more economical to use less but faster memory than a lot of slower memory.

Anyways all your input is appreciated, it's still a tough call.

em99010pepe 2009-12-08 13:04

[URL="http://www.tomshardware.com/reviews/core-i7-870-1156,2482.html"]LGA 1156 Memory Performance: What Speed DDR3 Should You Buy[/URL]

garo 2009-12-08 17:01

[quote=hj47;198176]What about if you use 8 threads for 1 LL, would that be > running 4 LL's on 8 threads?

[/quote]
Look at the perpetual benchmark thread. But also note that benchmarks are not always reliable and I find running n tests in parallel to be a better test of actual throughput.

Prime95 2009-12-09 01:25

Stay away from the i7. It costs $100 more than i5 and all you get is hyperthreading which prime95 cannot use very effectively.

willmore 2009-12-09 02:48

Yes, but if those four virtual threads are only used by the UI and other 'fluffy' stuff, will the four 'hard' threads be much impacted? That, IMHO, may be worth it. The real i7 machines with 3 memory channels should feed the CPU quite nicely. (speaking as someone with a Q6600 that is 'sucking air' for memory bandwidth.)

Prime95 2009-12-10 03:01

[QUOTE=willmore;198290]The real i7 machines with 3 memory channels should feed the CPU quite nicely. (speaking as someone with a Q6600 that is 'sucking air' for memory bandwidth.)[/QUOTE]

All users are reporting that the i5/i7s with 2 memory channels scale quite nicely. My 2 memory channel i7 runs 4 workers at the same per-iteration timing as 1 worker.

willmore 2009-12-10 19:45

That's wonderful news. My next desktop might be a 4 core dual channel system. I was curious if the third channel would be necessary or not. I guess 'not'. Thanks for the info, George.

Prime95 2009-12-11 00:40

[QUOTE=willmore;198405]I was curious if the third channel would be necessary or not. I guess 'not'.[/QUOTE]

True for version 25. Version 26 will place more demands on the memory subsystem.

henryzz 2009-12-11 08:17

[quote=Prime95;198436]True for version 25. Version 26 will place more demands on the memory subsystem.[/quote]
I presume that will be optional?
I am pretty sure that a large percentage of systems are still based on Core 2, which needs fewer demands on memory, not more.

hj47 2009-12-11 08:37

Out of curiosity, does anyone know how a single LL performs on an i7 860 using all 8 threads?

Mini-Geek 2009-12-11 12:34

[quote=hj47;198468]Out of curiosity, does anyone know how a single LL performs on an i7 860 using all 8 threads?[/quote]
The [URL="http://www.mersenneforum.org/showthread.php?t=59&page=11"]benchmark thread[/URL] has some i7 benchmarks, which includes 8-threaded tests. I don't see any 860 models specifically, but you can probably find one with a similar GHz.

hj47 2009-12-11 12:52

Ahhh I see.

Is there a rough calculation I can perform to see whether it would be faster to run a single LL test on 8 threads vs. 4 tests on 4 threads?

Mini-Geek 2009-12-11 13:48

[quote=hj47;198490]Ahhh I see.

Is there a rough calculation I can perform to see whether it would be faster to run a single LL test on 8 threads vs. 4 tests on 4 threads?[/quote]
Unfortunately, no Prime95 benchmark will automatically test multiple tests at a time, and due to factors such as memory bandwidth it's not as simple as multiplying benchmark speeds. The best thing would be to try it for yourself.
But in general, you'll get more throughput with 4 tests on 4 threads than 1 test on 8 threads, and even more throughput by keeping 2 tests to low memory bandwidth work, like TF (e.g. worker 1/2/3/4 run LL/TF/LL/TF). You might get a little more throughput by having each worker use two logical cores (to try to use the hyperthreading). If you have an i7 with just 4 threads running, you might get more speed if you set the affinity of the workers with the AffinityScramble option.[code]You can arbitrarily change how the program assigns affinity to CPUs, to make sure no threads have to share a physical core.
The program makes its best guess at assigning workers and helper threads
to CPUs for optimal speed. However, new architectures or situations we
haven't considered may make different affinity setting desirable. In
local.txt set
AffinityScramble=string
Where the string "0123456789ABCDEFGHIJKLMNOPQRSTUV" is the "make no
changes" string. For example, let's say you have a system with 8 logical
cores with 4 workers each using a helper thread. The program would
ordinarily assign the worker and helper threads to [0,1], [2,3], [4,5], [6,7].
However, if you think [0,2], [1,3], [4,6], [5,7] would give better performance,
you would set AffinityScramble=02134657 to test out your theory.
[/code]
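
The mapping described in the quoted text can be sketched in Python to check a scramble string before putting it in local.txt. This is a hypothetical helper, not Prime95 code: `worker_cpus` is an invented name, and it assumes that the k-th character of the string names the logical CPU given to the k-th thread slot, with each worker's threads occupying consecutive slots, which is how the [0,1] to [0,2] example above reads.

```python
# Hypothetical helper, not Prime95 code: it only mirrors the mapping
# described in the quoted documentation.
IDENTITY = "0123456789ABCDEFGHIJKLMNOPQRSTUV"  # the "make no changes" string


def worker_cpus(scramble, workers, threads_per_worker):
    """Return, for each worker, the logical CPUs its threads would land on."""
    needed = workers * threads_per_worker
    if len(scramble) < needed:
        raise ValueError("scramble string is shorter than the number of threads")
    # Each character is a base-32 style digit naming one logical CPU.
    cpus = [IDENTITY.index(ch) for ch in scramble[:needed]]
    return [cpus[i:i + threads_per_worker]
            for i in range(0, needed, threads_per_worker)]


print(worker_cpus(IDENTITY, 4, 2))    # default assignment
print(worker_cpus("02134657", 4, 2))  # the example from the quoted text
```

Whether a given pairing actually keeps threads off a shared physical core depends on how the operating system numbers the logical CPUs, so the result still has to be checked against real iteration times.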

lycorn 2009-12-11 18:43

[quote=Prime95;198436] [B]Version 26[/B] will place more demands on the memory subsystem.[/quote]

Ho!Ho! Good news!
Any estimate as to the release date?
That could influence my upgrade schedule and architecture choices.

ET_ 2009-12-11 20:09

Version 26
 
[QUOTE=lycorn;198540]Ho!Ho! Good news!
Any estimate as to the release date?
That could influence my upgrade schedule and architecture choices.[/QUOTE]

And by chance a multi-threaded trial-factoring?

Luigi

Prime95 2009-12-11 22:35

[QUOTE=lycorn;198540]Any estimate as to the release date?[/QUOTE]

No time soon. I've rewritten a few of the FFTs between 2M and 4M. Left to do: rewrite all 2 pass FFTs, optimize for 64-bit, optimize for P4, K8, K10. A long, long, long vacation starting in January will slow development too.

I've changed the way the code is organized so that many different FFT implementations can be supported. The current FFT code remains as one of the FFT implementations, and I've been adding several other variations of radix-4 FFTs to study which ones might be faster.

Prime95 2009-12-11 22:35

[QUOTE=ET_;198551]And by chance a multi-threaded trial-factoring?[/QUOTE]

Highly unlikely. See previous post - my plate is pretty full.

petrw1 2009-12-11 22:41

[QUOTE=Prime95;198564]A long, long, long vacation starting in January...[/QUOTE]

Define "...long, long, long". :sleep:

Going anywhere specific? ... or taking one for the team, knowing there is a strong correlation between your vacations and Prime finds. :smile:

Enjoy the vacation :toot:

Prime95 2009-12-11 23:00

[QUOTE=petrw1;198566]Define "...long, long, long".[/QUOTE]

Nine weeks. South America.

Uncwilly 2009-12-11 23:42

[QUOTE=Prime95;198567]Nine weeks. South America.[/QUOTE]Time for a new prime!!!

willmore 2009-12-12 04:54

[QUOTE=Prime95;198436]True for version 25. Version 26 will place more demands on the memory subsystem.[/QUOTE]

Because of CPU execution improvements, the balance between the CPU and memory being the bottleneck will be pushed more towards the memory? Just guessing here.

Oh, annoying feature request. Any way we can configure the thread/worker allocation in benchmarking? There's probably a way to fudge this. I just find determining the optimal allocation for best throughput to be a lengthy PITA. I guess it's easy on the new Core-i chips, because, as you say, memory isn't an issue. Sadly not the case for my 3.2GHz C2Q. 8M of L2 just isn't enough and that slow FSB doesn't help, either. ;( On the other side, I have the Sempron 140 which has just as much memory BW, but a slower single core CPU. He's not memory starved. :)

willmore 2009-12-12 04:55

[QUOTE=Uncwilly;198570]Time for a new prime!!![/QUOTE]

Yeah! We can run the verification run on a GPU. :)

cheesehead 2009-12-12 05:05

[quote=ET_;198551]And by chance a multi-threaded trial-factoring?[/quote]George has previously pointed out that performance improvements in TF have relatively little effect on GIMPS throughput.

If we [I]doubled[/I] TF speed, the optimum bit level for an exponent range would be raised by only 1.

For example, if the current TF limit were 73, a doubling of TF speed would mean taking it to 74 instead, which has only a 1/73 extra chance of finding a factor in return for the doubling of TF speed. Not much throughput leverage there.
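
The argument can be written out as a small Python sketch. It rests on two assumptions that are not spelled out in the post: that each additional bit level costs about twice as much as the previous one, and the rule of thumb that the chance of a factor between 2^b and 2^(b+1) is roughly 1/b. The function names are made up for illustration.

```python
def level_cost(bit, speedup=1.0):
    """Relative cost of trial factoring from 2**bit to 2**(bit + 1)."""
    return 2.0 ** bit / speedup


def factor_chance(bit):
    """Rule-of-thumb chance of a factor between 2**bit and 2**(bit + 1)."""
    return 1.0 / bit


# With doubled TF speed, the 73 -> 74 level costs what 72 -> 73 costs today,
# so the same effort buys exactly one more bit level.
assert level_cost(73, speedup=2.0) == level_cost(72)

print(f"extra chance from the additional bit level: {factor_chance(73):.3%}")
```

Under these assumptions a doubling of TF speed buys a little over one percent more factors found, which is why it moves overall throughput so little.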

cheesehead 2009-12-12 05:07

[quote=Prime95;198567]Nine weeks. South America.[/quote]Iguassu Falls, by any chance?

- - -

(Coincidentally, David Letterman's guest just said she's going to Buenos Aires soon.)

Prime95 2009-12-12 15:33

[QUOTE=willmore;198598]Because of CPU execution improvements, the balance between the CPU and memory being the bottleneck will be pushed more towards the memory? Just guessing here.[/QUOTE]

In two-pass FFTs, you can reduce memory requirements with the cost of some extra complex multiplies. The current FFTs were optimized for a P4 where a cache line took ~150 clocks to read in. My Core i7 takes ~30 clocks to read a cache line. It makes sense to re-evaluate some of the FFT design choices in light of these new circumstances.

Prime95 2009-12-12 15:34

[QUOTE=cheesehead;198601]Iguassu Falls, by any chance?
(Coincidentally, David Letterman's guest just said she's going to Buenos Aires soon.)[/QUOTE]

Both of those are on the itinerary :)

Batalov 2009-12-15 05:16

what about a X3450?
 
Googled this relatively fresh review --
[URL]http://ixbtlabs.com/articles3/cpu/intel-xeon-x3450-p1.html[/URL]
Did anyone even consider an X3450 (possibly with 16GB+ of RDIMM/UDIMM)?

(Well, I know that [I]I[/I] didn't until tonight. I am mainly thinking about this in the context of an efficient algebra box with more than 12GB of memory...
Had to look up workstation boards like ASUS P7F-E or X. The desktop boards specifically appear to disclaim ECC even though [I]now[/I] it seems to cost them nothing to allow for it - the CPU does all the job anyway!)

willmore 2009-12-16 21:20

[QUOTE=Prime95;198637]In two-pass FFTs, you can reduce memory requirements with the cost of some extra complex multiplies. The current FFTs were optimized for a P4 where a cache line took ~150 clocks to read in. My Core i7 takes ~30 clocks to read a cache line. It makes sense to re-evaluate some of the FFT design choices in light of these new circumstances.[/QUOTE]

I agree that every new CPU that changes the balance in costs of any operation, be it +, *, etc. or memory access, should be accompanied by a reconsideration of the code structure. It's just that the small gains to be had may not be enough to justify the work of rewriting the code. I take it the changes between the P4 (NetBurst) and i7 (Nehalem) are finally enough to make it worthwhile? Yay. :) I, along with the others, eagerly await what this refactoring will bring. But, first, the vacation.

Ahh, okay. You're trading fewer CPU operations for more memory operations. Makes sense to me. Happy hunting! Oh, and have fun on the vacation. We'll keep the CPUs warm while you're away. :)

petrw1 2009-12-20 21:47

[QUOTE=petrw1;197956]Modest OC...I just used EasyTune and for the first attempt went to 3.0 GHz. CoreTemp went to between 61 and 65.[/QUOTE]

Went to the next (top) level of OC using EasyTune.

Default is 2.67; steps are 2.8, 3.0, 3.2.

A few interesting observations:
- Going from the default 2.67 to 3.0 changed the iteration times (1260FFT) from 0.024 to 0.021. Going from 3.0 to 3.2 changed the iteration times very little. Some cores went to 0.020 and others are still at 0.021 with the odd 0.020. I calculated (and hoped) it would have dropped to just under 0.020 (I guess it depends on whether it started near 0.02400 or closer to 0.02449).
- The CoreTemp reading has not changed(!); actually, so far it is 1 degree cooler.
- The P4 equivalency factor changed back to 100%. It will now take a few weeks for it to adjust properly; in the meantime, estimated completion times are about triple what they should be.
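
The expected iteration times can be checked with a few lines of Python. This is a sketch under one strong assumption, namely that iteration time scales perfectly with 1/clock, which ignores memory speed entirely; `expected_time` is an invented name and the two starting values are the ones mentioned above.

```python
def expected_time(t_base, f_base, f_new):
    """Iteration time at f_new if time scaled perfectly with 1/clock."""
    return t_base * f_base / f_new


BASE_CLOCK = 2.67  # GHz, stock

# The displayed 0.024 could hide anything in this range.
for t_base in (0.02400, 0.02449):
    row = ", ".join(
        f"{f:.1f} GHz: {expected_time(t_base, BASE_CLOCK, f):.5f}"
        for f in (3.0, 3.2)
    )
    print(f"start {t_base:.5f} -> {row}")
```

With only three displayed decimals, a spread this small is hard to tell apart from rounding, so a longer run or a more precise timing would be needed to say whether the last clock step is really being lost to memory.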

petrw1 2009-12-31 21:22

The i5-750 DOES scale well...
 
[CODE]
DC Benchmark (FFT 1280)

PC              (1 core)   All Cores DC   1 P1-S1   1 P1-S2
E6550 (Duo)        30           33           33        34
Q9550 (Quad)       22           28           29        30
i5-750 (Quad)      20           21           20        21

Note: i5 is OC'd to 3.2 [/CODE]

To clarify the 4 numerical columns represent:
1. The time from the Benchmark page which represents the theoretical best time running DC on 1 core only.
2. My observed time when all (2 or 4) cores are running DC.
3. My observed DC times when 1 core is running P1-Stage 1 while the rest (1 or 3) are doing DC.
4. My observed DC times when 1 core is running P1-Stage 2 while the rest (1 or 3) are doing DC.
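
The same table can be expressed as percentage slowdowns relative to the single-core benchmark figure. This is a small Python sketch using the numbers exactly as posted; the `slowdowns` helper is made up for illustration.

```python
# Columns: benchmark (1 core), all cores DC, with P-1 stage 1, with P-1 stage 2.
timings = {
    "E6550 (Duo)":   (30, 33, 33, 34),
    "Q9550 (Quad)":  (22, 28, 29, 30),
    "i5-750 (Quad)": (20, 21, 20, 21),
}


def slowdowns(row):
    """Percent slowdown of each loaded column versus the single-core benchmark."""
    base = row[0]
    return [round((t / base - 1) * 100, 1) for t in row[1:]]


for cpu, row in timings.items():
    print(cpu, slowdowns(row))
```

Because the posted times are whole numbers, each percentage carries a rounding error of a few points, so only the large gap between the Q9550 and the i5-750 should be read as significant.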

willmore 2010-01-02 05:39

Okay, so the i5 and the i7 run at full speed regardless of memory BW pressure. Good to know. Not like I wouldn't be aware of the impact that memory BW pressure has on Prime95 with my C2Q 6600 @ 3.2GHz. My good 1066-5-5-5-15 memory died and was temporarily replaced with 800-5-6-6-15 memory. Yes, the difference in Prime95 speed was *very* visible.

Short summary: 'dual bank' memory on 'Core' chipsets really isn't. It is, at best, two banks interleaved. If you want true dual banks, you need an AMD chip or an i5/i7.

ET_ 2010-01-02 11:46

Note that i5 seems more flexible on overclocking than i7.

Luigi

lfm 2010-01-02 21:58

[QUOTE=ET_;200625]Note that i5 seems more flexible on overclocking than i7.

Luigi[/QUOTE]

Is this due to 3 memory channels vs 2 memory channels or is it some other thing?

Batalov 2010-01-09 21:14

Nobody seems to have noticed that i3's have arrived:
[URL="http://www.newegg.com/Product/Product.aspx?Item=N82E16819115222"]i3-530[/URL] at $125 and [URL="http://www.newegg.com/Product/Product.aspx?Item=N82E16819115221"]i3-540[/URL] at $145 ...and [URL="http://www.newegg.com/Product/Product.aspx?Item=N82E16819115220"]i5-6xx[/URL] stuff.

Anandtech: [URL="http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=3704"]The Clarkdale Review[/URL]

sdbardwick 2010-01-09 21:34

[quote=Batalov;201376]Nobody seems to have noticed that i3's have arrived:
[URL="http://www.newegg.com/Product/Product.aspx?Item=N82E16819115222"]i3-530[/URL] at $125 and [URL="http://www.newegg.com/Product/Product.aspx?Item=N82E16819115221"]i3-540[/URL] at $145 ...and [URL="http://www.newegg.com/Product/Product.aspx?Item=N82E16819115220"]i5-6xx[/URL] stuff.

Anandtech: [URL="http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=3704"]The Clarkdale Review[/URL][/quote]
Boring from a GIMPS point of view - dual core only, no integrated memory controller.
Then again, they might be a good fit for a low-cost HTPC.

vsuite 2010-01-10 00:55

[quote=Prime95;198564]No time soon. I've rewritten a few of the FFTs between 2M and 4M. Left to do: rewrite all 2 pass FFTs, optimize for 64-bit, optimize for P4, K8, K10. A long, long, long vacation starting in January will slow development too.

I've changed the way the code is organized so that many different FFT implementations can be supported. The current FFT code remains as one of the FFT implementations, and I've been adding several other variations of radix-4 FFTs to study which ones might be faster.[/quote]
Great work so far and I look forward to the new code. I am using 1K7, 1P4Celeron, 1P4, 1P4 HT (4 threads), 1K8, 2 Core-2Qs, 1Phenom II-X4 (13 threads), and soon a K8-x2 (2 threads).

Are the new processors all running P4 code currently? Once that changes, there should be dramatic speed increases for K10 and Core-2/i357.

You did not mention Core-2 or Core ix. I wonder if you just need to optimize for 32-bit and 64-bit Core-2/Core-ix and K8/K10. That means 4 new routines (for everything!!!). Of course K8 and K10 largely use the same instructions, and there may be only slight instruction latency differences which may require reordering, so I'm not sure whether you need separate instruction sequences if you don't need the new instructions. Ditto for Core-2 vs Core-i7. [Check whether any new Core i7 and especially K10 specific instructions are relevant to prime factoring or FFT, since K10 does not yet include SSSE3 instructions (too bad)].

Given that you have already optimized for Pentium 4 and lower processors, and AMD XP and lower processors, I'm assuming you'll keep in all the existing code for older processors, but I wonder if it is necessary to optimize further for P4 (or any older processor) or to back port new FFT routines for these processors. I'm sure there are many times more P4 and K7 machines currently running than K8, K10, Core 2 and Core i7 (you'll know the exact disposition of systems producing the results), but as time progresses, more of the older systems will retire while the newer systems will be used, each with multi-thread capability. For example, I won't mind 5-10% speed increase in the K7 & P4s, but I doubt more is possible. On the other hand, I'm sure at least 10-30% speed increase is possible on each thread of the Core-2 and Phenom II systems.

vsuite 2010-01-10 00:59

BTW, is the code (already re-written) production ready, that is, does it produce exactly correct results without further optimization? Does it run faster on any machine, especially P4, K10, Core-2? And can the binaries be downloaded by those of us who want in on the leading edge? Please?

One more thing: does the code write regularly (e.g. every iteration) to one or several memory locations as flags/counters, due to register pressure for example? If so, it may help to put those counters/flags in separate 128-byte blocks. In one dual-threaded program I optimized for the P4 HyperThreading, all the processing was done using EAX, EBX, ECX, EDX, ESI, EDI, EBP, and due to register pressure, a separate 32-bit integer variable was used for a counter which was decremented every iteration in each thread. On the Pentium 4, there was no problem with both variables being in the same 128-byte cache line, but on Core-2 and Phenom-II I got >31 and 41% speed increase, respectively, simply by putting them in separate cache lines.

I also got 10-13% speed increase on the Core-2 and Phenom-II simply by eliminating prefetch.

Prime95 2010-01-11 04:11

[QUOTE=vsuite;201383]
Are the new processors all running P4 code currently? Once that changes, there should be dramatic speed increases[/QUOTE]

Right now it is mostly Core 2 optimized. Version 24 was P4 optimized. Version 26 will let me optimize for both.

Dramatic speed increases? Don't get your hopes too high. Architecture specific optimizations are generally in the 5-10% range.

Prime95 2010-01-11 04:12

[QUOTE=vsuite;201384]I also got 10-13% speed increase on the Core-2 and Phenom-II simply by eliminating prefetch.[/QUOTE]

Software prefetching is still very important to large FFT performance.

hj47 2010-01-16 03:34

Well I FINALLY finished building my i5 rig after several complications regarding the wrong cooler being shipped, but eh, it was worth the wait.

The i5 is happily crunching on 4 LL's at 3.2GHz and being fed by 2000MHz of Kingston HyperX DDR3 memory.

Benchmarks [URL="http://www.mersenneforum.org/showthread.php?p=202068#post202068"]here[/URL]

Happy crunching :toot:

ET_ 2010-01-17 22:04

I just bought an i5 750 @ 2.66 GHz. It comes with its Intel cooler.

From what I read in this thread, it should be safe to run GIMPS 100% with it, as Petrw said...

Luigi

petrw1 2010-01-18 00:10

[QUOTE=ET_;202183]I just bought an i5 750 @ 2.66 GHz. It comes with its Intel cooler.

From what I read in this thread, it should be safe to run GIMPS 100% with it, as Petrw said...

Luigi[/QUOTE]

I bought the Noctua cooler that Garo recommended so I can't speak for the Intel cooler personally. I have second-hand opinions that it is not great, but I'm sure it will work fine with NO and maybe with some(?) overclocking.

petrw1 2010-01-18 02:30

Has anyone looked into the newest i5's?

So says newegg.ca............

[CODE]Intel Core i5-650 Clarkdale 3.2GHz
LGA 1156 73W Dual-Core Desktop
L3 Cache: 4MB
Manufacturing Tech: 32 nm
64 bit Support: Yes
Hyper-Threading Support: Yes
Integrated Graphics: Yes
Graphics Base Frequency: 733MHz
Virtualization Technology Support: Yes
Your Price:$199.99 [/CODE]


[CODE]Intel Core i5-661 Clarkdale 3.33GHz
LGA 1156 87W Dual-Core Desktop
L3 Cache: 4MB
Manufacturing Tech: 32 nm
64 bit Support: Yes
Hyper-Threading Support: Yes
Integrated Graphics: Yes
Graphics Base Frequency: 900MHz
Original Price: $214.99[/CODE]

[CODE]Intel Core i5-660 Clarkdale 3.33GHz
LGA 1156 73W Dual-Core Desktop
L3 Cache: 4MB
Manufacturing Tech: 32 nm
64 bit Support: Yes
Hyper-Threading Support: Yes
Integrated Graphics: Yes
Graphics Base Frequency: 733MHz
Original Price: $219.99[/CODE]

[CODE]Intel Core i5-670 Clarkdale 3.46GHz
LGA 1156 73W Dual-Core Desktop
L3 Cache: 4MB
Manufacturing Tech: 32 nm
64 bit Support: Yes
Hyper-Threading Support: Yes
Integrated Graphics: Yes
Graphics Base Frequency: 733MHz
Virtualization Technology Support: Yes
Your Price:$318.99 [/CODE]

sdbardwick 2010-01-18 05:48

Dual core only (so far) and lack of on-die memory controller strike me as unexciting for GIMPS; HTPC or general office systems might be attractive depending on cost.
AMD Athlon II/Phenom II look better from price/performance standpoint right now.

garo 2010-01-18 11:29

Absolutely no reason to go for one of those from a GIMPS perspective. An i5-750 is $199.99 so why would I spend the same money for a dual core?

monst 2010-01-18 14:18

[quote=ET_;202183]I just bought an i5 750 @ 2.66 GHz. It comes with its Intel cooler.

From what I read in this thread, it should be safe to run GIMPS 100% with it, as Petrw said...

Luigi[/quote]

I ran the i5 750 for a couple of weeks using the Intel cooler. Running Prime95 full throttle, the chip ran at 77-80C.

Then I installed this Cooler Master cpu fan...
[url]http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=4434989&CatId=493[/url]

The chip runs consistently from 40-44C. I have it overclocked at 160MHz x 21 = 3.36GHz and it's really running well.

I'd highly recommend a better cooler. The one I installed is large and required me to completely remove the motherboard for installation. Also, make sure your case is large enough to accommodate any cooler you choose.

-- Rich

petrw1 2010-01-18 15:14

[QUOTE=monst;202262]The chip runs consistently from 40-44C. I have it overclocked at 160MHz x 21 = 3.36GHz and it's really running well.[/QUOTE]

WOW...

I have the Noctua; OC'd to 3.2 just using EasyTune and it runs at 60C.

The cooler was too big for my case so I left the sidewall off.

petrw1 2010-01-18 15:15

[QUOTE=garo;202240]Absolutely no reason to go for one of those from a GIMPS perspective. An i5-750 is $199.99 so why would I spend the same money for a dual core?[/QUOTE]

Oops, sorry....missed the Dual Core part.

garo 2010-01-18 15:15

That is a very nice temperature. With the Noctua I got 44-45 at stock but at the current speed of 3.8GHz and 1.375V, I get about 62-64C with an ambient of 20C.

em99010pepe 2010-01-28 13:26

I'm very disappointed by the Core i5 750 performance running 4 LL's. Overclocked to 3.0 GHz, it is only 15 % faster than my previous processor, a Q6600@2.9GHz. I thought I would have a gain of at least 30 % at this stage. Looks like I need to overclock even further.

petrw1 2010-01-28 15:10

[QUOTE=em99010pepe;203558]I'm very disappointed by the Core i5 750 performance running 4 LL's. Overclocked to 3.0 GHz, it is only 15 % faster than my previous processor, a Q6600@2.9GHz. I thought I would have a gain of at least 30 % at this stage. Looks like I need to overclock even further.[/QUOTE]

My only point of comparison is to my Q9550 (stock 2.83) and my i5-750 (stock 2.66), which is impressive. The benchmarks for 1280FFT are 22 and 20 ms respectively. Running 1 core, that is what I get. BUT running 4 cores the Q9550 drops to 28 ms while the i5 is virtually unchanged at 20-21 ms.

lfm 2010-01-28 15:32

I usually say anything less than an order of magnitude performance improvement is negligible (that is to say I can hardly tell the difference). Makes my machines last longer that way at least. I don't have to start looking for replacements till they're over 3 years old, often much more.

em99010pepe 2010-01-28 20:28

[quote=petrw1;203569]My only point of comparison is to my Q9550 (stock 2.83) and my i5-750 (stock 2.66), which is impressive. The benchmarks for 1280FFT are 22 and 20 ms respectively. Running 1 core, that is what I get. BUT running 4 cores the Q9550 drops to 28 ms while the i5 is virtually unchanged at 20-21 ms.[/quote]

Maybe for larger FFTs the performance gap will open up. I am testing numbers of size n=1.9M, at 1.7 ms. Now I have overclocked even further and I am getting timings of 1.5 ms. I need a few more hours to have some results.

Carlos

em99010pepe 2010-01-30 18:59

Benches

[B]Q6600@2.9 GHz[/B] (Vista 64-bit, 6 GB DDR2 800 MHz)

15*2^1944026-1 is not prime. LLR Res64: DDBA8FB8DA55A41A Time : 3818.427 sec.

[B]Core i5 750@3.47 GHz[/B] (Windows 7 64-bit, 4 GB DDR3 1333 MHz)

15*2^1946425-1 is not prime. LLR Res64: 49B16A7783DC25F1 Time : 2930.602 sec.
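
For a quick read of the two results, the ratio of the run times can be split into a clock part and a per-clock part. This is a rough Python sketch using only the figures above; it assumes the two candidates are close enough in size to compare directly (the exponents differ by about 0.1%), and it says nothing about memory, operating system, or what the other cores were doing at the time.

```python
q6600_seconds = 3818.427  # Q6600 @ 2.9 GHz, 15*2^1944026-1
i5_seconds = 2930.602     # Core i5 750 @ 3.47 GHz, 15*2^1946425-1

# Overall speedup of the i5 run over the Q6600 run.
speedup = q6600_seconds / i5_seconds

# How much of that is explained by clock speed alone.
clock_ratio = 3.47 / 2.9

# What is left over is the gain per clock cycle.
per_clock_gain = speedup / clock_ratio

print(f"overall speedup:  {speedup:.3f}x")
print(f"clock ratio:      {clock_ratio:.3f}x")
print(f"per-clock factor: {per_clock_gain:.3f}x")
```

With a single timing per machine, the per-clock factor should be treated as indicative only.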


All times are UTC.