#232 | |
|
∂²ω=0
Sep 2002
República de California
19·613 Posts |
Quote:
|
|
|
|
|
|
|
#233 |
|
May 2013
East. Always East.
11×157 Posts |
I don't think there is much engineering put into new 2GB sticks. They are becoming a bit of a thing of the past... Hooray for the future.
Also, a budget build CPU isn't going to be able to handle 2400MHz RAM. All of ASUS's boards can handle up to 2800MHz (in the Z77 lineup, anyway) but they would only "guarantee" that with an i7-3770K, not even with an i5-3570K. If you're going to run something like an i3, I can't see 2400MHz memory working. Nor would it be necessary.
I've gone on a bit of an editing spree since I'm spewing nonsense all over the place. Haswells look to be costing as much as Ivy Bridge, so go fourth gen all the way if you ask me. The i3s aren't actually out yet, but you're getting two cores with an i3 and four cores with an i5 for about 50% more money.
You can get a CPU for $180, a board for $150, the RAM you picked out is $50, and you can get pretty cost-effective GPUs for $200 apiece. On the other hand, I have a $250 CPU, a $250 board, $150 memory and a $400 GPU, and I don't think my system is going to outperform yours in proportion to the price ($1050 to $580). I'm kind of liking the sound of a budget PC, to be frank.
Last fiddled with by TheMawn on 2013-08-13 at 04:29 |
|
|
|
|
|
#234 | |
|
∂²ω=0
Sep 2002
República de California
19·613 Posts |
Quote:
# Test #5: AVX mode [now also including Mers-mod carry step] on 3.4 GHz Haswell quad, DDR3 2400 SDRAM (PC3 19200); times in ms/iteration:
Code:
         Mersenne-mod:             Fermat-mod:
FFT len  #threads [1 thread/core]
(Kdbl)       1       2       4         1       2       4
  ----   -----   -----   -----     -----   -----   -----
   896     9.3     4.9     2.9       8.3     4.4     2.6
   960    10.1     5.2     2.9       9.0     4.7     2.6
  1024    10.4     5.5     3.0       9.7     5.0     2.9
  1152    12.2     6.4     3.8
  1280    14.2     7.4     4.1
  1408    16.1     8.4     4.9
  1536    17.4     9.8     5.3
  1664    18.7     9.7     5.5
  1792    21.3    11.1     6.6      19.2    10.1     5.8
  1920    22.0    11.5     6.6      19.1    10.4     5.7
  2048    23.9    12.4     7.1      22.2    11.6     6.6
  2304    27.5    14.5     8.6
  2560    32.2    16.8     9.5
  2816    36.5    18.9    11.1
  3072    35.8    19.3    12.2
  3328    42.2    21.9    12.5
  3584    43.4    22.9    14.6      38.6    20.5    12.9
  3840    49.5    25.5    14.8      45.1    23.4    13.1
  4096    49.1    26.2    16.0      44.5    23.6    15.0
  4608    57.3    30.0    18.8
  5120    65.4    34.5    21.0
  5632    74.4    38.9    23.5
  6144    68.2    38.1    27.6
  6656    87.2    45.3    27.3
  7168    83.0    45.6    33.7      74.8    41.2    30.8
  7680   102.3    52.9    31.9      91.9    48.0    29.0
  8192   104.7    52.2    37.3      86.9    47.4    34.4
-----------------------------------------------------------
Avg || Scaling: 1.903x  3.040x      ----  1.875x  2.908x
Avg runtime ratio for Mersenne-mod vs Fermat-mod for FFT lengths supporting both kinds of arithmetic:
        1.127x  1.102x  1.099x
Notes:
0. I simply did 1000-iteration timings for all these, and made no effort to account for initialization overhead, thus the times are likely a few % pessimistic.
1. The 10-15% relative slowness of Mersenne-mod relative to Fermat-mod is expected for my code: Unlike George's, which uses an optimized real-vector transform which is ideal for the real-signal Mersenne-mod IBDWT, my code is written around a more-general-purpose complex-signal FFT. Thus it is really more geared toward Fermat-mod arithmetic, where the negacyclic-transform-effecting DWT involves complex-valued weights and yields a so-called "right-angle" transform, which is ideal for handling via complex-signal FFT. For real-signal inputs such as those in Mersenne-mod arithmetic we need to wrap the dyadic-squaring step occurring between the forward and inverse FFT in a complex/real/complex wrapper step, which typically results in a 10-20% runtime hit, depending on runlength and platform.
2. 3072K is the optimal runlength [among this menu of choices] for the most recent M-prime. My target for such official verifies is typically to get the per-iteration time below 10 ms [8.64 ms translates to 10 Miters/day, by way of handy rule of thumb]. For the verify of M57885161 Serge Batalov used an SSE2 build of Mlucas and found that the best [in terms of absolute throughput, not per-core efficiency] option on the 32-core Xeon [pre-Sandy-Bridge, i.e. no AVX option] cluster he had access to was to run at the next-higher available FFT length of 3328K, using 32 threads [precisely speaking, a combination of 26 and 32 threads, corresponding to the 2 distinct modmul phases Mlucas arranges things in]. He was able to get right around 8.6 ms/iter that way - so now we are within spitting distance of that total throughput using just 4 Haswell cores. [If I OC'ed aggressively like George does on his system I could probably get the 3072K 4-thread timing down to right around 10 ms/iter].
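The rule of thumb in note 2 can be checked with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, and the run-time estimate assumes a Lucas-Lehmer test of M_p needs p - 2 squarings and that the total time is simply iterations times the per-iteration time, with no allowance for startup, checkpointing or other overhead.

```python
SECONDS_PER_DAY = 86_400


def miters_per_day(ms_per_iter: float) -> float:
    """Millions of iterations completed per day at a given ms/iteration."""
    iters_per_day = SECONDS_PER_DAY * 1000.0 / ms_per_iter
    return iters_per_day / 1e6


def days_for_ll_test(p: int, ms_per_iter: float) -> float:
    """Days needed for the p - 2 squarings of a Lucas-Lehmer test of M_p.

    Assumption: run time = iterations * per-iteration time, nothing else.
    """
    total_seconds = (p - 2) * ms_per_iter / 1000.0
    return total_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # 8.64 ms/iter is the rule-of-thumb figure quoted in note 2.
    print(f"{miters_per_day(8.64):.2f} Miters/day at 8.64 ms/iter")
    # 12.2 ms/iter is the 3072K 4-thread Mersenne-mod timing from the table.
    print(f"{days_for_ll_test(57885161, 12.2):.2f} days at 12.2 ms/iter")
```

Under those assumptions the 4-thread 3072K timing works out to a little over 8 days for an exponent the size of 57885161.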
It will be interesting to see what kind of parallel scalings we can get on Haswell-based [or Ivy Bridge] systems with more than 4 cores. There is usually a big dropoff in parallelism beyond 4 cores ... for M-prime verifies we are usually elated to get *any* added total-throughput boost on > 4 cores. |
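For anyone who wants to look at the scaling per FFT length rather than just the averages, here is a minimal sketch. The timings are a handful of Mersenne-mod rows copied from the table above; the helper names are hypothetical, and "efficiency" is defined here simply as speedup divided by thread count.

```python
# FFT length (K doubles) -> (1-thread, 2-thread, 4-thread) times in ms/iter,
# copied from a few Mersenne-mod rows of the benchmark table.
TIMINGS_MS = {
    1024: (10.4, 5.5, 3.0),
    2048: (23.9, 12.4, 7.1),
    3072: (35.8, 19.3, 12.2),
    4096: (49.1, 26.2, 16.0),
    8192: (104.7, 52.2, 37.3),
}
THREADS = (1, 2, 4)


def scaling(times):
    """Speedup of each timing relative to the 1-thread timing."""
    base = times[0]
    return tuple(base / t for t in times)


def efficiency(times, threads=THREADS):
    """Speedup divided by thread count (1.0 would be perfect scaling)."""
    return tuple(s / n for s, n in zip(scaling(times), threads))


if __name__ == "__main__":
    for fft_len, times in TIMINGS_MS.items():
        s = scaling(times)
        e = efficiency(times)
        print(f"{fft_len:>5}K  2 threads: {s[1]:.2f}x ({e[1]:.0%})"
              f"  4 threads: {s[2]:.2f}x ({e[2]:.0%})")
```

On these rows the 4-thread speedup falls off as the FFT length grows, which is the trend the question about more than 4 cores is getting at.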
|
|
|
|
|
|
#235 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
23·271 Posts |
Hmm... maybe I should have gotten faster memory. I just tried some tests (my Haswell CPU is still in the box) on my dual-core Ivy Bridge (i3-3220) with a single 4GB stick of 1600 MHz RAM, and this is what I get:
One thread: 11 ms. Two threads: 17 ms each. (DC tests) Duh.
EDIT: Is P-1 less memory-intensive than LL, btw?
Last fiddled with by kracker on 2013-08-17 at 23:08 |
|
|
|
|
|
#236 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
1D77₁₆ Posts |
You have to use two sticks of memory to take advantage of dual-channel memory. In essence, you are running your memory subsystem at half of its capabilities.
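To put rough numbers on that, here is a small sketch of theoretical peak bandwidth. It assumes the standard 64-bit (8-byte) DDR3 channel width and ignores all real-world overheads, so actual throughput will be lower; the function name is made up for illustration.

```python
BYTES_PER_TRANSFER = 8  # one DDR3 channel is 64 bits wide


def peak_bandwidth_gbs(mega_transfers_per_s: float, channels: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_s = mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels
    return bytes_per_s / 1e9


if __name__ == "__main__":
    for speed in (1600, 2400):
        for channels in (1, 2):
            gbs = peak_bandwidth_gbs(speed, channels)
            print(f"DDR3-{speed}, {channels} channel(s): {gbs:.1f} GB/s peak")
```

The single-channel DDR3-2400 figure of 19.2 GB/s matches the PC3 19200 label mentioned earlier in the thread, and a single stick of 1600 gets half the peak of a dual-channel pair.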
|
|
|
|
|
|
#237 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
23·271 Posts |
On my quad Haswell with dual channel 1600:
1 thread: 9 ms
2 threads: 10 ms
3 threads: 11 ms
4 threads: 14 ms
Is P-1 as memory-bandwidth limited as LL? |
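A quick way to compare these timings with the earlier single-stick i3 numbers is to convert them to total throughput. This sketch assumes each "thread" is an independent test running at the quoted ms/iter (the i3 post says "17 ms each"; for the Haswell figures that reading is an assumption), and the names are made up for illustration.

```python
# workers -> ms per iteration for each worker, as reported in the posts.
I3_SINGLE_CHANNEL_1600 = {1: 11.0, 2: 17.0}
HASWELL_DUAL_CHANNEL_1600 = {1: 9.0, 2: 10.0, 3: 11.0, 4: 14.0}


def total_iters_per_sec(workers: int, ms_per_iter: float) -> float:
    """Combined iterations/second of `workers` tests at the same speed."""
    return workers * 1000.0 / ms_per_iter


def throughput_table(timings: dict) -> dict:
    """Map worker count -> combined iterations per second."""
    return {n: total_iters_per_sec(n, ms) for n, ms in timings.items()}


if __name__ == "__main__":
    for name, timings in (("i3, single channel", I3_SINGLE_CHANNEL_1600),
                          ("Haswell, dual channel", HASWELL_DUAL_CHANNEL_1600)):
        for n, rate in throughput_table(timings).items():
            print(f"{name}: {n} worker(s) -> {rate:.1f} iter/s total")
```

Read this way, going from 3 to 4 workers on the Haswell adds only a few percent to total throughput, which is the kind of flattening you would expect when memory bandwidth is the limit.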
|
|
|
|
|
#238 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
19×397 Posts |
For any Haswell owners that are interested, an evaluation version 28.1 is available. I have some ideas to improve it further, but they will take some time to implement.
I sure hope it works correctly because I've started using it on my Haswell box. Download link: http://www.sendspace.com/file/k66yc4 |
|
|
|
|
|
#239 | |
|
"Nathan"
Jul 2008
Maryland, USA
10001011011₂ Posts |
Quote:
Any benefits to non-Haswell adopters of this version? And any plans to change the name of the secret forum to "David Haswellhoff"? |
|
|
|
|
|
|
#240 | ||
|
"Oliver"
Mar 2005
Germany
11·101 Posts |
Quote:
Quote:
I managed to lower the CPU temperatures by 3-4°C by placing the heatsink a few mm off-center. The Haswell die isn't located in the middle of the CPU package...
Oliver |
||
|
|
|
|
|
#241 |
|
Jan 2008
France
2×5²×11 Posts |
On my stock 4770K (HT enabled, RAM @2400) with a Noctua NH-U14S (with a second fan), LinX makes the CPU go up to ~93°C with 8 threads (one core reached 97°C) and 2°C less with 4 threads (one core reached 97°C too). I get about 165 GFLOPS in both cases (this is way above the overclocked results I found; I guess the latest version is faster).
This is hot for sure. I think I'll have to play with VCORE to reduce it too... |
|
|
|
|
|
#242 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
16567₈ Posts |
Try http://www.sendspace.com/file/l4k4bk
It is untested. I installed Ubuntu 10.04 in a VirtualBox VM, but mprime doesn't recognize the FMA feature. No idea if this is an Ubuntu, VirtualBox, or mprime problem.
Last fiddled with by Prime95 on 2013-09-01 at 04:56 |
|
|
|