mersenneforum.org  

Old 2013-07-21, 15:10   #210
henryzz
Just call me Henry
 
"David"
Sep 2007
Cambridge (GMT/BST)

7·29² Posts

As far as I can work out, latency for the 2133 CL9 is 8.44 ns and for the 2400 CL11 is 9.17 ns. For 2400 CL10 it is 8.33 ns, which is actually better than the 2133 memory.
Could someone who has fast memory run a prime95 benchmark at two different memory latencies?
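
For anyone wanting to check other kits, here is a minimal sketch of the arithmetic behind those figures. It assumes the usual first-word CAS latency rule (CL cycles times the clock period, with DDR transferring twice per clock); the function name `cas_latency_ns` is just made up for illustration.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """First-word CAS latency in nanoseconds for DDR memory.

    DDR transfers data twice per clock, so the clock period in ns is
    2000 / data_rate, with the data rate given in MT/s.
    """
    return cas_cycles * 2000.0 / data_rate_mts


if __name__ == "__main__":
    # The three kits discussed in this thread.
    for rate, cl in [(2133, 9), (2400, 11), (2400, 10)]:
        print(f"DDR3-{rate} CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")
```

This only covers CAS latency; the other timings (tRCD, tRP and so on) would need the same conversion if you want to compare them in nanoseconds.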
Old 2013-07-21, 21:42   #211
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

Quote:
Originally Posted by db597 View Post
Assuming an affordable price difference (hmm.. I'll just skip 1 lunch), would you guys go for 2133 CL9 or 2400 CL11? Trying to decide between them, both G.Skill Ripjaws X. So far, the P95 discussion seems focused on bandwidth and I don't hear timings being mentioned. Is the more relaxed timing only a small secondary concern?

I suppose the best performance would be a 2400 CL10... but that cost is in another league (another 30%+ more).
I had the impression, possibly mistaken, that latency was an issue for multi-threaded P95.

One of the better discussions I've read of DDR operation and trade-offs is here, on Anandtech.
Old 2013-07-27, 02:35   #212
ewmayer
2ω=0
 
Sep 2002
República de California

19·613 Posts

Intel to Supply Apple with Special High-End Haswell Processors for MacBook Pro | Mac Rumors

Let the CUDA vs OpenCL flamewars rage...
Old 2013-07-28, 21:30   #213
firejuggler
 
Apr 2010
Over the rainbow

2·1,303 Posts

More Haswell news: fanless ones
http://anandtech.com/show/7168/haswe...ater-this-year
15 W and 28 W TDP (for laptops and tablets, apparently)
Old 2013-07-29, 07:37   #214
ldesnogu
 
Jan 2008
France

2·5²·11 Posts

Both the Apple news and the "4.5W" SDP news are related: both come down to heavy binning of chips. Needless to say, this means low volumes and high prices.
Old 2013-08-07, 21:41   #215
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

19×397 Posts
You just can't win

On Sunday, I installed prime95 version 28.1 and resumed my LL testing. Iteration times dropped about 1 ms, from about 15.5 ms to 14.5 ms. Wonderful.

Three days later I've suffered three unexplained spontaneous reboots. In the month of LL testing prior to 28.1 there were no unexplained reboots.

I've tried upping the CPU and memory voltages a smidge. We'll see if that solves the problem.
Old 2013-08-07, 23:22   #216
TheMawn
 
May 2013
East. Always East.

11·157 Posts

George: Does Linux have some sort of Event Viewer equivalent? The more computer-inclined folk might be able to pinpoint the cause...

Were you monitoring thermals at the time?
Old 2013-08-08, 03:18   #217
ewmayer
2ω=0
 
Sep 2002
República de California

19×613 Posts

Quote:
Originally Posted by ewmayer View Post
But I ran into some truly bizarre timings with my new avx mersenne-dwt carry macros yesterday - need to look into those first. Briefly, the "genuine avx" macro runs horribly slow relative to the sse2 one (which in avx builds is simply the original sse2 carry macro adapted to take account of the differing data layout between sse2/avx). My initial thought was that this might be due to the avx macro using a small number of unaligned (16-byte aligned but not 32) ymm-register loads, but when I replaced those with a pair of aligned xmm loads and a "combine xmm data" operation ...[code snip snipped]... the result was even worse. I confirmed that the original vmovups was not the problem by fiddling with the data layout to allow all aligned loads - tiny, insignificant improvement.
I took a close look at the offending code again over the past several days and finally found the problem. Briefly, I constructed the AVX-based Mersenne-mod carry macros by fusing the fancy-indexing footwork of the legacy SSE2 Mersenne-mod-DWT carry macros with the AVX data-permute aspects of the AVX-based Fermat-mod carry macros. The result ran incredibly, awfully, unbelievably slowly in the initial implementation, which came online just before Haswell hit the market. I have traced the problem back to the mixing of legacy SSE instructions (using xmm-form registers) in the indexing-computation portions of the code with the AVX instructions used for weights and carries in the new AVX code. The solution was simply to prepend a "v" to the legacy SSE instructions and, for those where the VEX form of the instruction adds a third operand, to duplicate the original SRC+DEST operand (rightmost in AT&T/GCC-syntax inline ASM, leftmost in Intel ASM syntax) in order to satisfy the 3-operand syntax.
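
To make the rewrite rule concrete, here is a rough sketch of it as a text transformation on AT&T-syntax instructions. It is not the tooling actually used for the carry macros. The helper names (`split_operands`, `to_vex`) and the small mnemonic set are hypothetical and for illustration only, and the set is deliberately an incomplete subset of the instructions whose VEX form gains an operand.

```python
from typing import List

# Illustrative subset only: legacy SSE mnemonics whose VEX-encoded form
# takes an extra (non-destructive) source operand. Not a complete list.
THREE_OPERAND_IN_VEX = {
    "addpd", "subpd", "mulpd", "andpd", "xorpd",
    "unpcklpd", "unpckhpd", "shufpd", "paddd", "psubd",
}


def split_operands(text: str) -> List[str]:
    """Split an AT&T operand list on commas, ignoring commas inside (...)."""
    operands, depth, current = [], 0, ""
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            operands.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        operands.append(current.strip())
    return operands


def to_vex(line: str) -> str:
    """Rewrite one legacy SSE instruction (AT&T syntax) in VEX form.

    Assumes the mnemonic is separated from its operands by a single space.
    """
    mnemonic, _, rest = line.strip().partition(" ")
    operands = split_operands(rest)
    if mnemonic in THREE_OPERAND_IN_VEX:
        # AT&T puts the SRC+DEST operand last; repeat it as the destination.
        operands.append(operands[-1])
    return "v" + mnemonic + " " + ", ".join(operands)


if __name__ == "__main__":
    print(to_vex("addpd %xmm1, %xmm0"))
    print(to_vex("movaps 0x10(%rax,%rbx,8), %xmm2"))
```

Under these assumptions, `addpd %xmm1, %xmm0` becomes `vaddpd %xmm1, %xmm0, %xmm0`, while a plain load such as `movaps` only gains the "v" prefix. Any real conversion would need a full table of which instructions change operand count.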

Here are some references:

o Intel's own "Avoiding AVX-SSE Transition Penalties" PDF:

Here is the money snippet:
Quote:
When using Intel® AVX instructions, it is important to know that mixing 256-bit Intel® AVX instructions with legacy (non VEX-encoded) Intel® SSE instructions may result in penalties that could impact performance. 256-bit Intel® AVX instructions operate on the 256-bit YMM registers which are 256-bit extensions of the existing 128-bit XMM registers. 128-bit Intel® AVX instructions operate on the lower 128 bits of the YMM registers and zero the upper 128 bits. However, legacy Intel® SSE instructions operate on the XMM registers and have no knowledge of the upper 128 bits of the YMM registers. Because of this, the hardware saves the contents of the upper 128 bits of the YMM registers when transitioning from 256-bit Intel® AVX to legacy Intel® SSE, and then restores these values when transitioning back from Intel® SSE to Intel® AVX (256-bit or 128-bit). The save and restore operations both cause a penalty that amounts to several tens of clock cycles for each operation.
o Agner Fog's "early in the AVX life cycle" commentary here.
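
The save/restore behaviour described in the Intel excerpt can be sketched as a tiny state machine. This is a simplified model of that description only, not of any particular microarchitecture: it ignores `vzeroupper` and anything else the excerpt does not mention, and the function name and the instruction-kind labels (`"sse"`, `"avx128"`, `"avx256"`) are made up for illustration.

```python
def count_transition_penalties(stream):
    """Count upper-YMM save and restore events for a stream of instruction kinds.

    Each item must be "sse" (legacy, non-VEX), "avx128" or "avx256".
    Returns a (saves, restores) tuple.
    """
    state = "clean"  # "clean", "dirty" (upper halves in use) or "saved"
    saves = restores = 0
    for kind in stream:
        if kind == "sse":
            if state == "dirty":
                # 256-bit AVX -> legacy SSE: hardware saves the upper halves.
                saves += 1
                state = "saved"
        elif kind in ("avx128", "avx256"):
            if state == "saved":
                # Legacy SSE -> AVX (128- or 256-bit): hardware restores them.
                restores += 1
                state = "dirty"
            elif kind == "avx256":
                state = "dirty"
        else:
            raise ValueError(f"unknown instruction kind: {kind!r}")
    return saves, restores


if __name__ == "__main__":
    mixed = ["avx256", "sse", "avx256", "sse", "avx256"]
    pure = ["avx256", "avx128", "avx256"]
    print("mixed:", count_transition_penalties(mixed))
    print("pure: ", count_transition_penalties(pure))
```

In this model a loop body that alternates between 256-bit AVX and legacy SSE pays one save and one restore per alternation, each costing the "several tens of clock cycles" Intel mentions, whereas an all-VEX stream pays nothing.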

Using the bastard hybrid AVX-FFT/SSE2-carry I deployed out of sheer desperation, runtime for Mersenne-mod was ~1.35x that of Fermat-mod at typical current LL-wavefront runlengths. With the mixed-SSE/AVX problem resolved, the "true AVX" Mersenne-mod per-iteration time is ~1.15x that of Fermat-mod.

To convey a sense of just how severe the timing penalties resulting from such SSE/AVX instruction mixing can be, the performance *penalty* component alone from using the mixed SSE/AVX carry macros here was greater than the entire *runtime* needed for either the "pure-AVX-FFT/SSE2-carry" hybrid or the "true AVX 4 all" code.
Old 2013-08-08, 03:18   #218
db597
 
Jan 2003

CB₁₆ Posts

Quote:
Originally Posted by Prime95 View Post
Three days later I've suffered three unexplained spontaneous reboots. In the month of LL testing prior to 28.1 there were no unexplained reboots. I've tried upping the CPU and memory voltages a smidge. We'll see if that solves the problem.
Sounds similar to when you first added AVX in 27.x: my previously rock-stable build started to reboot spontaneously, with no errors detected by P95. I upped CPU VCore to no avail. It turned out that the memory-related voltages needed a boost (VCCSA, VCCIO and the DRAM voltage itself).
Old 2013-08-08, 19:26   #219
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

19·397 Posts

Quote:
Originally Posted by db597 View Post
I upped CPU VCore to no avail. Turns out it was voltages relating to memory that needed a boost (VCCSA, VCCIO and the DRAM voltage itself).
I upped the VCore and DRAM voltage one notch. It rebooted spontaneously this morning. I'm bumping it two notches more.

Last fiddled with by Prime95 on 2013-08-08 at 19:26
Old 2013-08-08, 21:21   #220
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

4170₈ Posts

I'm thinking of upgrading an older Core 2 Duo to an i3, either Haswell or Ivy Bridge... How much more performance does Haswell offer over Ivy Bridge with AVX? Do you think I should wait for Haswell or just bag an Ivy Bridge now?

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.