Wow, I want one of those. -_-
Hmm, when you look at this it's very interesting; you know they're onto something good.
Unfortunately the K8L, with double the floating-point throughput, won't
be available in time to compete with Core 2 Duo, but AMD may be able to compensate by upping the clock speed. [url]http://www.theinquirer.net/?article=31729[/url] The voices from the Far East are claiming that by the end of 2006, AMD will have 65 nanometre cores running at 3.0, 3.2 and 3.4 GHz, which are set to be clocked faster than any Intel product at that time.
[QUOTE=dsouza123]Unfortunately the K8L with double the floating point won't
be available to compete with Core 2 Duo, but AMD may be able to compensate by upping the clock speed.[/QUOTE]That won't be enough. The +20% or greater per-clock performance advantage of Conroe over an equally clocked K8 holds for typical benchmarks. For an optimized SSE2 FFT like that of Prime95 the difference would be much bigger, and the clock speed would have to be raised considerably to compensate.
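To put some rough numbers on the argument above (a back-of-the-envelope sketch using the figures quoted in this thread, not measured benchmarks), if performance scales linearly with frequency, the clock an equal-IPC K8 would need grows directly with Conroe's per-clock advantage:

```python
# Hypothetical arithmetic: the thread quotes a +20% or more per-clock
# advantage for Conroe on typical benchmarks, and a larger (unspecified)
# advantage on an optimized SSE2 FFT.  Assuming performance scales
# linearly with clock, the K8 frequency needed to match is:

def clock_needed(conroe_clock_ghz, per_clock_advantage):
    """Clock (GHz) an equal-design K8 would need to match Conroe."""
    return conroe_clock_ghz * (1.0 + per_clock_advantage)

# Against a 2.67 GHz Conroe (the E6700 mentioned later in the thread):
print(round(clock_needed(2.67, 0.20), 2))   # +20% edge -> 3.2 GHz
# With a hypothetical +50% edge on an SSE2 FFT workload:
print(round(clock_needed(2.67, 0.50), 2))   # ~4.0 GHz, well past K8 clocks
```

The second figure is purely illustrative, but it shows why "upping the clock speed" cannot realistically close an FFT-sized gap.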
The first 65nm AMD chips have been brought forward from 2007 into Q4/2006, i.e. December.
Also, I think we need to wait for Conroe to come out before making comparisons. The only benchmarks (not under independent conditions) were performed on 32-bit apps. What will be interesting is the 64-bit AM2 dual-core versus Conroe in 64-bit mode. Difficult to predict, but we shall see the proof in the benchmark results.
[QUOTE=Peter Nelson]The first 65nm AMD chips have been brought forward from 2007 into Q4/2006, i.e. December.
Also I think we need to wait for Conroe to come out before making comparisons. The only benchmarks (not under independent conditions) were performed for 32 bit apps. What will be interesting is the 64 bit AM2 Dualcore versus the Conroe in 64 bit mode. Difficult to predict but we shall see the proof in the benchmark results.[/QUOTE] All the discussion has been with respect to GIMPS. However, I see nothing in the new design that is going to help out factoring algorithms. For sieve-based methods especially, what matters most is latency (not bandwidth) to/from main memory over very large chunks of data. Speeding up the FSB, as opposed to the processor, would help greatly. Reducing the cost of cache misses when they occur would help greatly. I do not see the new designs offering much improvement at all with regard to these issues.
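A minimal sketch (not R.D. Silverman's code) of why sieving is latency-bound rather than bandwidth-bound: each small prime p forces writes at stride p across a large interval, so successive touches land in different cache lines and the cost is dominated by miss latency. The function below is a toy illustration of that access pattern:

```python
# Toy illustration of a sieve's memory access pattern.  For each prime
# p, multiples of p are marked at stride p across the interval: widely
# scattered writes that defeat caching on large intervals, so each
# touch tends to pay full main-memory latency.

def sieve_interval(start, length, primes):
    """Mark positions i where start+i is divisible by some prime."""
    marks = bytearray(length)
    for p in primes:
        first = (-start) % p            # first multiple of p at/after start
        for i in range(first, length, p):  # stride-p scattered writes
            marks[i] = 1
    return marks

m = sieve_interval(10**6, 1000, [2, 3, 5, 7, 11, 13])
print(sum(m))   # how many positions got marked by these small primes
```

At real sieving sizes (intervals of hundreds of megabytes) the stride-p writes for larger primes hit a different cache line almost every time, which is exactly the cache-miss cost the post says the new designs do little to reduce.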
R.D. Silverman
Some of the issues are addressed by the AMD K8L and HT3. On latency, from this link [url]http://news.soft32.com/hypertransport-30-released-to-the-public_1324.html[/url]:

[QUOTE]“As processor performance continues to rise, and the industry increasingly moves toward a new generation of multi-core technology, multi-CPU system designs, interconnect latency and bandwidth take on a pivotal role in overall system and application performance,” said David Rich, president of the HyperTransport Consortium. “By further reinforcing HyperTransport’s industry position as the lowest latency, highest bandwidth interconnect technology, HyperTransport 3.0 enables designers to achieve state-of-the-art application performance and optimum time-to-market advantages while benefiting from the combined economies of scale of a widely adopted interconnect standard and a full array of off-the-shelf systems and components.”

The HyperTransport Consortium has also released the HyperTransport HTX™ connector specification, which enables system designers to link high-performance peripheral subsystems directly to the system’s CPU or CPUs via low-latency HyperTransport links.

“Cray has a long-standing history of designing and delivering supercomputers with superior interconnects,” said Steve Scott, Chief Technology Officer of Cray Inc. “HyperTransport technology provides an ultra low latency, high bandwidth connection directly from the processor’s memory system to our high speed network. We’ve integrated it into two of our current product lines, the Cray XT3 and Cray XD1 systems. Together with our strategic partner, AMD, we expect to continue this practice in future product generations.”[/QUOTE]

-----------------------------------------

From Dresdenboy's earlier TheInquirer.net link [url]http://www.theinquirer.net/?article=31761[/url]:

[QUOTE]First it has a shared expandable L3 cache, necessary because it is a native quad-core design. Next is memory. The new core will support 48-bit addressing and 1GB pages. Cray and SGI will be very happy with this, until they hit that memory wall again. There is also official co-processor support, strongly hinted to be on a HTX card. [An FPGA reprogrammable co-processor that plugs into an Opteron socket, for use on multi-CPU-socket motherboards.] Next up is RAS, another area where AMD is sorely lacking. It is addressing the major sore points with support for memory mirroring, data poisoning support, and HT retry. It looks like it is following the IBM roadmap more than the Intel one here. The last bit is much more aggressive prefetch to 'feed the beast'. It has gone from 16B to 32B, an obvious step with the added SSE number-crunching power. On top of this, it has out-of-order loads, and other tweaks to use the available bandwidth in a much more efficient manner.[/QUOTE]
The latency issue on the Intel side will be addressed by FB-DIMMs, used with
the server chip Woodcrest, aka the Xeon DP 51xx series. It also has the higher-speed 1333 FSB on most parts, versus only the Conroe Extreme Edition (aka Core 2 Extreme); the other Conroes will have 1066 and DDR2. From [url]http://www.intel.com/cd/channel/reseller/emea/eng/250634.htm[/url]:

[QUOTE]Increased Business Productivity
- Provides over 3 times higher memory throughput†, allowing for superior application responsiveness
- Enables increased capacity and speed to balance capabilities of dual-core processors
- Allows for Intel® I/O Acceleration Technology to more quickly access and process data
- Performs reads and writes simultaneously, eliminating the previous read-to-write blocking latency
- Supports a faster front side bus

Built-In Reliability
- Provides Cyclical Redundancy Checking (CRC) protection for commands and data
- Delivers additional memory channels that can be used to mirror data and prevent loss from any single memory or DIMM failure
- Allows for simplified board designs through reduced pin count and less complex routing requirements
- Leverages support from industry standards
- Per-channel-segment Silent Data Corruption (SDC) FIT rate less than 0.10 (1,142,000 years) to support even the highest-RAS servers[/QUOTE]

Intel also released Dempsey, NetBurst CPUs sold as the Xeon DP 50xx, which use the same Bensley platform and have higher power use and much lower performance, but will be cheaper chips.
[QUOTE=dsouza123]The latency issue on the Intel side will be addressed by FB-DIMMs, used with
the server chip Woodcrest, aka the Xeon DP 51xx series.[/QUOTE]FB-DIMMs don't address latency issues; instead they offer higher bandwidth, ease the limit on memory chips per bus, and introduce concurrent accesses, for example. About latency: [QUOTE][...]There is a downside right? Yes, latency, but it appears manageable. There are two types of latency that the FB architecture adds: serialization delays, and each added buffer means a transmission delay. The signal must be read by a buffer and either acted upon or passed on. By the time you get to DIMM number 8, it can add significant time. In absolute terms, the latency is 3-9 ns, and each hop you go out adds another 2-6 ns. Intel says it has gone to great lengths to address these latencies. First is the serialization delay. That part is unavoidable; it will happen no matter what you do. As the speed of the RAM increases, the absolute time of this delay decreases. At 400MHz, the delay will be twice the delay of RAM at 800MHz. Since speeds are likely to go up in the future, this issue will decrease as time goes on.[...][/QUOTE] [url]http://www.theinquirer.net/?article=15189[/url] Each ns means 2-3 clock cycles. Then there is the long path to memory: CPU <-> NB <-> FB-DIMM controller <-> RAM
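Working through the figures quoted above (3-9 ns at the first buffer, 2-6 ns per further hop, and 2-3 core clocks per nanosecond), the added latency in CPU cycles grows quickly with DIMM position on the channel:

```python
# Rough arithmetic from the quoted FB-DIMM figures: first buffer adds
# 3-9 ns, each further hop 2-6 ns; at ~3 GHz one ns is ~3 core clocks.

def added_latency_ns(dimm_index, base=(3, 9), per_hop=(2, 6)):
    """(min, max) extra ns for the DIMM at 1-based position dimm_index."""
    hops = dimm_index - 1
    return (base[0] + hops * per_hop[0], base[1] + hops * per_hop[1])

def ns_to_cycles(ns, clock_ghz=3.0):
    """Extra core clocks at the given CPU frequency."""
    return ns * clock_ghz

lo, hi = added_latency_ns(8)                 # 8th DIMM on the channel
print(lo, hi)                                # 17 51 (ns)
print(ns_to_cycles(lo), ns_to_cycles(hi))    # 51.0 153.0 cycles at 3 GHz
```

So the worst-case slot can cost on the order of 150 extra cycles per miss, which is why the path CPU <-> NB <-> FB-DIMM controller <-> RAM matters so much for latency-bound workloads.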
Addressed, not solved. Without the HyperTransport links and integrated memory controller
that AMD uses, this is the best Intel could do to improve memory access and memory capacity. It does eliminate the read-to-write blocking latency and adds concurrency. FB-DIMMs versus regular DDR2 are analogous to a SATA hard drive with a cache/buffer versus a PATA drive without one. Woodcrest will perform well in single- and dual-socket motherboards with a total of 2 and 4 cores respectively. Past that, memory I/O becomes a bottleneck and the Opteron is better.
New record on SuperPi:
[url]http://www.xtremesystems.org/forums/showthread.php?p=1471041#post1471041[/url] 11.3-11.9 secs depending which screen you look at. Conroe @ 4.5 GHz. He said the max overclock on air was 3.5 GHz. Details are pretty sketchy. I get the impression one reason he couldn't go higher was the clock generator on the motherboard. Pity he doesn't have a higher-multiplier CPU. He doesn't seem to have fried his CPU as yet. :) To have such a high overclock this close to release is amazing. Looks like this architecture has plenty of room to move. Can I say that until AMD doubles its FP capability, it looks like the Core 2 Duo architecture is going to lead for some time. -- Craig
With 3.5 GHz on air, it hopefully will allow a more modest 3.0 GHz = 300*10,
with 300*4 = 1200 FSB, on Prime95-stable systems. Was this with both cores enabled? I've seen other overclocks with one core disabled. So it was an E6700 Conroe (Core 2 Duo) 2.67 GHz that was overclocked, on an MSI motherboard using the ICH7 (ICH8 is the upgraded southbridge that will go with it, with 2 extra SATA and USB ports). Any Prime95 benchmarks? Any tests for Prime95 stability available? 24 hours preferred, or even 1 hour, only on air.

The performance for programs whose memory needs fit within the L2 cache is excellent. SuperPi 1M appears to fall into this case.

00m 11.438s - 1M
00m 29.953s - 2M
01m 11.031s - 4M
02m 39.906s - 8M
05m 52.344s - 16M
12m 44.500s - 32M

Roughly a 2.6x increase in time for each doubling of SuperPi digits. From the information below, for 1M digits: 35 minutes down to 11.438 seconds, 1/183.5 of the Pentium 90 MHz time! 51.1 times the clock speed.

=======================================
From the SuperPi help file, for some background on memory requirements and usage:

SuperPi calculation speed (with a Pentium 90 MHz, just like Prime95 units!):
35 minutes for 1 million decimal digits, 78 minutes for 2 million, 183 minutes for 4 million, with a Pentium 90 MHz and 40 MB or more of main memory.

Maximum length of calculation: 33.55 million decimal digits. 340 MB of disk storage is needed for a 33.55-million-digit calculation.

Memory requirement: for maximum calculation speed, 8 MB per 1 million decimal digits is favorable, but 2 MB per 1 million digits is completely acceptable. Working memory is automatically adjusted by the software. In any case, available main memory size is crucial for the processing speed!

Disk storage: 10.5 MB of working disk storage per 1 million decimal digits is needed; it is automatically freed. For permanent data storage, 1 MB per 1 million digits is needed. Elapsed time is very sensitive to the disk access time. To shorten the elapsed time, you would do well to use a high-speed hard disk drive!
=======================================
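Checking the arithmetic in the post above (the times are the ones quoted; the per-doubling ratio actually drifts from ~2.6x down toward ~2.2x at larger sizes):

```python
# Times quoted in the post, converted to seconds, for each SuperPi size
# in millions of digits.
sizes = [1, 2, 4, 8, 16, 32]
secs  = [11.438, 29.953, 71.031, 159.906, 352.344, 764.5]

# Cost ratio for each doubling of the digit count.
ratios = [secs[i + 1] / secs[i] for i in range(len(secs) - 1)]
print([round(r, 2) for r in ratios])   # ~2.62 at 1M->2M, easing to ~2.17

# Speedup over the Pentium 90's 35 minutes for 1M digits.
p90 = 35 * 60                          # 2100 s
print(round(p90 / secs[0], 1))         # ~183.6x
```

The shrinking ratio at larger sizes is consistent with the L2-cache point above: 1M digits largely fits in cache, while the bigger runs spill to main memory, so they scale closer to the underlying O(n log n) cost than the cache-friendly small case.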