Other processing possibilities.
AGEIA Technologies Inc. PhysX chip, a dedicated Physics Processing Unit (PPU)
Expect to see PPU-enabled systems and boards in time for the 2005 Christmas buying season. Native hardware support for the NovodeX physics engine.
2 terabits/second of bandwidth is presently contemplated for the internal memories, facilitating data movement to/from the FPE. The internal memory structure has no "set associativity" limitations.
The PPU provides a library of common linear algebra and physics-related algorithms implemented using the DME and FPE. However, application-specific or custom algorithms may also be defined within the PPU for execution by the DME and FPE.
Xbox 2/Xbox Next/Xenon/Xbox 360 (? which name ?)
"Xenon's CPU has three 3.0 GHz PowerPC cores. Each core is capable of two instructions per cycle and has an L1 cache with 32 KB for data and 32 KB for instructions. The three cores share 1 MB of L2 cache."
? Ships before the end of 2005, in two versions, one with a hard drive |
Would it be possible / feasible to run LL tests on graphics hardware? The latest graphics chips from nV and ATi have a lot of processing power; the question is whether it could be utilized in such a way.
Any thoughts on that :question: |
[QUOTE=Cruelty]Would it be possible / feasible to run LL tests on graphics hardware? The latest graphics chips from nV and ATi have a big processing power, the question is: would it be possible to utilize it in such a way?
Any thoughts on that :question:[/QUOTE] This question about GPUs has been asked about a thousand times in these forums. Unfortunately, the answer remains no. These GPUs will do floating point math, but only single-precision. Prime95 requires double-precision. |
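To illustrate the precision gap mentioned above, here is a minimal sketch. It is not taken from Prime95; the helper `to_single` is a hypothetical name that rounds a value to IEEE single precision via the standard `struct` module. Single precision carries a 24-bit significand and double precision a 53-bit one, so integers that a double holds exactly can already be rounded away in a single.

```python
import struct

def to_single(x):
    """Round x to IEEE 754 single precision and return it as a Python float.

    Hypothetical helper for illustration only.
    """
    return struct.unpack("f", struct.pack("f", float(x)))[0]

n = 2**24 + 1  # smallest positive integer a 24-bit significand cannot hold

print(float(n) == n)             # True: a double (53-bit significand) is exact here
print(to_single(n) == n)         # False: the single rounds to 2**24
print(float(2**53 + 1) == 2**53 + 1)  # False: doubles run out at 53 bits
```

In an FFT-based squaring, every output coefficient has to be rounded back to the correct integer, so the fewer significand bits are available, the fewer bits of the number each FFT element can carry.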
Cell : DP
Do you know this presentation of the Cell architecture?
There is a slide about the floating point: [URL=http://www.research.scea.com/research/html/CellGDC05/16.html]Slide 15[/URL] Does the "2 ways DP" mean double precision? But slide 17 says: "SIMD float only". So, single- or double-precision floating point? Tony |
Cell: single or double floating point?
[QUOTE=Paulie]Unfortunately Cell is geared to single precision SIMD. GIMPS needs double precision.[/QUOTE] What about this paper, which discusses double precision: [URL=http://www-306.ibm.com/chips/techlib/techlib.nsf/techdocs/CD03DF9DB5C3FB9187256FC000745CF1/$file/ISSCC-20.3-Cell_Mult.pdf]ISSCC[/URL]?
Also, look at: [URL=http://www-306.ibm.com/chips/techlib/techlib.nsf/products/Cell]IBM[/URL]. Tony |
[QUOTE=T.Rex]Is the 2 ways DP means Double Precision ?[/quote]I think so.
[quote]But slide 17 says: "SIMD FLoat only".[/QUOTE]Read that slide's table right-to-left. |
[QUOTE=T.Rex]Is the 2 ways DP means Double Precision ?[/QUOTE]Yes.
[QUOTE=T.Rex]But slide 17 says: "SIMD FLoat only" .[/QUOTE]But not in the "SPE" column (Cell), which states "SIMD int, float, double". "VU" seems to be just some DSP's vector unit, which is being compared to Cell's SPEs. The double precision capabilities of Cell (more the SPE ones than the PPE's) have already been discussed in this thread. Just read above. |
While reading an article in a Linux magazine about the possibilities of running Linux on Cell, I found an interesting document mentioned: [b]Unleashing the power: A programming example of large FFTs on Cell[/b]
The original source is here: [url]http://www.power.org/news/events/barcelona/[/url]
And the document is here: [url]http://www.power.org/news/events/barcelona/11_chow.pdf[/url]
It covers single-precision FFTs, but that doesn't matter, since it addresses nearly all the important factors that would be relevant to implementing an LL test on Cell. They say their FFT implementation is already close to being computationally bound, so this would be even more the case with double precision. |
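For readers wondering why FFTs come up in connection with the LL test at all, the following is a rough sketch, not code from the cited paper or from Prime95. It runs the Lucas-Lehmer recurrence s -> s^2 - 2 (mod 2^p - 1) and does each squaring through a plain recursive floating-point FFT. The function names, the base-256 digit size, and the textbook radix-2 FFT are all choices made for this illustration; a production implementation would look very different.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def square_via_fft(x, base=256):
    """Square a non-negative integer by convolving its digits with an FFT."""
    if x < 0:
        raise ValueError("x must be non-negative")
    digits = []
    while x:
        digits.append(x % base)  # little-endian digits
        x //= base
    if not digits:
        return 0
    n = 1
    while n < 2 * len(digits):   # room for the full convolution
        n *= 2
    spectrum = fft(digits + [0] * (n - len(digits)))
    conv = fft([v * v for v in spectrum], invert=True)
    result = 0
    for i, v in enumerate(conv):
        # The inverse transform above is unnormalized, hence the division by n.
        # Rounding only recovers the true integer while the accumulated
        # floating-point error stays below 0.5.
        result += round(v.real / n) * base**i
    return result

def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for a prime exponent p."""
    if p == 2:
        return True
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (square_via_fft(s) - 2) % m
    return s == 0

print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 31) if lucas_lehmer(p)])
```

The rounding step is where floating-point precision enters: with fewer significand bits, each FFT element can carry fewer bits of the number before the rounding becomes unreliable.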
Will the SSE3 instructions bail out Intel or does the AMD implementation of SSE3 keep AMD in the lead for serious number crunching?
|
[QUOTE=JHagerson]Will the SSE3 instructions bail out Intel or does the AMD implementation of SSE3 keep AMD in the lead for serious number crunching?[/QUOTE]The new SSE3 instructions are useful for complex-number math, but the FADD/FMUL throughput of the instructions that matter for Prime95 stays about the same, so this wouldn't help. Additionally, Prime95 already avoids the need for horizontal operations by doing the complex math "by hand" in 2 separate sets of calculations going on in the lower and higher halves of the SSE2 registers.
However, here is the data I collected from the appropriate optimization manuals for the register-to-register variants of these instructions. Intel didn't give numbers for the case where memory operands are involved and delivered from the lowest-latency cache (as is often the case in Prime95). For the K8, these instructions have a latency that is 1 cycle (HADDPx/HSUBPx) or 2 cycles (ADDSUBPx, MOVxDUP) longer.
[code]Prescott/Nocona:
Instruction(s)         Latency/Throughput   Involved units
ADDSUBPD/ADDSUBPS      5 / 2                FP_ADD
HADDPD/HADDPS          13 / 4               FP_ADD,FP_MISC
HSUBPD/HSUBPS          13 / 4               FP_ADD,FP_MISC
MOVDDUP xmm1, xmm2     4 / 2                FP_MOVE
MOVSHDUP xmm1, xmm2    6 / 2                FP_MOVE
MOVSLDUP xmm1, xmm2    6 / 2                FP_MOVE

K8 Stepping E:
Instruction(s)         Latency/Throughput   Involved units
ADDSUBPD/ADDSUBPS      5 / 2                FADD
HADDPD/HADDPS          5 / 2                FADD
HSUBPD/HSUBPS          5 / 2                FMUL (maybe for parallel execution)
MOVDDUP xmm1, xmm2     2 / 2                FMUL
MOVSHDUP xmm1, xmm2    3 / 2                FMUL
MOVSLDUP xmm1, xmm2    3 / 2                FADD[/code]
Throughput is given as "cycles between instruction issue". As you can see, SSE3 wouldn't change the situation. |
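The "by hand" complex math mentioned above can be pictured with a small sketch. This is not Prime95's actual code; the data layout (real parts in one two-lane "register", imaginary parts in another, each lane belonging to an independent calculation) is an assumption made for illustration, and `complex_mul_split` is a hypothetical name. With that layout, a complex multiply needs only element-wise multiplies, adds, and subtracts, with no horizontal add across lanes.

```python
def complex_mul_split(a_re, a_im, b_re, b_im):
    """Multiply complex numbers lane by lane.

    Real and imaginary parts live in separate lists, so each list plays the
    role of one SIMD register and each index is one lane. Every operation is
    purely element-wise (vertical).
    """
    out_re = [ar * br - ai * bi
              for ar, ai, br, bi in zip(a_re, a_im, b_re, b_im)]
    out_im = [ar * bi + ai * br
              for ar, ai, br, bi in zip(a_re, a_im, b_re, b_im)]
    return out_re, out_im

# Two independent products, one per lane: (1+2j)*(5+6j) and (3+4j)*(7+8j)
re, im = complex_mul_split([1.0, 3.0], [2.0, 4.0], [5.0, 7.0], [6.0, 8.0])
print(re, im)  # [-7.0, -11.0] [16.0, 52.0]
```

An interleaved layout (real and imaginary part of one number side by side in the same register) is where instructions such as ADDSUBPD or HADDPD would come into play.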