mersenneforum.org  

Old 2011-05-12, 09:01   #12
ldesnogu
 
Jan 2008
France
3·199 Posts

Quote:
Originally Posted by diep View Post
It executes 1 instruction a cycle a core.
No, it can execute 2 instructions per cycle on each SPU.
Old 2011-05-12, 11:42   #13
diep
 
Sep 2006
The Netherlands
3·269 Posts

Quote:
Originally Posted by ldesnogu View Post
No, it can execute 2 instructions per cycle on each SPU.
Are you counting a multiply-add as two instructions or something?
Old 2011-05-12, 12:25   #14
nucleon
 
Mar 2003
Melbourne
5×103 Posts

Quote:
Originally Posted by diep View Post
Not sure about current, but the initial PS3 ran at 3.17Ghz and has 6 cores available. It executes 1 instruction a cycle a core.

So basically that is similar to 3 cores of a x64 at that speed.
Let's take the 2x instructions per core anyhow; that roughly makes it 3x Core 2 Duos @ 3.2GHz, or roughly 20 GHz-days/day.

My GT430 runs about 35-40 GHz-days/day.

Quote:
Originally Posted by diep View Post
Sure if it would be low power. However the ps3 is rated 380 watt peak and 220 watt with average operation.
The GT430 is rated at 60W. Turning off a 24x7 PS3 and buying a GT430 costs less within 12 months on the electricity savings alone, let alone that the GT430 would do 2x the work of a PS3.
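
A rough sketch of that payback arithmetic, using the 220 W average PS3 figure and the 60 W GT430 rating quoted in this thread. The electricity price and the card price below are placeholder assumptions, not numbers from the thread, and the draw of the PC hosting the GT430 is ignored.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy_kwh(watts):
    """Energy used in one year of 24x7 operation at a constant draw."""
    return watts * HOURS_PER_YEAR / 1000.0


def payback_months(old_watts, new_watts, card_price, price_per_kwh):
    """Months of electricity savings needed to cover the price of the card."""
    saved_kwh_per_year = annual_energy_kwh(old_watts - new_watts)
    saved_per_month = saved_kwh_per_year * price_per_kwh / 12.0
    return card_price / saved_per_month


# Figures from the thread: 220 W (PS3 average), 60 W (GT430 rating).
# ASSUMPTIONS, not from the thread: 0.20 per kWh and a card price of 70.
months = payback_months(old_watts=220, new_watts=60,
                        card_price=70.0, price_per_kwh=0.20)
print(f"Payback after about {months:.1f} months")
```

With other prices the result shifts proportionally, so plug in your own tariff before drawing conclusions.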

The time for PS3s to be useful has come and gone. The legal issues are just icing on the cake as a reason not to use them.

-- Craig
Old 2011-05-12, 12:37   #15
diep
 
Sep 2006
The Netherlands
3×269 Posts

Quote:
Originally Posted by nucleon View Post
Let's take the 2x instructions per core anyhow; that roughly makes it 3x Core 2 Duos @ 3.2GHz, or roughly 20 GHz-days/day.

My GT430 runs about 35-40 GHz-days/day.

The GT430 is rated at 60W. Turning off a 24x7 PS3 and buying a GT430 costs less within 12 months on the electricity savings alone, let alone that the GT430 would do 2x the work of a PS3.

The time for PS3s to be useful has come and gone. The legal issues are just icing on the cake as a reason not to use them.

-- Craig
Sure, but this is the CELL-1.

CELL-2 is more powerful from a double-precision viewpoint. Those chippies deliver good throughput, whereas for example the multiplication unit on the x64s is notoriously ugly. An integer 64x64 multiplication on an i7, for example, has a throughput of one every 3.75 cycles. That sucks so much that it kills all the integer NTTs running on x64, which is why we have dug them up again for the GPUs, to have a look at how they do there. AMD K8 and newer need 2.25 cycles for each multiplication (throughput, don't confuse it with latency), and both Intel and AMD block the other execution units a lot while multiplying (the first cycle and last cycle in the case of AMD, and worse for Intel).
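
For readers who have not met the integer NTTs mentioned here, a minimal sketch follows. It is not code from any program discussed in this thread. The modulus 998244353 with primitive root 3 is just a convenient illustrative choice; serious implementations tend to use primes close to 64 bits, which is where the full 64x64 multiply throughput starts to matter. The point to notice is that every butterfly costs one modular multiplication.

```python
MOD = 998244353   # prime equal to 119 * 2**23 + 1 (illustrative choice)
ROOT = 3          # a primitive root modulo MOD


def ntt(values, invert=False):
    """Iterative number-theoretic transform; len(values) must be a power of two."""
    a = list(values)
    n = len(a)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD   # the modular multiply per butterfly
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


def multiply(x_coeffs, y_coeffs):
    """Multiply two coefficient lists (lowest degree first) via the NTT."""
    out_len = len(x_coeffs) + len(y_coeffs) - 1
    size = 1
    while size < out_len:
        size <<= 1
    fx = ntt(x_coeffs + [0] * (size - len(x_coeffs)))
    fy = ntt(y_coeffs + [0] * (size - len(y_coeffs)))
    product = ntt([p * q % MOD for p, q in zip(fx, fy)], invert=True)
    return product[:out_len]


print(multiply([1, 2, 3], [4, 5]))
```

A transform of length n performs (n/2)·log2(n) of those multiplications, so a multiplier that only accepts a new operation every few cycles sets the pace for the whole transform.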

So a chippie that sustains a throughput of 1 instruction per cycle can do some damage.

That's how CELL originally positioned itself, besides having 8 cores in total. Just the release of the CELL chip for the PS3 was years and years too late if you ask me. Realize that no one else had 8 cores on a low budget at the time.

Of course it was a disappointment then to realize the PS3 had just 6 cores available out of 8 PEs.

Huge loss.

By today's standards of course all this is outdated junk. Basically the GPUs with their thousands of cores are now the new kid on the block.

That said, never say IBM can't surprise again.
Old 2011-05-12, 12:59   #16
ldesnogu
 
Jan 2008
France
3×199 Posts

Quote:
Originally Posted by diep View Post
Are you counting a multiply-add as two instructions or something?
No, the SPU is a dual-issue in-order CPU. OTOH FP computations (except for reciprocal and reciprocal sqrt) only go through pipe 0, but that doesn't prevent you from doing something else with pipe 1, for instance ld/st. More information in Appendix B of CellBE Handbook.
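
To make the dual-issue point concrete, here is a toy model rather than a cycle-accurate simulator. It assumes only one rule: in program order, an instruction for pipe 0 immediately followed by an instruction for pipe 1 issues as a pair in one cycle, and everything else issues alone. Dependencies, latencies and fetch alignment are deliberately ignored, and the function name is made up for this example.

```python
def issue_cycles(pipes):
    """Count issue cycles for an in-order stream of pipe ids (0 or 1).

    ASSUMPTION: a pipe-0 instruction directly followed by a pipe-1
    instruction dual-issues; any other instruction issues on its own.
    """
    cycles = 0
    i = 0
    while i < len(pipes):
        if i + 1 < len(pipes) and pipes[i] == 0 and pipes[i + 1] == 1:
            i += 2   # both instructions leave in the same cycle
        else:
            i += 1   # single issue
        cycles += 1
    return cycles


fp_only = [0, 0, 0, 0]          # e.g. four FP operations, all on pipe 0
fp_with_ldst = [0, 1, 0, 1]     # FP operations interleaved with loads/stores
print(issue_cycles(fp_only), issue_cycles(fp_with_ldst))
```

Under this model a stream of pure pipe-0 work runs at 1 instruction per cycle, while well-interleaved code approaches 2 per cycle, which matches the distinction being drawn in this post.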

BTW the PS3 has 7 SPUs enabled, but one is reserved when running under Linux.
Old 2011-05-12, 13:13   #17
diep
 
Sep 2006
The Netherlands
1447₈ Posts

Quote:
Originally Posted by ldesnogu View Post
No, the SPU is a dual-issue in-order CPU. OTOH FP computations (except for reciprocal and reciprocal sqrt) only go through pipe 0, but that doesn't prevent you from doing something else with pipe 1, for instance ld/st. More information in Appendix B of CellBE Handbook.

BTW the PS3 has 7 SPUs enabled, but one is reserved when running under Linux.
Are you sure we're talking about the same CELL?

The PS3 is from 2005 and has CELL-1; CELL-2 is from years later.

I remember very well how Sato from the University of Tokyo was complaining loudly about the single-instruction-per-cycle throughput the CELL had, which was his explanation for why it was not outperforming a Core 2 Duo for him.

Now I'm not going to dig up the handbooks online as I don't have time for that, but I'm sure you are referring to CELL-2 rather than CELL-1, and CELL-1 is the one that is in the PS3.

The much improved CELL-2 is the chip that was used for a number of supercomputers, which get roughly 150 Gflops out of each rackmount with 2 of those chips inside. That was quite OK at the time.

Regards,
Vincent
Old 2011-05-12, 13:31   #18
ldesnogu
 
Jan 2008
France
597₁₀ Posts

Quote:
Originally Posted by diep View Post
Are you sure we're talking about the same CELL?

The PS3 is from 2005 and has CELL-1; CELL-2 is from years later.

I remember very well how Sato from the University of Tokyo was complaining loudly about the single-instruction-per-cycle throughput the CELL had, which was his explanation for why it was not outperforming a Core 2 Duo for him.

Now I'm not going to dig up the handbooks online as I don't have time for that, but I'm sure you are referring to CELL-2 rather than CELL-1, and CELL-1 is the one that is in the PS3.

The much improved CELL-2 is the chip that was used for a number of supercomputers, which get roughly 150 Gflops out of each rackmount with 2 of those chips inside. That was quite OK at the time.
Yes, I'm 100% sure of what I'm saying, and I'm talking about the first-generation Cell, not the PowerXCell 8i (the only changes in the latter are improved DP throughput/latency numbers plus a few additional DP instructions for comparison).

If you made the effort of downloading the .pdf you could check for yourself.
Old 2011-05-12, 13:39   #19
nucleon
 
Mar 2003
Melbourne
1003₈ Posts

Quote:
Originally Posted by diep View Post
By today's standards of course all this is outdated junk. Basically the GPUs with their thousands of cores are now the new kid on the block.
Yeah, traditional CPUs for number crunching have had their time in the sun. We're talking more than 1 order of magnitude difference here. Even 10x the performance per watt of CPUs.

Quote:
Originally Posted by diep View Post
That said, never say IBM can't surprise again.
Yeah, the new supercomputer from China with off-the-shelf GPUs beating IBM's machine must have hurt IBM's pride. IBM does a huge amount of research into this. Big hardware is their pride and joy.

I don't discount IBM by any stretch of the imagination.

-- Craig
Old 2011-05-12, 14:34   #20
diep
 
Sep 2006
The Netherlands
327₁₆ Posts

Quote:
Originally Posted by nucleon View Post
Yeah, traditional CPUs for number crunching have had their time in the sun. We're talking more than 1 order of magnitude difference here. Even 10x the performance per watt of CPUs.

Yeah, the new supercomputer from China with off-the-shelf GPUs beating IBM's machine must have hurt IBM's pride. IBM does a huge amount of research into this. Big hardware is their pride and joy.

I don't discount IBM by any stretch of the imagination.

-- Craig
It's not like we didn't see GPUs coming; the bureaucratic and lazy 1st world nations just focked up by not producing enough software for those GPUs.

Several Chinese researchers report that their software runs on both Nvidia and AMD. This is software from even before OpenCL was there. So they have something that works basically at the assembler level of those chips, and today's researchers probably make a lot of use of CUDA and AMD CAL.

The IPCs they report were also always confirmed by the experimental attempts of quite good programmers over here.

Yet only a few bigger companies here invested in GPU technology, and they simply don't want to talk much about it. I'm not the one who is then going to post publicly "company XYZ is massively using GPUs". They do use them.

As I already posted somewhere else, I feel the 1st world nations are the ones hurting themselves most by not investing in manycore technology. This way the 3rd world nations will completely overrun the 1st world nations in research as well.

While some geniuses of their time will be able to predict to some extent what will be important in the future, there are so few of them and science has become so big that the vast majority of progress on this planet by capable people happens by calculating your way into the future. What they see is what they realize, and therefore what they can analyze and take into account.

This is of course a tad lower form of intelligence than that of the first genius group, yet it works and it is what happens in overwhelming quantity.

GPUs can play a large role there, as their bigger calculation capacity allows looking further into the future. So not having focused on programming those GPUs is a big mistake. A mistake that happened mainly in 1st world nations.

Now while the N*SA type organisations will figure out a way to do their calculations (reports from already 10 years ago about a 500-1000 mm^2 chip with a dozen memory controllers and eating hundreds of watts, huh?), the real victims of all this are the students in the 1st world.

They do have the budget to get the GPUs, unlike the vast majority of students in 3rd world nations, yet the GPU companies really limited the 1st world students by not releasing good software support to run fast on those GPUs, or by not releasing enough information. AMD/ATI has the first problem and Nvidia has the second problem more than AMD does, yet both lack the kind of proper documentation of their GPUs that you can get for CPUs. So the students/researchers in the 1st world are not able to look far ahead into the future, and thereby miss an advantage over the 3rd world students, namely a better understanding of science for the average student (again, the geniuses in the 3rd world will always manage of course, as they don't need to be confronted with the data to realize what is out there).

If we look objectively at CPUs, they are ancient technology; the CPU's RAM can basically keep up with the CPU speed, and that with totally ancient RAM. DDR3 is where we have arrived today.

GPUs have already used GDDR5 for several years.

Now I'm not a hardware expert, yet what I do imagine is that it is rather easy to have lots of movement within some silicon. Getting outside of that silicon chip, over copper wires on the mainboard, to some off-chip RAM is of course a tricky and slow journey.

It makes sense to have a chip whose calculation power is far superior to the RAM speed.

GPUs travel that path nowadays. Yet it's not so many generations ago that they started travelling this path. So where CPUs have already been completely optimized to deliver good performance, the manycore technology is still rather new.

I'd argue that the CELL was a bit of an in-between stage. Not a mature manycore, yet already offloading calculations to PEs that deliver more performance. Kind of a hybrid in between a CPU and a manycore GPU.

Of course just raw calculation power is not everything. We also need caches that to some extent keep up with the PE speeds.

Now a question is of course how long GPUs will dominate crunching. That remains to be seen.

But one advantage we can all easily see is that they deliver that crunching dirt cheap, whereas interesting CPUs are really expensive.

If we look at the total production cost of a laptop and then compare that with the price of the CPU inside, it's obvious that CPUs are too expensive and therefore outdated from a crunching viewpoint.

They're not obsolete, as they do have a useful function. Think of the latency they deliver. To some that is important. Think of the ease of writing software for CPUs, of course not for the SSE2 part.

I'd argue it's more complicated to write SSE2 code than it is to write good code for a GPU. Writing code for a GPU is still rather high level. Sure, the understanding needs to be low level, yet the actual writing is high level.

SSE2 really only works well in assembler, as George Woltman here has proven.

SSE2 really extended the life of CPUs for too long, I'd argue.

Last fiddled with by diep on 2011-05-12 at 14:41
Old 2011-05-28, 14:36   #21
chris2be8
 
Sep 2009
2464₁₀ Posts

Quote:
Originally Posted by jasonp View Post
I wonder what EPFL is going to do now; start buying old PS3's on ebay?
Is the software they use for running ECM on PS3s available? I'm thinking of getting a second hand one to put Linux on, but it's a waste of money unless I can get some software I would consider interesting.

I remember a post saying they are much faster than PCs for stage 1, but can only run stage 2 by brute force to low limits. I think it was by Bob Silverman, but I can't find it by searching on PS3 or Playstation.

And can they run any other factoring software better than a PC?

Chris K
Old 2011-05-28, 15:40   #22
diep
 
Sep 2006
The Netherlands
3×269 Posts

Quote:
Originally Posted by chris2be8 View Post
Is the software they use for running ECM on PS3s available? I'm thinking of getting a second hand one to put Linux on, but it's a waste of money unless I can get some software I would consider interesting.

I remember a post saying they are much faster than PCs for stage 1, but can only run stage 2 by brute force to low limits. I think it was by Bob Silverman, but I can't find it by searching on PS3 or Playstation.

And can they run any other factoring software better than a PC?

Chris K
They won't be able to run anything faster than today's PCs.

You're speaking of a PS3 cell chip from the year 2005 or so?

Don't confuse those 2005 PS3 cell chippies with the later CELL2 chip, codenamed PowerXCell 8i, which isn't in the PS3.

This chippie was put by IBM in some supercomputers...

Another huge difference is that the PS3 has 6 PEs available, clocked at 3.17GHz and executing a single instruction per cycle, whereas the supercomputer CELL2 chip has 8 PEs available. This is a huge performance difference.

Modern PC processors such as cheapo AMD 6-core CPUs, or maybe the slightly more expensive Intel i7s with 6 cores, execute hands down 4 flops per cycle per core using SSE2 and newer, clocked at 3.4GHz for little money.
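
The peak figure implied by those numbers is simply cores × flops per cycle × clock. A small sketch, using only the values stated in this post; it gives a theoretical peak, not a measured rate, and the function name is invented for the example.

```python
def peak_gflops(cores, flops_per_cycle, clock_ghz):
    """Theoretical peak throughput in Gflops (no memory or issue stalls)."""
    return cores * flops_per_cycle * clock_ghz


# Values from this post: 6 cores, 4 flops per cycle per core, 3.4 GHz.
desktop = peak_gflops(cores=6, flops_per_cycle=4, clock_ghz=3.4)
print(f"6-core desktop CPU, theoretical peak: {desktop:.1f} Gflops")
```

Real code rarely sustains the theoretical peak, so treat the result as an upper bound when comparing against vendor figures.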

Power consumption is the same as the PS3's.

Yet why compare 2005 technology with 2011 technology? It isn't really fair, you know.

IBM also sells rackmounts with 2 newer CELL processors which according to IBM get around 150 Gflops. Maybe you can get those second hand. It'll be $1200k or so for each rackmount. Similar power consumption to the PS3 though.

Some years ago this PowerXCell 8i wasn't bad.

See for example the Roadrunner supercomputer in Los Alamos, which is using them.

http://www.lanl.gov/

Right now this gets overwhelmed by GPU's. They deliver 0.5 Tflop to 1 Tflop a card.

Regards,
Vincent
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
