#45 |
"Mike"
Aug 2002
25·257 Posts |
.
#46 |
Tribal Bullet
Oct 2004
3,541 Posts |
If you compile existing software on the Cell, you will get a binary that spends all of its time running on the PowerPC processor and not on the fancy SPU coprocessor engines. With general-purpose libraries like glucas and mlucas I actually don't anticipate many problems getting something basic up and running. However, don't be surprised if the performance for an LL test is mediocre but not terrible; that PPC core runs at 3.X GHz and can issue multiple instructions per clock cycle. What it will not be is faster than the same code running on a desktop machine.
Porting the code to the coprocessor engines is where everybody wants to go. These can execute a double-precision floating-point operation every 6 cycles, and those operations have a 13-cycle latency. There are other Cell versions that perform DP at full speed, but the PS3 does not contain them (yet?). If you want to know what to expect, go to www.fftw.org and look at their latest library versions, which implement the FFT natively on the Cell processor.
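To make the difference between the 6-cycle issue rate and the 13-cycle latency concrete, here is a rough sketch in C. It assumes a deliberately simple pipeline model (one operation in flight per issue slot, no other stalls), and the function names are made up for illustration; it is not a model of the real SPU scheduler.

```c
#include <assert.h>

/* Cycles for n operations where each one needs the previous result:
   every operation pays the full latency before the next can start. */
unsigned long dependent_chain_cycles(unsigned long n, unsigned latency)
{
    return n * latency;
}

/* Cycles for n mutually independent operations in a simple pipeline:
   a new operation issues every `issue` cycles, and the last result
   arrives `latency` cycles after the last issue. */
unsigned long independent_stream_cycles(unsigned long n, unsigned issue,
                                        unsigned latency)
{
    assert(issue > 0);
    if (n == 0)
        return 0;
    return (n - 1) * issue + latency;
}
```

Under this model, 100 dependent operations take 1300 cycles, while 100 independent ones take 607, which is why code ported to the coprocessor engines would want plenty of independent work in flight.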
#47 | |
Aug 2006
3·1,993 Posts |
Quote:
1. "TF is waaay slower than LL"
2. "The Cell is fast"

In particular, the claim is that the statements are meaningless, and so impossible to prove or disprove and of no value as an opinion. I agree that #1 seems to have no meaning, and I believe that this is what retina means by saying "apples to oranges". #2 seems meaningful to me, and I can list some senses in which it is fast:

2a. The Cell has 8 cores, more than the usual 2009.01 processor
2b. The Cell is clocked at 3.2 GHz, more than the usual 2009.01 processor

Now how meaningful these metrics are depends on how much work the Cell can do per cycle. I've seen GPGPU benchmarks with a single:double ratio as low as 5:1 to 4:1, implicitly claiming to emulate double precision in less than 5 cycles. If that were the case for the Cell, it would have > 5 Gflop/s, which is pretty nice. If it takes more like 12-14 cycles, we're talking about 2 Gflop/s, which isn't nearly as nice.
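The back-of-the-envelope numbers in this post can be reproduced with a few lines of C. This is only a sketch of the arithmetic, assuming the 8 cores and 3.2 GHz quoted in the post and treating "cycles per double-precision operation" as a single flat figure; `peak_gflops` is a name invented for the example.

```c
#include <assert.h>

/* Aggregate rate in Gflop/s = cores * clock (GHz) / cycles per DP operation.
   A flat estimate: it ignores memory traffic, SIMD width and scheduling. */
double peak_gflops(int cores, double clock_ghz, double cycles_per_op)
{
    assert(cores > 0 && clock_ghz > 0.0 && cycles_per_op > 0.0);
    return cores * clock_ghz / cycles_per_op;
}
```

With these inputs, `peak_gflops(8, 3.2, 5.0)` gives 5.12, and 12 to 14 cycles per operation gives roughly 2.13 down to 1.83, consistent with the "> 5 Gflop/s" and "about 2 Gflop/s" figures.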
#48 | |
Undefined
"The unspeakable one"
Jun 2006
My evil lair
1100000110100₂ Posts |
Quote:
#49 | ||
Aug 2006
3×1,993 Posts |
Quote:
Quote:
I wanted to distinguish that claim (meaningful, if not fully described) from the other claim about TF being slower than LL, which I don't understand at all. There could be a sensible interpretation for that, but none come to mind.
#50 | |
Dec 2008
833₁₀ Posts |
Quote:
Last fiddled with by flouran on 2009-01-13 at 15:15 |
#51 |
∂²ω=0
Sep 2002
República de California
103·113 Posts |
I've put a WinZip archive of the current Mlucas code snapshot on John Pierce's ftp server:
http://hogranch.com/mayer/src/Mlucas_01.13.2009.zip

Here is the build sequence (yes, Mom, it's true - I don't use makefiles) I use to build Mlucas sans SSE2 support (i.e. just the generic-C version) under GCC 4.2 in 32-bit mode - the middle 2 steps are required because a small subset of files need to be compiled at a lower -O1 optimization level in order for GCC to not optimize away the desired functionality of certain (non-performance-critical) routines:

gcc -c -Wall -O3 -m32 *.c
rm -f rng*.o util.o qfloat.o
gcc -c -Wall -O1 -m32 rng*.c util.c qfloat.c
gcc -m32 -o Mlucas *.o -lm

Assuming the Cell platform looks like a PPC + extra stuff to the compiler, you shouldn't have to do much tweaking of the platform-specific #defines in the code, but if you do run into any problems, it's probably a good idea to check the platform identification part of the code, as follows:

At the top of platform.h you'll see this:

Code:
/* Only one of the following 3 should be set = 1 at any time. If > 1 is set, only the first onesuch will be respected. */

/* Set = 1 to print brief OS summary at compile time and exit: */
#undef OS_DEBUG
#define OS_DEBUG 0

/* Set = 1 to print brief OS summary at compile time and exit: */
#undef CPU_DEBUG
#define CPU_DEBUG 0

/* Set = 1 to print brief OS summary at compile time and exit: */
#undef CMPLR_DEBUG
#define CMPLR_DEBUG 0

gcc -c -Wall -O3 -m32 util.c

Good luck, let us know how it goes.
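For anyone wondering what "platform identification" looks like in practice, here is a small, hypothetical sketch of how C code can tell a PowerPC build from an x86 one using GCC's predefined macros. It is not taken from platform.h; `cpu_family` and `print_cpu_summary` are names invented for this example, and the real Mlucas logic may well differ.

```c
#include <assert.h>
#include <stdio.h>

/* Map GCC's predefined architecture macros to a readable family name.
   Illustrative only: this is not the detection logic in platform.h. */
const char *cpu_family(void)
{
#if defined(__powerpc__) || defined(__ppc__) || defined(__PPC__)
    return "PowerPC";
#elif defined(__x86_64__) || defined(__i386__)
    return "x86";
#elif defined(__aarch64__) || defined(__arm__)
    return "ARM";
#else
    return "unknown";
#endif
}

/* Print a one-line summary, e.g. to stderr at program start. */
void print_cpu_summary(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "CPU family: %s\n", cpu_family());
}
```

If a build on the Cell reports something other than a PowerPC family, that would be a hint to look at the platform #defines. To see which macros the compiler actually predefines, `gcc -dM -E - < /dev/null` dumps the full list.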
#52 |
Aug 2006
13533₈ Posts |
Thanks for the instructions (and support!), Ernst.
#53 |
Aug 2008
126₈ Posts |
I'm anxious to see if you can get this to work. I just got my PS3 this week, and have noticed the huge push in the PS3 community to run Folding during your unused cycles.
If TF assignments are efficient enough, I may be interested in using my PS3 during off hours for TF. I'm not volunteering to do any real bring-up work; I just don't have the time to spend that way.
#54 | |
Dec 2008
7²×17 Posts |
Quote:
Side note: I will be posting my progress in this thread while getting Mlucas running on my PS3 this weekend. Thanks again, Ernst!
#55 |
∂²ω=0
Sep 2002
República de California
10110101110111₂ Posts |
Note that you can ignore all of the following kinds of warnings, which you will likely see a lot of:

1. signed/unsigned int - Haven't had time to clean up all these, but none indicates a real erroneous misuse.
2. negative shift count warnings - These are bogus, related to compiler inlining speculative pre-execution in a place where it is not appropriate;
3. "type-punning aliasing" - the code makes deliberate use of this in the quad-float emulation and the floating-point random-number generator.
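As background on the third kind of warning, here is a minimal C sketch of what "type punning" means: reading the bits of a double as an integer. The pointer-cast form (shown only in a comment) is the kind of construct GCC warns about under its strict-aliasing rules; the `memcpy` form is a common way to get the same bits without the warning. This is a generic illustration that assumes 64-bit IEEE 754 doubles. `double_to_bits` is a made-up name and the snippet is not code from Mlucas.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pointer-cast punning, the style that draws the aliasing warning:
 *     uint64_t bits = *(uint64_t *)&x;
 * It is left in a comment here so the example compiles cleanly. */

/* Same result via memcpy, which copies the object representation
   byte by byte and does not violate the aliasing rules. */
uint64_t double_to_bits(double x)
{
    uint64_t bits;
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

For example, on an IEEE 754 platform `double_to_bits(1.0)` yields 0x3FF0000000000000: sign 0, biased exponent 0x3FF, mantissa 0.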