mersenneforum.org  

Old 2009-01-13, 06:54   #45
Xyzzy
 
"Mike"
Aug 2002

2⁵·257 Posts
Default

.
Attached Images
 
Old 2009-01-13, 14:17   #46
jasonp
Tribal Bullet
 
Oct 2004

3,541 Posts
Default

If you compile existing software on the Cell, you will get a binary that spends all of its time running on the PowerPC processor and not on the fancy SPU coprocessor engines. With general-purpose code like Glucas and Mlucas I actually don't anticipate many problems getting something basic up and running. However, don't be surprised if the performance for an LL test is mediocre but not terrible; that PPC core runs at 3.X GHz and can issue multiple instructions per clock cycle. What it will not be is faster than the same code running on a desktop machine.

Porting the code to the coprocessor engines is where everybody wants to go. These can execute a double-precision floating-point operation every 6 cycles, and those operations have a 13-cycle latency. There are other Cell versions that perform DP at full speed, but the PS3 does not contain them (yet?).

If you want to know what to expect, go to www.fftw.org and look at their latest library versions, which implement the FFT natively on the Cell processor.
Old 2009-01-13, 14:19   #47
CRGreathouse
 
Aug 2006

3·1,993 Posts
Default

Quote:
Originally Posted by flouran View Post
Do some research then if you are unsure... Besides, your comment, "And your comparisons are all apples to oranges anyway so I'm not sure why I bother." was unnecessary and if you have more things to say like that please don't post (on this thread at least) unless you wanna contribute to running MPrime on a PS3. By the way, I wasn't really trying to prove anything; I simply stated my own opinion, it was YOU who wanted my statement to be either proved or disproved.
retina was asking for clarification of your statements:
1. "TF is waaay slower than LL"
2. "The Cell is fast"

In particular, the claim is that the statements are meaningless and so impossible to prove/disprove and of no value as an opinion. I agree that #1 seems to have no meaning, and I believe that this is what retina means by saying "apples to oranges". #2 seems meaningful to me, and I can list some senses in which it is fast:
2a. The Cell has 8 cores, more than in the usual 2009.01 processor
2b. The Cell is clocked at 3.2 GHz, more than the usual 2009.01 processor

Now how meaningful these metrics are depends on how much work the Cell can do per cycle. I've seen GPGPU benchmarks with a single:double ratio as low as 5:1 to 4:1, implicitly claiming to emulate double precision in less than 5 cycles. If that was the case for the Cell, it would have > 5 Gflop/s, which is pretty nice. If it takes more like 12-14 cycles, we're talking about 2 Gflop/s, which isn't nearly as nice.
Old 2009-01-13, 14:32   #48
retina
Undefined
 
"The unspeakable one"
Jun 2006
My evil lair

1100000110100₂ Posts
Default

But there is more to it than simply the number of clocks per DP and total clock rate, there is also the issue of memory cycles and how to get data to/from the processors. And lots of other finicky details about internal things. This is why I asked "fast at [doing] what?" It might be perfectly fast at computing a Mandelbrot set but totally stupid at decent sized LL tests with large memory transfer requirements. Plus, just simply stating "Cell is fast" is meaningless without some type of comparison. A snail can also be said to be fast, all you need is the right comparison.
Old 2009-01-13, 14:51   #49
CRGreathouse
 
Aug 2006

3×1,993 Posts
Default

Quote:
Originally Posted by retina View Post
But there is more to it than simply the number of clocks per DP and total clock rate, there is also the issue of memory cycles and how to get data to/from the processors.
I absolutely agree. I've asked the same question myself on this thread (posts #24 and #28).

Quote:
Originally Posted by retina View Post
This is why I asked "fast at [doing] what?" It might be perfectly fast at computing a Mandelbrot set but totally stupid at decent sized LL tests with large memory transfer requirements. Plus, just simply stating "Cell is fast" is meaningless without some type of comparison. A snail can also be said to be fast, all you need is the right comparison.
flouran claimed that the Cell is fast, and people generally understand what "fast" means (here, something like 'fast relative to current PCs at doing floating-point work'). Further, flouran quoted some numbers -- not in a useful form like double-precision Gflop/s, admittedly, but enough to get a ballpark.

I wanted to distinguish that claim (meaningful, if not fully described) from the other claim about TF being slower than LL, which I don't understand at all. There could be a sensible interpretation for that, but none come to mind.
Old 2009-01-13, 15:14   #50
flouran
 
Dec 2008

833₁₀ Posts
Talking

Quote:
Originally Posted by retina View Post
But there is more to it than simply the number of clocks per DP and total clock rate, there is also the issue of memory cycles and how to get data to/from the processors. And lots of other finicky details about internal things. This is why I asked "fast at [doing] what?" It might be perfectly fast at computing a Mandelbrot set but totally stupid at decent sized LL tests with large memory transfer requirements. Plus, just simply stating "Cell is fast" is meaningless without some type of comparison. A snail can also be said to be fast, all you need is the right comparison.
That's true. Speed is relative. By the way, what do you guys want to know from my experiment this weekend (or the next) before I actually conduct it? As in, what would you like me to try on my PS3 after installing Yellow Dog and report back on this thread?

Last fiddled with by flouran on 2009-01-13 at 15:15
Old 2009-01-13, 21:19   #51
ewmayer
2ω=0
 
Sep 2002
República de California

103·113 Posts
Default

I've put a WinZip archive of the current Mlucas code snapshot on John Pierce's ftp server:

http://hogranch.com/mayer/src/Mlucas_01.13.2009.zip

Here is the build sequence (yes, Mom, it's true - I don't use makefiles) I use to build Mlucas sans SSE2 support (i.e. just the generic-C version) under GCC 4.2 in 32-bit mode. The middle two steps are required because a small subset of files needs to be compiled at the lower -O1 optimization level so that GCC does not optimize away the desired functionality of certain (non-performance-critical) routines:
Code:
gcc -c -Wall -O3 -m32 *.c
rm -f rng*.o util.o qfloat.o
gcc -c -Wall -O1 -m32 rng*.c util.c qfloat.c
gcc -m32 -o Mlucas *.o -lm

Assuming the Cell platform looks like a PPC + extra stuff to the compiler, you shouldn't have to do much tweaking of the platform-specific #defines in the code, but if you do run into any problems, it's probably a good idea to check the platform identification part of the code, as follows: At the top of platform.h you'll see this:
Code:
/* Only one of the following 3 should be set = 1 at any time.
   If > 1 is set, only the first onesuch will be respected. */
/* Set = 1 to print brief OS summary at compile time and exit: */
#undef	OS_DEBUG
#define	OS_DEBUG	0

/* Set = 1 to print brief OS summary at compile time and exit: */
#undef	CPU_DEBUG
#define	CPU_DEBUG	0

/* Set = 1 to print brief OS summary at compile time and exit: */
#undef	CMPLR_DEBUG
#define	CMPLR_DEBUG	0
To check which OS, CPU and compiler the code thinks it's dealing with, set the relevant #define to 1 (only one may be set at a time). For example, to see the CPU self-identifier result, set CPU_DEBUG to 1, save the file, and build one source file; the build will stop with a preprocessor #error message containing the desired info. For instance:

gcc -c -Wall -O3 -m32 util.c

Good luck, let us know how it goes.
Old 2009-01-13, 21:47   #52
CRGreathouse
 
Aug 2006

13533₈ Posts
Default

Thanks for the instructions (and support!), Ernst.
Old 2009-01-13, 22:35   #53
uigrad
 
Aug 2008

126₈ Posts
Default

I'm anxious to see if you can get this to work. I just got my PS3 this week, and have noticed the huge push in the PS3 community to run Folding during your unused cycles.

If TF assignments are efficient enough, I may be interested in using my PS3 during off hours for TF. I'm not volunteering to do any real bring-up work, though; I just don't have the time to spend that way.
Old 2009-01-13, 23:26   #54
flouran
 
Dec 2008

7²×17 Posts
Talking

Thank you very much for the code, Ernst, I will try it out and report back. Let's hope I can get somewhere...
Sidenote: I will be posting my progress on this thread while I try to get Mlucas running on my PS3 this weekend.
Thanks again, Ernst!
Old 2009-01-13, 23:38   #55
ewmayer
2ω=0
 
Sep 2002
República de California

10110101110111₂ Posts
Default

Note that you can ignore all of the following kinds of warnings, which you will likely see a lot of:

1. signed/unsigned int - I haven't had time to clean up all of these, but none indicates a real misuse.

2. negative shift count warnings - These are bogus, related to compiler inlining speculative pre-execution in a place where it is not appropriate.

3. "type-punning aliasing" - The code makes deliberate use of this in the quad-float emulation and the floating-point random-number generator.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.