#1 |
|
Apr 2003
Berlin, Germany
192 Posts |
Hello,
The following image shows that 189.lucas, which is part of the SPEC 2000 suite and based on the Mlucas code, runs slightly faster on an Athlon 64 (here with 1 MB of L2 cache, like the Opteron) than on a P4 with an 80% clock-frequency advantage (scores of 880 vs. 864): http://www.heise.de/ct/03/01/018/bild.gif
One could try to compile Mlucas for AMD64 platforms using a compiler such as the Portland Group F90 compiler, which I mentioned in The Hardware forum. The only thing that keeps me from testing this is that I have no access to an Opteron system (although entry-level systems start at $1200+). Next year, such x86-64 platforms (maybe also from Intel) will become relevant for distributed computing.
I'm also looking forward to EWM's development of parallel FP/integer calculations for Alpha CPUs, since that approach would also make sense on Opteron. :)
Regards, Matthias |
|
|
|
|
|
#2 |
|
∂2ω=0
Sep 2002
República de California
265678 Posts |
Thanks for the info, Dresdenboy. But some caveats: the FFT code I used for my submission to the SpecFP suite is light-years removed from the one in the current version of Mlucas, so it's not at all clear whether the performance difference seen in your chart will carry over to the current code.
Also, you don't need an f90 compiler anymore, since the latest development version of Mlucas is in C, and is at ftp://hogranch.com/pub/mayer/src/C . I haven't personally built it on an Athlon (just on my P3 using CodeWarrior, which gives a binary that is roughly 50% as fast as Prime95 running on the same machine - no surprise there, since I've done little x86-oriented tuning), but Tom Cage ( k5gj@earthlink.net ) regularly does so.
Re. the mixed float/int code, that has been on hold due to the demands of my work-for-pay job, and the fact that the little Mersenne-related code development time this has left me has mostly gone into coding a fast C-based sieve factorer. I've gotten the factoring code to really blast on the Alpha and Itanium (both of which have excellent 64x64==>128-bit multiply capability), and with the help of Tom C. and especially Klaus Kastens, gotten pretty good performance on the Mac/PPC as well. This effort will also help with the mixed float/int code, though, because figuring out how to do speedy wide integer multiplies on the various platforms is crucial to that.
I'm hopeful that this kind of code could run well on the AMD Opteron, too, since I hear those also have good 64x64==>128-bit multiply capability. They need 4 cycles to get a 128-bit integer product, which is 2x as many as the Alpha and Itanium, but especially in non-factoring code this slight extra cycle count can be hidden by interleaving the integer muls with other integer operations that are going on. |
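For readers unfamiliar with the 64x64==>128-bit multiply discussed above, here is a minimal portable C sketch of what that operation computes. It is not taken from the Mlucas or factoring sources; the type `u128_t` and the function `mul_64x64_128` are hypothetical names chosen for illustration. On a CPU with a hardware wide multiply, a compiler or a few lines of assembly would typically replace all of this with one or two instructions; the version below only assumes 64-bit unsigned arithmetic.

```c
#include <stdint.h>

/* Hypothetical illustration type: a 128-bit value as two 64-bit halves. */
typedef struct {
    uint64_t hi;  /* upper 64 bits of the product */
    uint64_t lo;  /* lower 64 bits of the product */
} u128_t;

/* Full 128-bit product of two unsigned 64-bit integers, built from four
   32x32 -> 64-bit partial products (schoolbook multiplication). */
u128_t mul_64x64_128(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xFFFFFFFFULL, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFULL, b_hi = b >> 32;

    uint64_t p00 = a_lo * b_lo;  /* weight 2^0  */
    uint64_t p01 = a_lo * b_hi;  /* weight 2^32 */
    uint64_t p10 = a_hi * b_lo;  /* weight 2^32 */
    uint64_t p11 = a_hi * b_hi;  /* weight 2^64 */

    /* Sum everything that lands in bits 32..63; the sum is below 2^34,
       so it cannot overflow, and its upper bits are the carry into hi. */
    uint64_t mid = (p00 >> 32) + (p01 & 0xFFFFFFFFULL) + (p10 & 0xFFFFFFFFULL);

    u128_t r;
    r.lo = (mid << 32) | (p00 & 0xFFFFFFFFULL);
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}
```

Calling `mul_64x64_128(x, y)` returns both halves of the product, which is the building block a sieve factorer needs for multiplying residues modulo a 64-bit candidate. The four partial products here are exactly the work that a native wide multiply saves.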
|
|
|
|
|
#3 | |
|
Jan 2003
North Carolina
F616 Posts |
Quote:
|
|
|
|
|
|
|
#4 |
|
Apr 2003
Berlin, Germany
5518 Posts |
Oh, I overlooked that the latest Mlucas sources I saw were C+ASM code.
Well, then we could try the Portland Group's C/C++ compilers, which are in the same workstation compiler suite as their F90 compiler. And since it is free, we could give it a try. But first I have to find out whether these compiler binaries run on standard platforms, because they are compiled for Opteron. If they are meant to run in 32-bit mode (Windows and 32-bit Linux), then they should at least run on P4s, since they may require SSE2. I'll try them today.
As for the 4-cycle latency on AMD64 CPUs, the multiplier is at least pipelined, so a new 64-bit mul can be started every 2 cycles.
Regards, DDB |
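The latency-hiding idea raised in this thread (interleaving multiplies so that a pipelined multiplier stays busy) can be sketched in C as follows. This is an illustration only, not code from Mlucas: the functions `iterate_one` and `iterate_two` and the recurrence x <- x*x + c are hypothetical, and whether a given compiler and CPU actually overlap the two chains is not guaranteed. The cycle counts themselves are the figures quoted in the posts above.

```c
#include <stdint.h>

/* One dependent chain: every multiply needs the previous result, so each
   step pays the full multiply latency. Arithmetic wraps modulo 2^64. */
uint64_t iterate_one(uint64_t x, uint64_t c, int n)
{
    for (int i = 0; i < n; i++)
        x = x * x + c;
    return x;
}

/* Two independent chains advanced in the same loop. The multiply for b
   does not depend on the multiply for a, so a pipelined multiplier can
   start the second one while the first is still in flight. */
void iterate_two(uint64_t *x0, uint64_t *x1, uint64_t c, int n)
{
    uint64_t a = *x0;
    uint64_t b = *x1;
    for (int i = 0; i < n; i++) {
        a = a * a + c;  /* chain 0 */
        b = b * b + c;  /* chain 1, independent of chain 0 */
    }
    *x0 = a;
    *x1 = b;
}
```

Both versions compute the same values; the interleaved form only changes the order in which independent work is presented to the hardware. In a factoring loop the two chains might correspond to two different candidate factors being tested at once.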
|
|
|
|
|
#5 | |
|
Apr 2003
Berlin, Germany
192 Posts |
Quote:
The Itanium 2 with the Madison core will reach speeds of 1.4 to maybe 1.7 GHz by the end of the year. Alpha CPUs also lie in this range, but the Opteron (and especially the smaller-core Athlon 64) will be at 2.4 to 3 GHz by then (if a certain AMD rep's statement is true). Right now we can get 1.8 GHz Opterons (unfortunately the price is a bit higher now than in April because of demand), and 2 GHz parts are coming soon (Cray is already getting 2 GHz chips). Together with the PPC970, this will create a wide base of mainstream 64-bit PCs. Intel will surely follow in the next few years.
The number of 64-bit CPUs in PDAs will also grow, because Microsoft will once again support MIPS in upcoming PocketPC OS releases. If there are enough of them, and the cost of letting them run all the time is acceptable (it shouldn't hurt too much if everything else is switched off), then we could try to find a new user base there. StrongARM users can already use Nick's StrongARM client, so the others could join in.
Regards, Matthias |
|
|
|
|