mersenneforum.org  

2003-05-07, 11:14   #1
Dresdenboy

Apr 2003
Berlin, Germany

19² Posts
Mlucas probably very fast on AMD64 platforms

Hello,

The following image shows that 189.lucas, part of the SPEC 2000 suite and based on Mlucas code, runs a bit faster on an Athlon 64 (here with 1 MB of L2 cache, like the Opteron) than on a P4 with an 80% clock frequency advantage (a score of 880 vs. 864):

http://www.heise.de/ct/03/01/018/bild.gif

One could try to compile Mlucas for AMD64 platforms using compilers like the Portland Group F90 compiler, which I mentioned in the Hardware forum.

The only thing that prevents me from testing this is that I don't have access to an Opteron system (although there are entry-level systems starting at $1200+).

Next year such x86-64 platforms (maybe also from Intel) will become relevant for distributed computing.

I'm also looking forward to EWM's developments in parallel FP/integer calculations for Alpha CPUs, since that would also make sense on Opteron. :)

Regards,
Matthias

2003-05-09, 17:22   #2
ewmayer
2ω=0

Sep 2002
República de California

10110101110111₂ Posts

Thanks for the info, Dresdenboy. But some caveats: the FFT code I used for my submission to the SpecFP suite is light-years removed from the one in the current version of Mlucas, so it's not at all clear whether the performance difference seen in your chart will carry over to the current code.

Also, you don't need an f90 compiler anymore, since the latest development version of Mlucas is in C, and is at ftp://hogranch.com/pub/mayer/src/C . I haven't personally built it on an Athlon (just on my P3 using CodeWarrior, which gives a binary that is roughly 50% as fast as Prime95 running on the same machine - no surprise there, since I've done little x86-oriented tuning), but Tom Cage ( k5gj@earthlink.net ) regularly does so.

Re. the mixed float/int code, that has been on hold due to the demands of my work-for-pay job, and the fact that the little Mersenne-related code development time that this has left me has mostly gone into coding a fast C-based sieve factorer. I've gotten the factoring code to really blast on the Alpha and Itanium (both of which have excellent 64x64==>128-bit multiply capability), and with the help of Tom C. and especially Klaus Kastens, gotten pretty good performance on the Mac/PPC, as well. This effort will also help in the mixed float/int code, though, because figuring out how to do speedy wide integer multiply on the various platforms is crucial to that.

I'm hopeful that this kind of code could run well on the AMD Opteron, too, since I hear those also have good 64x64==>128-bit multiply capability. They need 4 cycles to get a 128-bit integer product, which is 2x as many as the Alpha and Itanium, but especially in non-factoring code this slight extra cycle count can be hidden by interleaving the integer muls with other integer operations that are going on.
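
For readers unfamiliar with the 64x64==>128-bit multiply mentioned above, here is a minimal C sketch of what the operation computes. It is not taken from Mlucas or the factoring code; the names `u128_parts`, `mul_64x64_portable`, `mul_64x64_native` and `mul_64x64_selfcheck` are made up for this illustration. The portable version builds the product from four 32x32==>64-bit partial products, which is roughly what a platform without a wide hardware multiply has to do. The native version assumes a GCC- or Clang-style compiler that provides `unsigned __int128`, which such compilers typically lower to the hardware's widening multiply on 64-bit targets.

```c
#include <assert.h>
#include <stdint.h>

/* Full 128-bit product of two unsigned 64-bit integers, split in halves. */
typedef struct {
    uint64_t lo;  /* low 64 bits of the product  */
    uint64_t hi;  /* high 64 bits of the product */
} u128_parts;

/* Portable version: four 32x32->64-bit partial products. */
static inline u128_parts mul_64x64_portable(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p00 = a_lo * b_lo;
    uint64_t p01 = a_lo * b_hi;
    uint64_t p10 = a_hi * b_lo;
    uint64_t p11 = a_hi * b_hi;

    /* Middle column: three 32-bit values, so the sum cannot overflow. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    u128_parts r;
    r.lo = (mid << 32) | (uint32_t)p00;
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}

#if defined(__SIZEOF_INT128__)
/* Assumption: compiler extension type unsigned __int128 is available. */
static inline u128_parts mul_64x64_native(uint64_t a, uint64_t b)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    u128_parts r = { (uint64_t)p, (uint64_t)(p >> 64) };
    return r;
}
#endif

/* Checks both versions against a product that is easy to verify by hand. */
static inline void mul_64x64_selfcheck(void)
{
    /* (2^64 - 1)^2 = 2^128 - 2^65 + 1, so hi = 2^64 - 2 and lo = 1. */
    u128_parts p = mul_64x64_portable(UINT64_MAX, UINT64_MAX);
    assert(p.hi == UINT64_MAX - 1 && p.lo == 1);
#if defined(__SIZEOF_INT128__)
    u128_parts q = mul_64x64_native(UINT64_MAX, UINT64_MAX);
    assert(q.hi == p.hi && q.lo == p.lo);
#endif
}
```

Calling `mul_64x64_selfcheck()` once at startup is a cheap way to confirm that the two paths agree on a given compiler and platform.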

2003-05-10, 04:36   #3
nomadicus

Jan 2003
North Carolina

2×3×41 Posts

Quote:
Originally Posted by ewmayer
Re. the mixed float/int code, . . . I've gotten the factoring code to really blast on the Alpha and Itanium (both of which have excellent 64x64==>128-bit multiply capability),
I love that Alpha. Thanks again for all your help ewmayer. :D

2003-05-11, 11:09   #4
Dresdenboy

Apr 2003
Berlin, Germany

361₁₀ Posts

Oh, I overlooked that the latest Mlucas sources I saw were C+ASM code.

Well, then we could try the Portland Group's C/C++ compilers, which are in the same workstation compiler suite as their F90 compiler. And since it is free, we could give it a try.

But first I have to find out whether these compiler binaries will run on standard platforms, since they are compiled for Opteron. If they are meant to run in 32-bit mode (Windows and 32-bit Linux), then they should at least run on P4s, since they may need SSE2. I'll try them today.

Despite the 4-cycle latency on AMD64 CPUs, the multiplier is at least pipelined: a new 64-bit mul can be started every 2 cycles.
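
To put rough numbers on the latency-versus-issue-rate point, here is a small C sketch of an idealized cycle model. It assumes a fixed multiply latency and a fixed interval between issuing independent multiplies (4 and 2 cycles, the figures quoted in this thread), and it ignores every other bottleneck such as loads, register pressure or competing instructions. The function names are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* n multiplies where each one needs the previous result:
 * nothing overlaps, so every multiply pays the full latency. */
static inline uint64_t dependent_chain_cycles(uint64_t n, uint64_t latency)
{
    return n * latency;
}

/* n independent multiplies issued back to back: the first result
 * arrives after `latency` cycles, each later one `issue_interval`
 * cycles after its predecessor. */
static inline uint64_t pipelined_cycles(uint64_t n, uint64_t latency,
                                        uint64_t issue_interval)
{
    if (n == 0)
        return 0;
    return latency + (n - 1) * issue_interval;
}

/* Worked example with the thread's figures: 8 multiplies. */
static inline void cycle_model_selfcheck(void)
{
    assert(dependent_chain_cycles(8, 4) == 32);  /* 8 * 4         */
    assert(pipelined_cycles(8, 4, 2) == 18);     /* 4 + 7 * 2     */
}
```

Under this model, eight independent multiplies finish in 18 cycles instead of 32, which is why interleaving independent integer work around the muls can hide much of the extra latency.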

Regards,
DDB

2003-06-12, 09:30   #5
Dresdenboy

Apr 2003
Berlin, Germany

551₈ Posts

Quote:
Originally Posted by ewmayer
I'm hopeful that this kind of code could run well on the AMD Opteron, too, since I hear those also have good 64x64==>128-bit multiply capability. They need 4 cycles to get a 128-bit integer product, which is 2x as many as the Alpha and Itanium, but especially in non-factoring code this slight extra cycle count can be hidden by interleaving the integer muls with other integer operations that are going on.
One addition:
The Itanium 2 with the Madison core will reach speeds of 1.4 to maybe 1.7 GHz by year end. Alpha CPUs also lie in this range, but the Opteron (and especially the smaller-core Athlon 64) will be at 2.4 to 3 GHz by then (if a certain AMD rep's statement is true). Right now we get 1.8 GHz Opterons (unfortunately the price is a bit higher now than in April because of demand) and soon 2 GHz (Cray is already getting 2 GHz chips). Together with the PPC970, this will create a wide base of mainstream 64-bit PCs. Intel will surely follow in the next few years.

Also, the number of 64-bit CPUs in PDAs will grow, because Microsoft will once again support MIPS in coming PocketPC OS releases. If there are enough of them, and the cost of letting them run all the time is OK (it shouldn't hurt too much if everything else is off), then we could try to find a new user base there. StrongARM users can already use Nick's StrongARM client, so the others could join in.

Regards,
Matthias

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.