mersenneforum.org  

2003-12-07, 18:20   #1
MadMac

2²×3²×5² Posts
Opteron/Athlon64 Performance

Has anyone noticed how abysmal the GIMPS client's performance is on the Opteron or Athlon64? It is about 2-2.5 times slower than an equivalent P4! So much for a broadly optimized x86 client. I suspect it isn't detecting/using the SSE2 extensions in these processors. Does anyone know if this is the case, or if there is a fix in the works?

Thanks,
Sean
 
2003-12-07, 22:30   #2
only_human

"Gang aft agley"
Sep 2002

2×1,877 Posts

What I know about the state of development of the GIMPS client code for the Opteron, I learned from reading this thread through to the end: Let's buy GIMPS an Opteron! (page 8)
As far as I can tell there are problems at the moment in keeping the FPU fully utilized. Also there have been some problems with certain tools used for timing and profiling.
I am certain there is no misunderstanding about the availability of SSE2 extensions in AMD's 64-bit processors. This thread has some timing figures for an Athlon 64: http://www.mersenneforum.org/showthr...highlight=SSE2
You will see that the client detected the SSE2 extensions for the CPU:
Quote:
AMD Athlon(tm) 64 Processor 3200+
CPU speed: 2362.53 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2
If you are running the PRIME95 client, you can check if the program is aware of SSE2 by examining the CPU settings and information in the options menu.
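
For anyone who would rather check a saved copy of that report than eyeball it, here is a minimal Python sketch. It only parses a "CPU features:" line of the form quoted above; the function names are made up for illustration and nothing here is part of Prime95 itself.

```python
def parse_cpu_features(line):
    """Split a 'CPU features: A, B, C' line into a set of upper-case names."""
    # Everything after the first colon is the comma-separated feature list.
    _, _, tail = line.partition(":")
    return {name.strip().upper() for name in tail.split(",") if name.strip()}


def has_sse2(line):
    """Return True only if SSE2 is listed as its own entry."""
    return "SSE2" in parse_cpu_features(line)


# The feature line quoted in this post for the Athlon 64 3200+.
reported = "CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE, SSE2"
print(has_sse2(reported))
```

Comparing whole entries rather than searching for a substring keeps a plain "SSE" entry from being mistaken for "SSE2".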

Tuning the client for maximum performance on these chips may take some time. The GIMPS clients are famously well written to utilize CPU resources. It might be the superb coding of the P4 code that makes current Opteron timings look poor by comparison -- IMO

--Ross

Last fiddled with by only_human on 2003-12-07 at 22:32
2003-12-07, 22:35   #3
Complex33

Aug 2002
Texas

9B₁₆ Posts

MadMac

The performance of Opteron and Athlon64 chips has been widely discussed on this forum. As for the optimization, there are few programs out there that have the level of optimization that Prime95 has, due to the tireless hand tuning done by George Woltman. With respect to the Opteron, the GIMPS community has been gracious enough to purchase George such a system to work on optimizations, as he previously did not have access to such a machine. I would suggest searching the forum for threads related to Opterons and Athlons.

Last fiddled with by Complex33 on 2003-12-07 at 22:36
2003-12-08, 11:29   #4
QuintLeo

Oct 2002
Lost in the hills of Iowa

2⁶×7 Posts

The reason the Athlon performs relatively poorly is that the SSE2 enhancements to the P4 are almost tailor-made to GIMPS usage. It's one of the VERY FEW places where the P4 does better per MHz than the Athlon. Note that the Athlon does not, never has, and never will support SSE2.

The Opteron does support SSE2, but it falls behind the P4 due to clock speed: the fastest Opteron is about 40% slower than the fastest P4, and a little less slow than that on Prime. This gap might narrow once George gets the time to add optimizations for the Opteron, but the clock speed will still be a handicap unless AMD manages to ramp it up faster than Intel has managed to ramp up P4 clock speeds.

Narrow-focus distributed projects tend to have wide performance variations, *especially* those that are well-optimised. They tend to rely on a VERY NARROW subset of the instruction set of a modern processor, which different processors implement very differently. For example, the Distributed.Net RC5 client relies VERY heavily on a specific rotate instruction. The Athlon happens to have that rotate set up in hardware, while the P4 uses microcode for the same instruction, which makes the Athlon a lot faster on RC5. In general usage, though, that particular instruction is very little used.
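
To make the rotate point concrete, here is a rough Python sketch of a 32-bit rotate-left and the RC5-style mixing step built on it. This is only an illustration of the operation, assuming the standard RC5 round form; it is not code from the Distributed.Net client, and the function names are mine.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def rotl32(value, count):
    """Rotate a 32-bit value left by count bits (count is taken mod 32)."""
    count %= 32
    value &= MASK32
    # Bits shifted out of the top re-enter at the bottom.
    return ((value << count) | (value >> (32 - count))) & MASK32


def rc5_half_round(a, b, subkey):
    """One RC5-style mixing step: XOR, rotate by a data-dependent amount, add."""
    return (rotl32(a ^ b, b) + subkey) & MASK32


print(hex(rotl32(0x12345678, 8)))
```

The rotate amount comes from the data itself (the low bits of b), so a processor cannot substitute a fixed shift. A key-search client repeats this step an enormous number of times, which is why the cost of that single instruction dominates.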

For Prime usage, the advantage of SSE2 is the more numerous floating point registers coupled with the ability to perform actions on more registers at one time, which makes the FFT work Prime does a lot faster. The Athlon is faster at a lot of FP work, but if the limits of SSE2 can be worked within, SSE2 blows away Athlon FP performance.
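
As a rough picture of the kind of FFT work involved, here is a textbook FFT-based squaring in plain Python. It is a sketch only: Prime95's real code is hand-tuned assembly and uses a far more refined transform, and every name below is made up for illustration. The add/subtract pair in the butterfly loop is the sort of double-precision arithmetic that SSE2 can issue on two values per 128-bit register.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: one complex multiply, then an add/subtract pair.
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def square_via_fft(number):
    """Square a non-negative integer by convolving its decimal digits."""
    digits = [int(ch) for ch in reversed(str(number))]
    size = 1
    while size < 2 * len(digits):
        size *= 2  # pad so the cyclic convolution does not wrap around
    padded = digits + [0] * (size - len(digits))
    spectrum = fft(padded)
    squared = fft([x * x for x in spectrum], invert=True)
    # Undo the FFT scaling and round away floating-point noise.
    coeffs = [round(c.real / size) for c in squared]
    return sum(coeff * 10 ** pos for pos, coeff in enumerate(coeffs))


print(square_via_fft(12345) == 12345 ** 2)
```

The final rounding step only works while the accumulated floating-point error stays well below 0.5, which is why double precision matters so much for this kind of work.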

If the AltiVec unit in the G4/G5/recent PowerPC CPUs handled double-precision FP work, *that* CPU would be massively faster on Prime work (presuming an AltiVec-enabled client existed) than even the P4.

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.