mersenneforum.org  

Old 2009-07-12, 05:38   #1
ixfd64
Bemusing Prompter
 
"Danny"
Dec 2002
California

2×5×239 Posts
Prime95 25.9 on a Beowulf cluster?

Does anyone know how well Prime95 (version 25.9) would perform on a Beowulf cluster? I know that LL tests aren't very parallelizable unless each core is running an instance of such a test.

Is it possible to run Prime95 on a Beowulf cluster efficiently, or will the cores need to constantly communicate with the server node?
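
For readers wondering why a single LL test is hard to spread across cores, here is a minimal Python sketch of the Lucas-Lehmer iteration. It is not Prime95's code (Prime95 does the squaring with heavily optimized floating-point FFTs); the function name `lucas_lehmer` is only illustrative. The point is that every iteration needs the result of the previous one, so the loop itself is a serial chain.

```python
def lucas_lehmer(p):
    """Return True if the Mersenne number 2**p - 1 is prime.

    Assumes p is an odd prime (p = 2 would need a special case).
    """
    m = (1 << p) - 1          # the Mersenne number being tested
    s = 4                     # standard starting value
    for _ in range(p - 2):    # p - 2 squarings, each depending on the last
        s = (s * s - 2) % m
    return s == 0


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        print(p, lucas_lehmer(p))
```

In that short list only p = 11 fails the test, since 2^11 - 1 = 2047 = 23 × 89. The only work that can be split up within one test is the big multiplication inside each iteration, which is why running one independent test per core is the usual approach.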
Old 2009-07-24, 02:04   #2
hj47
 
Oct 2008

26 Posts

This may be worth a read:

http://www.beowulf.org/archive/2008-June/021612.html
Old 2009-07-24, 12:50   #3
Mini-Geek
Account Deleted
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

1000010101011₂ Posts

Quote:
Originally Posted by hj47
Quote:
AMD is by far fastest right now for integer multiplication (oldie K8 beating core2 amazingly).

Would a modern AMD with an optimized integer multiplication algorithm be faster than it would with George's optimized floating point arithmetic?
Old 2009-07-24, 20:24   #4
lfm
 
Jul 2006
Calgary

5²×17 Posts

Quote:
Originally Posted by Mini-Geek
Would a modern AMD with an optimized integer multiplication algorithm be faster than it would with George's optimized floating point arithmetic?

Um, kinda hard to tell, since GW's code has been under heavy optimization for over 10 years, and we haven't seen an all-integer implementation that we can even try yet, let alone one that's been optimized for 10 years. Most people who seem capable of producing one seem to think it can't be done, so it won't get done.

Actually, I guess there may have been a few integer FFT implementations out there (though not full Mersenne prime tests), but they just haven't been competitive yet. They may never be.

Another problem is that the latest AMD CPUs (Phenom II) have better floating-point performance, so the exploitable difference is vanishing. This further reduces the hardware base to just a generation or two of one manufacturer's product (mainly the Athlon X2s, I think), making the 10-year effort even less likely.

As you can imagine, this whole topic can get rather controversial and erupt into flame wars, since there are people who are vehemently attached to various brands. Please tread carefully.
Old 2009-07-24, 22:49   #5
cheesehead
 
"Richard B. Woods"
Aug 2002
Wisconsin USA

2²·3·641 Posts

Integer-arithmetic FFT implementations have been discussed many times in this forum ever since it started. (Do a search!)

There's a thread in the Software subforum right now!

"Faster Lucas-Lehmer test using integer arithmetic?"

http://mersenneforum.org/showthread.php?t=11243
Old 2009-07-29, 17:54   #6
ixfd64
Bemusing Prompter
 
"Danny"
Dec 2002
California

2×5×239 Posts

I'm sorry if I wasn't too clear in my question.

I meant, if I were to install Prime95 on a cluster (something like this), would it run a separate instance on each core, as with a normal computer? Also, would each node need to constantly communicate with each other?
Old 2009-07-30, 06:23   #7
lfm
 
Jul 2006
Calgary

5²·17 Posts

Quote:
Originally Posted by ixfd64
I'm sorry if I wasn't too clear in my question.

I meant, if I were to install Prime95 on a cluster (something like this), would it run a separate instance on each core, as with a normal computer? Also, would each node need to constantly communicate with each other?

Each node would be separate.
Old 2009-07-30, 08:39   #8
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

3²·5·107 Posts

Quote:
Originally Posted by ixfd64
I'm sorry if I wasn't too clear in my question.

I meant, if I were to install Prime95 on a cluster (something like this), would it run a separate instance on each core, as with a normal computer? Also, would each node need to constantly communicate with each other?

For the optimal settings:
Yes, it should run a separate instance on each core.
No, with that setup there is no inter-process communication involved.

Luigi
Old 2009-07-30, 19:56   #9
ixfd64
Bemusing Prompter
 
"Danny"
Dec 2002
California

2×5×239 Posts

This may be a stupid question, but is there a limit on how many cores Prime95 can handle? I reckon it's going to have some difficulties when it has to display 50 worker windows.
Old 2009-07-30, 20:04   #10
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

12CF₁₆ Posts

Quote:
Originally Posted by ixfd64
This may be a stupid question, but is there a limit on how many cores Prime95 can handle? I reckon it's going to have some difficulties when it has to display 50 worker windows.

Before Prime95 reaches 50 worker windows it will have some difficulties with IPC and efficient shared memory use...

Luigi
Old 2009-07-30, 22:00   #11
ewmayer
2ω=0
 
Sep 2002
República de California

10110101110111₂ Posts

Quote:
Originally Posted by Mini-Geek
Would a modern AMD with an optimized integer multiplication algorithm be faster than it would with George's optimized floating point arithmetic?

Short answer: No.

Short explanation: 64-bit pure-int arithmetic gives only a modest accuracy improvement (on the order of 10%) over DP floating-point, and typically requires more arithmetic operations due to the need for modding intermediate results, thus negating any benefit of being able to use a slightly shorter transform length. A Core2-style chip can do roughly 2x as many DP floating-point ops per cycle (mainly adds and muls in some ratio typical of transform arithmetic, e.g. ~2 adds for every mul) as any commodity CPU can do 64-bit integer arithmetic ops.

The coming Intel AVX (256-bit-wide vector ops) units, assuming they will eventually be available in quad-pumped DP-float fashion, will only magnify the above disparity.
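
As a rough illustration of the "modding intermediate results" point, here is a small Python sketch comparing one transform butterfly in floating point with the same butterfly in modular integer arithmetic. This is a toy, not code from any real LL program: the function names are made up, the modulus `Q = 2**61 - 1` is just a convenient prime picked for the example, and Python's big integers hide the fact that real 64-bit code would also need a double-width product before reducing.

```python
Q = 2**61 - 1  # illustrative prime modulus, chosen only for this sketch


def float_butterfly(a, b, w):
    """Floating-point butterfly: one multiply, two adds, no reductions."""
    t = w * b
    return a + t, a - t


def mod_butterfly(a, b, w, q=Q):
    """Same butterfly over integers mod q.

    Every intermediate result must be reduced back into range, so the
    one multiply and two adds each carry an extra reduction step.
    """
    t = (w * b) % q
    return (a + t) % q, (a - t) % q


if __name__ == "__main__":
    print(float_butterfly(1.0, 2.0, 0.5))
    print(mod_butterfly(5, 3, 2, q=17))
```

The modular version does the same three arithmetic operations plus three reductions, which is the kind of overhead that eats up the gain from a slightly shorter transform length.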

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.