Prime95 25.9 on a Beowulf cluster?
Does anyone know how well Prime95 (version 25.9) would perform on a Beowulf cluster? I know that LL tests aren't very parallelizable unless each core is running an instance of such a test.
Is it possible to run Prime95 on a Beowulf cluster efficiently, or will the cores need to constantly communicate with the server node? |
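For readers wondering why a single LL test resists being split across nodes, here is a minimal sketch of the Lucas-Lehmer recurrence in plain Python. It is only an illustration using Python's built-in big integers, not Prime95's FFT-based code, and the function name `lucas_lehmer` is made up for this example. Each pass through the loop needs the result of the previous pass, so the iterations themselves cannot be handed out to different machines; only whole tests on different exponents can.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1          # the Mersenne number being tested
    s = 4                     # standard starting value of the sequence
    for _ in range(p - 2):
        # Every step depends on the s produced by the step before it,
        # which is what makes one test inherently sequential.
        s = (s * s - 2) % m
    return s == 0


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        print(p, lucas_lehmer(p))
```

For real exponents the squaring dominates the cost, and that is the step Prime95 does with floating-point FFTs.
|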
This may be worth a read:
[url]http://www.beowulf.org/archive/2008-June/021612.html[/url] |
[quote=hj47;182460]This may be worth a read:
[URL]http://www.beowulf.org/archive/2008-June/021612.html[/URL][/quote] [quote]AMD is by far fastest right now for integer multiplication (oldie K8 beating core2 amazingly). [/quote]Would a modern AMD with an optimized integer multiplication algorithm be faster than it would with George's optimized floating point arithmetic? |
[QUOTE=Mini-Geek;182521]Would a modern AMD with an optimized integer multiplication algorithm be faster than it would with George's optimized floating point arithmetic?[/QUOTE]
Um, kind of hard to tell. GW's code has been under heavy optimization for over 10 years, and we haven't yet seen an all-integer implementation that we can even try, let alone one that's been optimized for 10 years. Most people who seem capable of producing one seem to think it can't be done, so it won't get done. Actually, I guess there may have been a few integer FFT implementations out there (though not full Mersenne prime tests), but they just haven't been competitive yet. They may never be.
Another problem is that the latest AMD CPUs (Phenom II) have better floating-point performance, so the exploitable difference is vanishing. This further reduces the hardware base to just a generation or two of one manufacturer's product (mainly the Athlon X2s, I think), making the 10-year effort even less likely.
As you can imagine, this whole topic can get rather controversial and erupt into flame wars, since there are people who are vehemently attached to various brands. Please tread carefully. |
Integer-arithmetic FFT implementations have been discussed many times in this forum ever since it started. (Do a search!)
There's a thread in the Software subforum right now! "Faster Lucas-Lehmer test using integer arithmetic?" [url]http://mersenneforum.org/showthread.php?t=11243[/url] |
I'm sorry if I wasn't too clear in my question.
I meant, if I were to install Prime95 on a cluster (something like [url=http://www.flickr.com/photos/aussierupe/174642404]this[/url]), would it run a separate instance on each core, as with a normal computer? Also, would each node need to constantly communicate with each other? |
[QUOTE=ixfd64;183315]I'm sorry if I wasn't too clear in my question.
I meant, if I were to install Prime95 on a cluster (something like [url=http://www.flickr.com/photos/aussierupe/174642404]this[/url]), would it run a separate instance on each core, as with a normal computer? Also, would each node need to constantly communicate with each other?[/QUOTE] Each node would be separate. |
[QUOTE=ixfd64;183315]I'm sorry if I wasn't too clear in my question.
I meant, if I were to install Prime95 on a cluster (something like [url=http://www.flickr.com/photos/aussierupe/174642404]this[/url]), would it run a separate instance on each core, as with a normal computer? Also, would each node need to constantly communicate with each other?[/QUOTE]
For the optimal settings:
Yes, it should run a separate instance on each core.
No, for that setting there is no implied inter-process communication.
Luigi |
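The "separate instance on each core, no inter-process communication" arrangement described above can be sketched roughly as follows. This is an analogy written with Python's standard `concurrent.futures` module, not a description of how Prime95 is built; `ll_is_prime` and `run_workers` are hypothetical names, and the exponents are tiny ones chosen so the example finishes instantly. Each worker process gets its own exponent, runs the whole test by itself, and only reports a result at the end.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def ll_is_prime(p: int) -> bool:
    """Stand-in workload: Lucas-Lehmer test of 2**p - 1 for an odd prime p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def run_workers(exponents, workers=None):
    """Hypothetical helper: hand each exponent to an independent process.

    The workers share no state and never talk to one another; the only
    data that moves is the exponent going in and the verdict coming out.
    """
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(zip(exponents, pool.map(ll_is_prime, exponents)))


if __name__ == "__main__":
    print(run_workers([3, 5, 7, 11, 13, 17, 19, 23]))
```

On a cluster the same idea applies one level up: every node runs its own copy with its own work, so the interconnect between nodes is not a bottleneck.
|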
This may be a stupid question, but is there a limit on how many cores Prime95 can handle? I reckon it's going to have some difficulties when it has to display 50 worker windows.
|
[QUOTE=ixfd64;183437]This may be a stupid question, but is there a limit on how many cores Prime95 can handle? I reckon it's going to have some difficulties when it has to display 50 worker windows.[/QUOTE]
Before Prime95 reaches 50 worker windows it will have some difficulties with IPC and efficient shared memory use... Luigi |
[QUOTE=Mini-Geek;182521]Would a modern AMD with an optimized integer multiplication algorithm be faster than it would with George's optimized floating point arithmetic?[/QUOTE]
Short answer: No.
Short explanation: 64-bit pure-integer arithmetic gives only a modest accuracy improvement (on the order of 10%) over DP floating-point, and typically requires more arithmetic operations due to the need for modding intermediate results, thus negating any benefit of being able to use a slightly shorter transform length. A Core2-style chip can do roughly 2x as many DP floating-point ops per cycle (mainly adds and muls in some ratio typical of transform arithmetic, e.g. ~2 adds for every mul) as any commodity CPU can do 64-bit integer arithmetic ops. The coming Intel AVX (256-bit-wide vector ops) units, assuming they will eventually be available in quad-pumped DP-float fashion, will only magnify the above disparity. |
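To make "modding intermediate results" concrete, here is a small pure-integer transform (a number-theoretic transform) used to multiply two coefficient sequences. It is a sketch under loud assumptions: the modulus 998244353 with primitive root 3 is a common textbook choice and is much smaller than the 64-bit moduli discussed above, and the names `ntt` and `convolve` are illustrative, not taken from any Mersenne-testing program. The point to notice is that every butterfly needs several `% MOD` reductions, which a floating-point FFT butterfly does not.

```python
MOD = 998244353   # assumed prime of the form c * 2**k + 1 (c = 119, k = 23)
ROOT = 3          # a primitive root modulo MOD


def ntt(values, invert=False):
    """Iterative radix-2 number-theoretic transform; length must be a power of 2."""
    a = list(values)
    n = len(a)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)   # modular inverse of the root
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD      # reduction after the multiply
                a[k] = (u + v) % MOD           # reduction after the add
                a[k + half] = (u - v) % MOD    # reduction after the subtract
                w = w * w_len % MOD            # reduction for the twiddle update
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


def convolve(x, y):
    """Multiply two coefficient lists exactly (results must stay below MOD)."""
    out_len = len(x) + len(y) - 1
    size = 1
    while size < out_len:
        size <<= 1
    fx = ntt(x + [0] * (size - len(x)))
    fy = ntt(y + [0] * (size - len(y)))
    product = [p * q % MOD for p, q in zip(fx, fy)]
    return ntt(product, invert=True)[:out_len]


if __name__ == "__main__":
    print(convolve([1, 2, 3], [4, 5, 6]))
```

The integer version is exact, with no round-off to worry about, but it pays for that with the extra reductions counted in the comments, which is the trade-off the post describes.
|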