mersenneforum.org  

Old 2008-03-25, 04:34   #12
ixfd64
Bemusing Prompter
 
"Danny"
Dec 2002
California

7·331 Posts

It looks like StarQwest has got competition!
Old 2008-03-25, 11:27   #13
Cruelty
 
May 2005

3122₈ Posts

I have also observed CPU utilization during this short test, and Prime95 was using 90-95%, which is not bad. I wonder how the newer C2Q will improve these timings given the larger L2 cache, FSB @ 1600 (X48 chipset) and some fast DDR3 memory...
Old 2008-03-25, 14:08   #14
lycorn
 
Sep 2002
Oeiras, Portugal

1423₁₀ Posts

Quote:
Originally Posted by SlashDude
I didn't know about this!

Some quick numbers:

Running 1 test on 16 cores (on the HP DL580) runs at .020 seconds per iteration on an exponent in the 47,000,000 range. This has the box running at ~51% total.
Single test: (CPU affinity set to run on first CPU; each additional thread took the CPUs in order: 0,1,2,3,4,5...)
Code:
Cores  Iteration   Box%
16     .020 sec    51%
8      .022 sec    40%
7      .023 sec    36%
6      .024 sec    32%
5      .029 sec    26%
4      .029 sec    23%
3      .034 sec    18%
2      .037 sec    13%
1      .067 sec     7%
Code:
Single test: (CPU Affinity set to run on any)
Cores  Iteration   Box%
16     .024 sec    41-51%
8      .024 sec    31%
7      .046 sec    20%
6      .051 sec    18%
5      .055 sec    17%
4      .029 sec    25%
3      .035 sec    19%
2      .050 sec    13%
1      .067 sec     7%
Thank you for running the tests.
I think that in this case the scalability is not just a matter of memory contention (if at all); it also has something to do with the way the computation of the FFTs is performed (the sharing of the work between the different CPUs).
It would be good to have George Woltman's word on this.
George, any comments?
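
To put numbers on the scaling in the quoted pinned-affinity table, here is a rough Python sketch that turns the per-iteration times into speedup and parallel efficiency relative to the single-core run. The function name `scaling_report` is made up for illustration; the timings are copied from the first quoted table.

```python
# Per-iteration times (seconds) from the quoted table with CPU affinity pinned.
PINNED = {1: 0.067, 2: 0.037, 3: 0.034, 4: 0.029, 5: 0.029,
          6: 0.024, 7: 0.023, 8: 0.022, 16: 0.020}


def scaling_report(times):
    """Return {cores: (speedup, efficiency)} relative to the 1-core time.

    `scaling_report` is a hypothetical helper, not part of prime95.
    """
    base = times[1]
    report = {}
    for cores, t in sorted(times.items()):
        speedup = base / t            # how many times faster than 1 core
        efficiency = speedup / cores  # fraction of ideal linear scaling
        report[cores] = (speedup, efficiency)
    return report


if __name__ == "__main__":
    for cores, (s, e) in scaling_report(PINNED).items():
        print(f"{cores:2d} cores: {s:4.2f}x speedup, {e:5.1%} efficiency")
```

On these figures, 16 cores give roughly a 3.35x speedup over one core, which is about 21% of ideal linear scaling.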
Old 2008-03-25, 17:32   #15
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7158₁₀ Posts

I'm hardly surprised prime95 scales this poorly. I'd have to research this more, but I suspect the problem is in pass 1 of the FFT. Whereas in pass 2 the 16 threads can all operate independently, in pass 1 the carry propagation requires all the threads to cooperate closely. Also, the more threads you throw at an FFT, the more you negate the benefits of memory prefetching. Getting better scaling might require a complete rethinking of memory layouts and thread cooperation.
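
One rough way to gauge how much of each iteration behaves as if it were serial (for example, a tightly coupled step like the carry propagation mentioned above) is the Karp-Flatt metric, which back-solves Amdahl's law for the serial fraction from a measured speedup. This is a generic model and not an analysis of prime95's internals; the function names and the choice of data points (taken from the pinned-affinity timings quoted earlier in the thread) are assumptions for illustration.

```python
def karp_flatt(speedup, cores):
    """Experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p)."""
    if cores < 2:
        raise ValueError("need at least 2 cores")
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


def amdahl_speedup(serial_fraction, cores):
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


if __name__ == "__main__":
    t1 = 0.067  # seconds per iteration on 1 core (quoted table)
    for cores, t in [(2, 0.037), (8, 0.022), (16, 0.020)]:
        e = karp_flatt(t1 / t, cores)
        print(f"{cores:2d} cores: apparent serial fraction {e:.2f}")
```

With these inputs the apparent serial fraction grows from about 0.10 at 2 cores to about 0.25 at 16. In the usual reading of this metric, a fraction that rises with core count points to overhead that grows with the number of threads rather than a fixed serial portion, which would be consistent with (though not proof of) synchronization or memory effects.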
Old 2008-03-25, 19:55   #16
SlashDude
 
Aug 2002
Minneapolis, MN

11100100₂ Posts

Quote:
Originally Posted by lpmurray
Sorry, but the number you tested was only 21,977,176 digits long; for a hundred-million-digit number the exponent must be 9 digits long. M332192809 is exactly 100 million digits long. You must run that exponent or larger to be testing a 100-million-digit number. Also, thank you for running.

Oops, my bad

Here is a test using M332192831:

Code:
Threads  HP       IBM
         2.4GHz   3.98GHz
16       0.172    0.17
15       0.174    0.176
14       0.172    0.174
13       0.176    0.182
12       0.173    0.179
11       0.177    0.187
10       0.176    0.185
9        0.186    0.2
8        0.189    0.189
7        0.198    0.204
6        0.21     0.22
5        0.251    0.262
4        0.262    0.304
3        0.309    0.38
2        0.359    0.33
1        0.661    0.586
-SD
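
For anyone wanting to check whether an exponent reaches the 100-million-digit mark, the digit count of 2^p - 1 can be estimated as floor(p * log10(2)) + 1 (subtracting 1 never changes the digit count, because 2^p is never a power of 10). The sketch below uses a made-up helper name, `mersenne_digits`, and relies on double-precision floats, which is an assumption that holds comfortably at this size but would need rechecking if p * log10(2) ever fell extremely close to an integer.

```python
import math


def mersenne_digits(p):
    """Decimal digits of 2**p - 1, via floor(p * log10(2)) + 1.

    Hypothetical helper for illustration; uses floating point, so it is an
    estimate that is reliable unless p * log10(2) is within rounding error
    of an integer.
    """
    return math.floor(p * math.log10(2)) + 1


if __name__ == "__main__":
    for p in (332192809, 332192831):
        print(f"M{p} has {mersenne_digits(p):,} decimal digits")
```

Small cases are easy to verify by hand: M13 = 8191 has 4 digits, and the function returns 4 for p = 13.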
Old 2008-03-25, 21:54   #17
Cruelty
 
May 2005

2×809 Posts

Looks like the 16-core system matches my overclocked C2Q
Old 2008-11-16, 13:15   #18
tekkamichael
 
Sep 2008

5 Posts

Is there any news about the scalability of P95?
Old 2009-01-17, 21:03   #19
joblack
 
Oct 2008
n00bville

2D5₁₆ Posts

Quote:
Originally Posted by tekkamichael
Is there any news about the scalability of P95?
I'm interested in that as well. Without better scalability, an oct-core doesn't make any sense.
Old 2009-01-18, 06:05   #20
jasong
 
"Jason Goatcher"
Mar 2005

DB1₁₆ Posts

I know I'm going to sound like a broken record, but when a graphics card prime number testing program finally becomes public, it's going to blow the CPU-based ones out of the water.

What people don't seem to understand is that if an implementation of an algorithm quits working, or doesn't work for a new situation, then maybe it's time to re-think how things are done. Some of the first cars were powered by steam. If you'd told those people that someday people would be in their cars and gone in under a minute, with potential speeds of over 100mph, they would have thought you were nuts.

I don't understand the math, and I'm not going to pretend that I have the skills to do what I'm talking about. I'm not even going to suggest that someone "should" attempt it. But think about it. We're dealing with a very specialized algorithm; basically, you're doing the same thing over and over again an ungodly number of times. It's just (I wish there was a more humble word to put here) that you're dealing with a brand new architecture. Think about other cultures, older cultures. Not all of them used base-10; some used base-5, base-20, base-120, but the basic truths of mathematics and logical thinking hold true no matter what base you use. If you express the value pi in base-7 (or whatever), it's still the same number in a sense. And in the case of the problem we're dealing with, you're still dealing with base-2. It's simply (again, I wish there was a more humble word to put here) a matter of adjusting to a new architecture.
Old 2009-01-18, 09:06   #21
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

13×367 Posts

If the new architecture you are talking about has no basement and allows only one (beautiful) floor with nice windows and arches, you just can't build a four-floor building on it.

Luigi
Old 2009-01-18, 10:00   #22
joblack
 
Oct 2008
n00bville

725₁₀ Posts

Even if a graphics card version is released, it is still a good idea to optimize the parallel prime95 version (quad and especially oct cores will be normal in two years).


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.