It looks like StarQwest has got competition! :grin:

I also observed CPU utilization during this short test, and Prime95 was using 90-95%, which is not bad :tu: I wonder how the newer C2Q will improve these timings given the larger L2 cache, FSB @ 1600 (X48 chipset) and some fast DDR3 memory...

[QUOTE=SlashDude;129592]I didn't know about this! :smile:
Some quick numbers: Running 1 test on 16 cores (on the HP DL580) runs at .020 seconds per iteration on an exponent in the 47,000,000 range. This has the box running at ~51% total.

Single test (CPU affinity set to run on the first CPU; each additional thread took the CPUs in order: 0,1,2,3,4,5...):
[CODE]Cores  Iteration  Box%
16     .020 sec   51%
 8     .022 sec   40%
 7     .023 sec   36%
 6     .024 sec   32%
 5     .029 sec   26%
 4     .029 sec   23%
 3     .034 sec   18%
 2     .037 sec   13%
 1     .067 sec    7%[/CODE]
Single test (CPU affinity set to run on any):
[CODE]Cores  Iteration  Box%
16     .024 sec   41-51%
 8     .024 sec   31%
 7     .046 sec   20%
 6     .051 sec   18%
 5     .055 sec   17%
 4     .029 sec   25%
 3     .035 sec   19%
 2     .050 sec   13%
 1     .067 sec    7%[/CODE][/QUOTE]
Thank you for running the tests. I think that in this case the scalability is not just a matter of memory contention (if at all), but also has something to do with the way the computation of the FFTs is performed (the sharing of the work between the different CPUs). It would be good to have George Woltman's word on this. George, any comments?
I'm hardly surprised prime95 scales this poorly. I'd have to research this more, but I suspect the problem is in pass 1 of the FFT. Whereas in pass 2 the 16 threads can all operate independently, in pass 1 the carry propagation requires all the threads to cooperate closely. Also, the more threads you throw at an FFT, the more you negate the benefits of memory prefetching. Getting better scaling might require a complete rethinking of memory layouts and thread cooperation.
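
To put a rough number on "scales this poorly", here is a minimal Python sketch that turns the affinity-pinned timings quoted above into speedup, parallel efficiency, and the Karp-Flatt "experimentally determined serial fraction". The names `pinned` and `scaling` are made up for this example, and the serial fraction is only an Amdahl-style summary of the measurements: it lumps together carry propagation, memory contention and anything else that does not parallelize, so it says nothing about which part of prime95 is actually responsible.

```python
# Seconds per iteration, copied from the affinity-pinned table in the quoted post.
pinned = {1: 0.067, 2: 0.037, 3: 0.034, 4: 0.029, 5: 0.029,
          6: 0.024, 7: 0.023, 8: 0.022, 16: 0.020}


def scaling(timings):
    """Return {cores: (speedup, efficiency, serial_fraction)} relative to 1 core."""
    t1 = timings[1]
    out = {}
    for n, t in sorted(timings.items()):
        speedup = t1 / t
        efficiency = speedup / n
        # Karp-Flatt metric: the serial fraction that Amdahl's law would need
        # to explain the measured speedup. Undefined for a single core.
        serial = None if n == 1 else (1 / speedup - 1 / n) / (1 - 1 / n)
        out[n] = (speedup, efficiency, serial)
    return out


if __name__ == "__main__":
    for n, (s, e, f) in scaling(pinned).items():
        frac = "   -" if f is None else f"{f:4.0%}"
        print(f"{n:2d} cores: speedup {s:4.2f}x, efficiency {e:4.0%}, serial fraction {frac}")
```

With these numbers, 16 cores come out at roughly a 3.35x speedup, about 21% efficiency, and an apparent serial fraction of about 25%.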

[QUOTE=lpmurray;129627]Sorry, but the number you tested was only 21,977,176 digits long. For a hundred-million-digit number the exponent must be 9 digits long..... M332192809 is exactly 100 million digits long. You must run that exponent or larger to be testing a 100-million-digit number. Also, thank you for running.[/QUOTE]
Oops, my bad :redface: Here is a test using M332192831:
[CODE]Threads  HP       IBM
         2.4GHz   3.98GHz
16       0.172    0.17
15       0.174    0.176
14       0.172    0.174
13       0.176    0.182
12       0.173    0.179
11       0.177    0.187
10       0.176    0.185
 9       0.186    0.2
 8       0.189    0.189
 7       0.198    0.204
 6       0.21     0.22
 5       0.251    0.262
 4       0.262    0.304
 3       0.309    0.38
 2       0.359    0.33
 1       0.661    0.586[/CODE]
SD
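
For anyone who wants to check whether an exponent qualifies as "100 million digits", here is a small Python sketch. It assumes the usual digit-count formula for a Mersenne number, floor(p * log10(2)) + 1, which works because 2^p is never a power of 10, so 2^p - 1 has the same number of decimal digits as 2^p. The function name `mersenne_digits` is just for this example.

```python
import math


def mersenne_digits(p: int) -> int:
    """Number of decimal digits of 2**p - 1, computed without building the number."""
    return math.floor(p * math.log10(2)) + 1


if __name__ == "__main__":
    # Exponents mentioned in this thread.
    for p in (332192809, 332192831):
        print(f"M{p}: {mersenne_digits(p):,} digits")
```

Double precision is sufficient at this size as long as p * log10(2) is not extremely close to a whole number, which holds for the exponents above.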
Looks like a 16-core system matches my overclocked C2Q :smile:

Is there any news about the scalability of P95?

[quote=tekkamichael;149521]Is there any news about the scalability of P95?[/quote]
I'm interested in that as well. Without the scalability, an octo-core doesn't make any sense.
I know I'm going to sound like a broken record, but when a graphics card prime number testing program finally becomes public, it's going to blow the CPU-based ones out of the water.
What people don't seem to understand is that if an implementation of an algorithm quits working, or doesn't work for a new situation, then maybe it's time to rethink how things are done. Some of the first cars were powered by steam. If you'd told those people that someday people would be in their cars and gone in under a minute, with potential speeds of over 100 mph, they would have thought you were nuts. I don't understand the math, and I'm not going to pretend like I have the skills to do what I'm talking about. I'm not even going to suggest that someone "should" attempt it. But think about it. We're dealing with a very specialized algorithm; basically you're doing the same thing over and over again an ungodly number of times. It's just (I wish there was a more humble word to put here) that you're dealing with a brand new architecture. Think about other cultures, older cultures. Not all of them used base-10; some used base-5, base-20, base-120, but the basic truths of mathematics and logical thinking hold true no matter what base you use. If you express the value pi in base-7 (or whatever) it's still the same number in a sense. And in the case of the problem we're dealing with, you're still dealing with base-2. It's simply (again, I wish there was a more humble word to put here) a matter of adjusting to a new architecture.
[QUOTE=jasong;159208]I know I'm going to sound like a broken record, but when a graphics card prime number testing program finally becomes public, it's going to blow the CPU-based ones out of the water.
What people don't seem to understand is that if an implementation of an algorithm quits working, or doesn't work for a new situation, then maybe it's time to rethink how things are done. Some of the first cars were powered by steam. If you'd told those people that someday people would be in their cars and gone in under a minute, with potential speeds of over 100 mph, they would have thought you were nuts. I don't understand the math, and I'm not going to pretend like I have the skills to do what I'm talking about. I'm not even going to suggest that someone "should" attempt it. But think about it. We're dealing with a very specialized algorithm; basically you're doing the same thing over and over again an ungodly number of times. It's just (I wish there was a more humble word to put here) that you're dealing with a brand new architecture. Think about other cultures, older cultures. Not all of them used base-10; some used base-5, base-20, base-120, but the basic truths of mathematics and logical thinking hold true no matter what base you use. If you express the value pi in base-7 (or whatever) it's still the same number in a sense. And in the case of the problem we're dealing with, you're still dealing with base-2. It's simply (again, I wish there was a more humble word to put here) a matter of adjusting to a new architecture.[/QUOTE]
If the new architecture you are talking about has no basement and allows only one (beautiful) floor with nice windows and arches, you just can't build a four-floor building on it.
Luigi
Even if a graphics card version is released, it is still a good idea to optimize the parallel prime95 version (quad-core and especially octo-core CPUs will be normal in two years).