mersenneforum.org  

Great Internet Mersenne Prime Search > PrimeNet
2008-12-23, 20:26   #12
lfm
Jul 2006
Calgary
425₁₀ Posts
Quote:
Originally Posted by S00113
Every different configuration of CPU, cache, frequency, RAM amount, timings and frequencies, etc., performs differently. All have some strengths and some weaknesses, both internally and compared to other machines participating in GIMPS. The client should have an option to benchmark itself and try to find the perfect thread combination and kind of work for each machine based on benchmarks and wishes of the owner.
It does not follow that there IS a perfect combination and/or kind of work for each machine, given the various priorities and needs of each user and the various aspects of performance (e.g., raw speed vs. response time vs. power consumption vs. memory availability vs. cache size vs. network speed). It would still come down to some users wanting to override whatever default work selection gets picked. So it makes sense just to start by giving users lots of override choices and keeping the defaults simple.
2008-12-23, 20:36   #13
cheesehead
"Richard B. Woods"
Aug 2002
Wisconsin USA
2²×3×641 Posts

Quote:
Originally Posted by S485122
I speak of course about benchmarks done when not doing anything else with the computer. On my 64-bit system there is a minimal number of processes running apart from Prime95 (no sharing, no antivirus or firewall, no sound, minimal graphics: 13 processes apart from Prime95).
Yes, you've done your best to minimize non-GIMPS execution ... but the OS still does things (e.g., those 13 processes) in the background occasionally.

Quote:
Still one can see huge variations.
There, you're referring to the 5% or 3% as huge, right?

Quote:
My uninformed guess is that it is all about memory contention on the pre-i7 Intel architecture.
... but here, by "that" you are referring to

"Best time for 2560K FFT length: 82.323 ms.
...
Best time for 5120K FFT length: 73.077 ms."

Is that correct?

- - - - - -

Quote:
Originally Posted by S00113
The client should have an option to benchmark itself and try to find the perfect thread combination and kind of work for each machine based on benchmarks and wishes of the owner.
My motive for butting-in is to point out that because benchmarking is complicated by the (unknown) non-GIMPS load, even when steps are taken to minimize such, there is a substantial, perhaps even dominant, chance that variations in benchmark times will represent the non-GIMPS load variations more strongly than they will represent any differences between CPU configurations.

Thus, benchmarking for the purpose of tailoring a choice of assignment may range from possibly-useful (if the non-GIMPS load is relatively stable across both benchmarking and actual assignments -- but how often will that happen in general users' cases?) to useless or even downright counterproductive. Note that using average times rather than best times, in an attempt to account for the non-GIMPS load, may make things worse by burying actual CPU-difference effects that could potentially show up in best-times.

Last fiddled with by cheesehead on 2008-12-23 at 21:03
2008-12-23, 20:37   #14
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
1011011111110₂ Posts

On my PC I get the same each time.
It even does the same when underclocked.
2008-12-23, 21:50   #15
S00113
Dec 2003
2³×3³ Posts

Quote:
Originally Posted by retina
Okay, so it is not a fluke and not throttling, so what could it be? Even just saying that the cache/CPU/etc. configuration is different still does not feel right to me as a full explanation of the huge discrepancy. Perhaps I am just stubborn, but 5120K being faster than 2560K (which requires double the data throughput from memory in less than half the time) just can't be sensible. What have I forgotten/overlooked/never known?
It is a NUMA machine, so it is probably the memory configuration and favourable access patterns for the 5120K FFT size.

This is a rare case. There will not be many cases as extreme as this one, but I am sure you would all agree that for this special machine it would make no sense at all to run a 2560K FFT LL test. If I set preferred work to "GIMPS" (let the server choose), this is exactly the kind of work this machine will get. So I think it makes very good sense for GIMPS to run benchmarks before handing out any kind of work automatically, and then to use the benchmarks to select the kind of work each machine's CPU, memory and cache configuration is best suited for, at least at FFT-size or trial-factoring bit-depth granularity. Perhaps also ask the user the maximum time he or she wants an average work unit to take, to avoid sending 100M tests to beginners.
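As a rough sketch of what such a benchmark-driven choice could look like (not how Prime95 or PrimeNet actually assign work), one could normalise each best per-iteration time by its FFT length and prefer the cheapest size. The two timings are the ones quoted earlier in this thread; the function names and the "ms per 1K of FFT length" metric are assumptions for illustration. Since FFT cost grows slightly faster than linearly with length, this metric mildly favours smaller sizes, which makes the 5120K result all the more striking.

```python
# Hypothetical sketch: pick the FFT length this machine handles most
# efficiently, using best per-iteration times from a benchmark run.

# Best per-iteration times in ms, keyed by FFT length in K
# (values quoted earlier in this thread for the NUMA machine).
best_ms = {2560: 82.323, 5120: 73.077}


def cost_per_k(times_ms):
    """Milliseconds per iteration per 1K of FFT length (lower is better)."""
    return {fft_k: ms / fft_k for fft_k, ms in times_ms.items()}


def preferred_fft(times_ms):
    """FFT length (in K) with the lowest normalised cost."""
    costs = cost_per_k(times_ms)
    return min(costs, key=costs.get)


print(preferred_fft(best_ms))  # 5120
```

A real selector would need timings for every FFT length the client supports, and would also have to respect the user's maximum work-unit duration.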
2008-12-23, 23:06   #16
S00113
Dec 2003
2³×3³ Posts

Quote:
Originally Posted by cheesehead
My motive for butting-in is to point out that because benchmarking is complicated by the (unknown) non-GIMPS load, even when steps are taken to minimize such, there is a substantial, perhaps even dominant, chance that variations in benchmark times will represent the non-GIMPS load variations more strongly than they will represent any differences between CPU configurations.
This is why I advocate longer benchmarks (at least one or two hours). Change best times to medians, or check what works best in the real world. Background processes, and even user activity, are a natural part of the benchmarks. Since different machines and operating systems run different background processes, it makes even more sense to do an individual benchmark of each computer. Also, due to differences in background processes, optimal memory usage during stage 2 of P-1 may be a lot less than the user thinks. This could also be checked automatically in a very long benchmark.
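A minimal sketch of how the statistics under discussion react to background load, using made-up per-iteration samples (the numbers are hypothetical, not from any machine in this thread): a single brief spike moves the mean but neither the best time nor the median, while a load that persists through most of the run moves the median yet stays invisible in the best time.

```python
import statistics


def summarise(samples_ms):
    """Return (best, median, mean) of per-iteration timing samples in ms."""
    return (
        min(samples_ms),
        statistics.median(samples_ms),
        statistics.mean(samples_ms),
    )


# Hypothetical samples: one brief background spike (95.0 ms).
spike = [82.4, 82.3, 95.0, 82.5, 82.6]
# Hypothetical samples: background load present for most of the run.
persistent = [90.1, 90.3, 82.3, 90.2, 90.4]

print(summarise(spike))       # best 82.3, median 82.5, mean about 84.96
print(summarise(persistent))  # best 82.3, median 90.2, mean about 88.66
```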
Quote:
Thus, benchmarking for the purpose of tailoring a choice of assignment may range from possibly-useful (if the non-GIMPS load is relatively stable across both benchmarking and actual assignments -- but how often will that happen in general users' cases?) to useless or even downright counterproductive. Note that using average times rather than best times, in an attempt to account for the non-GIMPS load, may make things worse by burying actual CPU-difference effects that could potentially show up in best-times.
Most machines are idle most of the time. A user could be asked to leave the computer alone while the benchmark is running, or at least during part of it. I agree that benchmarking is hard, but it is a lot better than ignoring what we know about CPU/cache/memory/bitness differences and just handing out work more or less randomly from the same pool to everyone.
2008-12-23, 23:06   #17
S485122
Sep 2006
Brussels, Belgium
13×131 Posts

Quote:
Originally Posted by cheesehead
Yes, you've done your best to minimize non-GIMPS execution ... but the OS still does things (e.g., those 13 processes) in the background occasionally.
Yes, the OS does something, but not for hours. When some Prime95 threads are slow, they stay slow until I reboot the machine; stopping and restarting Prime95 is not enough. The only thing I can think of is that this system has onboard graphics. After I bought it, that motherboard turned out to be a bad choice: for instance, it cannot supply enough memory voltage to run the modules at spec. I might try plugging in a cheap graphics card (I only need 1280 × 1024 with 256 colours).
Quote:
Originally Posted by cheesehead
There, you're referring to the 5% or 3% as huge, right?
No, not at all. My problems have nothing to do with the strange 2560K/5120K problem: I sort of hijacked the thread to try to get somebody with ideas to help me ;-)

Jacob
2008-12-24, 01:57   #18
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2⁴×389 Posts

Quote:
Originally Posted by S00113
It is a NUMA machine, so it is probably the memory configuration and favourable access patterns for the 5120K FFT size.
So, if that is the reason, then it could be that multi-CPU i7s will experience the same weirdness. It will be interesting to see how the QPI links manage.
Quote:
Originally Posted by S00113
There will not be many cases as extreme as this one, but I am sure you would all agree that for this special machine it would make no sense at all to run a 2560K FFT LL test.
Indeed, 2560K in multi-thread mode would be wasteful for that machine. But what about the more usual situation of each thread running one LL of 2560K (rather than 4 (or whatever) threads all working together on one exponent)? Does the NUMA/OS architecture split the jobs nicely between memory regions?
2008-12-24, 07:43   #19
S00113
Dec 2003
2³×3³ Posts

Quote:
Originally Posted by retina
So, if that is the reason, then if could be that the multi-CPU i7's will experience the same weirdness. It will be interesting to see how the QPI links will manage.
NUMA will not be in normal consumer products for a while, I think. This machine has 8 DIMM slots connected to each CPU, filled to a total of 64 GiB RAM. The maximum is 32 cores and 256 GiB RAM.
Quote:
Indeed, 2560K in multi-thread mode would be wasteful for that machine. But what about the more usual situation of each thread running one LL of 2560K (rather than 4 (or whatever) threads all working together on one exponent)? Does the NUMA/OS architecture split the jobs nicely between memory regions?
The OS knows. I am not sure whether it just allocates memory on the node the allocating thread is running on, or whether it also moves jobs and/or memory around. The weirdness may be caused by mprime setting the affinity of each thread without knowing about the memory architecture. I haven't benchmarked 16 separate LL tests, but I think mprime should do that and compare the results with the parallel ones. On some machines, where RAM is slow, caches are large and the CPU interconnect is fast, a second concurrent test may cause almost as much slowdown as running one test in two threads.
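The comparison suggested here, one test spread over several threads versus the same number of independent single-threaded tests, comes down to total iterations per second at a given FFT length. A hedged sketch; the timings in the example are invented for illustration, and the function names are hypothetical rather than anything in mprime:

```python
def total_iterations_per_second(ms_per_iteration, concurrent_tests):
    """Combined LL iterations per second over all tests running at once."""
    return concurrent_tests * 1000.0 / ms_per_iteration


def better_layout(ms_one_test_n_threads, ms_each_of_n_tests, n):
    """Compare one test on n threads with n single-threaded tests.

    Both timings must be for the same FFT length; ms_each_of_n_tests is the
    per-iteration time of each test while all n run concurrently.
    """
    shared = total_iterations_per_second(ms_one_test_n_threads, 1)
    separate = total_iterations_per_second(ms_each_of_n_tests, n)
    return "separate tests" if separate > shared else "one multithreaded test"


# Invented numbers: in the second call, memory contention slows each of the
# two concurrent tests so much that one two-threaded test comes out ahead.
print(better_layout(40.0, 76.0, 2))  # separate tests
print(better_layout(40.0, 85.0, 2))  # one multithreaded test
```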

I just wish mprime could test all this for me. The summary page says I have 538 computers, and it isn't practical to check each and every one of them manually to find which configuration works best.

Here is another thread pointing out differences in efficiency for different FFT sizes and factoring depths on different CPU/cache/memory/bitness configurations. As you can see, there are large differences in efficiency even for normal machines.
2008-12-25, 04:50   #20
cheesehead
"Richard B. Woods"
Aug 2002
Wisconsin USA
7692₁₀ Posts

Quote:
Originally Posted by S00113
Background processes, and even user activity, are a natural part of the benchmarks. Since different machines and operating systems run different background processes, it makes even more sense to do an individual benchmark of each computer. Also, due to differences in background processes, optimal memory usage during stage 2 of P-1 may be a lot less than the user thinks.
Very well, I agree that a benchmark that is intended to include the influence of the non-GIMPS load in determining which task type is best for a particular system, given its typical usage, should use average times rather than best-times.

Can someone construct an example in which, for instance, an AMD's superiority for TF rather than FFT-using tasks is effectively cancelled by the non-GIMPS load's influence, so that L-Ls turn out to be the best choice when one considers average times, but TF appears to be the better choice when only best-times are used?

Last fiddled with by cheesehead on 2008-12-25 at 04:56
2008-12-27, 01:28   #21
S00113
Dec 2003
D8₁₆ Posts

Quote:
Originally Posted by cheesehead
Can someone construct an example in which, for instance, an AMD's superiority for TF rather than FFT-using tasks is effectively cancelled by the non-GIMPS load's influence, so that L-Ls turn out to be the best choice when one considers average times, but TF appears to be the better choice when only best-times are used?
While the opposite is easy to construct, at least where it is undesirable to run FFT work due to memory bus/cache congestion and constant register reloading due to other processes using SSE2 instructions, I think this one is very hard. It would be an even rarer case than the ccNUMA machine I found. Most machines are idle most of the time.
2008-12-30, 11:44   #22
S00113
Dec 2003
2³×3³ Posts
Another example

I'm not giving up on this.

I still think it is a good idea to run a benchmark and choose work to run or avoid based on the result. Here is another example. This is an Intel Atom N270 CPU at 1.6 GHz. Full benchmark here.
Code:
Best time for 58 bit trial factors: 19.060 ms.
Best time for 59 bit trial factors: 19.026 ms.
Best time for 60 bit trial factors: 18.985 ms.
Best time for 61 bit trial factors: 19.153 ms.
Best time for 62 bit trial factors: 39.511 ms.
Best time for 63 bit trial factors: 39.585 ms.
Best time for 64 bit trial factors: 22.897 ms.
Best time for 65 bit trial factors: 22.826 ms.
Best time for 66 bit trial factors: 22.771 ms.
Best time for 67 bit trial factors: 22.848 ms.
Note that 62-bit and 63-bit trial factoring runs at little more than half the speed of 64- to 67-bit trial factoring. So why does this machine get 62- and 63-bit factoring work? Shouldn't those ranges be reserved for machines without a handicap at those levels?
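Flagging such handicapped bit levels automatically is straightforward once the benchmark exists. A minimal sketch using the Atom N270 best times above; the 1.5× threshold relative to the median level and the function name are arbitrary assumptions, not anything PrimeNet does:

```python
import statistics

# Best times (ms) per trial-factoring bit level, from the Atom N270
# benchmark quoted above.
tf_best_ms = {
    58: 19.060, 59: 19.026, 60: 18.985, 61: 19.153, 62: 39.511,
    63: 39.585, 64: 22.897, 65: 22.826, 66: 22.771, 67: 22.848,
}


def slow_bit_levels(times_ms, threshold=1.5):
    """Bit levels whose time exceeds threshold x the median over all levels."""
    typical = statistics.median(times_ms.values())
    return sorted(b for b, ms in times_ms.items() if ms > threshold * typical)


print(slow_bit_levels(tf_best_ms))  # [62, 63]
```

A server could then steer those levels towards machines whose benchmarks show no such penalty.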

By the way: does "63 bit trial factors" in the benchmark mean factoring from 63 to 64 bits, or from 62 to 63 bits? This machine has been assigned work from 63 to 64 bits, not 62 to 63. At least not yet.

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.