mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Hardware (https://www.mersenneforum.org/forumdisplay.php?f=9)
-   -   Did I read a post re 4 cores to NOT LL all? (https://www.mersenneforum.org/showthread.php?t=11025)

ldesnogu 2008-11-26 11:12

[quote=S485122;150762]I do not agree, on P4 D and on Core2 Quad the performance of Prime95 was proportional to the memory speed (measured from 533 MHz DDR2 to 1066 MHz DDR2.)[/quote]
At equal core frequency and different memory speeds? Could you please provide detailed benchmarks? :)

S485122 2008-11-26 16:05

I and others posted details in the Hardware subforum. I do not have the time to find them now (a quick search suggests the threads "Quad Core" and "Quad Core and P95" should contain the necessary data). All parameters except memory were constant (motherboard, FSB speed, CPU).

Jacob

db597 2008-12-02 03:42

[QUOTE=Phantomas;150536]Yes, that's right. And if I interpret my results correctly, then each LL will run on one DualCore, so (my impression) one LL can use the 6MB L2 cache alone, and it doesn't need to access the ordinary RAM so often.

But it seems to be important to run one test on cores [1,2] and the other on cores [0,3]. Otherwise my iteration time went up 20%.[/QUOTE]

Interesting... must be the combined cache kicking in. What settings are needed to ensure we run on core [1,2] and core [0,3]? Is it achievable only on 25.8 with the affinityscramble setting?

petrw1 2008-12-02 05:37

[QUOTE=db597;151581]Interesting... must be the combined cache kicking in. What settings are needed to ensure we run on core [1,2] and core [0,3]? Is it achievable only on 25.8 with the affinityscramble setting?[/QUOTE]

Yes, so says George after my attempts to do the same with 25.7 with mixed results. See post ...

[url]http://www.mersenneforum.org/showpost.php?p=151570&postcount=29[/url]

... and the next 3

Phantomas 2008-12-02 17:29

[quote=db597;151581]Interesting... must be the combined cache kicking in. What settings are needed to ensure we run on core [1,2] and core [0,3]? Is it achievable only on 25.8 with the affinityscramble setting?[/quote]
Yes, only 25.8 gives you full control over which core to use. But (at least) on my system I noticed that the core binding varies with the FSB and/or CPU speed. I can't explain why, but it is reproducible.
See [URL]http://mersenneforum.org/showpost.php?p=151272&postcount=24[/URL]
and [URL]http://mersenneforum.org/showpost.php?p=151272&postcount=26[/URL]
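
The pairing discussed above (one test on cores [1,2], the other on [0,3]) comes down to grouping cores that share an L2 cache. A minimal sketch of that grouping, assuming a hypothetical core-to-cache layout that matches the numbering reported here; the labels `L2-A` and `L2-B` and the function name are made up for illustration, and this is not how Prime95 itself assigns affinity:

```python
from collections import defaultdict


def group_by_shared_cache(core_to_cache):
    """Group core numbers by the cache label they share.

    core_to_cache maps a core number to an arbitrary cache label.
    Returns a list of sorted core lists, ordered by their lowest core.
    """
    groups = defaultdict(list)
    for core, cache in core_to_cache.items():
        groups[cache].append(core)
    return sorted((sorted(cores) for cores in groups.values()),
                  key=lambda group: group[0])


# Hypothetical layout (an assumption, not measured): cores 1 and 2 share
# one L2 cache, cores 0 and 3 share the other.
example_layout = {0: "L2-A", 1: "L2-B", 2: "L2-B", 3: "L2-A"}
worker_cores = group_by_shared_cache(example_layout)
print(worker_cores)  # [[0, 3], [1, 2]]
```

Since the numbering reportedly changes with FSB and/or CPU speed on at least one system, the layout would have to be re-checked on each machine rather than copied from this example.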

db597 2008-12-03 07:39

Thanks guys. I'll download 25.8 and give it a try tonight. Even running 24/7, it takes me over a month to complete one LL (first-time test), so it's a welcome thought to be able to get 2 workers on the same exponent without sacrificing any speed (even a tiny speedup would be a fantastic bonus!).

stars10250 2008-12-12 13:34

While examining the quad core performance of my system I noticed something interesting. When running 4 LL tests I get the equivalent of about 3.2 cores-worth of performance if I pick as a reference the speed of a single core operating on a single exponent. This is a well known issue and agrees with the observations of others (aka memory bottleneck). This made me initially think it is only minimally worth the effort of running the 4th core for LL, as getting 0.2x performance out of it isn't all that good. However, when I run 3 cores on LL I don't get 3 cores-worth of performance. I get 2.6. Only when I go down to 2 cores do I get twice the single core performance. So running the fourth core on LL has more than a 0.2 effect, as it takes me from 2.6 to 3.2. I believe others have noticed this too, as I've seen some recommend running 2 LL and 2 TF (instead of 3 LL and 1 TF). My quad is overclocked to 3.2GHz, with 1066DDR2 memory running at 533MHz, and yet I still see this behavior. Nonetheless, I'm happy with its performance as it far exceeds the stock performance and is exactly double that of my dual-core E8500 (3.16GHz) which I always thought was fast and not suffering from a memory bottleneck.
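
The argument about the fourth core is easier to see as the gain from each added core. A small sketch using only the core-equivalent figures quoted in this post (the function name is illustrative):

```python
def marginal_gains(throughput_by_cores):
    """Return the extra core-equivalents contributed by each added core."""
    gains = {}
    previous = 0.0
    for cores in sorted(throughput_by_cores):
        current = throughput_by_cores[cores]
        # Round to hide floating-point noise in the subtraction.
        gains[cores] = round(current - previous, 2)
        previous = current
    return gains


# Core-equivalent figures as reported in the post (single core = 1.0).
reported = {1: 1.0, 2: 2.0, 3: 2.6, 4: 3.2}
print(marginal_gains(reported))  # {1: 1.0, 2: 1.0, 3: 0.6, 4: 0.6}
```

So the third and fourth cores each add about 0.6, rather than the fourth adding only 0.2 on top of an ideal 3.0.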

henryzz 2008-12-14 13:41

[quote=stars10250;153057]While examining the quad core performance of my system I noticed something interesting. When running 4 LL tests I get the equivalent of about 3.2 cores-worth of performance if I pick as a reference the speed of a single core operating on a single exponent. This is a well known issue and agrees with the observations of others (aka memory bottleneck). This made me initially think it is only minimally worth the effort of running the 4th core for LL, as getting 0.2x performance out of it isn't all that good. However, when I run 3 cores on LL I don't get 3 cores-worth of performance. I get 2.6. Only when I go down to 2 cores do I get twice the single core performance. So running the fourth core on LL has more than a 0.2 effect, as it takes me from 2.6 to 3.2. I believe others have noticed this too, as I've seen some recommend running 2 LL and 2 TF (instead of 3 LL and 1 TF). My quad is overclocked to 3.2GHz, with 1066DDR2 memory running at 533MHz, and yet I still see this behavior. Nonetheless, I'm happy with its performance as it far exceeds the stock performance and is exactly double that of my dual-core E8500 (3.16GHz) which I always thought was fast and not suffering from a memory bottleneck.[/quote]
I bet if you remove your overclock but keep the memory at the same speed, it will scale better.

stars10250 2008-12-14 15:25

[quote=henryzz;153278]i bet if you remove your overclocking but keep the memory at the same speed it will scale better[/quote]


I tried this and did get better scaling but overall lower performance. Here are the numbers:

3.2 GHz Q6600 (8x), 400 MHz FSB, 533 MHz DRAM
4 cores (0,1,2,3): 3.2 core-equivalent performance (total # of iter in 1 hr: 239016)
3 cores (1,2,3): 2.6 core-equivalent performance
2 cores (1,3): 2.0 core-equivalent performance
1 core (3): 1.0 core-equivalent performance (48 ms iter time, M47.8)

2.8 GHz Q6600 (7x), 400 MHz FSB, 533 MHz DRAM
4 cores (0,1,2,3): 3.4 core-equivalent performance (total # of iter in 1 hr: 220699)
3 cores (1,2,3): 2.7 core-equivalent performance
2 cores (1,3): 2.0 core-equivalent performance
1 core (3): 1.0 core-equivalent performance (55 ms iter time, M47.8)

2.4 GHz Q6600 (6x), 400 MHz FSB, 533 MHz DRAM
4 cores (0,1,2,3): 3.5 core-equivalent performance (total # of iter in 1 hr: 200000)
3 cores (1,2,3): 2.8 core-equivalent performance
2 cores (1,3): 2.0 core-equivalent performance
1 core (3): 1.0 core-equivalent performance (63 ms iter time, M47.8)

Overall, the maximum number of iterations performed in a given time is achieved by running all 4 cores at the highest CPU (and memory) speed.
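
For anyone wanting to check the 4-core figures, the core-equivalent number can be recomputed from the hourly iteration total and the single-core iteration time. A rough sketch, assuming the quoted millisecond timings are rounded (so the results are approximate); the function name is illustrative:

```python
def core_equivalents(total_iters_per_hour, single_core_ms_per_iter):
    """Express a multi-core hourly total in units of one core's hourly output."""
    single_core_iters_per_hour = 3_600_000 / single_core_ms_per_iter
    return total_iters_per_hour / single_core_iters_per_hour


# (label, total iterations in 1 hr on 4 cores, single-core ms per iteration)
runs = [
    ("3.2 GHz", 239016, 48),
    ("2.8 GHz", 220699, 55),
    ("2.4 GHz", 200000, 63),
]

for label, total, ms in runs:
    print(f"{label}: {core_equivalents(total, ms):.1f} core-equivalents, "
          f"{total} iter/hr")
```

The lower clocks scale closer to 4.0, but the absolute iteration count per hour is still highest at 3.2 GHz.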

henryzz 2008-12-14 19:21

Exactly as I expected.
Computer speed isn't as dependent on CPU speed as people used to think.
At some point I will do some benchmarks with different memory speeds to show the difference.

jasong 2008-12-15 00:20

This is not to cause a stink, but Prime95 is specifically made for Intel processors. I've heard opinions that modern AMD processors would kick butt if there were a publicly available LLR client for AMDs.

If one were made available publicly (the one I heard about is integer-based and probably still alpha), would it be something that a decent number of people would be interested in?

I guess I should be more direct: if an LLR client (Prime95 is an LLR client made specifically for Mersenne numbers) were made available for AMD computers, producing the same residues (when there's no error), would a good chunk of the community be interested in using that program?

