mersenneforum.org


lidocorc 2008-11-01 07:05

LL test with V25 much slower than with V24
 
LL test with v25.7 much slower than with v24.14

Since yesterday I have been running the new version 25.7 instead of 24.14 on my AMD x64 dual-core machine. Until yesterday I ran Prime95 on only one core, and one iteration took about 0.100 sec. Now that v25 uses both cores, the core that continues the LL test of the old Mersenne exponent is much slower: time per iteration is now 0.146 sec, which is only about 68% of the previous speed. Any suggestions as to the reason? I thought both cores worked separately without influencing each other.

lidocorc

starrynte 2008-11-01 15:40

If I understand correctly, although there are two separate cores, both instances share a single memory system, so each individual assignment will be slightly slower, but overall throughput will be higher.
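The trade-off between per-worker speed and combined throughput can be checked with the timings from the first post. This is only a rough sketch: it assumes the second worker also runs at 0.146 sec per iteration (the post reports the time for one core only) and that both exponents are similar in size, so that iterations are comparable. The function name is made up for illustration.

```python
def iterations_per_second(sec_per_iteration, workers=1):
    """Combined LL iterations per second for `workers` equally fast workers."""
    return workers / sec_per_iteration

# v24.14: one core busy at 0.100 sec per iteration (from the first post)
one_worker = iterations_per_second(0.100)

# v25.7: two workers, ASSUMING both run at the reported 0.146 sec per iteration
two_workers = iterations_per_second(0.146, workers=2)

per_worker_speed = 0.100 / 0.146        # speed of one worker relative to before
total_gain = two_workers / one_worker   # combined throughput relative to before

print(f"per-worker speed: {per_worker_speed:.0%} of before")
print(f"combined throughput: {total_gain:.2f}x of one worker")
```

Under those assumptions each test runs at about 68% of its former speed, while the machine as a whole completes about 1.37 times as many iterations per second.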

Phantomas 2008-11-01 21:01

You can also let both cores work on the same exponent, but in general that is slower overall.

Freightyard 2008-11-01 22:09

Intel Nehalem with Quickpath.

dan3ny 2008-11-01 23:46

You may also be experiencing what I am: P95 not using much of your CPU.

lidocorc 2008-11-02 07:52

@starrynte
It's as you write. But is shared cache memory such a bottleneck that it reduces performance that much?

@dan3ny
No. I'm experiencing 100% CPU load.


I wonder what would happen if I shut down only one of the workers and let the other one continue. But I can't find a menu command to switch off a single worker independently of the other one.

lidocorc

Kevin 2008-11-02 08:34

[QUOTE=lidocorc;147539]@starrynte
It's as you write. But is shared cache memory such a bottleneck that it reduces performance that much?

@dan3ny
No. I'm experiencing 100% CPU load.


I wonder what would happen if I shut down only one of the workers and let the other one continue. But I can't find a menu command to switch off a single worker independently of the other one.

lidocorc[/QUOTE]

The shared cache memory is only a bottleneck if both cores are running LL tests. There's a menu command called "worker windows" under Test. Each "worker" corresponds to a core. Set worker #1 to do LL tests like you did before, and set worker #2 to do trial factoring. The core working on the LL test will go as fast as it did before, and the second core is still contributing.

You might not notice an immediate change because both cores will still be working on the LL tests they've been assigned. What you can do is shut down the client, go to worktodo.txt, and move the line corresponding to the second LL test from the heading under "worker #2" to the one under "worker #1" (assuming worker #1 is the one with your current LL test that you have set to do LL tests). Then when you restart the client, your first worker will be testing your old exponent at full speed, and it will continue the test the other core started when the current one finishes. The second worker will reserve trial factoring assignments and begin work on those.
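The manual edit described above could also be scripted. The following is a minimal sketch, not a tool that ships with Prime95: it assumes section headers of the form `[Worker #N]`, uses placeholder assignment lines (`Test=EXPONENT_A` and so on, which are not real entries), and the function name `move_assignment` is hypothetical. As with the manual edit, shut down the client first and work on a copy of worktodo.txt.

```python
import re

# ASSUMED header format for worker sections in worktodo.txt
HEADER = re.compile(r"^\[Worker #(\d+)\]\s*$")

def move_assignment(worktodo_text, assignment, target_worker):
    """Return worktodo text with `assignment` moved to the end of the
    target worker's section, so it runs after that worker's current work."""
    sections = {}   # worker number -> list of assignment lines
    current = None
    for line in worktodo_text.splitlines():
        match = HEADER.match(line)
        if match:
            current = int(match.group(1))
            sections.setdefault(current, [])
        elif line.strip():
            if current is None:
                raise ValueError("assignment found before any worker header")
            sections[current].append(line.strip())

    if target_worker not in sections:
        raise ValueError(f"no section for worker #{target_worker}")

    # Remove the assignment from whichever section currently holds it.
    for lines in sections.values():
        if assignment in lines:
            lines.remove(assignment)
            break
    else:
        raise ValueError("assignment not found")
    sections[target_worker].append(assignment)

    out = []
    for worker in sorted(sections):
        out.append(f"[Worker #{worker}]")
        out.extend(sections[worker])
    return "\n".join(out) + "\n"

# Placeholder content; real assignment lines carry more fields.
sample = """[Worker #1]
Test=EXPONENT_A
[Worker #2]
Test=EXPONENT_B
"""
print(move_assignment(sample, "Test=EXPONENT_B", 1))
```

Appending rather than prepending matches the behaviour described above: worker #1 finishes its current test first and then picks up the one moved over from worker #2, whose section is left empty.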

S485122 2008-11-02 10:25

[QUOTE=Kevin;147542]The shared cache memory is only a bottleneck if both cores are running LL tests.[/QUOTE]The shared memory cache is not the problem. The code supports a maximum of 1024 KB of cache per worker (cf. [Thread=10838]Prime95 and L2 Cache[/Thread]). If you have less than 1024 KB of L2 cache per core, for instance 1024 KB of shared cache for two cores, you can include a line "CpuL2CacheSize=128" (or 256 or 512) in local.txt.
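As an illustration of the rule above, here is a small sketch that derives a candidate line for local.txt from the cache layout. The post does not say which of the three values to pick; choosing the largest listed value that fits within the per-core share is an assumption made here, and the function name is hypothetical.

```python
LISTED_SIZES_KB = (512, 256, 128)  # the three values named in the post

def l2_cache_line(total_l2_kb, cores_sharing):
    """Suggest a local.txt line, or None when each core already has 1024 KB."""
    per_core = total_l2_kb // cores_sharing
    if per_core >= 1024:
        return None  # at the supported maximum; the post calls for no line
    # ASSUMPTION: use the largest listed size that fits the per-core share.
    for size in LISTED_SIZES_KB:
        if size <= per_core:
            return f"CpuL2CacheSize={size}"
    raise ValueError("the post lists no value below 128 KB")

# The post's example: 1024 KB of L2 cache shared by two cores
print(l2_cache_line(1024, 2))
```

For the post's example this suggests `CpuL2CacheSize=512`.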

The problem lies in access to memory on multicores. This is especially true with some of the NVidia chipsets.

Jacob

Oleg V.Cat 2008-11-02 10:28

[quote=lidocorc;147442]LL test with v25.7 much slower than with v24.14

Since yesterday I'm using the new Version 25.7 instead of 24.14 on my AMD X64 dual core CPU driven machine. [/quote]
Having 2 worker threads on a dual-core CPU is a bad idea, especially on a desktop machine. You can put 2 CPU threads on one worker thread and get ~50%-70% greater performance.

To switch from 2 worker threads to 1, go to Test->Worker windows and set "Number of worker windows to run" to 1. Then (the 100% reliable way) stop P95 and exit, manually move all work in the worktodo.txt file under the one remaining worker, and run P95 again.

S485122 2008-11-02 15:25

[QUOTE=Oleg V.Cat;147550]Having 2 worker threads on double core CPU - that is a bad idea, especially if you have desktop machine. You can put 2 CPU threads on one worker thread, and get ~50%-70% greater performance.[/QUOTE]According to all the other testers who have expressed themselves on the forum and, more importantly, according to the author of the program, this is not true. It may be true in special cases, but I doubt it, and I certainly doubt your alleged 50% to 70% increase in throughput.

Could it be that you did not take into account that testing a different number on each core might take longer to produce each result, but that you end up with more than one result?

Jacob

Oleg V.Cat 2008-11-02 17:44

[quote=S485122;147573]According to all the other testers who have expressed themselves on the forum and, more importantly, according to the author of the program, this is not true. It may be true in special cases, but I doubt it, and I certainly doubt your alleged 50% to 70% increase in throughput.

[/quote]

Strange, but it's real for my cheap desktop E2160 at 1.8 GHz. With one worker and one CPU thread I get an iteration time of about 0.090; with two threads, ~0.055 when the CPU is otherwise free and ~0.090 when it is heavily loaded. With two workers I get about 0.140 for each worker thread...
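To put these three timings on a common footing, the sketch below converts each configuration into combined iterations per second. It assumes the exponents are of comparable size and uses the unloaded 0.055 figure for the two-thread case; the configuration labels are only for illustration.

```python
# (tests running in parallel, seconds per iteration) from the timings above
configs = {
    "1 worker, 1 thread": (1, 0.090),
    "1 worker, 2 threads": (1, 0.055),   # CPU otherwise free
    "2 workers, 1 thread each": (2, 0.140),
}

# Combined iterations per second across all tests in each configuration
throughput = {name: tests / sec for name, (tests, sec) in configs.items()}
baseline = throughput["1 worker, 1 thread"]

for name, rate in throughput.items():
    change = rate / baseline - 1
    print(f"{name}: {rate:.1f} it/s ({change:+.0%} vs. one thread)")
```

On these numbers, two threads on one worker come out about 64% ahead of a single thread, and two separate workers about 29% ahead, which is consistent with the 50%-70% figure claimed for this particular machine.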


All times are UTC.