mersenneforum.org (https://www.mersenneforum.org/index.php)
-   PrimeNet (https://www.mersenneforum.org/forumdisplay.php?f=11)
-   -   Prime95 version 25.6 (https://www.mersenneforum.org/showthread.php?t=9779)

Batalov 2008-03-13 21:25

"Less is more" (very true for LL threads)
 
[quote=S00113;128719]Another suggestion:

Benchmark should test for memory bus congestion when running n LL threads on a multi-core CPU. First run one thread of LL and n-1 of trial factoring. Then switch to two threads running LL on different exponents and n-2 trial factoring. Continue until LL speed is falling or all n threads run LL. Mark when LL speed drops, and tell the user and the server ho many LL threads can be run without worrying about memory bus congestion. The server can use this information when choosing default work for each thread.[/quote]

I wonder if there is existing research on optimum load balancing. If not, I'd be glad to provide my horsepower (I have an 8-core E5345 station, where the memory bus congestion is clearly visible. On my home Q6600 the effect is much smaller - 4 LLs run as fast as each does alone; maybe that's because I picked fast memory). In an ideal world, prime95 would decide by itself after trying several configurations.

Empirically, after observing the speed of the LL threads, I've permanently stopped the 8th thread and switched another one to TF, but this is a borderline solution (I didn't have time to dig deeper). So I have 6 LL threads and 1 TF thread.

Ballpark numbers, off the top of my head (just an example): a lone LL thread runs at, say, 0.054s per iteration, but in the company of seven others they all run at different, yet constant, speeds (say 0.147s, 0.096s, 0.084s, 0.084s, etc.). On average, [B]8 LL threads perform much worse than seven, six, or maybe even five[/B] (and I am talking about cumulative throughput! And of course if 5 LL threads give better cumulative performance than 8, I'd rather run five, and it'd be cooler in my office too). I haven't done a more systematic trial, though. I'll be thankful for a pointer to an existing thread about this, if one exists (I'm new).
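The cumulative-throughput comparison can be sketched numerically. The timings below are the illustrative figures from this post, not real benchmarks:

```python
# Compare cumulative LL throughput (iterations/sec) for 1 vs 8
# simultaneous LL threads. Per-iteration times are the illustrative
# example numbers from the post above, not measured values.

lone = 0.054  # seconds/iteration for a single LL thread running alone

# hypothetical per-iteration times when 8 LL threads share the memory bus
eight = [0.147, 0.096, 0.084, 0.084, 0.084, 0.084, 0.084, 0.084]

throughput_one = 1 / lone                    # iterations/sec, one thread
throughput_eight = sum(1 / t for t in eight)  # cumulative, eight threads

print(f"1 thread : {throughput_one:.1f} iter/s")
print(f"8 threads: {throughput_eight:.1f} iter/s "
      f"({throughput_eight / (8 * throughput_one):.0%} of linear scaling)")
```

With these example numbers, eight congested threads deliver well under the 8x a naive scaling estimate would predict.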


...And I almost forgot (now that I'm done with the initial batch of P-1 stage 2s, a few days early): once one of the threads gets into P-1 stage 2, things get much worse... the neighboring LL threads start running at 1/3 speed. There could be a couple of solutions for this: 1) forget about it for now (the P-1s come and go, but LLs stay for weeks); or 2) when one thread enters P-1 stage 2, possibly postpone all other LLs and get a TF assignment for every LL thread - some light reading for them for a day. (Otherwise they are barely moving forward anyway.)

P.S. I did set all individual CPU affinities on both computers.
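The benchmark loop S00113 proposes could be sketched like this. The contention model here is a toy stand-in for an actual timed run (its function names, the bus-capacity figure, and the penalty factor are all illustrative assumptions, not anything Prime95 does):

```python
# Sketch of the proposed benchmark: try k LL threads plus (n - k) TF
# threads for k = 1..n, and report the largest k before cumulative LL
# throughput starts to fall. measure_ll_throughput is a toy stand-in
# for an actual timed run of a mixed LL/TF workload.

def measure_ll_throughput(k, bus_capacity=6.0):
    """Toy contention model: each LL thread contributes 1 unit of work,
    but threads beyond the bus capacity drag the total back down."""
    return min(k, bus_capacity) - max(0, k - bus_capacity) * 0.5

def best_ll_thread_count(n=8):
    best_k, best_rate = 1, measure_ll_throughput(1)
    for k in range(2, n + 1):
        rate = measure_ll_throughput(k)
        if rate < best_rate:   # cumulative throughput dropped: stop here
            break
        best_k, best_rate = k, rate
    return best_k

print(best_ll_thread_count())  # -> 6 under this toy model
```

A real implementation would replace the model with timed runs, but the stop-when-throughput-drops loop is the whole idea.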

henryzz 2008-03-14 07:36

Would it be worth having work options: one for just stage 1, another for just stage 2, and another for both? This could be done for both P-1 and ECM.

cheesehead 2008-03-14 19:59

[quote=henryzz;128767]Would it be worth having work options: one for just stage 1, another for just stage 2, and another for both? This could be done for both P-1 and ECM.[/quote]If we want to allow someone to do stage 2 work after someone else has done stage 1, then we have to provide the stage 2 worker with a copy (size in bits = ~exponent) of the full residue from stage 1. (The same is true if we want to allow someone to extend the stage 1 work done by someone else to a higher stage 1 limit.)

This is feasible, but it has to be provided-for and introduces a much larger data transmission requirement than GIMPS now has, for all (both stage 1 and stage 2) P-1 or ECM work units. For extending stage 1 P-1 to a higher limit, this larger transmission would be required at both the start and the end of each such assignment.

Also, if someone returns a P-1 stage 1 result but without the residue copy (as is now the case), extension to a higher stage 1 limit or to stage 2 will require the later assignee to re-do that stage 1 work before proceeding with the extension work. (My understanding is that that could be partly true for ECM as well as P-1 if there were to be a stage 1 or stage 2 extension on a particular ECM curve, but I could be wrong, or perhaps ECM work would never be extended in that fashion.)

(The currently small data transmission needed for each work unit has been considered a positive arguing-point to persuade prospective new GIMPS participants. That would still be true for LL, DC, and TF, of course.)
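The extra transmission cost is easy to estimate: a stage 1 residue is roughly one bit per unit of the exponent, as noted above. The exponent used here is purely illustrative:

```python
# Estimate the transfer size of a P-1 stage 1 residue: size in bits
# is roughly the exponent itself. 33_219_281 is just an illustrative
# exponent, not a specific assignment.

def residue_size_mib(exponent):
    """Approximate residue size in MiB for a given Mersenne exponent."""
    return exponent / 8 / 1024 / 1024  # bits -> bytes -> MiB

print(f"~{residue_size_mib(33_219_281):.1f} MiB per residue transfer")
```

So each stage-handoff or limit-extension would mean shipping a few megabytes each way, versus the few bytes a result report costs today.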

James Heinrich 2008-03-14 23:42

[QUOTE=henryzz;128767]Would it be worth having work options: one for just stage 1, another for just stage 2, and another for both? This could be done for both P-1 and ECM.[/QUOTE]While [i]cheesehead[/i]'s response is all true, one useful application of [i]henryzz[/i]'s idea that does not require transmitting relatively large amounts of data would be to allow several threads on one machine to be configured independently. Rather than having one thread spend time on stage 1 and then later on stage 2 (which means half the time the available RAM is "wasted"), or having two threads each doing stages 1+2 and potentially fighting for stage 2 RAM, if one thread is dedicated to stage 1 and another thread is dedicated to stage 2, then the available RAM would be utilized to the fullest at all times, with no competing for the stage 2 RAM allocation. The trick then would be what to do with the threads if (for example) the stage 2 thread runs out of work (e.g. it found a factor) before the stage 1 thread has finished preparing the next exponent.

My proposed "solution" to this whole work allocation thing is to have a unified pool of worktodo (not segregated by thread in worktodo.txt), and threads assigned to do "low-RAM" work (TF, LL, stage1) or "high-RAM" (stage2) work and then the yet-to-be-made smart work allocator part of Prime95 dynamically assigns work to threads in the most efficient manner possible. That could also include things like finding work for idle threads to do if no new worktodo is available from the server (server or network down), such as running 2+ threads on one exponent, or TF complete-as-assigned-but-not-reported exponents to higher bounds.
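The pooled-worktodo idea could be sketched as a dispatcher that hands low-RAM work (TF, LL, stage 1) and high-RAM work (stage 2) to threads by role, falling back to the other queue when a role's own queue runs dry. Everything here (queue names, assignment strings, the role labels) is hypothetical; this is not how Prime95 is actually structured:

```python
# Toy dispatcher over a unified worktodo pool: "high-ram" threads prefer
# stage 2 assignments; other threads take low-RAM work (TF, LL, stage 1).
# If a thread's preferred queue is empty, it falls back to the other one,
# so no thread sits idle while any work remains. All names are made up.
from collections import deque

LOW_RAM = deque(["TF:M1277", "LL:M33219281", "P-1_S1:M33219283"])
HIGH_RAM = deque(["P-1_S2:M33219289"])

def next_assignment(thread_role):
    preferred, fallback = (
        (HIGH_RAM, LOW_RAM) if thread_role == "high-ram" else (LOW_RAM, HIGH_RAM)
    )
    for queue in (preferred, fallback):
        if queue:
            return queue.popleft()
    return None  # pool empty: a real client would fetch work from the server

print(next_assignment("high-ram"))  # -> P-1_S2:M33219289
print(next_assignment("high-ram"))  # queue empty, falls back -> TF:M1277
```

The "smart" part would live in how the pool gets refilled and reprioritized; the dispatch itself is this simple.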

There's a whole realm of "smart" allocation tricks like this (and as discussed in several previous threads) that could be potentially useful. Hopefully one day George will have time (and inclination) to make it all happen :smile:


:ermm: [i]I just hijacked [/i]henryzz[i]'s idea and inserted my wishlist rant again, didn't I?[/i] :blush:

henryzz 2008-03-15 15:48

[quote=cheesehead;128815]If we want to allow someone to do stage 2 work after someone else has done stage 1, then we have to provide the stage 2 worker with a copy (size in bits = ~exponent) of the full residue from stage 1. (Same is true if we want to allow someone to extend the stage 1 work done by someone else to a higher stage 1 limit.)

This is feasible, but it has to be provided-for and introduces a much larger data transmission requirement than GIMPS now has, for all (both stage 1 and stage 2) P-1 or ECM work units. For extending stage 1 P-1 to a higher limit, this larger transmission would be required at both the start and the end of each such assignment.

Also, if someone returns a P-1 stage 1 result but without the residue copy (as is now the case), extension to a higher stage 1 limit or to stage 2 will require the later assignee to re-do that stage 1 work before proceeding with the extension work. (My understanding is that that could be partly true for ECM as well as P-1 if there were to be a stage 1 or stage 2 extension on a particular ECM curve, but I could be wrong, or perhaps ECM work would never be extended in that fashion.)

(The currently small data transmission needed for each work unit has been considered a positive arguing-point to persuade prospective new GIMPS participants. That would still be true for LL, DC, and TF, of course.)[/quote]
Yes, that would be a problem. One day that might be possible, but not now.

monst 2008-03-16 04:34

Are there static builds available of mprime v25.6 for 32 and 64-bit Linux? If so, where?
Thanks.

Prime95 2008-03-16 13:29

[QUOTE=monst;128909]Are there static builds available of mprime v25.6 for 32 and 64-bit Linux? If so, where?[/QUOTE]

No static builds are available.

Taxythingy 2008-03-16 20:25

It's minor, but hey :smile:

The help menu of 25.6 currently reads Mesenne Wiki, not Mersenne... gosh I lead a sad life...

James Heinrich 2008-03-17 11:53

I think v5 server ran out of TF work? I'm getting ECM work instead.

Prime95 2008-03-17 16:37

[QUOTE=James Heinrich;128976]I think v5 server ran out of TF work? I'm getting ECM work instead.[/QUOTE]

Looks OK to me. I got TF work this morning.

Batalov 2008-03-17 21:48

Re: static builds
 
[quote=monst;128909]Are there static builds available of mprime v25.6 for 32 and 64-bit Linux? If so, where?[/quote]

If the absence of specific libs on a particular system gives you trouble, you can still use dynamic libs (without being an admin), as follows:
[code]# point the soname mprime expects at the library the system actually has
ln -s /usr/lib64/libcurl.so.3 ./libcurl.so.4
# csh syntax; in bash/sh use: export LD_LIBRARY_PATH=.
setenv LD_LIBRARY_PATH .
ldd ./mprime   # verify that all libraries now resolve
./mprime[/code]
...and resolve other [I]slightly mismatched[/I] libraries similarly. (This is just an example and may not work for you... well, it worked for me.) If your system doesn't have a particular library at all, or the version is far too old, build the library (with [FONT=Courier New]configure --prefix=$HOME ; make ; make install[/FONT]), then move it to ./ and repeat until it works. :uncwilly:


