mersenneforum.org  

Old 2013-10-09, 00:51   #1
Warlord
 
"Cas Wegkamp"
Sep 2013
The Netherlands

2² Posts
How many jobs should I run?

Currently I am running 4 jobs on my machine, an i7-3770 quad core with HT. Obviously, progress is slow with exponents in the 65999XXX range. The iterations are more than four times slower than when I ran a single job, to be precise 6.375 times slower (a strangely nice number popped out :S).

So I am assuming it would be better to assign all cores to a single job, but how do I stop 3 out of 4 workers from automatically getting more work? Also, where is the help file for this program? It seems it wasn't included in the 64-bit version.

The steps I have taken so far to do this are the following:
I've changed ...
- minutes between network retries to 300
- days of work to queue up to 0
- days between sending new end dates to 7

The second one especially seems like what I need: queuing up 0 days' worth of work, since any job will take longer than a day. But is there a way to tell the program "When this job is done, don't do anything until I tell you to"?
Old 2013-10-09, 01:06   #2
Batalov
 
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

9,127 Posts

Quote:
Originally Posted by Warlord
...the second one seems like what I need to do, getting 0 days worth of work queued up since any job will take longer than a day. But is there a way to tell the program "When this job is done, don't do anything until I tell you to"?

I've changed ...
- days of work to queue up to 0
You got it. That was an easy problem and the correct solution.

However, the harder problem is why it appears that you can do more with one 4-core job than with four 1-core jobs. Normally this is not the case, and if it is the case with your settings, then your settings are off. Someone else will tell you how to set the so-called affinity scrambling (you need someone with a system similar to yours).
Old 2013-10-09, 01:08   #3
chappy
 
 
"Jeff"
Feb 2012
St. Louis, Missouri, USA

10010000101₂ Posts

Someone smart will undoubtedly respond with more helpful advice, but I would like some clarification. Running four LLs should be slightly faster (let's guess about 12% faster) than running one LL on all four cores, four times in a row.

There's a memory bottleneck that makes it so.

If you are seeing something different, then we need to figure out why your machine is not optimized for running the four cores. Heat would seem to be the most likely candidate for such a slowdown.

What kind of iteration times are you seeing? I'll compare them with my i5-2570k when I get to work. I would guess you're seeing around 35 ms? I think I generally see 39-40ish.


(edit) see one of the smart guys already beat me to the punch!

Last fiddled with by chappy on 2013-10-09 at 01:10 Reason: Sergey's typing skillz exceed my own.
Old 2013-10-09, 01:14   #4
kladner
 
 
"Kieren"
Jul 2011
In My Own Galaxy!

17×19×31 Posts

There are others who can answer some of your questions better than I can. I will start from the end and work backward on the stuff I know. I'm not saying you necessarily want to do this, but you can keep more work from being obtained by putting the following in prime.txt:
Code:
NoMoreWork=1

There is no file named "help". However, there are several informative .txt files: readme.txt, stress.txt, undoc.txt, and whatsnew.txt. Two other files contain configuration settings: local.txt and prime.txt. Results.txt is exactly that: your results. Prime.log is a complementary record of program operations. EDIT: worktodo.txt is pretty self-explanatory, though the lines in it must be formatted in particular ways, which are described in the informative .txt files.
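Just as an illustration, a first-time LL line in worktodo.txt looks roughly like the one below. The long hex string stands for the PrimeNet assignment ID and the exponent is a made-up placeholder; readme.txt has the authoritative description of the fields (from memory: assignment ID, exponent, trial-factoring depth, whether P-1 has been done), so double-check there.
Code:
Test=0123456789ABCDEF0123456789ABCDEF,65999111,75,1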

Last fiddled with by kladner on 2013-10-09 at 01:18
Old 2013-10-09, 01:21   #5
kracker
ἀβουλία
 
 
"Mr. Meeseeks"
Jan 2012
California, USA

5×433 Posts

What is the speed of your memory?

EDIT: also, to add to chappy's point, thermals may well be a problem.

Last fiddled with by kracker on 2013-10-09 at 01:35
Old 2013-10-09, 01:23   #6
sdbardwick
 
 
Aug 2002
North San Diego County

2³×5×17 Posts

[Assuming Windows and hyperthreading enabled]
Don't run more than 4 threads, and in local.txt try
Code:
AffinityScramble2=02461357
In most cases [Windows can sometimes be odd] this will give each thread a dedicated (not shared via hyperthreading) core.
If HT is disabled, AffinityScramble2 is not needed.
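For reference, a local.txt sketch along those lines for four single-core workers might look like the following. It assumes the common Windows numbering where logical CPUs 0, 2, 4, 6 are the four physical cores and 1, 3, 5, 7 are their HT siblings; that numbering, and my reading that each character of the string remaps one of P95's CPU numbers, are assumptions on my part, so double-check against undoc.txt:
Code:
WorkerThreads=4
ThreadsPerTest=1
AffinityScramble2=02461357
With this, the four workers should land on logical CPUs 0, 2, 4 and 6, i.e. four different physical cores.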

Last fiddled with by sdbardwick on 2013-10-09 at 01:28
Old 2013-10-09, 05:39   #7
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

8,761 Posts

Quote:
Originally Posted by Warlord
So I am assuming it would be better to assign all cores to a single job but how do I stop 3 out of 4 workers from automatically getting more work?
Stop P95. Look for a file called local.txt in the folder where P95 runs. In local.txt, using any text editor (Windows Notepad, NOT a word processor like WinWord, although that can also be used if you save as plain text after editing), change the two lines to:

Code:
WorkerThreads=1
ThreadsPerTest=4

Save as text.

Edit "worktodo.txt" and put all assignments in the same section. I.e. delete "[worker 2]", "[worker 3]" etc lines and move everything in [worker 1]. Don't delete "[worker 1]" line. The numbers of workers need to match with worker threads. Each thread will use "threads per test" cores (physical - or logical cores if you have more threads than phys cores)

The product (mathematical multiplication) of the two numbers (workers times threads) must equal the number of cores you want to allocate to P95. Don't allocate logical cores, only physical ones. P95 is well optimized, so it will generally not take any advantage of logical cores; it can already load a physical core to the maximum. For example, if you have 4 physical cores (8 logical, with hyperthreading), use 1 and 4 as in the example above.
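Just to spell it out for a 4-core CPU, the combinations that satisfy that rule are 1×4, 2×2 and 4×1. The middle one, for instance, would be:
Code:
WorkerThreads=2
ThreadsPerTest=2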

Save. Restart P95.

After restarting P95, you will have as many windows as workers (only one in your case), so it will be easy to see the results too (fewer windows, more space for each).

Quote:
Also, where is the helpfile for this program, it seems it wasn't included in the 64bit version.
If you downloaded the official distribution from GIMPS/PrimeNet, look for the text files inside the zip. There should be a mirror on this forum too, somewhere around. Especially read "readme.txt", "stress.txt", and "undoc.txt"; the second and third are very important if you want to squeeze the last bit of performance out of your box.

Quote:
Steps I have currently taken to do this are the following:
I've changed ...
- minutes between network retries to 300
- days of work to queue up to 0
- days between sending new end dates to 7

Especially the second one seems like what I need to do, getting 0 days worth of work queued up since any job will take longer than a day.
Zero days is bad. If your assignment is finished and no connection is available, then your computer will waste time doing nothing.

Assignment problems should be solved by setting the right parameters for threads, cores, etc., as explained above. You may want to use a "queue" of at least 5 days, to have some work to do in case the computer can't connect to the internet for a few days. Days to send results? Set it to 1, to have P95 exchange info with the server every day (if it is online; if not, this setting won't matter).
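If I remember the option names right (double-check against undoc.txt; the settings dialog you already used writes these lines for you, so there is normally no need to edit them by hand), that would correspond to something like the following in prime.txt:
Code:
DaysOfWork=5
DaysBetweenCheckins=1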

Read the text docs.

Quote:
But is there a way to tell the program "When this job is done, don't do anything until I tell you to"?
Look for "Advanced -> quit gimps" option. "quitting" in this context will mean that you will not do any work momentarily, until you decide to "join" again. All your credentials is preserved, and the program will ask you if you want to complete the current assignments before "quitting". You can also play with the "NoMoreWork=" options in the prime.txt file.

Again, to emphasize what other people have said: the fact that you see a higher speed when you run one worker is contrary to what we (all the other people here) experience with our computers. This could mean (my best bet) a memory bandwidth limitation on your computer (4 workers need to exchange more data with memory than a single worker; you need to read 4 residues at every iteration), or (my second bet) a heat problem (4 workers heat the cores harder, while a single worker needs some time to move data from one core to another, creating "dead time" in which the cores cool). Four workers are always faster than 1 worker, assuming your computer has no memory bandwidth limitation and can get rid of the produced heat fast enough.

TL;DR version:

4 workers on 4 cores: read 4 residues from memory (very big numbers!), each core multiplying its own sh!t, writing back the residues. Needs high memory bandwidth, produces lots of heat. The most productive.

1 worker on 4 cores: reads 1 residue from memory, but "spreads" this residue over all cores, giving each core a quarter of the multiplication. At the end, it collects the results from all cores and writes one residue back to memory. Here one iteration is up to 4 times faster to compute, because 4 cores participate in computing it, but some time is lost "sharing" between the cores and "collecting" the results. Therefore, if the iteration time in the "4 workers" scenario is about 80 ms, you will not get 20 ms in the "1 worker" scenario, but 21 or 22, depending on how efficient your CPU is. "Sharing" time is "cooling" time for the CPU, as the cores don't do much calculation during it.

In the first scenario you will do 4 exponents in (say) 80 days (the actual time is the iteration time multiplied by the number of iterations, which depends on your exponents), but in the second scenario you will do one exponent in 21 or 22 days, therefore you will need 84 to 88 days to do 4 exponents. (These numbers are made up, just to show how things work; the real numbers depend on the exponents and on the real timings on your computer.)

More computation, more heat. Less computation, less heat. Sounds logical.

Last fiddled with by LaurV on 2013-10-09 at 06:14
Old 2013-10-10, 21:25   #8
Warlord
 
"Cas Wegkamp"
Sep 2013
The Netherlands

2² Posts

Well, after reading about the memory thing and about my settings probably being messed up, I did some investigating.

I've actually turned down the amount of memory P95 is allowed to use, and it turns out it is now calculating a lot faster! Where an iteration took ~0.05 s with 3500 MB of memory allowed, it is now down to ~0.035 s with 100 MB. That is about on par, in total throughput, with having four cores calculating one number at ~0.008 s, even though that run had 3500 MB available to it, which apparently is detrimental to speed.

This is a genuine WTF moment for me, as with anything else I can think of the motto is "the more the merrier". Setting available memory even lower apparently does not affect iteration times.

I haven't changed *anything* else, so this raises the question: what is the optimal memory setting for P95?

The above memory settings were set via the daytime and nighttime values under "Options" > "CPU...".

Last fiddled with by Warlord on 2013-10-10 at 21:27
Old 2013-10-11, 02:36   #9
TheMawn
 
 
May 2013
East. Always East.

11·157 Posts

As far as I know, the memory setting is only used in P-1 factoring, but I could be wrong.

Are you running anything else while running Prime95? It runs at the lowest priority, so anything and everything will take precedence over it.

What is your memory speed? Try running the Windows Experience Index and see if you have some serious limitation in your memory, perhaps. You shouldn't be bottlenecked at one worker unless your memory is very, very slow.

Could you post iteration times for 4 workers / 4 cores and 1 worker / 1 core? I find it hard to believe one worker is that much faster per iteration.
Old 2013-10-11, 03:14   #10
cheesehead
 
 
"Richard B. Woods"
Aug 2002
Wisconsin USA

2²·3·641 Posts

Quote:
Originally Posted by Warlord
I've actually turned down the amount of memory the P95 is allowed to use and it turns out it is now actually calculating a lot faster!
That must coincidentally be due to something other than "available memory". P95 L-L testing does not consult the "available memory" setting; only stage 2 of P-1 and ECM testing uses that setting.

When stage 2 starts during a P-1 or ECM run, P95 displays a message with the amount of the "available memory" that it is actually using. (For instance, when I set "available memory" to 1250, I typically see a message that ECM uses ~780-800M during stage 2.) If you don't see any message saying how much memory P95 is using during stage 2 (and you'll never see such a message during L-L testing), then you know that the "available memory" setting is having no effect on whatever type of work P95 is doing.

Quote:
Where an iteration took ~0.05 @3500MB mem it is now down to ~0.035 @100MB mem.
This could be due to memory being less cluttered by other programs during the second run, or to some other effect of non-P95 programs running on your system. For instance, other non-P95 programs could have been using less CPU time during the second run than during the first.

Again, P95's L-L testing does not use the "available memory" setting for anything.

Quote:
This is about on par with having four cores calculating 1 number ~0.008 even though that was ran with 3500MB mem available to it which apparently is detrimental to speed.
I think you may find that if you raise "available memory" back to 3500MB, your iteration timing after restarting may well stay faster than the ~0.05 sec you saw before.

Quote:
Setting available memory even lower apparently does not affect iteration times.
... because P95 doesn't consult or use the "available memory" setting when doing L-L testing.

Quote:
I haven't changed *anything* else, so this raises the question: what is the optimal memory setting for P95?
Return it to 3500MB -- this will be an important improvement if you ever run P-1 or ECM assignments, and has no effect on your L-L (or DC or TF) testing.
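For completeness: those daytime/nighttime values end up in local.txt. If I remember the key name right, the simplest form is a single Memory= line like the sketch below (the day/night variant has its own syntax, described in undoc.txt, and the Options dialog writes it for you, so there is no real need to edit it by hand). Since only P-1/ECM stage 2 consults it, a large value here costs your L-L tests nothing:
Code:
Memory=3500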

Last fiddled with by cheesehead on 2013-10-11 at 03:23
Old 2013-10-11, 06:28   #11
Warlord
 
"Cas Wegkamp"
Sep 2013
The Netherlands

2² Posts

Weird. I did what you said and the times did indeed stay between 0.032 and 0.037. I wonder why they went that high before.