[QUOTE=ixfd64;232144]This is probably a silly question,[/QUOTE]
Not really. :smile: [QUOTE=ixfd64;232144]does the "P-1/ECM stage 2 memory" refer to the allocation for each core or each processor? For example, I've heard that at least 256 MB of memory should be allowed for P-1 assignments. If I had a dual-core processor, should I enter 256 or 512 MB?[/QUOTE] The amount you want should be available to each worker for its P-1 stage 2. 256 MB sounds like a good amount for a computer you also use yourself; according to readme.txt, that's the desirable amount for exponents up to about 50M. For dedicated machines, or if you have plenty of free memory, set it higher.
The memory limit you set in Options > CPU is for the whole processor (more accurately: the whole Prime95 instance). The workers share from that as needed. If both workers will be doing P-1 stage 2 at the same time, you should enter 512 MB (it will see that 512 MB is available and that two workers need to run P-1, and split it to 256 MB per worker). If they'll be doing it at separate times, 256 MB will be fine (it will see that 256 MB is available and that only one worker needs it at a time, so each will get 256 MB in its turn).
Or you can try to set the memory limit for each worker individually. This would probably be best, because you don't have to worry about when they run stage 2: they'll each just take 256 MB when needed, whether that means you use 0, 256, or 512 MB for stage 2 at that moment. But this might not be working yet (see [url]http://www.mersenneforum.org/showthread.php?p=232097#post232097[/url] and the link in it, and the post after it, for more info). |
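A minimal sketch of the even split described in the post above, assuming the instance-wide limit is simply divided among the workers that are in P-1 stage 2 at the same moment. The function name is hypothetical, and the sketch ignores any memory set aside for other work types.

```python
def per_worker_stage2_mb(instance_limit_mb: float, stage2_workers: int) -> float:
    """Even split of the instance-wide limit among concurrent stage 2 workers.

    Assumption: Prime95 divides the limit equally; this is the poster's
    description, not a statement about the program's actual code.
    """
    if stage2_workers < 1:
        raise ValueError("at least one worker must be in stage 2")
    return instance_limit_mb / stage2_workers


# Two workers in stage 2 at once with a 512 MB limit, then one worker with 256 MB.
print(per_worker_stage2_mb(512, 2))
print(per_worker_stage2_mb(256, 1))
```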
Actually, I'd suggest putting it a little bit higher than that. Since the memory limit you input is for the entire instance of Prime95, even if only one core is doing P-1 stage 2, if the other core is doing LL work, then Prime95 will account for the 20-30MB of memory that test requires and only assign P-1 stage 2 ~230MB.
|
[QUOTE=Kevin;232164]Actually, I'd suggest putting it a little bit higher than that. Since the memory limit you input is for the entire instance of Prime95, even if only one core is doing P-1 stage 2, if the other core is doing LL work, then Prime95 will account for the 20-30MB of memory that test requires and only assign P-1 stage 2 ~230MB.[/QUOTE]
I don't think it does that. I just tested it with the memory set to 100 MB: whether or not another worker is running an LL, the P-1 assumes it has 100 MB. But then I tested it again with two P-1s, and each assumes it has 100 MB to work with, so... I don't know. I guess it splits the memory when the stage 2s actually start. It must be less efficient to let it choose bounds for 100 MB and then force it into 50 MB, but it doesn't look like the worker-specific settings are working yet. I'd say the best bet would be to get the two workers to do stage 2 at separate times and set the whole instance to 256 MB.
This might be of note: [CODE]You can set MaxHighMemWorkers=n in local.txt. This tells the program how many workers are allowed to use lots of memory. This occurs doing stage 2 of P-1 or ECM on medium-to-large numbers. [B]Default is available memory / 200MB.[/B] [/CODE](emphasis mine)
So, if I'm interpreting this right, with 256 MB it will already automatically allow only one worker to do high-memory work at a time (as long as 256/200 is rounded down). Hopefully it does this gracefully (instead of just having one worker skip stage 2 or something), but in any case, it may play into this.
Sorry for putting so much detail and confusion into such a simple question. It's surprisingly confusing when you get down to the nitty-gritty (not to mention bugs, like the worker-specific memory limits). Luckily, there isn't too much riding on getting your settings perfectly right and efficient: just a tiny difference in the chance of finding a factor for the numbers you P-1 (which is small even if you are only doing P-1, and nearly insignificant if you're just doing it as part of LL tests). |
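To make the "256/200 rounded down" reading concrete, here is a small sketch. It assumes the default really is an integer division of the memory limit by 200 MB; the documentation quoted above only says "available memory / 200MB", so the rounding is the poster's guess, and the function name is hypothetical.

```python
def default_max_high_mem_workers(available_mb: int, per_worker_mb: int = 200) -> int:
    """Guess at the MaxHighMemWorkers default: available memory / 200MB.

    Assumption: the division rounds down. The quoted documentation does
    not say how the result is rounded.
    """
    return available_mb // per_worker_mb


# With a 256 MB limit this reading allows one high-memory worker at a time;
# with 512 MB it allows two.
print(default_max_high_mem_workers(256))
print(default_max_high_mem_workers(512))
```

If the default doesn't suit you, the documented way to override it is a line such as MaxHighMemWorkers=1 in local.txt.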
Well right now I have 2 workers out of 4 running P-1 stage 2 on an i5 with global memory settings of 2056 MB, and each one is only using 988MB of memory. Generally, I think the program will say "x MB" of memory available at the start of stage 2, but then only actually uses something like (x-20) MB (or maybe .95x MB, who knows).
Also, I wanted to suggest something a little higher because I thought the actual minimum for P-1 testing was 300MB, not 256MB (and now that I've looked it up, that is indeed the case).:whistle: |
[QUOTE=Kevin;232172]Well right now I have 2 workers out of 4 running P-1 stage 2 on an i5 with global memory settings of 2056 MB, and each one is only using 988MB of memory. Generally, I think the program will say "x MB" of memory available at the start of stage 2, but then only actually uses something like (x-20) MB (or maybe .95x MB, who knows).[/QUOTE]
I've noticed that the "x MB used" is always a little less than what you say it can use, even when that P-1 is the only thing running. I think this is because the memory usage goes up in discrete jumps with the number of relative primes being processed. I think that Prime95 chooses a number of relative primes that will make the memory usage close to the allowed memory (I think as close as possible without going over, but I don't have a definite data point to rule out plain "closest"), whether that "allowed memory" is [set number], or [set number]/2 to split between two workers, or [set number]-20 to make room for LL, or what.
e.g. I recently did a P-1 on M53250707. It was the only worker. When I set the memory to:
500 MB, it does 16 relative primes at a time and reports 491 MB used
800 MB, it does 29 relative primes at a time and reports 788 MB used
1000 MB, it does 38 relative primes at a time and reports 993 MB used
(this was all after the B1 and B2 had been chosen; FFT length for P-1 was 2880K, B1=625000, B2=16250000)
I suspected that the memory needed would scale linearly with the number of relative primes processed at a time, and I seem to be right (from these data points). I calculated that [memory usage] ~= 22.85*[number of relative primes at a time] + 125.4. (Expect these constants to vary greatly with the p, B1, and B2.) It appears that Prime95 always chose the number of relative primes that puts the predicted memory usage as close as possible to the allowed amount without going over.
This can explain why your P-1 workers are only using 988 MB of memory, even without taking memory out for LL: if each considers itself to have 1028 MB available (2056/2), and one more relative prime would take more than (1028-988=) 40 MB, then it would limit itself to 988 MB. |
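The fitted line above can be turned into a small sketch of the apparent selection rule: take the largest number of relative primes whose predicted usage stays at or under the allowed memory. The constants 22.85 and 125.4 come from the post and apply only to that one run (M53250707, FFT 2880K); the function names are hypothetical, and this is a guess at the behaviour, not Prime95's code.

```python
SLOPE_MB = 22.85   # MB per relative prime, fitted in the post for one exponent
BASE_MB = 125.4    # fixed overhead in MB, from the same fit


def predicted_mb(relprimes: int) -> float:
    """Predicted stage 2 memory use for a given number of relative primes."""
    return SLOPE_MB * relprimes + BASE_MB


def pick_relprimes(allowed_mb: float) -> int:
    """Largest count whose predicted usage does not exceed allowed_mb."""
    if allowed_mb < predicted_mb(1):
        return 0  # not even one relative prime fits under this model
    return int((allowed_mb - BASE_MB) // SLOPE_MB)


# Reproduce the three settings from the post.
for limit in (500, 800, 1000):
    n = pick_relprimes(limit)
    print(limit, n, round(predicted_mb(n), 1))
```

For the three limits tried in the post, this picks 16, 29, and 38 relative primes, matching the reported counts.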
Two more results from more testing I've done:
1. It picks the closest without going over, not the closest. e.g. if 15 rel. primes will take 291 MB, and that worker is allowed 290 MB, it will go all the way down to 14 rel. primes (277 MB) rather than go over.
2. You're right, it does reserve some memory for LL threads. It changes depending on the FFT size, of course, but it's roughly (FFT size)*12.5 bytes (e.g. 1792K*12.5 bytes = 22400 KB ~= 22 MB) for each LL. This amount is subtracted from the "Available memory is x MB" message on the P-1 worker.
Between these two effects, I think I get how your memory is distributed, and how all that works. :smile: |
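The LL reservation estimate above works out as follows in a short sketch. The 12.5 bytes per FFT element figure is the poster's rough measurement; the function names are hypothetical, and the conversion of 1024 KB per MB is an assumption.

```python
BYTES_PER_FFT_ELEMENT = 12.5  # rough figure from the post, not a documented constant


def ll_reserved_kb(fft_size_k: float) -> float:
    """Approximate memory reserved for one LL test, in KB, for an FFT size in K."""
    return fft_size_k * BYTES_PER_FFT_ELEMENT


def p1_available_mb(limit_mb: float, ll_fft_sizes_k: list[float]) -> float:
    """Memory left for P-1 stage 2 after subtracting each running LL test.

    Assumption: 1 MB = 1024 KB.
    """
    reserved_mb = sum(ll_reserved_kb(k) for k in ll_fft_sizes_k) / 1024
    return limit_mb - reserved_mb


# One LL test at a 1792K FFT next to a P-1 worker with a 256 MB limit.
print(ll_reserved_kb(1792))
print(p1_available_mb(256, [1792]))
```

With one such LL test running, a 256 MB limit leaves roughly 234 MB for stage 2, in line with the "~230MB" figure suggested earlier in the thread.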
[QUOTE=Mini-Geek;232215]Two more results from more testing I've done:
It picks the closest without going over, not the closest. e.g. if 15 rel. primes will take 291 MB, and that worker is allowed 290 MB, it will go all the way down to 14 rel. primes (277 MB) rather than go over.[/QUOTE] I concur with the "not going over" scenario, as that's what I found while doing some similar benchmarking a couple of months ago.
To add another tidbit: additional memory, at least beyond a certain point, does not speed up the P-1 task noticeably. The total number of iterations didn't change, and neither did the time per iteration (beyond the 1% noise level). What did happen with larger memory allocations is that the estimated chance of finding a factor went up very slightly: 2000 MB gave me a 6.88% chance, 3000 MB gave 6.91%.
Just don't set the memory so high that it starts disk swapping. Swapping will slow your machine to a relative crawl. |
Note that the 300MB minimum for P-1 is not the minimum needed for P-1 to be performed effectively. It is the minimum memory you need to have before the server will assign you P-1 tests. You could manually ask for P-1 tests and do perfectly OK with 256MB. This is what George says in the readme.txt:
[QUOTE]4) Factor in the information below about minimum, reasonable, and desirable memory amounts for some sample exponents. If you choose a value below the minimum, that is OK. The program will simply skip stage 2 of P-1 factoring.
Exponent   Minimum   Reasonable   Desirable
--------   -------   ----------   ---------
20000000   40MB      80MB         120MB
33000000   65MB      125MB        185MB
50000000   85MB      170MB        250MB
[/QUOTE] |
[QUOTE=garo;232223]Note that the 300MB minimum for P-1 is not the minimum needed for P-1 to be performed effectively. It is the minimum memory you need to have before the server will assign you P-1 tests. You could manually ask for P-1 tests and do perfectly OK with 256MB. This is what George says in the readme.txt:[/QUOTE]
I'm a bit out of touch with bytes of RAM since my ZX81 days (1K with 16K pack extra), but aren't we talking peanuts here? David Yes I do know what cache means. |
[QUOTE=davieddy;232253]I'm a bit out of touch with bytes of RAM since my ZX81 days
(1K with 16K pack extra), but aren't we talking peanuts here? David Yes I do know what cache means.[/QUOTE] Peanuts in terms of impact to a system, or peanuts in terms of likelihood of finding a factor? The extra 50MB of memory is only going to marginally increase the chances of finding a factor, but there's really no reason to not do it if the impact to your system is negligible. I presume if the person was debating between 256MB and 512MB, that means that 512MB was an option, and they could afford to move up to 300MB (which is closer to where the marginal benefit of additional memory begins to become negligible). |
[QUOTE=Mini-Geek;232169]I don't think it does that. I just tested it with the memory set to 100 MB: whether or not another worker is running an LL, the P-1 assumes it has 100 MB. But then I tested it again with two P-1s, and each assumes it has 100 MB to work with, so... I don't know. I guess it splits the memory when the stage 2s actually start. It must be less efficient to let it choose bounds for 100 MB and then force it into 50 MB, but it doesn't look like the worker-specific settings are working yet. I'd say the best bet would be to get the two workers to do stage 2 at separate times and set the whole instance to 256 MB.
This might be of note: [CODE]You can set MaxHighMemWorkers=n in local.txt. This tells the program how many workers are allowed to use lots of memory. This occurs doing stage 2 of P-1 or ECM on medium-to-large numbers. [B]Default is available memory / 200MB.[/B] [/CODE](emphasis mine)
So, if I'm interpreting this right, with 256 MB it will already automatically allow only one worker to do high-memory work at a time (as long as 256/200 is rounded down). Hopefully it does this gracefully (instead of just having one worker skip stage 2 or something), but in any case, it may play into this.
Sorry for putting so much detail and confusion into such a simple question. It's surprisingly confusing when you get down to the nitty-gritty (not to mention bugs, like the worker-specific memory limits). Luckily, there isn't too much riding on getting your settings perfectly right and efficient: just a tiny difference in the chance of finding a factor for the numbers you P-1 (which is small even if you are only doing P-1, and nearly insignificant if you're just doing it as part of LL tests).[/QUOTE] [QUOTE=Mini-Geek;232208]I've noticed that the "x MB used" is always a little less than what you say it can use, even when that P-1 is the only thing running. I think this is because the memory usage goes up in discrete jumps with the number of relative primes being processed. I think that Prime95 chooses a number of relative primes that will make the memory usage close to the allowed memory (I think as close as possible without going over, but I don't have a definite data point to rule out plain "closest"), whether that "allowed memory" is [set number], or [set number]/2 to split between two workers, or [set number]-20 to make room for LL, or what.
e.g. I recently did a P-1 on M53250707. It was the only worker.
When I set the memory to:
500 MB, it does 16 relative primes at a time and reports 491 MB used
800 MB, it does 29 relative primes at a time and reports 788 MB used
1000 MB, it does 38 relative primes at a time and reports 993 MB used
(this was all after the B1 and B2 had been chosen; FFT length for P-1 was 2880K, B1=625000, B2=16250000)
I suspected that the memory needed would scale linearly with the number of relative primes processed at a time, and I seem to be right (from these data points). I calculated that [memory usage] ~= 22.85*[number of relative primes at a time] + 125.4. (Expect these constants to vary greatly with the p, B1, and B2.) It appears that Prime95 always chose the number of relative primes that puts the predicted memory usage as close as possible to the allowed amount without going over.
This can explain why your P-1 workers are only using 988 MB of memory, even without taking memory out for LL: if each considers itself to have 1028 MB available (2056/2), and one more relative prime would take more than (1028-988=) 40 MB, then it would limit itself to 988 MB.[/QUOTE] I wondered about all this myself a while back. All types of work require memory; LL only takes a few MB. The actual amount used for P-1 is displayed only when stage 2 starts. I deduced linear functions pretty much the same way you did.
I presently have 845 MB allowed 24/7, and I'm running two workers. I also use MaxHighMemWorkers=1, which works great. All workers restart with the new memory setting when high-memory work finishes. So a worker that has moved on in its work queue because another worker is running stage 2 will stop whatever non-stage-2 work it's doing and go back to the high-memory work in its own queue once the other worker finishes. That way you don't "waste" memory by leaving it unallocated. |