I don't think this is a math problem; rather, it's a resource allocation problem.
You can calculate the optimal order in which to do TF, P-1, and LL if all the work is done on your GPU. If you start looking at prime95's P-1 speed then you are comparing apples (CPUs) to oranges (GPUs). I've not done these GPU-only calculations. My gut tells me that there is only a small change in total throughput (and I expect doing P-1 earlier would be better). If we look at this as a resource allocation problem, then to me it appears we have an excess of P-1 capacity versus TF capacity. Therefore, we'd want to release exponents for P-1 one bit earlier. This would increase the amount of work P-1 users do and reduce the amount of work TFers have to do.
[QUOTE=Mark Rose;412671]I think if we have the TF capacity, we should still fully TF before P-1. It's work that needs to be done anyway, and if it saves P-1 time, it increases the overall system throughput.[/QUOTE]
But we're currently [B][U]right at the edge[/U][/B] of feeding the P-1'ers. If it's agreed that P-1'ing at lower bit levels makes more sense, then when "Spidy" needs to pull its rip-cord it can release exponents at lower bit levels, and then recapture those candidates not factored for final TF'ing to 75 before they are re-released for LL'ing. [QUOTE=Mark Rose;412671]So the real problem is that we have too many resources doing P-1 work and not LL/DC or TF.[/QUOTE] Definitely don't disagree with that! DC'ing, in particular, needs some love (read: it continues to fall behind LL'ing by ~90 candidates a day).... :smile:
[QUOTE=Prime95;412676]If we look at this as a resource allocation problem, then to me it appears we have an excess of P-1 capacity versus TF capacity. Therefore, we'd want to release exponents for P-1 one bit earlier. This would increase the amount of work P-1 users do and reduce the amount of work TFers have to do.[/QUOTE]
Agreed. But not necessarily only one bit earlier. Perhaps by as many bits as can safely be released for P-1'ing without the risk of having a candidate assigned to an LL'er (who might not do the P-1 run "well", or even at all). BTW George, if I may ask... Why does the Probability per Hour drop so much for each bit level?
[QUOTE=chalsall;412667]This test was with 4GB allocated. Is the same trend evident with less?[/QUOTE]Sorry, I tried to be modest with 4GB, I normally have 8GB allocated per worker :smile:
The same general trend will exist, with slightly different numbers, down to very-low memory allocation, at which point Prime95 will give up on stage2 and run stage1-only with a larger B1. Much more RAM translates to a very slight efficiency increase, specifically in that each pass of stage2 has a small fixed overhead, so if you can do it in fewer passes you can afford to spend a little more time with higher bounds [SIZE="1"](which translates into more RAM used, potentially more passes required... see why this is a complex optimization? :)[/SIZE] [QUOTE=chalsall;412667]Perhaps Aaron could speak to what percentage of P-1 have both stages done?[/QUOTE]I'm no Aaron, but here are all submitted results for 2015-Jan-01 through 2015-Sep-30:[code]SELECT COUNT(*) AS `howmany`,
       `result_type`,
       (`message` LIKE "%B1=%") AS `stage1`,
       (`message` LIKE "%B2=%") AS `stage2`
  FROM `primenet_results_archive`
 WHERE (`date_received` > "2015-01-01")
   AND (`result_type` IN ("F-PM1", "NF-PM1"))
 GROUP BY `result_type` ASC, `stage1` ASC, `stage2` ASC;

+---------+-------------+--------+--------+
| howmany | result_type | stage1 | stage2 |
+---------+-------------+--------+--------+
|   35040 | NF-PM1      |      1 |      0 |
|  113495 | NF-PM1      |      1 |      1 |
|     451 | F-PM1       |      0 |      0 |
|    1442 | F-PM1       |      1 |      0 |
|    7318 | F-PM1       |      1 |      1 |
+---------+-------------+--------+--------+
5 rows in set (21.54 sec)[/code]So, for no-factor P-1 results: 23.6% were done with no stage2 (presumably due to lack of available memory; I assume the default Prime95 setting of 8MB is a strong cause, since that's insufficient to run stage2 on current exponents). For P-1 factors, 80% were found in stage2, 15% in stage1, and 5% are indeterminate (the stage is not reported in the results data). Which I find unexpected, since in my experience there should be a pretty even split between stage1 and stage2 factors. In fact, I just checked my last 86 F-PM1 results, and exactly 43 were stage1 and 43 were stage2.
I suspect it's something to do with the amount of RAM available. If you allocate 8MB (the default) you won't get stage2 at all. If you allocate a tiny amount (I dunno, 50MB?) then it's enough to do a feeble stage2, but in that case B1 is set so low that many factors that could have been found in stage1 are instead only found in stage2.
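For the record, the percentages quoted above can be recomputed directly from the row counts in the query output. A quick Python sanity check (the counts are copied from the result table; nothing else is assumed):

```python
# Stage-split percentages recomputed from the query's row counts.
# Keys are (result_type, stage1_seen, stage2_seen).
counts = {
    ("NF-PM1", 1, 0): 35040,   # no factor, stage1 only
    ("NF-PM1", 1, 1): 113495,  # no factor, both stages run
    ("F-PM1", 0, 0): 451,      # factor, stage not reported
    ("F-PM1", 1, 0): 1442,     # factor found in stage1
    ("F-PM1", 1, 1): 7318,     # factor found in stage2
}

nf_total = sum(v for (t, _s1, _s2), v in counts.items() if t == "NF-PM1")
f_total = sum(v for (t, _s1, _s2), v in counts.items() if t == "F-PM1")

pct_no_stage2 = 100 * counts[("NF-PM1", 1, 0)] / nf_total
pct_stage2_factors = 100 * counts[("F-PM1", 1, 1)] / f_total
pct_stage1_factors = 100 * counts[("F-PM1", 1, 0)] / f_total

print(f"NF-PM1 runs with no stage2: {pct_no_stage2:.1f}%")       # ~23.6%
print(f"Factors found in stage2:    {pct_stage2_factors:.1f}%")  # ~79.4% ("80%" above)
print(f"Factors found in stage1:    {pct_stage1_factors:.1f}%")  # ~15.7%
```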
[QUOTE=James Heinrich;412680]Sorry, I tried to be modest with 4GB, I normally have 8GB allocated per worker :smile:[/QUOTE]
LOL... If I may go tangentially nostalgic... The first computer I (and many others) owned was a TRS-80 Model 1, with 4 [B][U]K[/U][/B]B of RAM. At my high school we first worked on Commodore PETs (again with 4 KB of RAM). There was an ongoing heated (but friendly) argument amongst the teachers and students as to which was better, Z-80 or 6502 assembly (BASIC was, of course, by definition for Beginners :wink:)... When you step back a bit, it is truly stunning just how much progress has been made in a very short period of time.
I think it's time to change the default Prime95 memory allocation from 8MB to something more in line with the amount of memory modern computers are usually fitted with.
I joined GIMPS more than 13 years ago; at that time 256 MB of total memory was not uncommon on a mid-range desktop PC, and Prime95's default allocation was already 8MB. It seems to me that 128MB, or even 256MB, would be a reasonable amount to allocate.
[QUOTE=lycorn;412685]I think it's time to change the default Prime95 memory allocation from 8MB to something more in line with the amount of memory modern computers are usually fitted with.[/QUOTE]
Makes sense. And should be done. But there are a great many workers that have been "fired and forgotten" and are still working. No chance of changing their settings or their code.
[QUOTE=lycorn;412685]It seems to me that 128MB, or even 256MB, would be a reasonable amount to allocate.[/QUOTE]Let's look at that with some numbers (using the aforementioned M73412063 as a test case):[code]
   MB      B1        B2    Prob
    8  965000    965000    2.48
   16  965000    965000    2.48
   32  965000    965000    2.48
   64  965000    965000    2.48
  128  965000    965000    2.48
  256  640000   6400000    3.86
  512  695000  11815000    4.37
 1024  720000  15120000    4.59
 2048  730000  16242500    4.66
 4096  730000  16972500    4.69
 8192  735000  17272500    4.71
16384  730000  17337500    4.71
32768  730000  17337500    4.71
[/code]In fact, 8MB or 128MB is the same thing -- no stage2 is attempted. For current exponents 256MB would be just enough to get into stage2, but 512MB would be much better (and would give a little breathing room to still be useful a year or two from now). Beyond 1GB there is some benefit, but not much.
[QUOTE=chalsall;412684]There was ongoing heated (but friendly) argument amongst all the teachers and the students as to what was better, Z-80 or 6502 assembly…[/QUOTE][URL="http://tlindner.macmess.org/wp-content/uploads/2006/09/byte_6809_articles.pdf"]6809[/URL]!
:tu:
[QUOTE=chalsall;412679]BTW George, if I may ask... Why does the Probability per Hour drop so much for each bit level?[/QUOTE]
Because the GPU found the "easy" factors at the lower bit levels. In other words, the probabilities aren't going down so much due to the changing bounds as due to the fact that there are fewer possible small factors left to find.
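This effect can be illustrated with the standard GIMPS rule of thumb (a rough heuristic, not Prime95's actual probability model) that a Mersenne number has a factor between 2^b and 2^(b+1) with probability roughly 1/b:

```python
# Back-of-envelope sketch (NOT Prime95's real model): using the ~1/b
# heuristic, each completed TF bit level removes a pool of "easy" factors,
# so the factors left for P-1 to find become scarcer even if the P-1
# bounds stay the same.
def approx_factor_chance(from_bits, to_bits):
    """Approximate chance of a factor between 2^from_bits and 2^to_bits."""
    return sum(1.0 / b for b in range(from_bits, to_bits))

for tf_done in (70, 71, 72, 73, 74):
    chance = approx_factor_chance(tf_done, 80)
    print(f"TF complete to 2^{tf_done}: ~{chance:.3f} chance of a factor "
          f"below 2^80 remaining")
```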
[QUOTE=Prime95;412698]Because the GPU found the "easy" factors at the lower bit levels. In other words, the probabilities aren't going down so much due to the changing bounds as due to the fact that there are fewer possible small factors to find.[/QUOTE]
OK. Cool. But... If the P-1 code knew what had already been found, would it not optimize itself to take this into account?