[quote=mdettweiler;209750]Hi all,
I am pleased to announce that I just received word from Dave that the import of manual results into the stats database is ready to roll! :w00t: Sometime today I will place the complete manual results for Drives #1 and #2 in the results folder; the DB will then automatically pull them in at midnight CDT (GMT-5) tonight. Drive #3 is also close to being ready to import, so that should follow shortly afterwards. From there I will continue preparing the rest of the drives and importing them as they're ready; I hope to have the process caught up to our current level of progress within 2 weeks, 3 at maximum. Now our stats database can finally reflect [I]all[/I] work that's been done by this project, and those of us who have done manual work will have it fairly credited according to the same rules as automated LLRnet/PRPnet work. Max :smile:[/quote] Update: when the DB tried to load in Drives #1 and #2 last night, it choked on the sheer volume of results in the files. I'm currently working with Dave to resolve this; in the meantime, he has disabled all scheduled DB imports, hence the stats appear "frozen" for now. This shouldn't be for long; stay tuned. :smile:
Sent email to Max and Gary.
Due to the large number of records (4 million) in the staging table, which is un-indexed to allow for duplicates, the de-duplication procedure paged to file and bogged down. The design of the de-duplication process is not at fault. The load process for the manual data will have to be done off-line due to the large volume of data. I have resumed the normal scheduled processes for now and will begin the manual load process in an off-line database in 2 days' time. AMDave
[QUOTE=AMDave;209810]Due to the large amount of records (4 million) in the staging table which is un-indexed to allow for duplicates the de-duplication procedure paged to file and bogged down. The design of the de-duplication process is not at fault.
[/QUOTE] Why try to upload the whole of Drives #1 and #2 into the database? As far as I can tell, only about 900000 pairs were done manually for Drive #1 (about 270000 pairs for Drive #2), so there aren't many duplicates to process. OTOH, perhaps not all automated pairs are in the database yet!? It would be nice to get the ranges from the database to compare with my Summary page!
[quote=kar_bon;209858]Why trying to upload the whole Drives #1 and #2 in the database?
As I can say, there're only about 900000 pairs done manually for Drive #1 (for Drive #2 about 270000 pairs). So not much duplicates to process. OTOH perhaps not all automated pairs are yet in the database!? It would be nice to get the ranges from the database to compare with my Summary-page![/quote] What I'm doing is loading in the LLRnet ranges from Gary's master results files along with the manual ranges; that way, if any results were missed by the automated processing (and were filled in manually by you, me, or Gary afterward--there was a lot of that earlier on), we can be sure they're all in the database, giving us a complete record of everything. My idea is that sometime down the road we can have the DB compare its results against the original sieve file; it could tell us exactly which pairs are done and which aren't, produce formatted and sorted results files automatically (thereby automating results processing for Gary), and maybe even make some nice graphs showing just how far along each drive is relative to the sieve file. Right now, I'm simply putting the entire LLRnet ranges in the manual dumps under the username "Unknown". Any results that were properly imported from the server the first time around (almost everything) will be rejected as duplicates, while any that were missed will be imported and credited to "Unknown". The disadvantage is that this creates loads of duplicates, which seem to be presenting a problem for the DB. I'm currently awaiting a response from Dave as to whether the problem is the number of raw results being imported or just the number of duplicates; if it's the latter, we can solve it by not including LLRnet ranges under "Unknown" in the manual dumps and instead checking the DB later on and manually filling in any missing pairs from the master results files.
Since there aren't many of them, that shouldn't be hard.
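The import-and-reject-duplicates idea described above can be sketched with a toy database. This is a hypothetical illustration only (the table layout, column names, and SQLite are my assumptions; the real NPLB schema isn't shown in this thread): a uniqueness constraint on the candidate pair makes the database silently drop re-submitted results, so a full LLRnet range imported under "Unknown" only fills in genuine gaps.

```python
import sqlite3

# Toy stand-in for the results table: the PRIMARY KEY on (k, n)
# means each candidate pair can only be stored once.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE results (
    k INTEGER, n INTEGER, user TEXT,
    PRIMARY KEY (k, n))""")

# A result properly imported from the server the first time around.
con.execute("INSERT INTO results VALUES (15, 400000, 'mdettweiler')")

# Re-importing the full range under "Unknown": INSERT OR IGNORE
# rejects the duplicate and credits only the missing pair.
rows = [(15, 400000, "Unknown"), (15, 400002, "Unknown")]
con.executemany("INSERT OR IGNORE INTO results VALUES (?, ?, ?)", rows)

for row in con.execute("SELECT * FROM results ORDER BY n"):
    print(row)
```

The first pair keeps its original credit; only the previously missing pair is attributed to "Unknown".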
[quote=mdettweiler;209869]My idea is that sometime down the road we can set it up so that we can have the DB do its own comparison of the results it has with the original sieve file; it could tell us exactly which pairs are done and which aren't, produce formatted and sorted results files automatically (thereby automating the process of processing the results for Gary), and maybe even make some nice graphs showing just how far we are on each drive relative to the sieve file.[/quote]
Something like that could be even more helpful for CRUS. How different are the needs of CRUS and NPLB in terms of how the database would need to work? Could CRUS have its own database based on NPLB's without too much trouble?
[quote=henryzz;210412]Something like that could be even more helpful for CRUS. How different are the needs of CRUS and NPLB in terms of how the database would need to work. Could CRUS have its own database based on NPLB's without too much trouble?[/quote]
I suppose it wouldn't be too hard to duplicate NPLB's database for CRUS. The only potentially tricky part would be the scoring formula: does anyone know what we're currently using for that at NPLB? More specifically, is it limited to just base 2, or does it factor in the effects of the base when determining score?
[quote=mdettweiler;210437]does anyone know what we're currently using for that at NPLB? More specifically, is it limited to just base 2, or does it factor in the effects of the base when determining score?[/quote]
According to [URL]http://www.noprimeleftbehind.net/stats/index.php?content=user_pairs[/URL], it's n^2/160e9. So no, it doesn't consider different bases. You could switch to something based on bit or decimal length instead, e.g. (bit length)^2/160e9. It would just mean that you first have to calculate the bit or decimal length. PRPnet uses something similar, i.e. (decimal length of the candidate / 10000)^2. I recall calculating the approximate conversion factor between NPLB and PRPnet scoring, but I can't find that at the moment and it isn't really applicable anyway... To see how PRPnet calculates the decimal length, look in LengthCalculator.cpp. But I'm sure you already know that the length of k*b^n+c is (except in a case where the +c changes the number of digits, I suppose) floor(log(k)+n*log(b))+1, where log() is in the base you're converting to (in this case, probably 2 or 10), and that log_x(y)=log(y)/log(x) (where log_x is the base-x logarithm and log() is a logarithm in any fixed base).
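For illustration, the formulas above can be put into a short script. The helper names are made up; the constants (160e9 for NPLB, the 10000 divisor for PRPnet) and the length formula are taken from the posts in this thread, not from the actual server code.

```python
from math import floor, log10

def decimal_length(k, b, n):
    """Decimal length of k*b^n+c, ignoring the rare case where
    the +c term changes the digit count: floor(log(k)+n*log(b))+1."""
    return floor(log10(k) + n * log10(b)) + 1

def nplb_score(n):
    """NPLB scoring as quoted above: n^2 / 160e9 (base-2 only)."""
    return n ** 2 / 160e9

def prpnet_score(k, b, n):
    """PRPnet-style scoring: (decimal length / 10000)^2."""
    return (decimal_length(k, b, n) / 10000) ** 2

# Example: a base-2 candidate 15*2^600000+1.
print(decimal_length(15, 2, 600000))   # 180620 digits
print(nplb_score(600000))              # 2.25
print(prpnet_score(15, 2, 600000))
```

Because `nplb_score` depends only on n, two candidates with the same n but different bases score identically, which is exactly why a length-based formula would be needed for CRUS.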
[quote=Mini-Geek;210440]According to [URL]http://www.noprimeleftbehind.net/stats/index.php?content=user_pairs[/URL], it's n^2/160e9. So no, it doesn't consider different bases. You could switch to something based on bit or decimal length instead. e.g. (bit length)^2/160e9. It would just mean that you first have to calculate the bit or decimal length. PRPnet uses something similar, i.e. (decimal length of the candidate / 10000) ^ 2. I recall calculating the approximate conversion factor between NPLB and PRPnet scoring, but I can't find that at the moment and it isn't really applicable anyway...
To see how PRPnet calculates the decimal length, look in LengthCalculator.cpp. But I'm sure you already know that k*b^n+c is (except in a case where the +c changes the number of digits, I suppose) floor(log(k)+log(b)*n)+1, where log() is in the base you're converting to (in this case, probably 2 or 10). And log_x(y)=log(y)/log(x) (where log_x is the base x logarithm, and log() is any base logarithm).[/quote] Ah, okay. In that case, yeah, we'd need a different scoring system for CRUS. As you suggested, something based on decimal length like what PRPnet uses would probably be best; however, I have noticed that PRPnet scores can balloon rather quickly: for instance, on one of the personal servers that Gary and I use, Gary has a total score of 2404073894412 over just 27076 results. Since most of the work we've done on that server has been quite large in decimal length, a high score per candidate makes sense, but Gary has never put more than one full quad on this server at any given time--so that's a really, really high score to rack up given the amount of CPU time put in. :smile: Therefore, if we used a similar score for a CRUS DB, I'd suggest scaling it to produce values more on the order of what NPLB's formula produces for candidates of similar size.
[quote=mdettweiler;210442]Ah, okay. In that case, then, yeah, we'd need a different scoring system for CRUS. As you suggested, something based on decimal length like what PRPnet uses would probably be best; however, I have noticed that the PRPnet scores can tend to balloon rather quickly: for instance, on one of the personal servers that Gary and I use, Gary has a total score of 2404073894412 over just 27076 results. Since most of the work we've done in that server has been decimally quite large, it would make sense that there'd be a high score per candidate, but Gary has never put more than one full quad on this server at any given time--so that's a really, really high score to rack up given the amount of CPU time put in. :smile: Therefore, if we used a similar score for a CRUS DB, I'd suggest that we scale it somewhat to produce values more on the order of what NPLB's formula produces for candidates of similar size.[/quote]
The PRPnet scoring that you're looking at is very inaccurate. I remember looking at how it was calculated and ignoring it because we have an accurate method in the NPLB database. I can't remember exactly how it was calculated, but I do remember that it wasn't right. Yes, we definitely need to go to either the bit length or the decimal length of the test if we set up something similar to our DB here at CRUS. Gary
@henryzz
the base number is already included in the load and stored in the tables; however, it is not yet part of the primary key. Adding the base number to the key should achieve the compatibility that you are seeking. Indeed, the design of the NPLB tables is easily extensible to either a separate or a combined database for additional data sets such as CRUS. However, the CRUS data pages include a lot more manually added meta-data which is not yet catered for; that requires some consideration.
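A minimal sketch of the change described above, again using a hypothetical toy table (the real NPLB schema isn't shown in this thread): once the base is part of the primary key, the same (k, n) pair can coexist for different bases instead of colliding as a duplicate.

```python
import sqlite3

# Toy table with the base included in the composite primary key.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE results (
    k INTEGER, base INTEGER, n INTEGER, project TEXT,
    PRIMARY KEY (k, base, n))""")

# The same (k, n) pair stored once per base -- no key collision.
con.execute("INSERT INTO results VALUES (15, 2, 500000, 'NPLB')")
con.execute("INSERT INTO results VALUES (15, 7, 500000, 'CRUS')")

print(con.execute("SELECT COUNT(*) FROM results").fetchone()[0])
```

With the original (k, n)-only key, the second insert would have been rejected as a duplicate; with (k, base, n), both rows are kept.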
[quote=AMDave;210729]@henryzz
the base number is already included in the load and stored in the tables, however it is not yet part of the primary key. Adding the base number to the key should achieve the compatibility that you are seeking. Indeed the design of the NPLB tables is easily extensible either to a separate or combined database for additional data sets such as CRUS. However, the CRUS data pages include a lot more manually added meta-data which is not yet catered for, which requires some consideration.[/quote] This is about what I thought was the case. I think it is worth keeping in mind for CRUS. If algebraic factorizations could be added to it, along with the production of webpages showing the reported progress of the conjectures, then it could be very useful to CRUS and save Gary a huge amount of work.