mersenneforum.org > Prime Search Projects > No Prime Left Behind
Old 2010-03-28, 13:39   #144
mdettweiler

Quote:
Originally Posted by mdettweiler View Post
Hi all,

I am pleased to announce that I just received word from Dave that import of manual results into the stats database is ready to roll! Sometime today I will place the complete manual results for drives #1 and #2 in the results folder; the DB will then automatically pull them in at midnight CDT (GMT-5) tonight. Drive #3 is also close to being ready to import, so that should follow shortly afterwards. From there I will continue preparing the rest of the drives and importing them as they're ready; I hope to have the process caught up to our current level of progress within 2 weeks, 3 at maximum.

Now we can finally have our stats database truly reflect all work that's been done by this project, and those of us who have done manual work will have it fairly credited according to the same rules as automated LLRnet/PRPnet work.

Max
Update: when the DB tried to load in Drives #1 and #2 last night, it choked on the sheer volume of results in the files. I'm currently working with Dave to resolve this; in the meantime, he's disabled all scheduled DB imports, hence the stats appear "frozen" for now. This shouldn't last long; stay tuned.
Old 2010-03-28, 15:35   #145
AMDave

Sent an email to Max and Gary.

Due to the large number of records (4 million) in the staging table, which is un-indexed to allow for duplicates, the de-duplication procedure paged to file and bogged down. The design of the de-duplication process is not at fault.

The load process for the manual data will have to be done off-line due to the large volume of data.

I have resumed the normal scheduled processes for now and will begin the manual load process in an off-line database in 2 days' time.

AMDave
Old 2010-03-29, 00:39   #146
kar_bon

Quote:
Originally Posted by AMDave View Post
Due to the large number of records (4 million) in the staging table, which is un-indexed to allow for duplicates, the de-duplication procedure paged to file and bogged down. The design of the de-duplication process is not at fault.
Why try to upload the whole of Drives #1 and #2 into the database?

As far as I can tell, only about 900,000 pairs were done manually for Drive #1 (about 270,000 pairs for Drive #2).
So there are not many duplicates to process.
OTOH perhaps not all automated pairs are yet in the database!?

It would be nice to get the ranges from the database to compare with my Summary-page!
Old 2010-03-29, 02:24   #147
mdettweiler

Quote:
Originally Posted by kar_bon View Post
Why try to upload the whole of Drives #1 and #2 into the database?

As far as I can tell, only about 900,000 pairs were done manually for Drive #1 (about 270,000 pairs for Drive #2).
So there are not many duplicates to process.
OTOH perhaps not all automated pairs are yet in the database!?

It would be nice to get the ranges from the database to compare with my Summary-page!
What I'm doing is loading in the LLRnet ranges from Gary's master results files along with the manual ranges; that way, in case there were any results that were missed by the automated processing (and which were filled in manually by you, me, or Gary afterward--there was a lot of that earlier on), we can be sure they're all in the database so we have a complete record of everything in there.

My idea is that sometime down the road we can set it up so that we can have the DB do its own comparison of the results it has with the original sieve file; it could tell us exactly which pairs are done and which aren't, produce formatted and sorted results files automatically (thereby automating results processing for Gary), and maybe even make some nice graphs showing just how far we are on each drive relative to the sieve file.
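
For illustration, here's a minimal sketch of that comparison idea. The file names and the line layout are hypothetical (it just assumes each line of the sieve file and of the results dump starts with "k n" for a candidate k*2^n-1), not the DB's actual format:

Code:
# Sketch of the sieve-file comparison idea above. File names and the
# "k n ..." line layout are assumptions, not the database's real format.
def load_pairs(path):
    """Return the set of (k, n) pairs listed in a file."""
    pairs = set()
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
                pairs.add((int(fields[0]), int(fields[1])))
    return pairs

sieve = load_pairs("drive1_sieve.txt")    # every candidate in the drive
done = load_pairs("drive1_results.txt")   # every candidate with a result in the DB

remaining = sieve - done
print(f"{len(done)} of {len(sieve)} pairs tested; {len(remaining)} remaining")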

Right now, what I'm doing is just putting the entire LLRnet ranges in the manual dumps under the username "Unknown". Any results which were properly imported from the server the first time around (almost everything) will be rejected as duplicates, while any that were missed will be imported and credited to "Unknown". The disadvantage of this is that it creates loads of duplicates, which seem to be presenting a problem for the DB.

I'm currently awaiting a response from Dave as to whether the problem is due to the number of raw results being imported, or just the number of duplicates; if the latter, then we can solve the problem by not including LLRnet ranges under "Unknown" in the manual dumps, but rather checking in the DB later on and manually filling in any missing pairs from the master results files. Since there aren't many of them, that shouldn't be hard.
Old 2010-04-02, 10:58   #148
henryzz

Quote:
Originally Posted by mdettweiler View Post
My idea is that sometime down the road we can set it up so that we can have the DB do its own comparison of the results it has with the original sieve file; it could tell us exactly which pairs are done and which aren't, produce formatted and sorted results files automatically (thereby automating results processing for Gary), and maybe even make some nice graphs showing just how far we are on each drive relative to the sieve file.
Something like that could be even more helpful for CRUS. How different are the needs of CRUS and NPLB in terms of how the database would need to work? Could CRUS have its own database based on NPLB's without too much trouble?
Old 2010-04-02, 15:22   #149
mdettweiler

Quote:
Originally Posted by henryzz View Post
Something like that could be even more helpful for CRUS. How different are the needs of CRUS and NPLB in terms of how the database would need to work? Could CRUS have its own database based on NPLB's without too much trouble?
I suppose it wouldn't be too hard to duplicate NPLB's database for CRUS. The only potentially tricky part would be the scoring formula: does anyone know what we're currently using for that at NPLB? More specifically, is it limited to just base 2, or does it factor in the effects of the base when determining score?
Old 2010-04-02, 16:24   #150
Mini-Geek

Quote:
Originally Posted by mdettweiler View Post
does anyone know what we're currently using for that at NPLB? More specifically, is it limited to just base 2, or does it factor in the effects of the base when determining score?
According to http://www.noprimeleftbehind.net/sta...ent=user_pairs, it's n^2/160e9. So no, it doesn't consider different bases. You could switch to something based on bit or decimal length instead. e.g. (bit length)^2/160e9. It would just mean that you first have to calculate the bit or decimal length. PRPnet uses something similar, i.e. (decimal length of the candidate / 10000) ^ 2. I recall calculating the approximate conversion factor between NPLB and PRPnet scoring, but I can't find that at the moment and it isn't really applicable anyway...
To see how PRPnet calculates the decimal length, look in LengthCalculator.cpp. But I'm sure you already know that k*b^n+c is (except in a case where the +c changes the number of digits, I suppose) floor(log(k)+log(b)*n)+1, where log() is in the base you're converting to (in this case, probably 2 or 10). And log_x(y)=log(y)/log(x) (where log_x is the base x logarithm, and log() is any base logarithm).
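A quick sketch of those formulas (not PRPnet's actual LengthCalculator.cpp code; the candidate and its numbers are just illustrative):

Code:
# Sketch of the length and scoring formulas discussed above.
# The example exponent is hypothetical.
import math

def decimal_length(k, b, n):
    """Digits of k*b^n+c, ignoring the rare case where +c changes the digit count."""
    return math.floor(math.log10(k) + n * math.log10(b)) + 1

def nplb_score(n):
    """NPLB stats-page formula for base-2 candidates: n^2 / 160e9."""
    return n * n / 160e9

def prpnet_score(digits):
    """PRPnet-style score: (decimal length / 10000)^2."""
    return (digits / 10000) ** 2

n = 600000                          # hypothetical exponent
digits = decimal_length(15, 2, n)   # 15*2^600000+c has about 180620 digits
print(digits, nplb_score(n), prpnet_score(digits))   # 180620, 2.25, ~326.2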

Last fiddled with by Mini-Geek on 2010-04-02 at 16:28
Old 2010-04-02, 16:36   #151
mdettweiler

Quote:
Originally Posted by Mini-Geek View Post
According to http://www.noprimeleftbehind.net/sta...ent=user_pairs, it's n^2/160e9. So no, it doesn't consider different bases. You could switch to something based on bit or decimal length instead. e.g. (bit length)^2/160e9. It would just mean that you first have to calculate the bit or decimal length. PRPnet uses something similar, i.e. (decimal length of the candidate / 10000) ^ 2. I recall calculating the approximate conversion factor between NPLB and PRPnet scoring, but I can't find that at the moment and it isn't really applicable anyway...
To see how PRPnet calculates the decimal length, look in LengthCalculator.cpp. But I'm sure you already know that k*b^n+c is (except in a case where the +c changes the number of digits, I suppose) floor(log(k)+log(b)*n)+1, where log() is in the base you're converting to (in this case, probably 2 or 10). And log_x(y)=log(y)/log(x) (where log_x is the base x logarithm, and log() is any base logarithm).
Ah, okay. In that case, then, yeah, we'd need a different scoring system for CRUS. As you suggested, something based on decimal length like what PRPnet uses would probably be best; however, I have noticed that the PRPnet scores tend to balloon rather quickly: for instance, on one of the personal servers that Gary and I use, Gary has a total score of 2404073894412 over just 27076 results. Since most of the work we've done on that server has been quite large in decimal terms, it would make sense that there'd be a high score per candidate, but Gary has never put more than one full quad on this server at any given time--so that's a really, really high score to rack up given the amount of CPU time put in. Therefore, if we used a similar score for a CRUS DB, I'd suggest that we scale it somewhat to produce values more on the order of what NPLB's formula produces for candidates of similar size.
Old 2010-04-02, 16:59   #152
gd_barnes

Quote:
Originally Posted by mdettweiler View Post
Ah, okay. In that case, then, yeah, we'd need a different scoring system for CRUS. As you suggested, something based on decimal length like what PRPnet uses would probably be best; however, I have noticed that the PRPnet scores tend to balloon rather quickly: for instance, on one of the personal servers that Gary and I use, Gary has a total score of 2404073894412 over just 27076 results. Since most of the work we've done on that server has been quite large in decimal terms, it would make sense that there'd be a high score per candidate, but Gary has never put more than one full quad on this server at any given time--so that's a really, really high score to rack up given the amount of CPU time put in. Therefore, if we used a similar score for a CRUS DB, I'd suggest that we scale it somewhat to produce values more on the order of what NPLB's formula produces for candidates of similar size.
The PRPnet scoring that you're looking at is simply not correct. I looked at how it was calculated at one point and just ignored it, because we have an accurate method in the NPLB database. I can't remember the details of the calculation, but I remember that it wasn't right.

Yes, we definitely need to go to either bit length or decimal length of the test if we set up something similar to our DB here at CRUS.


Gary
Old 2010-04-06, 13:15   #153
AMDave

@henryzz
The base number is already included in the load and stored in the tables; however, it is not yet part of the primary key.
Adding the base number to the key should achieve the compatibility that you are seeking.

Indeed, the design of the NPLB tables is easily extensible to either a separate or a combined database for additional data sets such as CRUS.
However, the CRUS data pages include a lot more manually added meta-data that is not yet catered for, which requires some consideration.
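
A minimal sketch of what that key change might look like (hypothetical table and column names, using a throw-away SQLite database rather than the production DB): a composite primary key that includes the base, with duplicates rejected on load.

Code:
# Hypothetical schema sketch: the (k, base, n) key lets NPLB and CRUS results
# coexist, and re-imported results are simply ignored as duplicates.
# This uses Python's standard-library sqlite3, not the production database.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE results (
        k        INTEGER NOT NULL,
        base     INTEGER NOT NULL,
        n        INTEGER NOT NULL,
        residue  TEXT,
        username TEXT,
        PRIMARY KEY (k, base, n)
    )
""")

# First import succeeds; the re-import of the same (k, base, n) is ignored.
db.execute("INSERT OR IGNORE INTO results VALUES (15, 2, 600000, 'ABCDEF01', 'Unknown')")
db.execute("INSERT OR IGNORE INTO results VALUES (15, 2, 600000, 'ABCDEF01', 'Unknown')")
print(db.execute("SELECT COUNT(*) FROM results").fetchone()[0])  # -> 1: duplicate dropped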

Last fiddled with by AMDave on 2010-04-06 at 13:19
Old 2010-04-06, 20:32   #154
henryzz

Quote:
Originally Posted by AMDave View Post
@henryzz
The base number is already included in the load and stored in the tables; however, it is not yet part of the primary key.
Adding the base number to the key should achieve the compatibility that you are seeking.

Indeed, the design of the NPLB tables is easily extensible to either a separate or a combined database for additional data sets such as CRUS.
However, the CRUS data pages include a lot more manually added meta-data that is not yet catered for, which requires some consideration.
This is roughly what I thought was the case. I think it is worth keeping in mind for CRUS. If algebraic factorizations can be added to it, along with the production of webpages showing the reported progress of the conjectures, then it could be very useful to CRUS and save Gary a huge amount of work.