mersenneforum.org

garo 2003-09-30 18:15

Request for GP2: Some useful datamining
 
Hi GP2,
I think there is a set of exponents that should be tested preferentially but is not getting that treatment. If you could publish their worktodo entries, it would be very helpful.

The exponents I am talking about are the ones that had errors in their first tests and were re-released by George as first timers. However, a server synch turns all of these exponents into double-checks. So, if these exponents expire, they are tossed to the bottom of the doublecheck stack. As you can see, there are a number of unassigned doublechecks in the 10-15M range. These should get priority. If you can find a way to find their wtd entries - presumably by comparing old status files and seeing which exponents did not get completed - it would be a great help in clearing these exponents out soon, as George rightly intended.

GP2 2003-10-01 07:11

1 Attachment(s)
I think this is more or less what you want.
There are 334 exponents.

These exponents aren't in STATUS.TXT or CLEARED.TXT now, but they were in STATUS.TXT just before the server sync, and they are in HRF3.TXT (been LL-tested, but need a double-check or triple-check).
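The filter described above can be sketched with plain set operations. This is only an illustration: it assumes each file has already been parsed into a set of integer exponents (the file layouts are not described in this thread, so parsing is left out), and the function name and sample numbers are made up.

```python
def orphaned_exponents(status_before, status_now, cleared_now, hrf3):
    """Exponents that were assigned before the sync and have since vanished unfinished.

    Every argument is a set of integer exponents, one set per file.
    That in-memory form is an assumption; reading the real files is not shown.
    """
    # Was assigned before the sync, but is neither assigned nor cleared now.
    gone = status_before - status_now - cleared_now
    # Keep only those still listed as needing a double-check or triple-check.
    return sorted(gone & hrf3)


# Made-up sample data, purely illustrative.
before = {101, 103, 107, 109}
now = {101}
cleared = {103}
hrf3 = {107, 113}
print(orphaned_exponents(before, now, cleared, hrf3))
```

Here 109 drops out because it is not in the (sample) HRF3 set, and 113 drops out because it was never assigned before the sync.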

As a second method, I compared STATUS.TXT from just before the server sync with STATUS.TXT from just after it, and looked for exponents that changed from '[font=courier] [/font]' to '[font=courier]D[/font]' without any change in their assignment date (2275 exponents), then filtered out anything currently in STATUS.TXT or CLEARED.TXT (or LUCAS_V.TXT, just for completeness). I got the same result except for the 9M exponents (at the end), and that's because those were already double-checks to begin with.
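A rough sketch of this second method follows. It assumes the two snapshots have been parsed into dictionaries mapping exponent to a (flag, assignment date) pair; that representation, the function names, and the sample values are all invented for illustration.

```python
def switched_to_double_check(before, after):
    """Exponents whose flag went from blank to 'D' with an unchanged assignment date.

    `before` and `after` map exponent -> (flag, assigned), an assumed
    in-memory form of two STATUS.TXT snapshots.
    """
    switched = []
    for exponent, (old_flag, old_assigned) in before.items():
        if exponent not in after:
            continue
        new_flag, new_assigned = after[exponent]
        if (old_flag.strip() == "" and new_flag.strip() == "D"
                and old_assigned == new_assigned):
            switched.append(exponent)
    return sorted(switched)


def still_outstanding(switched, *current_files):
    """Drop anything that appears in any current file (each a set of exponents)."""
    seen = set().union(*current_files)
    return [e for e in switched if e not in seen]


# Made-up snapshots, purely illustrative.
pre_sync = {11: (" ", "15-Jul-03"), 13: (" ", "15-Jul-03"),
            17: ("D", "01-Jul-03"), 19: (" ", "20-Jul-03")}
post_sync = {11: ("D", "15-Jul-03"), 13: ("D", "01-Sep-03"),
             17: ("D", "01-Jul-03"), 19: ("D", "20-Jul-03")}
switched = switched_to_double_check(pre_sync, post_sync)
print(switched)
print(still_outstanding(switched, {19}, set()))
```

In the sample, 13 is skipped because its assignment date changed (a fresh assignment rather than a status flip), and 17 is skipped because it was a double-check all along.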

Then the list of exponents was put through a function that collected information from NOFACTOR.CMP and PMINUS1.TXT to generate lines for worktodo.ini.
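Something along these lines would do that last step. It is a sketch under several assumptions: the trial-factoring depth (from NOFACTOR.CMP) and the set of exponents with P-1 already done (from PMINUS1.TXT) are taken as already loaded, the `DoubleCheck=exponent,bits,pminus1_done` line layout is assumed rather than confirmed in this thread, and the exponents and values below are placeholders.

```python
def worktodo_lines(exponents, factored_bits, pminus1_done):
    """Build one worktodo.ini line per exponent.

    factored_bits: dict exponent -> trial-factoring depth in bits (assumed input)
    pminus1_done:  set of exponents that already had P-1 run (assumed input)
    The "DoubleCheck=p,bits,flag" layout is an assumption.
    """
    lines = []
    for p in exponents:
        flag = 1 if p in pminus1_done else 0
        lines.append(f"DoubleCheck={p},{factored_bits[p]},{flag}")
    return lines


# Placeholder values, not taken from the real data files.
for line in worktodo_lines([10000019, 10000079],
                           {10000019: 66, 10000079: 65},
                           {10000019}):
    print(line)
```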

By the way, roughly 10% or so of these 334 exponents were not close to expiry in the STATUS.TXT just before the sync, but they're MIA nonetheless. So they're included too.

These are in sequential order, except that I left the 9M exponents at the end; you can move them to the start if you want.

The worktodo.ini file is an attachment.

garo 2003-10-01 10:13

Thanks a lot for the exponents GP2. I'm adding them to the TPR exponents and hopefully they will be handed out soon.

[QUOTE] By the way, roughly 10% or so of these 334 exponents were not close to expiry in the STATUS.TXT just before the sync, but they're MIA nonetheless. So they're included too.[/QUOTE]

My guess is that some people saw these exponents turn from first timers to DCs and decided to unreserve them.

BTW, I did not understand how you got the 9M exponents. How could they have changed status? Can you explain what they are?
Thanks

GP2 2003-10-01 14:55

[QUOTE][i]Originally posted by garo [/i]
[B]BTW, I did not understand how you got the 9M exponents. How could they have changed status? Can you explain what they are?
Thanks [/B][/QUOTE]

Quick summary of the below: you can ignore the 9M exponents.


They're just like the other exponents, except they were already double-checks even before the server sync.

I used two methods that came up with almost the same results.

The first method was to look for everything that was in STATUS.TXT just before the sync, but now isn't in either STATUS.TXT or CLEARED.TXT (as of Sept 30 03:00 UTC), and is in HRF3.TXT and therefore not in LUCAS_V.TXT (as of Sept 29) which means the double-check is still pending.

The second method was to look at old STATUS.TXT files just before and just after the sync, to identify the 2275 exponents that changed status from first-time to double-check. Then we again exclude everything that's in the current version of STATUS.TXT or CLEARED.TXT, which cuts it down to 314 exponents.

The two methods give the same results, except:
1) The first method also produces the 9M exponents which were already double-checks before the server sync (but expired in much the same way as the other exponents expired).
2) The second method also produces one extra exponent 10977301. This was dswanson's exponent, which changed status from first-time to double-check along with the rest, but then he dropped it soon after because it was one of the exponents I posted about in the [url=http://www.mersenneforum.org/showthread.php?s=&threadid=1141]Assigned [or cleared] exponents that are already obsolete[/url] thread. Running a check to exclude anything that's already in LUCAS_V.TXT automatically excludes this one exponent.
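Reconciling two result lists like this is a symmetric-difference check. A small sketch, with a hypothetical function name; apart from 10977301, the exponents in the sample are placeholders rather than values from the real lists.

```python
def compare_methods(first, second):
    """Split two result lists into the shared part and each method's extras."""
    first, second = set(first), set(second)
    return {
        "both": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


# 9000011 stands in for a 9M exponent and 10000019 for a shared one (placeholders).
report = compare_methods([9000011, 10000019], [10000019, 10977301])
for key, values in report.items():
    print(key, values)
```

Anything landing in `only_first` or `only_second` is worth explaining by hand, as done for the two cases above.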


So I guess the 9M's were just ordinary double-checks that expired and will probably get automatically re-assigned by the Primenet server in a short time, because they're below the leading edge of double-checks. So they can be ignored.


By the way, I think from time to time we'll have to monitor the rest of the 2275 exponents that switched status, since some more of them might expire over time. As of the Sept 30 03:00 UTC version of STATUS.TXT, 1896 of them were still assigned and crunching, 65 were cleared, and the rest were the topic of this thread.


GP2 2003-10-01 17:22

[QUOTE][i]Originally posted by GP2 [/i]
[B]By the way, I think from time to time we'll have to monitor the rest of the 2275 exponents that switched status, since some more of them might expire over time. As of the Sept 30 03:00 UTC version of STATUS.TXT, 1896 of them were still assigned and crunching, 65 were cleared, and the rest were the topic of this thread. [/B][/QUOTE]

Actually, we can do more than just monitor those 2275 exponents in particular. We can monitor expiry for [i]all[/i] early out-of-sequence double checks beyond the leading edge of double checking. I'll add that to my scripts.

Even before the server sync, there were a handful of exponents in early out-of-sequence double-check. For instance, 10271543 and 11015513 and others assigned to Nick Glover, and 10962089 assigned to dswanson. I'm not sure how those got assigned originally. But they can be monitored for expiry just like the exponents that got switched from first-time to double-check by the server sync.

GP2 2003-10-01 17:45

1 Attachment(s)
Okay, I ran a script to check how many early out-of-sequence (above the leading edge of double-checking) double checks expired between Sep 30 03:00 UTC and Oct 1 17:00 UTC.

To my surprise there were 109 more in this short period!

Most of these had "days to go" around the -60 mark, so they're genuine expiries. A handful did not:

[font=courier new][size=1]
15121657,D ,66,,76.8,111.7,0.7,31-Jul-03 19:03,15-Jul-03 07:10,AndreasPipp,turbomachine
15122047,D ,66,,76.8,120.7,0.7,31-Jul-03 19:03,15-Jul-03 07:10,AndreasPipp,turbomachine
15122621,D ,66,,76.8,128.7,0.7,31-Jul-03 19:03,15-Jul-03 07:10,AndreasPipp,turbomachine
15123499,D ,66,,76.8,137.7,0.7,31-Jul-03 19:03,15-Jul-03 07:10,AndreasPipp,turbomachine
15125459,D ,66,,76.8,146.7,0.7,31-Jul-03 19:03,15-Jul-03 07:12,AndreasPipp,turbomachine
15126721,D ,66,,76.8,155.7,0.7,31-Jul-03 19:03,15-Jul-03 07:14,AndreasPipp,turbomachine
[/size][/font]

But I guess they were being early double-checked for a reason, so we'll keep them in the set of 109.
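For anyone scripting against lines like the ones above, here is a minimal parsing sketch. The column names are an assumption, since no header row is shown in this thread: the order used is exponent, flag, factored bits, iteration, days run, days to go, exp, date updated, date assigned, user, computer. Month abbreviations are parsed with `%b`, which expects an English locale.

```python
import csv
from datetime import datetime

# Assumed column order; the thread does not show a header row.
FIELDS = ["exponent", "flag", "bits", "iteration", "days_run",
          "days_to_go", "exp", "updated", "assigned", "user", "computer"]


def parse_status_line(line):
    """Turn one comma-separated status line into a dict with typed values."""
    row = dict(zip(FIELDS, next(csv.reader([line]))))
    row["exponent"] = int(row["exponent"])
    row["flag"] = row["flag"].strip()          # "D " -> "D"
    row["bits"] = int(row["bits"])
    for key in ("days_run", "days_to_go", "exp"):
        row[key] = float(row[key])
    for key in ("updated", "assigned"):
        row[key] = datetime.strptime(row[key], "%d-%b-%y %H:%M")
    return row


line = ("15121657,D ,66,,76.8,111.7,0.7,"
        "31-Jul-03 19:03,15-Jul-03 07:10,AndreasPipp,turbomachine")
rec = parse_status_line(line)
print(rec["exponent"], rec["flag"], rec["days_to_go"], rec["exp"], rec["assigned"])
```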

As for yesterday's list of exponents (in the attachment to my first message in this thread), I note today that they are all assigned to Newbie/Annie, except for the 9M exponents which were assigned overnight in the usual way... in fact it turns out I got assigned a few of them! So the 9M exponents really were just routine expiries of ordinary double checks.

Anyways see the attachment for a brand new set of 109 exponents to add to the ones from yesterday.

garo 2003-10-01 17:54

Great! Thanks. I noticed some 9M unassigned in the primenet summary page, so I figured they'd be reassigned normally and did not touch them. I've moved the rest over to TPR from Newbie/Annie, which is one of the holding accounts.

I'll add the f_worktodo.ini to this list right now. I noticed a huge number expiring last night as well.

GP2 2003-10-01 17:56

OK, one final stat.

For all early out-of-sequence double checks (beyond the leading edge of double checking), look at the "days to go" number and count how many are at each stage.

You can see that there are 41 such exponents at -60 days to go (on the verge of expiry). So you can expect 300 or so more exponents over the next few days, and after that it should level off and hopefully some of them will actually get completed instead of expired.


[font=courier new][size=1]
41 -60
48 -59
53 -58
46 -57
40 -56
46 -55
19 -54
25 -53
16 -52
14 -51
13 -50
7 -49
11 -48
14 -47
8 -46
13 -45
12 -44
13 -43
8 -42
12 -41
9 -40
6 -39
7 -38
[... rest omitted]
[/size][/font]
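A tally like the one above, plus a running total, can be sketched as follows. How the original script rounded "days to go" into whole-day buckets is not stated, so truncation toward zero is an assumption here, and the function names are made up. The running total over the posted counts reaches 293 at -54 and 318 at -53, which is presumably where the "300 or so" estimate comes from.

```python
from collections import Counter
from itertools import accumulate
import math


def bucket_by_day(days_to_go_values):
    """Count assignments per whole-day bucket (truncation is an assumed choice)."""
    return Counter(math.trunc(v) for v in days_to_go_values)


def running_totals(counts):
    """Return (day, cumulative count) pairs, starting from the most negative day."""
    days = sorted(counts)
    return list(zip(days, accumulate(counts[d] for d in days)))


# First eight rows of the posted table.
posted = {-60: 41, -59: 48, -58: 53, -57: 46,
          -56: 40, -55: 46, -54: 19, -53: 25}
for day, total in running_totals(posted):
    print(day, total)
```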

garo 2003-10-01 18:28

Cool! Thanks. One other thing I need to worry about is the exponents which will expire and then be completed after having expired. Unfortunately there are bound to be a few of those. But hopefully the first test was bad, in which case the extra result would simply serve as the doublecheck. Anyway, the 400 right now are more than TPR can handle for a few months. You might want to run the script again in 2-3 weeks and offer those exponents to the Marin Mersennaries at large.

NickGlover 2003-10-02 02:45

[QUOTE][i]Originally posted by GP2 [/i]
Even before the server sync, there were a handful of exponents in early out-of-sequence double-check. For instance, 10271543 and 11015513 and others assigned to Nick Glover, and 10962089 assigned to dswanson. I'm not sure how those got assigned originally. But they can be monitored for expiry just like the exponents that got switched from first-time to double-check by the server sync.[/QUOTE]

Many of my exponents came from doing the same thing you are doing in this thread. I noticed lots of available double-checks that were originally released as first-time tests because the original test had lots of errors. I did comparisons of the proper files to figure out which ones were sitting on the server unassigned, and then I put them in my worktodo.ini to get them assigned to me. Most of the ones I have were originally rereleased by George on Sep-17-02 and were changed into double-checks by the Nov-18-02 database synch.

NickGlover 2003-10-02 02:49

[QUOTE][i]Originally posted by GP2 [/i]
For all early out-of-sequence double checks (beyond the leading edge of double checking), look at the "days to go" number and count how many are at each stage.
[/QUOTE]

Instead of looking at "days to go", you can look at the "exp" column. Exponents expire and are released by the server at 0600 UTC the first time an exponent has a negative value in the "exp" column at 0000 UTC.
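That rule can be turned into a small estimate of the release time. This is a sketch only: it assumes "exp" drops by exactly one per day and that the client never checks in again (a check-in would presumably reset it), and the function name is hypothetical. The snapshot time and the 0.7 value in the example are paired for illustration.

```python
from datetime import datetime, timedelta, timezone


def release_time(snapshot, exp_days):
    """Estimate when the server would release an assignment.

    snapshot: timezone-aware UTC time at which the "exp" value was read
    exp_days: the "exp" value at that time
    Assumes exp falls by one per day with no further client check-ins.
    """
    zero_at = snapshot + timedelta(days=exp_days)   # moment exp reaches 0
    day_start = zero_at.replace(hour=0, minute=0, second=0, microsecond=0)
    # First 00:00 UTC strictly after exp hits zero, i.e. exp is negative there.
    first_negative_midnight = day_start + timedelta(days=1)
    return first_negative_midnight + timedelta(hours=6)


snap = datetime(2003, 9, 30, 3, 0, tzinfo=timezone.utc)
print(release_time(snap, 0.7))
```

With these inputs, exp reaches zero at 19:48 UTC on Sep 30, is negative at the following midnight, and the estimated release is 06:00 UTC on Oct 1.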


All times are UTC.