mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Information & Answers (https://www.mersenneforum.org/forumdisplay.php?f=38)
-   -   P-1 Rankings (https://www.mersenneforum.org/showthread.php?t=13956)

Mini-Geek 2010-09-29 12:29

[QUOTE=Rhyled;231838]Sigh - because my denominator was only 3 out of the 4 categories. Stupid mistake.[/QUOTE]

Oh, I see. Now our numbers (for GHz-Days) match, as I'd expect.
[QUOTE=Mini-Geek;231831]The deltas aren't being parsed for me. I didn't really care to try to use them, but I wasn't expecting it to be blank.[/QUOTE]

I noticed why, and it's stupidly simple: The deltas are in $6, and are saved to $Deltas. The output line only goes up to $5. So they're, basically, intentionally being ignored.
I also noticed that a lot of the code you included, while useful if you're planning to use the data in more Perl code, was useless and unused for me. Here's an updated version of my modification:
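[CODE]use strict;
use warnings;

# Reads <name>.txt and writes <name>.csv, where <name> is the first argument.
die "Usage: $0 <name>\n" unless @ARGV;
my $base = $ARGV[0];
open(my $in,  '<', "$base.txt") or die "Can't read $base.txt: $!";
open(my $out, '>', "$base.csv") or die "Can't write $base.csv: $!";
print $out "Rank,Name,GHz-Days,Attempts,Successes\n";
my $i = 0;
while (<$in>) {
    # $1 = rank, $2 = name, $3 = GHz-Days, $4 = attempts, $5 = successes.
    # The name is matched non-greedily so its trailing padding isn't captured.
    # The deltas after the '|' land in $6 and are deliberately not written out.
    if (/^\s*(\d*)\s*(.*?)\s+(\d+\.\d*)\s*(\d*)\s*(\d*)\s*\|(.*)/) {
        print $out "$1,\"$2\",$3,$4,$5\n";
        $i++;
        print "on line $i\n" if $i % 500 == 0;
    }
}
close($out) or die "Can't finish writing $base.csv: $!";[/CODE]
I decided to exclude the Deltas completely, including the column header for it. Same usage as before.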
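One thing to watch for with the script above: the name is wrapped in double quotes, but a double quote inside a name is written out as-is, which can confuse a spreadsheet import. Here's a minimal sketch of the usual CSV convention (double any embedded quotes). The helper name `csv_field` and the sample name are made up for illustration and are not part of the original script:

```perl
use strict;
use warnings;

# Quote one CSV field: wrap it in double quotes and double any quotes inside it.
# (csv_field is a hypothetical helper name, not part of the script above.)
sub csv_field {
    my ($text) = @_;
    $text =~ s/"/""/g;
    return qq{"$text"};
}

# Hypothetical member name containing both a comma and quotes.
print csv_field('Some "Nick", Name'), "\n";
```

To use it, the output line would print `csv_field($2)` in place of the hand-quoted name.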

chalsall 2010-09-29 13:56

[QUOTE=Mini-Geek;231900]I also noticed that a lot of the code you included, while useful if you're planning to use the data in more Perl code, was useless and unused for me.[/QUOTE]

Yes, as I mentioned in my post. I left them in because they're useful if you're going to process the data further in the script, and I thought it was also a good way of documenting which temporary variable each part of the regex is extracted into.

Also, you'd correctly commented that this doesn't work on the "Totals Overall" report. For anyone who's interested, here's code for that report:

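[CODE]# Matches one row of the "Totals Overall" report.
if (/^\s*(\d+)\s*(.*)\s+(\d+\.\d*)\s*\|(.*)\|(.*)$/) {
    $Rank        = $1;
    $Name        = $2;
    $GHzDays     = $3;
    $Deltas      = $4;   # text between the two '|' separators
    $Percentages = $5;   # text after the second '|'
}[/CODE]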

Note that the $Percentages variable still needs to be broken down into the six possible values.
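As a rough sketch of that last step, the field could be split on whitespace. This assumes the values are whitespace-separated numbers, possibly with a trailing `%`; the exact layout of the report column is not confirmed here, and both the helper name `split_percentages` and the sample string are hypothetical:

```perl
use strict;
use warnings;

# Split the percentages field into its numeric tokens.
# Assumption: values are separated by whitespace and may end in '%'.
sub split_percentages {
    my ($field) = @_;
    my @values;
    for my $token (split ' ', $field) {
        (my $number = $token) =~ s/%$//;    # tolerate a trailing percent sign
        push @values, $number if $number =~ /^\d+(?:\.\d+)?$/;
    }
    warn "expected 6 values, got " . scalar(@values) . "\n" if @values != 6;
    return @values;
}

# Hypothetical field contents, only to show the call.
my @pct = split_percentages(' 3.7 74.0 11.1  9.7  0.8  0.7 ');
print join(',', @pct), "\n";
```

If the report leaves a column blank when a user has no work of that type, a whitespace split can't tell which of the six is missing, so fixed column positions would be needed instead.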

chalsall 2010-09-30 17:32

[QUOTE=Mini-Geek;231831]Oddly, I found a relatively small, but still significant, discrepancy between the summed GHz-Days for the Overall report vs the sums of the individual reports. The individual reports sum to 7812823.76 but the overall report sums to 7868772.93, a difference of 55949.17 GHz-Days. That's about 0.7%. Anyone have a guess as to the reason?[/QUOTE]

A thought just came to me, which might explain this...

Did you run your analysis from a full dataset of each work type (and overall) ("Customize"... "End Rank" = 10000 results in 6705 records with GHzDays > 0.000 for the overall report as of right now, for example), or only the reports' default top 1000?

If the latter, this might explain what you observed. If the former, I have no idea....

Mini-Geek 2010-09-30 19:41

[QUOTE=chalsall;232096]A thought just came to me, which might explain this...

Did you run your analysis from a full dataset of each work type (and overall) ("Customize"... "End Rank" = 10000 results in 6705 records with GHzDays > 0.000 for the overall report as of right now, for example), or only the reports' default top 1000?

If the latter, this might explain what you observed. If the former, I have no idea....[/QUOTE]

It was with all results, which is the default before you click Customize. When you click Customize, it changes to 1000. In checking that, I just noticed the reason for the difference: I only took data from the links given under Top Producers, but there's another category, visible only under Customize: ECM on Fermat numbers! I guess I assumed the ECM link included both (or just forgot about ECM on Fermat), but it specifically says "ECM on small Mersenne numbers".
I'd have to rerun all the numbers to get a perfect record, but the current GHz-Days for the last year of ECM on Fermat numbers is 56187.37. That's just 238.2 more than the 55949.17 discrepancy from the last time I ran the report, which can probably be attributed to work done since then. So I'd say it's almost certainly the only significant cause of the difference I observed.
So ECM on Fermat is about 0.71% of the total GHz-Days, which is just a little less than ECM on Mersenne.

chalsall 2010-09-30 20:04

[QUOTE=Mini-Geek]It was with all results, which is the default before you click Customize.[/QUOTE]

I'm not entirely sure you are correct here.

For empirical evidence, do all of the default queries provide more than 1000 records (other than, perhaps, ECM-F, which provides the full donation in less than 1000 records)?

If they don't provide more than 1000 records, then your claim you're working from the full data sets is clearly false.

Mini-Geek 2010-09-30 20:08

[QUOTE=chalsall;232109][QUOTE=Mini-Geek]It was with all results, which is the default before you click Customize.[/QUOTE]

I'm not entirely sure you are correct here.

For empirical evidence, do all of the data sets provide more than 1000 records (other than, perhaps, ECM-F)?

If they don't provide more than 1000 records, then your claim you're working from the full data sets is clearly false.[/QUOTE]

None contain exactly 1000; most have more, some fewer. I'm quite sure it's not limited to 1000, or any other obvious number. Here are the counts, just to clarify/verify. (They come from line counts of the text files, so they equal the number of users listed, not the rank at which all the users with 0 credit tie.)
All: 7629
P-1: 4430
TF: 4048
LL: 3066
DC: 2394
ECM: 458
ECM-F: 139
As you can see, only the two ECMs have under 1001 people. With the now-marginal difference, I'm pretty darn sure there's nothing else being missed.
Now that all the reports are showing as the 7:00 PM report, I can recalculate. I'll do that now and either edit or post, hopefully I'll now see exactly 0 unaccounted for. :smile:

Mini-Geek 2010-09-30 20:29

[QUOTE=Mini-Geek;232111]Now that all the reports are showing as the 7:00 PM report, I can recalculate. I'll do that now and either edit or post, hopefully I'll now see exactly 0 unaccounted for. :smile:[/QUOTE]

Well, not exactly 0, but plenty close enough for my purposes: 0.03 GHz-Days apart this time! (7870898.66 in my sum vs 7870898.63 in the total report)
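For anyone wanting to repeat the comparison, here's a minimal sketch of summing the GHz-Days column of one CSV. It assumes the layout written by the conversion script earlier in the thread (name in double quotes, GHz-Days right after it); `sum_ghz_days` is a made-up name:

```perl
use strict;
use warnings;

# Sum the GHz-Days column of one CSV file.
# Takes an open filehandle so it works on a file or an in-memory string.
# Assumption: the name is double-quoted and GHz-Days is the field right after it.
sub sum_ghz_days {
    my ($fh) = @_;
    my $total = 0;
    while (my $line = <$fh>) {
        # Greedy .* finds the last '",' so commas inside the name can't
        # shift the columns; the unquoted header row simply doesn't match.
        next unless $line =~ /^.*",(\d+\.\d*)/;
        $total += $1;
    }
    return $total;
}
```

Open each category's CSV, add up the six results, and subtract the figure from the overall report to get the unaccounted-for amount.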

Here are the new GHz-Days percentages (out of all GIMPS work):
P-1: 3.68%
LL: 74.04%
DC: 11.08%
TF: 9.74%
ECM: 0.76%
ECM-F: 0.71%

And Attempts percentages (out of all GIMPS work):
P-1: 0.25%
LL: 0.14%
DC: 0.09%
TF: 98.29%
ECM: 1.19%
ECM-F: 0.04%

And something new: the ratio of Successes to Attempts in each category. This has a different meaning for each category, but it's still fun to compare. :smile:
P-1: 4.70%
LL: 0.00%
DC: 93.39%
TF: 2.32%
ECM: 0.59%
ECM-F: 0.01%

And just for the record: none of these categories 'just happened' to have 1000 results. They're all the full rankings. Also, this was all based off of the Sep 30, 7:00 PM hourly report.
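The arithmetic behind the three lists is simple. Here's a sketch with made-up totals (not the real report figures); the names `shares` and `success_rate` are just for illustration:

```perl
use strict;
use warnings;

# Percentage share of each category in the grand total.
sub shares {
    my (%total) = @_;
    my $grand = 0;
    $grand += $_ for values %total;
    my %share;
    $share{$_} = $grand ? 100 * $total{$_} / $grand : 0 for keys %total;
    return %share;
}

# Successes as a percentage of attempts (0 when there were no attempts).
sub success_rate {
    my ($successes, $attempts) = @_;
    return $attempts ? 100 * $successes / $attempts : 0;
}

# Made-up totals, only to show the calls.
my %share = shares('LL' => 750, 'DC' => 150, 'TF' => 100);
printf "%-3s %6.2f%%\n", $_, $share{$_} for sort keys %share;
printf "rate %.2f%%\n", success_rate(47, 1000);
```

The same `shares` call works for both the GHz-Days and the Attempts totals; `success_rate` is applied per category.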

chalsall 2010-09-30 22:57

[QUOTE=Mini-Geek;232115]Well, not exactly 0, but plenty close enough for my purposes: 0.03 GHz-Days apart this time! (7870898.66 in my sum vs 7870898.63 in the total report)[/QUOTE]

Thanks for your work here Mini-Geek. It answers fully a question many have had.

The minor difference you've found working on the full publicly available dataset is probably explained by the fact that PrimeNet rounds all individual records to 0.001 GHzDays.
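A quick back-of-the-envelope check of that explanation, as a sketch: rounding to 0.001 leaves an error of at most 0.0005 per record. The record counts are the ones posted earlier in the thread, and the "typical" figure assumes the rounding errors are independent and uniformly distributed, which is an assumption on my part rather than anything PrimeNet documents:

```perl
use strict;
use warnings;

# Returns (worst case, typical size) of the error in a sum of $n records
# that were each rounded to the nearest $step.
# Assumption: errors are independent and uniform on [-step/2, +step/2],
# so each has standard deviation step / sqrt(12).
sub rounding_error {
    my ($n, $step) = @_;
    my $worst   = $n * $step / 2;
    my $typical = sqrt($n) * $step / sqrt(12);
    return ($worst, $typical);
}

# Record counts from earlier in this thread: six category reports plus "All".
my $records = 4430 + 4048 + 3066 + 2394 + 458 + 139 + 7629;
my ($worst, $typical) = rounding_error($records, 0.001);
printf "records: %d  worst case: %.3f  typical: %.3f GHz-Days\n",
    $records, $worst, $typical;
```

With those counts the typical size comes out at roughly 0.04 GHz-Days, the same order as the 0.03 observed. Records with 0 credit round exactly, so the real figure would be somewhat smaller.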

