mersenneforum.org

daxmick 2017-12-11 20:11

scraping my "results" data from the website
 
I'm looking to scrape the results webpage on [url]www.mersenne.org[/url] so that I can compute my daily average of GHz-days.

I noticed that there appears to be an API on mersenne.org. However, it appears to be more focused on communications to/from mprime/prime95 and not for users, such as myself, to pull our data. Is this true? Do I need to resort to "screen scraping" [url]www.mersenne.org/results[/url]?

Anyone else written a script (preferably in python) to do something similar?

chalsall 2017-12-11 20:31

[QUOTE=daxmick;473753]Anyone else written a script (preferably in python) to do something similar?[/QUOTE]

Yes. It's easy. But I mostly code in Perl and C.

I refuse to work with a language which dares to tell me how I am to indent.

Me human; you compiler. Do what I say.

if (0) { my n=0; }

daxmick 2017-12-11 20:54

[QUOTE=chalsall;473755]
I refuse to work with a language which dares to tell me how I am to indent.
[/QUOTE]

As opposed to needing to be told how to end each of my lines? :-P

<code>
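#!/usr/bin/python3
"""Find average GHz-Days for recent work done on GIMPS. http://www.mersenne.org"""

#from userdata import *
import requests

main_url = 'http://www.mersenne.org/'
login_data = {'user_login': 'wbrandes', 'user_password': 'MyVoiceIsMyPassport'}

with requests.Session() as session:
    # Fetch the front page first so the session picks up any initial cookies.
    result = session.get(main_url)

    # Response.cookies holds only the cookies set by this one response.
    print(result.cookies)
    # Submit the login form. The session already resends its stored cookies,
    # so cookies= is redundant but harmless.
    result = session.post(main_url,
                          data=login_data,
                          headers=dict(referer=main_url),
                          cookies=result.cookies,)
    print(result.cookies)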
</code>

Can anyone see what I'm doing wrong? The first printout of the "cookies" shows data but the second doesn't. Thoughts?

Also, do we have a sub-forum for discussing code like this? I didn't know where else to post my question.
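
One possible explanation for the empty second printout, offered as a sketch rather than a confirmed diagnosis: in requests, a response's .cookies holds only the cookies set by that single response, while session.cookies accumulates everything received so far. If the server sets no new cookie on login, or sets it on a redirect hop, the second print(result.cookies) comes back empty even though the session still holds its cookies. Separately, requests turns a POST into a GET when it follows a 301 or 302, so if the http:// address redirects to https:// (an assumption, not checked here) the form data would never arrive. The helper below is hypothetical; it takes a requests.Session, reuses the field names from the script above, and assumes the form posts to the site root over HTTPS.

```python
"""Sketch: post the login form and report where the cookies ended up."""

# Assumption: the login form posts to the site root, reached over HTTPS.
LOGIN_URL = "https://www.mersenne.org/"


def describe_login(session, username, password, login_url=LOGIN_URL):
    """POST the login form through `session` and summarise the outcome.

    `session` is expected to be a requests.Session. The field names are the
    ones used in the script above; they are not verified against the site.
    """
    form = {"user_login": username, "user_password": password}
    response = session.post(login_url, data=form,
                            headers={"Referer": login_url})
    return {
        "status": response.status_code,
        # Non-empty if the POST was redirected along the way.
        "redirects": [hop.status_code for hop in response.history],
        # Cookies set by the final response only.
        "response_cookies": sorted(response.cookies.keys()),
        # Every cookie the session has collected so far.
        "session_cookies": sorted(session.cookies.keys()),
    }
```

With the real library this would be called as describe_login(requests.Session(), "name", "password"). A non-empty "redirects" list and an empty "session_cookies" list would each point to a different problem.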

chalsall 2017-12-11 21:15

[QUOTE=daxmick;473760]As opposed to needing to be told how to end each of my lines? :-P[/QUOTE]

CR and/or LF should work, universally.

Have you done any drill-down on [url]https://www.gpu72.com/spider/[/url]?

It looks like [url]https://github.com/MarkRose/primetools[/url] might be what you are after.

Mark Rose 2017-12-11 22:13

[QUOTE=chalsall;473763]It looks like [url]https://github.com/MarkRose/primetools[/url] might be what you are after.[/QUOTE]

I don't believe the scripts in that repository will do what he wants, but it does show an example of doing logins.

I've been tempted to rewrite the mfloop.py script in Go. It's got race conditions and doesn't handle failure very well.

daxmick 2017-12-11 22:21

[QUOTE=Mark Rose;473767]I don't believe the scripts in that repository will do what he wants, but it does show an example of doing logins.[/QUOTE]

I was thinking along those same lines. That said, the login section of code looks as simple as can be! I'm still not sure why my python code isn't working correctly. I'll reach out to a Pythonista I know and see if he can help.

As an aside, if I glean the "age days" and "GHz-Days" from [url]https://www.mersenne.org/results/[/url] and divide the GHz-Days by its age days value, would that give me the average GHz-Days/day for that CPU's work on a particular Exponent? (Then, if I sum up all these averages that would give me the total GHz-Days/day that my setup is doing.) Yes?

My goal is to figure out how many GHz-Days/day my set of servers are doing.
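
On the arithmetic in the question above: GHz-Days divided by age days gives the rate for one assignment, but summing those rates only matches total throughput if every assignment was running at the same time. For assignments done one after another on the same worker, the sum overcounts. A simpler estimate is the total GHz-Days credited inside a date window divided by the window length. The sketch below shows both; the function names and sample numbers are made up, and it assumes "age days" means the number of days an assignment was held before its result came in.

```python
from datetime import date


def sum_of_rates(results):
    """Sum GHz-Days / age-days over (ghz_days, age_days) pairs."""
    return sum(ghz / age for ghz, age in results if age > 0)


def daily_average(dated_results, start, end):
    """Total GHz-Days for results dated within [start, end], per day."""
    days = (end - start).days + 1  # the window includes both end dates
    if days <= 0:
        raise ValueError("end must not be before start")
    total = sum(ghz for when, ghz in dated_results if start <= when <= end)
    return total / days


# Hypothetical data: one worker finishing a 1.0 GHz-Day assignment per day
# for ten days in a row.
sequential = [(1.0, 1.0)] * 10
dated = [(date(2017, 12, day), 1.0) for day in range(1, 11)]

print(sum_of_rates(sequential))  # 10.0, ten times the real throughput
print(daily_average(dated, date(2017, 12, 1), date(2017, 12, 10)))  # 1.0
```

If several workers run in parallel, the windowed total still works unchanged, since it never needs to know which results overlapped in time.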

petrw1 2017-12-11 23:31

[QUOTE=daxmick;473753]I'm looking to scrape the results webpage on [url]www.mersenne.org[/url] so that I can compute my daily average of GHz-days.

I noticed that there appears to be an API on mersenne.org. However, it appears to be more focused on communications to/from mprime/prime95 and not for users, such as myself, to pull our data. Is this true? Do I need to resort to "screen scraping" [url]www.mersenne.org/results[/url]?

Anyone else written a script (preferably in python) to do something similar?[/QUOTE]

I've been saving all my results for about a dozen years strictly from here:
[url]https://www.mersenne.org/results/[/url]
I simply copy/paste the results into a spreadsheet.
It works for me because I download the results about monthly and I have dozens of results per day, not thousands.

daxmick 2017-12-12 00:02

[QUOTE=petrw1;473774]I've been saving all my results for about a dozen years strictly from here:
[URL]https://www.mersenne.org/results/[/URL]
I simply copy/paste the results into a spreadsheet.
It works for me because I download the results about monthly and I have dozens of results per day, not thousands.[/QUOTE]

And I'd like to automate that process. Wouldn't it be nicer if you could just run a script and it would grab the latest results for you?
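
A hedged sketch of the parsing half of that automation, using only the standard library: it pulls the cell text out of every table row in a page. The sample HTML, its column names, and its values are invented for illustration; the real results page may be laid out differently, and fetching it would still need a logged-in session.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample; not copied from the real results page.
SAMPLE = """
<table>
<tr><th>Exponent</th><th>Age Days</th><th>GHz-Days</th></tr>
<tr><td>12345678</td><td>2.5</td><td>10.0</td></tr>
</table>
"""

for row in extract_rows(SAMPLE):
    print(row)
```

The rows come back as plain lists of strings, so they can be written straight to a CSV file with the csv module and opened in the same spreadsheet.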

Dubslow 2017-12-12 02:57

[QUOTE=chalsall;473755]Yes. It's easy. But I mostly code in Perl and C.

I refuse to work with a language which dares to tell me how I am to indent.

Me human; you compiler. Do what I say.

if (0) { my n=0; }[/QUOTE]

What does code layout have to do with you controlling the compiler?

[QUOTE=chalsall;473763]CR and/or LF should work, universally.
[/QUOTE]
He meant semicolons. How you end your lines is not up to you; it just varies between languages.

petrw1 2017-12-12 15:55

[QUOTE=daxmick;473777]And I'd like to automate that process. Wouldn't it be nicer if you could just run a script and it would grab the latest results for you?[/QUOTE]

Good point....absolutely.

:victor:

I used to have a BASIC (:redface:) program that read the work distribution map nightly and parsed it, but alas it died one day. Then I got new computers that would not run that 32-bit version of BASIC any more, so it is in limbo.

daxmick 2017-12-12 16:01

[QUOTE=petrw1;473832]Good point....absolutely.[/QUOTE]

I have to admit that I'm not a Pythonista, BUT I'm an okay enough programmer to be able to RTFM and get things to work. That said, logging into the mersenne.org website via a Python script has proven to be hard. I swear I'm doing it right, but the page returned after login still says that I haven't logged in yet.

Is there anyone out there that is Python savvy? I've made a github repository for this project, if anyone would like to assist me. [url]https://github.com/daxm/prime95/blob/master/ghz-days.py[/url]
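
When a scripted login keeps returning the logged-out page, a common cause is that the POST does not match what the browser sends: different field names, a hidden field, or a different action URL. One way to check is to list the forms on the fetched page. The sketch below does that with the standard library; the sample markup, including the /login.php action and the token field, is invented and does not claim to match the actual mersenne.org login form.

```python
from html.parser import HTMLParser


class FormFields(HTMLParser):
    """Record each <form> with its action, method, and named <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the form being read, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            # Keep any preset value; hidden fields usually carry one.
            self._current["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def list_forms(html):
    """Return one dict per form found in the HTML text."""
    parser = FormFields()
    parser.feed(html)
    parser.close()
    return parser.forms


# Invented sample; the real page has to be fetched and inspected.
SAMPLE_PAGE = """
<form action="/login.php" method="post">
  <input type="text" name="user_login">
  <input type="password" name="user_password">
  <input type="hidden" name="token" value="abc123">
  <input type="submit" value="Log in">
</form>
"""

for form in list_forms(SAMPLE_PAGE):
    print(form["method"], form["action"], sorted(form["fields"]))
```

Running list_forms on the text of the page returned by session.get would show whether the script's login_data keys and target URL line up with the form the site actually serves.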


All times are UTC.