OFFICIAL "SERVER PROBLEMS" THREAD (https://www.mersenneforum.org/showthread.php?t=5758)

Madpoo 2016-07-15 23:33

FYI, I've seen sporadic problems through the day where the monitoring service reported short duration outages.

Looking into it, it appears a certain IP address (with a user agent string of "Ruby") has decided to do a massive crawl of the exponent reports (EDIT: by "massive" I mean they're trying to crawl at a rate of 30+ pages per second for extended periods of time... 603K pages hit in nearly the past 24 hours).

Before I dig into it more and try to mitigate, may I suggest that if anyone is trying to gather data on exponents, please either download the daily archives of the result logs, or crawl the XML-specific page, which operates much more efficiently.

It would be too bad to have to implement some rate controls on the server side, so if you must crawl for data, try to do an appropriate rate limit on the crawler.
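For anyone writing a crawler, a client-side limit like the one asked for above can be sketched in a few lines. This is only an illustration: the `Throttle` class, the `crawl` helper and the `fetch` callback are hypothetical names, not part of any mersenne.org tooling, and the rate you pick is your own choice.

```python
import time


class Throttle:
    """Enforce a minimum gap between successive requests."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock      # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed, then reserve the slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = max(self.clock(), self.next_allowed)
        self.next_allowed = now + self.min_interval


def crawl(items, fetch, throttle):
    """Call fetch(item) for each item, never faster than the throttle allows."""
    results = []
    for item in items:
        throttle.wait()
        results.append(fetch(item))
    return results
```

Usage would look like `crawl(exponents, fetch_page, Throttle(max_per_second=1))`, where `fetch_page` is whatever function you already use to download one page and one request per second is an arbitrary illustrative rate, not a limit stated in this thread.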

Madpoo 2016-07-16 03:01

[QUOTE=Madpoo;438225]It would be too bad to have to implement some rate controls on the server side, so if you must crawl for data, try to do an appropriate rate limit on the crawler.[/QUOTE]

Well, like I said, I hated to do it, but after setting up a rate limiter in logging only mode for a few hours and checking to make sure no other traffic is caught in the net, I've turned it on because that user is still aggressively crawling and sometimes forcing other connections to fail.

If that user was you (I know what city and ISP but beyond that I haven't tried to narrow it down) just PM me here and we'll work out a better way to do whatever you're doing.

0PolarBearsHere 2016-07-16 05:23

[QUOTE=Madpoo;438233]and we'll work out a better way to do whatever you're doing.[/QUOTE]

Like not using RubyOnRails :P ? (wasn't me by the way)

retina 2016-07-16 14:48

[QUOTE=Madpoo;438233]If that user was you (I know what city and ISP but beyond that I haven't tried to narrow it down) just PM me here and we'll work out a better way to do whatever you're doing.[/QUOTE]

Not me either, but I just want to mention the possibility of a VPN. If it were me, all you'd see would be whichever VPN I chose, so perhaps instead of an IP block, a content/agent block might be better?

Madpoo 2016-07-17 04:11

[QUOTE=Madpoo;438233]Well, like I said, I hated to do it, but after setting up a rate limiter in logging only mode for a few hours and checking to make sure no other traffic is caught in the net, I've turned it on because that user is still aggressively crawling and sometimes forcing other connections to fail.

If that user was you (I know what city and ISP but beyond that I haven't tried to narrow it down) just PM me here and we'll work out a better way to do whatever you're doing.[/QUOTE]

I don't know who it was (I searched for past hits from that IP address to see if I could match to a user, but I couldn't). Best I could tell it was some new user since they checked out the home page, looked at the download page and a few other things and then started crawling the exponent reports for every. single. exponent. one. by. one.

I was on the road all day today and kept getting emails from the monitoring setup about downtime, so now that I'm back at my computer I finally just blocked their IP address altogether.

Not something I wanted to do, but the problem was that even blocking them when they did 50 requests in 5 seconds meant some were still being processed and the backlog it built up made each request take progressively longer, which made other requests queue up. Even though it was unintentional I'd have to classify it as a DoS so a block is appropriate, unfortunately.

So, if you're that person in a certain US state with lots of lakes, and you came here looking for help on why you can't hit mersenne.org any more and your crawler keeps getting 403 errors, PM me and let's work on a better way to do this.

Meanwhile this will urge me to get off my butt and set up better dynamic restrictions to prevent this from happening in the future. It's also good motivation to dig into that page again and optimize it... something that's been in the back of my mind for a while now. :smile:
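A dynamic restriction of the "50 requests in 5 seconds" kind, with the logging-only mode described earlier, could look roughly like the sketch below. This is not the limiter actually running on the server; `SlidingWindowLimiter` and its parameters are hypothetical names chosen for illustration, and the client key could just as well be a user agent string as an IP address.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests=50, window_seconds=5.0, log_only=False):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.log_only = log_only
        self.hits = defaultdict(deque)  # client key -> timestamps of recent requests
        self.flagged = []               # (client, timestamp) pairs over the limit

    def allow(self, client, now):
        """Return True if the request should be served."""
        recent = self.hits[client]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        # Rejected requests are counted too, so a client that keeps
        # hammering stays over the limit until it slows down.
        recent.append(now)
        if len(recent) > self.max_requests:
            self.flagged.append((client, now))
            return self.log_only  # log-only mode records the hit but still serves it
        return True
```

Running with `log_only=True` first makes it possible to inspect `flagged` and confirm that no legitimate traffic is caught before enforcement is switched on. Note that a limiter like this only rejects requests after they arrive, so it does not by itself remove the backlog effect described above. The sketch also never evicts idle clients from `hits`, which a real deployment would need to handle.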

Mark Rose 2016-07-17 23:12

I have two "Manual testing" CPUs. One says it appears lost. How can I make it go away?

retina 2016-07-18 00:10

[QUOTE=Madpoo;438289]... I finally just blocked their IP address altogether.[/QUOTE]

Thankfully it was not my VPN you blocked. But I do hope that in general IP blocks are not going to have to be common things?

Madpoo 2016-07-18 03:38

[QUOTE=retina;438351]Thankfully it was not my VPN you blocked. But I do hope that in general IP blocks are not going to have to be common things?[/QUOTE]

I hope not either. It's a pain. Just a result of someone crawling far too aggressively.

I'll reiterate that for anyone looking to capture data on exponents, there are options far better suited to that than crawling the HTML report_exponent pages. XML reports are awesome and faster, and you can also specify a range of exponents instead of doing one request per exponent. Or if you want a specific large batch of something or another, ask, and if it's not too cumbersome I may be able to do a BCP package or something. Since that's a manual thing on my part, though, I won't make any promises on if/when.
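To illustrate why range requests help, here is a small sketch that collapses a list of exponents into a handful of ranges, so a crawler issues one request per range instead of one per exponent. The thread does not give the XML page's URL, its parameter names or the widest range it accepts, so `to_ranges` and `max_span` are purely hypothetical and no request code is shown.

```python
def to_ranges(exponents, max_span):
    """Group exponents into (low, high) ranges with high - low < max_span."""
    ranges = []
    for p in sorted(set(exponents)):
        if ranges and p - ranges[-1][0] < max_span:
            # Still within reach of the current range's low end: extend it.
            ranges[-1] = (ranges[-1][0], p)
        else:
            # Too far away: start a new range at this exponent.
            ranges.append((p, p))
    return ranges
```

For example, `to_ranges([11, 3, 5, 7, 1009, 1013], 100)` turns six single-exponent lookups into two ranges. A range response will also include any exponents that lie between the ones asked for, so the caller has to filter the results.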

Mark Rose 2016-07-18 04:00

[QUOTE=Madpoo;438355]I hope not either. It's a pain. Just a result of someone crawling far too aggressively.

I'll reiterate that for anyone looking to capture data on exponents, there are options far better suited to that than crawling the HTML report_exponent pages. XML reports are awesome and faster, and you can also specify a range of exponents instead of doing one request per exponent. Or if you want a specific large batch of something or another, ask, and if it's not too cumbersome I may be able to do a BCP package or something. Since that's a manual thing on my part, though, I won't make any promises on if/when.[/QUOTE]

Perhaps the page should be updated to tell people there are other options.

henryzz 2016-07-18 20:41

Can I suggest checking after a few days to see if it is possible to unblock this IP address? If it is a public VPN, the block has the potential to affect people other than the one intended.

James Heinrich 2016-07-18 20:58

I'm not sure why so many people are jumping to the conclusion that the IP in question is a VPN IP.
In any case, if all the DoS traffic is heading to a single report page, you could perhaps try a soft-block: rather than completely blocking the IP address from the server, just insert a couple of lines of code at the top of the report page to prevent running expensive queries but also provide feedback to the spider in question. Something like [code]// Placeholder address: substitute the crawler's real IP.
if ($_SERVER['REMOTE_ADDR'] === '123.234.345.456') {
    // Stop before any expensive queries run, and tell the spider why.
    die('You have been blocked for aggressive spidering. Please email madpoo@primenet to discuss better ways of getting the data you want');
}[/code]


All times are UTC.