[QUOTE=James Heinrich;438391]I'm not sure why so many people are jumping to the conclusion the IP in question is a VPN IP.
In any case, if all the DoS traffic is heading to a single report page, you could perhaps try a soft-block: rather than completely blocking the IP address from the server, just insert a couple of lines of code at the top of the report page to prevent running expensive queries but also provide feedback to the spider in question. Something like [code]if ($_SERVER['REMOTE_ADDR'] == '123.234.345.456') { die('You have been blocked for aggressive spidering. Please email madpoo@primenet to discuss better ways of getting the data you want'); }[/code][/QUOTE] Not a bad idea... a friendlier way to communicate to the user that "you're welcome to crawl, but come, let us reason together". I'm fairly certain it's not a VPN endpoint... the IP has a PTR that indicates it's a common residential DSL dynamic IP.
[QUOTE=Madpoo;438422]I'm fairly certain it's not a VPN endpoint... the IP has a PTR that indicates it's a common residential DSL dynamic IP.[/QUOTE]And once the user restarts the DSL router, a new IP is assigned and we go through this all again. And some other innocent DSL user will soon inherit the tainted IP.
I'd suggest triggering on something other than the IP address. You mention the user agent having some unusual characteristics, so perhaps that is a better way to filter the problem requests.
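The user-agent idea can be combined with the soft-block quoted above. Here is a minimal sketch in PHP, placed before any expensive queries run; the substring "AggressiveBot" is a hypothetical placeholder, since the thread never names the crawler's actual User-Agent string:

```php
<?php
// Soft-block by User-Agent rather than IP -- a sketch, not the actual
// mersenne.org code. "AggressiveBot" is a made-up placeholder; substitute
// whatever distinctive substring the problem crawler really sends.
function is_blocked_agent(string $userAgent): bool {
    $blockedSubstrings = ['AggressiveBot'];
    foreach ($blockedSubstrings as $needle) {
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// At the very top of the report page, before the expensive queries:
if (is_blocked_agent($_SERVER['HTTP_USER_AGENT'] ?? '')) {
    http_response_code(429); // "Too Many Requests"
    die('You have been blocked for aggressive spidering. Please contact the admins to discuss better ways of getting the data you want.');
}
```

Unlike an IP block, this keeps working when the DSL connection is bounced and a new dynamic IP is assigned, and it never hits the innocent user who later inherits the old address.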
If it's as aggressive as it sounds, it could be as simple as putting in blocks for any host that hits more than a certain number of times per minute.
Google etc. can be told what rate limits to use through various tools, so they should never trip it.
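The threshold idea above can be sketched as a fixed-window counter. This is only an illustration of the concept, not the IIS dynamic-IP-restriction feature actually used later in the thread; in production the counters would live in shared storage (APCu, memcached) rather than a PHP array:

```php
<?php
// Fixed-window rate limiter sketch: allow at most $limit requests per
// $windowSeconds from each client IP. Illustrative only -- real counters
// belong in shared storage (APCu, memcached), not a per-process array.
class FixedWindowLimiter {
    private array $counters = [];

    public function __construct(
        private int $limit,
        private int $windowSeconds
    ) {}

    // Returns true if this request is allowed, false once the IP has
    // exceeded $limit requests inside the current time window.
    public function allow(string $ip, int $now): bool {
        $window = intdiv($now, $this->windowSeconds);
        $key = $ip . ':' . $window;
        $this->counters[$key] = ($this->counters[$key] ?? 0) + 1;
        return $this->counters[$key] <= $this->limit;
    }
}
```

With, say, a limit of 120 requests per 60-second window, a person opening a handful of exponent report pages in a row is untouched, while a crawler hitting a couple thousand URLs per minute starts getting errors until it slows down.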
[QUOTE=0PolarBearsHere;438484]If it's as aggressive as it sounds it could just be as simple as putting in blocks for any host that hits more than a certain amount of times per minute.
Google etc can be told what rate limits to use through various tools, so they should never trip it.[/QUOTE] That was my first attempt, but even then I think I set the dynamic IP blocking threshold (it's an IIS feature) too high... their crawl was still impacting the server in bad ways. Blocking the IP outright was my "hurry up and do something to get the server stable again" effort, since I'm travelling this week and couldn't spend more time on it. So, to put retina's concerns to rest: it will get more attention soon, and the IP will be unblocked when I'm actually able to monitor the situation in real time (if they're still even trying by that point). :smile:
[QUOTE=Madpoo;438489]Blocking the IP outright was my "hurry up and do something to get the server stable again" effort since I'm travelling this week and couldn't spend more time on it. So, to put retina's concerns to rest, it will get more attention soon and the IP will be unblocked when I'm actually going to be able to monitor the situation in real time (if they're still even trying by this point). :smile:[/QUOTE]
Now that I've had some quiet time to take a closer look... that user had hit ~600K pages for 3-4 days in a row. I looked at the hits at 5-, 10- and 30-second intervals, and we're talking about rates of 1500+ pages in any given 30-second interval. When I say it was an aggressive crawl, I'm not kidding. :smile:

Anyway, I removed the IP block and put in a dynamic block that should prevent that type of thing in the future, and I made sure the settings I used wouldn't block any other access that might come up. For instance, it wouldn't be too unusual for some user to open a handful of exponent report pages in a row; while the short-term load is higher, it's not a big deal and those will be fine. But if you're hitting a couple thousand URLs per minute, expect to get errors until the rate is reduced to a normal level.

I'm guessing this person wrote a Ruby script or something to hit the exponent report page for every prime number between X and Y... I haven't taken an exact peek, but they started around 332M and just seemed to go up from there, although it skips around a bit. At first they were actually doing bulk reports, not just a single exponent, so it would have been limited to 1000 or whatever. But at some point they changed to crawling one at a time. FYI, they did get up to 366M or so before the IP block kicked in... would they really have gone all the way to 1000M? :confused2:
Again, I think the pages should have comments on them explaining XML is available, et cetera.
[QUOTE=Mark Rose;438519]Again, I think the pages should have comments on them explaining XML is available, et cetera.[/QUOTE]
Yeah... I know. In the back of my mind, I had a feeling I was going to do some additional testing on those to make sure they were all good (I'm pretty sure they are) and then add a link, but somehow I just haven't gotten to that yet. I'm not entirely sure where/how to add that info anyway... some text and/or link on the report page itself to the XML version of the same thing, or an extra link from the menus... yeah, not really sure. I'll have to mull it over.
ECM assignments/reservations problem(?)
Hello, I couldn't find a better thread to ask this question; move it if you want.
So, about ECM assignments. I was working on [URL="http://www.mersenne.org/report_exponent/?exp_lo=81239&exp_hi=&full=1&ecmhist=1"]M81239[/URL] ECM, B1=1M... but about halfway through, the server unreserved it from me. Somebody finished the current "range/bounds" by completing enough of the required curves on the [URL="http://www.mersenne.org/report_ecm/?txt=0&ecm_lo=81239&ecm_hi=81239"]ECM progress[/URL] page. It was around 5 GHz-days / 150 curves, but I cannot continue ECM with those bounds even from my existing savefile; it jumps to the higher-bounds assignment. Is it intended to unreserve exponents from users when the range/curve count is complete? [SIZE=1]My faults: I was slower than I promised, didn't run this 24h/day, and this was not my only exponent. Also, this happened over a week ago. And it looks like a separate (not connected) P95 client also couldn't continue the exponent from the savefile(s). It could be savefile corruption from copying it afterwards, but the unreserving is still a question.[/SIZE]
[QUOTE=thyw;439401]but i cannot continue to ecm on with those bounds even from my existing savefile,[/QUOTE]
Well, you can set UsePrimenet=0 in your prime.txt and edit your ECM2= line in worktodo.txt to remove the 32-digit assignment ID. Will it let you use the existing savefile then? Let it run to completion, then manually submit the result, cross your fingers, and hope PrimeNet awards you credit. Then remember to set UsePrimenet=1 again afterwards.
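For concreteness, the worktodo.txt edit described above might look like this. The 32-character assignment ID and the bound/curve values below are invented placeholders, and the field layout (assignment ID, k, b, n, c, B1, B2, curve count) is from memory, so check your own file rather than copying this:

```
ECM2=0123456789ABCDEF0123456789ABCDEF,1,2,81239,-1,1000000,100000000,150
```

becomes

```
ECM2=1,2,81239,-1,1000000,100000000,150
```

with `UsePrimenet=0` added to prime.txt so the client doesn't ask the server about the now-unregistered line.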
[U]Thank you GP2[/U], but I think I screwed up and probably overwrote the savefiles when I tried to reassign. It won't continue in either the old or the new instance. Never mind, but this would be painful on a bigger job. [B]Backups![/B] Should've created one.[I][SIZE=1] Or disable/edit the ECM assigning/unreserving rule.[/SIZE][/I]
You should still be able to report the work you've already done and get credit for it. Then switch to higher curves. The number of curves in the table is a guideline, not something that "must" be kept; i.e. if I want to do another 50 curves at 1M for an 8M exponent, then I can do them, and report them. The server will not cut me off, unless I have strange settings. If that happens, you can use N/A instead of the assignment key and the server will not be asked (replace the key with "N/A", without quotes, in your worktodo file; use any text editor you like, and stop and exit P95 from the "tools/exit" menu before editing worktodo). Of course, my chances of finding a factor are extremely small if 1600 curves were already done at that size, but it is my hardware and my money, and I can do them and report them. I don't believe the server "cut you off"; most probably you did something strange there... It has happened to all of us, and not only once.
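The "N/A" trick described above amounts to putting N/A in the assignment-key position of the worktodo line, which tells the client not to contact the server about that line. Every value here other than the exponent 81239 is an invented placeholder for illustration:

```
ECM2=N/A,1,2,81239,-1,1000000,100000000,150
```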
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.