[QUOTE=Mark Rose;409481]I've always found Google's crawl to be reasonable. I have had issues with Bing and Baidu slamming small sites heavily all at once. That was years ago though.[/QUOTE]
Agreed. Googlebot is very well behaved. But, like I said before, I have found that many bad bots love to explore the URLs forbidden in robots.txt. ("Bad bot! Bad!") It can take a little while for Fail2Ban to trigger in these cases; I've seen 'bots request hundreds of URLs in a few seconds before they're blocked. |
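One common way to turn that behaviour against the bad bots is a honeypot trap: disallow a path in robots.txt that nothing legitimate ever links to, then have Fail2Ban ban any client that requests it. A rough sketch (the path and filter name are made up for illustration, and the failregex assumes a standard access-log format with the client address first):

```
# robots.txt -- no well-behaved crawler will fetch anything under this path
User-agent: *
Disallow: /bot-trap/

# /etc/fail2ban/filter.d/bot-trap.conf (hypothetical filter name)
[Definition]
failregex = ^<HOST> .* "GET /bot-trap/
```

Since the path appears nowhere except robots.txt, any request for it is almost certainly a bot that read the file and deliberately ignored it.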
[QUOTE=chalsall;409411]I'm working on caching some of the more expensive datasets.[/QUOTE]
I'm just starting to get into coding in general, but I don't know much about websites. What exactly does this mean in this context? |
[QUOTE=dragonbud20;409555]I'm just starting to get into coding in general, but I don't know much about websites. What exactly does this mean in this context?[/QUOTE]
I should probably let Chris answer for himself. That said, I think this means storing a pre-generated presentation of a data set, rather than generating the presentation on demand. It allows the data, and presumably graphics, to be updated when other demands are low, gives a much faster response to the inquirer, and lets the server better answer other demands. |
[QUOTE=wombatman;409387]To test the decreased RAM and see if it makes it stable, I'll double-check it.[/QUOTE]
Not sure if you're ktony or Dennis Moran, but mine matched with Dennis Moran. |
[QUOTE=wombatman;409557]Not sure if you're ktony or Dennis Moran, but mine matched with Dennis Moran.[/QUOTE]
I'm ktony. Thanks for running that. It lets me know I need to pull back and test some more. |
[QUOTE=dragonbud20;409555]I'm just starting to get into coding in general, but I don't know much about websites. What exactly does this mean in this context?[/QUOTE]
[QUOTE=kladner;409556]I should probably let Chris answer for himself. That said, I think this means storing a pre-generated presentation of a data set, rather than generating the presentation on demand. It allows the data, and presumably graphics, to be updated when other demands are low, gives a much faster response to the inquirer, and lets the server better answer other demands.[/QUOTE] More or less. Currently, whenever someone requests a page, all the graphs on that page are rendered (drawn) by the web server specifically for that request. So if ten people happen to look at the page at the same time, the server will generate the graph ten separate times, even though all ten people obviously get the exact same graph, with 9/10 of the CPU time being completely wasted.

Caching means that when the server renders a graph, it stores the rendered result -- then when the next person asks for the graph, instead of regenerating it from scratch, the server can just send the saved result from the previous request. The catch, of course, is that you don't necessarily get live data. But in this context, as in many others, data that's an hour old is no problem at all, so generating the graphs once per hour instead of once per request is a massive time and energy saver on Chris' CPUs, with little or no impact on downstream users like you and me.

As an aside, there are of course better caching strategies than simply regenerating after a fixed time period -- large websites like Google, reddit, or StackExchange likely have massively complex caching strategies to best serve their visitors, probably combining many sub-strategies: fixed-time regeneration, regeneration after a certain number of hits on some data, regeneration only when the underlying data changes, and probably a dozen more advanced strategies as well.
In GPU72's case it's probably a fixed-time regeneration (or *maybe* based on when the underlying data changes? Chris?) |
[QUOTE=Dubslow;409561]As an aside, there are of course better possible strategies to caching than just regenerating after a fixed time period -- large websites like Google or reddit or StackExchange etc likely have massively complex caching strategies to best serve the needs of webgoers, probably combining many sub strategies such as fixed-time cache regenerating, regenerating after a certain number of hits on some data, regenerating only when the underlying data changes, and probably a dozen other more advanced strategies as well. In GPU72's case it's probably a fixed-time regeneration (or *maybe* based on when the underlying data changes? Chris?)[/QUOTE]
Without going into too much boring detail: it's common to have a tiered architecture with a presentation (web) server, a middle-tier "service" layer, and a back-end data layer (SQL or whatever). The front-end web server is obvious... it handles requests from visitors and spits out HTML. Pretty simple. It requests things from the service layer, which in turn takes those requests and turns them into whatever is needed to access the data layer. Could be a sproc (stored procedure), could be an actual SELECT statement... you get the idea. There may be multiple things on the data layer: SQL, a search index, even flat files.

The websites I admin use a caching layer, Couchbase. It tends to float between the layers... we're using it as a key/value caching solution. (You might have heard the term NoSQL.) We cache things between the data and service layers, and each entry has an expiration, so the web server knows how stale that data is.

As an example we can relate to: if we had this setup on Primenet (we don't), it would look something like this. You request the history for a range of exponents. Right now the server has to look at the database for each exponent, pull data together from several different tables, and generate an array of things. Some of it is pretty easy to pull (how high an exponent has been factored) and some of it takes longer (pulling all of the results that have ever been checked in, especially for popular ECM targets like M1277). What if, instead of grabbing that from SQL each time and doing all that processing, you stuffed it into a Couchbase "document" for that exponent and gave it an expiration of, say, an hour? Then something like the full history of M1277, which can take a while to show because the server is parsing each and every historical result, could be read back from Couchbase in the blink of an eye.
The trick is to set the expiration so you're not serving stale content when people expect it to be fresh. It also helps if the cached items are popular ones; caching won't save much time if nobody requests an item between the first database hit and when the entry expires. To avoid some of that, you can change the service layer to do a dual commit: when it writes to the database, it also updates the cache store. Say someone checks in a new result for M1277. The service layer would update the database and also write a fresh entry to Couchbase, bumping out that expiration time. If you're consistent with that, you can set the expiration to days or weeks.

It is VERY worthwhile to have caching if your site is heavy on the SQL side of things. You can only optimize a query so much; at some point it's still going to be slow when handling lots of traffic. The TL;DR is this... caching=good. |
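The expiring key/value cache plus the dual-commit (write-through) idea can be sketched as follows. This is a toy Python model, with a plain dict standing in for Couchbase and a function standing in for the slow multi-table SQL work; every name here is hypothetical:

```python
import time

class TTLCache:
    """Toy key/value store with per-entry expiration,
    standing in for a real cache like Couchbase."""
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]  # entry expired; treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.time() + ttl_seconds)

cache = TTLCache()

def expensive_history_query(exponent):
    # Hypothetical stand-in for pulling and parsing every
    # historical result for an exponent out of SQL.
    return f"full history for M{exponent}"

def get_history(exponent, ttl=3600):
    """Read path: serve from the cache, falling back to the database."""
    cached = cache.get(exponent)
    if cached is not None:
        return cached
    result = expensive_history_query(exponent)
    cache.set(exponent, result, ttl)
    return result

def check_in_result(exponent, new_result):
    """Dual commit: write to the database AND refresh the cache entry,
    bumping its expiration so reads stay fresh for a long time."""
    # ... write new_result to the database here ...
    cache.set(exponent, expensive_history_query(exponent),
              ttl_seconds=7 * 24 * 3600)
```

Because `check_in_result` refreshes the cache on every write, the entry is never stale, which is what makes a days-long expiration safe.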
[QUOTE=Madpoo;409592]The TL;DR is this... caching=good.[/QUOTE]
I'm currently lurking, guys. Having to deal with Atoms (shudder) and Humans (double shudder). The pool plumbing guys managed to route a drain such that there is an airlock (I told them I thought there was an error, but they said "Everything is fine"), and the Diamond Brite guys applied the render when it was too dry. |
[QUOTE=chalsall;409704]I'm currently lurking guys. Having to deal with Atoms (shudder) and Humans (double shudder).[/QUOTE]
A quick update... Spent the last two days pumping 38K litres of water back into the pool (good thing it rained heavily today). Should have "Start Up" tomorrow. |
A breaker tripped Friday night, so I missed 4 THzd of work over the weekend. I'll have to rearrange the power configuration tonight. I'll also be running at reduced throughput during the day for the next few days. I should be back to full tilt Friday evening.
|
It must be calamity time. I had a main water pump fail and have only some servers running until I can do some maintenance. Busy work week.
|