UESPWiki:Administrator Noticeboard/Archives/Squid Assessment

The UESPWiki – Your source for The Elder Scrolls since 1995
This is an archive of past UESPWiki:Administrator Noticeboard discussions. Do not edit the contents of this page, except for maintenance such as updating links.

Squid Assessment

Since we just made it through our first weekend with our new squid two-server setup, I thought it might be a good time to discuss how it's working and what (if anything) needs to be done next to improve our server's responsiveness.

First, for those reading this and wondering what on earth "squid" means: last week Daveh added a second server to UESP, so we now have twice the computing power. The change was basically invisible to readers and editors because the second server is a squid cache. The new machine (squid1) receives all of UESP's requests and responds directly whenever the request is for a cached (i.e., saved and unchanged) copy of a page; otherwise it transparently forwards the request to our main server (content1). In other words, you just type in "www.uesp.net", the computers figure out which one needs to do the work, and you get a response without any idea that there are now multiple UESP computers.
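For the curious, the arrangement described above maps onto only a few lines of squid configuration. This is purely an illustrative sketch of a squid 2.6 reverse-proxy ("accelerator") setup, with placeholder hostnames; it is not UESP's actual configuration:

```
# Hypothetical squid 2.6 accelerator config (placeholder names, not UESP's real one).
# squid1 listens on port 80 and answers for the site's hostname:
http_port 80 accel vhost

# Cache misses are forwarded to the origin server (content1):
cache_peer content1.uesp.net parent 80 0 no-query originserver name=content1

# Only accept requests for our own domain and route them to content1:
acl our_site dstdomain www.uesp.net
http_access allow our_site
cache_peer_access content1 allow our_site
```

The `originserver` option is what makes squid act as a front-end cache for a web server rather than as an ordinary forward proxy.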

The good news is that the squid setup is working ;) Only a couple of issues have been reported.

  • The Random Page link is a relatively minor issue; while it would be nice to get it fixed, I don't think it's a high priority while the site is still having performance issues.
  • One site outage, which is worrisome, but it's so far just a single case and might have been a side effect of the switch to squid.

And the new server has improved site responsiveness overall. Pages have been loading faster, and our site's downtimes have been less prolonged and/or less severe. Server status shows that the content server's workload has decreased substantially: content1 rarely has more than 10 requests at a time, it's responding to incoming requests very quickly, and its CPU load is great (1.2% right now).

The bad news is that I really don't think that squid by itself is enough to fix the site's problems. Over the weekend, the site was better than it has been on past weekends. In other words, I didn't just walk away and give up on trying to use the site for 12 hours at a time. Nevertheless, performance was poor: much of the time it took minutes to access pages. And at one point on Sunday afternoon, I was unable to access anything (even a server status) for nearly half an hour. I finally gave up and restarted apache on content1, which prompted the site to start responding again. As I'm typing this, the site is clearly getting busy again, and it's taking a couple of minutes to load pages. One issue that's unclear right now, though, is to what extent these slowdowns are affecting the typical (anonymous) reader, or to what extent they only affect logged-in readers/editors (with the squid cache, it's possible that anonymous readers who only view cached pages could get good responses while logged-in editors, who always view freshly generated pages, get poor ones). From a few (possibly non-representative) tests I've done while not logged in, the slowdowns seem to affect anonymous readers, too.

So I think more tweaks are needed if we really want to have a site where readers and editors aren't constantly frustrated by inaccessibility problems. Unfortunately, one side effect of the switch to squid is that it is now very difficult to diagnose performance problems. I don't know of any ways to find out what's happening on squid1, i.e., if squid1 doesn't respond, what's going on? And from content1 there's no way to keep track of who is making the requests (the immediate IP source is always squid1), so it's not really possible to monitor for bogus or problematic requests. Which means that I don't know how to go about figuring out what types of tweaks are needed.

For me to help more with diagnosing and recommending what would be useful, I'd like to start by requesting some ability to access squid1. Even just being able to login to squid1 and run netstat would provide some useful information; if there are other tools available on the server to monitor incoming requests (e.g., number of requests from a given IP, types of requests, etc.) then those would also help.
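As a concrete example of the kind of visibility being asked for here, even stock tools go a fair way. A one-liner like the following (assuming a typical Linux `netstat`; this is a generic sketch, not something currently running on squid1) summarizes current TCP connections per remote IP, which is roughly the "number of requests from a given IP" view:

```shell
# Count current TCP connections per remote address.
# `netstat -tn` prints the remote end in column 5 as "IP:port";
# skip the two header lines, strip the port, then tally and rank.
netstat -tn | awk 'NR>2 {print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head -20
```

An IP sitting at the top of that list with hundreds of connections would be an obvious candidate for closer inspection.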

Also, logic tells me that the same problems that we had with rogue IPs are probably still happening now. There's no reason why the IPs would suddenly disappear overnight just because our servers were reconfigured; it seems far more likely that those IPs are still bombarding the site with useless requests but the requests are now effectively invisible to the available monitoring tools (because the requests are all showing up on squid1 not content1). Having access to squid1 will help to confirm or deny this theory. But I think implementing some tools to deal with these IPs will be needed. The simplest short term solution would be for me to have access to iptables on squid1 and therefore have the ability to block the IPs for a week or a month at a time. Or else a better long term solution would probably be some type of apache module that does this automatically.
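For concreteness, the short-term iptables approach described above amounts to rules along these lines (run as root on squid1; the address is a documentation placeholder, not a real offender):

```
# Illustrative only; 203.0.113.45 is a placeholder IP.
# Drop all traffic from an abusive IP (insert at the top of the INPUT chain):
iptables -I INPUT -s 203.0.113.45 -j DROP

# Review what is currently blocked:
iptables -L INPUT -n --line-numbers

# Lift the block later (e.g., after a week or a month):
iptables -D INPUT -s 203.0.113.45 -j DROP
```

Since iptables rules don't persist across reboots by default, any longer-term scheme would also need the rules saved or re-applied automatically.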

Any feedback? --NepheleTalk 16:45, 28 January 2008 (EST)

I was planning on getting Nephele access to squid1; I just haven't had the time (had to work some over the weekend, plus I'm still away from home). The issue on the weekend was strange. The site had just about the same traffic on Saturday and Sunday, yet there were no issues on Saturday. I spent a little time on Sunday trying to track down the issue but couldn't find anything obvious (no huge DoS or other clients abusing the site, it seemed). I'm not entirely sure the caching is caching everything it should be, and the bottom line may be that site traffic is still too much for two servers anyway (more logged-in users, which bypass the cache).
Another thing to keep in mind is that poor site performance is self-limiting to a point. When performance gets very bad, people end up aborting the web request, which reduces load on the server a little bit. Even though we've introduced a cache, we may still be hitting the peak performance of the server, albeit while serving more requests. This weekend had a slightly higher number of requests than the usual weekend, but not by a huge margin (the previous weekend was a bit higher).
There are still a bunch of things Nephele suggested a while ago that we can try to get even better performance but they'll take time to do. I prefer to just change one thing at a time and see what happens over a few days rather than do them all at once and hope nothing breaks. I should have some time this week even though I'm away. -- Daveh 17:01, 28 January 2008 (EST)
One more issue that needs to be fixed was just pointed out: the forum software now sees every single contributor as coming from the squid IP address. Is it possible to set up the forum software to be more squid-aware (obviously the wiki software is still able to access the original IP address; can phpbb do something similar)? Or is there some other way to fix this problem? Because at the moment the forum moderators have lost one of their useful tools for monitoring/controlling spammers and other miscreants. --NepheleTalk 14:20, 29 January 2008 (EST)
Sounds terrific to me. Are there, like, any other "updated" versions of the "Squid" Daveh uploaded? If so, think we should give it a try once we test the current one out? And what is this "phpbb" that Nephele mentioned recently? I see you two are working very hard, and you deserve it when I say: thank you for doing so much for the site for us other users. Thank You, and I will keep in touch with this discussion later. --Playjex 15:03, 29 January 2008 (EST)
What do you mean by 'updated'? I installed whatever the latest stable/release Squid package was (2.6.?). This is the same version as Wikipedia uses. The easiest fix to the forum issue is either to figure out how to properly forward IPs (if that's even possible) or to have the forums avoid use of the Squid completely (e.g., a separate subdomain forums.uesp.net). -- Daveh 21:27, 29 January 2008 (EST)
For the record, phpbb is the software that runs the forums part of the site. --TheRealLurlock Talk 21:37, 29 January 2008 (EST)
The forums are now accessed via forums.uesp.net, which bypasses the cache (they can still be accessed from the old link, which does not bypass the cache). A quick search doesn't reveal any easy solution to the IP issue...this is exactly how a Squid cache is supposed to work. There is the X-Forwarded-For header, but it requires whatever app to specifically check for and use it (i.e., I'd have to modify phpbb, assuming it's even possible). -- Daveh 22:25, 29 January 2008 (EST)
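For what it's worth, the trust check an application behind a cache would need when reading X-Forwarded-For can be sketched in a few lines. This is a generic illustration in Python (phpbb itself is PHP), and the proxy address is a made-up stand-in for squid1's real internal IP:

```python
def client_ip(remote_addr, xff_header, trusted_proxies=("10.0.0.1",)):
    """Recover the original client IP behind a caching proxy.

    Trust X-Forwarded-For only when the direct peer is a known proxy
    ("10.0.0.1" is a placeholder for squid1's internal address);
    otherwise any client could spoof the header.
    """
    if xff_header and remote_addr in trusted_proxies:
        # XFF is "client, proxy1, proxy2, ..."; the first entry is the client.
        return xff_header.split(",")[0].strip()
    return remote_addr
```

So a request relayed by the trusted proxy yields the address in the header, while a header sent directly by an untrusted client is ignored.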
Correction -- Looking more closely, we did actually experience significantly higher traffic last weekend, about 20% above a normal weekend (which is itself 10-20% higher than a typical weekday). It was actually the highest number of page requests we've seen in at least several months. -- Daveh 21:27, 29 January 2008 (EST)
Haha, I apologize, Daveh. I just thought that maybe there was another version of it. Sounds good to me (even though I'm not an admin). Thank you for replying. -Playjex 14:10, 30 January 2008 (EST) P.S. Thanks for specifying what phpbb is ;]
There is Squid 3.0 which was just released in December, but 2.6 is fine for now. -- Daveh 17:14, 30 January 2008 (EST)
We have a new problem reported with the squid server: it is not allowing non-logged-in users to navigate through the category pages properly. For example, on Category:Oblivion-Quests there's a "next 200" link that is supposed to take you to the next page and show you the rest of the quests. If you're logged in, the link works; if you're not logged in, then the link just gives you the exact same page (starting from entry 1 again instead of starting from entry 201).
I'm guessing the problem is that the squid server is not recognizing that these two links are different:
  • http://www.uesp.net/w/index.php?title=Category:Oblivion-Quests
  • http://www.uesp.net/w/index.php?title=Category:Oblivion-Quests&from=To+Serve+Sithis
In other words, it isn't recognizing that the "from" keyword causes the content of the HTML page to change, so it just keeps dishing out the same version of the page sitting in its cache instead of requesting the correct modified version of the page. --NepheleTalk 17:13, 20 February 2008 (EST)
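If that guess is right, the failure mode is easy to demonstrate with a toy cache (plain Python, purely illustrative, not squid's actual logic): when the cache key omits the query string, the two category URLs collide and the cached first page is served for both.

```python
def make_fetch(key_includes_query):
    """Toy page cache keyed on either the full URL or the path alone."""
    cache = {}
    def fetch(url, origin):
        path, _sep, _query = url.partition("?")
        key = url if key_includes_query else path
        if key not in cache:            # miss: forward to the origin server
            cache[key] = origin(url)
        return cache[key]               # hit: serve the saved copy
    return fetch

origin = lambda url: "page for " + url  # stand-in for content1
u1 = "http://www.uesp.net/w/index.php?title=Category:Oblivion-Quests"
u2 = u1 + "&from=To+Serve+Sithis"

bad = make_fetch(key_includes_query=False)
bad(u1, origin)   # caches the first 200 entries under the bare path
# bad(u2, origin) now wrongly returns the cached copy of u1

good = make_fetch(key_includes_query=True)
# good(u2, origin) correctly fetches the "from=" continuation page
```

In other words, the fix would be to make sure the cache treats the full URL, query string included, as the identity of the page.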