This is kind of a free-flowing discussion for IT folks here in our community, and some friendly questions for ct.
I'm a professional computer programmer specializing in Ruby on Rails.
I'm a little curious about how you're going to manage pageviews tonight.
Warning: this is some really complicated stuff, so if you don't understand what's being discussed, feel free to crack a couple of Ted Stevens jokes.
More after the flip
Now I'm scanning the earlier diaries you wrote, ct, and I'm picking up some interesting information.
The front-facing proxy is lighttpd, with your custom memcached enhancements.
Do you have a failover setup in place?
How big is the memcached server? A gig? Is there just one memcached instance, or are they spread across several of your app servers?
I think I read somewhere that Facebook has the biggest installation of memcached anywhere, at (I could be really wrong) 16 TB.
Using Daily Kos as an example, the lighttpd proxy server sits in front of the mod_perl apache that runs Scoop. When a request is made, lighttpd checks whether the request has a session cookie. If it doesn't, it sees if the URI matches the pattern of URIs to check. If it does, it goes to a mod_magnet lua script that queries the memcached server for the page. If it's present, memcached returns the page, lighttpd gunzips it if necessary, sets the content type, and returns it to the user. If the page is not present in memcached, the request proceeds to the backend. There, the page is made, and if the request is an anonymous one and fits the same pattern of URIs that lighttpd looks in, it places a copy of the page into memcached before serving it up to the user. While that page is active in memcached, any of the other webservers can retrieve the page, saving them the work of regenerating it themselves.
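For anyone following along, here's the gist of that flow as a toy Ruby sketch. The real logic lives in a mod_magnet Lua script talking to actual memcached; a plain Hash stands in for the cache here, and the URI pattern and names are made up for illustration.

```ruby
# Hypothetical pattern for cacheable anonymous pages (not DKos's real one).
CACHEABLE = %r{\A/(story|section)/}

# cache: Hash standing in for memcached; backend: callable standing in for
# the mod_perl/Scoop layer that renders pages.
def fetch_page(cache, uri, has_session_cookie, backend)
  cacheable = !has_session_cookie && uri =~ CACHEABLE
  if cacheable && (page = cache[uri])
    return page                      # cache hit: the backend never sees it
  end
  page = backend.call(uri)           # miss (or logged-in): render the page
  cache[uri] = page if cacheable     # anonymous + matching URI: keep a copy
  page
end
```

The key property is the third branch: logged-in requests never read from or write to the cache, so personalized pages can't leak between users.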
A wise man once said that the hardest things in computer science are naming things and expiring caches. Was the cache-expiring logic a pain to set up?
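Memcached handles the simplest strategy natively: every set can carry a time-to-live, after which the entry just stops being returned (the harder strategy is deleting entries explicitly when content changes). A toy model of TTL expiry, with a Hash in place of memcached and a `now` parameter so the clock is explicit:

```ruby
# Illustrative only; memcached takes the TTL as an argument to set and
# does all of this server-side.
class TTLCache
  def initialize(ttl)
    @ttl = ttl      # seconds entries stay valid
    @store = {}
  end

  # Stamp each value with the time at which it expires.
  def set(key, value, now = Time.now)
    @store[key] = [value, now + @ttl]
  end

  # Return the value only if it hasn't expired; nil otherwise.
  def get(key, now = Time.now)
    value, expires_at = @store[key]
    return nil if expires_at.nil? || now >= expires_at
    value
  end
end
```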
How many apache/mod_perl instances are you running on each quad-core Xeon server?
For the new webservers, we got six quad core Xeons, each with 8GB RAM and an 80GB SATA disk for logs and such. After much trial, tribulation, and confusion with both nfsroot and iSCSI, they were finally set up with an nfsroot served up from a Sun x4500. This way, they all share one root filesystem for ease of maintenance. Plus, if required we can throw extra machines into the pool and they'll come right up, even configuring swap space if it isn't there.
I'm not entirely familiar with nfsroot. That's roughly NFS's answer to a Samba share, IIRC, but older and more Unix-native. Would a SAN be a good idea for shared disks? Or how about ZFS with Solaris?
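(Answering my own question a bit: nfsroot isn't a file share so much as the machine mounting its entire root filesystem over NFS at boot, which is why new webservers "come right up" with no local install. My rough understanding of what that looks like on the kernel command line, with made-up addresses and paths, assuming a kernel built with NFS-root support:)

```
root=/dev/nfs nfsroot=10.0.0.5:/export/webroot,ro ip=dhcp
```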
The new database machines are eight core (or, as I like to say, OCTOCORE) Xeons with 16GB RAM, one 73GB disk for the OS, one 73GB disk dedicated to tmp, and a 6x73 GB RAID-10 for the database files (and with tmp and the db RAID each having a finely tuned XFS filesystem set up on them). Setting those machines up was easier than the webservers, except for the time involved in loading all the data onto them and getting kicked in the head with this MySQL bug, necessitating me upgrading all the MySQL servers to 5.0.51. For the database servers, I'm running a 64 bit Debian etch and the icc compiled MySQL 5.0.51 server. The difference between the icc and gcc versions of MySQL doesn't seem to be too extreme, but I'm keeping icc for the moment anyway.
I just bought the O'Reilly High Performance MySQL book. How are you handling replication between the DBs?
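From what I've read so far, the standard MySQL 5.0 setup is binlog-based master/slave replication: the master logs every write, and each slave replays the log. Something like this in my.cnf (hostnames and credentials are obviously placeholders, not anything from DKos):

```
# master my.cnf
[mysqld]
server-id = 1
log-bin   = mysql-bin

# slave my.cnf
[mysqld]
server-id = 2
```

Then on each slave, point it at the master and start the I/O and SQL threads:

```
CHANGE MASTER TO MASTER_HOST='master.example.com',
  MASTER_USER='repl', MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=0;
START SLAVE;
```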
So, if I get this right, you custom-compiled MySQL 5 on Debian Etch?
What does icc do better than gcc for the requirements of this site?
ct, I know these are a ton of questions. If you have time to answer them, great; if not, good work keeping DKos up and running.
More on my background:
I'm a Rails programmer. I'm changing my stack from Apache2/mod_proxy/mongrel_cluster to nginx/HAProxy/Thin. I use MySQL for my database, and at some point I'm going to use memcached. I deploy on Ubuntu, and just got the God process-monitoring framework to work on Ubuntu Intrepid... I can't get the damn thing to work on Hardy due to a kernel issue, and I can't change the kernel on Slicehost...
Once this is all over, I strongly recommend that you take a look at HAProxy. It works well with Rails apps because it makes sure only one request hits an instance of Mongrel or Thin at a time. Rails isn't thread-safe, and has some nasty mutex-lock problems that the core team is only beginning to resolve. Even so, I'm using Thin, an evented web server, which is great.
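The trick is HAProxy's per-server `maxconn`: set it to 1 and HAProxy queues excess requests itself instead of piling them onto a busy Rails process. A minimal sketch of what my haproxy.cfg looks like (ports and names are just my setup, not a recommendation for yours):

```
listen rails_proxy 0.0.0.0:8080
    balance roundrobin
    server thin0 127.0.0.1:3000 maxconn 1 check
    server thin1 127.0.0.1:3001 maxconn 1 check
```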
Thanks for all the good work!