I'm Jon Olick. I make shiny things. I simplify.

I presented Sparse Voxel Octrees at Siggraph 2008.

Sunday, December 27, 2009

Web Scalability

The problem of web-scalability is an interesting one. In general you need a web server to scale between 1 and 250,000 simultaneous connections. The 250,000 peak is to survive a digging or slashdotting or the like. Although the peak doesn't happen very often, you still have to prepare for it as it can be a very important time to be responsive. There really are two sides to the problem. On the one side, you need a solution to add hardware dynamically as needed and on the other side, you need a software solution which can actually take advantage of all that hardware. Luckily, cloud services like Amazon and Rackspace fit the hardware problem nicely. The problem left is the software problem. So first, lets describe the process every web request goes through.
1) The web-browser connects to the web server via TCP/IP.
2) The web-browser send a HTTP request looking something like "GET /\n\n" (in the simplest possible format).
3) The web-server gets the requested document, and sends it back to the client.

In any modern web app step 3 is performed differently if you are serving static or dynamic content. If the content is static, then the file can be cached in memory in the best case and served rather quickly. If the content is dynamic then the php, perl, python, whatever is run with the source file and the given parameters (index.php?foo=bar&bar=ack). The output is then piped back to the web-browser through the TCP connection. Done.

There are a couple problems with this. If we had a 1,000 processor machine this would be trivial, but alas, such a computer is the stuff of future for at least another 7 or 8 years. So a solution must be found with todays hardware, and so that is what we will do.

First, lets go into a little more depth on the previous list:
1) TCP/IP connection to port 80
2) HTTP request
3) CGI program startup (Python,Perl,PHP,ASP,etc)
4) CGI code parse
5) Database connect
6) Database requests
7) HTTP response

In 2010, a single computer can't handle a digging, so we must share the load.

Turns out that there already is a solution that parallelizes 3,4 and 5. Its called FastCGI (http://www.fastcgi.com/drupal/).

Parallelizing 6 takes a little thought. At first glance, you might think that FastCGI will parallelize that too, however upon closer inspection it is pretty easy to see that while you may have 10 different FastCGI servers, there is still only a single SQL database which serves them all... Ok, so you don't want to pay ridiculous amounts of money (or any money really) for a SQL solution which can handle this so you choose MySQL. MySQL however as far as I know doesn't have a parallelization through automatic database replication and synchronization of multple computers. However, that is pretty much exactly what you want to do. Luckily you can do this by hand in your CGI. First you need an identical initial DB setup, then any time you UPDATE or INSERT you do it to every server replication, but any SELECTs you only need to do from any one of the servers. Reads are fast, writes are slow. So you randomly select a server to do the SELECT requests from (or round robin, or whatever) in order to distribute the load. With the typical property that you will generally do 10x to 20x more DB reads than writes this will likely do a good job at paralellizing enough to be able to support a digging.

So what are we going to do about 1,2, and 7?
In order to give yourself the best chance of reaching 250,000 simultaneous requests, you need a good choice of web server. Apparently, Apache 2 sucks, Apache 1 is better, but Lighttpd beats them all. Ok, so once you chose a good server, the rest is quite simple, make duplicates of the server at different IPs then add multiple "A" records to your DNS entry. Each "A" record will get sent an equal amount of traffic. So you want 3 front end web-servers, then just have 3 "A" records. Each server will get sent 1/3 of the total traffic. Easy.

No comments:

Post a Comment