Web server performance analysis: 2012

The famous Apache web server has recently earned a negative reputation for being hard to configure, slow, and unable to handle “web-scale” traffic. Many people are moving their web traffic to newer web servers such as node.js and Nginx without really understanding the benefits these servers offer compared to Apache. The performance benefits of these newer web servers are notable, but only with specific traffic patterns.

Apache can do “web-scale”. Sometimes.

Benchmark results can vary wildly depending on everything from the color of the Ethernet cables (not really) to the kernel revision and the major/minor point release and patch set of the software under test. Nginx 1.2.1 may have very different performance characteristics from the Nginx 1.1.19-ubuntu-1 package that ships with Ubuntu 12.04. Because EC2 instance performance is unpredictable, every benchmark was run multiple times, and the fastest result was selected for comparison.

Web server performance varies greatly across traffic workloads. For example, Nginx is a much better choice than Apache for internet radio streaming. You should benchmark your own application on several web servers to understand the unique performance characteristics of your site.

The goal of this benchmark was to measure the raw throughput and average response time of various web servers under load. This benchmark did not test websockets, long-polling, large file transfers, or streaming. Each web server was tuned and configured for the simple task of returning the string “Hello World”. All of the benchmarks were performed on c1.medium EC2 instances running Ubuntu 12.04 64-bit. The benchmarks were run with JMeter 2.7, and the visualizations were produced by Loadosophia.org.

The Contenders:

| Software | Revision | Notes |
|---|---|---|
| Apache – mpm_worker | 2.2.22-1ubuntu1 | Used the native Ubuntu packages available via `sudo apt-get install apache2` |
| Apache – mpm_event | 2.2.22-1ubuntu1 | Same as the previous test target, but with a different MPM. Apache has pluggable MPM back-ends with different response characteristics for various workloads. |
| Nginx | 1.1.19-1 | `sudo apt-get install nginx` |
| Brubeck b/w Mongrel2 | Brubeck: 0.4.0; Mongrel2: dev branch as of May 29, 2012 | Advised in #mongrel2 to use the dev branch for benchmarks |
| Brubeck WSGI | 0.4.0 | Single process |
| Tornado | 2.3 | Single process |

Maximum throughput test

The goal of this benchmark was to measure average response times under various amounts of load. Ideally, a good web server responds quickly, and the standard deviation of its response times is minimal. We want low latency and consistent response times.
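The two properties we care about are low average latency and a low standard deviation. Both can be computed from a list of per-request response times. The sketch below is a minimal illustration and assumes per-request latencies are available. JMeter and Loadosophia computed these figures for the actual benchmark. The function name `summarize` and the sample numbers are hypothetical and are not data from this test. It shows why the mean alone is not enough: two runs with the same average can behave very differently.

```python
import statistics


def summarize(latencies_ms, duration_s):
    """Summarize one load-test run (hypothetical helper, for illustration).

    latencies_ms: response time of every completed request, in milliseconds.
    duration_s: wall-clock length of the run, in seconds.
    """
    if not latencies_ms or duration_s <= 0:
        raise ValueError("need at least one sample and a positive duration")
    return {
        # Throughput: completed requests divided by elapsed time.
        "rps": len(latencies_ms) / duration_s,
        # Average latency: lower is better.
        "mean_ms": statistics.mean(latencies_ms),
        # Population standard deviation: lower means more consistent responses.
        "stdev_ms": statistics.pstdev(latencies_ms),
    }


# Hypothetical samples: same 100 ms mean, stdev 0 vs. roughly 156 ms.
steady = summarize([100, 100, 100, 100], duration_s=2)
spiky = summarize([10, 10, 10, 370], duration_s=2)
print(steady)
print(spiky)
```

The population standard deviation (`pstdev`) is used here because the samples are every request in the run, not a subset of them.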

This test ran for about 8 minutes. It started with 100 JMeter threads, uncapped with no timer, then stepped up to 500, 1000, and finally 2000 threads, with a short cool-down period between thread groups.
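“Uncapped with no timer” means a closed-loop test. Each thread sends its next request as soon as the previous one returns, with no think time in between. The sketch below reproduces only the shape of that stepped test plan; it is not how JMeter works internally. `send_request` stands in for a real HTTP call. The step and cool-down durations are hypothetical defaults, chosen so four steps total roughly eight minutes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_step(send_request, threads, duration_s):
    """Closed-loop load step: every worker fires requests back to back
    (no think-time timer) until duration_s has elapsed.

    send_request: hypothetical zero-argument callable performing one request.
    Returns (completed_requests, elapsed_seconds, latencies_ms).
    """
    deadline = time.monotonic() + duration_s

    def worker():
        latencies = []
        while time.monotonic() < deadline:
            start = time.monotonic()
            send_request()
            latencies.append((time.monotonic() - start) * 1000.0)
        return latencies

    began = time.monotonic()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(worker) for _ in range(threads)]
        all_latencies = [ms for f in futures for ms in f.result()]
    elapsed = time.monotonic() - began
    return len(all_latencies), elapsed, all_latencies


def run_ramp(send_request, steps=(100, 500, 1000, 2000),
             step_s=110.0, cooldown_s=10.0):
    """Step the thread count up, pausing between thread groups."""
    rates = {}
    for i, threads in enumerate(steps):
        count, elapsed, _ = run_step(send_request, threads, step_s)
        rates[threads] = count / elapsed  # requests per second for this step
        if i < len(steps) - 1:
            time.sleep(cooldown_s)  # cool-down before the next thread group
    return rates


# Tiny smoke run against a fake 1 ms "request".
rates = run_ramp(lambda: time.sleep(0.001), steps=(2, 4),
                 step_s=0.2, cooldown_s=0.05)
print(rates)
```

Python threads share the GIL, so this toy cannot generate anything close to JMeter's load. It only illustrates the structure of the test plan.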

Apache 2 – mpm_worker

mpm_worker sustained about 3.1k requests per second, with an average response time of 132ms. There is not much to say here; this test served as a simple baseline.

Apache 2 – mpm_event

mpm_event was slightly slower than mpm_worker, coming in at 3k requests per second with an average response time of 136ms. However, in the 500-thread test, its standard deviation was lower than mpm_worker's.

These results were surprising to me, since mpm_event is generally considered to be faster than mpm_worker.

 

Nginx

Nginx outperformed Apache in these tests, but by a very small margin. Its maximum throughput reached 3.3k requests per second, and its average response time was 117ms. However, its standard deviation was much higher than Apache's.

Nginx is slightly faster than Apache in this test, but its performance is not nearly as consistent.

 

Brubeck b/w Mongrel2

Mongrel2 was the fastest of all the servers, but it quickly became overloaded. At 100 threads, Mongrel2 sustained the highest throughput of the entire test, 6k RPS at 28ms latency! However, as soon as Mongrel2 hit a tipping point, it quickly fell over.

With the default configuration, running as non-root, Mongrel2 hits the file descriptor limit. I raised the limit to 65535 file descriptors and found that Mongrel2's maximum throughput was cut in half, from 6k down to 3k requests per second.
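The post does not say exactly how the limit was raised. One common approach is the shell's `ulimit -n` before starting the server. Equivalently, a process can raise its own soft `RLIMIT_NOFILE` limit, which is what the sketch below does with Python's standard `resource` module (Unix only). The helper name `raise_nofile_limit` is hypothetical. The code assumes Linux semantics: a non-root process may raise its soft limit only up to its hard limit, so the target is clamped to the hard limit.

```python
import resource


def raise_nofile_limit(target=65535):
    """Raise this process's soft open-file limit toward `target`
    (hypothetical helper). Returns the resulting (soft, hard) pair."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited; never lower it
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # non-root cannot exceed the hard limit
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = raise_nofile_limit()
print(soft, hard)
```

Raising the limit only lets the server accept more simultaneous connections. As the Mongrel2 results above show, it does not guarantee higher throughput.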

 
Zed Shaw wrote an article about Mongrel2's polling mechanism, and in this article Willy Tarreau (author of HAProxy and an epoll advocate) explains why adding more file descriptors actually slows Mongrel2 down.

Even with the increased file descriptor limit, Mongrel2 performed very well up through 500 threads, but it quickly fell over when 1000 JMeter threads came stampeding through. I was told that this limitation was probably related to internal ZeroMQ settings that Mongrel2 does not currently expose for tuning. See also the 6k RPS data: https://loadosophia.org/examples/11579/

 

Brubeck WSGI

Until recently, Brubeck required a running Mongrel2 instance in order to serve web traffic. To ease the development workflow, the Brubeck team added WSGI support. This benchmark shows that the WSGI interface is highly performant and stable: I encountered no errors until the final 2000-thread run.
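WSGI (PEP 3333) is the standard Python interface between a web server and an application: a callable that receives the request environ and a `start_response` function. The sketch below shows a generic WSGI application returning the benchmark's “Hello World” payload. It does not use Brubeck's own API, and the name `hello_app` is hypothetical. The app is exercised in-process with the standard library's `wsgiref.util.setup_testing_defaults`, so no server or network is needed.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Minimal WSGI application returning the benchmark payload."""
    body = b"Hello World"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # an iterable of bytes, as WSGI requires


# Call the app directly, the way a WSGI server would.
environ = {}
setup_testing_defaults(environ)  # fills in a minimal, valid WSGI environ
captured = {}


def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers


body = b"".join(hello_app(environ, start_response))
print(captured["status"], body)
```

Any WSGI server can host the same callable. For a quick manual check, `wsgiref.simple_server.make_server` works, though it is a single-threaded reference server and unsuitable for benchmarking.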

My first test followed the “hello world” example on the front page of brubeck.io. I shared the results with the development team in #brubeck, and they asked me to rerun the tests using function-based views.

By switching from class-based to function-based views I was able to increase the performance of Brubeck by 30%.

In the function-based views test, Brubeck WSGI maintained an average of 2.6k RPS, and at 100 and 500 threads its response times had the lowest standard deviation observed in the entire test.

My theory is that Brubeck has the most consistent performance of all the servers tested because it is the only one that uses green threads. Context switches are extremely slow in virtualized environments like EC2, much slower than on real hardware. Brubeck uses Python-managed green threads instead of OS-managed threads, which Apache's worker and event MPMs use, so it can handle many more requests within a single scheduler time slice. For more information about gevent and green threads, watch Bob Hancock’s PyCon 2012 tutorial on concurrency.
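The core idea behind green threads is that many lightweight tasks share one OS thread and hand control to each other explicitly. No kernel context switch is involved. gevent implements this with the `greenlet` extension and switches automatically on I/O. The toy below is not gevent. It uses plain Python generators and a round-robin scheduler to show the same cooperative switching in a self-contained way. `handler` and `run` are hypothetical names.

```python
from collections import deque


def handler(name, steps, log):
    """A toy 'green thread': runs until it yields, then hands control back."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # cooperative switch point; just a Python-level return


def run(tasks):
    """Round-robin scheduler running every task inside one OS thread."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)  # resume the task until its next yield
        except StopIteration:
            continue  # task finished; drop it
        ready.append(task)  # not finished; back of the queue


log = []
run([handler("a", 2, log), handler("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

Each switch here costs a Python function call rather than an OS scheduler decision. That is the property the theory above relies on.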

See also: Brubeck class-based views data: https://loadosophia.org/examples/11575/

 

Tornado

The design of Brubeck was inspired by the Tornado web server. Where Brubeck differs is its concurrency model: Tornado uses callbacks instead of gevent. This programming model can be confusing, because you essentially have to write your code in reverse.
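To illustrate “programming in reverse”, the sketch below writes the same two-step request handler in both styles. The fetch functions are hypothetical stand-ins and do no real I/O, and this is not Tornado's API. In the callback version, each next step must be written as a nested function that is passed down. In the sequential version, the steps read top to bottom. Under gevent, code in the sequential style would yield to other green threads during blocking I/O.

```python
def fetch_user_cb(user_id, callback):
    # Hypothetical async-style API: delivers its result to a callback.
    callback({"id": user_id, "name": "alice"})


def fetch_posts_cb(user, callback):
    callback([f"post by {user['name']}"])


def render_cb(user_id, done):
    # Callback style: the continuation of each step is a nested function,
    # so the code is written inside out relative to the order it runs in.
    def on_user(user):
        def on_posts(posts):
            done(f"{user['name']}: {len(posts)} post(s)")
        fetch_posts_cb(user, on_posts)
    fetch_user_cb(user_id, on_user)


def fetch_user(user_id):
    return {"id": user_id, "name": "alice"}


def fetch_posts(user):
    return [f"post by {user['name']}"]


def render(user_id):
    # Sequential style, as green threads allow: steps read top to bottom.
    user = fetch_user(user_id)
    posts = fetch_posts(user)
    return f"{user['name']}: {len(posts)} post(s)"


results = []
render_cb(1, results.append)
print(results[0])  # alice: 1 post(s)
print(render(1))   # alice: 1 post(s)
```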

Tornado was slower than the traditional Apache/Nginx servers in this test, sustaining an average of only 2k RPS at 344ms. Tornado was not designed for this type of rapid, ballistic traffic pattern; it is much better suited to long-polling or streaming data.

This entry was posted in Performance, Python.