Title: Page 115 – Alex Kirk

---

 * 
   ## [10 Realistic Steps to a Faster Web Site](https://alex.kirk.at/2006/02/02/10-steps-to-a-faster-web-site/)
   
 * February 2, 2006
 * I complained [before](https://alex.kirk.at/2006/01/03/49/) about [bad guides to improving the performance](https://alex.kirk.at/2006/01/03/49/)
   of your website.
 * I’d like to give you a more realistic guide on how to achieve that goal. I have
   written my [master’s thesis in computer science](https://alex.kirk.at/papers/caching-strategies/diploma_thesis.html)
   on this topic and will refer to it throughout the guide.
 * **1. Determine the bottleneck**
    When you want to improve the speed of your website, you usually just feel that
   it is somehow slow. Various factors can affect the performance of your page;
   here are the most common ones.
 * Before we move on: remember to answer each of the following questions with
   your target audience in mind.
 * **1.1. File Size**
    How much data is the user required to load before (s)he can use the page?
 * It is a frequent question how much data a web page is allowed to have. You
   cannot answer it unless you know your target audience.
 * In the early years of the internet, one would suggest a maximum of 30 KB for
   the whole page (including images, etc.). Now that many people have a broadband
   connection, I think we can push the limit to somewhere between 60 KB and 100 KB,
   although you should consider lowering the size if you also target modem users.
 * Still, the less data you require to download, the faster your page will appear.
 * **1.2. Latency**
    The time between sending your request to the server and the data reaching
   your PC.
 * This time is the sum of twice the network latency (which depends on the uplink
   of the hosting provider, the geographical distance between server and user,
   and some other factors) and the time the server needs to produce the output.
 * Network latency can hardly be optimized without moving the server, so this guide
   will not cover it.
    The processing time of the server, on the other hand, combines complex factors
   and most often leaves much room for improvement.
 * **2. Reducing the file size**
    First, you need to know how large your page really is. There are some useful
   tools out there; I picked [Web Page Analyzer](http://www.websiteoptimization.com/services/analyze/index.html),
   which does a nice job at this.
 * I suggest not spending too much time on this step unless your page is larger
   than 100 KB; if it is smaller, skip straight to step 3.
 * Large page sizes are nowadays often caused by large JavaScript libraries. Often
   you only need a small part of their functionality, so a cut-down version may
   suffice: when using prototype.js just for Ajax, for example, you could use
   [pt.ajax.js](https://alex.kirk.at/2005/10/05/prototypejs-just-for-ajax/) (also
   see [moo.ajax](http://www.mad4milk.net/entry/moo.ajax)), or [moo.fx](http://moofx.mad4milk.net/)
   as a script.aculo.us replacement.
 * [Digg](http://digg.com/) for example used to have [about 290kb](http://project-2501.net/?view=document&id=525),
   they now have reduced the size to [160kb](http://www.websiteoptimization.com/services/analyze/wso.php?url=digg.com)
   by leaving out unnecessary libraries.
 * Large images can also inflate the file size, often because the wrong image
   format was chosen. A rule of thumb: JPEG for photos, PNG for most other things,
   especially when plain colors are involved. Use PNG for screenshots, too; JPEGs
   are not only larger there but also look ugly. GIF is still an option when the
   image has only a few colors and/or you want to create an animation.
 * Large images are also often scaled down via the HTML `width` and `height`
   attributes. Scale the image in your graphics editor instead; this reduces the
   file size as well.
 * Old-style HTML can also cause a large file size. There is no need for thousands
   of `<font>` tags anymore. Use [XHTML](http://www.w3.org/TR/xhtml11/) and [CSS](http://www.w3.org/Style/CSS/)!
 * A further important step towards a smaller size is on-the-fly compression of
   your content. Almost all browsers support [gzip compression](http://en.wikipedia.org/wiki/Gzip).
   For an Apache 2 web server, for example, the [mod_deflate](http://httpd.apache.org/docs/2.0/mod/mod_deflate.html)
   module can do this transparently for you.
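 * A minimal sketch for httpd.conf (the module path and the list of MIME types
   are examples; adjust them to your installation and content):

   ```apache
   # Load mod_deflate and compress the usual text formats on the fly
   LoadModule deflate_module modules/mod_deflate.so
   AddOutputFilterByType DEFLATE text/html text/plain text/css text/xml
   ```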
 * If you don’t have access to your server’s configuration, you can use the [zlib](http://php.net/zlib)
   functions for PHP; for Django (Python) there is [GZipMiddleware](http://www.djangoproject.com/documentation/middleware/#django-middleware-gzip-gzipmiddleware),
   and Ruby on Rails has a [gzip plugin](http://wiki.rubyonrails.org/rails/pages/Output+Compression+Plugin),
   too.
 * Beware of compressing JavaScript: there are [quite](http://support.microsoft.com/kb/312496)
   [some](http://support.microsoft.com/kb/871205) bugs in Internet Explorer.
 * And for heaven’s sake, you can also strip the white space after you’ve completed
   the previous steps.
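 * To get a feeling for how much gzip buys you, here is a quick sketch from the
   command line (the file name and the repeat count are arbitrary examples):

   ```shell
   # Build a sample file with repetitive HTML-like markup, then compare
   # its raw size to the gzip-compressed size.
   printf '<div class="post"><p>Hello, world!</p></div>\n%.0s' $(seq 1 500) > sample.html
   raw=$(wc -c < sample.html)
   gzip -c -9 sample.html > sample.html.gz
   gz=$(wc -c < sample.html.gz)
   echo "raw: ${raw} bytes, gzipped: ${gz} bytes"
   rm -f sample.html sample.html.gz
   ```

   Markup compresses extremely well because it is so repetitive, which is why
   enabling gzip is usually the cheapest file-size win available.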
 * **3. Check what’s causing a high latency**
    As mentioned, latency is caused by two major factors.
 * **3.1. Is it the network latency?**
    To determine whether the network latency is the blocking factor, you can ping
   your server. This can be done from the command line via `ping servername.com`.
 * If your server admin has disabled replies to pings, you can run a traceroute
   instead, which uses a different method to determine the round-trip time:
   `tracert servername.com` (Windows) or `traceroute servername.com` (Unix).
 * If your audience is not geographically close to you, you can also use a service
   such as [Just Ping](http://www.just-ping.com/), which pings the given address
   from 12 different locations around the world.
 * **3.2. Does it take too long to generate the page?**
    If the ping times are OK, it might take too long to generate the page. Note
   that this applies to dynamic pages, for example those written in a scripting
   language such as PHP; static pages are usually served very quickly.
 * You can measure the time it takes to generate the page quite easily: save a
   timestamp at the beginning of the page and subtract it from a timestamp taken
   once the page has been generated. In PHP, for example, it looks like this:

   ```php
   <?php
   // Start of the page: record a timestamp
   $start_time = explode(' ', microtime());
   $start_time = $start_time[1] + $start_time[0];
   ?>
   ```
 * and at the end of the page:

   ```php
   <?php
   $end_time = explode(' ', microtime());
   $total_time = $end_time[0] + $end_time[1] - $start_time;
   printf('Page loaded in %.3f seconds.', $total_time);
   ?>
   ```
 * The time needed to generate the page is now displayed at the bottom of it.
 * You can also compare the load time of a static page (often a file ending in
   .html) with that of a dynamic one. I’d advise using the first method, though,
   because you are going to need it to go on optimizing the page.
 * You can also use a [Profiler](http://en.wikipedia.org/wiki/Profiler_(computer_science))
   which usually offers even more information on the generation process.
 * For PHP you can, as a first easy step, enable [Output Buffering](http://php.net/ob_start)
   and restart the test.
 * You should also consider testing your page with a benchmarking program such as
   [ApacheBench (ab)](https://alex.kirk.at/papers/caching-strategies/diploma_thesisch4.html#x8-510004.8),
   which stresses the server by issuing many requests at once.
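 * As a sketch of a benchmark run against a throwaway local server (port 8099 and
   the request counts are arbitrary; this assumes `ab` and `python3` are
   installed, neither of which the guide requires):

   ```shell
   # Serve the current directory with Python's built-in web server, then
   # fire 200 requests at it, 10 concurrently, and pick out the summary line.
   python3 -m http.server 8099 >/dev/null 2>&1 &
   srv=$!
   sleep 1
   rps=$(ab -n 200 -c 10 http://127.0.0.1:8099/ 2>/dev/null | grep 'Requests per second')
   kill $srv
   echo "$rps"
   ```

   Raising the concurrency (`-c`) until the requests-per-second figure drops is a
   simple way to find the point where the server starts to struggle.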
 * It is difficult to say what generation time is acceptable; that depends on your
   own requirements. Try to keep it under 1 second, as this is a delay users can
   usually cope with.
 * **3.3. Is it the rendering performance?**
    This plays only a minor role in my guide, but it can still be a reason why
   your page takes long to load.
 * If you use a complex table structure (which can render slowly), you are most
   probably using old-style HTML; try switching to XHTML and CSS.
 * Don’t use overly complex JavaScript: slow scripts in combination with `onmousemove`
   events can make a page really sluggish. If your JavaScript makes the page load
   slowly (you can measure this with a technique similar to the PHP timing above,
   using `(new Date()).getTime()`), you are doing something wrong. Rethink your
   concept.
 * **4. Determine the lagging component(s)**
    As your page usually consists of more than one component (such as header,
   login window, navigation, footer, etc.), you should next check which one needs
   tuning. You can do this by integrating a few of the measuring fragments into
   the page, which will show you several split times throughout it.
 * The following steps can now be applied to the slowest parts of the page.
 * **5. Enable a Compiler Cache**
    Scripting languages recompile a script upon each request. As there are far
   more requests than changes to the script, it makes no sense to compile it over
   and over (especially once core development has finished).
 * For PHP there is, amongst others, [APC](http://pecl.php.net/apc) (which will
   probably be integrated into [PHP 6](http://www.php.net/~derick/meeting-notes.html#add-an-opcode-cache-to-the-distribution-apc));
   Python stores a [compiled version](http://www.python.org/doc/2.2.3/tut/node8.html#SECTION008120000000000000000)
   by itself.
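 * Installing APC from PECL and loading it in php.ini is usually all it takes;
   a sketch (the shared-memory size is an arbitrary example):

   ```ini
   ; php.ini – load the APC opcode cache
   extension = apc.so
   apc.enabled = 1
   apc.shm_size = 30   ; MB of shared memory for compiled scripts
   ```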
 * **6. Look at the DB Queries**
    At university you are taught the most complex queries with lots of JOINs and
   GROUP BYs, but in real life it can often be useful to avoid JOINs between
   (especially large) tables. Instead you issue multiple SELECTs, which can be
   cached by the SQL server. This is especially true if you don’t need the joined
   data for every row. It really depends on your application, but trying without
   a JOIN is often worth it.
 * Ensure that you use query folding (also called a query cache, such as the [MySQL Query Cache](https://alex.kirk.at/papers/caching-strategies/diploma_thesisch4.html#x8-350004.3.2)):
   in a web environment the same SELECT statements are executed over and over,
   which almost screams for a cache (and explains why avoiding JOINs can be much
   faster).
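 * For MySQL, the query cache is switched on in my.cnf; a sketch (the size is an
   arbitrary example):

   ```ini
   # my.cnf
   query_cache_type = 1      # cache all cacheable SELECT results
   query_cache_size = 32M    # memory reserved for cached result sets
   ```

   Note that the cache entry for a table is invalidated whenever that table is
   written to, which is another reason why many small SELECTs can beat one big
   JOIN across frequently-updated tables.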
 * **7. Send the correct Modification Date**
    Dynamic web pages often make one big mistake: they don’t set their date of
   last modification. The browser then always has to load the whole page from the
   server and cannot use its cache.
 * In HTTP there are various headers important for caching: HTTP 1.0 has the
   `Last-Modified` header, which plays together with the browser-sent `If-Modified-Since`
   (see the [specification](http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.3.1)).
   HTTP 1.1 uses the `ETag` (so-called [Entity Tag](http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.3.2)),
   which allows different last modification dates for the same page (e.g. for
   different languages). Other relevant headers are `Cache-Control` and `Expires`.
 * Read on about [how to set the headers correctly and respond to them (1.0)](https://alex.kirk.at/papers/caching-strategies/diploma_thesisch6.html#x17-720006.1.1)
   and [1.1](http://simon.incutio.com/archive/2003/04/23/conditionalGet).
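 * A quick way to check whether a server honours conditional requests is curl.
   A sketch against a throwaway local server (port and file name are arbitrary;
   recent Python versions happen to answer `If-Modified-Since` correctly):

   ```shell
   # Ask for a file once, note its Last-Modified date, then revalidate:
   # an unchanged file should yield "304 Not Modified" and no body.
   echo '<p>hello</p>' > page.html
   python3 -m http.server 8098 >/dev/null 2>&1 &
   srv=$!
   sleep 1
   lm=$(curl -sI http://127.0.0.1:8098/page.html | tr -d '\r' | sed -n 's/^Last-Modified: //p')
   code=$(curl -s -o /dev/null -w '%{http_code}' -H "If-Modified-Since: $lm" http://127.0.0.1:8098/page.html)
   kill $srv
   rm -f page.html
   echo "revalidation answered with HTTP $code"
   ```

   If you see a 200 for the second request on your own site, the browser is
   re-downloading content it already has.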
 * **8. Consider Component Caching** (advanced)
    If optimizing the database does not improve your generation time enough, you
   are most likely doing something complex ;) For public pages it is very likely
   that you will present two users with the same content (at least for a specific
   component). So instead of running complex database queries each time, you can
   store a pre-rendered copy and use that when needed, to save time.
 * This is a rather complex topic but can be the ultimate solution to your performance
   problems. You need to make sure that you don’t deliver a stale copy to the client,
   and you need to think about how to organize your cache files so you can invalidate
   them quickly.
 * Most web frameworks give you a hand when doing component caching: for PHP there
   is [Smarty’s template caching](https://alex.kirk.at/papers/caching-strategies/diploma_thesisch9.html#x25-1040009),
   Perl has [Mason’s Data Caching](http://www.masonhq.com/docs/manual/Devel.html#data_caching),
   Ruby’s Rails has [Page Caching](http://api.rubyonrails.com/classes/ActionController/Caching/Pages.html),
   Django [supports it as well](http://www.djangoproject.com/documentation/cache/).
 * This technique can eventually lead to a point where loading your page does not
   require any request to the database at all. This is a favorable result, as the
   connection to the database is often the most obvious bottleneck.
 * If your page is not that complex, you could also consider caching the whole
   page. This is easier but usually makes the page feel less up-to-date.
 * One more thing: if you have enough RAM, you should also consider storing the
   cache files on a RAM drive. As the data is discardable (it can be re-generated
   at any time), losing it on a reboot would not matter. Keeping disk I/O low can
   boost the speed once again.
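 * On Linux this can be as simple as an fstab entry (the path and size are
   hypothetical examples; anything written there lives in RAM and vanishes on
   reboot):

   ```
   # /etc/fstab – a RAM-backed tmpfs for re-generatable cache files
   tmpfs  /var/cache/myapp  tmpfs  size=256m  0  0
   ```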
 * **9. Reducing the Server Load**
    Suppose your page loads quickly and everything looks alright, but when too
   many users access the page, it suddenly becomes slow.
 * This is most likely due to a lack of resources on the server. You cannot add
   an indefinite amount of CPU power or RAM to the server, but you can handle what
   you’ve got more carefully.
 * **9.1. Use a Reverse Proxy** (needs access to the server)
    Whenever a request needs to be handled, a whole copy (or child process) of the
   web server executable is held in memory, not only while generating the page but
   also until the page has been transferred to the client. Slow clients can thus
   cost performance: when many users connect, you can be sure that quite a few
   slow ones will block the line for somebody else just to transfer back the data.
 * There is a solution for this: the well-known Squid proxy has an [HTTP Acceleration](https://alex.kirk.at/2005/11/29/squids-http-acceleration-mode/)
   mode in which it handles the communication with the client. It’s like a secretary
   that handles all correspondence.
 * It waits patiently until the client has filed his request, asks the web server
   to respond, quickly receives the response (while the web server can move on to
   the next request), and then patiently returns the file to the client.
 * The Squid server is also small, lightweight, and specialized for this task. You
   therefore need less RAM for more clients, which allows a higher throughput
   (in served clients per time unit).
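 * A minimal sketch of the relevant squid.conf lines for the Squid 2.5-era
   accelerator mode (host and port are examples; see the linked article for
   details):

   ```
   # squid.conf – answer on port 80, forward to the real web server
   http_port 80
   httpd_accel_host 127.0.0.1
   httpd_accel_port 8080
   httpd_accel_single_host on
   httpd_accel_uses_host_header on
   ```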
 * **9.2. Take a lightweight HTTP Server** (needs access to the server)
    People often say that Apache is quite huge and does not do its work quickly
   enough. Personally I am satisfied with its performance, but when it comes to
   scripting languages that talk to the web server via the (Fast)CGI interface,
   Apache is easily trumped by a lightweight alternative.
 * It’s called [LightTPD](http://www.lighttpd.net/) (pronounced “lighty”) and does
   a good job at doing that special task very quickly. You can already see from 
   a [configuration file](http://www.lighttpd.net/documentation/configuration.html)
   that it keeps things simple.
 * I suggest testing both scenarios to see whether you gain from using LightTPD
   or should stay with your old web server. The Apache web server is stable and
   built on long experience in the web server business, but LightTPD is taking its
   chance.
 * **10. Server Scaling** (extreme technique)
    Once you have gone through all the steps and your page still does not load
   fast enough (most obviously because of too many concurrent users), you can
   duplicate your hardware. Thanks to the previous steps, there isn’t too much
   work left.
 * The reverse proxy can act as a load balancer by sending its requests to one of
   several web servers, either quasi-randomly ([Round Robin](http://en.wikipedia.org/wiki/Round-robin))
   or driven by server load.
 * **Conclusion**
    All in all, the main strategy for a large page is a combination of caching
   and intelligent handling of resources. While the first 7 steps apply to any
   page, the last 3 are usually only useful (and needed) for sites with many
   concurrent users.
 * The guide shows that you don’t need a special server to withstand [slashdotting](http://slashdot.org/)
   or [digging](http://digg.com/).
 * **Further Reading**
    For more detail on each step I recommend taking a look at
   my [diploma thesis](https://alex.kirk.at/papers/caching-strategies/diploma_thesis.html).
 * MySQL tuning is nicely described in [Jeremy Zawodny’s](http://jeremy.zawodny.com/blog/)
   [High Performance MySQL](http://highperformancemysql.com/). There is a presentation
   about how [Yahoo tunes its Apache Servers](http://public.yahoo.com/~radwin/talks/yapache-apachecon2005.htm),
   and some tips for [websites running on Java](http://www.javaperformancetuning.com/tips/j2ee_srvlt.shtml).
   [George Schlossnagle](http://www.schlossnagle.org/~george/blog/) gives some good
   caching tips in his [Advanced PHP Programming](http://www.samspublishing.com/bookstore/product.asp?isbn=0672325616&rl=1);
   they are not restricted to PHP as a scripting language.
 * performance, tuning, website
 * [Code](https://alex.kirk.at/category/code/), [Web](https://alex.kirk.at/category/web/)
 * 
   ## [Blummy: Major Update](https://alex.kirk.at/2006/01/23/blummy-major-update/)
   
 * January 23, 2006
 * I’m proud to announce a major update of Blummy.
 * Blummy now includes blummyWiki, a small notebook wiki that can be displayed within
   the current page. You can store favorite URLs or commonly used information
   there, and it will be available on any page, just like Blummy. And of course it’s
   cross-browser and cross-computer.
 * In my opinion this is a very unique and useful feature.
 * There is now also custom color and CSS support. You can browse other users’
   Blummys via the “Explore” link in the menu.
    Furthermore, Blummy is at last secured by a password (so private blummlets
   are quite secure now).
 * A few words on the blummyWiki: it is invoked by switching an option on the [Prefs page](http://blummy.com/prefs.php)
   or by using one of these blummlets: [open on the left](http://blummy.com/config.php?query=id%3D2131),
   [open below](http://blummy.com/config.php?query=id%3D2133) and [open on the right](http://blummy.com/config.php?query=id%3D2132)
   ([display all three blummlets in configuration view](http://blummy.com/config.php?query=blummyWiki)).
 * blummy, major, update
 * [blummy](https://alex.kirk.at/category/projects/blummy/), [Projects](https://alex.kirk.at/category/projects/)
