Title: Page 132 – Alex Kirk

---

 * 
   ## [PHP and Multibyte](https://alex.kirk.at/2005/04/05/php-and-multibyte/)
   
 * April 5, 2005
 * ever messed around with umlauts or other non [a-z] letters? it’s quite horrible.
 * for the german speaking region there are mainly two encodings: iso8859-1
   and utf-8. the former encodes each letter with one byte by extending old 7-bit
   ascii with 128 more code points (0x80-0xff), amongst others also umlauts. utf-8
   keeps ascii as single bytes and encodes everything else as a multi-byte sequence:
   a lead byte in the range 0xc2-0xf4 says how many continuation bytes (0x80-0xbf)
   follow. umlauts take two bytes, and the maximum is four bytes per character,
   which is enough for all of unicode. there are also utf-16 (two or four bytes
   per char) and utf-32 (always four bytes per char).
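 * a small sketch of how many bytes one character takes in each encoding. it is not from the original post: the sample characters are assumptions picked for illustration, and they are written as escaped bytes so the encoding of the script file itself does not matter.
```php
<?php
// "ü" (U+00FC) given as its two utf-8 bytes
$utf8 = "\xC3\xBC";

// the same letter in iso8859-1 is the single byte 0xFC
$latin1 = mb_convert_encoding($utf8, "ISO-8859-1", "UTF-8");

// strlen() counts bytes, which is exactly what we want to see here
echo strlen($latin1), "\n";              // 1
echo strlen($utf8), "\n";                // 2
echo strlen("\xE2\x82\xAC"), "\n";       // 3 (euro sign, U+20AC)
echo strlen("\xF0\x9D\x84\x9E"), "\n";   // 4 (musical g clef, U+1D11E)

// utf-16 needs two bytes for "ü", utf-32 always four
$utf16 = mb_convert_encoding($utf8, "UTF-16BE", "UTF-8");
$utf32 = mb_convert_encoding($utf8, "UTF-32BE", "UTF-8");
echo bin2hex($utf16), "\n";              // 00fc
echo strlen($utf32), "\n";               // 4
```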
 * so, what’s the problem? with bandnews we have different sources for our data,
   meaning that we receive many pages with many different encodings and have to 
   deliver a page that follows only one encoding. we chose to use utf-8 now, because
   a wide range of letters from many other encodings can be displayed which are 
   not included in iso8859-1.
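 * one way to bring differently encoded sources down to a single encoding could look like the sketch below. `to_utf8()` is a hypothetical helper, not bandnews code, and it assumes that every source tells us which encoding it uses (guessing the encoding is a different and much less reliable story).
```php
<?php
// hypothetical helper: convert text from a known source encoding to utf-8
function to_utf8(string $text, string $sourceEncoding): string
{
    // mb_convert_encoding(string, target encoding, source encoding)
    return mb_convert_encoding($text, "UTF-8", $sourceEncoding);
}

// "Motörhead" as it would arrive from an iso8859-1 source (ö = 0xF6)
$fromLatin1 = "Mot\xF6rhead";

$clean = to_utf8($fromLatin1, "ISO-8859-1");

echo bin2hex($clean), "\n";                              // ö is now c3b6
var_dump(mb_check_encoding($fromLatin1, "UTF-8"));       // bool(false)
var_dump(mb_check_encoding($clean, "UTF-8"));            // bool(true)
```
 * the point of funnelling everything through one function is that the rest of the code only ever sees utf-8.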
 * now it is important that you stop using [strlen](http://php.net/strlen) and [substr](http://php.net/substr)
   because they count bytes, not characters: it can easily happen that you split
   a utf-8 character into parts, and then you can forget about comparing it to
   anything. alternatives are [mb_strlen](http://php.net/mb_strlen)
   and [mb_substr](http://php.net/mb_substr) and all sorts of other [mb_*](http://php.net/manual/en/ref.mbstring.php)
   functions. well… this does not work out of the box, you need to specify what
   encoding is to be expected. this can be done like this:
   `mb_internal_encoding("UTF-8");` all mb_* functions use this encoding if no
   other is specified.
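 * to see the difference between the byte-based and the mb_* functions, here is a minimal sketch. the sample word is an assumption for illustration and is written with escaped bytes so that the script file's own encoding does not matter.
```php
<?php
mb_internal_encoding("UTF-8");

// "für": f, then ü as the two bytes 0xC3 0xBC, then r
$word = "f\xC3\xBCr";

echo strlen($word), "\n";      // 4 (bytes)
echo mb_strlen($word), "\n";   // 3 (characters)

// substr() cuts after two bytes, right through the middle of the ü
$broken = substr($word, 0, 2);
var_dump(mb_check_encoding($broken, "UTF-8"));   // bool(false)

// mb_substr() cuts after two characters and keeps the ü intact
$intact = mb_substr($word, 0, 2);
var_dump(mb_check_encoding($intact, "UTF-8"));   // bool(true)
echo bin2hex($intact), "\n";                     // 66c3bc
```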
 * still, non-utf-8 data can come through to the browser, e.g. if you receive it
   from the database. but there is a chance to get around this quite comfortably:
   `mb_http_output("UTF-8"); ob_start("mb_output_handler");` the output buffer is
   cleared of wrong characters by the mb_output_handler. it is also easily possible
   to have the output converted to iso8859-1, just by specifying it with the [mb_http_output](http://php.net/mb_http_output)
   command. a drawback is, though, that no other output filter can be applied, such
   as the one for output compression: `ob_start("ob_gzhandler");`
 * the manual states that zlib compression should be used instead, as specified
   in the php.ini file or via [ini_set](http://php.net/ini_set):
   `ini_set('zlib.output_compression', 'on'); ini_set('zlib.output_handler', 'mb_output_handler'); ob_start();`
   note that the output handler argument for [ob_start](http://php.net/ob_start)
   has to be empty because the handler is moved to the config option. this sounds
   great, but i was not able to get it to work. well, i must admit that i did not
   put so much time into it because i simply decided to move the responsibility
   to apache: [mod_deflate](http://httpd.apache.org/docs-2.0/mod/mod_deflate.html).
   you might want to modify the configuration line, as i did:
   `AddOutputFilterByType DEFLATE text/html text/plain text/xml text/javascript text/css`
 * have fun with character encoding. it works after a while, but it's a lot of
   trial and error.
 * [bandnews](https://alex.kirk.at/category/projects/bandnews/), [PHP](https://alex.kirk.at/category/code/php/)
 * 
   ## [live search](https://alex.kirk.at/2005/03/23/live-search/)
   
 * March 23, 2005
 * I have just integrated a cool new feature into bandnews: while typing, matching
   bands are now displayed underneath the search box. a cool and fast alternative
   to the band dropdown.
 * [bandnews](https://alex.kirk.at/category/projects/bandnews/)
