PHP – Alex Kirk

Suppose you have a ressource on the web (for example an API) that either generates a lot of load, or that is prone to be abused by excessive use, you want to rate-limit it. That is, only a certain number of requests is allowed per time-period.

A possible way to do this is to use Memcache to record the number of requests received per a certain time period.

Task: Only allow 1000 requests per 5 minutes

First attempt:
The naive approach would be to have a key rate-limit-1.2.3.4 (where 1.2.3.4 would be the client’s IP address) with a expiration time of 5 minutes (aka 300 seconds) and increment it with every request. But consider this:

10:00: 250 reqs -> value 250
10:02: 500 reqs -> value 750
10:04: 250 reqs -> value 1000
10:06: 100 reqs -> value 1250 -> fails! (though there were only 850 requests in the last 5 minutes)

Whats the problem?

Memcache renews the expiration time with every set.

Second attempt:
Have a new key every 5 minutes: rate-limit-1.2.3.4-${minutes modulo 5}. This circumvents the problem that the key expiration but creates another one:

10:00: 250 reqs -> value 250
10:02: 500 reqs -> value 750
10:04: 250 reqs -> value 1000
10:06: 300 reqs -> value 300 -> doesn’t fail! (though there were 1050 requests in the last 5 minutes)

Solution:
Store the value for each minute separately: rate-limit-1.2.3.4-$hour$minute. When checking, query all the keys in the last 5 minutes to calculate the requests in the last 5 minutes.

Sample code:


foreach ($this->getKeys($minutes) as $key) {
    $requests += $this->memcache->get($key);
}

$this->memcache->increment($key, 1);

if ($requests > $allowedRequests) throw new RateExceededException;

For your convenience I have open sourced my code at github: php-ratelimiter.

Just a quick note, be careful when using the whitespace character \s in preg_match when operating with UTF-8 strings.

Suppose you have a string containing a dagger symbol. When you try to strip all whitespace from the string like this, you will end up with an invalid UTF-8 character:

$ php -r 'echo preg_replace("#\s#", "", "?");' | xxd 0000000: e280

(On a side note: xxd displays all bytes in hexadecimal representation. The resulting string here consists of two bytes e2 and 80)

\s stripped away the a0 byte. I was unaware that this character was included in the whitespace list, but actually it represents the non-breaking space.

So actually use the u (PCRE8) modifier as it will be aware of the a0 “belonging” to the dagger:

$ php -r 'echo preg_replace("#\s#u", "", "?");' | xxd 0000000: e280 a0

By the way, trim() doesn’t strip non-breaking spaces and can therefore safely be used for UTF-8 strings. (If you still want to trim non-breaking spaces with trim, read this comment on PHP.net)

Finally here you can see the ASCII characters matched by \s when using the u modifier.

$ php -r '$i = 0; while (++$i < 256) echo preg_replace("#[^\s]#", "", chr($i));' | xxd 0000000: 090a 0c0d 2085 a0 $ php -r '$i = 0; while (++$i < 256) echo preg_replace("#[^\s]#u", "", chr($i));' | xxd 0000000: 090a 0c0d 20

Functions operating just on the ASCII characters (with a byte code below 128) are generally safe, as the multi-byte characters of UTF-8 have a leading bit of one (and are therefore above 128).

Category: PHP

Add a Rate Limit to Your Website

preg_match, UTF-8 and whitespace