PHP multibyte (UTF-8) string problem solved

I had many PHP warnings in my Drupal logs.

The problem

Warning: htmlspecialchars() [function.htmlspecialchars]: Invalid multibyte sequence in argument in check_plain() (Line 1577 of includes/bootstrap.inc).

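The reason for this problem was that a UTF-8 string containing multibyte characters was manipulated with standard (i.e. single-byte) PHP functions such as substr() and preg_match(). These functions count bytes rather than characters, so they can cut a multibyte character in half and leave an invalid byte sequence behind.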
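
A minimal sketch of how such an invalid sequence can arise. The sample string is an assumption chosen for illustration (it is not taken from the Drupal site in question), and the mbstring extension is assumed to be loaded. The "ü" is written as explicit bytes so the example does not depend on the encoding of the source file.

```php
<?php
// "Zürich" in UTF-8: "ü" is the two-byte sequence 0xC3 0xBC.
$text = "Z\xC3\xBCrich";

// substr() counts bytes, so a length of 2 keeps "Z" plus only
// the first byte of "ü".
$broken = substr($text, 0, 2);

// mb_check_encoding() reports whether a string is valid in the given encoding.
$originalIsValid = mb_check_encoding($text, 'UTF-8');   // true
$brokenIsValid   = mb_check_encoding($broken, 'UTF-8'); // false

// Without ENT_IGNORE or ENT_SUBSTITUTE, htmlspecialchars() rejects the
// truncated string and returns an empty string.
$escaped = htmlspecialchars($broken, ENT_QUOTES, 'UTF-8');
```

Older PHP versions report this case with the warning quoted above; newer ones return the empty string silently, which can make the problem even harder to notice.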

The solution

Use PHP's multibyte string functions instead; see Multibyte String Functions in the PHP reference.

  • Use mb_substr() instead of substr().
  • Use the u modifier in regular expressions passed to functions such as preg_match().
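
The two replacements above could look roughly like this. It is only a sketch: the sample string and variable names are hypothetical, and the mbstring extension is assumed to be available.

```php
<?php
// "Zürich" in UTF-8, with "ü" written as explicit bytes.
$text = "Z\xC3\xBCrich";

// mb_substr() counts characters, so the first two characters are "Z" and "ü".
$prefix = mb_substr($text, 0, 2, 'UTF-8');

// Without the u modifier, "." matches a single byte (half of "ü").
preg_match('/^Z(.)/', $text, $byteMatch);

// With the u modifier, pattern and subject are treated as UTF-8,
// so "." matches the whole character.
preg_match('/^Z(.)/u', $text, $charMatch);
```

Here $byteMatch[1] holds the lone byte 0xC3, while $charMatch[1] holds the complete two-byte "ü". Passing the encoding to mb_substr() explicitly avoids relying on the configured internal encoding.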

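Multibyte problems with UTF-8 can be quite tricky because they may occur only sporadically: most text in Western languages consists of ASCII characters, which UTF-8 encodes as single bytes, so byte-oriented code misbehaves only when a non-ASCII character happens to be involved.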
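
A small illustration of why the bug can stay hidden, again with made-up sample strings and assuming the mbstring extension: for pure ASCII text the byte count and the character count agree, and they diverge only once a non-ASCII character appears.

```php
<?php
$ascii    = "Zurich";        // ASCII only: one byte per character
$accented = "Z\xC3\xBCrich"; // "Zürich": "ü" takes two bytes in UTF-8

// strlen() counts bytes, mb_strlen() counts characters.
$asciiBytes    = strlen($ascii);                // 6
$asciiChars    = mb_strlen($ascii, 'UTF-8');    // 6
$accentedBytes = strlen($accented);             // 7
$accentedChars = mb_strlen($accented, 'UTF-8'); // 6
```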