UTF-8 at the PHP level:
You must use the mb_* functions (such as mb_strpos() and mb_strlen()) whenever you operate on a Unicode string. For example, if you use substr() on a UTF-8 string, there's a good chance the result will include some garbled half-characters, because substr() counts bytes rather than characters.
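The difference is easy to see on a short string. The following is a minimal sketch, assuming the mbstring extension is loaded and PHP 7 or later (for the "\u{...}" escapes, which are used here only so the example does not depend on how the file itself is saved). The sample text and variable names are illustrative.

```php
<?php
// "naïve café" written with escapes: ï is U+00EF and é is U+00E9,
// and each of them takes two bytes in UTF-8.
$text = "na\u{EF}ve caf\u{E9}";

// strlen() counts bytes, mb_strlen() counts characters.
$byteLength = strlen($text);               // 12
$charLength = mb_strlen($text, 'UTF-8');   // 10

// Cutting after three bytes splits the two-byte "ï" in half.
$byteCut = substr($text, 0, 3);
// Cutting after three characters keeps "ï" whole.
$charCut = mb_substr($text, 0, 3, 'UTF-8');

// mb_check_encoding() reports whether a string is valid UTF-8.
$byteCutIsValid = mb_check_encoding($byteCut, 'UTF-8');   // false
$charCutIsValid = mb_check_encoding($charCut, 'UTF-8');   // true

var_dump($byteLength, $charLength, $byteCutIsValid, $charCutIsValid);
```

The byte-based cut leaves a dangling lead byte at the end of the string, which is the "garbled half-character" a browser would typically render as a replacement symbol.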
Not all string functions have an mb_* counterpart, so check before relying on one. In any case, you should call mb_internal_encoding() at the top of every PHP script (or at the top of your global include script), and mb_http_output() right after it if the script outputs to a browser, so that the encoding of your strings is always stated explicitly.
Many PHP functions that operate on strings have an optional parameter letting you specify the character encoding. Always explicitly indicate UTF-8 when given the option. For example, htmlentities() has an option for character encoding, and you should always specify UTF-8 if dealing with such strings.
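As an illustration of passing the encoding explicitly, here is a small sketch using htmlspecialchars(), which takes the same encoding parameter as htmlentities(). The sample strings are made up, and the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
// "<b>Êl síla</b> & "govaned"" with the non-ASCII letters written as escapes.
$comment = "<b>\u{CA}l s\u{ED}la</b> & \"govaned\"";

// With the encoding stated, markup characters are escaped and the
// accented letters pass through untouched.
$safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

// A string cut in the middle of a character is not valid UTF-8.
$halfCharacter = "na\xC3";

// Without ENT_SUBSTITUTE or ENT_IGNORE, invalid input yields an empty string.
$rejected = htmlspecialchars($halfCharacter, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE, the bad byte becomes U+FFFD (the replacement character).
$substituted = htmlspecialchars($halfCharacter, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($safe, $rejected, $substituted);
```

Whether to reject or substitute invalid input is a design choice. Substitution keeps the rest of the text visible, while an empty result makes the problem harder to miss.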
UTF-8 at the database level:
Make sure your strings go from PHP to MySQL as UTF-8: set your database and tables to the utf8mb4 character set and collation, and use the utf8mb4 character set in the PDO connection string.
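Note: You must use the utf8mb4 character set for complete UTF-8 support, not the utf8 character set! MySQL's utf8 stores at most three bytes per character, so it cannot hold four-byte characters such as emoji.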
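To make the distinction concrete, the sketch below checks whether a string contains any character that needs four bytes in UTF-8, which is the kind of character MySQL's three-byte utf8 character set cannot store. The helper name needs_utf8mb4() is hypothetical and not part of PHP or MySQL; the example assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: true if $text contains a code point above U+FFFF,
// i.e. a character encoded as four bytes in UTF-8.
function needs_utf8mb4(string $text): bool
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

$accented = "caf\u{E9}";   // é is U+00E9: two bytes
$emoji    = "\u{1F600}";   // grinning face, U+1F600: four bytes

var_dump(strlen($emoji));            // int(4)
var_dump(needs_utf8mb4($accented));  // bool(false)
var_dump(needs_utf8mb4($emoji));     // bool(true)
```

A check like this is only a diagnostic aid. Using utf8mb4 throughout removes the need to inspect strings before storing them.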
UTF-8 at the browser level:
Use the mb_http_output() function to ensure that your PHP script outputs UTF-8 strings to your browser.
The browser will then need to be told by the HTTP response that this page should be considered as UTF-8. Today, it is common to set the character set in the HTTP response header like this:
<?php header('Content-Type: text/html; charset=UTF-8');
<?php
// Tell PHP that we're using UTF-8 strings until the end of the script
mb_internal_encoding('UTF-8');

// ini_set() returns the old value on success and false on failure
$utf_set = ini_set('default_charset', 'utf-8');
if ($utf_set === false) {
    throw new Exception('could not set default_charset to utf-8, please ensure it\'s set on your system!');
}

// Tell PHP that we'll be outputting UTF-8 to the browser
mb_http_output('UTF-8');

// Our UTF-8 test string
$string = 'Êl síla erin lû e-govaned vîn.';

// Transform the string in some way with a multibyte function
// Note how we cut the string at a non-Ascii character for demonstration purposes
$string = mb_substr($string, 0, 15);

// Connect to a database to store the transformed string
// See the PDO example in this document for more information
// Note the `charset=utf8mb4` in the Data Source Name (DSN)
$link = new PDO(
    'mysql:host=your-hostname;dbname=your-db;charset=utf8mb4',
    'your-username',
    'your-password',
    array(
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_PERSISTENT => false
    )
);

// Store our transformed string as UTF-8 in our database
// Your DB and tables are in the utf8mb4 character set and collation, right?
$handle = $link->prepare('insert into ElvishSentences (Id, Body, Priority) values (default, :body, :priority)');
$handle->bindParam(':body', $string, PDO::PARAM_STR);
$priority = 45;
$handle->bindParam(':priority', $priority, PDO::PARAM_INT); // explicitly tell pdo to expect an int
$handle->execute();

// Retrieve the string we just stored to prove it was stored correctly
// lastInsertId() gives the Id generated by the insert above
$id = (int) $link->lastInsertId();
$handle = $link->prepare('select * from ElvishSentences where Id = :id');
$handle->bindParam(':id', $id, PDO::PARAM_INT);
$handle->execute();

// Store the result into an array of objects that we'll output later in our HTML
// fetchAll() loads every matching row into memory, which is fine for the single row expected here
$result = $handle->fetchAll(PDO::FETCH_OBJ);

// An example wrapper that escapes data for HTML and echoes it
function escape_to_html($dirty){
    echo htmlspecialchars($dirty, ENT_QUOTES, 'UTF-8');
}

header('Content-Type: text/html; charset=UTF-8'); // Unnecessary if your default_charset is set to utf-8 already
?><!doctype html>
<html>
    <head>
        <meta charset="UTF-8">
        <title>UTF-8 test page</title>
    </head>
    <body>
        <?php
        foreach($result as $row){
            escape_to_html($row->Body); // This should correctly output our transformed UTF-8 string to the browser
        }
        ?>
    </body>
</html>