{"id":73,"date":"2009-02-14T14:03:31","date_gmt":"2009-02-14T06:03:31","guid":{"rendered":"http:\/\/www.computersolutions.cn\/blog\/?p=73"},"modified":"2009-03-26T01:37:06","modified_gmt":"2009-03-25T17:37:06","slug":"utf-8-mysql-woes-and-how-to-force-it-to-work","status":"publish","type":"post","link":"https:\/\/www.computersolutions.cn\/blog\/2009\/02\/utf-8-mysql-woes-and-how-to-force-it-to-work\/","title":{"rendered":"UTF-8, MySQL woes and how to force it to work!"},"content":{"rendered":"<p>As other web developers in China (and other non-latin language locations) probably know, UTF-8 is something that we need for Chinese to work correctly, however despite asking for UTF8 in the table structure, it doesn&#8217;t mean that MySQL or PHP is going to accede to our wishes.<\/p>\n<p><!--more-->Here is how we at Computer Solutions cope with forcing MySQL to UTF-8, despite its unwillingness!<\/p>\n<p>First off, we need to configure the MySQL config via the my.cnf file to force the defaults to utf-8, instead of latin1.<\/p>\n<p>Most systems my.cnf should be in either \/etc\/my.cnf or \/etc\/mysql\/my.cnf (assuming *nix)<\/p>\n<p>Make sure the following lines are added under mysqld:<\/p>\n<blockquote>\n<pre>[mysqld]\r\ndefault-character-set=utf8\r\ncharacter-set-server=utf8\r\ncollation-server=utf8_general_ci<\/pre>\n<\/blockquote>\n<p>Restart mysql (debian or similar init.d should be \/etc\/init.d\/mysql restart)<\/p>\n<p>Lets check that worked &#8211;<\/p>\n<pre>mysql -u root -p\r\nshow variables like \"%character%\";<\/pre>\n<p>If you see results like this:<\/p>\n<blockquote>\n<pre>show variables like \"%character%\";<\/pre>\n<pre>+--------------------------+----------------------------+<\/pre>\n<pre>| Variable_name\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | Value\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<pre>+--------------------------+----------------------------+<\/pre>\n<pre>| character_set_client\u00a0\u00a0\u00a0\u00a0 | utf8\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<pre>| character_set_connection | utf8\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<pre>| character_set_database\u00a0\u00a0 | utf8\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<pre>| character_set_filesystem | binary\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<pre>| character_set_results\u00a0\u00a0\u00a0 | utf8\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<pre>| character_set_server\u00a0\u00a0\u00a0\u00a0 | utf8\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<pre>| character_set_system\u00a0\u00a0\u00a0\u00a0 | utf8\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<pre>| character_sets_dir\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | \/usr\/share\/mysql\/charsets\/ |<\/pre>\n<pre>+--------------------------+----------------------------+<\/pre>\n<pre>8 rows in set (0.00 sec)<\/pre>\n<\/blockquote>\n<p><strong>Then it worked<\/strong>.\u00a0\u00a0 If it <strong>doesn&#8217;t<\/strong> and you see something similar to below<\/p>\n<blockquote>\n<pre>mysql&gt; show variables like \"%character%\";<\/pre>\n<pre>+--------------------------+----------------------------+<\/pre>\n<pre>| Variable_name\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | Value\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<pre>+--------------------------+----------------------------+<\/pre>\n<pre>| character_set_client\u00a0\u00a0\u00a0\u00a0 | latin1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<pre>| character_set_connection | latin1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<pre>| character_set_database\u00a0\u00a0 | utf8\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<pre>| character_set_filesystem | binary\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<pre>| character_set_results\u00a0\u00a0\u00a0 | latin1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<pre>| character_set_server\u00a0\u00a0\u00a0\u00a0 | utf8\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<pre>| character_set_system\u00a0\u00a0\u00a0\u00a0 | utf8\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |<\/pre>\n<pre>| character_sets_dir\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | \/usr\/share\/mysql\/charsets\/ |<\/pre>\n<pre>+--------------------------+----------------------------+<\/pre>\n<pre>8 rows in set (0.00 sec)<\/pre>\n<\/blockquote>\n<p>Then we need to re-edit \/etc\/my.cnf<\/p>\n<p>Add this under the [mysqld] lines we edited earlier.<\/p>\n<blockquote>\n<pre>skip-character-set-client-handshake<\/pre>\n<\/blockquote>\n<p>then restart mysql (\/etc\/init.d\/mysql restart )\u00a0\u00a0 and recheck.<\/p>\n<p>That forces MySQL to *use* what we &lt;expletive deleted&gt; well told it too!<\/p>\n<p>That unfortunately isn&#8217;t the end of the story.\u00a0 If like us, you&#8217;ve been merrily using UTF-8 data inside Latin1 encoded tables accidentally, you&#8217;ll still need to export your data in the correct format, and reimport it.<\/p>\n<p>This can be done as follows:<\/p>\n<p>Use MySQLDump with an explicit encoding passed to it:<\/p>\n<blockquote>\n<pre><span class=\"ident\">mysqldump<\/span> <span class=\"punct\">--<\/span><span class=\"ident\">user<\/span><span class=\"punct\">=<\/span><span class=\"ident\">username<\/span> <span class=\"punct\">--<\/span><span class=\"ident\">password<\/span><span class=\"punct\">=<\/span><span class=\"ident\">password<\/span> <span class=\"punct\">--<\/span><span class=\"ident\">default<\/span><span class=\"punct\">-<\/span><span class=\"ident\">character<\/span><span class=\"punct\">-<\/span><span class=\"ident\">set<\/span><span class=\"punct\">=<\/span><span class=\"ident\">latin1 \\\r\n<\/span>\u00a0<span class=\"punct\">--<\/span><span class=\"ident\">skip<\/span><span class=\"punct\">-<\/span><span class=\"ident\">set<\/span><span class=\"punct\">-<\/span><span class=\"ident\">charset<\/span> <span class=\"ident\">myscrewedupdb<\/span> <span class=\"punct\">&gt;<\/span> <span class=\"ident\">dump<\/span><span class=\"punct\">.<\/span><span class=\"ident\">sql<\/span><\/pre>\n<\/blockquote>\n<p>This will give us a dump of the data in the correct format, albeit with the wrong encoding specified in the file.\u00a0 You&#8217;ll need to edit the file to change latin1 to utf8\u00a0\u00a0 eg<\/p>\n<blockquote>\n<pre><span class=\"ident\">sed<\/span> <span class=\"punct\">-<\/span><span class=\"ident\">i<\/span> <span class=\"punct\">\"<\/span><span class=\"punct\">\"<\/span> <span class=\"punct\">'<\/span><span class=\"string\">s\/latin1\/utf8\/g<\/span><span class=\"punct\">'<\/span> <span class=\"ident\">dump<\/span><span class=\"punct\">.<\/span><span class=\"ident\">sql<\/span><\/pre>\n<\/blockquote>\n<p>..and then re-import the data (I *strongly* suggest you do this to a NEW db for testing, as you <strong>DON&#8217;T<\/strong> want to lose data now, do you!)<\/p>\n<blockquote>\n<pre><span class=\"ident\">mysql<\/span> <span class=\"punct\">--<\/span><span class=\"ident\">user<\/span><span class=\"punct\">=<\/span><span class=\"ident\">username<\/span> <span class=\"punct\">--<\/span><span class=\"ident\">password<\/span><span class=\"punct\">=<\/span><span class=\"ident\">password<\/span> <span class=\"punct\">--<\/span><span class=\"ident\">default<\/span><span class=\"punct\">-<\/span><span class=\"ident\">character<\/span><span class=\"punct\">-<\/span><span class=\"ident\">set<\/span><span class=\"punct\">=<\/span><span class=\"ident\">utf8<\/span> \\\r\n<span class=\"ident\">newtestdbname<\/span> <span class=\"punct\">&lt;<\/span> <span class=\"ident\">dump<\/span><span class=\"punct\">.<\/span><span class=\"ident\">sql<\/span><\/pre>\n<\/blockquote>\n<p>With a bit of luck, your data should be intact, and you should now be basking in the glory of UTF-8&#8217;ness, ready for the 20th Century etc!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As other web developers in China (and other non-latin language locations) probably know, UTF-8 is something that we need for Chinese to work correctly, however despite asking for UTF8 in the table structure, it doesn&#8217;t mean that MySQL or PHP is going to accede to our wishes.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[24,4],"tags":[],"class_list":["post-73","post","type-post","status-publish","format-standard","hentry","category-mysql","category-useful-info"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/posts\/73","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/comments?post=73"}],"version-history":[{"count":3,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/posts\/73\/revisions"}],"predecessor-version":[{"id":108,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/posts\/73\/revisions\/108"}],"wp:attachment":[{"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/media?parent=73"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/categories?post=73"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.computersolutions.cn\/blog\/wp-json\/wp\/v2\/tags?post=73"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}