Daily Archives: 2012-12-23

Emoji and MySQL

One of the great things about being a software developer is that I get to learn new and relatively useless information all the time.  Today, I learned about emoji, or Japanese picture characters.  As far as I can tell, these characters let users of many Japanese cell phones text cute pictures like balloons and bowing businessmen to one another with ease and at minimal expense.

One neat fact about emoji is that many of them fall outside the Basic Multilingual Plane (BMP), which means that they cannot be represented using three or fewer bytes in UTF-8 (a popular character encoding in modern software); each one takes four bytes. Unfortunately, for historical reasons, MySQL’s utf8 character set stores at most three bytes per character, so it can only represent characters within the BMP.  That means that if you want Japanese cell phone users to be able to store balloon picture characters in your MySQL database, you can’t store that text as utf8 character data.  (Luckily, the folks on the MySQL team recognized this issue and, in version 5.5.3, added a new character set [utf8mb4] capable of storing emoji and other characters outside the BMP.)
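To make the byte counting concrete, here is a minimal Python sketch. It assumes the balloon is the Unicode character U+1F388, and the helper names utf8_length and fits_in_mysql_utf8 are made up for illustration; they are not part of MySQL or any library. The check simply treats any code point above U+FFFF as too big for MySQL’s three-byte utf8.

```python
# Illustrative sketch only; the helper names below are hypothetical.

def utf8_length(ch: str) -> int:
    """Return how many bytes a single character needs in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_mysql_utf8(text: str) -> bool:
    """Return True if every character lies inside the BMP (U+0000..U+FFFF).

    Characters inside the BMP need at most three bytes in UTF-8, which is
    the limit of MySQL's legacy utf8 character set.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


# Assumption: the balloon emoji is code point U+1F388.
balloon = "\U0001F388"

print(hex(ord(balloon)), utf8_length(balloon))    # 0x1f388 4
print(fits_in_mysql_utf8("hello"))                # True
print(fits_in_mysql_utf8("party " + balloon))     # False
```

A check like this could run before an insert to decide whether a column really needs utf8mb4, though in practice switching the column to utf8mb4 is probably the simpler fix.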

If you’re on a Japanese cell phone, here’s a balloon: