Many legacy systems in Japan were built around the Shift-JIS character encoding. The problem is that most web pages today use UTF-8, and not every character that UTF-8 can represent has a Shift-JIS equivalent. This leads to character garbling: a user enters a common surname such as “髙橋” in a form, and the database stores “?橋”, with a question mark in place of the first character. This happens because “髙”, a variant of “高”, cannot be converted from UTF-8 to Shift-JIS. If your back-end system uses a Shift-JIS database, or you are creating CSV files that will be opened in Excel, you will likely run into this problem.
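To reproduce the “?橋” result without a database, here is a minimal Java sketch. It assumes the JDK's built-in `Shift_JIS` charset (included in standard JDK builds), which covers JIS X 0201 and JIS X 0208 only; the `GarblingDemo` class and its methods are hypothetical names. `unmappable` lists the characters that have no Shift_JIS mapping, and `roundTrip` shows the lossy conversion, because `String.getBytes` silently substitutes the charset's default replacement bytes instead of failing.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

public class GarblingDemo {
    static final Charset SJIS = Charset.forName("Shift_JIS");

    // Returns every character in s that has no Shift_JIS mapping.
    static List<String> unmappable(String s) {
        CharsetEncoder encoder = SJIS.newEncoder();
        List<String> bad = new ArrayList<>();
        // Walk by code point so characters outside the BMP stay whole.
        s.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (!encoder.canEncode(ch)) {
                bad.add(ch);
            }
        });
        return bad;
    }

    // Encodes to Shift_JIS and decodes back. getBytes replaces unmappable
    // characters with the charset's default replacement, so data is lost.
    static String roundTrip(String s) {
        return new String(s.getBytes(SJIS), SJIS);
    }

    public static void main(String[] args) {
        String name = "髙橋";
        System.out.println("Unmappable: " + unmappable(name));
        System.out.println("Round trip: " + roundTrip(name));
    }
}
```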
To demonstrate the problem, we can use a simple example that creates a CSV file in a web browser. If you download the CSV file generated by the JSFiddle below and open it in Excel, you should see garbled characters.
The issue is that Excel opens CSV files as Shift-JIS by default. If you open the CSV as UTF-8 it displays properly, but doing so takes a convoluted series of steps; please check out this link for details. More likely, your users will see a garbled CSV file like the following.
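If your application generates the CSV itself, it can encode the file in the charset Excel expects instead of relying on users to import it as UTF-8. The Java sketch below assumes a strict `Shift_JIS` target and uses a hypothetical `CsvExport.toCsvBytes` helper. The encoder is configured to report unmappable characters, so a value like “髙橋” fails with an exception rather than quietly becoming “?橋”. Quoting is deliberately simple: every field is wrapped in double quotes.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.List;
import java.util.stream.Collectors;

public class CsvExport {
    // Quotes one field, doubling any embedded double quotes (RFC 4180 style).
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Builds CSV text with CRLF line endings and encodes it in the given charset.
    // Throws instead of substituting when a character cannot be encoded.
    static byte[] toCsvBytes(List<List<String>> rows, Charset charset)
            throws CharacterCodingException {
        String csv = rows.stream()
                .map(row -> row.stream().map(CsvExport::quote).collect(Collectors.joining(",")))
                .collect(Collectors.joining("\r\n", "", "\r\n"));
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(csv));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        Charset sjis = Charset.forName("Shift_JIS");
        byte[] ok = toCsvBytes(List.of(List.of("高橋", "東京")), sjis);
        System.out.println(ok.length + " bytes");
        try {
            toCsvBytes(List.of(List.of("髙橋", "東京")), sjis);
        } catch (CharacterCodingException e) {
            System.out.println("Cannot export: " + e);
        }
    }
}
```

Failing loudly lets the application tell the user which value needs fixing before the file ever reaches Excel.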
If you do manage to open the file in Excel as UTF-8 and then save it as Shift-JIS, you run into a second garbling issue: the character “髙” cannot be converted to Shift-JIS. As you can see below, when we open the converted Shift-JIS file, “髙” has been replaced with “_”.
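The character that replaces “髙” depends on the tool doing the conversion: the database in the first example wrote “?”, while Excel wrote “_”. As a sketch of that behavior, Java's `CharsetEncoder` can be told to substitute a specific byte sequence for unmappable characters. The `_` replacement and the `ReplaceDemo` class are assumptions chosen to mimic the Excel output, not a recommended policy.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class ReplaceDemo {
    static final Charset SJIS = Charset.forName("Shift_JIS");

    // Converts text to Shift_JIS, writing '_' (0x5F) for every unmappable
    // character. Malformed input (e.g. a lone surrogate) still throws.
    static byte[] toSjisWithUnderscore(String s) throws CharacterCodingException {
        CharsetEncoder encoder = SJIS.newEncoder()
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] {'_'});
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] bytes = toSjisWithUnderscore("髙橋");
        // Decoding the bytes again gives the text "_橋".
        System.out.println(new String(bytes, SJIS));
    }
}
```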
For those who are interested, Wikipedia covers the history of Japanese character set issues.
Validation on Entry of Japanese Characters
The simplest way I’ve found to restrict input to valid Shift-JIS characters is to check each character in the input field and raise an error whenever an invalid character is entered.
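A minimal Java sketch of the check-each-character approach, assuming the JDK's `Shift_JIS` charset as the definition of “valid”. Because that charset covers only JIS X 0201 and JIS X 0208, characters such as “髙” or the supplementary-plane “𠮷” are rejected. `SjisValidator` and `validate` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class SjisValidator {
    private static final Charset SJIS = Charset.forName("Shift_JIS");

    // Throws IllegalArgumentException naming the first character that
    // cannot be represented in Shift_JIS; returns normally otherwise.
    static void validate(String input) {
        // CharsetEncoder is not thread-safe, so create one per call.
        CharsetEncoder encoder = SJIS.newEncoder();
        int i = 0;
        while (i < input.length()) {
            // Read a full code point so surrogate pairs are checked as one character.
            int cp = input.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            if (!encoder.canEncode(ch)) {
                throw new IllegalArgumentException(
                        "Character '" + ch + "' at index " + i + " is not allowed");
            }
            i += Character.charCount(cp);
        }
    }

    public static void main(String[] args) {
        validate("高橋"); // passes silently
        try {
            validate("髙橋");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In a web form, the same check would typically run on the server as well as in the browser, since client-side validation alone can be bypassed.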
You should first check which characters your legacy system will accept. If your database was built recently, it is probably already set to UTF-8 and you won’t run into character garbling issues. However, if you find that your database accepts only Shift-JIS or CP932, you will need to determine which Shift-JIS level 1-4 characters are acceptable. If you are working with older mainframe systems, input needs to be restricted to Shift-JIS level 1-2 characters.
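To decide which levels to accept, it helps to know which level a given kanji belongs to. In Shift-JIS byte terms, JIS X 0208 level 1 kanji occupy 0x889F to 0x9872 and level 2 kanji occupy 0x989F to 0xEAA4. The sketch below (a hypothetical `JisLevel` helper) encodes one character with the JDK's `Shift_JIS` charset and checks which range its two bytes fall in. Level 3-4 kanji come from JIS X 0213, are not in JIS X 0208, and therefore come back as `UNSUPPORTED` here.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

public class JisLevel {
    enum Kind { SINGLE_BYTE, NON_KANJI, LEVEL_1, LEVEL_2, UNSUPPORTED }

    private static final Charset SJIS = Charset.forName("Shift_JIS");

    // Classifies one code point by where its Shift_JIS encoding falls.
    static Kind classify(int codePoint) {
        ByteBuffer buf;
        try {
            // newEncoder() reports unmappable characters instead of replacing them.
            buf = SJIS.newEncoder().encode(CharBuffer.wrap(Character.toChars(codePoint)));
        } catch (CharacterCodingException e) {
            return Kind.UNSUPPORTED;
        }
        if (buf.remaining() == 1) {
            return Kind.SINGLE_BYTE; // JIS X 0201: ASCII range and half-width katakana
        }
        int code = ((buf.get(0) & 0xFF) << 8) | (buf.get(1) & 0xFF);
        if (code >= 0x889F && code <= 0x9872) return Kind.LEVEL_1;
        if (code >= 0x989F && code <= 0xEAA4) return Kind.LEVEL_2;
        return Kind.NON_KANJI; // symbols, kana, full-width Latin, Greek, Cyrillic, box drawing
    }

    public static void main(String[] args) {
        for (String s : new String[] {"A", "あ", "高", "弌", "髙"}) {
            System.out.println(s + " -> " + classify(s.codePointAt(0)));
        }
    }
}
```

A validator for a level 1-2 system could reject anything classified as `UNSUPPORTED`, and a stricter one could also reject `LEVEL_2`.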
I’ve posted a sample on JSFiddle. You can also convert this code to Java or any other language you require. The sample below blocks entry of all characters except Shift-JIS level 1-2 characters, so it should not run into any character conversion issues.