The World’s Leading Microsoft .NET Magazine
   
 
The .NET Addict's Blog

Using .NET to Parse Non-Arabic Numerals is Problematic

posted Thu 27 Dec 07

Globalization is a really, really big task for any application. You can't simply translate all of your text into a foreign language and call your application perfect. There are a lot of cultural things that you need to take into account. Some of those things are design details such as color changes. For example, a "good luck" color in one culture might be considered a "bad luck" color in another and you don't want to slather your application with negative connotations before people even start using the software.

The other problem is that not everybody uses the Arabic (0..9) digits to represent numbers. Take, for example, Japanese. While it is quite common to see standard numbers, I've also been told that it is not uncommon to see the Kanji for those numbers as well. Here's a quick example:

521 - This is a pretty standard number, and .NET will parse it as an integer when it is run through TryParse(). In fact, it'll parse as a number in any culture (I tried 10 different ones and it worked in each).

五百二十一 - This is the Japanese Kanji representation of the number 521 (written out it would be ごひゃくにじゅういち). Given that .NET has such amazing globalization support already, I fully expected TryParse(), when supplied with a Japanese culture, to return the integer value 521. I was wrong.

Here's the sample code I wrote:

using System;
using System.Globalization;
using System.Threading;

// Run everything under the Japanese (Japan) culture.
CultureInfo japan = new CultureInfo("ja-JP");
Thread.CurrentThread.CurrentUICulture = japan;
Thread.CurrentThread.CurrentCulture = japan;

// Kanji numerals for 521.
string input = "五百二十一";
int x;
Console.WriteLine(input + " is an integer : " +
    int.TryParse(input, NumberStyles.Any, japan.NumberFormat, out x));

// "Baseline" test with Arabic digits.
string input2 = "521";
int y;
Console.WriteLine(input2 + " is an integer : " +
    int.TryParse(input2, NumberStyles.Any, japan.NumberFormat, out y));

I expected to see two outputs of True, but in fact I got an output of False for the Kanji and True for what we folks in the US like to consider "regular" numbers.

Is this a "bug" in the .NET globalization code or an oversight, or did they intentionally choose not to support Kanji and Chinese numeric representations? I would think it would be a terrible oversight if this were the case. If you have any experience with globalization and applications that deal with a lot of numbers supplied by foreign IMEs and you've got a workaround for this, I'd love to see it.

One theory I have for why it doesn't work is that some of the Kanji aren't actual "digits"; they're more like factors. For example, to write the Kanji for 521, you write 5, then 100, then 2, then 10, then 1. So, your symbols will look like:

5 100 2 10 1

Which is "5 100s, 2 10s, and a 1", or 521. My guess is that this was too much logic (sarcasm) for Microsoft to throw into the number format parser, meaning I'll probably have to do it myself if I want to accept Chinese number symbols as numeric input.
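For anyone who does end up doing it by hand, here is a minimal sketch of the "digits times factors" idea described above. KanjiNumber and its TryParse method are hypothetical names made up for this example; nothing like them ships with .NET. The sketch assumes input made only of the common numeral Kanji (〇 through 九, plus 十, 百, 千, 万 and 億), rejects purely positional forms such as 五二一, and does not check that the factors appear in descending order, so treat it as a starting point rather than a hardened parser.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the .NET Framework.
public static class KanjiNumber
{
    // Plain digits 0..9.
    private static readonly Dictionary<char, int> Digits = new Dictionary<char, int>
    {
        { '〇', 0 }, { '零', 0 }, { '一', 1 }, { '二', 2 }, { '三', 3 }, { '四', 4 },
        { '五', 5 }, { '六', 6 }, { '七', 7 }, { '八', 8 }, { '九', 9 }
    };

    // Factors that multiply the single digit written before them.
    private static readonly Dictionary<char, long> SmallUnits = new Dictionary<char, long>
    {
        { '十', 10 }, { '百', 100 }, { '千', 1000 }
    };

    // Factors that multiply the whole group written before them.
    private static readonly Dictionary<char, long> LargeUnits = new Dictionary<char, long>
    {
        { '万', 10000 }, { '億', 100000000 }
    };

    public static bool TryParse(string input, out long result)
    {
        result = 0;
        if (string.IsNullOrEmpty(input)) return false;

        long total = 0;    // groups already multiplied by 万 or 億
        long section = 0;  // running value of the current group
        int digit = -1;    // digit waiting for its factor; -1 means none

        foreach (char c in input)
        {
            if (Digits.TryGetValue(c, out int d))
            {
                // Two digits in a row would be positional notation (unsupported here).
                if (digit >= 0) return false;
                digit = d;
            }
            else if (SmallUnits.TryGetValue(c, out long small))
            {
                // A bare factor such as 十 counts as one of that factor.
                section += (digit < 0 ? 1 : digit) * small;
                digit = -1;
            }
            else if (LargeUnits.TryGetValue(c, out long large))
            {
                if (digit >= 0) section += digit;
                total += (section == 0 ? 1 : section) * large;
                section = 0;
                digit = -1;
            }
            else
            {
                return false; // not a numeral Kanji this sketch knows about
            }
        }

        if (digit >= 0) section += digit; // trailing ones digit
        result = total + section;
        return true;
    }
}
```

With this helper, KanjiNumber.TryParse("五百二十一", out long n) returns true and sets n to 521: 5 × 100, plus 2 × 10, plus 1. Arabic-digit input such as "521" is rejected, so a caller would try int.TryParse() first and fall back to this only when that fails.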


1. Kevin Hoffman left...
Thu 27 Dec 07 1:00 pm

After talking to some people, it turns out that there are very few times when anyone actually types Kanji for numbers so, practically speaking, everyone pretty much thinks it's acceptable if a piece of software only recognizes Arabic numerals and can't parse Kanji numbers. Still, it would've been cool if .NET had support for it automatically ;)


2. Tim Bedford left...
Tue 08 Jan 08 9:14 am

I just wanted to point out that this is true for dates too. Kanji has glyphs for day, month and year and so you would write a date: 2008 year 1 month 4 day. Although that may be a bad example since I've used a date in the Gregorian calendar. I think it is quite a wonderful system.

If you are writing some code to parse Kanji numbers it is not much extra work to support dates too.

