Jan 4, 2007

SMS Character Set & Sending Special Characters

As they say, necessity is the mother of invention. Today I was forced to understand the GSM SMS standards. I was trying to send a range of special characters in my message, and after hours of grueling debugging I realized that only a handful of special characters are allowed.

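So, I dug up the standard for the SMS character set: GSM 03.38. It defines a 7-bit default alphabet of its own. Most of its characters also appear in ISO 8859-1 (Latin-1), which is extremely similar to Microsoft's Windows-1252 character set, but the two are not identical: the GSM alphabet includes a few Greek capitals that ISO 8859-1 lacks, and it leaves out many accented letters that ISO 8859-1 has.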
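To find out up front which characters in a message will cause trouble, the text can be compared against the GSM 03.38 default alphabet. Here is a minimal sketch in Python (not the ASP.NET code from my project). The function name non_gsm_chars is made up for illustration, and the table is transcribed from the standard's default alphabet plus its extension table, so treat it as an approximation to check against the spec.

```python
# GSM 03.38 default alphabet, positions 0x00-0x7F, 16 characters per row.
# Position 0x1B (\x1b) is the escape to the extension table, not a printable character.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)

# Extension table: each of these costs two 7-bit positions (escape + code).
GSM_EXTENDED = "^{}\\[~]|€\f"

GSM_ALLOWED = set(GSM_BASIC) | set(GSM_EXTENDED)


def non_gsm_chars(text):
    """Return the characters of text that are not in the GSM default alphabet."""
    return [ch for ch in text if ch not in GSM_ALLOWED]


if __name__ == "__main__":
    print(non_gsm_chars("Hello, world!"))  # []
    print(non_gsm_chars("naïve"))          # ['ï']
```

An empty result means every character in the message has a slot in the default alphabet. Anything returned is a character that cannot be sent as plain 7-bit text.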

Sending a message through Kannel

I was using Kannel, the open source SMS gateway, to send out the messages. Kannel accepts the text of messages posted over HTTP only in the Windows-1252 encoding.

So if you're using ASP.NET, you must URL encode your text using the Windows-1252 encoding before making the HTTP request to Kannel. Otherwise, the message received on the device on the other end will look like gibberish.
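The same idea, sketched in Python rather than ASP.NET: percent-encode the message text from its Windows-1252 bytes (cp1252 in Python) instead of the default UTF-8. The host, port, credentials and the helper name build_sendsms_url are placeholders I made up for the example. The username, password, to and text parameters are the ones Kannel's sendsms interface takes, but check them against your own Kannel configuration.

```python
from urllib.parse import urlencode


def build_sendsms_url(base_url, username, password, to, text):
    """Build a sendsms URL with every value percent-encoded as Windows-1252."""
    params = {
        "username": username,
        "password": password,
        "to": to,
        "text": text,
    }
    # errors="strict" raises UnicodeEncodeError for any character that
    # Windows-1252 cannot represent, instead of silently mangling it.
    query = urlencode(params, encoding="cp1252", errors="strict")
    return base_url + "?" + query


if __name__ == "__main__":
    # Placeholder endpoint and credentials, for illustration only.
    url = build_sendsms_url(
        "http://localhost:13013/cgi-bin/sendsms",
        "user", "pass", "+15551234567", "£5 café",
    )
    print(url)  # the text parameter comes out as text=%A35+caf%E9
```

With UTF-8 the pound sign would have been sent as %C2%A3 instead of %A3, which is exactly the kind of mismatch that shows up as gibberish on the handset.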

Receiving a message through Kannel (Kannel Post)

When Kannel receives a message, it first checks whether the character encoding matches ISO 8859-1. If decoding the message with that 8-bit character set fails, it tries 16-bit Unicode Big Endian (UTF-16BE).

If it is configured to post the message to a designated URL, it will first URL-encode the received text using the character set it determined, and then supply that character set in the URL as a query string parameter.
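On the receiving side, that means the handler has to read the character set parameter first and only then decode the text. A rough Python sketch of that step follows. The parameter names text and charset are assumptions (they depend on how the post URL is set up in your Kannel configuration), and decode_kannel_post is a name invented for this example.

```python
from urllib.parse import unquote_plus, unquote_to_bytes


def decode_kannel_post(query_string, text_param="text", charset_param="charset"):
    """Decode the posted text using the charset named in the same query string."""
    raw = {}
    for pair in query_string.split("&"):
        name, _, value = pair.partition("=")
        raw[name] = value

    # Fall back to ISO-8859-1 when no charset parameter is present (an assumption).
    charset = unquote_plus(raw.get(charset_param, "")) or "ISO-8859-1"

    # Keep the text as raw bytes until the charset is known;
    # '+' stands for a space in URL-encoded query strings.
    text_bytes = unquote_to_bytes(raw.get(text_param, "").replace("+", " "))
    return text_bytes.decode(charset), charset


if __name__ == "__main__":
    print(decode_kannel_post("text=caf%E9&charset=ISO-8859-1"))  # ('café', 'ISO-8859-1')
    print(decode_kannel_post("text=%03%94&charset=UTF-16BE"))    # ('Δ', 'UTF-16BE')
```

The point of the design is to never let the web framework decode the text with its own default encoding, since it has no way of knowing which of the two character sets was used.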

If you want to receive your messages in ISO 8859-1, it is important to stick to the characters defined in that set. Otherwise, Kannel will call your post URL with the text encoded as UTF-16BE.
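For testing, the rule can be mimicked with a few lines of Python. This is only an illustration of the behaviour described here, not Kannel's actual detection code, and expected_charset is a hypothetical helper name.

```python
def expected_charset(text):
    """Guess which charset a message would be posted in, per the rule described above."""
    try:
        text.encode("iso-8859-1")
        return "ISO-8859-1"
    except UnicodeEncodeError:
        # At least one character has no ISO 8859-1 code point.
        return "UTF-16BE"


if __name__ == "__main__":
    print(expected_charset("café"))  # ISO-8859-1
    print(expected_charset("ΔΦΓ"))   # UTF-16BE
```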
