Click here to view and discuss this page in DocCommentXchange. In the future, you will be sent there automatically.

SQL Anywhere 11.0.0 » SQL Anywhere Server - Database Administration » Configuring Your Database » International languages and character sets » Understanding character sets

 

Multibyte character sets

Some languages, such as Japanese and Chinese, have many more than 256 characters. These characters cannot all be represented using a single byte, and therefore must be encoded using a multibyte encoding. In addition, some character sets use the much larger number of characters available in a multibyte representation to represent characters from many languages in a single, more comprehensive, character set. An example of this is UTF-8.

Multibyte character sets may be of variable width whereby some characters are single-byte characters; others are double-byte, and so on.

For more information about multibyte character sets and collations, see SQL Anywhere Collation Algorithm (SACA).

Example

As an example, characters in code page 932 (Japanese) are either one or two bytes in length. If the value of the first byte, also called the lead byte, is in the range of hexadecimal values from \x81 to \x9F or from \xE0 to \xFC (decimal values 129-159 or 224-252), the character is a two-byte character and the subsequent byte, also called a follow byte, completes the character. A follow byte is any byte(s) other than the first byte.

If the first byte is outside the lead byte range, the character is a single-byte character and the next byte is the first byte of the following character.