When a character cannot be represented in the character set into which it is being converted, a substitution character is used instead. Conversions of this type are considered lossy; that is, the original character is lost if it cannot be represented in the destination character set.
Also, different character sets may have different substitution characters, and the substitution character for one character set may be an ordinary (non-substitution) character in another. This is important to understand when a character undergoes multiple conversions, because the final character may not be the expected substitution character of the destination character set.
For example, suppose that the client character set is Windows-1252, and the database character set is ISO_8859-1:1987, the U.S. default for some versions of Unix. Then, suppose a non-Unicode client application (for example, embedded SQL) attempts to insert the euro symbol into a CHAR, VARCHAR, or LONG VARCHAR column. Since the character does not exist in the CHAR character set, the substitution character for ISO_8859-1:1987, 0x1A, is inserted.
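A minimal Python sketch of this lossy conversion, using `cp1252` for Windows-1252 and `latin-1` as a stand-in for ISO_8859-1:1987. Note that Python's `"replace"` error handler substitutes `?` (0x3F) when encoding, whereas the database described here stores 0x1A; the point is the loss itself, not the specific byte:

```python
# Sketch of the lossy conversion described above, using Python's codecs.
# Assumption: "latin-1" stands in for ISO_8859-1:1987, "cp1252" for
# Windows-1252. Python's "replace" handler emits "?" (0x3F), while the
# database in the text stores 0x1A as its substitution character.

euro = "\u20ac"  # the euro symbol

# The client character set (Windows-1252) can represent the euro: 0x80.
client_bytes = euro.encode("cp1252")
print(client_bytes)   # b'\x80'

# ISO_8859-1:1987 has no euro, so the conversion is lossy: a substitution
# character is emitted and the original character is gone.
db_bytes = euro.encode("latin-1", errors="replace")
print(db_bytes)       # b'?'  (the database in the text would store 0x1A)
```

Decoding `db_bytes` back afterward cannot recover the euro symbol, which is what makes the conversion lossy.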
Now, suppose this same ISO_8859-1:1987 substitution character is fetched into a UTF-16 value (for example, by executing SELECT * FROM t into a SQL_C_WCHAR bound column in ODBC). The character becomes the UTF-16 character 0x001A, which is not the substitution character defined for UTF-16. This example illustrates that, because of multiple conversions, the substitution characters in your data may be inconsistent with those defined for the final target character set.
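The widening step can be sketched in Python as well. Converting the stored single-byte substitution character to a wide value simply zero-extends 0x1A to 0x001A; it does not become U+FFFD, the replacement character that Unicode defines for unrepresentable characters:

```python
# Sketch: fetching the stored substitution byte into a wide (UTF-16-style)
# value zero-extends 0x1A to 0x001A. It does NOT become U+FFFD, the
# replacement character Unicode defines for unrepresentable characters.

stored = b"\x1a"                  # substitution byte stored in the database
wide = stored.decode("latin-1")   # byte-for-code-point widening
print(hex(ord(wide)))             # 0x1a  -> appears in UTF-16 as 0x001A

replacement = "\ufffd"            # Unicode replacement character
print(wide == replacement)        # False: the fetched character is not U+FFFD
```

This is why scanning fetched UTF-16 data for U+FFFD would not detect substitutions that happened in an earlier single-byte conversion.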
Therefore, it is important to understand and test how substitution characters may be used when converting between multiple character sets.
The on_charset_conversion_failure option determines the behavior during conversion when a character cannot be represented in the destination character set. See on_charset_conversion_failure option [database].