The cCharTranslate class allows you to create an object or subclass that can handle the conversion of data from various character encodings (OEM, ANSI, UTF-8, UTF-16 and Base64 encoded) to various destination types (String, Variant String, memory Buffer).
This class has no storage and no properties; it is primarily a helper class. You can use it as a single global object to handle an application's character conversion needs, embed it in other objects and classes, or subclass it to provide data storage.
For example, a single global instance can be declared once and then used wherever a conversion is needed:

```
Object oCharTranslate is a cCharTranslate
End_Object

// ...

Function ReadIntoArray Returns String[]
    String[] DataArray
    String sVarUTF8
    Integer iLines

    Move 0 to iLines
    // assume this is reading from a UTF-8 file
    ReadLn Channel 1 sVarUTF8
    While (not(SeqEof))
        // translate the UTF-8 string to an OEM string and add it to the array
        Get UTF8ToStr of oCharTranslate (AddressOf(sVarUTF8)) CP_OEMCP to DataArray[iLines]
        Increment iLines
        ReadLn Channel 1 sVarUTF8
    Loop
    Function_Return DataArray
End_Function
```
This class supports conversions between the following encoding types:
UTF-8 - UTF-8 is a Unicode character format in which characters are stored in a multi-byte format (a single character is represented by 1 to 4 bytes). UTF-8 is widely used for transmitting data and is the encoding of choice for HTML and XML. It has the advantage that the basic character set (what used to be called the ASCII character set: 0-9, A-Z, a-z, etc.) is encoded identically in UTF-8, OEM and ANSI. Beyond that range, the UTF-8 encoding is completely different. Note that the length of a UTF-8 string in bytes does not tell you how many characters it represents. In addition, single-byte string functions (Length, Left, Mid, Character, etc.) cannot be used sensibly with UTF-8. UTF-8, like any Unicode encoding format, does not need code page information: every character that can exist has a unique encoding. UTF-8 strings do not contain embedded nulls. Internally, DataFlex works with UTF-8 strings.
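The following Python sketch illustrates these UTF-8 properties with Python's built-in codecs. It demonstrates the encoding itself, not the cCharTranslate API. The code pages cp437 (a common US OEM page) and cp1252 (a common Western ANSI page) are assumptions chosen only for the comparison.

```python
# Illustration of UTF-8 properties with Python's standard codecs
# (this is not the cCharTranslate API).

def utf8_byte_lengths(text):
    """Return the number of UTF-8 bytes used by each character of text."""
    return [len(ch.encode("utf-8")) for ch in text]

word = "naïve €"
data = word.encode("utf-8")

print(len(word), len(data))        # 7 characters stored in 10 bytes
print(utf8_byte_lengths(word))     # [1, 1, 2, 1, 1, 1, 3]

# The basic character set encodes identically in UTF-8, OEM (cp437) and ANSI (cp1252).
ascii_text = "Order 42"
print(ascii_text.encode("utf-8") == ascii_text.encode("cp437") == ascii_text.encode("cp1252"))  # True

# A byte-oriented "Left" can cut a multi-byte character in half.
print(data[:3].decode("utf-8", errors="replace"))  # 'na' followed by U+FFFD

# No zero bytes appear in the encoded text.
print(b"\x00" in data)             # False
```

The byte count (10) and the character count (7) differ, which is why byte-based length and substring functions give misleading results on UTF-8 data.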
OEM - This is a single-byte character encoding. The character that each byte value represents is determined by your system's OEM code page. Prior to DataFlex 20.0, DataFlex worked internally with OEM characters for the most part.
ANSI - This is a single-byte character encoding. The character that each byte value represents is determined by your system's ANSI code page. OEM characters sometimes must be converted to ANSI to work with the Windows "A" interface. Internally, Windows controls use a wide (Unicode) interface, so Windows eventually converts ANSI characters to Unicode before processing them.
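To make the role of the code page concrete, this Python sketch decodes the same byte under two code pages. It assumes cp437 as the OEM code page and cp1252 as the ANSI code page, which is typical for US-English Windows systems. Other locales use different pages.

```python
# The same byte value maps to different characters under different code pages.
# Assumption: cp437 stands in for the OEM code page, cp1252 for the ANSI code page.
OEM_CODEPAGE = "cp437"
ANSI_CODEPAGE = "cp1252"

raw = b"\x82"
oem_char = raw.decode(OEM_CODEPAGE)     # 'é'
ansi_char = raw.decode(ANSI_CODEPAGE)   # U+201A, single low-9 quotation mark
print(oem_char, ansi_char)

# Conversely, the same character has a different byte value in each code page,
# which is why OEM text must be converted before it is passed to the "A" interface.
print("é".encode(OEM_CODEPAGE), "é".encode(ANSI_CODEPAGE))  # b'\x82' b'\xe9'
```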
UTF-16 - UTF-16 encodes characters as 2 bytes. These are referred to as wide characters (in Windows, this is the "W" interface). This two-byte description is not entirely accurate, because characters outside the Basic Multilingual Plane are stored as a surrogate pair of two 2-byte units. For our purposes, however, we can think of UTF-16 as being the same as UCS-2, which uses two bytes per character. Transmitting UTF-16 has the disadvantage that a transmission is usually larger than the same text in UTF-8 (for text that is mostly in the basic character set) or in any of the single-byte formats. When working internally, UTF-16 has the advantage that characters are (nearly always) fixed length, which makes it easier to implement Unicode string functions (which DataFlex does not currently support). Internally, Windows processes most data in UTF-16. UTF-16, like any Unicode encoding format, does not need code page information. UTF-16 strings may contain embedded nulls; for example, every character in the basic character set has a zero high byte.
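Here is a short Python sketch of these UTF-16 points. It uses the little-endian form ("utf-16-le", without a byte order mark), which is the byte order Windows uses internally. It shows the embedded nulls, the size difference compared with UTF-8, and the surrogate pair that makes the two-bytes-per-character model an approximation.

```python
# UTF-16 little-endian, the byte order Windows uses for its "W" interface.
text = "Data"
utf16 = text.encode("utf-16-le")
print(utf16)                                   # b'D\x00a\x00t\x00a\x00'
print(b"\x00" in utf16)                        # True: embedded nulls
print(len(text.encode("utf-8")), len(utf16))   # 4 8

# For text outside the basic character set, UTF-16 is not always larger.
cjk = "日本語"
print(len(cjk.encode("utf-8")), len(cjk.encode("utf-16-le")))  # 9 6

# A character outside the Basic Multilingual Plane needs a surrogate pair.
emoji = "\U0001F600"
print(len(emoji.encode("utf-16-le")))          # 4 bytes, not 2
```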
Base64Encoding - Base64 is a special encoding used to represent binary data in a transmission-friendly and storage-friendly format. Base64 strings contain no embedded nulls and consist only of characters from the basic character set (A-Z, a-z, 0-9, "+", "/" and the "=" padding character), so they are not changed by OEM, ANSI or UTF-8 character translations.
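This Python sketch uses the standard base64 module. It encodes a few binary bytes, including nulls, and confirms that the result passes through OEM, ANSI and UTF-8 encoding unchanged. cp437 and cp1252 again stand in, as assumptions, for the OEM and ANSI code pages.

```python
import base64

# Binary data containing nulls and bytes above 127.
binary = bytes([0x00, 0xFF, 0x10, 0x00, 0x80])

encoded = base64.b64encode(binary).decode("ascii")
print(encoded)                      # AP8QAIA=
print("\x00" in encoded)            # False: no embedded nulls

# Only basic-character-set characters are used, so all three encodings agree.
same = encoded.encode("utf-8") == encoded.encode("cp437") == encoded.encode("cp1252")
print(same)                         # True

# Decoding restores the original bytes exactly.
print(base64.b64decode(encoded) == binary)  # True
```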
cCharTranslate allows you to convert data between all of these encoding formats. It also allows you to perform these translations on, and copy data between, the following variable types: String, Variant String and Address (memory buffer).
String - This is the normal DataFlex String type. It holds UTF-8 encoded data, supports a complete set of string functions and is tightly integrated into our framework.
In certain circumstances, ANSI or OEM encoded data can also be represented by the DataFlex String data type. The primary limitation is that the collating sequence, which is used to determine sort order and greater than/less than ordering, is not accurate (the collating sequence for the embedded database is an OEM sequence).
In general, you should use UTF-8 strings. When other formats are needed, you can perform conversions as required.
Variant - This is the DataFlex Variant type when the variant value is a string (sometimes called a Variant BSTR). Internally, a Variant string is stored as wide-character Unicode (UTF-16). Although the data is Unicode, almost all access to it involves converting the data to an OEM string. For example, "Move 'Data' to vVar" causes 'Data' to be converted to UTF-16 before it is stored in the Variant. "Set Value to vVar" causes the Unicode data in vVar to be converted back to OEM and stored in Value in that form. Even "Move (Length(vVar)) to iLen" causes the Unicode data to be converted to an internal OEM string before its length is evaluated. Any string function converts the Unicode data to an OEM string before performing the function.
Therefore, you are limited in what you can do with a Variant string without first converting it to a string. A few methods support direct Unicode access to a Variant variable: the function VariantStringLength() and the XML methods pvNodeValue, pvXML and LoadXMLFromVariant. In addition, some of the methods in cCharTranslate provide direct access to Unicode data. While this direct access is limited, it has the advantage that no OEM-to-Unicode conversion takes place and the DataFlex String Argument_Size limit is not applied.
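The cost of the OEM round trip is easiest to see with a character that the OEM code page cannot represent. This Python sketch assumes cp437 as the OEM code page and uses Python's "?" substitution for unmappable characters. The exact substitution that DataFlex or Windows performs may differ.

```python
# Converting Unicode text to a single-byte OEM code page can lose characters.
# Assumption: cp437 stands in for the OEM code page.
OEM_CODEPAGE = "cp437"

def unicode_to_oem(text):
    """Encode text in the OEM code page, replacing unmappable characters with '?'."""
    return text.encode(OEM_CODEPAGE, errors="replace")

original = "Price: 10 €"                   # '€' does not exist in code page 437
round_trip = unicode_to_oem(original).decode(OEM_CODEPAGE)
print(round_trip)                          # Price: 10 ?
print(round_trip == original)              # False: the data changed
```

Methods that access the Unicode data directly avoid this step entirely, so characters outside the OEM code page survive.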
Pointer (memory buffer) - Data can be stored in memory referenced by an address pointer. This data can be in any format and of any size. When working directly with memory, use the Pointer data type, which holds a pointer to an area of heap memory. You must allocate the heap memory using Alloc() or ReAlloc() and, when done, you must release it using Free(). This data is manipulated via pointers and the built-in memory functions. Working with Pointer data is the most flexible approach, but it is also low-level, tedious and error-prone.
When working with binary data in memory, you need to keep track of the size of the memory area. For example, if you are encoding binary data to a base64 string, you are going to need to provide the address of the binary data and its length.
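The need to track the size becomes clear when the data contains null bytes. The Python sketch below uses ctypes as a stand-in for a DataFlex Pointer. Reading from an address without a length stops at the first null byte, while passing the length recovers all of the data. base64_from_buffer is a hypothetical helper, not a cCharTranslate method. Unlike DataFlex heap memory, the ctypes buffer is released automatically by Python.

```python
import base64
import ctypes

payload = b"AB\x00CD"                         # binary data with an embedded null
buffer = ctypes.create_string_buffer(payload, len(payload))  # exactly 5 bytes
address = ctypes.addressof(buffer)

# Without a size, string_at reads only up to the first null byte.
print(ctypes.string_at(address))              # b'AB'

# With the size, every byte is recovered.
print(ctypes.string_at(address, len(payload)))  # b'AB\x00CD'

def base64_from_buffer(addr, size):
    """Hypothetical helper: Base64-encode `size` bytes starting at `addr`."""
    return base64.b64encode(ctypes.string_at(addr, size)).decode("ascii")

print(base64_from_buffer(address, len(payload)))  # QUIAQ0Q=
```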
The cCharTranslate functions can be split into three groups: Buffer Encoding Translations, Variant Encoding Translations and Base64 Encoding Translations. The class uses the naming convention "Buffer" = memory address, "Str" = DataFlex String type and "VariantStr" = Variant string type. Below is a summary of these helper methods.
Utf16FromBuffer - Creates a UTF-16 string in a memory buffer from an OEM, ANSI or UTF-8 memory buffer.
Utf16ToBuffer - Converts a UTF-16 string in a memory buffer into a newly allocated OEM, ANSI or UTF-8 memory buffer.
Utf16FromStr - Creates a UTF-16 string in a memory buffer from an OEM, ANSI or UTF-8 String variable.
Utf16ToStr - Converts a UTF-16 string in a memory buffer into a string encoded as OEM, ANSI or UTF-8.
Utf8FromBuffer - Creates a UTF-8 string in a memory buffer from an OEM or ANSI memory buffer.
Utf8ToBuffer - Converts a UTF-8 string in a memory buffer into a newly allocated OEM or ANSI memory buffer.
Utf8FromStr - Creates a UTF-8 string in a memory buffer from an OEM or ANSI String variable.
Utf8ToStr - Converts a UTF-8 string in a memory buffer into a string encoded as OEM or ANSI.
VariantStrFromBuffer - Creates a Variant String from an OEM, ANSI or UTF-8 memory buffer.
VariantStrToBuffer - Converts a Variant String into a newly allocated OEM, ANSI or UTF-8 memory buffer.
VariantStrFromStr - Creates a Variant String from an OEM, ANSI or UTF-8 String variable.
VariantStrToStr - Converts a Variant String to a string encoded as OEM, ANSI or UTF-8.
VariantStrFromUTF16 - Creates a Variant String from a UTF-16 Unicode string in a memory buffer.
VariantStrToUTF16 - Creates a UTF-16 string in a memory buffer from a Variant String.
Base64EncodeToStr - Creates a Base64 encoded String from a buffer.
Base64DecodeFromStr - Decodes a Base64 encoded String and places the result in a newly allocated buffer.
Base64EncodeToVariantStr - Creates a Base64 encoded Variant String from a buffer.
Base64DecodeFromVariantStr - Decodes a Base64 encoded Variant String and places the result in a newly allocated buffer.
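Several of these methods pivot through UTF-16. As a conceptual sketch only, the hypothetical Python functions below mimic what a Utf16FromStr/Utf16ToStr pair does: decode text from a code page into UTF-16, then re-encode it in another code page. They do not reproduce the cCharTranslate signatures. The mapping from the Windows code page identifiers CP_OEMCP, CP_ACP and CP_UTF8 to cp437, cp1252 and utf-8 is an assumption for a US-English system.

```python
# Hypothetical Python analogues of a UTF-16 round trip (not the cCharTranslate API).
# Assumed mapping of Windows code page identifiers to Python codecs:
CODECS = {"CP_OEMCP": "cp437", "CP_ACP": "cp1252", "CP_UTF8": "utf-8"}

def utf16_from_str(data, code_page):
    """Decode bytes in the given code page and re-encode them as UTF-16LE."""
    return data.decode(CODECS[code_page]).encode("utf-16-le")

def utf16_to_str(utf16, code_page):
    """Decode UTF-16LE bytes and encode the text in the given code page."""
    return utf16.decode("utf-16-le").encode(CODECS[code_page])

oem = b"caf\x82"                             # "café" in code page 437
wide = utf16_from_str(oem, "CP_OEMCP")
print(wide)                                  # b'c\x00a\x00f\x00\xe9\x00'
print(utf16_to_str(wide, "CP_UTF8"))         # b'caf\xc3\xa9'
print(utf16_to_str(wide, "CP_ACP"))          # b'caf\xe9'
```

Because UTF-16 can represent every character, it works as a lossless intermediate. Loss can only occur on the final step into a single-byte code page.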