From 1ca6396a3dd284241de11bcaa210ad5836e8e5a8 Mon Sep 17 00:00:00 2001
From: Boris Kolpackov
Date: Tue, 8 Dec 2009 16:18:01 +0200
Subject: Multiple object model character encodings support

Also add support for ISO-8859-1.
---
 documentation/cxx/parser/guide/index.xhtml | 40 +++++++++++++++++++-----------
 documentation/cxx/tree/guide/index.xhtml   | 25 +++++++++++++------
 documentation/cxx/tree/manual/index.xhtml  | 18 ++++++++++++--
 3 files changed, 59 insertions(+), 24 deletions(-)

diff --git a/documentation/cxx/parser/guide/index.xhtml b/documentation/cxx/parser/guide/index.xhtml
index 7379c96..9653e37 100644
--- a/documentation/cxx/parser/guide/index.xhtml
+++ b/documentation/cxx/parser/guide/index.xhtml
@@ -280,7 +280,7 @@
 5Mapping Configuration - +
@@ -1615,8 +1615,8 @@ namespace http://www.example.com/xmlns/my
 following map files. The string-based XML Schema types are mapped to either std::string or std::wstring depending on the character type
- selected (see Section 5.1, "Character Type" for
- more information).

+ selected (see Section 5.1, "Character Type and
+ Encoding" for more information).

 namespace http://www.w3.org/2001/XMLSchema
@@ -1909,7 +1909,7 @@ age:    28
      Compiler Command Line Manual.
   

-

5.1 Character Type

+

5.1 Character Type and Encoding

The C++/Parser mapping has built-in support for two character types: char and wchar_t. You can select the
@@ -1921,15 +1921,24 @@ age: 28

Another aspect of the mapping that depends on the character type is character encoding. For the char character type
- the encoding is UTF-8. For the wchar_t character type
- the encoding is automatically selected between UTF-16 and
- UTF-32/UCS-4 depending on the size of the wchar_t type.
- On some platforms (for example, Windows with Visual C++ and AIX with IBM XL
- C++) wchar_t is 2 bytes long. For these platforms the
+ the default encoding is UTF-8. Other supported encodings are
+ ISO-8859-1, Xerces-C++ Local Code Page (LCP), as well as
+ custom encodings. You can select which encoding should be used
+ in the object model with the --char-encoding command
+ line option.

+ +

For the wchar_t character type the encoding is
+ automatically selected between UTF-16 and UTF-32/UCS-4 depending
+ on the size of the wchar_t type. On some platforms
+ (for example, Windows with Visual C++ and AIX with IBM XL C++)
+ wchar_t is 2 bytes long. For these platforms the
 encoding is UTF-16. On other platforms wchar_t is 4 bytes
- long and UTF-32/UCS-4 is used.

+ long and UTF-32/UCS-4 is used.

+

Note also that the character encoding that is used in the object model
+ is independent of the encodings used in input and output XML. In fact,
+ all three (object model, input XML, and output XML) can have different
+ encodings.
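The sketch below shows what it means for the object model and the XML document to use different encodings. It converts between an ISO-8859-1 object-model string and UTF-8 document text. The helpers `latin1_to_utf8` and `utf8_to_latin1` are hypothetical: they are not part of XSD's API and are not how XSD performs its own transcoding. The sketch assumes C++17 for `std::optional`.

The ISO-8859-1 to UTF-8 direction always succeeds, because every ISO-8859-1 byte value equals its Unicode code point. The reverse direction fails for any character above U+00FF.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not XSD's API): ISO-8859-1 object-model string to
// UTF-8 document text. Each ISO-8859-1 byte equals its Unicode code point
// (U+0000..U+00FF), so the conversion never fails.
inline std::string latin1_to_utf8(const std::string& in)
{
  std::string out;
  out.reserve(in.size());
  for (unsigned char c : in)
  {
    if (c < 0x80)
      out.push_back(static_cast<char>(c)); // ASCII is identical in both
    else
    {
      // U+0080..U+00FF take two UTF-8 bytes: 110000xx 10xxxxxx.
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
  return out;
}

// Hypothetical helper (not XSD's API): UTF-8 document text to an
// ISO-8859-1 object-model string. Returns std::nullopt if the text holds
// a character ISO-8859-1 cannot represent (above U+00FF) or is malformed.
inline std::optional<std::string> utf8_to_latin1(const std::string& in)
{
  std::string out;
  out.reserve(in.size());
  for (std::string::size_type i = 0; i < in.size(); ++i)
  {
    unsigned char c = static_cast<unsigned char>(in[i]);
    if (c < 0x80)
      out.push_back(static_cast<char>(c));
    else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size())
    {
      // Only lead bytes 0xC2 and 0xC3 encode U+0080..U+00FF.
      unsigned char d = static_cast<unsigned char>(in[++i]);
      if ((d & 0xC0) != 0x80)
        return std::nullopt; // second byte is not a continuation byte
      out.push_back(static_cast<char>(((c & 0x1F) << 6) | (d & 0x3F)));
    }
    else
      return std::nullopt; // U+0100 and above, or truncated/invalid input
  }
  return out;
}
```

For example, `latin1_to_utf8("caf\xE9")` yields the UTF-8 bytes `"caf\xC3\xA9"`. By contrast, `utf8_to_latin1("\xE2\x82\xAC")` yields `std::nullopt`, because the euro sign (U+20AC) has no ISO-8859-1 representation.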

5.2 Underlying XML Parser

@@ -3306,7 +3315,7 @@ namespace xml_schema
 document type has the following interface. Note that if the character type is wchar_t, then the string type in the interface becomes std::wstring
- (see Section 5.1, "Character Type").

+ (see Section 5.1, "Character Type and Encoding").

 namespace xml_schema
@@ -3601,7 +3610,7 @@ namespace xml_schema
      document type has the following interface. Note that
      if the character type is wchar_t, then the string type
      in the interface becomes std::wstring
-     (see Section 5.1, "Character Type").

+ (see Section 5.1, "Character Type and Encoding").

 namespace xml_schema
@@ -3886,7 +3895,8 @@ main (int argc, char* argv[])
      character type is wchar_t, then the string type
      and output stream type in the definition become
      std::wstring and std::wostream,
-     respectively (see Section 5.1, "Character Type").

+ respectively (see Section 5.1, "Character Type
+ and Encoding").

 namespace xml_schema
@@ -3998,7 +4008,7 @@ main (int argc, char* argv[])
      listing presents the definition of the error_handler
      interface. Note that if the character type is wchar_t,
      then the string type in the interface becomes std::wstring
-     (see Section 5.1, "Character Type").

+ (see Section 5.1, "Character Type and Encoding").

 namespace xml_schema
diff --git a/documentation/cxx/tree/guide/index.xhtml b/documentation/cxx/tree/guide/index.xhtml
index 787610a..f96b09b 100644
--- a/documentation/cxx/tree/guide/index.xhtml
+++ b/documentation/cxx/tree/guide/index.xhtml
@@ -226,7 +226,7 @@
     
5.1Character Type
5.1Character Type and Encoding
5.2Underlying XML Parser
5.3XML Schema Validation
5.4Support for Polymorphism
3Overall Mapping Configuration - +
@@ -1148,7 +1148,7 @@ $ doxygen hello.doxygen
 Compiler Command Line Manual.

-

3.1 Character Type

+

3.1 Character Type and Encoding

The C++/Tree mapping has built-in support for two character types: char and wchar_t. You can select the
@@ -1160,14 +1160,25 @@ $ doxygen hello.doxygen

Another aspect of the mapping that depends on the character type is character encoding. For the char character type
- the encoding is UTF-8. For the wchar_t character type
- the encoding is automatically selected between UTF-16 and
- UTF-32/UCS-4 depending on the size of the wchar_t type.
- On some platforms (for example, Windows with Visual C++ and AIX with IBM XL
- C++) wchar_t is 2 bytes long. For these platforms the
+ the default encoding is UTF-8. Other supported encodings are
+ ISO-8859-1, Xerces-C++ Local Code Page (LCP), as well as
+ custom encodings. You can select which encoding should be used
+ in the object model with the --char-encoding command
+ line option.

+ +

For the wchar_t character type the encoding is
+ automatically selected between UTF-16 and UTF-32/UCS-4 depending
+ on the size of the wchar_t type. On some platforms
+ (for example, Windows with Visual C++ and AIX with IBM XL C++)
+ wchar_t is 2 bytes long. For these platforms the
 encoding is UTF-16. On other platforms wchar_t is 4 bytes long and UTF-32/UCS-4 is used.

+

Note also that the character encoding that is used in the object model
+ is independent of the encodings used in input and output XML. In fact,
+ all three (object model, input XML, and output XML) can have different
+ encodings.

+

3.2 Support for Polymorphism

By default XSD generates non-polymorphic code. If your vocabulary
diff --git a/documentation/cxx/tree/manual/index.xhtml b/documentation/cxx/tree/manual/index.xhtml
index d468fe3..91c6154 100644
--- a/documentation/cxx/tree/manual/index.xhtml
+++ b/documentation/cxx/tree/manual/index.xhtml
@@ -226,7 +226,7 @@

3.1Character Type
3.1Character Type and Encoding
3.2Support for Polymorphism
3.3Namespace Mapping
3.4Thread Safety
2.1Preliminary Information - +
2.1.1Identifiers
2.1.2Character Type
2.1.2Character Type and Encoding
2.1.3XML Schema Namespace
2.1.4Anonymous Types
@@ -567,7 +567,7 @@ CONVENTION section in the XSD Compiler Command Line Manual.

-

2.1.2 Character Type

+

2.1.2 Character Type and Encoding

The code that implements the mapping, depending on the --char-type option, is generated using either
@@ -577,6 +577,20 @@ your schemas, for example std::basic_string<C>.

+

Another aspect of the mapping that depends on the character type
+ is character encoding. For the char character type
+ the default encoding is UTF-8. Other supported encodings are
+ ISO-8859-1, Xerces-C++ Local Code Page (LCP), as well as
+ custom encodings. The encoding can be selected with the
+ --char-encoding command line option.

+ +

For the wchar_t character type the encoding is
+ automatically selected between UTF-16 and UTF-32/UCS-4 depending
+ on the size of the wchar_t type. On some platforms
+ (for example, Windows with Visual C++ and AIX with IBM XL C++)
+ wchar_t is 2 bytes long. For these platforms the
+ encoding is UTF-16. On other platforms wchar_t is 4 bytes
+ long and UTF-32/UCS-4 is used.
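The size-based rule above can be sketched in C++. The helper `append_code_point` is hypothetical and not part of XSD. It stores one Unicode code point in a `std::wstring` in one of two ways:

- When `wchar_t` is 2 bytes long, it uses UTF-16 and writes a surrogate pair for code points above U+FFFF.
- Otherwise, it writes a single UTF-32 unit.

This is why the same text can have a different `length()` on Windows with Visual C++ than on a platform with a 4-byte `wchar_t`.

```cpp
#include <string>

// Hypothetical helper (not part of XSD): appends code point cp to out,
// choosing UTF-16 or UTF-32 from sizeof(wchar_t) as described above.
// Assumes cp is a valid Unicode scalar value (not a surrogate, <= U+10FFFF).
inline void append_code_point(std::wstring& out, char32_t cp)
{
  if (sizeof(wchar_t) == 2 && cp > 0xFFFF)
  {
    // UTF-16: split the code point into a high and a low surrogate.
    cp -= 0x10000;
    out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
    out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
  }
  else
  {
    // BMP character under UTF-16, or any character under UTF-32/UCS-4:
    // a single wchar_t unit holds the whole code point.
    out.push_back(static_cast<wchar_t>(cp));
  }
}
```

Appending U+1F600 therefore produces two `wchar_t` units where `wchar_t` is 2 bytes long, and one unit where it is 4 bytes long.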

2.1.3 XML Schema Namespace

-- cgit v1.1