Diffstat (limited to 'documentation')
-rw-r--r-- | documentation/custom-literals.xsd | 49
-rw-r--r-- | documentation/cxx/parser/guide/index.xhtml | 40
-rw-r--r-- | documentation/cxx/tree/guide/index.xhtml | 25
-rw-r--r-- | documentation/cxx/tree/manual/index.xhtml | 18
-rw-r--r-- | documentation/makefile | 2
-rw-r--r-- | documentation/xsd.1 | 39
-rw-r--r-- | documentation/xsd.xhtml | 27
7 files changed, 176 insertions, 24 deletions
diff --git a/documentation/custom-literals.xsd b/documentation/custom-literals.xsd new file mode 100644 index 0000000..ab2d649 --- /dev/null +++ b/documentation/custom-literals.xsd @@ -0,0 +1,49 @@ +<?xml version="1.0"?> + +<!-- + +file : documentation/custom-literals.xsd +author : Boris Kolpackov <boris@codesynthesis.com> +copyright : not copyrighted - public domain + +This schema describes the XML format used to provide the custom string +to C++ string literal mapping with the -custom-literals XSD compiler +command line option. Here is a sample instance: + +<string-literal-map> + <entry> + <string>hello</string> + <literal>"hello"</literal> + </entry> + <entry> + <string>greeting</string> + <literal>"greeting"</literal> + </entry> +</string-literal-map> + +--> + +<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> + + <xsd:simpleType name="literal_t"> + <xsd:restriction base="xsd:string"> + <xsd:pattern value='".+"'/> + </xsd:restriction> + </xsd:simpleType> + + <xsd:complexType name="entry_t"> + <xsd:sequence> + <xsd:element name="string" type="xsd:string"/> + <xsd:element name="literal" type="literal_t"/> + </xsd:sequence> + </xsd:complexType> + + <xsd:complexType name="string_literal_map_t"> + <xsd:sequence> + <xsd:element name="entry" type="entry_t" maxOccurs="unbounded"/> + </xsd:sequence> + </xsd:complexType> + + <xsd:element name="string-literal-map" type="string_literal_map_t"/> + +</xsd:schema> diff --git a/documentation/cxx/parser/guide/index.xhtml b/documentation/cxx/parser/guide/index.xhtml index 7379c96..9653e37 100644 --- a/documentation/cxx/parser/guide/index.xhtml +++ b/documentation/cxx/parser/guide/index.xhtml @@ -280,7 +280,7 @@ <tr> <th>5</th><td><a href="#5">Mapping Configuration</a> <table class="toc"> - <tr><th>5.1</th><td><a href="#5.1">Character Type</a></td></tr> + <tr><th>5.1</th><td><a href="#5.1">Character Type and Encoding</a></td></tr> <tr><th>5.2</th><td><a href="#5.2">Underlying XML Parser</a></td></tr> 
<tr><th>5.3</th><td><a href="#5.3">XML Schema Validation</a></td></tr> <tr><th>5.4</th><td><a href="#5.4">Support for Polymorphism</a></td></tr> @@ -1615,8 +1615,8 @@ namespace http://www.example.com/xmlns/my following map files. The string-based XML Schema types are mapped to either <code>std::string</code> or <code>std::wstring</code> depending on the character type - selected (see <a href="#5.1"> Section 5.1, "Character Type"</a> for - more information).</p> + selected (see <a href="#5.1"> Section 5.1, "Character Type and + Encoding"</a> for more information).</p> <pre class="type-map"> namespace http://www.w3.org/2001/XMLSchema @@ -1909,7 +1909,7 @@ age: 28 Compiler Command Line Manual</a>. </p> - <h2><a name="5.1">5.1 Character Type</a></h2> + <h2><a name="5.1">5.1 Character Type and Encoding</a></h2> <p>The C++/Parser mapping has built-in support for two character types: <code>char</code> and <code>wchar_t</code>. You can select the @@ -1921,15 +1921,24 @@ age: 28 <p>Another aspect of the mapping that depends on the character type is character encoding. For the <code>char</code> character type - the encoding is UTF-8. For the <code>wchar_t</code> character type - the encoding is automatically selected between UTF-16 and - UTF-32/UCS-4 depending on the size of the <code>wchar_t</code> type. - On some platforms (for example, Windows with Visual C++ and AIX with IBM XL - C++) <code>wchar_t</code> is 2 bytes long. For these platforms the + the default encoding is UTF-8. Other supported encodings are + ISO-8859-1, Xerces-C++ Local Code Page (LPC), as well as + custom encodings. You can select which encoding should be used + in the object model with the <code>--char-encoding</code> command + line option.</p> + + <p>For the <code>wchar_t</code> character type the encoding is + automatically selected between UTF-16 and UTF-32/UCS-4 depending + on the size of the <code>wchar_t</code> type. 
On some platforms + (for example, Windows with Visual C++ and AIX with IBM XL C++) + <code>wchar_t</code> is 2 bytes long. For these platforms the encoding is UTF-16. On other platforms <code>wchar_t</code> is 4 bytes - long and UTF-32/UCS-4 is used. - </p> + long and UTF-32/UCS-4 is used.</p> + <p>Note also that the character encoding that is used in the object model + is independent of the encodings used in input and output XML. In fact, + all three (object mode, input XML, and output XML) can have different + encodings.</p> <h2><a name="5.2">5.2 Underlying XML Parser</a></h2> @@ -3306,7 +3315,7 @@ namespace xml_schema <code>document</code> type has the following interface. Note that if the character type is <code>wchar_t</code>, then the string type in the interface becomes <code>std::wstring</code> - (see <a href="#5.1">Section 5.1, "Character Type"</a>).</p> + (see <a href="#5.1">Section 5.1, "Character Type and Encoding"</a>).</p> <pre class="c++"> namespace xml_schema @@ -3601,7 +3610,7 @@ namespace xml_schema <code>document</code> type has the following interface. Note that if the character type is <code>wchar_t</code>, then the string type in the interface becomes <code>std::wstring</code> - (see <a href="#5.1">Section 5.1, "Character Type"</a>).</p> + (see <a href="#5.1">Section 5.1, "Character Type and Encoding"</a>).</p> <pre class="c++"> namespace xml_schema @@ -3886,7 +3895,8 @@ main (int argc, char* argv[]) character type is <code>wchar_t</code>, then the string type and output stream type in the definition become <code>std::wstring</code> and <code>std::wostream</code>, - respectively (see <a href="#5.1">Section 5.1, "Character Type"</a>).</p> + respectively (see <a href="#5.1">Section 5.1, "Character Type + and Encoding"</a>).</p> <pre class="c++"> namespace xml_schema @@ -3998,7 +4008,7 @@ main (int argc, char* argv[]) listing presents the definition of the <code>error_handler</code> interface. 
Note that if the character type is <code>wchar_t</code>, then the string type in the interface becomes <code>std::wstring</code> - (see <a href="#5.1">Section 5.1, "Character Type"</a>).</p> + (see <a href="#5.1">Section 5.1, "Character Type and Encoding"</a>).</p> <pre class="c++"> namespace xml_schema diff --git a/documentation/cxx/tree/guide/index.xhtml b/documentation/cxx/tree/guide/index.xhtml index 787610a..f96b09b 100644 --- a/documentation/cxx/tree/guide/index.xhtml +++ b/documentation/cxx/tree/guide/index.xhtml @@ -226,7 +226,7 @@ <tr> <th>3</th><td><a href="#3">Overall Mapping Configuration</a> <table class="toc"> - <tr><th>3.1</th><td><a href="#3.1">Character Type</a></td></tr> + <tr><th>3.1</th><td><a href="#3.1">Character Type and Encoding</a></td></tr> <tr><th>3.2</th><td><a href="#3.2">Support for Polymorphism </a></td></tr> <tr><th>3.3</th><td><a href="#3.3">Namespace Mapping</a></td></tr> <tr><th>3.4</th><td><a href="#3.4">Thread Safety</a></td></tr> @@ -1148,7 +1148,7 @@ $ doxygen hello.doxygen Compiler Command Line Manual</a>. </p> - <h2><a name="3.1">3.1 Character Type</a></h2> + <h2><a name="3.1">3.1 Character Type and Encoding</a></h2> <p>The C++/Tree mapping has built-in support for two character types: <code>char</code> and <code>wchar_t</code>. You can select the @@ -1160,14 +1160,25 @@ $ doxygen hello.doxygen <p>Another aspect of the mapping that depends on the character type is character encoding. For the <code>char</code> character type - the encoding is UTF-8. For the <code>wchar_t</code> character type - the encoding is automatically selected between UTF-16 and - UTF-32/UCS-4 depending on the size of the <code>wchar_t</code> type. - On some platforms (for example, Windows with Visual C++ and AIX with IBM XL - C++) <code>wchar_t</code> is 2 bytes long. For these platforms the + the default encoding is UTF-8. Other supported encodings are + ISO-8859-1, Xerces-C++ Local Code Page (LPC), as well as + custom encodings. 
You can select which encoding should be used + in the object model with the <code>--char-encoding</code> command + line option.</p> + + <p>For the <code>wchar_t</code> character type the encoding is + automatically selected between UTF-16 and UTF-32/UCS-4 depending + on the size of the <code>wchar_t</code> type. On some platforms + (for example, Windows with Visual C++ and AIX with IBM XL C++) + <code>wchar_t</code> is 2 bytes long. For these platforms the encoding is UTF-16. On other platforms <code>wchar_t</code> is 4 bytes long and UTF-32/UCS-4 is used.</p> + <p>Note also that the character encoding that is used in the object model + is independent of the encodings used in input and output XML. In fact, + all three (object mode, input XML, and output XML) can have different + encodings.</p> + <h2><a name="3.2">3.2 Support for Polymorphism</a></h2> <p>By default XSD generates non-polymorphic code. If your vocabulary diff --git a/documentation/cxx/tree/manual/index.xhtml b/documentation/cxx/tree/manual/index.xhtml index d468fe3..91c6154 100644 --- a/documentation/cxx/tree/manual/index.xhtml +++ b/documentation/cxx/tree/manual/index.xhtml @@ -226,7 +226,7 @@ <th>2.1</th><td><a href="#2.1">Preliminary Information</a> <table class="toc"> <tr><th>2.1.1</th><td><a href="#2.1.1">Identifiers</a></td></tr> - <tr><th>2.1.2</th><td><a href="#2.1.2">Character Type</a></td></tr> + <tr><th>2.1.2</th><td><a href="#2.1.2">Character Type and Encoding</a></td></tr> <tr><th>2.1.3</th><td><a href="#2.1.3">XML Schema Namespace</a></td></tr> <tr><th>2.1.4</th><td><a href="#2.1.4">Anonymous Types</a></td></tr> </table> @@ -567,7 +567,7 @@ CONVENTION section in the <a href="http://www.codesynthesis.com/projects/xsd/documentation/xsd.xhtml">XSD Compiler Command Line Manual</a>.</p> - <h3><a name="2.1.2">2.1.2 Character Type</a></h3> + <h3><a name="2.1.2">2.1.2 Character Type and Encoding</a></h3> <p>The code that implements the mapping, depending on the <code>--char-type</code> option, 
is generated using either @@ -577,6 +577,20 @@ your schemas, for example <code>std::basic_string<C></code>. </p> + <p>Another aspect of the mapping that depends on the character type + is character encoding. For the <code>char</code> character type + the default encoding is UTF-8. Other supported encodings are + ISO-8859-1, Xerces-C++ Local Code Page (LPC), as well as + custom encodings and can be selected with the + <code>--char-encoding</code> command line option.</p> + + <p>For the <code>wchar_t</code> character type the encoding is + automatically selected between UTF-16 and UTF-32/UCS-4 depending + on the size of the <code>wchar_t</code> type. On some platforms + (for example, Windows with Visual C++ and AIX with IBM XL C++) + <code>wchar_t</code> is 2 bytes long. For these platforms the + encoding is UTF-16. On other platforms <code>wchar_t</code> is 4 bytes + long and UTF-32/UCS-4 is used.</p> <h3><a name="2.1.3">2.1.3 XML Schema Namespace</a></h3> diff --git a/documentation/makefile b/documentation/makefile index 0638928..81a26fe 100644 --- a/documentation/makefile +++ b/documentation/makefile @@ -20,6 +20,7 @@ $(install): $(out_base)/cxx/.install $(call install-data,$(src_base)/future.xhtml,$(install_doc_dir)/xsd/future.xhtml) $(call install-data,$(src_base)/schema-authoring-guide.xhtml,$(install_doc_dir)/xsd/schema-authoring-guide.xhtml) $(call install-data,$(src_base)/xsd.xhtml,$(install_doc_dir)/xsd/xsd.xhtml) + $(call install-data,$(src_base)/custom-literals.xsd,$(install_doc_dir)/xsd/custom-literals.xsd) $(call install-data,$(src_base)/xsd.1,$(install_man_dir)/man1/xsd.1) # Dist. 
@@ -32,6 +33,7 @@ $(dist-common): $(call install-data,$(src_base)/xsd.1,$(dist_prefix)/documentation/xsd.1) $(call install-data,$(src_base)/future.xhtml,$(dist_prefix)/documentation/future.xhtml) $(call install-data,$(src_base)/schema-authoring-guide.xhtml,$(dist_prefix)/documentation/schema-authoring-guide.xhtml) + $(call install-data,$(src_base)/custom-literals.xsd,$(dist_prefix)/documentation/custom-literals.xsd) $(dist): $(dist-common) $(out_base)/cxx/.dist $(dist-win): $(dist-common) $(out_base)/cxx/.dist-win diff --git a/documentation/xsd.1 b/documentation/xsd.1 index b84586d..1038d50 100644 --- a/documentation/xsd.1 +++ b/documentation/xsd.1 @@ -127,6 +127,34 @@ Valid values are and .BR wchar_t . . +.IP "\fB\--char-encoding \fIenc\fR" +Specify the character encoding that should be used in the object model. +Valid values for the +.B char +character type are +.B utf8 +(default), +.BR iso8859-1 , lcp +(Xerces-C++ local code page), +and +.BR custom . +If you pass +.B custom +as the value then you will need to include the transcoder implementation +header for your encoding at the beginning of the generated header files +(see the +.B --hxx-prologue +option). + +For the +.B wchar_t +character type the only valid value is +.B auto +and the encoding is automatically selected between UTF-16 and UTF-32/UCS-4, +depending on the +.B wchar_t +type size. +. .IP "\fB\--output-dir \fIdir\fR" Write generated files to .I dir @@ -450,6 +478,17 @@ in places where DLL export/import control statements ( .BR __declspec(dllexport/dllimport) ) are necessary. +.IP "\fB\--custom-literals \fIfile\fR" +Load custom XML string to C++ literal mappings from +.IR file . +This mechanism can be useful if you are using a custom character encoding +and some of the strings in your schemas, for example element/attribute +names or enumeration values, contain non-ASCII characters. In this case +you will need to provide a custom mapping to C++ literals for such +strings. 
The format of this file is specified in the +.B custom-literals.xsd +XML Schema file that can be found in the documentation directory. + .IP "\fB\--export-xml-schema\fR" Export/import types in the XML Schema namespace using the export symbol provided with the diff --git a/documentation/xsd.xhtml b/documentation/xsd.xhtml index 49d6503..da2b52c 100644 --- a/documentation/xsd.xhtml +++ b/documentation/xsd.xhtml @@ -125,6 +125,21 @@ instead of the default <code><b>char</b></code>. Valid values are <code><b>char</b></code> and <code><b>wchar_t</b></code>.</dd> + <dt><code><b>--char-encoding</b> <i>enc</i></code></dt> + <dd>Specify the character encoding that should be used in the object + model. Valid values for the <code><b>char</b></code> character type + are <code><b>utf8</b></code> (default), <code><b>iso8859-1</b></code>, + <code><b>lcp</b></code> (Xerces-C++ local code page), and + <code><b>custom</b></code>. If you pass <code><b>custom</b></code> as + the value then you will need to include the transcoder implementation + header for your encoding at the beginning of the generated header + files (see the <code><b>--hxx-prologue</b></code> option). + + <p>For the <code><b>wchar_t</b></code> character type the only valid + value is <code><b>auto</b></code> and the encoding is automatically + selected between UTF-16 and UTF-32/UCS-4, depending on the + <code><b>wchar_t</b></code> type size.</p></dd> + <dt><code><b>--output-dir</b> <i>dir</i></code></dt> <dd>Write generated files to <code><i>dir</i></code> instead of the current directory.</dd> @@ -393,6 +408,18 @@ generated file for which there is no file-specific epilogue file. </dd> + <dt><code><b>--custom-literals</b> <i>file</i></code></dt> + <dd>Load custom XML string to C++ literal mappings from + <code><i>file</i></code>. 
This mechanism can be useful if you + are using a custom character encoding and some of the strings + in your schemas, for example element/attribute names or enumeration + values, contain non-ASCII characters. In this case you will need + to provide a custom mapping to C++ literals for such + strings. The format of this file is specified in the + <code><b>custom-literals.xsd</b></code> XML Schema file that + can be found in the documentation directory. + </dd> + <dt><code><b>--export-symbol</b> <i>symbol</i></code></dt> <dd>Insert <code><i>symbol</i></code> in places where DLL export/import control statements |