author Boris Kolpackov <boris@codesynthesis.com> 2009-12-08 16:18:01 +0200
committer Boris Kolpackov <boris@codesynthesis.com> 2009-12-08 16:18:01 +0200
commit 1ca6396a3dd284241de11bcaa210ad5836e8e5a8
tree 465c19f0d668a91bb556d748911847acfb80cb09 /documentation
parent d71611d5fb575078bdf573c35257bb86bb7054e0
Multiple object model character encodings support
Also add support for ISO-8859-1.
Diffstat (limited to 'documentation')
-rw-r--r--  documentation/custom-literals.xsd            49
-rw-r--r--  documentation/cxx/parser/guide/index.xhtml   40
-rw-r--r--  documentation/cxx/tree/guide/index.xhtml     25
-rw-r--r--  documentation/cxx/tree/manual/index.xhtml    18
-rw-r--r--  documentation/makefile                        2
-rw-r--r--  documentation/xsd.1                          39
-rw-r--r--  documentation/xsd.xhtml                      27
7 files changed, 176 insertions, 24 deletions
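
For readers unfamiliar with what an ISO-8859-1 object model implies, the following is a minimal sketch of the kind of conversion involved when the object model and the XML document use different encodings. It is an illustration only: `latin1_to_utf8` is a hypothetical helper and is not the transcoder that XSD or Xerces-C++ actually uses.

```cpp
#include <string>

// Hypothetical helper (not part of XSD): convert an ISO-8859-1 byte
// string to UTF-8. Every ISO-8859-1 byte maps to the Unicode code point
// with the same value, so bytes below 0x80 are copied as-is and the
// rest become a two-byte UTF-8 sequence.
std::string latin1_to_utf8 (const std::string& in)
{
  std::string out;
  out.reserve (in.size ());

  for (unsigned char c : in)
  {
    if (c < 0x80)
    {
      out += static_cast<char> (c);
    }
    else
    {
      out += static_cast<char> (0xC0 | (c >> 6));   // Leading byte.
      out += static_cast<char> (0x80 | (c & 0x3F)); // Continuation byte.
    }
  }

  return out;
}
```

For example, the single ISO-8859-1 byte 0xE9 (e with acute accent) becomes the two UTF-8 bytes 0xC3 0xA9.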
diff --git a/documentation/custom-literals.xsd b/documentation/custom-literals.xsd
new file mode 100644
index 0000000..ab2d649
--- /dev/null
+++ b/documentation/custom-literals.xsd
@@ -0,0 +1,49 @@
+<?xml version="1.0"?>
+
+<!--
+
+file : documentation/custom-literals.xsd
+author : Boris Kolpackov <boris@codesynthesis.com>
+copyright : not copyrighted - public domain
+
+This schema describes the XML format used to provide the custom string
+to C++ string literal mapping with the --custom-literals XSD compiler
+command line option. Here is a sample instance:
+
+<string-literal-map>
+ <entry>
+ <string>hello</string>
+ <literal>"hello"</literal>
+ </entry>
+ <entry>
+ <string>greeting</string>
+ <literal>"greeting"</literal>
+ </entry>
+</string-literal-map>
+
+-->
+
+<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
+
+ <xsd:simpleType name="literal_t">
+ <xsd:restriction base="xsd:string">
+ <xsd:pattern value='".+"'/>
+ </xsd:restriction>
+ </xsd:simpleType>
+
+ <xsd:complexType name="entry_t">
+ <xsd:sequence>
+ <xsd:element name="string" type="xsd:string"/>
+ <xsd:element name="literal" type="literal_t"/>
+ </xsd:sequence>
+ </xsd:complexType>
+
+ <xsd:complexType name="string_literal_map_t">
+ <xsd:sequence>
+ <xsd:element name="entry" type="entry_t" maxOccurs="unbounded"/>
+ </xsd:sequence>
+ </xsd:complexType>
+
+ <xsd:element name="string-literal-map" type="string_literal_map_t"/>
+
+</xsd:schema>
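
The `literal_t` pattern in the schema above requires a literal to be a double-quoted string with at least one character between the quotes. As a rough sketch of what that constraint accepts, the check below applies the same expression with `std::regex`. The function name `is_valid_literal` is hypothetical, and `std::regex` (ECMAScript syntax) only approximates XML Schema pattern semantics, so treat this as an illustration rather than a validator.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of XSD): approximate the literal_t
// pattern '".+"' from custom-literals.xsd. XML Schema patterns match
// the whole value, hence regex_match rather than regex_search.
bool is_valid_literal (const std::string& s)
{
  static const std::regex pattern ("\".+\"");
  return std::regex_match (s, pattern);
}
```

Under this check `"hello"` (with the quotes) is accepted, while `hello` without quotes and the empty literal `""` are rejected.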
diff --git a/documentation/cxx/parser/guide/index.xhtml b/documentation/cxx/parser/guide/index.xhtml
index 7379c96..9653e37 100644
--- a/documentation/cxx/parser/guide/index.xhtml
+++ b/documentation/cxx/parser/guide/index.xhtml
@@ -280,7 +280,7 @@
<tr>
<th>5</th><td><a href="#5">Mapping Configuration</a>
<table class="toc">
- <tr><th>5.1</th><td><a href="#5.1">Character Type</a></td></tr>
+ <tr><th>5.1</th><td><a href="#5.1">Character Type and Encoding</a></td></tr>
<tr><th>5.2</th><td><a href="#5.2">Underlying XML Parser</a></td></tr>
<tr><th>5.3</th><td><a href="#5.3">XML Schema Validation</a></td></tr>
<tr><th>5.4</th><td><a href="#5.4">Support for Polymorphism</a></td></tr>
@@ -1615,8 +1615,8 @@ namespace http://www.example.com/xmlns/my
following map files. The string-based XML Schema types are
mapped to either <code>std::string</code> or
<code>std::wstring</code> depending on the character type
- selected (see <a href="#5.1"> Section 5.1, "Character Type"</a> for
- more information).</p>
+ selected (see <a href="#5.1"> Section 5.1, "Character Type and
+ Encoding"</a> for more information).</p>
<pre class="type-map">
namespace http://www.w3.org/2001/XMLSchema
@@ -1909,7 +1909,7 @@ age: 28
Compiler Command Line Manual</a>.
</p>
- <h2><a name="5.1">5.1 Character Type</a></h2>
+ <h2><a name="5.1">5.1 Character Type and Encoding</a></h2>
<p>The C++/Parser mapping has built-in support for two character types:
<code>char</code> and <code>wchar_t</code>. You can select the
@@ -1921,15 +1921,24 @@ age: 28
<p>Another aspect of the mapping that depends on the character type
is character encoding. For the <code>char</code> character type
- the encoding is UTF-8. For the <code>wchar_t</code> character type
- the encoding is automatically selected between UTF-16 and
- UTF-32/UCS-4 depending on the size of the <code>wchar_t</code> type.
- On some platforms (for example, Windows with Visual C++ and AIX with IBM XL
- C++) <code>wchar_t</code> is 2 bytes long. For these platforms the
+ the default encoding is UTF-8. Other supported encodings are
+ ISO-8859-1, Xerces-C++ Local Code Page (LCP), as well as
+ custom encodings. You can select which encoding should be used
+ in the object model with the <code>--char-encoding</code> command
+ line option.</p>
+
+ <p>For the <code>wchar_t</code> character type the encoding is
+ automatically selected between UTF-16 and UTF-32/UCS-4 depending
+ on the size of the <code>wchar_t</code> type. On some platforms
+ (for example, Windows with Visual C++ and AIX with IBM XL C++)
+ <code>wchar_t</code> is 2 bytes long. For these platforms the
encoding is UTF-16. On other platforms <code>wchar_t</code> is 4 bytes
- long and UTF-32/UCS-4 is used.
- </p>
+ long and UTF-32/UCS-4 is used.</p>
+ <p>Note also that the character encoding that is used in the object model
+ is independent of the encodings used in input and output XML. In fact,
+ all three (object model, input XML, and output XML) can have different
+ encodings.</p>
<h2><a name="5.2">5.2 Underlying XML Parser</a></h2>
@@ -3306,7 +3315,7 @@ namespace xml_schema
<code>document</code> type has the following interface. Note that
if the character type is <code>wchar_t</code>, then the string type
in the interface becomes <code>std::wstring</code>
- (see <a href="#5.1">Section 5.1, "Character Type"</a>).</p>
+ (see <a href="#5.1">Section 5.1, "Character Type and Encoding"</a>).</p>
<pre class="c++">
namespace xml_schema
@@ -3601,7 +3610,7 @@ namespace xml_schema
<code>document</code> type has the following interface. Note that
if the character type is <code>wchar_t</code>, then the string type
in the interface becomes <code>std::wstring</code>
- (see <a href="#5.1">Section 5.1, "Character Type"</a>).</p>
+ (see <a href="#5.1">Section 5.1, "Character Type and Encoding"</a>).</p>
<pre class="c++">
namespace xml_schema
@@ -3886,7 +3895,8 @@ main (int argc, char* argv[])
character type is <code>wchar_t</code>, then the string type
and output stream type in the definition become
<code>std::wstring</code> and <code>std::wostream</code>,
- respectively (see <a href="#5.1">Section 5.1, "Character Type"</a>).</p>
+ respectively (see <a href="#5.1">Section 5.1, "Character Type
+ and Encoding"</a>).</p>
<pre class="c++">
namespace xml_schema
@@ -3998,7 +4008,7 @@ main (int argc, char* argv[])
listing presents the definition of the <code>error_handler</code>
interface. Note that if the character type is <code>wchar_t</code>,
then the string type in the interface becomes <code>std::wstring</code>
- (see <a href="#5.1">Section 5.1, "Character Type"</a>).</p>
+ (see <a href="#5.1">Section 5.1, "Character Type and Encoding"</a>).</p>
<pre class="c++">
namespace xml_schema
diff --git a/documentation/cxx/tree/guide/index.xhtml b/documentation/cxx/tree/guide/index.xhtml
index 787610a..f96b09b 100644
--- a/documentation/cxx/tree/guide/index.xhtml
+++ b/documentation/cxx/tree/guide/index.xhtml
@@ -226,7 +226,7 @@
<tr>
<th>3</th><td><a href="#3">Overall Mapping Configuration</a>
<table class="toc">
- <tr><th>3.1</th><td><a href="#3.1">Character Type</a></td></tr>
+ <tr><th>3.1</th><td><a href="#3.1">Character Type and Encoding</a></td></tr>
<tr><th>3.2</th><td><a href="#3.2">Support for Polymorphism </a></td></tr>
<tr><th>3.3</th><td><a href="#3.3">Namespace Mapping</a></td></tr>
<tr><th>3.4</th><td><a href="#3.4">Thread Safety</a></td></tr>
@@ -1148,7 +1148,7 @@ $ doxygen hello.doxygen
Compiler Command Line Manual</a>.
</p>
- <h2><a name="3.1">3.1 Character Type</a></h2>
+ <h2><a name="3.1">3.1 Character Type and Encoding</a></h2>
<p>The C++/Tree mapping has built-in support for two character types:
<code>char</code> and <code>wchar_t</code>. You can select the
@@ -1160,14 +1160,25 @@ $ doxygen hello.doxygen
<p>Another aspect of the mapping that depends on the character type
is character encoding. For the <code>char</code> character type
- the encoding is UTF-8. For the <code>wchar_t</code> character type
- the encoding is automatically selected between UTF-16 and
- UTF-32/UCS-4 depending on the size of the <code>wchar_t</code> type.
- On some platforms (for example, Windows with Visual C++ and AIX with IBM XL
- C++) <code>wchar_t</code> is 2 bytes long. For these platforms the
+ the default encoding is UTF-8. Other supported encodings are
+ ISO-8859-1, Xerces-C++ Local Code Page (LCP), as well as
+ custom encodings. You can select which encoding should be used
+ in the object model with the <code>--char-encoding</code> command
+ line option.</p>
+
+ <p>For the <code>wchar_t</code> character type the encoding is
+ automatically selected between UTF-16 and UTF-32/UCS-4 depending
+ on the size of the <code>wchar_t</code> type. On some platforms
+ (for example, Windows with Visual C++ and AIX with IBM XL C++)
+ <code>wchar_t</code> is 2 bytes long. For these platforms the
encoding is UTF-16. On other platforms <code>wchar_t</code> is 4 bytes
long and UTF-32/UCS-4 is used.</p>
+ <p>Note also that the character encoding that is used in the object model
+ is independent of the encodings used in input and output XML. In fact,
+ all three (object model, input XML, and output XML) can have different
+ encodings.</p>
+
<h2><a name="3.2">3.2 Support for Polymorphism</a></h2>
<p>By default XSD generates non-polymorphic code. If your vocabulary
diff --git a/documentation/cxx/tree/manual/index.xhtml b/documentation/cxx/tree/manual/index.xhtml
index d468fe3..91c6154 100644
--- a/documentation/cxx/tree/manual/index.xhtml
+++ b/documentation/cxx/tree/manual/index.xhtml
@@ -226,7 +226,7 @@
<th>2.1</th><td><a href="#2.1">Preliminary Information</a>
<table class="toc">
<tr><th>2.1.1</th><td><a href="#2.1.1">Identifiers</a></td></tr>
- <tr><th>2.1.2</th><td><a href="#2.1.2">Character Type</a></td></tr>
+ <tr><th>2.1.2</th><td><a href="#2.1.2">Character Type and Encoding</a></td></tr>
<tr><th>2.1.3</th><td><a href="#2.1.3">XML Schema Namespace</a></td></tr>
<tr><th>2.1.4</th><td><a href="#2.1.4">Anonymous Types</a></td></tr>
</table>
@@ -567,7 +567,7 @@
CONVENTION section in the <a href="http://www.codesynthesis.com/projects/xsd/documentation/xsd.xhtml">XSD
Compiler Command Line Manual</a>.</p>
- <h3><a name="2.1.2">2.1.2 Character Type</a></h3>
+ <h3><a name="2.1.2">2.1.2 Character Type and Encoding</a></h3>
<p>The code that implements the mapping, depending on the
<code>--char-type</code> option, is generated using either
@@ -577,6 +577,20 @@
your schemas, for example <code>std::basic_string&lt;C></code>.
</p>
+ <p>Another aspect of the mapping that depends on the character type
+ is character encoding. For the <code>char</code> character type
+ the default encoding is UTF-8. Other supported encodings are
+ ISO-8859-1, Xerces-C++ Local Code Page (LCP), as well as
+ custom encodings. You can select the encoding with the
+ <code>--char-encoding</code> command line option.</p>
+
+ <p>For the <code>wchar_t</code> character type the encoding is
+ automatically selected between UTF-16 and UTF-32/UCS-4 depending
+ on the size of the <code>wchar_t</code> type. On some platforms
+ (for example, Windows with Visual C++ and AIX with IBM XL C++)
+ <code>wchar_t</code> is 2 bytes long. For these platforms the
+ encoding is UTF-16. On other platforms <code>wchar_t</code> is 4 bytes
+ long and UTF-32/UCS-4 is used.</p>
<h3><a name="2.1.3">2.1.3 XML Schema Namespace</a></h3>
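
The size-based selection for `wchar_t` described in the paragraph above can be sketched as follows. This is an illustration under the assumption that `wchar_t` is either 2 or 4 bytes long, as the text states; `wchar_encoding_name` is a hypothetical helper and not part of the XSD runtime.

```cpp
#include <string>

// The text above only covers 2-byte and 4-byte wchar_t.
static_assert (sizeof (wchar_t) == 2 || sizeof (wchar_t) == 4,
               "unexpected wchar_t size");

// Hypothetical helper (not part of XSD): name the encoding that
// corresponds to the size of wchar_t on the current platform.
inline std::string wchar_encoding_name ()
{
  return sizeof (wchar_t) == 2 ? "UTF-16" : "UTF-32/UCS-4";
}
```

On a platform where `wchar_t` is 2 bytes long (for example, Windows with Visual C++) the helper returns `UTF-16`; where it is 4 bytes long it returns `UTF-32/UCS-4`.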
diff --git a/documentation/makefile b/documentation/makefile
index 0638928..81a26fe 100644
--- a/documentation/makefile
+++ b/documentation/makefile
@@ -20,6 +20,7 @@ $(install): $(out_base)/cxx/.install
$(call install-data,$(src_base)/future.xhtml,$(install_doc_dir)/xsd/future.xhtml)
$(call install-data,$(src_base)/schema-authoring-guide.xhtml,$(install_doc_dir)/xsd/schema-authoring-guide.xhtml)
$(call install-data,$(src_base)/xsd.xhtml,$(install_doc_dir)/xsd/xsd.xhtml)
+ $(call install-data,$(src_base)/custom-literals.xsd,$(install_doc_dir)/xsd/custom-literals.xsd)
$(call install-data,$(src_base)/xsd.1,$(install_man_dir)/man1/xsd.1)
# Dist.
@@ -32,6 +33,7 @@ $(dist-common):
$(call install-data,$(src_base)/xsd.1,$(dist_prefix)/documentation/xsd.1)
$(call install-data,$(src_base)/future.xhtml,$(dist_prefix)/documentation/future.xhtml)
$(call install-data,$(src_base)/schema-authoring-guide.xhtml,$(dist_prefix)/documentation/schema-authoring-guide.xhtml)
+ $(call install-data,$(src_base)/custom-literals.xsd,$(dist_prefix)/documentation/custom-literals.xsd)
$(dist): $(dist-common) $(out_base)/cxx/.dist
$(dist-win): $(dist-common) $(out_base)/cxx/.dist-win
diff --git a/documentation/xsd.1 b/documentation/xsd.1
index b84586d..1038d50 100644
--- a/documentation/xsd.1
+++ b/documentation/xsd.1
@@ -127,6 +127,34 @@ Valid values are
and
.BR wchar_t .
.
+.IP "\fB\--char-encoding \fIenc\fR"
+Specify the character encoding that should be used in the object model.
+Valid values for the
+.B char
+character type are
+.B utf8
+(default),
+.BR iso8859-1 , lcp
+(Xerces-C++ local code page),
+and
+.BR custom .
+If you pass
+.B custom
+as the value then you will need to include the transcoder implementation
+header for your encoding at the beginning of the generated header files
+(see the
+.B --hxx-prologue
+option).
+
+For the
+.B wchar_t
+character type the only valid value is
+.B auto
+and the encoding is automatically selected between UTF-16 and UTF-32/UCS-4,
+depending on the
+.B wchar_t
+type size.
+.
.IP "\fB\--output-dir \fIdir\fR"
Write generated files to
.I dir
@@ -450,6 +478,17 @@ in places where DLL export/import control statements (
.BR __declspec(dllexport/dllimport) )
are necessary.
+.IP "\fB\--custom-literals \fIfile\fR"
+Load custom XML string to C++ literal mappings from
+.IR file .
+This mechanism can be useful if you are using a custom character encoding
+and some of the strings in your schemas, for example element/attribute
+names or enumeration values, contain non-ASCII characters. In this case
+you will need to provide a custom mapping to C++ literals for such
+strings. The format of this file is specified in the
+.B custom-literals.xsd
+XML Schema file that can be found in the documentation directory.
+
.IP "\fB\--export-xml-schema\fR"
Export/import types in the XML Schema namespace using the export
symbol provided with the
diff --git a/documentation/xsd.xhtml b/documentation/xsd.xhtml
index 49d6503..da2b52c 100644
--- a/documentation/xsd.xhtml
+++ b/documentation/xsd.xhtml
@@ -125,6 +125,21 @@
instead of the default <code><b>char</b></code>. Valid values
are <code><b>char</b></code> and <code><b>wchar_t</b></code>.</dd>
+ <dt><code><b>--char-encoding</b> <i>enc</i></code></dt>
+ <dd>Specify the character encoding that should be used in the object
+ model. Valid values for the <code><b>char</b></code> character type
+ are <code><b>utf8</b></code> (default), <code><b>iso8859-1</b></code>,
+ <code><b>lcp</b></code> (Xerces-C++ local code page), and
+ <code><b>custom</b></code>. If you pass <code><b>custom</b></code> as
+ the value then you will need to include the transcoder implementation
+ header for your encoding at the beginning of the generated header
+ files (see the <code><b>--hxx-prologue</b></code> option).
+
+ <p>For the <code><b>wchar_t</b></code> character type the only valid
+ value is <code><b>auto</b></code> and the encoding is automatically
+ selected between UTF-16 and UTF-32/UCS-4, depending on the
+ <code><b>wchar_t</b></code> type size.</p></dd>
+
<dt><code><b>--output-dir</b> <i>dir</i></code></dt>
<dd>Write generated files to <code><i>dir</i></code> instead of
the current directory.</dd>
@@ -393,6 +408,18 @@
generated file for which there is no file-specific epilogue file.
</dd>
+ <dt><code><b>--custom-literals</b> <i>file</i></code></dt>
+ <dd>Load custom XML string to C++ literal mappings from
+ <code><i>file</i></code>. This mechanism can be useful if you
+ are using a custom character encoding and some of the strings
+ in your schemas, for example element/attribute names or enumeration
+ values, contain non-ASCII characters. In this case you will need
+ to provide a custom mapping to C++ literals for such
+ strings. The format of this file is specified in the
+ <code><b>custom-literals.xsd</b></code> XML Schema file that
+ can be found in the documentation directory.
+ </dd>
+
<dt><code><b>--export-symbol</b> <i>symbol</i></code></dt>
<dd>Insert <code><i>symbol</i></code> in places where DLL
export/import control statements