From 5e527213a2430bb3018e5eebd909aef294edf9b5 Mon Sep 17 00:00:00 2001 From: Karen Arutyunov Date: Fri, 18 Dec 2020 18:48:46 +0300 Subject: Switch to build2 --- xsd/doc/xsd-epilogue.xhtml | 422 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 422 insertions(+) create mode 100644 xsd/doc/xsd-epilogue.xhtml (limited to 'xsd/doc/xsd-epilogue.xhtml') diff --git a/xsd/doc/xsd-epilogue.xhtml b/xsd/doc/xsd-epilogue.xhtml new file mode 100644 index 0000000..aef0418 --- /dev/null +++ b/xsd/doc/xsd-epilogue.xhtml @@ -0,0 +1,422 @@ +

NAMING CONVENTION

+ +

The compiler can be instructed to use a particular naming + convention in the generated code. A number of widely-used + conventions can be selected using the --type-naming + and --function-naming options. A custom + naming convention can be achieved using the + --type-regex, + --accessor-regex, + --one-accessor-regex, + --opt-accessor-regex, + --seq-accessor-regex, + --modifier-regex, + --one-modifier-regex, + --opt-modifier-regex, + --seq-modifier-regex, + --parser-regex, + --serializer-regex, + --const-regex, + --enumerator-regex, and + --element-type-regex options. +

+ +

The --type-naming option specifies the + convention that should be used for naming C++ types. Possible + values for this option are knr (default), + ucc, and java. The + knr value (stands for K&R) signifies + the standard, lower-case naming convention with the underscore + used as a word delimiter, for example: foo, + foo_bar. The ucc (stands + for upper-camel-case) and + java values are synonyms for the same + naming convention where the first letter of each word in the + name is capitalized, for example: Foo, + FooBar.

+ +

Similarly, the --function-naming option + specifies the convention that should be used for naming C++ + functions. Possible values for this option are knr + (default), lcc, and java. The + knr value (stands for K&R) signifies + the standard, lower-case naming convention with the underscore + used as a word delimiter, for example: foo(), + foo_bar(). The lcc value + (stands for lower-camel-case) signifies a naming convention + where the first letter of each word except the first is + capitalized, for example: foo(), fooBar(). + The java naming convention is similar to + the lower-camel-case one except that accessor functions are prefixed + with get, modifier functions are prefixed + with set, parsing functions are prefixed + with parse, and serialization functions are + prefixed with serialize, for example: + getFoo(), setFooBar(), + parseRoot(), serializeRoot().
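The name shapes described above can be sketched in a few lines of Python. This is only an illustration of what each convention looks like, not of how the compiler derives names from a schema. The helpers `type_name` and `function_name` and the `role` parameter are hypothetical and not part of xsd.

```python
# Hypothetical helpers that only illustrate the name shapes of each
# convention; they are not part of xsd and do not mirror its internals.

JAVA_PREFIXES = {
    "accessor": "get",
    "modifier": "set",
    "parser": "parse",
    "serializer": "serialize",
}


def type_name(words, convention="knr"):
    """Join words into a type name: knr -> foo_bar, ucc/java -> FooBar."""
    if convention == "knr":
        return "_".join(w.lower() for w in words)
    if convention in ("ucc", "java"):
        return "".join(w.capitalize() for w in words)
    raise ValueError(f"unknown type naming convention: {convention}")


def function_name(words, role, convention="knr"):
    """Join words into a function name for the given role."""
    if convention == "knr":
        return "_".join(w.lower() for w in words)
    camel = "".join(w.capitalize() for w in words)
    if convention == "lcc":
        # Lower-camel-case: every word capitalized except the first.
        return camel[:1].lower() + camel[1:]
    if convention == "java":
        # Java style: role-specific prefix followed by upper-camel-case.
        return JAVA_PREFIXES[role] + camel
    raise ValueError(f"unknown function naming convention: {convention}")
```

For instance, `function_name(["foo", "bar"], "modifier", "java")` yields the `setFooBar` shape mentioned in the text.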

+ +

Note that the naming conventions specified with the + --type-naming and + --function-naming options perform only limited + transformations on the names that come from the schema in the + form of type, attribute, and element names. In other words, to + get consistent results, your schemas should follow a naming + convention similar to the one you would like to have in the + generated code. Alternatively, you can use the + --*-regex options (discussed below) + to perform further transformations on the names that come from + the schema.

+ +

The + --type-regex, + --accessor-regex, + --one-accessor-regex, + --opt-accessor-regex, + --seq-accessor-regex, + --modifier-regex, + --one-modifier-regex, + --opt-modifier-regex, + --seq-modifier-regex, + --parser-regex, + --serializer-regex, + --const-regex, + --enumerator-regex, and + --element-type-regex options allow you to + specify extra regular expressions for each name category in + addition to the predefined set that is added depending on + the --type-naming and + --function-naming options. Expressions + that are provided with the --*-regex + options are evaluated prior to any predefined expressions. + This allows you to selectively override some or all of the + predefined transformations. When debugging your own expressions, + it is often useful to see which expressions match which names. + The --name-regex-trace option allows you + to trace the process of applying regular expressions to + names.

+ +

The value for the --*-regex options should be + a Perl-like regular expression in the form + /pattern/replacement/. + Any character can be used as a delimiter instead of /. + Escaping of the delimiter character in pattern or + replacement is not supported. + All the regular expressions for each category are pushed into a + category-specific stack with the last specified expression + considered first. The first match that succeeds is used. For the + --one-accessor-regex (accessors with cardinality one), + --opt-accessor-regex (accessors with cardinality optional), and + --seq-accessor-regex (accessors with cardinality sequence) + categories the --accessor-regex expressions are + used as a fallback. For the + --one-modifier-regex, + --opt-modifier-regex, and + --seq-modifier-regex + categories the --modifier-regex expressions are + used as a fallback. For the --element-type-regex + category the --type-regex expressions are + used as a fallback.
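The lookup order described above (last specified expression first, then the fallback category) can be modelled roughly as follows. This is a sketch, not the compiler's implementation. The class `RegexStacks` and its methods are hypothetical. It also assumes that an expression must match the whole name string, which the text does not state explicitly. Replacement text is returned as-is and not interpreted.

```python
import re

# Fallback categories as described in the text.
FALLBACK = {
    "one-accessor": "accessor",
    "opt-accessor": "accessor",
    "seq-accessor": "accessor",
    "one-modifier": "modifier",
    "opt-modifier": "modifier",
    "seq-modifier": "modifier",
    "element-type": "type",
}


def parse_expression(value):
    """Split '/pattern/replacement/'; the first character is the delimiter."""
    if len(value) < 3:
        raise ValueError(f"expression too short: {value!r}")
    delim = value[0]
    parts = value[1:].split(delim)
    # Delimiter escaping is not supported, so exactly three delimiters
    # must be present and nothing may follow the last one.
    if len(parts) != 3 or parts[2] != "":
        raise ValueError(f"malformed expression: {value!r}")
    return parts[0], parts[1]


class RegexStacks:
    """Hypothetical model of the per-category expression stacks."""

    def __init__(self):
        self._stacks = {}

    def push(self, category, value):
        self._stacks.setdefault(category, []).append(parse_expression(value))

    def candidates(self, category):
        """Yield expressions in the order they would be tried."""
        while category is not None:
            # Last pushed expression is considered first.
            yield from reversed(self._stacks.get(category, []))
            category = FALLBACK.get(category)

    def first_match(self, category, name):
        """Return the (pattern, replacement) pair selected for name."""
        for pattern, replacement in self.candidates(category):
            # Assumption: whole-string match.
            if re.fullmatch(pattern, name):
                return pattern, replacement
        return None
```

With `/(.+)/get_$1/` and then `/foo/the_foo/` pushed for `accessor`, the name `foo` selects the second expression because it was specified last, while a `one-accessor` lookup with no expressions of its own falls back to the `accessor` stack.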

+ +

The type name expressions (--type-regex) + are evaluated on the name string that has the following + format:

+ +

[namespace ]name[,name][,name][,name]

+ +

The element type name expressions + (--element-type-regex), effective only when + the --generate-element-type option is specified, + are evaluated on the name string that has the following + format:

+ +

namespace name

+ +

In the type name format the namespace part + followed by a space is only present for global type names. For + global types and elements defined in schemas without a target + namespace, the namespace part is empty but + the space is still present. In the type name format after the + initial name component, up to three additional + name components can be present, separated + by commas. For example:

+ +

http://example.com/hello type

+

foo

+

foo,iterator

+

foo,const,iterator

+ +

The following set of predefined regular expressions is used to + transform type names when the upper-camel-case naming convention + is selected:

+ +

/(?:[^ ]* )?([^,]+)/\u$1/

+

/(?:[^ ]* )?([^,]+),([^,]+)/\u$1\u$2/

+

/(?:[^ ]* )?([^,]+),([^,]+),([^,]+)/\u$1\u$2\u$3/

+

/(?:[^ ]* )?([^,]+),([^,]+),([^,]+),([^,]+)/\u$1\u$2\u$3\u$4/
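To see what these four expressions do to the example name strings, they can be emulated in Python. This is a sketch under two assumptions that the text does not spell out: an expression must match the whole name string, and `\u` upper-cases the first character of the substitution that follows it, as in Perl. The helpers `expand` and `transform` are hypothetical.

```python
import re

# The predefined upper-camel-case type expressions quoted above,
# as (pattern, replacement) pairs.
UCC_TYPE_EXPRESSIONS = [
    (r"(?:[^ ]* )?([^,]+)", r"\u$1"),
    (r"(?:[^ ]* )?([^,]+),([^,]+)", r"\u$1\u$2"),
    (r"(?:[^ ]* )?([^,]+),([^,]+),([^,]+)", r"\u$1\u$2\u$3"),
    (r"(?:[^ ]* )?([^,]+),([^,]+),([^,]+),([^,]+)", r"\u$1\u$2\u$3\u$4"),
]

# Matches $N, optionally preceded by the Perl \u modifier.
_REF = re.compile(r"(\\u)?\$(\d)")


def expand(replacement, match):
    """Expand $N references, upper-casing the first letter after \\u."""
    def substitute(ref):
        text = match.group(int(ref.group(2)))
        return text[:1].upper() + text[1:] if ref.group(1) else text
    return _REF.sub(substitute, replacement)


def transform(name, expressions):
    """Apply the first expression that matches the whole name string."""
    for pattern, replacement in expressions:
        m = re.fullmatch(pattern, name)
        if m:
            return expand(replacement, m)
    return name  # no expression matched; leave the name unchanged


for example in ("http://example.com/hello type", "foo",
                "foo,iterator", "foo,const,iterator"):
    print(example, "->", transform(example, UCC_TYPE_EXPRESSIONS))
```

Under these assumptions the namespace part is dropped and the remaining components are capitalized and concatenated, so `foo,const,iterator` becomes `FooConstIterator`.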

+ +

The accessor and modifier expressions + (--*accessor-regex and + --*modifier-regex) are evaluated on the name string + that has the following format:

+ +

name[,name][,name]

+ +

After the initial name component, up to two + additional name components can be present, + separated by commas. For example:

+ +

foo

+

dom,document

+

foo,default,value

+ +

The following set of predefined regular expressions is used to + transform accessor names when the java naming + convention is selected:

+ +

/([^,]+)/get\u$1/

+

/([^,]+),([^,]+)/get\u$1\u$2/

+

/([^,]+),([^,]+),([^,]+)/get\u$1\u$2\u$3/

+ +

For the parser, serializer, and enumerator categories, the + corresponding regular expressions are evaluated on local names of + elements and on enumeration values, respectively. For example, the + following predefined regular expression is used to transform parsing + function names when the java naming convention + is selected:

+ +

/(.+)/parse\u$1/
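The java-convention expressions for accessors and parsing functions can be tried out the same way. The sketch below also places one custom expression ahead of the predefined ones to mimic the rule that `--*-regex` expressions are evaluated first. The custom expression, the helper names, and the whole-string matching are assumptions made for illustration.

```python
import re

# Predefined java accessor and parser expressions quoted in the text.
JAVA_ACCESSOR = [
    (r"([^,]+)", r"get\u$1"),
    (r"([^,]+),([^,]+)", r"get\u$1\u$2"),
    (r"([^,]+),([^,]+),([^,]+)", r"get\u$1\u$2\u$3"),
]
JAVA_PARSER = [(r"(.+)", r"parse\u$1")]

# Hypothetical user expression, as if passed with --accessor-regex.
CUSTOM_ACCESSOR = [(r"dom,document", r"getDOMDocument")]

_REF = re.compile(r"(\\u)?\$(\d)")


def expand(replacement, match):
    """Expand $N references; \\u upper-cases the first letter (Perl style)."""
    def substitute(ref):
        text = match.group(int(ref.group(2)))
        return text[:1].upper() + text[1:] if ref.group(1) else text
    return _REF.sub(substitute, replacement)


def transform(name, predefined, custom=()):
    """Try custom expressions first, then the predefined ones."""
    for pattern, replacement in [*custom, *predefined]:
        m = re.fullmatch(pattern, name)  # assumption: whole-string match
        if m:
            return expand(replacement, m)
    return name


for example in ("foo", "dom,document", "foo,default,value"):
    print(example, "->", transform(example, JAVA_ACCESSOR))
print("root ->", transform("root", JAVA_PARSER))
print("dom,document ->", transform("dom,document", JAVA_ACCESSOR, CUSTOM_ACCESSOR))
```

Names not covered by the custom expression still go through the predefined set, which is what makes selective overriding possible.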

+ +

The const category is used to create C++ constant names for the + element/wildcard/text content ids in ordered types.

+ +

See also the REGEX AND SHELL QUOTING section below.

+ +

TYPE MAP

+ +

Type map files are used in C++/Parser to define a mapping between + XML Schema and C++ types. The compiler uses this information + to determine the return types of post_* + functions in parser skeletons corresponding to XML Schema + types as well as argument types for callbacks corresponding + to elements and attributes of these types.

+ +

The compiler has a set of predefined mapping rules that map + built-in XML Schema types to suitable C++ types (discussed + below) and all other types to void. + By providing your own type maps you can override these predefined + rules. The format of the type map file is presented below: +

+ +
+namespace <schema-namespace> [<cxx-namespace>]
+{
+  (include <file-name>;)*
+  ([type] <schema-type> <cxx-ret-type> [<cxx-arg-type>];)*
+}
+  
+ +

Both <schema-namespace> and + <schema-type> are regex patterns while + <cxx-namespace>, + <cxx-ret-type>, and + <cxx-arg-type> are regex pattern + substitutions. All names can be optionally enclosed in + " ", for example, to include whitespace.

+ +

<schema-namespace> determines XML + Schema namespace. Optional <cxx-namespace> + is prefixed to every C++ type name in this namespace declaration. + <cxx-ret-type> is a C++ type name that is + used as a return type for the post_* functions. + Optional <cxx-arg-type> is an argument + type for callback functions corresponding to elements and attributes + of this type. If + <cxx-arg-type> is not specified, it defaults + to <cxx-ret-type> if <cxx-ret-type> + ends with * or & (that is, + it is a pointer or a reference) and + const <cxx-ret-type>& + otherwise. + <file-name> is a file name either in the + " " or < > format + and is added with the #include directive to + the generated code.

+ +

The # character starts a comment that ends + with a new line or end of file. To specify a name that contains + # enclose it in " ". + For example:

+ +
+namespace http://www.example.com/xmlns/my my
+{
+  include "my.hxx";
+
+  # Pass apples by value.
+  #
+  apple apple;
+
+  # Pass oranges as pointers.
+  #
+  orange orange_t*;
+}
+  
+ +

In the example above, for the + http://www.example.com/xmlns/my#orange + XML Schema type, the my::orange_t* C++ type will + be used as both return and argument types.
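The defaulting rule for an omitted argument type can be written down as a small function. This is a sketch of the rule as stated in the text, not code taken from the compiler. The names `default_arg_type` and `resolve_types` are hypothetical.

```python
def default_arg_type(ret_type):
    """Derive the argument type when <cxx-arg-type> is not specified."""
    if ret_type.rstrip().endswith(("*", "&")):
        # Pointers and references are passed through unchanged.
        return ret_type
    # Everything else is passed by reference to const.
    return f"const {ret_type}&"


def resolve_types(ret_type, arg_type=None):
    """Return the (return type, argument type) pair for a map entry."""
    if arg_type is None:
        arg_type = default_arg_type(ret_type)
    return ret_type, arg_type


print(resolve_types("my::orange_t*"))
print(resolve_types("std::string"))
print(resolve_types("bool", "bool"))
```

The first call reproduces the orange case above, where the pointer type serves as both return and argument type. The second shows a non-pointer type receiving a reference-to-const argument type.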

+ +

Several namespace declarations can be specified in a single + file. The namespace declaration can also be completely + omitted to map types in a schema without a namespace. For + instance:

+ +
+include "my.hxx";
+apple apple;
+
+namespace http://www.example.com/xmlns/my
+{
+  orange "const orange_t*";
+}
+  
+ +

The compiler has a number of predefined mapping rules that can be + presented as the following map files. The string-based XML Schema + built-in types are mapped to either std::string + or std::wstring depending on the character type + selected with the --char-type option + (char by default).

+ +
+namespace http://www.w3.org/2001/XMLSchema
+{
+  boolean bool bool;
+
+  byte "signed char" "signed char";
+  unsignedByte "unsigned char" "unsigned char";
+
+  short short short;
+  unsignedShort "unsigned short" "unsigned short";
+
+  int int int;
+  unsignedInt "unsigned int" "unsigned int";
+
+  long "long long" "long long";
+  unsignedLong "unsigned long long" "unsigned long long";
+
+  integer "long long" "long long";
+
+  negativeInteger "long long" "long long";
+  nonPositiveInteger "long long" "long long";
+
+  positiveInteger "unsigned long long" "unsigned long long";
+  nonNegativeInteger "unsigned long long" "unsigned long long";
+
+  float float float;
+  double double double;
+  decimal double double;
+
+  string std::string;
+  normalizedString std::string;
+  token std::string;
+  Name std::string;
+  NMTOKEN std::string;
+  NCName std::string;
+  ID std::string;
+  IDREF std::string;
+  language std::string;
+  anyURI std::string;
+
+  NMTOKENS xml_schema::string_sequence;
+  IDREFS xml_schema::string_sequence;
+
+  QName xml_schema::qname;
+
+  base64Binary std::auto_ptr<xml_schema::buffer>
+               std::auto_ptr<xml_schema::buffer>;
+  hexBinary std::auto_ptr<xml_schema::buffer>
+            std::auto_ptr<xml_schema::buffer>;
+
+  date xml_schema::date;
+  dateTime xml_schema::date_time;
+  duration xml_schema::duration;
+  gDay xml_schema::gday;
+  gMonth xml_schema::gmonth;
+  gMonthDay xml_schema::gmonth_day;
+  gYear xml_schema::gyear;
+  gYearMonth xml_schema::gyear_month;
+  time xml_schema::time;
+}
+  
+ +

The last predefined rule maps anything that wasn't mapped by + previous rules to void:

+ +
+namespace .*
+{
+  .* void void;
+}
+  
+ + +

When you provide your own type maps with the + --type-map option, they are evaluated first. + This allows you to selectively override predefined rules.
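The evaluation order can be pictured as a single ordered rule list: user rules, then the predefined rules, then the catch-all that maps everything to void. The Python sketch below models this with only a few of the predefined rules. It assumes that the first matching rule wins and that patterns must match the whole namespace and type name. The `lookup` function and the tuple layout are hypothetical.

```python
import re

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# (namespace pattern, type pattern, return type, argument type).
# A small excerpt of the predefined rules plus the final catch-all.
PREDEFINED = [
    (XSD_NS, "boolean", "bool", "bool"),
    (XSD_NS, "int", "int", "int"),
    (XSD_NS, "QName", "xml_schema::qname", "const xml_schema::qname&"),
    (".*", ".*", "void", "void"),
]


def lookup(namespace, type_name, user_rules=()):
    """Return (ret, arg) from the first rule matching the schema type."""
    for ns_pat, type_pat, ret, arg in [*user_rules, *PREDEFINED]:
        if re.fullmatch(ns_pat, namespace) and re.fullmatch(type_pat, type_name):
            return ret, arg
    return None  # unreachable while the catch-all rule is present


# User rule modelled on the orange example from the text.
USER_RULES = [
    ("http://www.example.com/xmlns/my", "orange",
     "my::orange_t*", "my::orange_t*"),
]

MY_NS = "http://www.example.com/xmlns/my"
print(lookup(MY_NS, "orange"))              # falls through to void
print(lookup(MY_NS, "orange", USER_RULES))  # user rule takes precedence
print(lookup(XSD_NS, "int", USER_RULES))    # predefined rule still applies
```

Because user rules sit at the front of the list, they shadow only the types they match and leave every other predefined mapping in effect.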

+ +

REGEX AND SHELL QUOTING

+ +

When entering a regular expression argument in the shell + command line it is often necessary to use quoting (enclosing + the argument in " " or + ' ') in order to prevent the shell + from interpreting certain characters, for example, spaces as + argument separators and $ as variable + expansions.

+ +

Unfortunately it is hard to achieve this in a manner that is + portable across POSIX shells, such as those found on + GNU/Linux and UNIX, and the Windows shell. For example, if you + use " " for quoting you will get a + wrong result with POSIX shells if your expression contains + $. The standard way of dealing with this + on POSIX systems is to use ' ' instead. + Unfortunately, the Windows shell does not remove ' ' + from arguments when they are passed to applications. As a result you + may have to use ' ' for POSIX and + " " for Windows ($ is + not treated as a special character on Windows).

+ +

Alternatively, you can save regular expression options into + a file, one option per line, and use this file with the + --options-file option. With this approach + you don't need to worry about shell quoting.
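Both approaches can be sketched with the Python standard library. `shlex.quote` produces the single-quoted form suitable for POSIX shells, and the second helper renders the text of an options file. The helper names are hypothetical, and the layout of an options-file line (option, a space, then the value, with no quoting) is an assumption that goes beyond the "one option per line" stated above.

```python
import shlex


def posix_argument(option, expression):
    """Render an option and its regex, quoted for a POSIX shell."""
    return f"{option} {shlex.quote(expression)}"


def options_file_text(options):
    """Render (option, value) pairs, one option per line, unquoted."""
    # Assumption: option and value share a line, separated by a space.
    return "".join(f"{option} {value}\n" for option, value in options)


expression = r"/(.+)/parse\u$1/"

# Single quotes keep the shell from expanding $1.
print(posix_argument("--parser-regex", expression))

# The same expression needs no quoting inside an options file.
print(options_file_text([("--parser-regex", expression)]), end="")
```

The resulting text would be saved to a file and passed with --options-file, so the expression never passes through shell quoting.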

+ +

DIAGNOSTICS

+ +

If the input file is not a valid W3C XML Schema definition, + xsd will issue diagnostic messages to STDERR + and exit with a non-zero exit code.

+ +

BUGS

+ +

Send bug reports to the + xsd-users@codesynthesis.com mailing list.

+ + + + + + -- cgit v1.1