From 901dfde45302b8e96209a9792644ade60e80be32 Mon Sep 17 00:00:00 2001
From: Boris Kolpackov

 14.1.12 transient
 14.1.13 sectionable
+14.1.14 deleted
@@ -665,6 +666,7 @@ for consistency.
 14.1.15 bulk
@@ -838,6 +840,7 @@ for consistency.
 15.1 Transaction Callbacks
+15.2 Persistent Class Template Instantiations
 15.3 Bulk Database Operations
 21.5.5 Affected Row Count and DDL Statements
 21.5.6 Long Data and Auto Object Ids, ROWVERSION
+21.5.7 Long Data and By-Value Accessors/Modifiers
+21.5.8 Bulk Update and ROWVERSION
@@ -3679,6 +3682,12 @@ cerr << "Jane's id: " << jane_id << endl;
threads in your application and to other applications as soon as
possible.

Some database systems support persisting multiple objects with a
single underlying statement execution which can result in significantly
improved performance. For such database systems ODB provides bulk
persist() functions. For details, refer to Section 15.3, "Bulk
Database Operations".
Once an object is made persistent, and you know its object id, it
@@ -3898,6 +3907,11 @@ t.commit ();
update() functions for such an object will result in a
compile-time error.

Similar to persist(), for database systems that support this
functionality, ODB provides bulk update() functions. For details,
refer to Section 15.3, "Bulk Database Operations".
To delete a persistent object's state from the database we use the
@@ -3966,6 +3980,11 @@ t.commit ();
instead. For a more detailed discussion of optimistic concurrency,
refer to Chapter 12, "Optimistic Concurrency".

Similar to persist() and update(), for database systems that
support this functionality, ODB provides bulk erase() functions.
For details, refer to Section 15.3, "Bulk Database Operations".
The erase_query() function allows us to delete the state of
multiple objects matching certain criteria. It uses the query
expression of the database::query() function
@@ -4452,6 +4471,16 @@ namespace odb
virtual const char*
what () const throw ();
};
// Bulk operation exceptions.
//
struct multiple_exceptions: exception
{
  ...

  virtual const char*
  what () const throw ();
};
}
@@ -4557,6 +4586,10 @@ namespace odb
to the corresponding object. See Chapter 9, "Sections" for more
information on these exceptions.

The multiple_exceptions exception is thrown by the bulk API
functions. Refer to Section 15.3, "Bulk Database Operations"
for details.
The odb::exception class is defined in the <odb/exception.hxx>
header file. All the concrete ODB exceptions are defined in
@@ -13852,6 +13885,12 @@ class person
bulk

The bulk specifier enables bulk operation support for the
persistent class. The single required argument to this specifier
is the batch size. For more information on this functionality,
refer to Section 15.3, "Bulk Database Operations".
A pragma with the view qualifier declares a C++ class
@@ -18188,6 +18234,401 @@ class derived: public string_base
};
15.3 Bulk Database Operations

Some database systems supported by ODB provide a mechanism, often
called bulk or batch statement execution, that allows us to execute
the same SQL statement on multiple sets of data at once and with a
single database API call. This often results in significantly
better performance if we need to execute the same statement for a
large number of data sets (thousands to millions).
ODB translates this mechanism to bulk operations which allow us
to persist, update, or erase a range of objects in the database.
Currently, of all the database systems supported by ODB, only
Oracle and Microsoft SQL Server are capable of bulk operations.
There is also currently no emulation of the bulk API for other
databases nor dynamic multi-database support. As a result, if you
are using dynamic multi-database support, you will need to "drop
down" to static support in order to access the bulk API. Refer to
Chapter 16, "Multi-Database Support" for details.
As we will discuss later in this section, bulk operations have
complex failure semantics that is dictated by the underlying
database API. As a result, support for bulk persist, update,
and erase is limited to persistent classes for which these
operations can be performed with a single database statement
execution. In particular, bulk operations are not available
for polymorphic objects (Section 8.2, "Polymorphism Inheritance")
or objects that have containers (inverse containers of object
pointers are an exception). Furthermore, for objects that have
sections (Chapter 9, "Sections") the bulk update operation will
only be available if all the sections are manually-updated. On
the other hand, bulk operations are supported for objects that
use optimistic concurrency (Chapter 12, "Optimistic Concurrency")
or have no object id (Section 14.1.6, "no_id").
To enable the generation of bulk operation support for a
persistent class we use the bulk pragma. For example:
#pragma db object bulk(5000)
class person
{
  ...

  #pragma db id auto
  unsigned long id;
};
The single argument to the bulk pragma is the batch size. The
batch size specifies the maximum number of data sets that should
be handled with a single underlying statement execution. If the
range that we want to perform the bulk operation on contains more
objects than the batch size, then ODB will split this operation
into multiple underlying statement executions (batches). To
illustrate this point with an example, suppose we want to persist
53,000 objects and the batch size is 5,000. ODB will then execute
the statement 11 times, the first 10 times with 5,000 data sets
each, and the last time with the remaining 3,000 data sets.
The commonly used batch sizes are in the 2,000-5,000 range, though
smaller or larger batches could provide better performance,
depending on the situation. As a result, it is recommended to
experiment with different batch sizes to determine the optimum
number for a particular object and its use-cases. Note also that
you may achieve better performance by also splitting a large bulk
operation into multiple transactions (Section 3.5, "Transactions").
For database systems that do not support bulk operations the bulk
pragma is ignored. It is also possible to specify different batch
sizes for different database systems by using the database prefix,
for example:
#pragma db object mssql:bulk(3000) oracle:bulk(4000)
class person
{
  ...
};
Once the bulk support is enabled for a particular object, we can
use the following database functions to perform bulk operations:
template <typename I>
void
persist (I begin, I end, bool continue_failed = true);

template <typename I>
void
update (I begin, I end, bool continue_failed = true);

template <typename I>
void
erase (I obj_begin, I obj_end, bool continue_failed = true);

template <typename T, typename I>
void
erase (I id_begin, I id_end, bool continue_failed = true);
Every bulk API function expects a range of elements, passed in
the canonical C++ form as a pair of input iterators. In case of
persist(), update(), and the first erase() overload, we pass a
range of objects, either as references or as pointers, raw or
smart. The following example illustrates the most common scenarios
using the persist() call:
// C array of objects.
//
person a[2] {{"John", "Doe"}, {"Jane", "Doe"}};

db.persist (a, a + sizeof (a) / sizeof (a[0]));

// Vector of objects.
//
std::vector<person> v {{"John", "Doe"}, {"Jane", "Doe"}};

db.persist (v.begin (), v.end ());

// C array of raw pointers to objects.
//
person p1 ("John", "Doe");
person p2 ("Jane", "Doe");
person* pa[2] {&p1, &p2};

db.persist (pa, pa + sizeof (pa) / sizeof (pa[0]));

// Vector of raw pointers to objects.
//
std::vector<person*> pv {&p1, &p2};

db.persist (pv.begin (), pv.end ());

// Vector of smart (shared) pointers to objects.
//
std::vector<std::shared_ptr<person>> sv {
  std::make_shared<person> ("John", "Doe"),
  std::make_shared<person> ("Jane", "Doe")};

db.persist (sv.begin (), sv.end ());
The ability to perform a bulk operation on a range of raw pointers
to objects can be especially useful when the application stores
objects in a way that does not easily conform to the pair of
iterators interface. In such cases we can create a temporary
container of shallow pointers to objects and use that to perform
the bulk operation, for example:
struct person_entry
{
  person obj;

  // Some additional data.
  ...
};

typedef std::vector<person_entry> people;

void
persist (odb::database& db, people& p)
{
  std::vector<person*> tmp;
  tmp.reserve (p.size ());
  std::for_each (p.begin (),
                 p.end (),
                 [&tmp] (person_entry& pe)
                 {
                   tmp.push_back (&pe.obj);
                 });

  db.persist (tmp.begin (), tmp.end ());
}
The second overload of the bulk erase() function allows us to
pass a range of object ids rather than objects themselves. As
with the corresponding non-bulk version, we have to specify the
object type explicitly, for example:
std::vector<unsigned long> ids {1, 2};

db.erase<person> (ids.begin (), ids.end ());
Conceptually, a bulk operation is equivalent to performing the
corresponding non-bulk version in a loop, except when it comes to
the failure semantics. Both databases that are currently capable
of bulk operations (Oracle and SQL Server) do not stop when a data
set in a batch fails (for example, because of a unique constraint
violation). Instead, they continue executing subsequent data sets
until every element in the batch has been attempted. The
continue_failed argument in the bulk functions listed above
specifies whether ODB should extend this behavior and continue
with subsequent batches if the one it has tried to execute has
failed elements. The default behavior is to continue.
The consequence of this failure semantics is that we may have
multiple elements in the range failed for different reasons. For
example, if we tried to persist a number of objects, some of them
might have failed because they are already persistent while others
might have failed because of a unique constraint violation. As a
result, ODB uses the special odb::multiple_exceptions exception
to report failures in the bulk API functions. This exception is
thrown if one or more elements in the range have failed and it
contains the error information in the form of another ODB
exception for each failed position. The multiple_exceptions class
has the following interface:
struct multiple_exceptions: odb::exception
{
  // Element type.
  //
  struct value_type
  {
    std::size_t
    position () const;

    const odb::exception&
    exception () const;

    bool
    maybe () const;
  };

  // Iteration.
  //
  typedef std::set<value_type> set_type;
  typedef set_type::const_iterator iterator;
  typedef set_type::const_iterator const_iterator;

  iterator
  begin () const;

  iterator
  end () const;

  // Lookup.
  //
  const value_type*
  operator[] (std::size_t) const;

  // Severity, failed and attempted counts.
  //
  std::size_t
  attempted () const;

  std::size_t
  failed () const;

  bool
  fatal () const;

  void
  fatal (bool);

  // Direct data access.
  //
  const set_type&
  set () const;

  // odb::exception interface.
  //
  virtual const char*
  what () const throw ();
};
The multiple_exceptions class has a map-like interface with the
key being the position in the range and the value being the
exception plus the maybe flag (discussed below). As a result, we
can either iterate over the failed positions or we can check
whether a specific position in the range has failed. The following
example shows what a catch-handler for this exception might look
like:
std::vector<person> objs {{"John", "Doe"}, {"Jane", "Doe"}};

try
{
  db.persist (objs.begin (), objs.end ());
}
catch (const odb::multiple_exceptions& me)
{
  for (const auto& v: me)
  {
    size_t p (v.position ());

    try
    {
      throw v.exception ();
    }
    catch (const odb::object_already_persistent&)
    {
      cerr << p << ": duplicate id: " << objs[p].id () << endl;
    }
    catch (const odb::exception& e)
    {
      cerr << p << ": " << e.what () << endl;
    }
  }
}
If, however, all we want is to show the diagnostics to the user,
then the string returned by the what() function will contain the
error information for each failed position. Here is what it might
look like (using Oracle as an example):
multiple exceptions, 4 elements attempted, 2 failed:
[0] object already persistent
[3] 1: ORA-00001: unique constraint (ODB_TEST.person_last_i) violated
Both databases that are currently capable of bulk operations
return a total count of affected rows rather than individual
counts for each data set. This limitation prevents ODB from being
able to always determine which elements in the batch haven't
affected any rows and, for the update and erase operations,
translate this to the object_not_persistent exception. As a
result, if some elements in the batch haven't affected any rows
and ODB is unable to determine exactly which ones, it will mark
all the elements in this batch as "maybe not persistent". That is,
it will insert the object_not_persistent exception and set the
maybe flag for every position in the batch. The diagnostics string
returned by what() will also reflect this situation, for example
(assuming a batch size of 3):
multiple exceptions, 4 elements attempted, 4 failed:
[0-2] (some) object not persistent
[3] object not persistent
The way to handle and recover from such "maybe failures" will have
to be application-specific. For example, for some applications the
fact that some objects no longer exist in the database when
performing bulk erase might be an ignorable error. If, however,
the application needs to determine exactly which elements in the
batch have failed, then a load() call will be required for each
element in the batch (or a query using a view to avoid loading all
the data members; Chapter 10, "Views"). This is also something to
keep in mind when selecting the batch size since for larger sizes
it will be more expensive (more loads to perform) to handle such
"maybe failures". If the failures are not uncommon, as is the
case, for example, when using optimistic concurrency, then it may
make sense to use a smaller batch.
The lookup operator (operator[]) returns NULL if the element at
this position has no exception. Note also that the returned value
is value_type* and not odb::exception* in order to provide access
to the maybe flag discussed above.
The multiple_exceptions class also provides access to the number
of positions attempted (the attempted() accessor) and failed (the
failed() accessor). Note that the failed count includes the
"maybe failed" positions.
The multiple_exceptions exception can also be fatal. If the
fatal() accessor returns true, then (some of) the exceptions were
fatal. In this case, even for positions that did not fail, no
attempts were made to complete the operation and the transaction
must be aborted.
If fatal() returns false, then the operation on the elements that
don't have an exception has succeeded. The application can ignore
the errors or try to correct the errors and re-attempt the
operation on the elements that did fail. In either case, the
transaction can be committed.
The fatal() modifier allows you to "upgrade" an exception to
fatal, for example, for specific database error codes.
21.5.8 Bulk Update and ROWVERSION

The bulk update operation (Section 15.3, "Bulk Database
Operations") is not yet supported for persistent classes that use
ROWVERSION-based optimistic concurrency. For such classes the
bulk update() function is not available. The bulk persist and
erase support is still provided.
When the index pragma (Section 14.7,
--