From 901dfde45302b8e96209a9792644ade60e80be32 Mon Sep 17 00:00:00 2001 From: Boris Kolpackov Date: Fri, 28 Nov 2014 09:23:03 +0200 Subject: Document bulk operation support --- doc/manual.xhtml | 449 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 449 insertions(+) (limited to 'doc') diff --git a/doc/manual.xhtml b/doc/manual.xhtml index c89cb28..5966bbf 100644 --- a/doc/manual.xhtml +++ b/doc/manual.xhtml @@ -536,6 +536,7 @@ for consistency. 14.1.12transient 14.1.13sectionable 14.1.14deleted + 14.1.15bulk @@ -665,6 +666,7 @@ for consistency. +
15.1Transaction Callbacks
15.2Persistent Class Template Instantiations
15.3Bulk Database Operations
@@ -838,6 +840,7 @@ for consistency. 21.5.5Affected Row Count and DDL Statements 21.5.6Long Data and Auto Object Ids, ROWVERSION 21.5.7Long Data and By-Value Accessors/Modifiers + 21.5.8Bulk Update and ROWVERSION @@ -3679,6 +3682,12 @@ cerr << "Jane's id: " << jane_id << endl; threads in your application and to other applications as soon as possible.

+

Some database systems support persisting multiple objects with a + single underlying statement execution, which can result in significantly + improved performance. For such database systems, ODB provides + bulk persist() functions. For details, refer to + Section 15.3, "Bulk Database Operations".

+

3.9 Loading Persistent Objects

Once an object is made persistent, and you know its object id, it @@ -3898,6 +3907,11 @@ t.commit (); update() functions for such an object will result in a compile-time error.

+

Similar to persist(), for database systems that support + this functionality, ODB provides bulk update() functions. + For details, refer to Section 15.3, "Bulk Database + Operations".

+

3.11 Deleting Persistent Objects

To delete a persistent object's state from the database we use the @@ -3966,6 +3980,11 @@ t.commit (); instead. For a more detailed discussion of optimistic concurrency, refer to Chapter 12, "Optimistic Concurrency".

+

Similar to persist() and update(), for + database systems that support this functionality, ODB provides + bulk erase() functions. For details, refer to + Section 15.3, "Bulk Database Operations".

+

The erase_query() function allows us to delete the state of multiple objects matching certain criteria. It uses the query expression of the database::query() function @@ -4452,6 +4471,16 @@ namespace odb virtual const char* what () const throw (); }; + + // Bulk operation exceptions. + // + struct multiple_exceptions: exception + { + ... + + virtual const char* + what () const throw (); + }; } @@ -4557,6 +4586,10 @@ namespace odb to the corresponding object. See Chapter 9, "Sections" for more information on these exceptions.

+

The multiple_exceptions exception is thrown by the + bulk API functions. Refer to Section 15.3, "Bulk + Database Operations" for details.

+

The odb::exception class is defined in the <odb/exception.hxx> header file. All the concrete ODB exceptions are defined in @@ -13852,6 +13885,12 @@ class person 14.1.14 + + bulk + enable bulk operations for a persistent class + 14.1.15 + +

14.1.1 table

@@ -14487,6 +14526,13 @@ class employer refer to Section 13.4, "Soft Object Model Changes".

+

14.1.15 bulk

+ +

The bulk specifier enables bulk operation support for + the persistent class. The single required argument to this specifier + is the batch size. For more information on this functionality, refer + to Section 15.3, "Bulk Database Operations".
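For example (the batch size here is illustrative; this mirrors the usage shown in Section 15.3):

```cpp
#pragma db object bulk(5000)
class person
{
  ...
};
```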

+

14.2 View Type Pragmas

A pragma with the view qualifier declares a C++ class @@ -18188,6 +18234,401 @@ class derived: public string_base }; +

15.3 Bulk Database Operations

+ +

Some database systems supported by ODB provide a mechanism, often + called bulk or batch statement execution, that allows us to execute + the same SQL statement on multiple sets of data at once and with a + single database API call. This often results in significantly + better performance if we need to execute the same statement for a + large number of data sets (thousands to millions).

+ +

ODB translates this mechanism to bulk operations which allow + us to persist, update, or erase a range of objects in the database. + Currently, of all the database systems supported by ODB, only + Oracle and Microsoft SQL Server are capable of bulk operations. + There is also no emulation of the bulk API for other databases, + nor is it available with dynamic multi-database support. As a result, if + you are using dynamic multi-database support, you will need to + "drop down" to static support in order to access the bulk API. + Refer to Chapter 16, "Multi-Database Support" + for details.

+ +

As we will discuss later in this section, bulk operations have + complex failure semantics that are dictated by the underlying + database API. As a result, support for bulk persist, update, + and erase is limited to persistent classes for which these + operations can be performed with a single database statement + execution. In particular, bulk operations are not available + for polymorphic objects (Section 8.2, + "Polymorphism Inheritance") or objects that have + containers (inverse containers of object pointers are an + exception). Furthermore, for objects that have sections + (Chapter 9, "Sections") the bulk update operation + will only be available if all the sections are manually-updated. + On the other hand, bulk operations are supported for objects + that use optimistic concurrency (Chapter 12, + "Optimistic Concurrency") or have no object id + (Section 14.1.6, "no_id").

+ +

To enable the generation of bulk operation support for a persistent + class we use the bulk pragma. For example:

+ +
+#pragma db object bulk(5000)
+class person
+{
+  ...
+
+  #pragma db id auto
+  unsigned long id;
+};
+  
+ +

The single argument to the bulk pragma is the batch + size. The batch size specifies the maximum number of data sets + that should be handled with a single underlying statement execution. + If the range that we want to perform the bulk operation on contains + more objects than the batch size, then ODB will split this operation + into multiple underlying statement executions (batches). To illustrate + this point with an example, suppose we want to persist 53,000 objects + and the batch size is 5,000. ODB will then execute the statement + 11 times, the first 10 times with 5,000 data sets each, and the + last time with the remaining 3,000 data sets.
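The batching arithmetic described above can be sketched in standalone C++ (an illustration only, not part of the ODB API):

```cpp
#include <cstddef>

// Number of statement executions (batches) needed to handle n data
// sets with the given batch size, plus the size of the final, possibly
// partial, batch. Assumes n > 0 and batch_size > 0.
//
struct batch_plan
{
  std::size_t batches;
  std::size_t last_size;
};

batch_plan
plan (std::size_t n, std::size_t batch_size)
{
  batch_plan r;
  r.batches = (n + batch_size - 1) / batch_size; // Round up.
  std::size_t rem (n % batch_size);
  r.last_size = rem == 0 ? batch_size : rem;
  return r;
}

// The example from the text: plan (53000, 5000) yields 11 batches,
// the last one with 3,000 data sets.
```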

+ +

Commonly used batch sizes are in the 2,000-5,000 range, though + smaller or larger batches could provide better performance, + depending on the situation. As a result, it is recommended to + experiment with different batch sizes to determine the optimum + number for a particular object and its use-cases. Note also that + you may achieve better performance by splitting a large bulk + operation into multiple transactions (Section 3.5, + "Transactions").
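One way to split a large bulk operation into multiple transactions is to process the range in fixed-size chunks, committing after each. The following sketch is one possible approach; the chunks() helper and the 25,000 chunk size are illustrative, not part of ODB:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Split the range [0, n) into consecutive [begin, end) chunks of at
// most chunk_size elements each. Assumes chunk_size > 0. A chunk
// would normally be a multiple of the bulk batch size (for example,
// 25,000 for a 5,000 batch) so that each transaction executes a
// whole number of batches.
//
std::vector<std::pair<std::size_t, std::size_t>>
chunks (std::size_t n, std::size_t chunk_size)
{
  std::vector<std::pair<std::size_t, std::size_t>> r;
  for (std::size_t i (0); i < n; i += chunk_size)
    r.push_back (std::make_pair (i, std::min (i + chunk_size, n)));
  return r;
}

// Usage with ODB would then look along these lines (not compiled
// here since it requires a database instance):
//
// for (const auto& c: chunks (objs.size (), 25000))
// {
//   odb::transaction t (db.begin ());
//   db.persist (objs.begin () + c.first, objs.begin () + c.second);
//   t.commit ();
// }
```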

+ +

For database systems that do not support bulk operations the + bulk pragma is ignored. It is also possible to + specify different batch sizes for different database systems + by using the database prefix, for example:

+ +
+#pragma db object mssql:bulk(3000) oracle:bulk(4000)
+class person
+{
+  ...
+};
+  
+ +

Once the bulk support is enabled for a particular object, we can + use the following database functions to perform bulk + operations:

+ +
+template <typename I>
+void
+persist (I begin, I end, bool continue_failed = true);
+
+template <typename I>
+void
+update (I begin, I end, bool continue_failed = true);
+
+template <typename I>
+void
+erase (I obj_begin, I obj_end, bool continue_failed = true);
+
+template <typename T, typename I>
+void
+erase (I id_begin, I id_end, bool continue_failed = true);
+  
+ +

Every bulk API function expects a range of elements, passed in + the canonical C++ form as a pair of input iterators. In the case of + persist(), update(), and the first + erase() overload, we pass a range of objects, + either as references or as pointers, raw or smart. The following + example illustrates the most common scenarios using the + persist() call:

+ +
+// C array of objects.
+//
+person a[2] {{"John", "Doe"}, {"Jane", "Doe"}};
+
+db.persist (a, a + sizeof(a) / sizeof(a[0]));
+
+
+// Vector of objects.
+//
+std::vector<person> v {{"John", "Doe"}, {"Jane", "Doe"}};
+
+db.persist (v.begin (), v.end ());
+
+
+// C array of raw pointers to objects.
+//
+person p1 ("John", "Doe");
+person p2 ("Jane", "Doe");
+person* pa[2] {&p1, &p2};
+
+db.persist (pa, pa + sizeof(pa) / sizeof(pa[0]));
+
+
+// Vector of raw pointers to objects.
+//
+std::vector<person*> pv {&p1, &p2};
+
+db.persist (pv.begin (), pv.end ());
+
+
+// Vector of smart (shared) pointers to objects.
+//
+std::vector<std::shared_ptr<person>> sv {
+  std::make_shared<person> ("John", "Doe"),
+  std::make_shared<person> ("Jane", "Doe")};
+
+db.persist (sv.begin (), sv.end ());
+  
+ +

The ability to perform a bulk operation on a range of raw pointers + to objects can be especially useful when the application stores + objects in a way that does not easily conform to the pair of + iterators interface. In such cases we can create a temporary + container of shallow pointers to objects and use that to perform + the bulk operation, for example:

+ +
+struct person_entry
+{
+  person obj;
+
+  // Some additional data.
+  ...
+};
+
+typedef std::vector<person_entry> people;
+
+void
+persist (odb::database& db, people& p)
+{
+  std::vector<person*> tmp;
+  tmp.reserve (p.size ());
+  std::for_each (p.begin (),
+                 p.end (),
+                 [&tmp] (person_entry& pe)
+                 {
+                   tmp.push_back (&pe.obj);
+                 });
+
+
+  db.persist (tmp.begin (), tmp.end ());
+}
+  
+ +

The second overload of the bulk erase() function + allows us to pass a range of object ids rather than objects + themselves. As with the corresponding non-bulk version, we + have to specify the object type explicitly, for example:

+ +
+std::vector<unsigned long> ids {1, 2};
+
+db.erase<person> (ids.begin (), ids.end ());
+  
+ +

Conceptually, a bulk operation is equivalent to performing the + corresponding non-bulk version in a loop, except when it comes to the + failure semantics. Neither of the databases currently capable of + bulk operations (Oracle and SQL Server) stops when a data + set in a batch fails (for example, because of a unique constraint + violation). Instead, they continue executing subsequent data + sets until every element in the batch has been attempted. The + continue_failed argument in the bulk functions listed + above specifies whether ODB should extend this behavior and continue + with subsequent batches if the one it has tried to execute has failed + elements. The default behavior is to continue.

+ +

The consequence of this failure semantics is that multiple + elements in the range may fail for different reasons. + For example, if we tried to persist a number of objects, some + of them might have failed because they are already persistent + while others failed because of a unique constraint violation. + As a result, ODB uses the special odb::multiple_exceptions + exception to report failures in the bulk API functions. This + exception is thrown if one or more elements in the range have + failed and it contains the error information in the form of an + ODB exception for each failed position. The + multiple_exceptions class has the following interface:

+ +
+struct multiple_exceptions: odb::exception
+{
+  // Element type.
+  //
+  struct value_type
+  {
+    std::size_t
+    position () const;
+
+    const odb::exception&
+    exception () const;
+
+    bool
+    maybe () const;
+  };
+
+  // Iteration.
+  //
+  typedef set_type::const_iterator iterator;
+  typedef set_type::const_iterator const_iterator;
+
+  iterator
+  begin () const;
+
+  iterator
+  end () const;
+
+  // Lookup.
+  //
+  const value_type*
+  operator[] (std::size_t) const;
+
+  // Severity, failed and attempted counts.
+  //
+  std::size_t
+  attempted () const;
+
+  std::size_t
+  failed () const;
+
+  bool
+  fatal () const;
+
+  void
+  fatal (bool);
+
+  // Direct data access.
+  //
+  typedef std::set<value_type> set_type;
+
+  const set_type&
+  set () const;
+
+  // odb::exception interface.
+  //
+  virtual const char*
+  what () const throw ();
+};
+  
+ +

The multiple_exceptions class has a map-like interface + with the key being the position in the range and the value being + the exception plus the maybe flag (discussed below). + As a result, we can either iterate over the failed positions or + we can check whether a specific position in the range has failed. + The following example shows what a catch-handler for + this exception might look like:

+ +
+std::vector<person> objs {{"John", "Doe"}, {"Jane", "Doe"}};
+
+try
+{
+  db.persist (objs.begin (), objs.end ());
+}
+catch (const odb::multiple_exceptions& me)
+{
+  for (const auto& v: me)
+  {
+    size_t p (v.position ());
+
+    try
+    {
+      throw v.exception ();
+    }
+    catch (const odb::object_already_persistent&)
+    {
+      cerr << p << ": duplicate id: " << objs[p].id () << endl;
+    }
+    catch (const odb::exception& e)
+    {
+      cerr << p << ": " << e.what () << endl;
+    }
+  }
+}
+  
+ +

If, however, all we want is to show the diagnostics to the user, + then the string returned by the what() function + will contain the error information for each failed position. + Here is what it might look like (using Oracle as an example):

+ +
+multiple exceptions, 4 elements attempted, 2 failed:
+[0] object already persistent
+[3] 1: ORA-00001: unique constraint (ODB_TEST.person_last_i) violated
+  
+ +

Both databases currently capable of bulk operations return + only the total count of affected rows rather than individual counts for + each data set. This limitation prevents ODB from always + being able to determine which elements in the batch haven't affected + any rows and, for the update and erase operations, translate + this into the object_not_persistent exception. As + a result, if some elements in the batch haven't affected any + rows and ODB is unable to determine exactly which ones, it will mark + all the elements in this batch as "maybe not persistent". That + is, it will insert the object_not_persistent exception + and set the maybe flag for every position in the + batch. The diagnostics string returned by what() + will also reflect this situation, for example (assuming a batch + size of 3):

+ +
+multiple exceptions, 4 elements attempted, 4 failed:
+[0-2] (some) object not persistent
+[3] object not persistent
+  
+ +
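The attribution logic implied above can be illustrated with a standalone sketch (illustrative only, not actual ODB internals): per-position outcomes can be trusted only when the total affected row count matches the batch size.

```cpp
#include <cstddef>
#include <vector>

// Outcome for one position in a batch.
//
enum class outcome
{
  succeeded,
  maybe_not_persistent // Reported with the maybe flag set.
};

// Given the number of data sets in a batch and the total affected row
// count reported by the database, decide the per-position outcome. If
// the counts match, every data set affected a row; otherwise we only
// know that *some* data sets affected no rows, so every position in
// the batch must be marked "maybe not persistent".
//
std::vector<outcome>
attribute (std::size_t batch_size, std::size_t affected)
{
  std::vector<outcome> r (batch_size, outcome::succeeded);

  if (affected != batch_size)
    r.assign (batch_size, outcome::maybe_not_persistent);

  return r;
}
```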

The way to handle and recover from such "maybe failures" will have + to be application-specific. For example, for some applications the + fact that some objects no longer exist in the database when + performing bulk erase might be an ignorable error. If, however, + the application needs to determine exactly which elements in the batch + have failed, then a load() call will be required for each + element in the batch (or a query using a view to avoid loading all + the data members; Chapter 10, "Views"). This is also + something to keep in mind when selecting the batch size since for + larger sizes it will be more expensive (more loads to perform) to + handle such "maybe failures". If the failures are not uncommon, as + is the case, for example, when using optimistic concurrency, then + it may make sense to use a smaller batch.

+ +

The lookup operator (operator[]) returns NULL + if the element at this position has no exception. Note also that the + returned value is value_type* and not + odb::exception* in order to provide access to the + maybe flag discussed above.

+ +

The multiple_exceptions class also provides access + to the number of positions attempted (the attempted() + accessor) and failed (the failed() accessor). Note + that the failed count includes the "maybe failed" positions.

+ +

The multiple_exceptions exception can also be fatal. + If the fatal() accessor returns true, then + (some of) the exceptions were fatal. In this case, no attempts + were made to complete the operation even for positions that did + not fail, and the transaction must be aborted.

+ +

If fatal() returns false, then the operation on the + elements that don't have an exception has succeeded. The application + can ignore the errors or try to correct the errors and re-attempt + the operation on the elements that did fail. In either case, the + transaction can be committed.

+ +

The fatal() modifier allows you to "upgrade" an + exception to fatal, for example, for specific database error + codes.

+ @@ -23658,6 +24099,14 @@ namespace odb by-reference accessors and modifiers should be used for these data types.

+

21.5.8 Bulk Update and ROWVERSION

+ +

The bulk update operation (Section 15.3, "Bulk Database + Operations") is not yet supported for persistent classes that use + ROWVERSION-based optimistic concurrency. For such classes + the bulk update() function is not available. The bulk + persist and erase support is still provided.

+

21.6 SQL Server Index Definitions

When the index pragma (Section 14.7,