Apache Ignite C++

The apacheignite-cpp Developer Hub

Welcome to the apacheignite-cpp developer hub. You'll find comprehensive guides and documentation to help you start working with apacheignite-cpp as quickly as possible, as well as support if you get stuck. Let's jump right in!

Get Started    

Distributed DML

Overview

Apache Ignite C++ SQL Grid not only allows selecting data from the Data Grid, using SQL ANSI-99 syntax, it makes it possible to modify that data with well-known DML statements like INSERT, UPDATE, or DELETE. By taking advantage of this ability, you can work with Apache Ignite In-Memory Data Fabric as with an in-memory distributed database fully relying on its SQL capabilities.

SQL ANSI-99 Compliance

DML queries, as well as all the SELECT queries, are SQL ANSI-99 compliant.

Ignite stores all the data in memory in form of key-value pairs and hence all the DML related operations are converted into corresponding cache key-value based commands like Cache.Put(...) or Cache.Invoke(...). Let's take a deep look at how the DML statements are implemented in Ignite.

DML API

In general, all the DML statements can be divided into two groups - Those that add new entries into a cache (INSERT and MERGE), and those that modify the existing data (UPDATE and DELETE).

To execute DML statements in C++ code, you should use the same Ignite API that is used for SELECT queries - SqlFieldsQuery API. This API is used by DML operations the same way it is used by read-only queries, where SqlFieldsQuery returns QueryFieldsCursor. The only difference is that as a result of a DML statement execution, QueryFieldsCursor contains a single-item QueryFieldsRow of int64_t type that signifies the number of cache items that were affected by the DML statement, whereas as a result of a SELECT statement, QueryFieldsCursor will contain a list of items retrieved from the cache.

Basic Configuration

DML does not require additional configuration on top of SQL Queries configuration.

In addition to all the fields defined with QueryEntity, there will be two special predefined fields _key and _val for every object type registered in SQL Grid. These predefined fields provide reference to key-value entries stored in a cache and can be used directly inside of DML statements.

// Preparing cache configuration.
IgniteConfiguration cfg;
Ignite grid = Ignition::Start(cfg);

// Starting the cache.
Cache<int32_t, Person> cache = grid.GetOrCreateCache<int32_t, Person>("myCache");

// Inserting a new key-value pair referring to prefedined `_key` and `_value`
// fields for Person type.
SqlFieldsQuery qry("INSERT INTO Person(_key, _val) VALUES(?, ?)");

qry.AddArgument(1);
qry.AddArgument(Person("John", "Smith"));

// Performing query.
cache.Query(qry);

If you prefer to work with concrete fields rather than the whole object value, you can execute a query like the one shown below:

Cache<int32_t, Person> cache = grid.GetOrCreateCache<int32_t, Person>("myCache");

SqlFieldsQuery qry("INSERT INTO Person(_key, firstName, lastName) VALUES(?, ?, ?)");

qry.AddArgument(1);
qry.AddArgument("John");
qry.AddArgument("Smith");

// Performing query.
cache.Query(qry);

Note that the DML engine will be able to recreate a Person object from firstName and lastName, and put it into the cache but those fields have to be defined using QueryEntity as described above.

Advanced Configuration

Custom Keys

If you use only predefined SQL data types for cache keys, as shown below, then there is no need to perform additional manipulation with DML related configuration.

Predefined SQL Data Types

  • all the signed integral types
  • bool,
  • float
  • double
  • std::string
  • ignite::Timestamp
  • ignite::Date
  • ignite::Guid
  • int8_t[]

However, once you decide to introduce a custom complex key and refer to its fields from DML statements, you have to set QueryField.IsKeyField to true in QueryEntity configuration.

<cacheConfiguration name="cars">
  <queryEntities>
	<queryEntity keyTypeName="CarKey" valueTypeName="Car">
	  <fields>
		<queryField fieldType="System.String" fieldTypeName="java.lang.String" isKeyField="true" name="VIN" />
		<queryField fieldType="System.Int32" fieldTypeName="java.lang.Integer" isKeyField="true" name="Id" />
		<queryField fieldType="System.String" fieldTypeName="java.lang.String" name="Make" />
		<queryField fieldType="System.Int32" fieldTypeName="java.lang.Integer" name="Year" />
	  </fields>
	</queryEntity>
  </queryEntities>
</cacheConfiguration>

DML Operations

MERGE

MERGE is one of the most straightforward operations because it is translated into cache.Put(...) and cache.PutAll(...) operations depending on the number of rows that need to be inserted or updated as part of the MERGE query.

The examples below show how to update the data set with a MERGE command by either providing a list of entries, or injecting a result of a subquery execution.

cache.Query(SqlFieldsQuery("MERGE INTO Person(_key, firstName, lastName)"
                           "values (1, 'John', 'Smith'), (5, 'Mary', 'Jones')"));
cache.QueryFields(new SqlFieldsQuery("MERGE INTO someCache.Person(_key, firstName, lastName) (SELECT _key + 1000, firstName, lastName " +
   	"FROM anotherCache.Person WHERE _key > ? AND _key < ?)", 100, 200);

INSERT

The difference between MERGE and INSERT commands is that the latter adds only those entries into a cache whose keys are not there yet.

If a single key-value pair is being added into a cache then, eventually, an INSERT statement will be converted into a cache.PutIfAbsent(...) operation. In other cases, when multiple key-value pairs are inserted, the DML engine creates an CacheEntryProcessor for each pair and uses cache.Invoke(...) to propagate the data into a cache.

The examples below show how to insert a data set with an INSERT command by either providing a list of entries or injecting a result of a subquery execution.

cache.Query(SqlFieldsQuery("INSERT INTO Person(_key, firstName, "
      "lastName) values (1, 'John', 'Smith'), (5, 'Mary', 'Jones')"));
SqlFieldsQuery qry("INSERT INTO someCache.Person(_key, firstName, lastName) "
                   "(SELECT _key + 1000, firstName, secondName "
                   "FROM anotherCache.Person WHERE _key > ? AND _key < ?)");

qry.AddArgument(100);
qry.AddArgument(200);
  
cache.Query(qry);

UPDATE

This operation updates values in a cache on per field basis.

Initially, SQL engine generates and executes a SELECT query based on the UPDATE WHERE clause and only after that it modifies the existing values that satisfy the clause result.

The modification is performed via cache.Invoke(...) operation. Basically, it means that once the result of the SELECT query is ready, SQL Engine will prepare a number of CacheEntryProcessors and will execute all of them using cache.invoke(...) operation. While the data is being modified using CacheEntryProcessors, additional checks are performed to make sure that nobody has interfered between the SELECT and the actual update.

The following example shows how to execute an UPDATE query in Apache Ignite.

cache.Query(SqlFieldsQuery("UPDATE Person set lastName = 'Jones' WHERE _key >= 2");

Inability to modify a key or its fields with an UPDATE query

The reason behind that is that the state of the key determines internal data layout and its consistency (key's hashing and affinity, indexes integrity). Hence, there is no way to update a key without removing it from cache. For example, the following query:

UPDATE _key = 11 where _key = 10;

may result in the following cache operations:

val = get(10);
put(11, val);
remove(10);

DELETE

DELETE statements' execution is split into two phases and is similar to the execution of UPDATE statements.

First, using a SELECT query, the SQL engine gathers those keys that satisfy the WHERE clause in the DELETE statement. Next, after having all those keys in place, it creates a number of CacheEntryProcessors and executes them with cache.Invoke(...). While the data is being deleted, additional checks are performed to make sure that nobody has interfered between the SELECT and the actual removal of the data.

The following example shows how to execute a DELETE query in Apache Ignite.

cache.Query(SqlFieldsQuery("DELETE FROM Person WHERE _key >= 2"));

Modifications Order

If a DML statement inserts/updates the whole value referring to _val field and at the same time tries to modify a field that belongs to _val, then the order in which the changes are applied is :

  • The _val is updated/inserted first.
  • The field gets updated.

The order never changes regardless of how you define it in the DML statement. For example, after the statement shown below gets executed, the final Person's value will be "Mike Smith", ignoring the fact that _val field appears after firstName in the query.

SqlFieldsQuery qry("INSERT INTO Person(_key, firstName, _val) VALUES(?, ?, ?)");

qry.AddArgument(1);
qry.AddArgument("Mike");
qry.AddArgument(Person("John", "Smith"));

cache.Query(qry);

This is similar to the execution of the query like the one below where _val appears before in the statement string.

SqlFieldsQuery qry("INSERT INTO Person(_key, _val, firstName) VALUES(?, ?, ?)");

qry.AddArgument(1);
qry.AddArgument(Person("John", "Smith"));
qry.AddArgument("Mike");

cache.Query(qry);

The order in which the changes are applied for _val and its fields is the same for INSERT, UPDATE and MERGE statements.

Concurrent Modifications

As explained above, UPDATE and DELETE statements generate SELECT queries internally in order to get a set of cache entries that have to be modified. The keys from the set are not locked and there is a chance that their values will be modified by other queries concurrently. A special technique is implemented by the DML engine that, first, avoids locking of keys and, second, guarantees that the values will be up-to-date at the time they will be updated by a DML statement.

Basically, the engine detects a subset of the cache entries which were modified concurrently and re-executes the SELECT statement limiting its scope to the modified keys only.

Let's say the following UPDATE statement is being executed.

// Adding the cache entry.
cache.Put(1, Person("John", "Smith");
          
// Updating the entry.          
cache.Query(SqlFieldsQuery("UPDATE Person set firstName = 'Mike'"
                           "WHERE lastName = 'Smith'");

Before firstName and lastName are updated, the DML engine will generate the SELECT query to get cache entries that satisfy theUPDATE statement's WHERE clause. The statement will be the following.

SELECT _key, _value, "Mike" from Person WHERE lastName = "Smith"

Right after that, the entry that was retrieved​ with the SELECT query can be updated concurrently.

cache.Put(1, Person("Sarah", "Connor"))

The DML engine will find out that the entry with key 1 was modified at the update phase of UPDATE query execution. After that, it will stop the update and will re-execute a modified version the SELECT query in order to get latest entries' values:

SELECT _key, _value, "Mike" from Person WHERE secondName = "Smith"
    AND _key IN (SELECT * FROM TABLE(KEY long = [ 1 ]))

This query will be executed only for outdated keys. In our example, there is only one key that is 1.

This process will repeat until the DML engine is sure at the update phase that all the entries, that are going to be updated, are up-to-date. The maximum number of attempts is 4. Presently there is no configuration parameter that can change this value.

DML engine does not re-execute the SELECT query for entries that are deleted concurrently​. The query is re-executed only for entries that are still in the cache.

Known Limitations

Subqueries in WHERE clause

SELECT queries used in INSERT and MERGE statements as well as SELECT queries automatically generated by UPDATE and DELETE operations will be distributed and executed in either colocated or non-colocated distributed modes if needed.

However, if there is a subquery that is executed as part of the WHERE clause, then it will not be executed in non-colocated distributed mode. The subquery will be executed in the colocated mode over the local data set all the times.

For example, in the following query:

DELETE FROM Person WHERE _key IN
    (SELECT personId FROM "salary".Salary s WHERE s.amount > 2000)

the DML engine will generate the SELECT query in order to get a list of entries that need to be deleted. The query will be distributed and executed across the cluster and will look like the one below:

SELECT _key, _val FROM Person WHERE _key IN
    (SELECT personId FROM "salary".Salary s WHERE s.amount > 2000)

However, the subquery from IN clause (SELECT personId FROM "salary".Salary ...) will not be distributed further and will be executed over the local data set present on a cluster node.

EXPLAIN support for DML statements

Presently, EXPLAIN is not supported for DML operations.

One possible approach is to execute EXPLAIN for the SELECT query that is automatically generated (UPDATE, DELETE) or used (INSERT, MERGE) by DML statements. This will give you an insight on the indexes that are used when a DML operation is executed.

Distributed DML