Apache Ignite C++ SQL Grid not only allows selecting data from the Data Grid, using SQL ANSI-99 syntax, it makes it possible to modify that data with well-known DML statements like INSERT, UPDATE, or DELETE. By taking advantage of this ability, you can work with Apache Ignite In-Memory Data Fabric as with an in-memory distributed database fully relying on its SQL capabilities.
SQL ANSI-99 Compliance
DML queries, as well as all the
SELECT queries, are SQL ANSI-99 compliant.
Ignite stores all the data in memory in form of key-value pairs and hence all the DML related operations are converted into corresponding cache key-value based commands like
Cache.Invoke(...). Let's take a deep look at how the DML statements are implemented in Ignite.
In general, all the DML statements can be divided into two groups - Those that add new entries into a cache (
MERGE), and those that modify the existing data (
To execute DML statements in C++ code, you should use the same Ignite API that is used for
SELECT queries -
SqlFieldsQuery API. This API is used by DML operations the same way it is used by read-only queries, where
QueryFieldsCursor. The only difference is that as a result of a DML statement execution,
QueryFieldsCursor contains a single-item
int64_t type that signifies the number of cache items that were affected by the DML statement, whereas as a result of a
QueryFieldsCursor will contain a list of items retrieved from the cache.
DML does not require additional configuration on top of SQL Queries configuration.
In addition to all the fields defined with
QueryEntity, there will be two special predefined fields
_val for every object type registered in SQL Grid. These predefined fields provide reference to key-value entries stored in a cache and can be used directly inside of DML statements.
// Preparing cache configuration. IgniteConfiguration cfg; Ignite grid = Ignition::Start(cfg); // Starting the cache. Cache<int32_t, Person> cache = grid.GetOrCreateCache<int32_t, Person>("myCache"); // Inserting a new key-value pair referring to prefedined `_key` and `_value` // fields for Person type. SqlFieldsQuery qry("INSERT INTO Person(_key, _val) VALUES(?, ?)"); qry.AddArgument(1); qry.AddArgument(Person("John", "Smith")); // Performing query. cache.Query(qry);
If you prefer to work with concrete fields rather than the whole object value, you can execute a query like the one shown below:
Cache<int32_t, Person> cache = grid.GetOrCreateCache<int32_t, Person>("myCache"); SqlFieldsQuery qry("INSERT INTO Person(_key, firstName, lastName) VALUES(?, ?, ?)"); qry.AddArgument(1); qry.AddArgument("John"); qry.AddArgument("Smith"); // Performing query. cache.Query(qry);
Note that the DML engine will be able to recreate a Person object from
lastName, and put it into the cache but those fields have to be defined using
QueryEntity as described above.
If you use only predefined SQL data types for cache keys, as shown below, then there is no need to perform additional manipulation with DML related configuration.
Predefined SQL Data Types
- all the signed integral types
However, once you decide to introduce a custom complex key and refer to its fields from DML statements, you have to set
<cacheConfiguration name="cars"> <queryEntities> <queryEntity keyTypeName="CarKey" valueTypeName="Car"> <fields> <queryField fieldType="System.String" fieldTypeName="java.lang.String" isKeyField="true" name="VIN" /> <queryField fieldType="System.Int32" fieldTypeName="java.lang.Integer" isKeyField="true" name="Id" /> <queryField fieldType="System.String" fieldTypeName="java.lang.String" name="Make" /> <queryField fieldType="System.Int32" fieldTypeName="java.lang.Integer" name="Year" /> </fields> </queryEntity> </queryEntities> </cacheConfiguration>
MERGE is one of the most straightforward operations because it is translated into
cache.PutAll(...) operations depending on the number of rows that need to be inserted or updated as part of the
The examples below show how to update the data set with a
MERGE command by either providing a list of entries, or injecting a result of a subquery execution.
cache.Query(SqlFieldsQuery("MERGE INTO Person(_key, firstName, lastName)" "values (1, 'John', 'Smith'), (5, 'Mary', 'Jones')"));
cache.QueryFields(new SqlFieldsQuery("MERGE INTO someCache.Person(_key, firstName, lastName) (SELECT _key + 1000, firstName, lastName " + "FROM anotherCache.Person WHERE _key > ? AND _key < ?)", 100, 200);
The difference between
INSERT commands is that the latter adds only those entries into a cache whose keys are not there yet.
If a single key-value pair is being added into a cache then, eventually, an
INSERT statement will be converted into a
cache.PutIfAbsent(...) operation. In other cases, when multiple key-value pairs are inserted, the DML engine creates an
CacheEntryProcessor for each pair and uses
cache.Invoke(...) to propagate the data into a cache.
The examples below show how to insert a data set with an
INSERT command by either providing a list of entries or injecting a result of a subquery execution.
cache.Query(SqlFieldsQuery("INSERT INTO Person(_key, firstName, " "lastName) values (1, 'John', 'Smith'), (5, 'Mary', 'Jones')"));
SqlFieldsQuery qry("INSERT INTO someCache.Person(_key, firstName, lastName) " "(SELECT _key + 1000, firstName, secondName " "FROM anotherCache.Person WHERE _key > ? AND _key < ?)"); qry.AddArgument(100); qry.AddArgument(200); cache.Query(qry);
This operation updates values in a cache on per field basis.
Initially, SQL engine generates and executes a
SELECT query based on the
UPDATE WHERE clause and only after that it modifies the existing values that satisfy the clause result.
The modification is performed via
cache.Invoke(...) operation. Basically, it means that once the result of the
SELECT query is ready, SQL Engine will prepare a number of
CacheEntryProcessors and will execute all of them using
cache.invoke(...) operation. While the data is being modified using
CacheEntryProcessors, additional checks are performed to make sure that nobody has interfered between the
SELECT and the actual update.
The following example shows how to execute an
UPDATE query in Apache Ignite.
cache.Query(SqlFieldsQuery("UPDATE Person set lastName = 'Jones' WHERE _key >= 2");
Inability to modify a key or its fields with an UPDATE query
The reason behind that is that the state of the key determines internal data layout and its consistency (key's hashing and affinity, indexes integrity). Hence, there is no way to update a key without removing it from cache. For example, the following query:
UPDATE _key = 11 where _key = 10;
may result in the following cache operations:
val = get(10);
DELETE statements' execution is split into two phases and is similar to the execution of
First, using a
SELECT query, the SQL engine gathers those keys that satisfy the
WHERE clause in the
DELETE statement. Next, after having all those keys in place, it creates a number of
CacheEntryProcessors and executes them with
cache.Invoke(...). While the data is being deleted, additional checks are performed to make sure that nobody has interfered between the
SELECT and the actual removal of the data.
The following example shows how to execute a
DELETE query in Apache Ignite.
cache.Query(SqlFieldsQuery("DELETE FROM Person WHERE _key >= 2"));
If a DML statement inserts/updates the whole value referring to
_val field and at the same time tries to modify a field that belongs to
_val, then the order in which the changes are applied is :
_valis updated/inserted first.
- The field gets updated.
The order never changes regardless of how you define it in the DML statement. For example, after the statement shown below gets executed, the final Person's value will be "Mike Smith", ignoring the fact that
_val field appears after
firstName in the query.
SqlFieldsQuery qry("INSERT INTO Person(_key, firstName, _val) VALUES(?, ?, ?)"); qry.AddArgument(1); qry.AddArgument("Mike"); qry.AddArgument(Person("John", "Smith")); cache.Query(qry);
This is similar to the execution of the query like the one below where
_val appears before in the statement string.
SqlFieldsQuery qry("INSERT INTO Person(_key, _val, firstName) VALUES(?, ?, ?)"); qry.AddArgument(1); qry.AddArgument(Person("John", "Smith")); qry.AddArgument("Mike"); cache.Query(qry);
The order in which the changes are applied for
_val and its fields is the same for
As explained above,
DELETE statements generate
SELECT queries internally in order to get a set of cache entries that have to be modified. The keys from the set are not locked and there is a chance that their values will be modified by other queries concurrently. A special technique is implemented by the DML engine that, first, avoids locking of keys and, second, guarantees that the values will be up-to-date at the time they will be updated by a DML statement.
Basically, the engine detects a subset of the cache entries which were modified concurrently and re-executes the
SELECT statement limiting its scope to the modified keys only.
Let's say the following
UPDATE statement is being executed.
// Adding the cache entry. cache.Put(1, Person("John", "Smith"); // Updating the entry. cache.Query(SqlFieldsQuery("UPDATE Person set firstName = 'Mike'" "WHERE lastName = 'Smith'");
lastName are updated, the DML engine will generate the
SELECT query to get cache entries that satisfy the
WHERE clause. The statement will be the following.
SELECT _key, _value, "Mike" from Person WHERE lastName = "Smith"
Right after that, the entry that was retrieved with the
SELECT query can be updated concurrently.
cache.Put(1, Person("Sarah", "Connor"))
The DML engine will find out that the entry with key
1 was modified at the update phase of
UPDATE query execution. After that, it will stop the update and will re-execute a modified version the
SELECT query in order to get latest entries' values:
SELECT _key, _value, "Mike" from Person WHERE secondName = "Smith" AND _key IN (SELECT * FROM TABLE(KEY long = [ 1 ]))
This query will be executed only for outdated keys. In our example, there is only one key that is
This process will repeat until the DML engine is sure at the update phase that all the entries, that are going to be updated, are up-to-date. The maximum number of attempts is
4. Presently there is no configuration parameter that can change this value.
DML engine does not re-execute the
SELECT query for entries that are deleted concurrently. The query is re-executed only for entries that are still in the cache.
SELECT queries used in
MERGE statements as well as
SELECT queries automatically generated by
DELETE operations will be distributed and executed in either colocated or non-colocated distributed modes if needed.
However, if there is a subquery that is executed as part of the
WHERE clause, then it will not be executed in non-colocated distributed mode. The subquery will be executed in the colocated mode over the local data set all the times.
For example, in the following query:
DELETE FROM Person WHERE _key IN (SELECT personId FROM "salary".Salary s WHERE s.amount > 2000)
the DML engine will generate the
SELECT query in order to get a list of entries that need to be deleted. The query will be distributed and executed across the cluster and will look like the one below:
SELECT _key, _val FROM Person WHERE _key IN (SELECT personId FROM "salary".Salary s WHERE s.amount > 2000)
However, the subquery from
IN clause (
SELECT personId FROM "salary".Salary ...) will not be distributed further and will be executed over the local data set present on a cluster node.
EXPLAIN is not supported for DML operations.
One possible approach is to execute
EXPLAIN for the
SELECT query that is automatically generated (
DELETE) or used (
MERGE) by DML statements. This will give you an insight on the indexes that are used when a DML operation is executed.