Applying data retention in MongoDB when using Siddhi

Working around Siddhi’s data representation to apply a MongoDB index with TTL

Alexander Goida
3 min read · Jan 4, 2021

Siddhi has no date/time type; it can represent date/time values only as strings or long integers. Systems that rely on date/time values to apply TTL logic therefore can’t be used with Siddhi without workarounds. For example, MongoDB lets you create indexes with a TTL setting that makes entries expire at a specific time, but such indexes require the ISODate type, which Siddhi doesn’t support. In this article I’ll explain a workaround for applying data retention in MongoDB when you use Siddhi.

Siddhi is an open source, complex event processing engine. It helps quickly build data pipelines (which have a small number of transformations) and real-time data scanners which incorporate complex patterns, like window and sequence patterns.

Approach Overview

Two problems need to be solved: #1 Siddhi cannot create ISODate fields in MongoDB collections, and #2 collections might be created by Siddhi on the fly, so we need an automatic way of creating the field and the index. To achieve this we need an additional routine in the ecosystem that is responsible for creating the indexes and the fields they require. The overall solution might look like the following picture; it is explained further in the text.

Creating an ISODate field

Siddhi can get the current time in milliseconds as a long value: time:timestampInMilliseconds(). We can store this value in a collection. In MongoDB it’s possible to create one field from another by using the db.collection.updateMany command with an aggregation pipeline (supported since MongoDB 4.2). We also need to skip all records which already have the new field.

For example, suppose you have a collection named test with a field dt which is written by Siddhi. To create a new field date of the ISODate type, you can use the following command.

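db.test.updateMany(
    // only touch records that do not have the date field yet
    {date: {$exists: false}},
    [
        {
            $set: {
                // convert the milliseconds stored in dt to an ISODate
                date: { $toDate: '$dt'}
            }
        }
    ]);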

This command skips all records which already have the date field and creates it in all other records, using the milliseconds in the dt field as the source of its values. To test it, insert a record first.

db.test.insert([{text: "b2", dt: new Date().getTime()}]);
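
To make the conversion more concrete, here is a minimal sketch in plain JavaScript of what $toDate does with a numeric input: it treats the number as milliseconds since the Unix epoch. The helper millisToDate and the sample value are assumptions made for illustration; they are not part of MongoDB or Siddhi.

```javascript
// Hypothetical helper: mirrors what $toDate does for a numeric input,
// i.e. it interprets the number as milliseconds since the Unix epoch.
function millisToDate(ms) {
  return new Date(ms);
}

// A fixed sample value standing in for what Siddhi would store in dt.
const dt = 1609718400000;
const date = millisToDate(dt);

console.log(date.toISOString()); // 2021-01-04T00:00:00.000Z
```

The resulting Date is what the shell stores as ISODate, which is the type a TTL index needs.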

Creating an index with TTL in MongoDB

MongoDB supports indexes with a TTL setting. The field the index is created for must be of the ISODate type. MongoDB has a background process which checks the indexed values and removes expired documents.

The TTL feature relies on a background thread in mongod that reads the date-typed values in the index and removes expired documents from the collection.

If you use a MongoDB replica set with secondary nodes, the background process deletes documents on the primary node only; the deletions are then replicated to the secondaries.
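
The expiry rule itself can be written as a small predicate. This is only a sketch of the logic, not MongoDB’s implementation: isExpired is a hypothetical helper, and the exact boundary handling is an assumption. Keep in mind that removal is not instantaneous, because the background task runs roughly once a minute, so a document with a TTL of a few seconds can stay in the collection noticeably longer.

```javascript
// Hypothetical helper illustrating the TTL rule: a document becomes
// eligible for removal once (indexed date + expireAfterSeconds) has passed.
function isExpired(dateValue, expireAfterSeconds, now) {
  // Values that are not dates (for example raw milliseconds) never expire,
  // which is why the long value written by Siddhi cannot be used directly.
  if (!(dateValue instanceof Date)) {
    return false;
  }
  const expiresAtMs = dateValue.getTime() + expireAfterSeconds * 1000;
  return expiresAtMs <= now.getTime();
}

const created = new Date("2021-01-04T00:00:00Z");
const fourSecondsLater = new Date(created.getTime() + 4000);
const sixSecondsLater = new Date(created.getTime() + 6000);

console.log(isExpired(created, 5, fourSecondsLater)); // false
console.log(isExpired(created, 5, sixSecondsLater)); // true
console.log(isExpired(created.getTime(), 5, sixSecondsLater)); // false
```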

In our case we need to create an index for the field date. Let’s create one which will expire records after 5 seconds.

db.test.createIndex(
    {
        date: 1
    },
    {
        name: "idx_ttl_date",
        expireAfterSeconds: 5
    }
);

The field might be missing in the collection; this won’t stop index creation. Documents that don’t have the field simply never expire.

Automating the Process

What such a system lacks is the ability to automatically create the fields in records and the indexes for new collections. To fill this gap we need a separate routine. For example, it might be a scheduled .NET Core or Python function. It should be able to do the following:

  1. Get a list of all collections in the database
  2. Check whether each collection has an index with TTL, and create it if it is missing
  3. Create the field which is used in the index

We need to follow some convention here, because a generic solution would be overcomplicated. We could agree on specific names for the fields and indexes. For example, the Siddhi field could be called retention_timestamp, the ISODate field could be called retention_date and the index could be called <collection_name>_ttl_index. A couple of commands will be helpful here as well:

db.getCollectionNames(); // returns a list of collections
db.test.getIndexes(); // returns indexes of a collection

The scheduled routine could take a database name, check all collections for the presence of the index, and create the new field, skipping records that have already been updated.
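
Such a routine could be sketched in shell-style JavaScript roughly as follows. It relies on the naming convention suggested above; the function applyRetention, its parameters and the report it returns are assumptions made for illustration. The extra filter on the source field is also my addition, so that records without a timestamp don’t get an empty date. The db argument is expected to behave like the shell’s database object.

```javascript
// Field names follow the convention suggested in the article.
const SOURCE_FIELD = "retention_timestamp"; // milliseconds written by Siddhi
const DATE_FIELD = "retention_date";        // ISODate field derived from it

// Hypothetical routine: ensures every collection has the TTL index
// and that every record with a timestamp also has the ISODate field.
function applyRetention(db, expireAfterSeconds) {
  const report = [];
  db.getCollectionNames().forEach(function (name) {
    if (name.startsWith("system.")) {
      return; // skip internal collections
    }
    const coll = db.getCollection(name);
    const indexName = name + "_ttl_index";

    // Create the TTL index only if it is not there yet.
    const hasIndex = coll.getIndexes().some(function (ix) {
      return ix.name === indexName;
    });
    if (!hasIndex) {
      coll.createIndex(
        { [DATE_FIELD]: 1 },
        { name: indexName, expireAfterSeconds: expireAfterSeconds }
      );
    }

    // Derive the ISODate field, skipping already updated records and
    // records that have no source timestamp (assumption, see lead-in).
    coll.updateMany(
      { [DATE_FIELD]: { $exists: false }, [SOURCE_FIELD]: { $exists: true } },
      [{ $set: { [DATE_FIELD]: { $toDate: "$" + SOURCE_FIELD } } }]
    );

    report.push({ collection: name, indexCreated: !hasIndex });
  });
  return report;
}

// Possible usage from the shell, with a made-up database name:
// applyRetention(db.getSiblingDB("events"), 3600);
```

Because the index check and the $exists filter make each run idempotent, the routine can be scheduled as often as needed; records written between two runs simply get their date field on the next run.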

Alexander Goida

Software Architect in Cloud Services and Data Solutions