Applying data retention in MongoDB when using Siddhi

Working around Siddhi’s data representation to apply a MongoDB index with TTL

Siddhi doesn’t support a date/time type and can only represent date/time values as strings or long integers. In systems which rely on date/time values to apply TTL logic, Siddhi can’t be used without workarounds. For example, MongoDB can create indexes with a TTL setting which makes entries expire at a specific time, but such indexes require a field of the ISODate type, which Siddhi doesn’t support. In this article I’ll explain a workaround to apply data retention in MongoDB when you use Siddhi.

Approach Overview

There are two problems to solve: #1 Siddhi cannot create ISODate fields in MongoDB collections, and #2 collections might be created by Siddhi on the fly, so we need an automatic way of creating the field and the index. To achieve this we need an additional routine in the ecosystem which is responsible for creating the indexes and the fields they require. The overall solution might look like the following picture. It is explained further in the text.

Creating an ISODate field

Siddhi can return the current time in milliseconds as a long value: time:timestampInMilliseconds(). We can use it to store the value in a collection. MongoDB can then create one field from another by means of the command db.collection.updateMany and its aggregation pipeline (pipeline updates are available since MongoDB 4.2). We also need to skip all records which already have the new field. In the example, dt holds the milliseconds and date is the new ISODate field.

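db.test.updateMany(
  {date: {$exists: false}},             // skip records which already have the field
  [
    {$set: {date: {$toDate: '$dt'}}}    // convert milliseconds to ISODate
  ]
);
// a sample record to test with
db.test.insert([{text: "b2", dt: new Date().getTime()}]);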
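
When field names differ between collections, the filter and the pipeline can be generated instead of hard-coded. The following is only a minimal sketch: buildDateBackfill is a hypothetical helper name, and dt and date are simply the field names used in this article.

```javascript
// Hypothetical helper: builds the two arguments for updateMany() that copy a
// millisecond timestamp field into a Date field.
function buildDateBackfill(sourceField, targetField) {
  return {
    // Only match documents that do not have the target field yet.
    filter: { [targetField]: { $exists: false } },
    // Pipeline-style update: $toDate converts milliseconds since epoch to a Date.
    pipeline: [{ $set: { [targetField]: { $toDate: "$" + sourceField } } }],
  };
}

const backfill = buildDateBackfill("dt", "date");
// In the MongoDB shell: db.test.updateMany(backfill.filter, backfill.pipeline);
console.log(JSON.stringify(backfill));
```

The helper itself does not talk to MongoDB, so it can be reused by whichever scheduled routine performs the update.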

Creating an index with TTL in MongoDB

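MongoDB supports indexes with a TTL setting. Such an index must be created on a field of the ISODate type. MongoDB has a background process which checks the indexed field and removes records once they are older than expireAfterSeconds. The process runs about once a minute, so records are not removed at the exact moment they expire.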

db.test.createIndex(
  {date: 1},
  {name: "idx_ttl_date", expireAfterSeconds: 5}
);
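
To make the expiry rule concrete: a record becomes eligible for removal once the value of the indexed field plus expireAfterSeconds lies in the past. The sketch below only illustrates that arithmetic in plain JavaScript. isExpired is a hypothetical name, and MongoDB applies the rule on the server side by itself.

```javascript
// Hypothetical helper illustrating the TTL rule; it does not talk to MongoDB.
function isExpired(dateValue, expireAfterSeconds, now) {
  const expiresAtMs = dateValue.getTime() + expireAfterSeconds * 1000;
  return expiresAtMs < now.getTime();
}

const created = new Date("2024-01-01T00:00:00Z");
const fourSecondsLater = new Date("2024-01-01T00:00:04Z");
const sixSecondsLater = new Date("2024-01-01T00:00:06Z");

console.log(isExpired(created, 5, fourSecondsLater)); // false
console.log(isExpired(created, 5, sixSecondsLater));  // true
```

With expireAfterSeconds set to 5 as above, records live for only a few seconds, which is handy for testing. A real retention period would be much longer.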

Automating the Process

The gap in such a system is that nothing automatically creates the field in new records or the index for new collections. To close it we need a separate routine. For example, it might be a scheduled .NET Core or Python function. It should be able to do the following for the collections in the database:

  1. Check whether they have an index with TTL
  2. Create the field which is used in the index

db.getCollectionNames(); // returns a list of collections
db.test.getIndexes(); // returns indexes of a collection
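
Putting the pieces together, the routine could be sketched as below. This is an illustration under assumptions, not a definitive implementation: ensureTtl is a hypothetical name, the db argument is expected to behave like the db object of the MongoDB shell, and the field names, index name and TTL value are taken from the examples in this article.

```javascript
// Sketch of the scheduled routine. `db` must offer the same methods as the
// MongoDB shell `db` object (getCollectionNames, getCollection).
function ensureTtl(db, { sourceField = "dt", targetField = "date", ttlSeconds = 5 } = {}) {
  for (const name of db.getCollectionNames()) {
    // Leave MongoDB's internal collections alone.
    if (name.startsWith("system.")) continue;
    const coll = db.getCollection(name);

    // 1. Check whether the collection already has a TTL index on the field.
    const hasTtl = coll.getIndexes().some(
      (idx) => typeof idx.expireAfterSeconds === "number" &&
               idx.key && idx.key[targetField] !== undefined
    );
    if (!hasTtl) {
      coll.createIndex(
        { [targetField]: 1 },
        { name: "idx_ttl_" + targetField, expireAfterSeconds: ttlSeconds }
      );
    }

    // 2. Create the Date field in records which do not have it yet.
    coll.updateMany(
      { [targetField]: { $exists: false } },
      [{ $set: { [targetField]: { $toDate: "$" + sourceField } } }]
    );
  }
}
```

In the shell it would be called as ensureTtl(db). Records written by Siddhi between two runs only get the date field on the next run, so the effective retention is the TTL plus up to one scheduling interval.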
