Why should you consider Airflow for your data platform?

  • Cron jobs
  • .NET tools such as Hangfire and Quartz.NET
  • Jenkins
  • separation of concerns between orchestration and the release of tasks;
  • extensive control, logging, and monitoring of tasks;
  • language-agnostic, able to execute routines written in .NET, Bash, SQL, Python, JavaScript/Node.js, and others;
  • a short learning curve to start developing for it;
  • minimal effort to understand its configuration, deployment, and maintenance.
  • Jenkins is a good workflow runner platform, but it is in no way a tool for orchestrating complex scheduled workflows such as those required for data processing. It also needs other tools to be integrated before you can implement your workflows.
  • Kubernetes and cron jobs offer scheduled routines. That’s it. You’ll need to develop your own subsystem to organize workflows. It is almost the same problem as with Jenkins, just from the opposite side.
  • Hangfire, Quartz.NET, and similar tools are in the same category as Kubernetes jobs: they offer scheduling capability without real orchestration.

Airflow and alternatives

How we use Airflow

IaaC & Automation

LocalExecutor vs. CeleryExecutor vs. KubernetesExecutor

Airflow in a few words

  • DAGs: scheduled workflows are called DAGs (Directed Acyclic Graphs). A DAG may consist of many tasks, and in practice only the available resources limit their number (for example, real-world DAGs of 200+ tasks exist).
  • Tasks: they determine how a single routine should run in a DAG. You have full freedom to chain tasks and control the data flow through them. They can produce data for, and consume data from, related tasks. Tasks are implemented in Python and usually use standard or custom Operators.
  • XComs: small messages (no more than 48 KB) that can be passed from task to task during workflow execution.
  • Operators: they determine what should be done, the “work” that needs to be executed. There are many standard operators that let you execute Bash or Python scripts, SQL statements, Docker images, and more. You can also write your own operators.
A sample DAG representation in Airflow, with annotations showing what is what
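
To make the four terms above more concrete, here is a minimal toy model in plain Python. It is not Airflow code and does not use the Airflow API: the `ToyDag` class, its `add_task` and `run` methods, and the `MAX_XCOM_BYTES` constant are hypothetical names invented for this sketch, and the size check (based on `repr`) is only a rough stand-in for how a real system would measure a message. It shows tasks chained into an acyclic graph, each task doing its "work" through a callable (the role an operator plays), and small results handed from task to task (the role XComs play).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumption for this sketch: reuse the 48 KB message limit quoted above.
MAX_XCOM_BYTES = 48 * 1024


class ToyDag:
    """Toy stand-in for a DAG: tasks plus the dependencies between them."""

    def __init__(self):
        self.tasks = {}     # task_id -> callable doing the work ("operator")
        self.upstream = {}  # task_id -> set of task_ids that must run first
        self.xcoms = {}     # task_id -> small message produced by that task

    def add_task(self, task_id, work, after=()):
        """Register a task and the tasks it depends on."""
        self.tasks[task_id] = work
        self.upstream[task_id] = set(after)

    def run(self):
        """Run every task once, upstream tasks first; return the run order."""
        order = []
        # static_order() raises graphlib.CycleError if the graph has a cycle,
        # which is what makes this a directed *acyclic* graph.
        for task_id in TopologicalSorter(self.upstream).static_order():
            # Each task receives the messages of its direct upstream tasks.
            inputs = {up: self.xcoms[up] for up in self.upstream[task_id]}
            result = self.tasks[task_id](inputs)
            if len(repr(result).encode()) > MAX_XCOM_BYTES:
                raise ValueError(f"{task_id}: message too large to pass on")
            self.xcoms[task_id] = result
            order.append(task_id)
        return order


dag = ToyDag()
dag.add_task("extract", lambda inputs: [1, 2, 3])
dag.add_task(
    "transform",
    lambda inputs: [x * 10 for x in inputs["extract"]],
    after=["extract"],
)
dag.add_task("load", lambda inputs: sum(inputs["transform"]), after=["transform"])

order = dag.run()
print(order)              # ['extract', 'transform', 'load']
print(dag.xcoms["load"])  # 60
```

In this sketch the scheduler part is missing entirely: the graph runs once, in process, when `run()` is called. Airflow adds the scheduling, retries, logging, and monitoring around the same basic idea.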
