Integration of Kubernetes with Apache Airflow

There are two ways of using Apache Airflow with Kubernetes:
By using an operator, the KubernetesPodOperator:

  • It runs a specific task in a Kubernetes Pod, and the Kubernetes cluster can be external to where Airflow itself runs
  • It allows you to deploy arbitrary Docker images
  • You basically offload dependencies to containers (which is great!)
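To make the operator option concrete, here is a minimal sketch of a DAG with a single KubernetesPodOperator task. It assumes a recent Airflow 2.x with the cncf.kubernetes provider installed; the import path has moved between releases, so check the one for your version. The DAG id, task id, namespace, image and command are all illustrative values, not something from a real deployment.

```python
from datetime import datetime

# Keyword arguments for the pod task. Every value here is illustrative.
POD_TASK_KWARGS = {
    "task_id": "run_in_pod",
    "name": "run-in-pod",            # name given to the launched Pod
    "namespace": "default",          # namespace the Pod is created in
    "image": "python:3.11-slim",     # any Docker image can go here
    "cmds": ["python", "-c"],        # container entrypoint
    "arguments": ["print('hello from the pod')"],
    "get_logs": True,                # stream the container logs into the task log
}


def build_dag():
    """Build the DAG. Airflow is imported here so this module loads without it."""
    from airflow import DAG
    # Assumed import path for a recent provider release; older versions differ.
    from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

    with DAG(
        dag_id="pod_operator_sketch",
        start_date=datetime(2024, 1, 1),
        catchup=False,
    ) as dag:
        # The task's dependencies live in the image, not on the Airflow workers.
        KubernetesPodOperator(**POD_TASK_KWARGS)
    return dag
```

In a real DAG file you would call `dag = build_dag()` at module level so the scheduler can discover it.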

Or by using the KubernetesExecutor:

  • A new Pod is created for every task instance
  • You can customise your tasks (resource allocation)
  • Like with the KubernetesPodOperator, you offload dependencies to containers
  • You make your Airflow cluster dynamic! No more idle worker nodes wasting resources, as can happen with the CeleryExecutor.
  • Your Airflow cluster becomes fault tolerant (state recovery)
  • and so on
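To picture "one Pod per task instance" with per-task resource allocation, the sketch below builds a plain Kubernetes Pod manifest as a Python dict. This is not the executor's own code: the function name `pod_manifest_for_task`, the labels and the default values are hypothetical, and the real KubernetesExecutor generates its Pods from its own template. In Airflow 2.x, per-task overrides are normally passed through the task's `executor_config` under the `pod_override` key, and they apply to the container named `base`.

```python
import re


def pod_manifest_for_task(dag_id, task_id, image, cpu="500m", memory="512Mi"):
    """Return a Pod manifest (plain dict) for one task instance.

    Hypothetical helper for illustration only.
    """
    # Pod names must be lowercase alphanumerics and dashes.
    raw = f"{dag_id}-{task_id}".lower()
    name = re.sub(r"[^a-z0-9-]+", "-", raw).strip("-")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "labels": {"dag_id": dag_id, "task_id": task_id},
        },
        "spec": {
            # A task Pod runs once and is never restarted in place.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "base",
                    "image": image,
                    # Per-task resource allocation.
                    "resources": {
                        "requests": {"cpu": cpu, "memory": memory},
                        "limits": {"cpu": cpu, "memory": memory},
                    },
                }
            ],
        },
    }


manifest = pod_manifest_for_task(
    "example_etl", "load_data", "python:3.11-slim", cpu="1", memory="2Gi"
)
```

A heavy task can ask for more CPU and memory than a light one, and nothing is reserved once the Pod has finished.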

For a quick experiment, you can follow the tutorial I just made right here: https://marclamberti.com/blog/airflow-kubernetes-executor/

I hope it helps :)
Cheers


This is in flight right now. You can follow along with this major Jira ticket.

One of the more stable branches (work is being led by a lot of this team) is in the Bloomberg fork on GitHub, in the airflow-kubernetes-executor branch, though it is in the process of being rebased onto a constantly moving Airflow master.

I have a branch on my fork, called frankensteins-monster, that addresses many of the short-term issues and runs well enough. Use it at your own risk, though it works for me right now. I am building a Docker image using the build.sh script located in scripts/ci/kubernetes/docker.

Good luck!