Integration of Kubernetes with Apache Airflow
There are two ways of using Apache Airflow with Kubernetes:
By using the KubernetesPodOperator:
- It runs a specific task in a Kubernetes Pod, and the Kubernetes cluster can be external to your Airflow deployment
- It allows you to deploy arbitrary Docker images
- You basically offload dependencies to containers (which is great!)
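As a rough sketch of what such a task looks like, the snippet below collects the arguments you would typically pass to a KubernetesPodOperator. The helper name `pod_operator_kwargs`, the image and the command are made up for illustration, and the operator's import path differs between Airflow versions, so the actual operator call is only shown as a comment.

```python
def pod_operator_kwargs(task_id, image, cmds, arguments=None,
                        namespace="default", in_cluster=False):
    """Hypothetical helper: gather keyword arguments for a KubernetesPodOperator."""
    return {
        "task_id": task_id,
        # Kubernetes object names may not contain underscores, so derive
        # the pod name from the task id with dashes instead.
        "name": task_id.replace("_", "-"),
        "namespace": namespace,
        "image": image,                 # any Docker image the cluster can pull
        "cmds": list(cmds),             # container entrypoint
        "arguments": list(arguments or []),
        "in_cluster": in_cluster,       # False: Airflow runs outside the cluster
        "get_logs": True,               # stream the pod's stdout into the task log
    }


# Assumed example values, not taken from the tutorial.
kwargs = pod_operator_kwargs(
    task_id="say_hello",
    image="python:3.11-slim",
    cmds=["python", "-c"],
    arguments=["print('hello from a pod')"],
)

# In a DAG file, after importing KubernetesPodOperator from the module
# that matches your Airflow version:
#   say_hello = KubernetesPodOperator(dag=dag, **kwargs)
```

The task's Python dependencies live in the image, not on the Airflow workers, which is the "offloading" mentioned above.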
Or by using the KubernetesExecutor:
- A new Pod is created for every task instance
- You can customise your tasks (for example, their resource allocation)
- As with the KubernetesPodOperator, you offload dependencies to containers
- You make your Airflow cluster dynamic! No more idle nodes wasting resources like with the Celery Executor.
- Your Airflow cluster becomes fault tolerant (state recovery)
- and so on
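To give an idea of the per-task customisation, here is a minimal sketch of an `executor_config` in the Airflow 1.10 style, where the settings sit under a `"KubernetesExecutor"` key. The helper name `k8s_executor_config` and the resource values are assumptions for illustration. Airflow 2.x expects a `pod_override` entry holding a pod object instead, so check which form your version uses.

```python
def k8s_executor_config(request_cpu, request_memory,
                        limit_cpu=None, limit_memory=None, image=None):
    """Hypothetical helper: build a 1.10-style executor_config for one task."""
    settings = {
        "request_cpu": request_cpu,
        "request_memory": request_memory,
    }
    optional = {
        "limit_cpu": limit_cpu,
        "limit_memory": limit_memory,
        "image": image,  # lets a single task run in a different image
    }
    # Only include the optional settings that were actually provided.
    settings.update({k: v for k, v in optional.items() if v is not None})
    return {"KubernetesExecutor": settings}


# Assumed values: a memory-hungry task asks for more than the default pod.
heavy = k8s_executor_config("500m", "1Gi", limit_memory="2Gi")

# In a DAG file the dict is passed to any operator, for example:
#   PythonOperator(task_id="crunch", python_callable=crunch,
#                  executor_config=heavy, dag=dag)
```

Tasks without an `executor_config` simply run in a pod with the executor's default settings.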
For a quick experiment, you can follow the tutorial I just made right here: https://marclamberti.com/blog/airflow-kubernetes-executor/
I hope it helps :)
Cheers
This is in flight right now. You can follow along with this major Jira ticket.
One of the more stable branches (the work is being led by a lot of this team) is in the Bloomberg fork on GitHub, in the airflow-kubernetes-executor branch, though it is in the process of being rebased off a constantly moving Airflow master.
I have a branch on my fork, called frankensteins-monster, that addresses many of the short-term issues and runs well enough. Use it at your own risk, though it works for me right now. I am building a Docker image using the build.sh script located in scripts/ci/kubernetes/docker.
Good luck!