Limit Airflow DAG Visibility By AD/LDAP Groups
I think there are two different problems posed here:
First, LDAP authentication. Airflow provides support for LDAP authentication built on ldap3. The example in the linked doc shows how to associate Airflow roles with LDAP groups (e.g., the data_profiler_filter part).
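As a rough sketch of what that group mapping looks like, the snippet below renders an airflow.cfg fragment for the LDAP backend. The host name, the bind credentials, and the group DNs (airflow-super-users, airflow-data-profilers) are placeholders I made up for illustration; substitute the values from your own directory.

```python
import configparser
import io

# Hypothetical directory layout: replace the host and DNs with your own.
GROUP_BASE = "OU=Groups,DC=example,DC=com"

config = configparser.ConfigParser()
config["webserver"] = {
    "authenticate": "True",
    "auth_backend": "airflow.contrib.auth.backends.ldap_auth",
}
config["ldap"] = {
    "uri": "ldaps://ldap.example.com:636",
    "user_filter": "objectClass=*",
    "user_name_attr": "uid",
    "group_member_attr": "memberOf",
    # Members of these (placeholder) groups get the matching Airflow role.
    "superuser_filter": f"memberOf=CN=airflow-super-users,{GROUP_BASE}",
    "data_profiler_filter": f"memberOf=CN=airflow-data-profilers,{GROUP_BASE}",
    "bind_user": "cn=Manager,dc=example,dc=com",
    "bind_password": "change-me",
    "basedn": "dc=example,dc=com",
}

# Render the sections as text that could be merged into airflow.cfg.
buffer = io.StringIO()
config.write(buffer)
airflow_cfg_fragment = buffer.getvalue()
print(airflow_cfg_fragment)
```

Note that these filters only decide who is a superuser or data profiler; they do not restrict which DAGs a user can see.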
Second, restricting DAG access by group. At the time of writing, the current version of Airflow (1.9) doesn't support limiting the visibility of DAGs by group. The recent work on role-based access control (RBAC) changes this. I've listed three options for addressing this problem below.
Option 1 - RBAC (most control, available in Airflow ≥ 1.10)
The new RBAC features add support for permissions like this and are the best option for fine-grained control. They use a permission system built on Flask App Builder. The work was created by a company with a use case very similar to the one you mentioned, which is discussed in more detail in the Jira issue.
More info can be found in:
- RBAC proposal
- AIRFLOW-85 - Create DAGs UI
- PR #3015
The RBAC webserver UI is available on master now in airflow/www_rbac. Other features around RBAC are also being actively developed to further improve security in a multi-tenancy setup.
Note: There's also ongoing work on a new DAG-level access control (DLAC) feature in AIRFLOW-2267 which builds upon the RBAC work to introduce even more fine-grained control. More info can be found in the design doc and PR #3197.
Option 2 - Multi-tenancy with owners (simplest, available in Airflow < 1.10)
A second option you can consider for medium-grained control is a multi-tenancy setup using webserver.filter_by_owner and setting one explicit owner (a user, not a group) for each DAG. "With this, a user will see only the dags which it is owner of, unless it is a superuser."
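To make the rule in that quote concrete, here is a small plain-Python model of it. This is not Airflow's own code, just an illustration of the behaviour described; the DAG ids, user names, and the visible_dags helper are hypothetical. In a real DAG file the owner is normally set through the owner key in default_args.

```python
# Hypothetical DAG id -> owner mapping. In a real DAG file the owner usually
# comes from default_args={"owner": "<username>"}.
dag_owners = {
    "sales_etl": "alice",
    "ml_training": "bob",
    "finance_report": "alice",
}


def visible_dags(dag_owners, username, is_superuser=False):
    """Simplified model of filter_by_owner: superusers see everything,
    everyone else sees only the DAGs whose owner matches their username."""
    if is_superuser:
        return sorted(dag_owners)
    return sorted(dag_id for dag_id, owner in dag_owners.items() if owner == username)


print(visible_dags(dag_owners, "alice"))
print(visible_dags(dag_owners, "bob"))
print(visible_dags(dag_owners, "admin", is_superuser=True))
```

Because the match is on a single owner name, a DAG shared by a whole team needs some shared account or naming convention, which is why this option only gives medium-grained control.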
Aside: A related feature you might be interested in is running tasks as a specific user with impersonation, using run_as_user or core.default_impersonation.
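Both webserver.filter_by_owner and core.default_impersonation are airflow.cfg settings, and Airflow also reads settings from environment variables named AIRFLOW__{SECTION}__{KEY}. The sketch below builds those variable names; the airflow_env_var helper and the airflow_tasks OS user are names I made up for illustration.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name Airflow uses to override
    the setting [section] key in airflow.cfg."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Settings discussed above; "airflow_tasks" is a hypothetical OS user.
settings = {
    ("webserver", "filter_by_owner"): "True",
    ("core", "default_impersonation"): "airflow_tasks",
}

# Shell export lines that could go into the webserver/worker environment.
exports = [
    f"export {airflow_env_var(section, key)}={value}"
    for (section, key), value in settings.items()
]
print("\n".join(exports))
```

run_as_user, by contrast, is set per task as an operator argument rather than in the configuration file.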
Option 3 - Run multiple separate Airflow instances (highest isolation)
A third option for coarse-grained control that some companies choose is to run multiple separate Airflow instances, one per team. This is probably the most practical option for those looking to run multiple teams' DAGs in isolation today. If you happen to use Astronomer Enterprise, we support spinning up multiple Airflow instances.