AWS Managed Airflow + EKS Scheduling, Part 2: Scheduling DAGs on EKS

posted-in:  Kubernetes   EKS   AWS   MWAA   Airflow  

If you completed part 1 of this guide and created the Managed Airflow environment on AWS, you are ready for the next level:

Part 2 - Enabling EKS scheduling

In Airflow we often use the KubernetesPodOperator in our DAGs. Airflow contacts the Kubernetes API server and asks it to spawn a pod to perform the task. In this part of the tutorial you will connect MWAA to an EKS cluster in the same VPC, and schedule an example pod on top of it.

Prerequisites:

You need an EKS cluster. If you are familiar with Terraform, you can create one in one click using my EKS module

IAM Considerations:

We will add another permission to the MWAA execution role from part 1.

locals {
  region = "xx-xxxx-x"
  account_id = "xxxxxxxxxx"
  eks_cluster_name = "xxxxxxxxxx"
}

resource "aws_iam_policy" "amazon_mwaa_eks_scheduling_policy" {
  name   = "amazon_mwaa_eks_scheduling_policy"
  path   = "/"
  policy = <<POLICY
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "eks:DescribeCluster"
            ],
            "Resource": "arn:aws:eks:${local.region}:${local.account_id}:cluster/${local.eks_cluster_name}"
        }
    ]
}
POLICY
}

Add aws_iam_policy.amazon_mwaa_eks_scheduling_policy.arn to the managed_policy_arns list of the execution role:

resource "aws_iam_role" "mwaa_role" {
...
  managed_policy_arns = [
    aws_iam_policy.amazon_mwaa_policy.arn,
    # Add the new policy:
    aws_iam_policy.amazon_mwaa_eks_scheduling_policy.arn,
  ]
}

The complete Terraform source can be found here

EKS Considerations:

MWAA is outside the cluster scope. For managed Airflow to reach the EKS API server, it needs an entry in the aws-auth ConfigMap in the kube-system namespace. We are essentially registering the MWAA execution role as a Kubernetes user:

    rolearn  = "arn:aws:iam::<account_id>:role/airflow-mwaa-role"
    username = "mwaa-service"
    groups   = ["system:masters"]
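If you prefer not to edit the ConfigMap by hand, eksctl can create the same mapping. This is a sketch: the cluster name, region, and role name below are the placeholders used throughout this guide, so substitute your own values.

```shell
# Register the MWAA execution role as the Kubernetes user "mwaa-service"
# in the cluster's aws-auth ConfigMap. <account_id> is a placeholder.
eksctl create iamidentitymapping \
  --cluster mwaa-eks \
  --region your-region \
  --arn "arn:aws:iam::<account_id>:role/airflow-mwaa-role" \
  --username mwaa-service \
  --group system:masters
```

This writes the same rolearn/username/groups entry shown above, without any manual YAML editing.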

You will also need to kubectl apply the RBAC Role and RoleBinding:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: mwaa-role
  namespace: default
rules:
  - apiGroups:
      - ""
      - "apps"
      - "batch"
      - "extensions"
    resources:      
      - "jobs"
      - "pods"
      - "pods/attach"
      - "pods/exec"
      - "pods/log"
      - "pods/portforward"
      - "secrets"
      - "services"
    verbs:
      - "create"
      - "delete"
      - "get"
      - "list"
      - "patch"
      - "update"
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: mwaa-role-binding
  namespace: default
subjects:
- kind: User
  name: mwaa-service
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: mwaa-role
  apiGroup: rbac.authorization.k8s.io
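To apply and sanity-check the RBAC objects, something like the following should work (mwaa-rbac.yaml is a hypothetical filename for the manifest above):

```shell
# Apply the Role and RoleBinding from the manifest above.
kubectl apply -f mwaa-rbac.yaml

# Verify the mapped user is actually allowed to manage pods;
# kubectl prints "yes" if the RoleBinding took effect.
kubectl auth can-i create pods --namespace default --as mwaa-service
```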

Final Step - Kubeconfig:

Now the last task is to give MWAA a kubeconfig to use. Generate one with:

aws eks update-kubeconfig \
--region your-region \
--kubeconfig ./kube_config.yaml \
--name mwaa-eks \
--alias aws

and upload the generated file to the source bucket, into the same path as the DAGs (note the trailing slash, so the file keeps its name):

aws s3 cp kube_config.yaml s3://my-mwaa-source/mwaa_source_example/dags/

You will reference this config file's path in DAGs that use the KubernetesPodOperator. View example
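A minimal DAG along these lines might look like the sketch below. The image, task names, and provider import path are illustrative assumptions, not the exact example DAG; what matters is config_file pointing at the uploaded kubeconfig and cluster_context matching the --alias used above. MWAA syncs the dags/ folder to /usr/local/airflow/dags on its workers, which is where the assumed path comes from.

```python
# dags/kubernetes_pod_example.py -- illustrative sketch, not the exact example DAG
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

# kube_config.yaml was uploaded next to the DAGs, so on an MWAA worker
# it lands under /usr/local/airflow/dags/.
KUBE_CONFIG_PATH = "/usr/local/airflow/dags/kube_config.yaml"

with DAG(
    dag_id="kubernetes_pod_example",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,  # trigger manually from the UI
    catchup=False,
) as dag:
    hello_pod = KubernetesPodOperator(
        task_id="hello-pod",
        name="mwaa-example-pod",
        namespace="default",
        image="ubuntu:20.04",
        cmds=["bash", "-cx"],
        arguments=["echo hello from EKS"],
        config_file=KUBE_CONFIG_PATH,
        cluster_context="aws",  # matches the --alias passed to update-kubeconfig
        get_logs=True,
        is_delete_operator_pod=True,
    )
```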

All Done!

Head over to the MWAA Airflow UI, then enable and trigger the kubernetes_pod_example DAG. Once the DAG has started, inspect the created pod:

kubectl get po -n default -w

You will see the pod being spawned by Airflow.

It didn’t work? Troubleshoot