Deploying Airflow 2 on EKS using Terraform, Helm and ArgoCD — Part 1/2

Learn how to deploy Apache Airflow 2.x on Kubernetes using ArgoCD, git-sync, Terraform and Helm charts.

This is part 1/2 of the article. If you are looking for the second part, please check it here.

There are plenty of tools available for Data Engineering tasks. Thanks to the internet, we can find several tutorials about how to install those tools and create some deployments ourselves.

Nevertheless, integrating such tools into a complete deployment is sometimes not straightforward. The idea of this article is to show you how to deploy Apache Airflow 2.x on an AWS EKS cluster using Helm charts and ArgoCD.

In this article I hope to help you understand how to integrate these amazing tools, how to deploy Helm charts on EKS using Terraform, and how to deploy other Helm charts using ArgoCD with git-sync.

Prerequisites

Important

If you follow this tutorial, you may be billed for the AWS resources you use.

The Stack

Since there are many tutorials on the internet about each of the tools we are going to use here, I will describe each of them only briefly:

  • Terraform: it is an open-source infrastructure as code software tool that provides a consistent CLI workflow to manage hundreds of cloud services. Terraform codifies cloud APIs into declarative configuration files.
  • AWS EKS: Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that you can use to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane or nodes. Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications.
  • Helm Chart: A chart is a collection of files that describe a related set of Kubernetes resources. A single chart might be used to deploy something simple, like a memcached pod, or something complex, like a full web app stack with HTTP servers, databases, caches, and so on.
  • Apache Airflow: Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.
  • ArgoCD: Argo CD automates the deployment of the desired application states in the specified target environments. Application deployments can track updates to branches, tags, or pinned to a specific version of manifests at a Git commit.

Project Structure

  • airflow: contains the DAGs we want to deploy in our Apache Airflow environment;
  • infrastructure: contains all infrastructure configuration files for Terraform and Kubernetes;
  • kubernetes: contains all the Helm configuration we want to deploy using ArgoCD;
  • terraform: contains the Terraform configuration files to bring the cloud infrastructure up

Terraform Setup

Inside the terraform folder we will create two more folders: infra and applications.

  • infra: contains general infrastructure configuration like VPC, EKS and RDS
  • applications: contains terraform configuration files for Helm Provider

Since we have a lot of files to configure, I will mention only the most important parts of the code here. You can find the full project at the link below:

  • infrastructure/terraform/versions.tf

We need to set the terraform version we are going to use and which providers we want to be installed.
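As a hedged sketch (the exact version constraints in the repository may differ), versions.tf could look like this:

```hcl
terraform {
  required_version = ">= 1.0"

  # Providers we need: AWS for the infrastructure, and the
  # Kubernetes/Helm providers for in-cluster deployments.
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.0"
    }
  }
}
```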

  • infrastructure/terraform/main.tf

Modules:

  • vpc: creates a VPC, which we use to deploy all the resources;
  • ec2: creates a security group that we need in order to enable communication between EKS and RDS Postgres;
  • rds: Airflow needs a database to handle metadata and other configurations, and RDS is a good choice for this purpose;
  • eks: contains the cluster configuration and application deployments
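To illustrate how these modules fit together, here is a minimal sketch of main.tf; the module paths, names and variables below are assumptions, not the exact project code:

```hcl
module "vpc" {
  source = "./modules/vpc"
  # CIDR ranges and subnet layout go here
}

module "ec2" {
  source = "./modules/ec2"
  vpc_id = module.vpc.vpc_id # security group between EKS and RDS
}

module "rds" {
  source            = "./modules/rds"
  vpc_id            = module.vpc.vpc_id
  subnet_ids        = module.vpc.private_subnets
  security_group_id = module.ec2.rds_security_group_id
}

module "eks" {
  source  = "./modules/eks"
  vpc_id  = module.vpc.vpc_id
  subnets = module.vpc.private_subnets
}
```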

EKS Module

Let’s talk about the EKS Cluster configuration.

We first need to set up the cluster. Terraform provides some open-source, well-implemented modules that make complex deployments easier. Whether or not you use them depends on your environment policies and the level of customization you need.

For this tutorial, we can go ahead with them, since we need only the most common configurations.

Cluster Configuration

  • cluster_name: the name of your cluster
  • cluster_version: I am using 1.21
  • vpc_id: the VPC we created using VPC module
  • subnets: the subnets we created using VPC module
  • cluster_log_retention_in_days: how many days you want to keep the logs
  • cluster_endpoint_public_access: we are setting this to true in this tutorial, but this is not recommended in production
  • worker_additional_security_group_ids: additional security groups to be shared among the worker nodes
  • worker_groups: the configuration of the worker node groups

This file also sets up a security group to enable communication between worker nodes, and a Kubernetes namespace for Airflow.

Below code snippet shows this configuration:
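A hedged sketch of that configuration (resource names and CIDR ranges are illustrative):

```hcl
# Security group shared by the worker nodes (management traffic only
# from private address ranges).
resource "aws_security_group" "all_worker_mgmt" {
  name_prefix = "all_worker_management"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]
  }
}

# Namespace where Airflow will be deployed later.
resource "kubernetes_namespace" "airflow" {
  metadata {
    name = "airflow"
  }
}
```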

Also, a Kubernetes Secret is created to handle sensitive database connection settings.
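A sketch of such a secret; the secret name, key and the RDS output reference are assumptions:

```hcl
resource "kubernetes_secret" "airflow_db" {
  metadata {
    name      = "airflow-db-credentials"
    namespace = kubernetes_namespace.airflow.metadata[0].name
  }

  # Sensitive connection settings pulled from the RDS module outputs.
  data = {
    "postgresql-password" = module.rds.db_password
  }
}
```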

Now it’s time to deploy an application into our Kubernetes Cluster.

As you can see in the project structure above, we have the applications folder. Let’s take a look inside it.

You will find a folder called kubernetes with the following content:

Both folders contain Terraform files, and we are going to use the Helm provider to deploy the applications we want:

  • argocd: the application we will use to deploy other projects into our Kubernetes cluster;
  • kubedashboard: an application that helps us visualize what is happening in our cluster.

Let’s take argocd as an example:

  • We create a kubernetes_namespace called argocd and use it to deploy ArgoCD using Helm;
  • We need to configure the Helm provider and set which cluster we are going to use;
  • Last but not least, we need to create a “helm_release” resource and set the Helm repository from which we want to deploy the ArgoCD application;
  • There is more detailed configuration that can be changed at this point, but we are going with the default installation for ArgoCD.
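The steps above can be sketched as follows; the provider wiring and resource names are illustrative, while the chart repository is the official Argo Helm repo:

```hcl
# Point the Helm provider at the EKS cluster we created.
provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.cluster.token
  }
}

resource "kubernetes_namespace" "argocd" {
  metadata {
    name = "argocd"
  }
}

resource "helm_release" "argocd" {
  name       = "argocd"
  repository = "https://argoproj.github.io/argo-helm"
  chart      = "argo-cd"
  namespace  = kubernetes_namespace.argocd.metadata[0].name
  # Default values; add `values` / `set` blocks here for detailed configuration.
}
```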

Deploy Terraform

To deploy the Terraform project, run the commands below inside your terraform folder (airflow-kubernetes-iac/infrastructure/terraform):

  1. terraform init
  2. terraform validate
  3. terraform apply -var-file terraform.tfvars

After some time, you should see a message like the one below:

This is how your EKS cluster should look after the deployment:

How to access the Kubernetes Dashboard

  1. First you need to configure your aws cli to access the EKS cluster
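For example (the region and cluster name are placeholders; use the values from your terraform.tfvars):

```shell
# Adds/updates the kubeconfig entry so kubectl can reach the cluster
aws eks update-kubeconfig --region us-east-1 --name my-eks-cluster
```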

2. Get a temporary token to access the Kubernetes Dashboard
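One way to get a token on EKS is through the AWS CLI itself (the cluster name is a placeholder, and jq must be installed):

```shell
# Prints a short-lived authentication token for the cluster
aws eks get-token --cluster-name my-eks-cluster | jq -r '.status.token'
```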

3. Open another terminal and run the following script:
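The script essentially runs kubectl proxy, which forwards localhost:8001 on your machine to the cluster API server:

```shell
# Keeps running in the foreground; serves on 127.0.0.1:8001 by default
kubectl proxy
```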

Leave this terminal running. Don’t close it; otherwise, you will interrupt the connection between your local machine and the EKS cluster.

4. Open the Kubernetes Dashboard on your web browser:

http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:443/proxy/#/login

5. Type the token you got on Step 2 into the “Enter token *” field

6. Click on Sign In

7. Great! Now you can see what is going on in your Kubernetes cluster running on EKS!

8. In the namespace combobox you will see the “default” namespace selected. Change it to “argocd”. Now you can see the ArgoCD deployment we created using Terraform, as well as the Dashboard.

How to access ArgoCD UI

  1. Open the following URL in your web browser:

http://localhost:8001/api/v1/namespaces/argocd/services/https:argocd-server:443/proxy/

2. Username: admin

3. Password:

Run in your terminal to get the password:
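On ArgoCD 2.x, the initial admin password is stored in a Kubernetes secret, so something like this should print it:

```shell
# Decodes the auto-generated initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d
```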

4. Click on Sign In

5. Great! You have just accessed the ArgoCD UI, which we deployed using IaC with Terraform!

Summary

Great! Now you know how to start deploying EKS and Helm charts using Terraform. This knowledge can be reused with any other Helm chart you want to deploy.

There are a lot of Helm charts, and you can find some of them on this website:

In part 2/2 of this article we are going to configure ArgoCD to access our repository and deploy Apache Airflow 2.x using a Helm chart. It will use GitOps to pick up any change we commit to our repo and trigger the deployment automatically.

Attention

If you leave this infrastructure up and running, you will be billed for the time it is kept up. So please make sure you destroy everything after your study.
