Provision EKS Cluster with Terraform, Terragrunt & GitHub Actions
As cloud-native architectures continue to gain momentum, Kubernetes has emerged as the de facto standard for container orchestration. Amazon Elastic Kubernetes Service (EKS) is a popular managed Kubernetes service that simplifies the deployment and management of containerized applications on AWS. To streamline the process of provisioning an EKS cluster and automate infrastructure management, developers and DevOps teams often turn to tools like Terraform, Terragrunt, and GitHub Actions.
In this article, we will explore the seamless integration of these tools to provision an EKS cluster on AWS, delving into the benefits of using them in combination, the key concepts involved, and the step-by-step process to set up an EKS cluster using infrastructure-as-code principles.
Whether you are a developer, a DevOps engineer, or an infrastructure enthusiast, this article will serve as a comprehensive guide to help you leverage the power of Terraform, Terragrunt, and GitHub Actions in provisioning and managing your EKS clusters efficiently. Before diving in, here's an outline of what we'll do:
Write Terraform code for building blocks.
Write Terragrunt code to provision infrastructure.
Create a GitHub Actions workflow and delegate the infrastructure provisioning task to it.
Add a GitHub Actions workflow job to destroy our infrastructure when we're done.
Below is a diagram of the VPC and its components that we'll create, bearing in mind that the control plane components will be deployed in an EKS-managed VPC:
1. Write Terraform code for building blocks
Each building block will have the same set of files: a main.tf defining its resources, a variables.tf, an outputs.tf, and a provider.tf. A sketch of one complete block follows the list of building blocks below.
We'll be using version 4.x of the AWS provider for Terraform, so the provider.tf file will be the same in all building blocks:
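Here's a minimal sketch of that file, assuming the credentials are passed in as Terraform variables (as referenced later in the article):

```hcl
# provider.tf — shared by every building block
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region     = var.AWS_REGION
  access_key = var.AWS_ACCESS_KEY_ID
  secret_key = var.AWS_SECRET_ACCESS_KEY
}
```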
We can see a few variables here that will be used by all building blocks; they are declared in each block's variables.tf file.
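A minimal sketch of that variables.tf:

```hcl
# variables.tf — variables shared by all building blocks
variable "AWS_REGION" {
  type        = string
  description = "AWS region to deploy resources into"
}

variable "AWS_ACCESS_KEY_ID" {
  type        = string
  sensitive   = true
  description = "AWS access key ID"
}

variable "AWS_SECRET_ACCESS_KEY" {
  type        = string
  sensitive   = true
  description = "AWS secret access key"
}
```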
So when defining the building blocks in the following sections, these variables won't be shown explicitly, but you should have them in each block's variables.tf file.
a) VPC building block
b) Internet Gateway building block
c) Route Table building block
d) Subnet building block
e) Elastic IP building block
f) NAT Gateway building block
g) NACL building block
h) Security Group building block
i) EC2 building block
j) IAM Role building block
k) Instance Profile building block
l) EKS Cluster building block
m) EKS Add-ons building block
n) EKS Node Group building block
o) IAM OIDC building block (to allow pods to assume IAM roles)
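To give an idea of the shape of these building blocks, here is a minimal sketch of what the VPC building block might contain (the variable and output names are illustrative; your own modules may expose more):

```hcl
# main.tf — the VPC resource itself
resource "aws_vpc" "this" {
  cidr_block           = var.vpc_cidr
  enable_dns_support   = true
  enable_dns_hostnames = true
  tags                 = var.tags
}

# variables.tf — in addition to the shared variables shown earlier
variable "vpc_cidr" {
  type = string
}

variable "tags" {
  type    = map(string)
  default = {}
}

# outputs.tf — expose the VPC ID so other blocks can reference it
output "vpc_id" {
  value = aws_vpc.this.id
}
```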
With the building blocks defined, we can now version them into GitHub repositories and use them in the next step to develop our Terragrunt code.
2. Write Terragrunt code to provision infrastructure
Our Terragrunt code will live in an infra-live repository, organized by environment. For this article, we'll only have a dev directory, which will contain a subdirectory for each specific resource we want to create.
Our final folder structure will be:
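Based on the modules covered in the rest of this section, it looks like this:

```text
infra-live/
├── terragrunt.hcl
└── dev/
    ├── vpc/terragrunt.hcl
    ├── internet-gateway/terragrunt.hcl
    ├── public-route-table/terragrunt.hcl
    ├── public-subnets/terragrunt.hcl
    ├── nat-gw-eip/terragrunt.hcl
    ├── nat-gateway/terragrunt.hcl
    ├── private-route-table/terragrunt.hcl
    ├── private-subnets/terragrunt.hcl
    ├── nacl/terragrunt.hcl
    ├── security-group/terragrunt.hcl
    ├── bastion-role/terragrunt.hcl
    ├── bastion-instance-profile/terragrunt.hcl
    ├── bastion-ec2/terragrunt.hcl        # plus user-data.sh
    ├── eks-cluster-role/terragrunt.hcl
    ├── eks-cluster/terragrunt.hcl
    ├── eks-addons/terragrunt.hcl
    ├── worker-node-role/terragrunt.hcl
    ├── eks-node-group/terragrunt.hcl
    └── eks-pod-iam/terragrunt.hcl
```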
a) infra-live/terragrunt.hcl
Our root terragrunt.hcl file will contain the configuration for our remote Terraform state. We'll use an S3 bucket to store the state file; the bucket name must be globally unique, and the bucket must exist before applying any Terragrunt configuration. My S3 bucket is in the N. Virginia region (us-east-1).
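A minimal sketch of this root terragrunt.hcl (the bucket name is a placeholder):

```hcl
# infra-live/terragrunt.hcl — remote state configuration shared by all modules
remote_state {
  backend = "s3"

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }

  config = {
    bucket  = "<your-unique-s3-bucket-name>"
    key     = "${path_relative_to_include()}/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true
  }
}
```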
Make sure you replace the placeholder with the name of your own S3 bucket.
b) infra-live/dev/vpc/terragrunt.hcl
This module uses the VPC building block to create our VPC. Our VPC CIDR will be 10.0.0.0/16, which leaves room for the /24 subnets defined below.
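A sketch of this terragrunt.hcl (the Git URL for the building block and the input names are illustrative):

```hcl
# infra-live/dev/vpc/terragrunt.hcl
terraform {
  source = "git::git@github.com:<your-org>/terraform-vpc-building-block.git?ref=v1.0.0"
}

# Pull in the remote state configuration from the root terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

inputs = {
  vpc_cidr = "10.0.0.0/16"
  tags = {
    Name        = "dev-vpc"
    Environment = "dev"
  }
}
```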
The values passed in the inputs section are the variables that are defined in the building blocks.
For this module and the following modules, we won't be passing values for the variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION, since these (apart from AWS_REGION) are sensitive credentials. You'll have to add them as secrets in the GitHub repository you'll create to version your Terragrunt code.
c) infra-live/dev/internet-gateway/terragrunt.hcl
This module uses the Internet Gateway building block as its Terraform source to create our VPC's internet gateway.
d) infra-live/dev/public-route-table/terragrunt.hcl
This module uses the Route Table building block as its Terraform source to create our VPC's public route table to be associated with the public subnet we'll create next.
It also adds a route to direct all internet traffic to the internet gateway.
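A sketch illustrating the dependency pattern (the output and input names are assumptions that depend on how you wrote your building blocks):

```hcl
# infra-live/dev/public-route-table/terragrunt.hcl
terraform {
  source = "git::git@github.com:<your-org>/terraform-route-table-building-block.git?ref=v1.0.0"
}

include "root" {
  path = find_in_parent_folders()
}

dependency "vpc" {
  config_path = "../vpc"
}

dependency "internet_gateway" {
  config_path = "../internet-gateway"
}

inputs = {
  vpc_id = dependency.vpc.outputs.vpc_id

  # Send all internet-bound traffic through the internet gateway
  routes = [
    {
      cidr_block = "0.0.0.0/0"
      gateway_id = dependency.internet_gateway.outputs.internet_gateway_id
    }
  ]
}
```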
e) infra-live/dev/public-subnets/terragrunt.hcl
This module uses the Subnet building block as its Terraform source to create our VPC's public subnet and associate it with the public route table.
The CIDR for the public subnet will be 10.0.0.0/24.
f) infra-live/dev/nat-gw-eip/terragrunt.hcl
This module uses the Elastic IP building block as its Terraform source to allocate a static public IP address that we'll associate with the NAT gateway we'll create next.
g) infra-live/dev/nat-gateway/terragrunt.hcl
This module uses the NAT Gateway building block as its Terraform source to create a NAT gateway in our VPC's public subnet, with the previously created Elastic IP attached to it.
h) infra-live/dev/private-route-table/terragrunt.hcl
This module uses the Route Table building block as its Terraform source to create our VPC's private route table to be associated with the private subnets we'll create next.
It also adds a route to direct all internet traffic to the NAT gateway.
i) infra-live/dev/private-subnets/terragrunt.hcl
This module uses the Subnet building block as its Terraform source to create our VPC's private subnets and associate them with the private route table.
The CIDRs for the app private subnets will be 10.0.100.0/24 (us-east-1a) and 10.0.200.0/24 (us-east-1b), and those for the DB private subnets will be 10.0.10.0/24 (us-east-1a) and 10.0.20.0/24 (us-east-1b).
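A sketch with those CIDRs (again, the exact input names depend on your Subnet building block):

```hcl
# infra-live/dev/private-subnets/terragrunt.hcl
terraform {
  source = "git::git@github.com:<your-org>/terraform-subnet-building-block.git?ref=v1.0.0"
}

include "root" {
  path = find_in_parent_folders()
}

dependency "vpc" {
  config_path = "../vpc"
}

dependency "private_route_table" {
  config_path = "../private-route-table"
}

inputs = {
  vpc_id         = dependency.vpc.outputs.vpc_id
  route_table_id = dependency.private_route_table.outputs.route_table_id

  # App and DB subnets spread across two availability zones
  subnets = {
    "app-private-a" = { cidr_block = "10.0.100.0/24", availability_zone = "us-east-1a" }
    "app-private-b" = { cidr_block = "10.0.200.0/24", availability_zone = "us-east-1b" }
    "db-private-a"  = { cidr_block = "10.0.10.0/24", availability_zone = "us-east-1a" }
    "db-private-b"  = { cidr_block = "10.0.20.0/24", availability_zone = "us-east-1b" }
  }
}
```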
j) infra-live/dev/nacl/terragrunt.hcl
This module uses the NACL building block as its Terraform source to create NACLs for our public and private subnets.
For the sake of simplicity, we'll configure very loose NACL and security group rules, but in the next blog post, we'll enforce security rules for the VPC and cluster.
Note, though, that the DB subnets' NACLs only allow traffic on port 5432 from the app subnet CIDRs.
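As an illustration, the DB subnet rules might look something like this (the input structure is an assumption):

```hcl
# Excerpt from infra-live/dev/nacl/terragrunt.hcl — only PostgreSQL traffic
# from the app subnet CIDRs is admitted into the DB subnets
inputs = {
  db_nacl_ingress_rules = [
    { rule_number = 100, protocol = "tcp", action = "allow", cidr_block = "10.0.100.0/24", from_port = 5432, to_port = 5432 },
    { rule_number = 110, protocol = "tcp", action = "allow", cidr_block = "10.0.200.0/24", from_port = 5432, to_port = 5432 }
  ]
}
```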
k) infra-live/dev/security-group/terragrunt.hcl
This module uses the Security Group building block as its Terraform source to create a security group for our nodes and bastion host.
Again, its rules are going to be very loose, but we'll correct that in the next article.
l) infra-live/dev/bastion-role/terragrunt.hcl
This module uses the IAM Role building block as its Terraform source to create an IAM role with the permissions that our bastion host will need to perform EKS actions and to be managed by Systems Manager.
m) infra-live/dev/bastion-instance-profile/terragrunt.hcl
This module uses the Instance Profile building block as its Terraform source to create an IAM instance profile for our bastion host. The IAM role created in the previous step is attached to this instance profile.
n) infra-live/dev/bastion-ec2/terragrunt.hcl
This module uses the EC2 building block as its Terraform source to create an EC2 instance which we'll use as a jump box (or bastion host) to manage the worker nodes in our EKS cluster.
The bastion host will be placed in our public subnet and will have the instance profile we created in the previous step attached to it, as well as our loose security group.
It is a Linux instance of type t2.micro using the Amazon Linux 2023 AMI with a user data script configured. This script will be defined in the next step.
o) infra-live/dev/bastion-ec2/user-data.sh
This user data script installs the AWS CLI and the kubectl command-line tool (among other utilities). It also configures a shell alias for kubectl and bash completion for it.
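A sketch of such a script, assuming kubectl is the tool being aliased (versions and paths are illustrative):

```bash
#!/bin/bash
# user-data.sh — bootstrap the bastion host
set -euo pipefail

# AWS CLI v2
curl -sSL "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o /tmp/awscliv2.zip
unzip -q /tmp/awscliv2.zip -d /tmp
/tmp/aws/install --update

# kubectl
curl -sSLo /usr/local/bin/kubectl "https://dl.k8s.io/release/v1.27.0/bin/linux/amd64/kubectl"
chmod +x /usr/local/bin/kubectl

# Alias and bash completion for kubectl
cat >> /home/ec2-user/.bashrc <<'EOF'
alias k=kubectl
source <(kubectl completion bash)
complete -o default -F __start_kubectl k
EOF
```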
p) infra-live/dev/eks-cluster-role/terragrunt.hcl
This module uses the IAM Role building block as its Terraform source to create an IAM role for the EKS cluster. It has the managed policy AmazonEKSClusterPolicy attached to it.
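A sketch (the input names are illustrative; the trust policy and managed policy are the standard ones for an EKS cluster role):

```hcl
# infra-live/dev/eks-cluster-role/terragrunt.hcl
terraform {
  source = "git::git@github.com:<your-org>/terraform-iam-role-building-block.git?ref=v1.0.0"
}

include "root" {
  path = find_in_parent_folders()
}

inputs = {
  role_name = "dev-eks-cluster-role"

  # Allow the EKS service to assume this role
  trusted_service = "eks.amazonaws.com"

  managed_policy_arns = [
    "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  ]
}
```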
q) infra-live/dev/eks-cluster/terragrunt.hcl
This module uses the EKS Cluster building block as its Terraform source to create an EKS cluster which uses the IAM role created in the previous step.
The cluster will provision ENIs (Elastic Network Interfaces) in the private subnets we had created, which will be used by the EKS worker nodes.
The cluster also has various cluster log types enabled for auditing purposes.
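A sketch (the input names are illustrative):

```hcl
# infra-live/dev/eks-cluster/terragrunt.hcl
terraform {
  source = "git::git@github.com:<your-org>/terraform-eks-cluster-building-block.git?ref=v1.0.0"
}

include "root" {
  path = find_in_parent_folders()
}

dependency "cluster_role" {
  config_path = "../eks-cluster-role"
}

dependency "private_subnets" {
  config_path = "../private-subnets"
}

inputs = {
  cluster_name = "dev-eks-cluster"
  role_arn     = dependency.cluster_role.outputs.role_arn

  # The control plane places its ENIs in our private subnets
  subnet_ids = dependency.private_subnets.outputs.subnet_ids

  # Control plane log types to ship to CloudWatch for auditing
  enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
}
```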
r) infra-live/dev/eks-addons/terragrunt.hcl
This module uses the EKS Add-ons building block as its Terraform source to activate add-ons for our EKS cluster.
This is very important, given that these add-ons can help with networking within the AWS VPC using the VPC's features (vpc-cni), cluster domain name resolution (coredns), maintaining network connectivity between services and pods in the cluster (kube-proxy), managing IAM credentials in the cluster (eks-pod-identity-agent), or allowing EKS to manage the lifecycle of EBS volumes (aws-ebs-csi-driver).
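A sketch listing those add-ons by their standard EKS names (the input names are illustrative):

```hcl
# infra-live/dev/eks-addons/terragrunt.hcl
terraform {
  source = "git::git@github.com:<your-org>/terraform-eks-addons-building-block.git?ref=v1.0.0"
}

include "root" {
  path = find_in_parent_folders()
}

dependency "eks_cluster" {
  config_path = "../eks-cluster"
}

inputs = {
  cluster_name = dependency.eks_cluster.outputs.cluster_name

  addons = [
    "vpc-cni",                # pod networking using VPC ENIs
    "coredns",                # cluster DNS
    "kube-proxy",             # service-to-pod connectivity
    "eks-pod-identity-agent", # IAM credentials for pods
    "aws-ebs-csi-driver"      # EBS volume lifecycle
  ]
}
```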
s) infra-live/dev/worker-node-role/terragrunt.hcl
This module uses the IAM Role building block as its Terraform source to create an IAM role for the EKS worker nodes.
This role grants the node group permissions to carry out its operations within the cluster, and for its nodes to be managed by Systems Manager.
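A sketch with the managed policies typically attached to EKS worker nodes, plus the Systems Manager policy:

```hcl
# infra-live/dev/worker-node-role/terragrunt.hcl
terraform {
  source = "git::git@github.com:<your-org>/terraform-iam-role-building-block.git?ref=v1.0.0"
}

include "root" {
  path = find_in_parent_folders()
}

inputs = {
  role_name = "dev-eks-worker-node-role"

  # The worker nodes are EC2 instances, so EC2 assumes the role
  trusted_service = "ec2.amazonaws.com"

  managed_policy_arns = [
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore" # Systems Manager management
  ]
}
```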
t) infra-live/dev/eks-node-group/terragrunt.hcl
This module uses the EKS Node Group building block as its Terraform source to create a node group in the cluster.
The nodes in the node group will be provisioned in the VPC's private subnets as on-demand Linux instances with an EKS-optimized AMI. The instance type must support ENI trunking, which we'll need in the next article to deploy pods and associate security groups with them.
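A sketch (the instance type, AMI type, disk size, and scaling values are illustrative placeholders; pick a trunking-capable Nitro type such as an m5 or c5 instance):

```hcl
# infra-live/dev/eks-node-group/terragrunt.hcl
terraform {
  source = "git::git@github.com:<your-org>/terraform-eks-node-group-building-block.git?ref=v1.0.0"
}

include "root" {
  path = find_in_parent_folders()
}

dependency "eks_cluster" {
  config_path = "../eks-cluster"
}

dependency "node_role" {
  config_path = "../worker-node-role"
}

dependency "private_subnets" {
  config_path = "../private-subnets"
}

inputs = {
  cluster_name  = dependency.eks_cluster.outputs.cluster_name
  node_role_arn = dependency.node_role.outputs.role_arn
  subnet_ids    = dependency.private_subnets.outputs.subnet_ids

  capacity_type  = "ON_DEMAND"
  instance_types = ["m5.large"] # must support ENI trunking for security groups per pod
  ami_type       = "AL2_x86_64"
  disk_size      = 20

  scaling = {
    desired_size = 2
    min_size     = 2
    max_size     = 3
  }
}
```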
u) infra-live/dev/eks-pod-iam/terragrunt.hcl
This module uses the IAM OIDC building block as its Terraform source to create resources that will allow pods to assume IAM roles and communicate with other AWS services.
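A sketch, assuming the building block creates an IAM OIDC provider from the cluster's issuer URL:

```hcl
# infra-live/dev/eks-pod-iam/terragrunt.hcl
terraform {
  source = "git::git@github.com:<your-org>/terraform-iam-oidc-building-block.git?ref=v1.0.0"
}

include "root" {
  path = find_in_parent_folders()
}

dependency "eks_cluster" {
  config_path = "../eks-cluster"
}

inputs = {
  # IAM Roles for Service Accounts (IRSA) trusts tokens issued by this URL
  oidc_issuer_url = dependency.eks_cluster.outputs.oidc_issuer_url
  client_id_list  = ["sts.amazonaws.com"]
}
```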
Having done all this, we now need to create a GitHub repository for our Terragrunt code and push our code to that repository. We should also configure repository secrets for our AWS credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION) and an SSH private key that we'll use to access the repositories containing our Terraform building blocks.
Once that is done, we can proceed to create a GitHub Actions workflow to automate the provisioning of our infrastructure.
3. Create a GitHub Actions workflow for automated infrastructure provisioning
Now that our code has been versioned, we can write a workflow that will be triggered whenever we push code to the main branch (use whichever branch you prefer, like master). Ideally, this workflow should only be triggered after a pull request has been approved to merge to the main branch, but we'll keep it simple for illustration purposes.
The first thing will be to create a .github/workflows directory in the root directory of your project. You can then create a YAML file within this directory called deploy.yml, for example.
We'll add the following code to our file to handle the provisioning of our infrastructure:
Let's break down what this file does:
a) The first line, name: Deploy, names our workflow Deploy
b) The following lines of code tell GitHub to trigger this workflow whenever code is pushed to the main branch or a pull request is merged to the main branch:
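Together, a) and b) correspond to the top of the file, roughly:

```yaml
# .github/workflows/deploy.yml (sketch)
name: Deploy

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
```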
c) Then we define our job called terraform using the lines below, telling GitHub to use a runner that runs on the latest version of Ubuntu. Think of a runner as the GitHub server executing the commands in this workflow file for us:
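Roughly like this (the job-level env block is an assumption: it hands the repository secrets to Terraform both for the S3 backend and, via TF_VAR_*, to the building-block variables):

```yaml
jobs:
  terraform:
    runs-on: ubuntu-latest
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      AWS_REGION: ${{ secrets.AWS_REGION }}
      TF_VAR_AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      TF_VAR_AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      TF_VAR_AWS_REGION: ${{ secrets.AWS_REGION }}
```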
d) We then define a series of steps, or blocks of commands, that will be executed in order. The first step uses a GitHub action to check out our infra-live repository into the runner so that we can start working with it:
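This is typically the standard actions/checkout action:

```yaml
    steps:
      - name: Checkout infra-live repository
        uses: actions/checkout@v3
```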
The next step uses another GitHub action to help us easily set up SSH on the GitHub runner using the private key we had defined as a repository secret:
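An action such as webfactory/ssh-agent is commonly used for this (the secret name SSH_PRIVATE_KEY is an assumption):

```yaml
      - name: Set up SSH for private module access
        uses: webfactory/ssh-agent@v0.8.0
        with:
          ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
```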
The following step uses yet another GitHub action to help us easily install Terraform on the GitHub runner, specifying the exact version that we need:
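For example, with hashicorp/setup-terraform (the pinned version is illustrative):

```yaml
      - name: Install Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.4.6
          terraform_wrapper: false # keep raw terraform output available to later steps
```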
Then we use another step to execute a series of commands that install Terragrunt on the GitHub runner. We run terragrunt --version to check the version of Terragrunt installed and confirm that the installation was successful:
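A sketch of that step (the Terragrunt version is illustrative):

```yaml
      - name: Install Terragrunt
        run: |
          sudo curl -sSLo /usr/local/bin/terragrunt \
            https://github.com/gruntwork-io/terragrunt/releases/download/v0.48.1/terragrunt_linux_amd64
          sudo chmod +x /usr/local/bin/terragrunt
          terragrunt --version
```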
Finally, we use a step to apply our Terraform changes, then we use a series of commands to retrieve the public IP address of our provisioned EC2 instance and save it to a file called public_ip.txt:
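A sketch of those final steps, assuming a single run-all apply over the dev directory and a public_ip output on the EC2 building block:

```yaml
      - name: Terragrunt apply
        working-directory: dev
        run: terragrunt run-all apply --terragrunt-non-interactive

      - name: Save bastion public IP
        working-directory: dev/bastion-ec2
        run: terragrunt output -raw public_ip > public_ip.txt
```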
And that's it! We can now watch the pipeline get triggered when we push code to our main branch, and see how our EKS cluster gets provisioned.
In the next article, we'll secure our cluster then access our bastion host and get our hands dirty with real Kubernetes action!