While there are several ways to host container workloads in Azure, Azure Kubernetes Service (AKS) provides the easiest way to deploy Kubernetes for teams needing a full orchestration solution. AKS seems to gain new features every week. Depending on your needs, deploying a repeatable, consistent AKS configuration can be challenging. Infrastructure-as-Code tools like Terraform bring this complexity under control (source control, that is!). Let’s take a look at spinning up an AKS cluster using Terraform.
The AKS cluster in this guide supports the following features:
- AKS-managed Azure Active Directory integration
- Azure Monitor for Containers
- Automatic AKS version upgrades
- Separate node pools for user and system workloads
- A system assigned managed cluster identity
- Autoscaling node pools
- Availability Zone Configuration
- Azure Policy for Kubernetes
Create Terraform Project
Our first step will be to configure Terraform settings and the providers we will need. (See our getting started guide for Terraform for more information). I’ll choose the latest versions of everything as of the time of this writing.
- Terraform 0.13.5
- AzureRM 2.33
- AzureAD 1.0
- Random 3.0
First create a file called main.tf, then configure Terraform and the provider versions:
terraform {
# Use a recent version of Terraform
required_version = ">= 0.13"
# Map providers to their sources, required in Terraform 0.13+
required_providers {
# Azure Active Directory 1.x
azuread = {
source = "hashicorp/azuread"
version = "~> 1.0"
}
# Azure Resource Manager 2.x
azurerm = {
source = "hashicorp/azurerm"
version = "~> 2.0"
}
# Random 3.x
random = {
source = "hashicorp/random"
version = "~> 3.0"
}
}
}
Next, some providers like AzureRM require additional configuration:
provider "azurerm" {
# v2.x required "features" block
features {}
}
Finally, I set up a few local variables, so they will be easy to update without having to change code in several places:
locals {
aks_cluster_name = "aks-${local.resource_group_name}"
location = "centralus"
resource_group_name = "b59"
}
Random Pet
HashiCorp’s random provider allows Terraform to generate random numbers, passwords, and unique identifiers. You can use these random values with various Azure resources. They are especially important for resources that require globally unique names like Log Analytics workspaces and Azure Storage accounts. The random provider’s random_pet resource is a fun alternative to using GUIDs in resource names. Since we will need globally unique names for some of our resources, I’ll add a random_pet instance to the bottom of main.tf.
resource "random_pet" "primary" {}
The random pet resource has a few properties, but all are optional, so I’ve accepted the defaults.
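If you want more control over the generated names, the optional properties can be set explicitly. The snippet below is only an illustrative sketch of what overriding the defaults could look like; none of these settings are used in this guide.
# Illustrative only – this guide accepts the defaults.
resource "random_pet" "example" {
  length    = 2     # number of pet words to combine (the default)
  separator = "-"   # character placed between the words (the default)
  prefix    = "aks" # optional static prefix added in front of the generated name
}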
Azure Resource Group
The configuration so far provides enough context for Terraform to initialize. But to deploy AKS, we will need a resource group to place the AKS cluster resource into. Every Azure resource needs a resource group to live in, and you should group similar resources together. Each team needs to decide what “similar” means to them. However, the best practice is to group resources by lifecycle. This means that anything I would naturally create or delete when I create or delete my AKS cluster should exist in the same resource group as my cluster.
To add a resource group to my configuration, I create a new file called resource-group.tf. Inside the file, I describe my resource group to Terraform.
resource "azurerm_resource_group" "primary" {
location = local.location
name = local.resource_group_name
}
The AzureRM provider for Terraform exposes the azurerm_resource_group resource type for managing Azure resource groups. This simple resource type requires only two property configurations.
- name – The name of the resource group.
- location – The Azure Region to store the resource group metadata in.
Azure Log Analytics Workspace
The Azure Monitor for Containers (also known as Container Insights) feature provides performance monitoring for workloads running in the Kubernetes cluster. In contrast, the AKS diagnostic settings provide access to logs and metrics for the Kubernetes API component. Monitoring both will be critical to successful Kubernetes operations. Before deploying the AKS cluster, we’ll deploy a Log Analytics Workspace to support Azure Monitor for Containers.
To add the Log Analytics Workspace, create a new file called log-analytics.tf, and add the azurerm_log_analytics_workspace resource with the properties shown below.
resource "azurerm_log_analytics_workspace" "insights" {
name = "logs-${random_pet.primary.id}"
location = azurerm_resource_group.primary.location
resource_group_name = azurerm_resource_group.primary.name
retention_in_days = 30
}
The Log Analytics workspace configuration is as follows:
- name – The resource name is a simple value combined with our random pet name for uniqueness.
- location – A reference to the resource group resource ensures the Log Analytics workspace deploys in the same region.
- resource_group_name – Another resource group reference indicates which resource group should contain the workspace.
- retention_in_days – Although this property is optional, I prefer setting it explicitly for clarity. A value of 30 days ensures we do not incur additional retention charges.
Note: The Azure Log Analytics workspace name must be unique across all Azure Subscriptions because it is exposed through DNS.
Azure Active Directory Group
Azure Kubernetes Service (AKS) requires that we provide an Azure Active Directory (AAD) group to enable AKS-managed AAD integration. The managed integration option dramatically simplifies the role-based access control (RBAC) setup. It also activates the Kubernetes resource viewer preview feature. Although this feature is called a “viewer,” it can change Kubernetes resources directly from the portal without using kubectl or the Kubernetes dashboard. Azure Monitor for Containers provides a great read-only and historical view. The Kubernetes resource viewer allows direct control.
Once set up, the group will have full administrative rights to the cluster, and you can provide multiple groups if needed. You can select an existing administration group from AAD. For this guide, I will create a new, empty group and add myself to it later (a sketch of managing that membership with Terraform follows the note below). I prefer the idea of tying the administrative group to the cluster and allowing Terraform to clean up the group when I decide I no longer need the associated AKS instance.
To create a new, empty group, add a new file called aks-administrators-group.tf and add the following Terraform resource:
resource "azuread_group" "aks_administrators" {
name = "${local.aks_cluster_name}-administrators"
description = "Kubernetes administrators for the ${local.aks_cluster_name} cluster."
}
Creating our administrator group introduces our third Terraform provider: azuread. The resource to create an empty group is simple and requires one property. The description is optional but highly recommended.
- name – The group name.
- description – Used to provide a meaningful comment about the group’s purpose.
Note: Azure AD resources will not appear in the Azure Resource Group alongside the rest of the Azure resources we deploy. AAD metadata is stored in the AAD tenant in a separate section inside the portal. In addition to a meaningful description, adding the cluster name to the group name will help identify its purpose in AAD.
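Earlier I mentioned adding myself to the group later. If you would rather let Terraform manage that membership as well, a minimal sketch (assuming the azuread 1.x provider) might look like the following; the user principal name is a placeholder for your own account and is not part of this guide’s configuration.
# Hypothetical sketch – replace the user principal name with your own account.
data "azuread_user" "me" {
  user_principal_name = "someone@example.com"
}

resource "azuread_group_member" "aks_administrators_me" {
  group_object_id  = azuread_group.aks_administrators.object_id
  member_object_id = data.azuread_user.me.object_id
}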
AKS Version Information
To automatically apply AKS version upgrades, the Terraform configuration needs to know when new Azure Kubernetes Service (AKS) versions become available. Rather than check for this manually and update a hardcoded value, it is much nicer to program this directly into the Terraform configuration.
To query for AKS version information, add a file called aks-versions.tf and add the contents shown below.
data "azurerm_kubernetes_service_versions" "current" {
location = azurerm_resource_group.primary.location
}
Fetching the AKS version information introduces another Terraform concept: data sources. Data sources are usually read-only siblings to resources. Often, we use data sources when several Terraform projects are working together to manage infrastructure. For example, a dedicated networking team may build and secure all virtual networks in your organization. Other groups won’t have direct access to the virtual network resource and subnet information. However, if those groups have the right permissions, they can use data sources to query the Azure API for networking information and use it in their own portion of the environment.
In the case of supported Kubernetes versions in Azure, this API is read-only. It’s not something we can create, so there is only a data source available in Terraform. The data source only requires one parameter.
- location – The available AKS versions vary by region, so we must provide the Azure region we are interested in.
Note: Although location is the only required property, the data source can filter according to a version prefix. This can be useful when you are interested in automatic upgrades for patch versions but want to be more deliberate for major or minor versions. Check out the documentation for details.
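As a rough sketch of that filter (the 1.18 prefix is purely illustrative and not part of this guide’s configuration):
# Illustrative only – restrict automatic upgrades to 1.18.x patch releases.
data "azurerm_kubernetes_service_versions" "current" {
  location        = azurerm_resource_group.primary.location
  version_prefix  = "1.18"
  include_preview = false
}
This is also the pattern the upgrade note later in the guide refers to: temporarily pin the prefix to the next minor version, apply, then relax the prefix and apply again.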
Azure Kubernetes Service
We’re now ready to add our AKS cluster configuration to our Terraform project. The azurerm_kubernetes_cluster resource has many properties, many of which consist of nested blocks. So, it will take some patience to read through them all. To be fair, you can actually deploy an AKS cluster with very few required properties. However, to get to a reasonable real-world baseline cluster with the features described at the top of this guide will take a little more effort. To make it more consumable, I’ll show the configuration one step at a time, starting with the bare minimum.
Basic Cluster
Without further ado, add a file called aks-cluster.tf and add the basic AKS configuration shown below.
resource "azurerm_kubernetes_cluster" "aks" {
dns_prefix = local.aks_cluster_name
location = azurerm_resource_group.primary.location
name = local.aks_cluster_name
resource_group_name = azurerm_resource_group.primary.name
default_node_pool {
name = "system"
node_count = 1
vm_size = "Standard_DS2_v2"
}
identity { type = "SystemAssigned" }
}
The initial cluster setup has only a few required arguments, but two of them are embedded blocks. These are the first embedded blocks we’ve encountered outside the terraform configuration block.
- dns_prefix – AKS uses the DNS prefix to build the hostname for the Kubernetes API endpoint it deploys on your behalf. I reuse the cluster name for this purpose, and AKS will add some random characters to the hostname to ensure the name is unique in DNS.
- location – As with other resources, the location indicates the Azure region to deploy AKS in.
- name – The name of the AKS cluster resource in the resource group.
- resource_group_name – The resource group to place the AKS cluster resource into.
- default_node_pool – This block describes the worker node options for the system node pool.
  - name – A name for the node pool. I’ve named the default pool “system” and will create another node pool later for user workloads.
  - node_count – The number of VMs to allocate to this pool. We’ll start with one node for now.
  - vm_size – The Azure VM SKU for nodes in this pool. Should you require more power, update the relatively modest two core machine shown here.
- identity – This block describes the cluster identity. The cluster needs an identity in Azure to interact with resources like storage and networking configurations.
  - type – SystemAssigned is the only supported option. AKS does not currently support User Assigned managed identity.
Note: In the past, AKS only supported Service Principal credentials for cluster identity. While this option is still supported, managed identity provides a cleaner solution because we do not have to create, cleanup, or rotate credentials for the Service Principal. With managed identities, Azure takes care of all those tasks for us.
Automatic Upgrades
Earlier in the guide we set up a data source to read the available AKS versions in our region. This information enables automatic cluster upgrades. Enable that now by setting two properties as shown below.
resource "azurerm_kubernetes_cluster" "aks" {
dns_prefix = local.aks_cluster_name
kubernetes_version = data.azurerm_kubernetes_service_versions.current.latest_version
location = azurerm_resource_group.primary.location
name = local.aks_cluster_name
resource_group_name = azurerm_resource_group.primary.name
default_node_pool {
name = "system"
node_count = 1
orchestrator_version = data.azurerm_kubernetes_service_versions.current.latest_version
vm_size = "Standard_DS2_v2"
}
identity { type = "SystemAssigned" }
}
Enable automatic upgrades by making a reference to the Kubernetes version data source.
- kubernetes_version – A top-level property used to request a specific orchestrator version of Kubernetes for the cluster control plane.
- orchestrator_version – The version of Kubernetes to use in the node pool. In general, it should match the control plane version set by kubernetes_version.
Note: The first time we apply this configuration, Terraform will apply whatever latest version it finds in the AKS versions data source. When new versions are available, AKS will upgrade automatically. But Azure will not allow skip-version upgrades. You may need to pin your data source to the next version, upgrade, then remove the pinning and upgrade again to get to the latest version.
Default Node Pool Configuration
Next, I’ll configure some additional options on the default node pool: enabling availability zones, enabling auto scaling, and choosing a more performant disk size.
default_node_pool {
availability_zones = [1, 2, 3]
enable_auto_scaling = true
max_count = 3
min_count = 1
name = "system"
orchestrator_version = data.azurerm_kubernetes_service_versions.current.latest_version
os_disk_size_gb = 1024
vm_size = "Standard_DS2_v2"
}
Now, in addition to automatic upgrades, the default node pool sets the following properties.
- availability_zones – Enabling availability zones increases cluster reliability by deploying nodes to physically different data centers within the same Azure region.
- enable_auto_scaling – Node autoscaling will deploy additional nodes to the cluster when existing nodes become low on resources. When workloads in the cluster decrease, the auto scaler will likewise reduce the pool’s number of nodes.
- max_count – The maximum number of nodes to allow the auto scaler to deploy.
- min_count – The minimum pool size; the auto scaler will keep at least this number of nodes in the pool.
- os_disk_size_gb – Disk performance in Azure depends on size, so I’ve increased the OS disk size to receive more IOPS.
Note: Increasing the disk size may not be needed for all workloads, but the default disk size is pretty small, and expanding the disk size requires redeploying the node pool. In the case of the default node pool, redeployment, in turn, requires redeploying the entire AKS cluster.
Once enabled, the auto scaler behavior can be customized using an auto_scaler_profile block. However, I’ve accepted the defaults for these values.
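For reference, a minimal sketch of that block (with purely illustrative values, not recommendations) would sit inside the azurerm_kubernetes_cluster resource:
# Illustrative values only – this guide keeps the provider defaults.
auto_scaler_profile {
  balance_similar_node_groups = true  # spread nodes evenly across similar node pools
  scan_interval               = "30s" # how often the auto scaler re-evaluates the cluster
  scale_down_delay_after_add  = "10m" # wait this long after a scale-up before scaling down
  scale_down_unneeded         = "10m" # how long a node must be unneeded before removal
}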
AKS Add-ons
Next, we’ll add an addon_profile block, which allows us to install the agents for Azure Policy and Log Analytics.
addon_profile {
azure_policy { enabled = true }
oms_agent {
enabled = true
log_analytics_workspace_id = azurerm_log_analytics_workspace.insights.id
}
}
Each add-on requires another nested property block.
- azure_policy – To add the Azure Policy agent, declare the block and set the enabled flag to true.
- oms_agent – To collect logs and ship them to Log Analytics, declare this block and set the enabled flag to true.
  - log_analytics_workspace_id – A reference to the Log Analytics workspace we created earlier.
Note: Azure Policy for Kubernetes works with Azure Security Center to detect and deny potentially insecure configurations.
Kubernetes RBAC AAD Integration
While Kubernetes ships with an optional role-based access control solution, it does not supply an authentication system. Instead, you must integrate your AKS cluster with an external login provider. Azure Active Directory is one such provider. To enable this integration in the past, we needed to create multiple Service Principals in AAD and ensure they all had the correct rights. Some of the required rights needed tenant administrator authorization, which made managing these credentials inconvenient for anyone who was not a tenant administrator! Finally, even after jumping through these hoops, the integration still sometimes failed to work for organizations using tight conditional access policies.
Fortunately, AKS now provides a better way: managed AAD integration. With managed AAD integration, we indicate that we would like to leverage Active Directory for login. Then we let AKS know which AAD groups it should assign cluster administrator privileges to. Authorizing the connection between AAD and AKS all happens under the hood.
To enable this integration, add a role_based_access_control block as shown below:
role_based_access_control {
enabled = true
azure_active_directory {
managed = true
admin_group_object_ids = [azuread_group.aks_administrators.object_id]
}
}
First, activate Kubernetes RBAC by setting the enabled flag to true, then configure the azure_active_directory nested block.
- managed – Set to true to use the newer AAD integration described above.
- admin_group_object_ids – A collection of groups that will receive administrator privileges on the cluster. Here we set one group, using the value of the administrators’ group we created earlier. Use commas to separate multiple group ids if needed.
Note: You must opt-in to Kubernetes RBAC at cluster creation time. However, if RBAC is already enabled, you can add AAD integration without rebuilding the cluster.
Custom Node Resource Group Name
The node resource group is a separate resource group placed by AKS into the same region as your AKS cluster resource. AKS uses this resource group to manage Azure resources on your behalf. Depending on your configuration, this group will include items like:
- The Virtual Machine Scale Sets (VMSS) for your node pools.
- The Azure Load Balancers for your external services.
- All the networking infrastructure like Virtual Network, Network Security Group, and Route Table.
AKS manages these resources, so they don’t need to clutter up the resource group you created for your AKS instance. The reality is that from time to time, you will want to inspect these resources, even though they are managed for you. The default naming convention is easy enough to figure out. I find it even easier to locate these resources if I override this convention with my own.
dns_prefix = local.aks_cluster_name
kubernetes_version = data.azurerm_kubernetes_service_versions.current.latest_version
location = azurerm_resource_group.primary.location
name = local.aks_cluster_name
node_resource_group = "${azurerm_resource_group.primary.name}-aks"
resource_group_name = azurerm_resource_group.primary.name
To customize the node resource group name, set a single top-level property in Terraform:
- node_resource_group – The name to use for the node resource group. Because I have only one AKS cluster in the primary resource group, I simply add a suffix to its name.
Note: The node resource group name cannot be changed after cluster creation. Updating this property will cause Terraform to destroy the existing cluster and create a new one.
Full AKS Configuration
After putting everything together, the contents of the aks-cluster.tf file should look like this:
resource "azurerm_kubernetes_cluster" "aks" {
dns_prefix = local.aks_cluster_name
kubernetes_version = data.azurerm_kubernetes_service_versions.current.latest_version
location = azurerm_resource_group.primary.location
name = local.aks_cluster_name
node_resource_group = "${azurerm_resource_group.primary.name}-aks"
resource_group_name = azurerm_resource_group.primary.name
addon_profile {
azure_policy { enabled = true }
oms_agent {
enabled = true
log_analytics_workspace_id = azurerm_log_analytics_workspace.insights.id
}
}
default_node_pool {
availability_zones = [1, 2, 3]
enable_auto_scaling = true
max_count = 3
min_count = 1
name = "system"
orchestrator_version = data.azurerm_kubernetes_service_versions.current.latest_version
os_disk_size_gb = 1024
vm_size = "Standard_DS2_v2"
}
identity { type = "SystemAssigned" }
role_based_access_control {
enabled = true
azure_active_directory {
managed = true
admin_group_object_ids = [azuread_group.aks_administrators.object_id]
}
}
}
User Node Pool
Although AKS is now part of our configuration, there is just one more resource to add before finishing. Adding a second node pool for user workloads will give us the option to separate our pods from system workloads like CoreDNS and tunnelfront. This is a good idea because system pods are required for proper cluster operation. If our pods starve system pods for resources, our cluster can become unstable.
Another great reason to opt in to a user node pool is the added flexibility it provides. We are limited in the ways we can modify the default node pool once we deploy the cluster. If we only use the default node pool and later determine that the VM size is too small, or that we need larger disks for performance, we can only achieve that change by rebuilding the cluster or adding a second node pool. Some of the same restrictions apply to user node pools. However, we can delete obsolete user node pools after deploying new pools (or scale them all the way to zero), which we cannot do for the default node pool.
To add a user node pool, create a file called aks-cluster-user-nodes.tf and add an azurerm_kubernetes_cluster_node_pool resource.
resource "azurerm_kubernetes_cluster_node_pool" "user" {
availability_zones = [1, 2, 3]
enable_auto_scaling = true
kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
max_count = 3
min_count = 1
mode = "User"
name = "user"
orchestrator_version = data.azurerm_kubernetes_service_versions.current.latest_version
os_disk_size_gb = 1024
vm_size = "Standard_DS2_v2"
}
A node pool resource should look familiar because so many properties are the same as the default node pool properties.
- availability_zones – Once again, enable availability zones to increase cluster reliability.
- enable_auto_scaling – Opt in to autoscaling.
- kubernetes_cluster_id – A reference to the cluster’s resource id to attach the new node pool to the correct cluster.
- max_count – The maximum number of nodes to allow the auto scaler to deploy.
- min_count – The minimum pool size; the auto scaler will keep at least this number of nodes in the pool.
- mode – The mode can be System or User. This one is a User pool.
- name – A name for the node pool.
- orchestrator_version – The Kubernetes version to use in this node pool. Like before, we use a reference to the Kubernetes version data source.
- os_disk_size_gb – Disk performance in Azure depends on size, so I’ve increased the OS disk size to receive more IOPS.
- vm_size – The VM SKU for nodes in this pool.
Next Steps
After adding the user node pool, we’ve completed the cluster. Although this is an excellent intermediate cluster setup, there are still a few features it does not include, such as:
- OS Disk Encryption
- Private Cluster
- Custom Networking
- Security Center integration
That list covers just the interesting AKS features. Once the cluster is up and running, the Kubernetes ecosystem offers plenty of exciting deployments to run inside the cluster, providing things like:
- Automatic OS patch rebooting
- Ingress Controller
- Service Mesh
I hope you enjoy using this AKS quick start as a jumping-off point for further exploration.
What would you like to see next?