In HashiCorp Terraform, data sources serve as a bridge between the Terraform configuration and external systems or information. Essentially, data sources allow Terraform to query external resources, such as cloud platforms, APIs, databases, or other systems, and use the retrieved information within the configuration.
Unlike resources, which represent infrastructure components to be managed by Terraform, data sources are read-only. They provide information to the Terraform configuration but do not create or modify resources. This clear distinction between resources and data sources enables Terraform to efficiently manage infrastructure while incorporating external data seamlessly. Data sources (declared with a data block) retrieve the data of existing, external resources, whereas resources (declared with a resource block) are used to create and configure the infrastructure that Terraform manages.
How to use a Terraform Data Source
Let’s take a look at the usage of data sources with an example in the context of Microsoft Azure cloud resources managed using the azurerm provider.
Data sources are defined in Terraform using a data resource. This is a special kind of resource in Terraform for reading information from external infrastructure resources that are not managed by the Terraform project.
The following is an example of defining a Data Source in Terraform:
data "azurerm_virtual_network" "existing_vnet" {
  name                = "b59-existing-vnet"
  resource_group_name = "b59-resources"
}
In this example, the data resource (via data) defines only the attributes needed to identify the resource to read information from. None of the resource's other configuration is defined or needed to perform the lookup operation.
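Once the lookup has run, the attributes exported by the data source can be read like any other value. The following is a minimal sketch of that idea; the local value names (vnet_id, vnet_location, vnet_address_space) are hypothetical and chosen only for illustration, while id, location, and address_space are attributes the azurerm_virtual_network data source exports.

```hcl
# Look up the existing VNet using only its identifying attributes.
data "azurerm_virtual_network" "existing_vnet" {
  name                = "b59-existing-vnet"
  resource_group_name = "b59-resources"
}

# Hypothetical local values that read attributes exported by the data source.
locals {
  vnet_id            = data.azurerm_virtual_network.existing_vnet.id
  vnet_location      = data.azurerm_virtual_network.existing_vnet.location
  vnet_address_space = data.azurerm_virtual_network.existing_vnet.address_space
}
```

Nothing in this sketch creates or changes the VNet; Terraform only reads it.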
Next, consider a scenario where we need to provision an Azure virtual machine (VM) in a specific virtual network (VNet). We want to ensure that the VM is deployed within an existing VNet rather than creating a new one. In this case, the VNet is not configured in the same Terraform project as the VM. To achieve this, we can use a data source to fetch information about the existing VNet, and then reference the VNet data when configuring the VM.
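data "azurerm_virtual_network" "existing_vnet" {
  name                = "b59-existing-vnet"
  resource_group_name = "b59-resources"
}

# the VNet data source exports only the names of its subnets, so look up
# the first subnet by name to obtain its ID
data "azurerm_subnet" "existing_subnet" {
  name                 = data.azurerm_virtual_network.existing_vnet.subnets[0]
  virtual_network_name = data.azurerm_virtual_network.existing_vnet.name
  resource_group_name  = data.azurerm_virtual_network.existing_vnet.resource_group_name
}

resource "azurerm_virtual_machine" "example_vm" {
  name = "b59-example-vm"
  # reference Azure Resource Group configured elsewhere in this Terraform project
  resource_group_name = azurerm_resource_group.example_rg.name
  location            = azurerm_resource_group.example_rg.location
  ...
  network_interface_ids = [azurerm_network_interface.example_nic.id]
}

resource "azurerm_network_interface" "example_nic" {
  name = "example-nic"
  ...
  ip_configuration {
    name = "internal"
    # reference the subnet of the existing VNet via the data sources
    subnet_id                     = data.azurerm_subnet.existing_subnet.id
    private_ip_address_allocation = "Dynamic"
  }
}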
In this example:
- We define a data source, azurerm_virtual_network, to fetch information about the existing VNet named “b59-existing-vnet” within the resource group b59-resources.
- The retrieved VNet information is then referenced when configuring the virtual machine’s network interface, to ensure that the VM is deployed within the specified VNet.
A reference to a resource starts with the resource type. A reference to a data source starts with the data. prefix, followed by the data source type. The following examples compare a reference to a resource configured by the Terraform project with a reference to a data source for a similar resource:
# reference a resource
virtual_network_name = azurerm_virtual_network.example_vnet.name
# reference a data source
virtual_network_name = data.azurerm_virtual_network.existing_vnet.name
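To see both reference styles side by side in a complete configuration, here is a small sketch. It assumes the resource group b59-resources already exists outside the project; the names new_vnet, new_subnet, and the address ranges are hypothetical values used only for illustration.

```hcl
# Read-only lookup of a resource group that is NOT managed by this project.
data "azurerm_resource_group" "existing_rg" {
  name = "b59-resources"
}

# A VNet that IS managed by this project, placed in the existing resource group.
resource "azurerm_virtual_network" "new_vnet" {
  name          = "b59-new-vnet" # hypothetical name
  address_space = ["10.0.0.0/16"] # hypothetical address range
  # data source references start with "data."
  location            = data.azurerm_resource_group.existing_rg.location
  resource_group_name = data.azurerm_resource_group.existing_rg.name
}

resource "azurerm_subnet" "new_subnet" {
  name             = "b59-new-subnet" # hypothetical name
  address_prefixes = ["10.0.1.0/24"]   # hypothetical address range
  # data source reference: "data." prefix, then type, then name
  resource_group_name = data.azurerm_resource_group.existing_rg.name
  # managed resource reference: type, then name, with no prefix
  virtual_network_name = azurerm_virtual_network.new_vnet.name
}
```

Terraform would create, update, and destroy the VNet and subnet here, while the resource group is only read.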
Difference Between Data Sources and Outputs
While both data sources and outputs facilitate the retrieval of information for use within Terraform configurations, they serve distinct purposes.
- Data Sources – Retrieve information from external systems or resources while Terraform builds its plan (or during apply, when a data source's arguments depend on values that are not yet known). They are primarily used to fetch data needed to configure resources dynamically.
- Outputs – Define values that are computed or retrieved during the Terraform execution and displayed to the user once the operation is complete. Outputs are typically used to communicate important information or provide references to resources created by Terraform.
Both Resources (via resource) and Outputs (via output) are able to reference Data Sources as needed for the Terraform configuration.
The following are examples of both a data source (via data) and an output (via output) in Terraform code:
data "azurerm_virtual_network" "existing_vnet" {
  name                = "b59-existing-vnet"
  resource_group_name = "b59-resources"
}

output "existing_vnet_id" {
  value = data.azurerm_virtual_network.existing_vnet.id
}
In this example, the Output references the Data Source in the same way that an Output can reference a Resource.
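For comparison, the sketch below places an output that references a managed resource next to one that references a data source. The resource group name, its location, and the output names are hypothetical and used only for illustration.

```hcl
# A resource group managed by this Terraform project (hypothetical values).
resource "azurerm_resource_group" "example_rg" {
  name     = "b59-example-rg"
  location = "eastus"
}

# An existing VNet that is only read, not managed.
data "azurerm_virtual_network" "existing_vnet" {
  name                = "b59-existing-vnet"
  resource_group_name = "b59-resources"
}

# Output that references a managed resource.
output "example_rg_id" {
  value = azurerm_resource_group.example_rg.id
}

# Output that references a data source; only the "data." prefix differs.
output "existing_vnet_address_space" {
  value = data.azurerm_virtual_network.existing_vnet.address_space
}
```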
Conclusion
Terraform data sources offer the ability to integrate external data seamlessly into infrastructure configurations, enhancing flexibility and efficiency in managing cloud resources. By leveraging data sources, Terraform can configure resources dynamically based on real-time information from external systems, contributing to the agility and automation of infrastructure management workflows.
Original Article Source: Terraform: How are Data Sources used? written by Chris Pietschmann (If you're reading this somewhere other than Build5Nines.com, it was republished without permission.)