Mastering Terraform Workspaces: A Guide to Scalable State Management
Terraform has revolutionized Infrastructure as Code (IaC), but managing infrastructure state across different environments (like development, staging, and production) or features can quickly become complex. Enter Terraform Workspaces. While seemingly simple, leveraging them effectively requires understanding best practices for organization, security, and automation.
This guide dives deep into Terraform workspace management, providing practical strategies, real-world examples, and proven techniques to help you maintain clean, scalable, and secure infrastructure state. Whether you’re managing multiple environments, testing feature branches, or implementing complex deployment patterns, mastering workspaces is key.
What Are Terraform Workspaces? (And What They Aren’t)
Terraform workspaces allow you to manage multiple, distinct state files using the same Terraform configuration. Think of them as separate instances of your infrastructure definition, each with its own state data.
Key Characteristics:
- State Separation: Each workspace maintains its own independent `terraform.tfstate` file. This is the primary benefit – isolating state for different environments or purposes.
- Shared Configuration: All workspaces use the same set of `.tf` configuration files. Changes to the code affect all workspaces upon the next `apply`.
- Variable Differentiation: Workspaces often rely on different input variable values (e.g., instance sizes, domain names) to customize the infrastructure per environment. This is typically managed using `.tfvars` files or environment variables in CI/CD.
- Backend Dependency: Workspaces are most effective when used with a remote backend (like AWS S3, Azure Blob Storage, or Terraform Cloud/Enterprise) which handles state storage, locking, and versioning.
Important Distinction: Terraform workspaces are primarily for managing state variations, not for large-scale code organization or module reuse. For separating distinct infrastructure components (e.g., networking vs. application), consider using separate Terraform directories/configurations or modules.
Common Use Cases for Workspaces:
- Multi-Environment Deployments: The most common use case – managing `dev`, `staging`, and `prod` environments with the same codebase but different configurations and isolated states.
- Feature Branch Testing: Creating temporary workspaces to test infrastructure changes related to specific feature branches without affecting core environments.
- A/B Testing: Deploying slightly different infrastructure versions for A/B testing purposes.
- Blue-Green Deployments: Facilitating blue-green deployment strategies by managing separate states for the blue and green environments.
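The day-to-day workspace commands are few. A typical flow looks like this (the workspace names are illustrative):

```shell
# List existing workspaces; the current one is marked with '*'
terraform workspace list

# Create and switch to a new workspace (its state starts empty)
terraform workspace new staging

# Switch to an existing workspace; -or-create (Terraform 1.4+) creates it if missing
terraform workspace select -or-create staging

# Show the current workspace name
terraform workspace show

# Delete a workspace (must not be the currently selected one)
terraform workspace delete feature-login
```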
Setting Up Your Workspace Foundation: Backend Configuration
A robust backend configuration is the cornerstone of effective workspace management, especially in team environments. Using a remote backend like AWS S3 provides centralized state storage, locking to prevent concurrent modifications, and versioning for recovery.
Here’s a recommended S3 backend configuration incorporating security best practices:
terraform {
backend "s3" {
# Required: The name of the S3 bucket to store state files.
bucket = "your-company-terraform-states" # Choose a unique, descriptive name
# Required: The path within the bucket where the state file will be stored.
# Note: backend blocks cannot reference named values like terraform.workspace
# or locals. Instead, the S3 backend automatically stores each non-default
# workspace's state under "<workspace_key_prefix>/<workspace>/<key>".
# Example path for the 'dev' workspace: env:/dev/microservices-platform/terraform.tfstate
key = "microservices-platform/terraform.tfstate"
# Optional: customize the workspace prefix (defaults to "env:")
# workspace_key_prefix = "workspaces"
# Required: The AWS region where the S3 bucket resides.
region = "us-west-2" # Use your desired region
# Recommended: Enable server-side encryption (SSE-S3) for the state file at rest.
encrypt = true
# Recommended: The name of the DynamoDB table used for state locking.
# This prevents multiple users from running Terraform commands simultaneously against the same state.
dynamodb_table = "your-company-terraform-locks"
# Optional but Recommended: Specify a KMS key ARN for enhanced encryption (SSE-KMS).
# Provides an additional layer of security and control over encryption keys.
# kms_key_id = "arn:aws:kms:us-west-2:ACCOUNT-ID:key/YOUR-KMS-KEY-ID"
# Optional: Set a canned ACL for the state object. Note that newly created S3
# buckets disable ACLs by default (Object Ownership: bucket owner enforced);
# for such buckets, omit this argument and rely on bucket policies instead.
# acl = "private"
# Optional: Explicitly enable S3 bucket versioning (must be enabled on the bucket itself too).
# While not directly configured here, ensure your S3 bucket has versioning enabled
# to recover previous state versions if needed.
# Optional: Specify an IAM role ARN for Terraform to assume when accessing the backend.
# This is highly recommended for security instead of using static AWS credentials.
# role_arn = "arn:aws:iam::ACCOUNT-ID:role/terraform-backend-access-role"
}
}
# Define local variables for consistent naming and tagging
locals {
# Use terraform.workspace to dynamically set the environment context
environment = terraform.workspace
project_name = "microservices-platform" # Define your project identifier
# Common tags applied to resources for organization and cost tracking
common_tags = {
Environment = local.environment
Project = local.project_name
ManagedBy = "Terraform"
Team = "DevOps"
}
}
Key Security Considerations for Backend Configuration:
- IAM Roles over Static Credentials: Avoid hardcoding AWS access keys. Instead, configure Terraform (especially in CI/CD) to assume an IAM role with the least privilege necessary to access the S3 bucket (GetObject, PutObject, ListBucket on the specific path) and the DynamoDB table (GetItem, PutItem, DeleteItem).
- Bucket Policies: Implement strict S3 bucket policies to further restrict access to the state files, allowing only authorized IAM principals (users, roles).
- Encryption: Always enable encryption at rest (`encrypt = true`). Using KMS (`kms_key_id`) provides more granular control and auditability over encryption keys.
- State Locking: The `dynamodb_table` setting is crucial for preventing state corruption caused by concurrent `terraform apply` operations. Ensure the table exists and Terraform has permissions to use it.
- Versioning: Enable versioning on your S3 bucket. This is a safety net, allowing you to revert to previous state file versions in case of accidental deletion or corruption.
Advanced Workspace Management Strategies
Leveraging workspaces effectively often involves specific strategies for environment separation, state organization, and security.
1. Environment Separation with Variable Mapping
The most frequent use of workspaces is managing distinct environments (`dev`, `staging`, `prod`, etc.) using the same codebase. The key is to vary resource configurations based on the active workspace. A common pattern is using a map in `locals` keyed by `terraform.workspace`.
Example: Environment-Specific Resource Sizing
locals {
# Define configuration maps based on workspace name
env_config = {
# Configuration for the 'dev' workspace
dev = {
instance_type = "t3.micro" # Smaller instance for development
asg_min_size = 1
asg_max_size = 2
db_instance_class = "db.t3.small" # Smaller DB instance
enable_monitoring = false # Less monitoring in dev
}
# Configuration for the 'staging' workspace
staging = {
instance_type = "t3.medium" # Medium instance for staging
asg_min_size = 2
asg_max_size = 4
db_instance_class = "db.m5.large" # Production-like DB
enable_monitoring = true
}
# Configuration for the 'prod' workspace
prod = {
instance_type = "t3.large" # Larger instance for production
asg_min_size = 3
asg_max_size = 10
db_instance_class = "db.r5.large" # Robust DB instance for production
enable_monitoring = true
}
# Add other environments as needed...
}
# Select the configuration for the current workspace
# Use lookup() with a default value (e.g., 'dev') to handle unexpected workspace names gracefully
current_env_config = lookup(local.env_config, terraform.workspace, local.env_config.dev)
}
# Example EC2 Instance using the mapped configuration
resource "aws_instance" "web_server" {
ami = "ami-0abcdef1234567890" # Replace with your actual AMI ID
instance_type = local.current_env_config.instance_type
tags = merge(local.common_tags, { Name = "WebServer-${local.environment}" })
# ... other instance configurations ...
}
# Example Auto Scaling Group using the mapped configuration
resource "aws_autoscaling_group" "web_asg" {
# ... launch configuration / template ...
min_size = local.current_env_config.asg_min_size
max_size = local.current_env_config.asg_max_size
desired_capacity = local.current_env_config.asg_min_size # Start with min
# Ensure tags propagate correctly in ASGs; recent AWS provider versions use
# repeated "tag" blocks rather than a "tags" list
dynamic "tag" {
for_each = local.common_tags
content {
key                 = tag.key
value               = tag.value
propagate_at_launch = true
}
}
# ... other ASG configurations ...
}
# Example RDS Instance using the mapped configuration
resource "aws_db_instance" "database" {
allocated_storage = terraform.workspace == "prod" ? 100 : 20 # Example conditional storage
engine = "mysql"
engine_version = "8.0"
instance_class = local.current_env_config.db_instance_class
# ... credentials, security groups, etc. ...
skip_final_snapshot = terraform.workspace != "prod" # Don't skip snapshot in prod
tags = merge(local.common_tags, { Name = "Database-${local.environment}" })
}
Implementing Production Safeguards:
Production environments demand extra caution. Workspaces, combined with provider configurations and conditional logic, can help prevent costly mistakes.
Account ID Checks: Ensure Terraform operations target the correct AWS account for production.
# Define the expected production account ID
variable "production_account_id" {
description = "The AWS Account ID designated for the production environment."
type        = string
# sensitive = true # Consider marking as sensitive if needed
}

provider "aws" {
region = "us-west-2" # Or your desired region

# Restrict allowed account IDs based on the workspace:
# only allow the production account ID if the workspace is 'prod'
allowed_account_ids = terraform.workspace == "prod" ? [var.production_account_id] : null

# Optional but recommended: assume a specific role for production deployments
# assume_role {
#   role_arn = "arn:aws:iam::${var.production_account_id}:role/terraform-prod-deploy-role"
# }
}
Explanation: The `allowed_account_ids` argument in the AWS provider block acts as a safety check. If the current AWS credentials belong to an account not in this list, Terraform fails before making any changes. The expression makes this conditional: if the workspace is `prod`, only the `production_account_id` is allowed; otherwise (`null`), any account is permitted (suitable for dev/staging).

Conditional Resource Creation/Deletion: Prevent accidental deletion of critical production resources.
resource "aws_db_instance" "critical_database" {
# ... other configurations ...

lifecycle {
# Note: lifecycle arguments only accept literal values, so this cannot
# be made conditional on terraform.workspace.
prevent_destroy = true
}
}

Explanation: The `prevent_destroy` lifecycle meta-argument, when set to `true`, causes Terraform to error out if a plan involves destroying this resource. Because lifecycle arguments must be literals (an expression like `terraform.workspace == "prod" ? true : false` is rejected), you cannot toggle this per workspace; either protect the resource unconditionally, or keep a separately defined, protected resource for production.
2. Structuring Your State: Organization and Locking
As your infrastructure grows, how you organize your state files becomes critical for maintainability and collaboration.
Logical State File Structure in Your Backend
While workspaces handle state separation within a single configuration, you often need a higher-level organization in your backend storage (like S3). A common approach is to structure paths based on environment and component/project.
Example S3 Bucket Structure:
your-company-terraform-states/ # S3 Bucket Root
├── dev/ # Environment Level
│ ├── core-network/ # Component/Project Level
│ │ └── terraform.tfstate # State file for 'dev' workspace of 'core-network' config
│ ├── microservice-alpha/
│ │ └── terraform.tfstate # State file for 'dev' workspace of 'microservice-alpha' config
│ └── shared-services/
│ └── terraform.tfstate
├── staging/
│ ├── core-network/
│ │ └── terraform.tfstate
│ ├── microservice-alpha/
│ │ └── terraform.tfstate
│ └── shared-services/
│ └── terraform.tfstate
└── prod/
├── core-network/
│ └── terraform.tfstate
├── microservice-alpha/
│ └── terraform.tfstate
└── shared-services/
└── terraform.tfstate
Mapping to Backend Config: In your `backend "s3"` block, set `key` to the component path (e.g., `core-network/terraform.tfstate`). The S3 backend then stores each non-default workspace's state under `<workspace_key_prefix>/<workspace>/<key>` (the prefix defaults to `env:`), yielding paths like `env:/dev/core-network/terraform.tfstate`; the structure above omits the prefix for clarity, and you can customize it via `workspace_key_prefix`.
Benefits of This Structure:
- Clarity: Easily locate the state for any environment and component.
- Isolation: Reduces the “blast radius” – issues in one component’s state are less likely to affect others.
- Granular Permissions: Allows setting more specific IAM permissions per component path if needed.
- Automation: Simplifies scripting for tasks like backups, audits, or cleanup based on path prefixes.
Implementing Robust State Locking
State locking is non-negotiable in team environments. It prevents multiple users or CI/CD jobs from applying changes simultaneously, which can lead to state corruption or race conditions. DynamoDB is the standard choice for state locking with the AWS S3 backend.
# Define the DynamoDB table used for state locking
# This resource should ideally be managed outside this specific Terraform config
# (e.g., in a separate 'bootstrap' config) to avoid circular dependencies.
resource "aws_dynamodb_table" "terraform_state_locks" {
# Use a descriptive name, potentially shared across projects
name = "your-company-terraform-locks"
# Pay-per-request is often cost-effective for lock tables
billing_mode = "PAY_PER_REQUEST"
# The hash key required by Terraform's S3 backend locking mechanism
hash_key = "LockID"
attribute {
name = "LockID"
type = "S" # String type
}
tags = {
Name = "Terraform State Lock Table"
ManagedBy = "Terraform-Bootstrap" # Indicate how it's managed
Environment = "Global" # Often a global resource
}
# Optional: Enable Point-in-Time Recovery for backups
# point_in_time_recovery {
# enabled = true
# }
# Optional: Enable server-side encryption
# server_side_encryption {
# enabled = true
# # kms_key_arn = "arn:aws:kms:..." # Use KMS for enhanced security
# }
}
Key Points:
- The `name` used here must match the `dynamodb_table` value in your `backend "s3"` configuration.
- Ensure the IAM role/user running Terraform has `dynamodb:GetItem`, `dynamodb:PutItem`, and `dynamodb:DeleteItem` permissions on this table.
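If a run is interrupted and a lock is left behind in DynamoDB, Terraform reports the lock ID in its error message. After confirming no other run is actually in progress, the lock can be cleared (the lock ID below is a placeholder):

```shell
terraform force-unlock a1b2c3d4-5678-90ab-cdef-EXAMPLE11111
```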
Considerations for State File Cleanup (Use with Caution!)
While keeping state files indefinitely (especially with versioning) is often safest, you might need cleanup for temporary workspaces (e.g., feature branches) or cost management. Automating this requires care.
Example Lambda for Stale Workspace State Cleanup (Conceptual):
This Python example demonstrates deleting S3 objects (representing state files) older than a specified number of days, excluding core environments like `prod` and `staging`.
import boto3
import os
import logging
from datetime import datetime, timedelta, timezone
logger = logging.getLogger()
logger.setLevel(logging.INFO)
s3 = boto3.client('s3')
# Read configuration from environment variables for flexibility
BUCKET_NAME = os.environ.get('STATE_BUCKET_NAME')
PROTECTED_WORKSPACES = os.environ.get('PROTECTED_WORKSPACES', 'prod,staging,default').split(',')
RETENTION_DAYS = int(os.environ.get('RETENTION_DAYS', '90'))
def lambda_handler(event, context):
if not BUCKET_NAME:
logger.error("STATE_BUCKET_NAME environment variable not set.")
return {'statusCode': 500, 'body': 'Configuration error.'}
logger.info(f"Starting state file cleanup for bucket: {BUCKET_NAME}")
logger.info(f"Protected workspaces: {PROTECTED_WORKSPACES}")
logger.info(f"Retention period: {RETENTION_DAYS} days")
cutoff_date = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)
deleted_count = 0
paginator = s3.get_paginator('list_objects_v2')
try:
# Paginate through all objects in the bucket
for page in paginator.paginate(Bucket=BUCKET_NAME):
if 'Contents' not in page:
continue
for obj in page['Contents']:
key = obj['Key']
last_modified = obj['LastModified']
# Extract potential workspace name from the key (assuming format like workspace/...)
# Adjust this logic based on your actual key structure
key_parts = key.split('/')
workspace_name = key_parts[0] if len(key_parts) > 1 else None
# Check if the workspace is protected
if workspace_name in PROTECTED_WORKSPACES:
logger.debug(f"Skipping protected workspace state: {key}")
continue
# Check if the object is older than the retention period
if last_modified < cutoff_date:
try:
logger.info(f"Deleting old state file: {key} (Last Modified: {last_modified})")
s3.delete_object(Bucket=BUCKET_NAME, Key=key)
deleted_count += 1
except Exception as e:
logger.error(f"Failed to delete object {key}: {e}")
else:
logger.debug(f"Skipping recent state file: {key}")
except Exception as e:
logger.error(f"Error listing objects in bucket {BUCKET_NAME}: {e}")
return {'statusCode': 500, 'body': 'Error during cleanup.'}
logger.info(f"Cleanup complete. Deleted {deleted_count} old state files.")
return {
'statusCode': 200,
'body': f'State file cleanup completed. Deleted {deleted_count} objects.'
}
Important Notes:
- Test Thoroughly: Test this script extensively in a non-production environment before deploying. Accidental state deletion is irreversible without backups/versioning.
- Refine Logic: Adapt the key parsing (`key.split('/')`) to match your exact S3 state file structure.
- Permissions: The Lambda execution role needs `s3:ListBucket` and `s3:DeleteObject` permissions on the state bucket.
- Trigger: Schedule this Lambda using Amazon EventBridge (formerly CloudWatch Events), e.g., daily or weekly.
- Consider Alternatives: Terraform Cloud/Enterprise offers features for managing workspace lifecycles, which might be a safer alternative.
3. Securing Your Workspaces: Access Control and Encryption
Terraform state files often contain sensitive information about your infrastructure (resource IDs, IP addresses, potentially even generated secrets if not handled carefully). Protecting them is crucial.
Fine-Grained IAM Policies
Apply the principle of least privilege. The IAM role or user interacting with the Terraform backend needs only specific permissions.
Example IAM Policy for Backend Access:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowListBucket",
"Effect": "Allow",
"Action": "s3:ListBucket",
"Resource": "arn:aws:s3:::your-company-terraform-states"
// Optional: Add condition to restrict listing only specific prefixes if needed
// "Condition": {
// "StringLike": {
// "s3:prefix": [
// "dev/*",
// "staging/*",
// "prod/*"
// ]
// }
// }
},
{
"Sid": "AllowStateAccess",
"Effect": "Allow",
"Action": [
"s3:GetObject", // Read state
"s3:PutObject" // Write state
// "s3:DeleteObject" // Needed if Terraform needs to delete state (rare)
],
// Restrict access to objects within the bucket, potentially per-environment/project
"Resource": "arn:aws:s3:::your-company-terraform-states/*"
// Example: More restrictive path for a specific role
// "Resource": "arn:aws:s3:::your-company-terraform-states/prod/critical-app/*"
},
{
"Sid": "AllowLockTableAccess",
"Effect": "Allow",
"Action": [
"dynamodb:GetItem", // Read lock status
"dynamodb:PutItem", // Acquire lock
"dynamodb:DeleteItem" // Release lock
],
// Be specific with the DynamoDB table ARN
"Resource": "arn:aws:dynamodb:us-west-2:ACCOUNT-ID:table/your-company-terraform-locks"
}
]
}
Key Considerations:
- Resource Specificity: Be as specific as possible with the `Resource` ARNs. Granting access to `*` is generally discouraged.
- Role Separation: Consider different IAM roles for different environments (e.g., a `terraform-dev-role` vs. a `terraform-prod-role`) with varying levels of access or restrictions.
- CI/CD Permissions: Ensure your CI/CD system’s role has these permissions.
Enhancing Security with KMS Encryption
While S3 provides default encryption (SSE-S3), using AWS Key Management Service (KMS) keys (SSE-KMS) offers significant advantages:
- Centralized Key Management: Manage key rotation, policies, and lifecycle from KMS.
- Finer-Grained Access Control: Use KMS key policies alongside IAM policies to control who can encrypt/decrypt the state.
- Audit Trail: KMS actions are logged in CloudTrail, providing visibility into key usage.
Example: Creating and Using a KMS Key for State Encryption:
# Define a KMS key specifically for encrypting Terraform state
resource "aws_kms_key" "terraform_state_key" {
description = "KMS key for encrypting Terraform state files"
deletion_window_in_days = 7 # Minimum is 7, choose based on your recovery needs
enable_key_rotation = true # Recommended for security
# Key policy: Defines who can manage and use the key
policy = jsonencode({
Version = "2012-10-17",
Statement = [
# Statement 1: Allow root user full control over the key
{
Sid = "EnableIAMUserPermissions",
Effect = "Allow",
Principal = {
# Replace ACCOUNT-ID with your actual AWS account ID
AWS = "arn:aws:iam::ACCOUNT-ID:root"
},
Action = "kms:*",
Resource = "*"
},
# Statement 2: Allow the Terraform backend role to use the key for encryption/decryption
{
Sid = "AllowTerraformBackendRoleUsage",
Effect = "Allow",
Principal = {
# Replace with the ARN of the IAM role Terraform uses for backend access
AWS = "arn:aws:iam::ACCOUNT-ID:role/terraform-backend-access-role"
},
# Required permissions for Terraform S3 backend with SSE-KMS
Action = [
"kms:Encrypt", # Needed to write encrypted state
"kms:Decrypt", # Needed to read encrypted state
"kms:GenerateDataKey*" # Needed by S3 for SSE-KMS operations
# "kms:DescribeKey" # Optional, can be useful for validation
],
Resource = "*" # In a KMS key policy, "*" refers to the key the policy is attached to
}
# Add other principals (e.g., administrators) as needed
]
})
tags = {
Name = "terraform-state-kms-key"
Environment = "Global"
ManagedBy = "Terraform-Bootstrap"
}
}
# Reference this key in your backend configuration. Backend blocks cannot
# reference resources or variables, so use the literal key ARN (or pass it
# at init time via -backend-config):
# terraform {
#   backend "s3" {
#     ...
#     kms_key_id = "arn:aws:kms:us-west-2:ACCOUNT-ID:key/YOUR-KMS-KEY-ID"
#     ...
#   }
# }
Note: Manage the KMS key itself in a separate, foundational Terraform configuration to avoid dependencies. Reference its ARN in the backend block of your application/component configurations.
Enabling Audit Logging with CloudTrail
Tracking who accesses or modifies your state files is essential for security and compliance. AWS CloudTrail can log API calls made to S3 and DynamoDB.
Steps to Configure CloudTrail for State Auditing:
- Ensure CloudTrail is Enabled: Verify you have at least one active CloudTrail trail logging events in the region(s) where your S3 bucket and DynamoDB table reside.
- Enable Data Events: Edit your CloudTrail trail settings. Under “Data events,” choose to log:
  - S3: Select “Log all current and future S3 buckets” or specify your `your-company-terraform-states` bucket. Choose both “Read” (GetObject) and “Write” (PutObject) event types.
  - DynamoDB: Select “Log all current and future DynamoDB tables” or specify your `your-company-terraform-locks` table.
- Configure Log Storage: Ensure CloudTrail logs are stored securely in a designated S3 bucket (ideally separate from your state bucket) and consider enabling log file encryption and validation.
- Monitor Logs: Use tools like Amazon Athena, CloudWatch Logs Insights, or third-party SIEM systems to query and analyze CloudTrail logs for suspicious activity related to your state files or lock table. Look for unauthorized access attempts, unexpected modifications, or deletions.
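As a sketch of the kind of analysis to run over those logs, the snippet below flags state-file deletions by principals outside an allow-list, given CloudTrail event records already parsed from the log files (the field names follow CloudTrail's JSON event format; exact availability varies by event):

```python
def suspicious_state_deletions(records, allowed_arns):
    """Return CloudTrail records for DeleteObject calls against .tfstate keys
    made by principals not in the allow-list."""
    flagged = []
    for rec in records:
        if rec.get("eventName") != "DeleteObject":
            continue
        key = rec.get("requestParameters", {}).get("key", "")
        if not key.endswith(".tfstate"):
            continue
        arn = rec.get("userIdentity", {}).get("arn", "")
        if arn not in allowed_arns:
            flagged.append(rec)
    return flagged
```

The same filter expressed as an Athena or CloudWatch Logs Insights query works for ad hoc investigation; a periodic job like this suits automated alerting.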
Automating Workflows: CI/CD Integration
Integrating Terraform workspace management into your Continuous Integration/Continuous Deployment (CI/CD) pipelines is essential for reliable and repeatable infrastructure changes. The key is dynamically selecting the correct workspace based on the pipeline’s context (e.g., branch name, environment variable).
Common CI/CD Patterns:
- Workspace Selection: Use environment variables provided by the CI/CD system (like `CI_ENVIRONMENT_NAME` in GitLab, `Build.EnvironmentName` or custom variables in Azure DevOps, contexts/parameters in CircleCI) to dynamically select or create the target workspace using `terraform workspace select $ENV_NAME || terraform workspace new $ENV_NAME`.
- Plan and Apply Stages: Separate `terraform plan` and `terraform apply` into distinct stages or jobs. Store the plan file as an artifact and require manual approval before applying changes to sensitive environments like production.
- Authentication: Use secure methods for authentication, such as OIDC (OpenID Connect) with cloud providers (AWS IAM Roles for Service Accounts, Azure Managed Identity, GCP Workload Identity Federation) or securely injected temporary credentials, rather than storing static keys in the pipeline.
- Variable Injection: Pass environment-specific variables (like those defined in `locals` maps earlier) to Terraform using `-var` flags or `.tfvars` files generated dynamically or selected based on the target workspace/environment.
Here are conceptual examples for popular platforms:
GitLab CI Example
# .gitlab-ci.yml
stages:
  - validate
  - plan
  - apply

variables:
  # TF_ROOT: Specify the directory containing your Terraform code if not root
  TF_PLAN_FILE: plan.tfplan
  # Use GitLab environments (e.g., 'development', 'staging', 'production').
  # Note: Terraform also reads TF_WORKSPACE natively; the explicit
  # select/new step below additionally creates the workspace on first run.
  TF_WORKSPACE: ${CI_ENVIRONMENT_SLUG} # Slugified environment name

default:
  image: hashicorp/terraform:latest
  before_script:
    # Configure AWS credentials securely (e.g., using OIDC or CI/CD variables)
    # - export AWS_ROLE_ARN=...
    # - export AWS_WEB_IDENTITY_TOKEN_FILE=...
    - cd ${TF_ROOT:-.} # Navigate to Terraform code directory
    - terraform --version
    - terraform init -input=false # Initialize backend

validate:
  stage: validate
  script:
    - terraform validate

plan:
  stage: plan
  script:
    # Select workspace, create if it doesn't exist
    - terraform workspace select ${TF_WORKSPACE} || terraform workspace new ${TF_WORKSPACE}
    # Generate plan, potentially passing environment-specific vars
    - terraform plan -out=${TF_PLAN_FILE} -input=false # -var-file="config/${TF_WORKSPACE}.tfvars"
  artifacts:
    paths:
      - ${TF_ROOT:-.}/${TF_PLAN_FILE}
    expire_in: 1 day
  # Only run on merge requests or specific branches
  rules:
    - if: $CI_PIPELINE_SOURCE == 'merge_request_event'
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

apply:
  stage: apply
  script:
    - terraform workspace select ${TF_WORKSPACE}
    # Apply the saved plan
    - terraform apply -input=false ${TF_PLAN_FILE}
  dependencies:
    - plan
  # Protect production environment
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH && $CI_ENVIRONMENT_NAME == 'production'
      when: manual # Require manual trigger for production apply
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH && $CI_ENVIRONMENT_NAME != 'production'
      when: on_success # Auto-apply for non-prod on default branch
Azure DevOps Pipeline Example (YAML)
# azure-pipelines.yml
trigger:
  branches:
    include:
      - main # Or your default branch
      - release/*

pool:
  vmImage: ubuntu-latest

variables:
  # Define TF_WORKSPACE based on branch or pipeline variables
  # Example: Use 'prod' for main branch, 'staging' for release/*, 'dev' otherwise
  - name: TF_WORKSPACE
    ${{ if eq(variables['Build.SourceBranchName'], 'main') }}:
      value: prod
    ${{ elseif startsWith(variables['Build.SourceBranchName'], 'release') }}:
      value: staging
    ${{ else }}:
      value: dev # Default or based on feature branch name
  - name: TF_PLAN_FILE
    value: '$(Pipeline.Workspace)/tfplan'
  # Define service connection and backend details
  - name: AWS_SERVICE_CONNECTION
    value: 'Your-AWS-Service-Connection-Name' # Replace with your service connection
  - name: TF_BACKEND_BUCKET
    value: 'your-company-terraform-states'
  - name: TF_BACKEND_KEY_PREFIX
    value: 'your-project' # e.g., microservices-platform
  - name: TF_BACKEND_REGION
    value: 'us-west-2'
  - name: TF_BACKEND_DYNAMODB
    value: 'your-company-terraform-locks'

stages:
  - stage: Plan
    jobs:
      - job: TerraformPlan
        steps:
          - task: TerraformInstaller@1
            displayName: 'Install Terraform'
            inputs:
              terraformVersion: 'latest'
          - task: TerraformTask@4
            displayName: 'Terraform Init'
            inputs:
              provider: 'aws'
              command: 'init'
              # workingDirectory: '$(System.DefaultWorkingDirectory)/terraform' # If TF code is in subdir
              backendServiceAWS: $(AWS_SERVICE_CONNECTION)
              backendAWSBucketName: $(TF_BACKEND_BUCKET)
              # The S3 backend nests non-default workspace states under its
              # workspace prefix automatically; don't embed the workspace here.
              backendAWSKey: '$(TF_BACKEND_KEY_PREFIX)/terraform.tfstate'
              backendAWSRegion: $(TF_BACKEND_REGION)
              backendAWSDynamoDBTableName: $(TF_BACKEND_DYNAMODB)
          - task: TerraformTask@4
            displayName: 'Terraform Workspace'
            inputs:
              provider: 'aws'
              command: 'workspace'
              # Options are passed straight to the terraform CLI, so shell '||'
              # chains won't work; '-or-create' (Terraform 1.4+) covers both cases
              commandOptions: 'select -or-create $(TF_WORKSPACE)'
              environmentServiceNameAWS: $(AWS_SERVICE_CONNECTION)
          - task: TerraformTask@4
            displayName: 'Terraform Plan'
            inputs:
              provider: 'aws'
              command: 'plan'
              # commandOptions: '-var-file="config/$(TF_WORKSPACE).tfvars"' # Pass vars if needed
              environmentServiceNameAWS: $(AWS_SERVICE_CONNECTION)
              publishPlanResults: 'tfplan' # Task variable name for the plan path
          - publish: $(tfplan) # Use the task variable directly
            artifact: TerraformPlan

  - stage: Apply
    dependsOn: Plan
    # Condition to only run on specific branches or after approval
    condition: |
      and(
        succeeded('Plan'),
        or(
          eq(variables['Build.SourceBranchName'], 'main'),
          startsWith(variables['Build.SourceBranchName'], 'release')
        )
      )
    jobs:
      - deployment: TerraformApply # Use deployment job for environments/approvals
        environment: '$(TF_WORKSPACE)' # Map to Azure DevOps Environment
        strategy:
          runOnce:
            deploy:
              steps:
                - checkout: self
                - download: current
                  artifact: TerraformPlan
                - task: TerraformInstaller@1
                  displayName: 'Install Terraform'
                  inputs:
                    terraformVersion: 'latest'
                # Init is needed again in deployment jobs
                - task: TerraformTask@4
                  displayName: 'Terraform Init'
                  inputs:
                    provider: 'aws'
                    command: 'init'
                    backendServiceAWS: $(AWS_SERVICE_CONNECTION)
                    backendAWSBucketName: $(TF_BACKEND_BUCKET)
                    backendAWSKey: '$(TF_BACKEND_KEY_PREFIX)/terraform.tfstate'
                    backendAWSRegion: $(TF_BACKEND_REGION)
                    backendAWSDynamoDBTableName: $(TF_BACKEND_DYNAMODB)
                - task: TerraformTask@4
                  displayName: 'Terraform Workspace Select'
                  inputs:
                    provider: 'aws'
                    command: 'workspace'
                    commandOptions: 'select $(TF_WORKSPACE)'
                    environmentServiceNameAWS: $(AWS_SERVICE_CONNECTION)
                - task: TerraformTask@4
                  displayName: 'Terraform Apply'
                  inputs:
                    provider: 'aws'
                    command: 'apply'
                    # Use the downloaded plan file path
                    commandOptions: '"$(Pipeline.Workspace)/TerraformPlan/tfplan"'
                    environmentServiceNameAWS: $(AWS_SERVICE_CONNECTION)
CircleCI Example (Using Orb)
# .circleci/config.yml
version: 2.1

orbs:
  # Use the official Terraform orb
  terraform: circleci/terraform@3.2 # Check for latest version

# Define reusable executor (optional)
executors:
  terraform-executor:
    docker:
      - image: hashicorp/terraform:latest # Use desired Terraform version

# Define reusable commands (optional)
commands:
  select_workspace:
    parameters:
      workspace_name:
        type: string
    steps:
      - run:
          name: Select or Create Workspace << parameters.workspace_name >>
          command: |
            terraform workspace select << parameters.workspace_name >> || terraform workspace new << parameters.workspace_name >>

workflows:
  plan_and_apply:
    jobs:
      # Job to run terraform plan
      - terraform/plan:
          # Define backend config using parameters or context.
          # Note: the S3 backend nests workspace states automatically,
          # so the workspace is not embedded in the key.
          backend-type: s3
          backend-config: |
            bucket=your-company-terraform-states
            key=your-project/terraform.tfstate
            region=us-west-2
            dynamodb_table=your-company-terraform-locks
            encrypt=true
          # Use executor if defined
          executor: terraform-executor
          # Define workspace based on branch or pipeline parameters
          workspace: ${CIRCLE_BRANCH:-main} # Example: use branch name as workspace
          # Persist plan file to workspace
          persist-plan: true
          # Add context for secure credentials
          context: aws-creds # Your CircleCI context name

      # Job to auto-apply for non-production branches
      - terraform/apply:
          executor: terraform-executor
          workspace: ${CIRCLE_BRANCH:-main}
          requires:
            - terraform/plan
          context: aws-creds
          # Filter to non-production branches; main goes through the
          # approval flow below instead
          filters:
            branches:
              only:
                - staging # Add other non-prod branches as needed

      # Manual approval step before applying to production (main branch)
      - hold-for-prod-apply:
          type: approval
          requires:
            - terraform/plan # Depends on plan completing
          filters:
            branches:
              only:
                - main # Only hold for the main branch

      # Apply job specifically for production after approval
      - terraform/apply:
          name: apply-prod
          executor: terraform-executor
          workspace: main # Explicitly set workspace for prod
          requires:
            - hold-for-prod-apply # Requires manual approval
          context: aws-creds
          # Attach the plan workspace from the plan job
          attach-plan-workspace: true
          filters:
            branches:
              only:
                - main
Terraform Workspace Best Practices Checklist
Here’s a quick checklist summarizing key best practices:
- Use Remote Backends: Essential for collaboration, state locking, and versioning (e.g., S3, Azure Blob, Terraform Cloud).
- Implement State Locking: Prevent concurrent modifications and state corruption (e.g., DynamoDB for S3).
- Consistent Workspace Naming: Use clear, predictable names (e.g., `dev`, `staging`, `prod`, `feat-branch-name`).
- Map Variables to Workspaces: Use `.tfvars` files, maps in `locals`, or CI/CD variables to manage environment-specific configurations. Don’t hardcode environment differences directly in resources.
- Secure State Files: Employ encryption (SSE-S3, SSE-KMS), least-privilege IAM policies, and bucket policies.
- Enable State Versioning: Critical safety net for recovering previous state versions (e.g., enable S3 bucket versioning).
- Integrate with CI/CD: Automate `plan` and `apply` workflows, using dynamic workspace selection and secure authentication.
- Protect Production: Implement safeguards like manual approvals for `apply`, account ID checks, and `prevent_destroy` lifecycle rules.
- Audit Regularly: Use CloudTrail or similar tools to monitor access and changes to state files and lock mechanisms.
- Clean Up Temporary Workspaces: Establish a process (manual, or automated with caution) for removing state associated with short-lived environments (e.g., feature branches).
- Consider Alternatives for Code Separation: Use modules or separate configurations for logically distinct infrastructure components, rather than relying solely on workspaces for code organization.
Conclusion
Terraform workspaces are a powerful tool for managing multiple states from a single configuration, particularly for environment separation. However, using them effectively requires careful planning around backend configuration, state organization, security, and CI/CD integration. By implementing the best practices outlined in this guide – including robust backend setup, clear naming conventions, environment-specific variable mapping, strong security measures, and automated pipelines – you can leverage workspaces to build and maintain scalable, reliable, and secure infrastructure across all your environments. Remember that workspaces manage state, while modules and separate configurations are better suited for organizing code. Choose the right tool for the job, and keep terraforming efficiently! 🚀