Infrastructure Automation with Terraform: Enterprise-Scale Cloud Resource Management

Infrastructure as Code (IaC) has become essential for modern cloud operations, enabling teams to manage infrastructure with the same rigor and practices used for application code. Terraform, as a leading IaC tool, provides a declarative approach to infrastructure management across multiple cloud providers. This comprehensive guide explores enterprise-scale Terraform implementations, advanced patterns, and best practices for production environments.

Terraform Enterprise Architecture

Project Structure and Organization

# Directory structure for enterprise Terraform projects
terraform-infrastructure/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   └── production/
├── modules/
│   ├── vpc/
│   ├── eks/
│   ├── rds/
│   ├── security-groups/
│   └── iam/
├── shared/
│   ├── backend.tf
│   ├── providers.tf
│   └── versions.tf
├── policies/
│   ├── sentinel/
│   └── opa/
└── scripts/
    ├── deploy.sh
    ├── plan.sh
    └── destroy.sh

Backend Configuration with State Management

# shared/backend.tf
terraform {
  required_version = ">= 1.5.0"
  
  backend "s3" {
    bucket         = "terraform-state-company-prod"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-west-2"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
    
    # Workspace-specific state files
    workspace_key_prefix = "workspaces"
    
    # Additional security
    kms_key_id = "arn:aws:kms:us-west-2:123456789012:key/12345678-1234-1234-1234-123456789012"
    
    # Note: bucket versioning and server-side encryption are properties of
    # the S3 bucket itself, not valid backend arguments (see below)
  }
}
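
# State bucket with versioning and server-side encryption. These are bucket
# properties (not backend arguments), configured alongside the lock table;
# the bucket name is assumed to match the backend configuration above:
resource "aws_s3_bucket" "terraform_state" {
  bucket = "terraform-state-company-prod"
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      kms_master_key_id = aws_kms_key.terraform_state.arn
      sse_algorithm     = "aws:kms"
    }
  }
}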

# State locking with DynamoDB
resource "aws_dynamodb_table" "terraform_state_lock" {
  name           = "terraform-state-lock"
  billing_mode   = "PAY_PER_REQUEST"
  hash_key       = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  server_side_encryption {
    enabled     = true
    kms_key_arn = aws_kms_key.terraform_state.arn
  }

  point_in_time_recovery {
    enabled = true
  }

  tags = {
    Name        = "terraform-state-lock"
    Environment = "shared"
    Purpose     = "terraform-state-locking"
  }
}

# KMS key for state encryption
resource "aws_kms_key" "terraform_state" {
  description             = "KMS key for Terraform state encryption"
  deletion_window_in_days = 7
  enable_key_rotation     = true

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "Enable IAM User Permissions"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::123456789012:root"
        }
        Action   = "kms:*"
        Resource = "*"
      },
      {
        Sid    = "Allow Terraform Service Role"
        Effect = "Allow"
        Principal = {
          AWS = [
            "arn:aws:iam::123456789012:role/TerraformExecutionRole",
            "arn:aws:iam::123456789012:role/TerraformPlanRole"
          ]
        }
        Action = [
          "kms:Encrypt",
          "kms:Decrypt",
          "kms:ReEncrypt*",
          "kms:GenerateDataKey*",
          "kms:DescribeKey"
        ]
        Resource = "*"
      }
    ]
  })

  tags = {
    Name        = "terraform-state-key"
    Environment = "shared"
    Purpose     = "terraform-state-encryption"
  }
}

resource "aws_kms_alias" "terraform_state" {
  name          = "alias/terraform-state"
  target_key_id = aws_kms_key.terraform_state.key_id
}

Provider Configuration

# shared/providers.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 4.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
    tls = {
      source  = "hashicorp/tls"
      version = "~> 4.0"
    }
  }
}

# AWS Provider Configuration
provider "aws" {
  region = var.aws_region
  
  # Assume role for cross-account access
  assume_role {
    role_arn     = var.aws_assume_role_arn
    session_name = "terraform-${var.environment}"
    external_id  = var.aws_external_id
  }
  
  # Default tags for all resources
  default_tags {
    tags = {
      Environment   = var.environment
      Project       = var.project_name
      ManagedBy     = "terraform"
      Owner         = var.team_name
      CostCenter    = var.cost_center
      # CreatedDate omitted: timestamp() changes on every run and would
      # cause perpetual diffs when used in default_tags
    }
  }
  
  # Retry configuration
  retry_mode  = "adaptive"
  max_retries = 3
}

# Azure Provider Configuration
provider "azurerm" {
  features {
    key_vault {
      purge_soft_delete_on_destroy    = true
      recover_soft_deleted_key_vaults = true
    }
    
    resource_group {
      prevent_deletion_if_contains_resources = false
    }
    
    virtual_machine {
      delete_os_disk_on_deletion     = true
      graceful_shutdown              = false
      skip_shutdown_and_force_delete = false
    }
  }
  
  # Service Principal authentication
  client_id       = var.azure_client_id
  client_secret   = var.azure_client_secret
  tenant_id       = var.azure_tenant_id
  subscription_id = var.azure_subscription_id
  
  # Skip provider registration
  skip_provider_registration = true
}

# Google Cloud Provider Configuration
provider "google" {
  project = var.gcp_project_id
  region  = var.gcp_region
  zone    = var.gcp_zone
  
  # Service account key
  credentials = var.gcp_credentials_file
  
  # Request timeout
  request_timeout = "60s"
  
  # Batching configuration
  batching {
    send_after      = "10s"
    enable_batching = true
  }
}
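
# The Kubernetes and Helm providers below read their connection details from
# an existing EKS cluster. A minimal sketch of the data sources they
# reference, assuming a hypothetical var.eks_cluster_name input:
data "aws_eks_cluster" "cluster" {
  name = var.eks_cluster_name
}

data "aws_eks_cluster_auth" "cluster" {
  name = var.eks_cluster_name
}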

# Kubernetes Provider Configuration
provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  
  # Alternative: using kubeconfig
  # config_path = "~/.kube/config"
  # config_context = "production-cluster"
}

# Helm Provider Configuration
provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.cluster.token
  }
  
  # Helm repository cache
  repository_cache = "/tmp/.helmcache"
  repository_config_path = "/tmp/.helmrc"
  
  # Debug mode
  debug = var.helm_debug
}

Advanced Terraform Modules

VPC Module with Advanced Networking

# modules/vpc/main.tf
locals {
  # Calculate subnet CIDRs automatically
  public_subnet_cidrs = [
    for i in range(var.public_subnet_count) :
    cidrsubnet(var.vpc_cidr, 8, i)
  ]
  
  private_subnet_cidrs = [
    for i in range(var.private_subnet_count) :
    cidrsubnet(var.vpc_cidr, 8, i + var.public_subnet_count)
  ]
  
  database_subnet_cidrs = [
    for i in range(var.database_subnet_count) :
    cidrsubnet(var.vpc_cidr, 8, i + var.public_subnet_count + var.private_subnet_count)
  ]
  
  # Availability zones
  azs = slice(data.aws_availability_zones.available.names, 0, max(
    var.public_subnet_count,
    var.private_subnet_count,
    var.database_subnet_count
  ))
}
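
# Worked example: with vpc_cidr = "10.0.0.0/16" and public_subnet_count = 3,
# cidrsubnet("10.0.0.0/16", 8, i) yields 10.0.0.0/24, 10.0.1.0/24 and
# 10.0.2.0/24; private and database ranges continue from the next offsets.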

# Data sources
data "aws_availability_zones" "available" {
  state = "available"
  
  filter {
    name   = "opt-in-status"
    values = ["opt-in-not-required"]
  }
}

data "aws_region" "current" {}

# VPC
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = var.enable_dns_hostnames
  enable_dns_support   = var.enable_dns_support
  
  # IPv6 support
  assign_generated_ipv6_cidr_block = var.enable_ipv6
  
  tags = merge(var.tags, {
    Name = "${var.name}-vpc"
    Type = "vpc"
  })
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  count = var.create_igw ? 1 : 0
  
  vpc_id = aws_vpc.main.id
  
  tags = merge(var.tags, {
    Name = "${var.name}-igw"
    Type = "internet-gateway"
  })
}

# Egress-only Internet Gateway for IPv6
resource "aws_egress_only_internet_gateway" "main" {
  count = var.enable_ipv6 && var.create_egress_only_igw ? 1 : 0
  
  vpc_id = aws_vpc.main.id
  
  tags = merge(var.tags, {
    Name = "${var.name}-eigw"
    Type = "egress-only-internet-gateway"
  })
}

# Public Subnets
resource "aws_subnet" "public" {
  count = var.public_subnet_count
  
  vpc_id                  = aws_vpc.main.id
  cidr_block              = local.public_subnet_cidrs[count.index]
  availability_zone       = local.azs[count.index]
  map_public_ip_on_launch = var.map_public_ip_on_launch
  
  # IPv6 support
  ipv6_cidr_block                 = var.enable_ipv6 ? cidrsubnet(aws_vpc.main.ipv6_cidr_block, 8, count.index) : null
  assign_ipv6_address_on_creation = var.enable_ipv6 ? var.assign_ipv6_address_on_creation : false
  
  tags = merge(var.tags, {
    Name = "${var.name}-public-${local.azs[count.index]}"
    Type = "public"
    Tier = "public"
    "kubernetes.io/role/elb" = "1"
  })
}

# Private Subnets
resource "aws_subnet" "private" {
  count = var.private_subnet_count
  
  vpc_id            = aws_vpc.main.id
  cidr_block        = local.private_subnet_cidrs[count.index]
  availability_zone = local.azs[count.index]
  
  # IPv6 support
  ipv6_cidr_block                 = var.enable_ipv6 ? cidrsubnet(aws_vpc.main.ipv6_cidr_block, 8, count.index + var.public_subnet_count) : null
  assign_ipv6_address_on_creation = var.enable_ipv6 ? var.assign_ipv6_address_on_creation : false
  
  tags = merge(var.tags, {
    Name = "${var.name}-private-${local.azs[count.index]}"
    Type = "private"
    Tier = "private"
    "kubernetes.io/role/internal-elb" = "1"
  })
}

# Database Subnets
resource "aws_subnet" "database" {
  count = var.database_subnet_count
  
  vpc_id            = aws_vpc.main.id
  cidr_block        = local.database_subnet_cidrs[count.index]
  availability_zone = local.azs[count.index]
  
  tags = merge(var.tags, {
    Name = "${var.name}-database-${local.azs[count.index]}"
    Type = "database"
    Tier = "database"
  })
}

# Elastic IPs for NAT Gateways
resource "aws_eip" "nat" {
  count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : var.private_subnet_count) : 0
  
  domain = "vpc"
  
  depends_on = [aws_internet_gateway.main]
  
  tags = merge(var.tags, {
    Name = "${var.name}-nat-eip-${count.index + 1}"
    Type = "nat-gateway-eip"
  })
}

# NAT Gateways
resource "aws_nat_gateway" "main" {
  count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : var.private_subnet_count) : 0
  
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[var.single_nat_gateway ? 0 : count.index].id
  
  depends_on = [aws_internet_gateway.main]
  
  tags = merge(var.tags, {
    Name = "${var.name}-nat-gateway-${count.index + 1}"
    Type = "nat-gateway"
  })
}

# Route Tables
resource "aws_route_table" "public" {
  count = var.public_subnet_count > 0 ? 1 : 0
  
  vpc_id = aws_vpc.main.id
  
  tags = merge(var.tags, {
    Name = "${var.name}-public-rt"
    Type = "public-route-table"
  })
}

resource "aws_route_table" "private" {
  count = var.private_subnet_count
  
  vpc_id = aws_vpc.main.id
  
  tags = merge(var.tags, {
    Name = "${var.name}-private-rt-${count.index + 1}"
    Type = "private-route-table"
  })
}

resource "aws_route_table" "database" {
  count = var.database_subnet_count > 0 && var.create_database_subnet_route_table ? 1 : 0
  
  vpc_id = aws_vpc.main.id
  
  tags = merge(var.tags, {
    Name = "${var.name}-database-rt"
    Type = "database-route-table"
  })
}

# Routes
resource "aws_route" "public_internet_gateway" {
  count = var.create_igw && var.public_subnet_count > 0 ? 1 : 0
  
  route_table_id         = aws_route_table.public[0].id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.main[0].id
  
  timeouts {
    create = "5m"
  }
}

resource "aws_route" "public_internet_gateway_ipv6" {
  count = var.create_igw && var.enable_ipv6 && var.public_subnet_count > 0 ? 1 : 0
  
  route_table_id              = aws_route_table.public[0].id
  destination_ipv6_cidr_block = "::/0"
  gateway_id                  = aws_internet_gateway.main[0].id
  
  timeouts {
    create = "5m"
  }
}

resource "aws_route" "private_nat_gateway" {
  count = var.enable_nat_gateway ? var.private_subnet_count : 0
  
  route_table_id         = aws_route_table.private[count.index].id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.main[var.single_nat_gateway ? 0 : count.index].id
  
  timeouts {
    create = "5m"
  }
}

resource "aws_route" "private_ipv6_egress" {
  count = var.enable_ipv6 && var.create_egress_only_igw ? var.private_subnet_count : 0
  
  route_table_id              = aws_route_table.private[count.index].id
  destination_ipv6_cidr_block = "::/0"
  egress_only_gateway_id      = aws_egress_only_internet_gateway.main[0].id
  
  timeouts {
    create = "5m"
  }
}

# Route Table Associations
resource "aws_route_table_association" "public" {
  count = var.public_subnet_count
  
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public[0].id
}

resource "aws_route_table_association" "private" {
  count = var.private_subnet_count
  
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

resource "aws_route_table_association" "database" {
  count = var.database_subnet_count > 0 && var.create_database_subnet_route_table ? var.database_subnet_count : 0
  
  subnet_id      = aws_subnet.database[count.index].id
  route_table_id = aws_route_table.database[0].id
}

# Database Subnet Group
resource "aws_db_subnet_group" "database" {
  count = var.database_subnet_count > 0 && var.create_database_subnet_group ? 1 : 0
  
  name       = "${var.name}-database-subnet-group"
  subnet_ids = aws_subnet.database[*].id
  
  tags = merge(var.tags, {
    Name = "${var.name}-database-subnet-group"
    Type = "database-subnet-group"
  })
}

# VPC Flow Logs
resource "aws_flow_log" "vpc" {
  count = var.enable_flow_log ? 1 : 0
  
  iam_role_arn    = aws_iam_role.flow_log[0].arn
  log_destination = aws_cloudwatch_log_group.vpc_flow_log[0].arn
  traffic_type    = var.flow_log_traffic_type
  vpc_id          = aws_vpc.main.id
  
  tags = merge(var.tags, {
    Name = "${var.name}-vpc-flow-log"
    Type = "vpc-flow-log"
  })
}

resource "aws_cloudwatch_log_group" "vpc_flow_log" {
  count = var.enable_flow_log ? 1 : 0
  
  name              = "/aws/vpc/flow-logs/${var.name}"
  retention_in_days = var.flow_log_retention_in_days
  kms_key_id        = var.flow_log_kms_key_id
  
  tags = merge(var.tags, {
    Name = "${var.name}-vpc-flow-log-group"
    Type = "cloudwatch-log-group"
  })
}

resource "aws_iam_role" "flow_log" {
  count = var.enable_flow_log ? 1 : 0
  
  name = "${var.name}-vpc-flow-log-role"
  
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "vpc-flow-logs.amazonaws.com"
        }
      }
    ]
  })
  
  tags = merge(var.tags, {
    Name = "${var.name}-vpc-flow-log-role"
    Type = "iam-role"
  })
}

resource "aws_iam_role_policy" "flow_log" {
  count = var.enable_flow_log ? 1 : 0
  
  name = "${var.name}-vpc-flow-log-policy"
  role = aws_iam_role.flow_log[0].id
  
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents",
          "logs:DescribeLogGroups",
          "logs:DescribeLogStreams"
        ]
        Effect = "Allow"
        Resource = "*"
      }
    ]
  })
}

# VPC Endpoints
resource "aws_vpc_endpoint" "s3" {
  count = var.enable_s3_endpoint ? 1 : 0
  
  vpc_id       = aws_vpc.main.id
  service_name = "com.amazonaws.${data.aws_region.current.name}.s3"
  
  tags = merge(var.tags, {
    Name = "${var.name}-s3-endpoint"
    Type = "vpc-endpoint"
  })
}

resource "aws_vpc_endpoint_route_table_association" "s3_private" {
  count = var.enable_s3_endpoint ? var.private_subnet_count : 0
  
  vpc_endpoint_id = aws_vpc_endpoint.s3[0].id
  route_table_id  = aws_route_table.private[count.index].id
}

resource "aws_vpc_endpoint" "dynamodb" {
  count = var.enable_dynamodb_endpoint ? 1 : 0
  
  vpc_id       = aws_vpc.main.id
  service_name = "com.amazonaws.${data.aws_region.current.name}.dynamodb"
  
  tags = merge(var.tags, {
    Name = "${var.name}-dynamodb-endpoint"
    Type = "vpc-endpoint"
  })
}

resource "aws_vpc_endpoint_route_table_association" "dynamodb_private" {
  count = var.enable_dynamodb_endpoint ? var.private_subnet_count : 0
  
  vpc_endpoint_id = aws_vpc_endpoint.dynamodb[0].id
  route_table_id  = aws_route_table.private[count.index].id
}

# Interface VPC Endpoints
resource "aws_vpc_endpoint" "interface_endpoints" {
  for_each = var.interface_endpoints
  
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.${each.key}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoint[0].id]
  private_dns_enabled = true
  
  policy = each.value.policy
  
  tags = merge(var.tags, {
    Name = "${var.name}-${each.key}-endpoint"
    Type = "vpc-endpoint"
  })
}

resource "aws_security_group" "vpc_endpoint" {
  count = length(var.interface_endpoints) > 0 ? 1 : 0
  
  name_prefix = "${var.name}-vpc-endpoint-"
  vpc_id      = aws_vpc.main.id
  
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
  }
  
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  tags = merge(var.tags, {
    Name = "${var.name}-vpc-endpoint-sg"
    Type = "security-group"
  })
}
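
A hypothetical invocation of this module from an environment root, assuming the remaining inputs carry sensible defaults; the values shown are illustrative:

# environments/production/main.tf (illustrative)
module "vpc" {
  source = "../../modules/vpc"

  name     = "prod"
  vpc_cidr = "10.0.0.0/16"

  public_subnet_count   = 3
  private_subnet_count  = 3
  database_subnet_count = 3

  enable_nat_gateway = true
  single_nat_gateway = false
  enable_flow_log    = true

  tags = {
    Environment = "production"
    Project     = "platform"
  }
}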

EKS Module with Advanced Configuration

# modules/eks/main.tf
locals {
  cluster_name = "${var.cluster_name}-${var.environment}"
  
  # Node group configurations
  node_groups = {
    for k, v in var.node_groups : k => merge({
      instance_types = ["t3.medium"]
      capacity_type  = "ON_DEMAND"
      
      scaling_config = {
        desired_size = 2
        max_size     = 10
        min_size     = 1
      }
      
      update_config = {
        max_unavailable_percentage = 25
      }
      
      # Kubernetes labels
      labels = {}
      
      # Kubernetes taints
      taints = []
      
      # AMI and Kubernetes version (null = EKS defaults)
      ami_type        = "AL2_x86_64"
      release_version = null
      version         = null
      
      # Bootstrap and user data
      bootstrap_arguments = ""
      user_data           = ""
      
      # Launch template settings (size root volumes via block_device_mappings)
      block_device_mappings = []
      metadata_options      = null
      enable_monitoring     = false
      
      # Security groups
      additional_security_group_ids = []
      
      # Subnets
      subnet_ids = []
      
      # Tags
      tags = {}
    }, v)
  }
}

# Data sources
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}
data "aws_partition" "current" {}

data "aws_iam_policy_document" "cluster_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRole"]
    
    principals {
      type        = "Service"
      identifiers = ["eks.amazonaws.com"]
    }
  }
}

data "aws_iam_policy_document" "node_group_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRole"]
    
    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

# KMS key for EKS cluster encryption
resource "aws_kms_key" "eks" {
  count = var.create_kms_key ? 1 : 0
  
  description             = "EKS Secret Encryption Key for ${local.cluster_name}"
  deletion_window_in_days = var.kms_key_deletion_window_in_days
  enable_key_rotation     = var.enable_kms_key_rotation
  
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "Enable IAM User Permissions"
        Effect = "Allow"
        Principal = {
          AWS = "arn:${data.aws_partition.current.partition}:iam::${data.aws_caller_identity.current.account_id}:root"
        }
        Action   = "kms:*"
        Resource = "*"
      },
      {
        Sid    = "Allow EKS Service"
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"
        }
        Action = [
          "kms:Encrypt",
          "kms:Decrypt",
          "kms:ReEncrypt*",
          "kms:GenerateDataKey*",
          "kms:DescribeKey"
        ]
        Resource = "*"
      }
    ]
  })
  
  tags = merge(var.tags, {
    Name = "${local.cluster_name}-eks-key"
    Type = "kms-key"
  })
}

resource "aws_kms_alias" "eks" {
  count = var.create_kms_key ? 1 : 0
  
  name          = "alias/${local.cluster_name}-eks"
  target_key_id = aws_kms_key.eks[0].key_id
}

# EKS Cluster IAM Role
resource "aws_iam_role" "cluster" {
  name               = "${local.cluster_name}-cluster-role"
  assume_role_policy = data.aws_iam_policy_document.cluster_assume_role_policy.json
  
  tags = merge(var.tags, {
    Name = "${local.cluster_name}-cluster-role"
    Type = "iam-role"
  })
}

resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSClusterPolicy" {
  policy_arn = "arn:${data.aws_partition.current.partition}:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.cluster.name
}

resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSVPCResourceController" {
  policy_arn = "arn:${data.aws_partition.current.partition}:iam::aws:policy/AmazonEKSVPCResourceController"
  role       = aws_iam_role.cluster.name
}

# Additional cluster policies
resource "aws_iam_role_policy" "cluster_additional" {
  count = length(var.cluster_additional_policies) > 0 ? 1 : 0
  
  name = "${local.cluster_name}-cluster-additional-policy"
  role = aws_iam_role.cluster.id
  
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = var.cluster_additional_policies
  })
}

# EKS Cluster Security Group
resource "aws_security_group" "cluster" {
  name_prefix = "${local.cluster_name}-cluster-"
  vpc_id      = var.vpc_id
  description = "EKS cluster security group"
  
  # Allow all outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  tags = merge(var.tags, {
    Name = "${local.cluster_name}-cluster-sg"
    Type = "security-group"
  })
}

# Cluster security group rules
resource "aws_security_group_rule" "cluster_ingress_workstation_https" {
  count = length(var.cluster_endpoint_private_access_cidrs) > 0 ? 1 : 0
  
  description       = "Allow workstation to communicate with the cluster API Server"
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = var.cluster_endpoint_private_access_cidrs
  security_group_id = aws_security_group.cluster.id
}

# Node group security group
resource "aws_security_group" "node_group" {
  name_prefix = "${local.cluster_name}-node-group-"
  vpc_id      = var.vpc_id
  description = "EKS node group security group"
  
  # Allow all outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  tags = merge(var.tags, {
    Name                                        = "${local.cluster_name}-node-group-sg"
    Type                                        = "security-group"
    "kubernetes.io/cluster/${local.cluster_name}" = "owned"
  })
}

# Node group security group rules
resource "aws_security_group_rule" "node_group_ingress_self" {
  description              = "Allow node to communicate with each other"
  type                     = "ingress"
  from_port                = 0
  to_port                  = 65535
  protocol                 = "-1"
  source_security_group_id = aws_security_group.node_group.id
  security_group_id        = aws_security_group.node_group.id
}

resource "aws_security_group_rule" "node_group_ingress_cluster_https" {
  description              = "Allow pods to communicate with the cluster API Server"
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.cluster.id
  security_group_id        = aws_security_group.node_group.id
}

resource "aws_security_group_rule" "node_group_ingress_cluster_kubelet" {
  description              = "Allow cluster control plane to communicate with worker node kubelet"
  type                     = "ingress"
  from_port                = 10250
  to_port                  = 10250
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.cluster.id
  security_group_id        = aws_security_group.node_group.id
}

resource "aws_security_group_rule" "cluster_ingress_node_group_https" {
  description              = "Allow pods to communicate with the cluster API Server"
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.node_group.id
  security_group_id        = aws_security_group.cluster.id
}

# EKS Cluster
resource "aws_eks_cluster" "main" {
  name     = local.cluster_name
  role_arn = aws_iam_role.cluster.arn
  version  = var.cluster_version
  
  vpc_config {
    subnet_ids              = var.subnet_ids
    endpoint_private_access = var.cluster_endpoint_private_access
    endpoint_public_access  = var.cluster_endpoint_public_access
    public_access_cidrs     = var.cluster_endpoint_public_access_cidrs
    security_group_ids      = [aws_security_group.cluster.id]
  }
  
  # Encryption configuration
  dynamic "encryption_config" {
    for_each = var.cluster_encryption_config
    
    content {
      provider {
        key_arn = var.create_kms_key ? aws_kms_key.eks[0].arn : encryption_config.value.provider_key_arn
      }
      resources = encryption_config.value.resources
    }
  }
  
  # Logging configuration
  enabled_cluster_log_types = var.cluster_enabled_log_types
  
  # Add-ons will be managed separately
  depends_on = [
    aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy,
    aws_iam_role_policy_attachment.cluster_AmazonEKSVPCResourceController,
    aws_cloudwatch_log_group.cluster,
  ]
  
  tags = merge(var.tags, {
    Name = local.cluster_name
    Type = "eks-cluster"
  })
}

# CloudWatch Log Group for EKS cluster logs
resource "aws_cloudwatch_log_group" "cluster" {
  name              = "/aws/eks/${local.cluster_name}/cluster"
  retention_in_days = var.cloudwatch_log_group_retention_in_days
  kms_key_id        = var.cloudwatch_log_group_kms_key_id
  
  tags = merge(var.tags, {
    Name = "${local.cluster_name}-cluster-logs"
    Type = "cloudwatch-log-group"
  })
}

# EKS Node Group IAM Role
resource "aws_iam_role" "node_group" {
  name               = "${local.cluster_name}-node-group-role"
  assume_role_policy = data.aws_iam_policy_document.node_group_assume_role_policy.json
  
  tags = merge(var.tags, {
    Name = "${local.cluster_name}-node-group-role"
    Type = "iam-role"
  })
}

resource "aws_iam_role_policy_attachment" "node_group_AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:${data.aws_partition.current.partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.node_group.name
}

resource "aws_iam_role_policy_attachment" "node_group_AmazonEKS_CNI_Policy" {
  policy_arn = "arn:${data.aws_partition.current.partition}:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.node_group.name
}

resource "aws_iam_role_policy_attachment" "node_group_AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:${data.aws_partition.current.partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.node_group.name
}

resource "aws_iam_role_policy_attachment" "node_group_AmazonSSMManagedInstanceCore" {
  count = var.enable_ssm ? 1 : 0
  
  policy_arn = "arn:${data.aws_partition.current.partition}:iam::aws:policy/AmazonSSMManagedInstanceCore"
  role       = aws_iam_role.node_group.name
}

# Additional node group policies
resource "aws_iam_role_policy" "node_group_additional" {
  count = length(var.node_group_additional_policies) > 0 ? 1 : 0
  
  name = "${local.cluster_name}-node-group-additional-policy"
  role = aws_iam_role.node_group.id
  
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = var.node_group_additional_policies
  })
}

# Launch template for node groups
resource "aws_launch_template" "node_group" {
  for_each = local.node_groups
  
  name_prefix = "${local.cluster_name}-${each.key}-"
  
  vpc_security_group_ids = concat(
    [aws_security_group.node_group.id],
    each.value.additional_security_group_ids
  )
  
  user_data = base64encode(templatefile("${path.module}/user_data.sh", {
    cluster_name        = local.cluster_name
    cluster_endpoint    = aws_eks_cluster.main.endpoint
    cluster_ca          = aws_eks_cluster.main.certificate_authority[0].data
    bootstrap_arguments = each.value.bootstrap_arguments
    user_data_script    = each.value.user_data
  }))
  
  dynamic "block_device_mappings" {
    for_each = each.value.block_device_mappings
    
    content {
      device_name = block_device_mappings.value.device_name
      
      ebs {
        volume_size           = block_device_mappings.value.ebs.volume_size
        volume_type           = block_device_mappings.value.ebs.volume_type
        iops                  = block_device_mappings.value.ebs.iops
        throughput            = block_device_mappings.value.ebs.throughput
        encrypted             = block_device_mappings.value.ebs.encrypted
        kms_key_id            = block_device_mappings.value.ebs.kms_key_id
        delete_on_termination = block_device_mappings.value.ebs.delete_on_termination
      }
    }
  }
  
  dynamic "metadata_options" {
    for_each = each.value.metadata_options != null ? [each.value.metadata_options] : []
    
    content {
      http_endpoint               = metadata_options.value.http_endpoint
      http_tokens                 = metadata_options.value.http_tokens
      http_put_response_hop_limit = metadata_options.value.http_put_response_hop_limit
      instance_metadata_tags      = metadata_options.value.instance_metadata_tags
    }
  }
  
  dynamic "monitoring" {
    for_each = each.value.enable_monitoring ? [1] : []
    
    content {
      enabled = true
    }
  }
  
  tag_specifications {
    resource_type = "instance"
    tags = merge(var.tags, each.value.tags, {
      Name = "${local.cluster_name}-${each.key}-node"
      Type = "eks-node"
    })
  }
  
  tag_specifications {
    resource_type = "volume"
    tags = merge(var.tags, each.value.tags, {
      Name = "${local.cluster_name}-${each.key}-volume"
      Type = "eks-node-volume"
    })
  }
  
  tags = merge(var.tags, {
    Name = "${local.cluster_name}-${each.key}-lt"
    Type = "launch-template"
  })
}
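
# The template above renders modules/eks/user_data.sh, which is not shown in
# this guide. A minimal sketch of that file, assuming the EKS-optimized
# Amazon Linux 2 AMI and its standard /etc/eks/bootstrap.sh helper:
#
#   #!/bin/bash
#   set -o errexit
#
#   /etc/eks/bootstrap.sh "${cluster_name}" \
#     --apiserver-endpoint "${cluster_endpoint}" \
#     --b64-cluster-ca "${cluster_ca}" \
#     ${bootstrap_arguments}
#
#   ${user_data_script}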

# EKS Node Groups
resource "aws_eks_node_group" "main" {
  for_each = local.node_groups
  
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${local.cluster_name}-${each.key}"
  node_role_arn   = aws_iam_role.node_group.arn
  subnet_ids      = length(each.value.subnet_ids) > 0 ? each.value.subnet_ids : var.subnet_ids
  
  instance_types  = each.value.instance_types
  capacity_type   = each.value.capacity_type
  ami_type        = each.value.ami_type
  release_version = each.value.release_version
  version         = each.value.version
  
  # Note: disk_size is omitted because it conflicts with launch_template;
  # root volumes are sized via block_device_mappings in the template.
  
  scaling_config {
    desired_size = each.value.scaling_config.desired_size
    max_size     = each.value.scaling_config.max_size
    min_size     = each.value.scaling_config.min_size
  }
  
  update_config {
    max_unavailable_percentage = each.value.update_config.max_unavailable_percentage
  }
  
  # Launch template
  launch_template {
    id      = aws_launch_template.node_group[each.key].id
    version = aws_launch_template.node_group[each.key].latest_version
  }
  
  # Labels
  labels = merge(each.value.labels, {
    "node-group" = each.key
  })
  
  # Taints
  dynamic "taint" {
    for_each = each.value.taints
    
    content {
      key    = taint.value.key
      value  = taint.value.value
      effect = taint.value.effect
    }
  }
  
  depends_on = [
    aws_iam_role_policy_attachment.node_group_AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.node_group_AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.node_group_AmazonEC2ContainerRegistryReadOnly,
  ]
  
  tags = merge(var.tags, each.value.tags, {
    Name = "${local.cluster_name}-${each.key}"
    Type = "eks-node-group"
  })
}

# EKS Add-ons
resource "aws_eks_addon" "main" {
  for_each = var.cluster_addons
  
  cluster_name             = aws_eks_cluster.main.name
  addon_name               = each.key
  addon_version            = each.value.addon_version
  resolve_conflicts        = each.value.resolve_conflicts
  service_account_role_arn = each.value.service_account_role_arn
  
  depends_on = [aws_eks_node_group.main]
  
  tags = merge(var.tags, {
    Name = "${local.cluster_name}-${each.key}-addon"
    Type = "eks-addon"
  })
}

# OIDC Identity Provider
data "tls_certificate" "cluster" {
  url = aws_eks_cluster.main.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "cluster" {
  count = var.enable_irsa ? 1 : 0
  
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.cluster.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.main.identity[0].oidc[0].issuer
  
  tags = merge(var.tags, {
    Name = "${local.cluster_name}-oidc-provider"
    Type = "oidc-provider"
  })
}
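
As with the VPC module, an environment root wires the cluster to its network through a module block. A hypothetical invocation; the output names module.vpc.vpc_id and module.vpc.private_subnet_ids are assumptions about the VPC module's outputs, which are not shown here:

# environments/production/main.tf (illustrative)
module "eks" {
  source = "../../modules/eks"

  cluster_name    = "platform"
  environment     = "production"
  cluster_version = "1.28"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnet_ids

  node_groups = {
    general = {
      instance_types = ["m5.large"]
      scaling_config = {
        desired_size = 3
        max_size     = 6
        min_size     = 2
      }
    }
  }

  enable_irsa = true

  tags = {
    Environment = "production"
  }
}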

Security and Compliance

Terraform Security Scanning

# .github/workflows/terraform-security.yml
name: Terraform Security Scan

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  security-scan:
    runs-on: ubuntu-latest
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
      
    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v3
      with:
        terraform_version: 1.5.0
        
    - name: Terraform Format Check
      run: terraform fmt -check -recursive
      
    - name: Terraform Init
      run: terraform init -backend=false
      
    - name: Terraform Validate
      run: terraform validate
      
    - name: Run Checkov
      uses: bridgecrewio/checkov-action@master
      with:
        directory: .
        framework: terraform
        output_format: sarif
        output_file_path: checkov-results.sarif
        
    - name: Upload Checkov results to GitHub Security
      uses: github/codeql-action/upload-sarif@v2
      if: always()
      with:
        sarif_file: checkov-results.sarif
        
    - name: Run TFSec
      uses: aquasecurity/tfsec-action@v1.0.3
      with:
        soft_fail: true
        
    - name: Run Terrascan
      uses: tenable/terrascan-action@main
      with:
        iac_type: terraform
        iac_version: v14
        policy_type: aws
        only_warn: true
        
    - name: Run Infracost
      uses: infracost/actions/setup@v2
      with:
        api-key: ${{ secrets.INFRACOST_API_KEY }}
        
    - name: Generate Infracost diff
      run: |
        infracost breakdown --path . \
          --format json \
          --out-file infracost-base.json
          
    - name: Post Infracost comment
      uses: infracost/actions/comment@v1
      with:
        path: infracost-base.json
        behavior: update

Policy as Code with Sentinel

# policies/sentinel/aws-security-policies.sentinel
import "tfplan/v2" as tfplan
import "strings"
import "types"

# Helper functions
get_resources = func(resource_type) {
    resources = {}
    for tfplan.resource_changes as address, rc {
        if rc.type is resource_type and
           rc.mode is "managed" and
           (rc.change.actions contains "create" or rc.change.actions contains "update") {
            resources[address] = rc
        }
    }
    return resources
}

# Policy: Ensure S3 buckets are encrypted
s3_buckets_encrypted = rule {
    all get_resources("aws_s3_bucket") as address, rc {
        rc.change.after.server_side_encryption_configuration is not null and
        length(rc.change.after.server_side_encryption_configuration) > 0
    }
}

# Policy: Ensure RDS instances are encrypted
rds_instances_encrypted = rule {
    all get_resources("aws_db_instance") as address, rc {
        rc.change.after.storage_encrypted is true
    }
}

# Policy: Ensure EBS volumes are encrypted
ebs_volumes_encrypted = rule {
    all get_resources("aws_ebs_volume") as address, rc {
        rc.change.after.encrypted is true
    }
}

# Policy: Ensure security groups don't allow 0.0.0.0/0 on port 22
no_ssh_from_anywhere = rule {
    all get_resources("aws_security_group") as address, rc {
        all rc.change.after.ingress as ingress {
            not (ingress.from_port <= 22 and ingress.to_port >= 22 and
                 ingress.protocol is "tcp" and
                 "0.0.0.0/0" in ingress.cidr_blocks)
        }
    }
}

# Policy: Ensure security groups don't allow 0.0.0.0/0 on port 3389
no_rdp_from_anywhere = rule {
    all get_resources("aws_security_group") as address, rc {
        all rc.change.after.ingress as ingress {
            not (ingress.from_port <= 3389 and ingress.to_port >= 3389 and
                 ingress.protocol is "tcp" and
                 "0.0.0.0/0" in ingress.cidr_blocks)
        }
    }
}

# Policy: Ensure IAM policies don't grant admin access
no_admin_policies = rule {
    all get_resources("aws_iam_policy") as address, rc {
        policy_doc = json.unmarshal(rc.change.after.policy)
        all policy_doc.Statement as statement {
            not (statement.Effect is "Allow" and
                 statement.Action contains "*" and
                 statement.Resource contains "*")
        }
    }
}

# Policy: Ensure resources have required tags
required_tags = ["Environment", "Project", "Owner", "CostCenter"]

resources_have_required_tags = rule {
    all get_resources("aws_instance") as address, rc {
        all required_tags as tag {
            rc.change.after.tags contains tag
        }
    } and
    all get_resources("aws_s3_bucket") as address, rc {
        all required_tags as tag {
            rc.change.after.tags contains tag
        }
    } and
    all get_resources("aws_rds_instance") as address, rc {
        all required_tags as tag {
            rc.change.after.tags contains tag
        }
    }
}

# Policy: Ensure EKS clusters have logging enabled
eks_logging_enabled = rule {
    all get_resources("aws_eks_cluster") as address, rc {
        rc.change.after.enabled_cluster_log_types is not null and
        length(rc.change.after.enabled_cluster_log_types) > 0
    }
}

# Policy: Ensure VPC flow logs are enabled
vpcs = get_resources("aws_vpc")
flow_logs = get_resources("aws_flow_log")

vpc_flow_logs_enabled = rule {
    all vpcs as vpc_address, vpc_rc {
        any flow_logs as fl_address, fl_rc {
            fl_rc.change.after.vpc_id is vpc_rc.change.after.id
        }
    }
}

# Main policy
main = rule {
    s3_buckets_encrypted and
    rds_instances_encrypted and
    ebs_volumes_encrypted and
    no_ssh_from_anywhere and
    no_rdp_from_anywhere and
    no_admin_policies and
    resources_have_required_tags and
    eks_logging_enabled and
    vpc_flow_logs_enabled
}

CI/CD Integration

GitLab CI Pipeline for Terraform

# .gitlab-ci.yml
stages:
  - validate
  - plan
  - security-scan
  - apply
  - destroy

variables:
  TF_ROOT: ${CI_PROJECT_DIR}
  TF_ADDRESS: ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/terraform/state/${CI_ENVIRONMENT_NAME}
  TF_IN_AUTOMATION: "true"
  TF_INPUT: "false"
  TF_CLI_ARGS: "-no-color"

cache:
  key: "${CI_COMMIT_REF_SLUG}"
  paths:
    - ${TF_ROOT}/.terraform

before_script:
  - cd ${TF_ROOT}
  - terraform --version
  - terraform init -backend-config="address=${TF_ADDRESS}" -backend-config="lock_address=${TF_ADDRESS}/lock" -backend-config="unlock_address=${TF_ADDRESS}/lock" -backend-config="username=${GITLAB_USER_LOGIN}" -backend-config="password=${CI_JOB_TOKEN}" -backend-config="lock_method=POST" -backend-config="unlock_method=DELETE" -backend-config="retry_wait_min=5"

validate:
  stage: validate
  script:
    - terraform fmt -check -recursive
    - terraform validate
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

plan:
  stage: plan
  script:
    - terraform plan -var-file="environments/${CI_ENVIRONMENT_NAME}.tfvars" -out="planfile"
    - terraform show -json planfile > plan.json
  artifacts:
    name: plan
    paths:
      - ${TF_ROOT}/planfile
      - ${TF_ROOT}/plan.json
    expire_in: 1 week
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

security-scan:
  stage: security-scan
  image: bridgecrew/checkov:latest
  before_script: []  # skip the terraform init; this image has no terraform binary
  script:
    - checkov -f plan.json --framework terraform_plan --output cli --output junitxml --output-file-path console,checkov-report.xml
  artifacts:
    reports:
      junit: checkov-report.xml
    paths:
      - checkov-report.xml
    expire_in: 1 week
  dependencies:
    - plan
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

cost-estimation:
  stage: security-scan
  image: infracost/infracost:ci-0.10
  before_script: []  # skip the terraform init; this image has no terraform binary
  script:
    - infracost breakdown --path plan.json --format json --out-file infracost.json
    - infracost output --path infracost.json --format table
    - infracost output --path infracost.json --format html --out-file infracost-report.html
  artifacts:
    paths:
      - infracost.json
      - infracost-report.html
    expire_in: 1 week
  dependencies:
    - plan
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

apply:
  stage: apply
  script:
    - terraform apply planfile  # a saved plan applies without prompting
  dependencies:
    - plan
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      when: manual
  environment:
    name: ${CI_ENVIRONMENT_NAME}
    action: start

destroy:
  stage: destroy
  script:
    - terraform destroy -var-file="environments/${CI_ENVIRONMENT_NAME}.tfvars" -auto-approve
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      when: manual
  environment:
    name: ${CI_ENVIRONMENT_NAME}
    action: stop

Automated Deployment Script

#!/bin/bash
# scripts/deploy.sh

set -e

# Configuration
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"
ENVIRONMENTS_DIR="$PROJECT_ROOT/environments"
MODULES_DIR="$PROJECT_ROOT/modules"

# Default values
ENVIRONMENT=""
ACTION="plan"
AUTO_APPROVE=false
WORKSPACE=""
VAR_FILE=""
BACKEND_CONFIG=""
PARALLELISM=10
REFRESH=true

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Logging functions
log_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

log_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

log_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Help function
show_help() {
    cat << EOF
Terraform Deployment Script

Usage: $0 [OPTIONS]

OPTIONS:
    -e, --environment ENVIRONMENT    Target environment (dev, staging, production)
    -a, --action ACTION             Action to perform (plan, apply, destroy)
    -w, --workspace WORKSPACE       Terraform workspace to use
    -f, --var-file FILE            Variables file to use
    -b, --backend-config FILE      Backend configuration file
    -p, --parallelism NUMBER       Number of parallel operations (default: 10)
    --auto-approve                 Auto approve apply/destroy operations
    --no-refresh                   Skip refresh during plan/apply
    -h, --help                     Show this help message

EXAMPLES:
    $0 -e dev -a plan
    $0 -e production -a apply --auto-approve
    $0 -e staging -a destroy --auto-approve
    $0 -e dev -w feature-branch -a plan

EOF
}

# Parse command line arguments
while [[ $# -gt 0 ]]; do
    case $1 in
        -e|--environment)
            ENVIRONMENT="$2"
            shift 2
            ;;
        -a|--action)
            ACTION="$2"
            shift 2
            ;;
        -w|--workspace)
            WORKSPACE="$2"
            shift 2
            ;;
        -f|--var-file)
            VAR_FILE="$2"
            shift 2
            ;;
        -b|--backend-config)
            BACKEND_CONFIG="$2"
            shift 2
            ;;
        -p|--parallelism)
            PARALLELISM="$2"
            shift 2
            ;;
        --auto-approve)
            AUTO_APPROVE=true
            shift
            ;;
        --no-refresh)
            REFRESH=false
            shift
            ;;
        -h|--help)
            show_help
            exit 0
            ;;
        *)
            log_error "Unknown option: $1"
            show_help
            exit 1
            ;;
    esac
done

# Validate required parameters
if [[ -z "$ENVIRONMENT" ]]; then
    log_error "Environment is required. Use -e or --environment option."
    exit 1
fi

# Set default var file if not specified
if [[ -z "$VAR_FILE" ]]; then
    VAR_FILE="$ENVIRONMENTS_DIR/$ENVIRONMENT.tfvars"
fi

# Validate environment directory exists
ENV_DIR="$ENVIRONMENTS_DIR/$ENVIRONMENT"
if [[ ! -d "$ENV_DIR" ]]; then
    log_error "Environment directory not found: $ENV_DIR"
    exit 1
fi

# Validate var file exists
if [[ ! -f "$VAR_FILE" ]]; then
    log_error "Variables file not found: $VAR_FILE"
    exit 1
fi

# Change to environment directory
cd "$ENV_DIR"

log_info "Starting Terraform deployment for environment: $ENVIRONMENT"
log_info "Action: $ACTION"
log_info "Variables file: $VAR_FILE"

# Initialize Terraform
log_info "Initializing Terraform..."
INIT_ARGS=()
if [[ -n "$BACKEND_CONFIG" ]]; then
    INIT_ARGS+=("-backend-config=$BACKEND_CONFIG")
fi

if ! terraform init "${INIT_ARGS[@]}"; then
    log_error "Terraform initialization failed"
    exit 1
fi

# Select or create workspace
if [[ -n "$WORKSPACE" ]]; then
    log_info "Selecting workspace: $WORKSPACE"
    terraform workspace select "$WORKSPACE" || terraform workspace new "$WORKSPACE"
fi

# Perform the requested action
case $ACTION in
    plan)
        log_info "Running Terraform plan..."
        PLAN_ARGS=(
            "-var-file=$VAR_FILE"
            "-parallelism=$PARALLELISM"
            "-out=tfplan"
        )
        
        if [[ "$REFRESH" == "false" ]]; then
            PLAN_ARGS+=("-refresh=false")
        fi
        
        if terraform plan "${PLAN_ARGS[@]}"; then
            log_success "Terraform plan completed successfully"
            
            # Show plan summary
            log_info "Plan summary:"
            terraform show -json tfplan | jq -r '
                .resource_changes[] |
                select(.change.actions != ["no-op"]) |
                "\(.change.actions | join(",")) \(.address)"
            ' | sort
        else
            log_error "Terraform plan failed"
            exit 1
        fi
        ;;
        
    apply)
        log_info "Running Terraform apply..."
        APPLY_ARGS=(
            "-var-file=$VAR_FILE"
            "-parallelism=$PARALLELISM"
        )
        
        if [[ "$REFRESH" == "false" ]]; then
            APPLY_ARGS+=("-refresh=false")
        fi
        
        if [[ "$AUTO_APPROVE" == "true" ]]; then
            APPLY_ARGS+=("-auto-approve")
        fi
        
        # A saved plan already encodes variables, so it replaces the arg list
        if [[ -f "tfplan" ]]; then
            log_info "Using existing plan file"
            APPLY_ARGS=("-parallelism=$PARALLELISM" "tfplan")
        fi
        
        if terraform apply "${APPLY_ARGS[@]}"; then
            log_success "Terraform apply completed successfully"
            
            # Show outputs
            log_info "Terraform outputs:"
            terraform output
        else
            log_error "Terraform apply failed"
            exit 1
        fi
        ;;
        
    destroy)
        log_warning "This will destroy all resources in environment: $ENVIRONMENT"
        
        if [[ "$AUTO_APPROVE" != "true" ]]; then
            read -p "Are you sure you want to continue? (yes/no): " -r
            if [[ ! $REPLY =~ ^[Yy][Ee][Ss]$ ]]; then
                log_info "Destroy operation cancelled"
                exit 0
            fi
        fi
        
        log_info "Running Terraform destroy..."
        DESTROY_ARGS=(
            "-var-file=$VAR_FILE"
            "-parallelism=$PARALLELISM"
        )
        
        if [[ "$AUTO_APPROVE" == "true" ]]; then
            DESTROY_ARGS+=("-auto-approve")
        fi
        
        if terraform destroy "${DESTROY_ARGS[@]}"; then
            log_success "Terraform destroy completed successfully"
        else
            log_error "Terraform destroy failed"
            exit 1
        fi
        ;;
        
    *)
        log_error "Unknown action: $ACTION"
        log_info "Supported actions: plan, apply, destroy"
        exit 1
        ;;
esac

log_success "Deployment script completed successfully"

Best Practices and Operational Excellence

State Management Best Practices

  1. Remote State Storage: Always use remote state with encryption and versioning
  2. State Locking: Implement state locking to prevent concurrent modifications
  3. State Backup: Regular automated backups of state files
  4. Workspace Strategy: Use workspaces for environment isolation (see the sketch after this list)
  5. State File Security: Restrict access to state files containing sensitive data
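
A minimal sketch of the workspace strategy in item 4, deriving names and sizing from terraform.workspace; the prefix and instance types are illustrative:

# Workspace-aware naming and sizing
locals {
  environment = terraform.workspace

  instance_type = {
    dev        = "t3.small"
    staging    = "t3.medium"
    production = "m5.large"
  }[terraform.workspace]

  name_prefix = "myapp-${terraform.workspace}"
}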

Module Development Guidelines

  1. Single Responsibility: Each module should have a clear, single purpose
  2. Versioning: Use semantic versioning for module releases
  3. Documentation: Comprehensive README with examples and variable descriptions
  4. Testing: Implement automated testing with tools like Terratest
  5. Validation: Validate inputs and keep outputs consistent (see the sketch after this list)
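
Input validation (item 5) is expressed with validation blocks on variables; a minimal sketch:

# Module input with built-in validation
variable "environment" {
  type        = string
  description = "Deployment environment"

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be one of: dev, staging, production."
  }
}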

Security Hardening

  1. Least Privilege: Apply principle of least privilege to IAM roles and policies
  2. Encryption: Enable encryption at rest and in transit for all resources
  3. Network Security: Implement proper network segmentation and security groups
  4. Secrets Management: Use dedicated secret management services (see the sketch after this list)
  5. Compliance: Regular compliance scanning and policy enforcement
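
A hedged sketch of item 4 using AWS Secrets Manager; the secret name and its JSON keys are illustrative:

# Read credentials at plan time instead of hardcoding them
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/credentials"
}

locals {
  db_credentials = jsondecode(data.aws_secretsmanager_secret_version.db.secret_string)
}

resource "aws_db_instance" "main" {
  identifier        = "prod-db"
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 50
  storage_encrypted = true

  username = local.db_credentials["username"]
  password = local.db_credentials["password"]

  # Values read this way still land in state, which is why the encrypted,
  # access-restricted backend described earlier matters.
}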

Performance Optimization

  1. Parallelism: Optimize Terraform parallelism settings
  2. Resource Dependencies: Minimize unnecessary dependencies
  3. Provider Caching: Use provider plugin caching (see the sketch after this list)
  4. State Refresh: Optimize state refresh operations
  5. Resource Targeting: Use targeted operations when appropriate
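
A minimal sketch of items 1 and 3; the cache directory follows the HashiCorp CLI configuration documentation:

# Enable provider plugin caching across working directories (one-time setup)
mkdir -p "$HOME/.terraform.d/plugin-cache"
echo 'plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"' >> ~/.terraformrc

# Raise parallelism for wide resource graphs; use targeting sparingly
terraform apply -parallelism=20
terraform plan -target=module.vpc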

Conclusion

Terraform provides a powerful foundation for Infrastructure as Code, enabling teams to manage complex cloud environments with consistency, reliability, and security. This guide has covered enterprise-scale implementations, from project structure and state management to advanced security patterns and CI/CD integration.

Key takeaways for successful Terraform adoption:

  1. Start with solid foundations: Proper project structure, state management, and security practices
  2. Embrace modularity: Develop reusable, well-tested modules for common patterns
  3. Implement governance: Use policy as code and automated security scanning
  4. Automate everything: CI/CD pipelines, testing, and deployment processes
  5. Monitor and optimize: Continuous improvement of performance and cost efficiency

By following these practices and patterns, organizations can build robust, scalable infrastructure automation that supports their cloud-native journey and operational excellence goals.
