Hardening Security Groups for PostGIS Ports: Incident Triage & State Recovery

When application servers start returning Connection timed out or Connection refused against TCP 5432, vector tile backends stall mid-render, and spatial ETL jobs fail on ingest, the first suspect is usually a security group around the PostGIS data plane that is either too open (a stray 0.0.0.0/0 ingress that fails a compliance scan) or too tight (a CIDR block that silently drops VPC-peered or transit-gateway traffic). This guide is the port-level deep dive within the Security Group Hardening workflow, itself part of the broader Network Security & Access Control framework. It walks through triaging the failure, recovering desynchronized state, applying a least-privilege baseline in Terraform or Pulumi, and encoding the fix so it cannot regress.

Symptom identification and triage

Before changing a single rule, isolate the failure domain. A connection problem that looks like a firewall issue is frequently an authentication or routing fault, and a blind rule edit can widen the attack surface while fixing nothing.

Concrete signals to read, in order:

Client-side errors. psql: error: connection to server ... failed: Connection timed out points to traffic being dropped before it reaches the instance (a security group or route-table problem). Connection refused means packets arrive but nothing is listening — Postgres is down or bound to the wrong interface, not a security group fault.
VPC Flow Logs at the database ENI. A REJECT entry for destination port 5432 confirms the perimeter is dropping the packet. If Flow Logs show ACCEPT but the client still hangs, the network boundary is fine and the issue is downstream (TLS handshake, pg_hba.conf, or the IAM Role Mapping for Geospatial Workloads configuration when IAM database authentication is in use).
CloudTrail. A recent AuthorizeSecurityGroupIngress or RevokeSecurityGroupIngress event with a console userIdentity is the fingerprint of an out-of-band change that desynchronized your declarative pipeline.

The decisive split is whether packets reach the ENI. If Flow Logs say REJECT, this guide applies. If they say ACCEPT, stop and look at host-based authentication and identity instead — re-hardening the security group will not help and risks reopening it.
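The REJECT/ACCEPT split can be sketched as a small classifier over Flow Log lines. This is a minimal illustration, assuming records in the default (version 2) space-separated format and that you pass in the private address of the affected client; the names parseFlowRecord and triage are hypothetical, not part of any AWS tooling.

```typescript
// Default (version 2) VPC Flow Log fields, in order:
// version account-id interface-id srcaddr dstaddr srcport dstport
// protocol packets bytes start end action log-status
interface FlowRecord {
  interfaceId: string;
  srcAddr: string;
  dstPort: number;
  protocol: number; // IANA protocol number; 6 is TCP
  action: string;   // "ACCEPT" or "REJECT"
}

// Parse one record; returns null for malformed or NODATA/SKIPDATA lines,
// whose address and port fields are "-".
function parseFlowRecord(line: string): FlowRecord | null {
  const f = line.trim().split(/\s+/);
  if (f.length < 14) return null;
  const dstPort = Number(f[6]);
  const protocol = Number(f[7]);
  if (Number.isNaN(dstPort) || Number.isNaN(protocol)) return null;
  return { interfaceId: f[2], srcAddr: f[3], dstPort, protocol, action: f[12] };
}

type Verdict = "perimeter" | "downstream" | "no-evidence";

// Classify the failure domain for one client talking to the database port.
function triage(lines: string[], clientAddr: string, dbPort = 5432): Verdict {
  let accepted = false;
  for (const line of lines) {
    const r = parseFlowRecord(line);
    if (!r || r.protocol !== 6 || r.dstPort !== dbPort) continue;
    if (r.srcAddr !== clientAddr) continue;
    if (r.action === "REJECT") return "perimeter"; // dropped at the network boundary
    if (r.action === "ACCEPT") accepted = true;
  }
  // ACCEPT with a hanging client points past the network boundary.
  return accepted ? "downstream" : "no-evidence";
}
```

A "perimeter" verdict means the rest of this guide applies; "downstream" means the investigation belongs with authentication and identity; "no-evidence" suggests the traffic never reached this ENI, which usually warrants a look at routing first.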

Prerequisites and environment assumptions

This procedure assumes a codified data plane, not console-managed rules. You will need:

Terraform >= 1.5 with the AWS provider ~> 5.0, or Pulumi with @pulumi/aws 6.x. Pin the provider version so plan/preview output stays deterministic across the promotion cycle.
A remote state backend with mandatory locking so a recovery apply cannot race a CI run and leave the group half-authorized. The locking mechanics are covered in Managing Terraform State Locks for Spatial Data; the backend choice itself in State Backend Selection.
IAM permissions to read and reconcile the boundary: ec2:DescribeSecurityGroups, ec2:DescribeSecurityGroupRules, ec2:AuthorizeSecurityGroupIngress, ec2:RevokeSecurityGroupIngress, plus logs:FilterLogEvents for the Flow Log group and cloudtrail:LookupEvents for the audit trail.
Knowledge of the legitimate callers of port 5432 — the tile-render compute group, the ETL subnet, and any read-replica consumers — expressed as security-group IDs or subnet CIDRs, never as broad ranges.
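The inventory of legitimate callers can be checked mechanically before it is fed into any rule. The sketch below is illustrative only: it handles IPv4 CIDRs and security-group IDs, and the /16 cut-off for "broad" is an assumption made for this example, not a limit defined by AWS or by this guide. The name classifySource is hypothetical.

```typescript
type SourceKind = "security-group" | "cidr" | "too-broad" | "invalid";

// Assumption: any IPv4 range wider than /16 is treated as too broad for a
// database port. Adjust to match your own network plan.
const MIN_PREFIX_LENGTH = 16;

function classifySource(source: string): SourceKind {
  // Security-group IDs are "sg-" followed by hexadecimal characters.
  if (/^sg-[0-9a-f]{8,17}$/.test(source)) return "security-group";

  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\/(\d{1,2})$/.exec(source);
  if (!m) return "invalid";

  const octets = m.slice(1, 5).map(Number);
  const prefix = Number(m[5]);
  if (octets.some((o) => o > 255) || prefix > 32) return "invalid";

  return prefix < MIN_PREFIX_LENGTH ? "too-broad" : "cidr";
}
```

Running every entry through a check like this before it becomes a variable value keeps 0.0.0.0/0 and similar ranges out of the inputs, rather than relying on review to spot them in a plan.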

Pin the provider once at the top of the module:

terraform {
  required_version = ">= 1.5"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

Step-by-step remediation

Manual console edits that bypass the pipeline desynchronize Terraform state or the Pulumi stack from the live environment. Running a plain apply against drifted state risks destructive revocations that sever active ETL connections. Recover non-destructively first, then enforce the hardened baseline.

Pause automation. Disable the CI/CD pipeline for this stack to prevent a scheduled run from racing your reconciliation.
Capture live state read-only. Run terraform apply -refresh-only or pulumi refresh to pull the live rules into state without mutating infrastructure.
Resolve orphans. If a console edit created rules outside the module boundary, detach them with terraform state rm and re-attach via terraform import. For Pulumi, pulumi stack export, strip the unmanaged rule IDs from the JSON, then pulumi stack import.
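Validate the delta. Run a normal terraform plan (without -refresh-only) or pulumi preview to surface the exact differences between the refreshed state and your configuration before any change is enforced. A second refresh-only plan would only compare state with live infrastructure, which the previous steps already reconciled.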
Apply the least-privilege baseline. Only once state reflects reality, apply the hardened rules below.
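Finding the orphans in step 3 amounts to a set difference between the live rules and the rule IDs your state already tracks. A minimal sketch, assuming the live rules come from the SecurityGroupRules array returned by aws ec2 describe-security-group-rules, and that managedRuleIds is a hypothetical list you have extracted from state by whatever means your tooling exposes:

```typescript
// Only the fields used here are declared; the live API returns more.
interface LiveRule {
  SecurityGroupRuleId: string;
  IsEgress: boolean;
  FromPort: number;
  ToPort: number;
  CidrIpv4?: string;
}

// Rules present on the group but absent from IaC state. These are the
// candidates to import (adopt) or revoke before the baseline is applied.
function findOrphans(live: LiveRule[], managedRuleIds: string[]): LiveRule[] {
  const managed = new Set(managedRuleIds);
  return live.filter((r) => !managed.has(r.SecurityGroupRuleId));
}

// One-line summary per orphan for the incident log.
function describeOrphan(r: LiveRule): string {
  const dir = r.IsEgress ? "egress" : "ingress";
  const src = r.CidrIpv4 ?? "non-CIDR source";
  return `${r.SecurityGroupRuleId}: ${dir} ${r.FromPort}-${r.ToPort} ${src}`;
}
```

Reviewing this list before any import keeps the decision explicit: a rule is adopted because someone confirmed it is legitimate, not because it happened to exist.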

The hardened baseline replaces broad CIDRs with references to the calling compute group and the ETL subnet, and decouples the variable rule from the parent group so an incremental change cannot force a full group recreation — which would momentarily detach every dependent ENI and drop tenant isolation.

Terraform implementation

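resource "aws_security_group" "postgis_sg" {
  name        = "postgis-hardened-sg"
  description = "Least-privilege perimeter for PostGIS endpoints"
  vpc_id      = var.vpc_id

  # No inline ingress/egress blocks: the AWS provider documents inline rules
  # and standalone aws_security_group_rule resources on the same group as
  # conflicting (they overwrite each other), so every rule is standalone.

  lifecycle {
    prevent_destroy = true
  }

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
    CostCenter  = "gis-platform"
  }
}

# Primary ingress restricted to tile-render compute, by SG reference not CIDR
resource "aws_security_group_rule" "tile_ingress" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  source_security_group_id = var.tile_server_sg_id
  security_group_id        = aws_security_group.postgis_sg.id
  description              = "PostGIS from tile servers"
}

# Decoupled rule for independent lifecycle management
resource "aws_security_group_rule" "etl_ingress" {
  type              = "ingress"
  from_port         = 5432
  to_port           = 5432
  protocol          = "tcp"
  cidr_blocks       = [var.etl_subnet_cidr]
  security_group_id = aws_security_group.postgis_sg.id
  description       = "ETL pipeline subnet access"
}

# Egress limited to HTTPS inside the private 10.0.0.0/8 range (for example,
# interface endpoints). Traffic to an S3 gateway endpoint is not covered by
# a private CIDR and would need a prefix_list_ids rule instead.
resource "aws_security_group_rule" "https_egress" {
  type              = "egress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["10.0.0.0/8"]
  security_group_id = aws_security_group.postgis_sg.id
  description       = "Outbound HTTPS for patching/metrics"
}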

Pulumi implementation (TypeScript)

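import * as aws from "@pulumi/aws"; // pin @pulumi/aws 6.x in package.json
import * as pulumi from "@pulumi/pulumi";

const config = new pulumi.Config();

// No inline ingress/egress here: inline rules and standalone
// SecurityGroupRule resources on the same group overwrite each other,
// so every rule below is its own resource.
const postgisSG = new aws.ec2.SecurityGroup("postgis-hardened-sg", {
  vpcId: config.require("vpcId"),
  description: "Least-privilege perimeter for PostGIS endpoints",
  tags: { Environment: "production", ManagedBy: "pulumi" },
}, { protect: true }); // protect blocks deletion, much like prevent_destroy

// Primary ingress restricted to tile-render compute, by SG reference not CIDR
const tileIngressRule = new aws.ec2.SecurityGroupRule("tile-ingress", {
  type: "ingress",
  fromPort: 5432,
  toPort: 5432,
  protocol: "tcp",
  sourceSecurityGroupId: config.require("tileServerSGId"),
  securityGroupId: postgisSG.id,
  description: "PostGIS from tile servers",
});

// Decoupled rule for independent lifecycle management
const etlIngressRule = new aws.ec2.SecurityGroupRule("etl-ingress", {
  type: "ingress",
  fromPort: 5432,
  toPort: 5432,
  protocol: "tcp",
  cidrBlocks: [config.require("etlSubnetCidr")],
  securityGroupId: postgisSG.id,
  description: "ETL pipeline subnet access",
});

// Egress limited to HTTPS inside the private 10.0.0.0/8 range
const httpsEgressRule = new aws.ec2.SecurityGroupRule("https-egress", {
  type: "egress",
  fromPort: 443,
  toPort: 443,
  protocol: "tcp",
  cidrBlocks: ["10.0.0.0/8"],
  securityGroupId: postgisSG.id,
  description: "Outbound HTTPS for patching/metrics",
});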

Verification

Confirm the fix is live from three angles — the rule set, the network path, and an actual spatial query:

Inspect the applied rules. aws ec2 describe-security-group-rules --filters Name=group-id,Values=<sg-id> should show only the 5432 ingress entries you defined and no 0.0.0.0/0 on the database port.
Probe the port from a legitimate caller. From an instance inside tile_server_sg, nc -vz <db-host> 5432 should connect; from outside the allowed groups it should time out.
Confirm a real connection and spatial query. psql "host=<db-host> dbname=gis ..." -c "SELECT PostGIS_full_version();" proves the perimeter, TLS, and authentication all align.
Re-read Flow Logs. The previous REJECT entries for port 5432 from authorized sources should now read ACCEPT, and the public ranges that triggered the audit finding should produce no successful flows.
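The first check in the list can be automated against the JSON that describe-security-group-rules returns. The sketch below is one possible reading of "no 0.0.0.0/0 on the database port": it also flags wide port ranges and all-protocol rules that cover 5432 without naming it. The function names are hypothetical, and only the response fields used here are declared.

```typescript
interface SgRule {
  SecurityGroupRuleId: string;
  IsEgress: boolean;
  IpProtocol: string; // "tcp", "udp", "icmp", or "-1" for all protocols
  FromPort: number;   // -1 when IpProtocol is "-1"
  ToPort: number;
  CidrIpv4?: string;
  CidrIpv6?: string;
}

const DB_PORT = 5432;

// True when the rule admits TCP traffic to the given port, either by an
// explicit range or by allowing every protocol.
function coversPort(rule: SgRule, port: number): boolean {
  if (rule.IpProtocol === "-1") return true;
  return rule.IpProtocol === "tcp" && rule.FromPort <= port && port <= rule.ToPort;
}

// Ingress rules that expose the database port to the whole internet.
function publicDbIngress(rules: SgRule[]): SgRule[] {
  return rules.filter(
    (r) =>
      !r.IsEgress &&
      coversPort(r, DB_PORT) &&
      (r.CidrIpv4 === "0.0.0.0/0" || r.CidrIpv6 === "::/0"),
  );
}
```

An empty result is the pass condition. Anything returned should be treated as a failed verification even if the connection tests above succeed.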

Preventing recurrence

A one-off fix that is not encoded will drift back the next time someone hotfixes the console. Make the hardened state structurally enforced:

Policy-as-code gate in the pull request. Run checkov or a Conftest/OPA rule that fails the build on any ingress to 5432 from 0.0.0.0/0, mirroring the gate strategy used in Pulumi IAM Policies for S3 Raster Access. An over-permissive rule should be un-mergeable, not merely discouraged.
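Scheduled drift detection. A nightly terraform plan -detailed-exitcode (exit code 2 signals pending changes) or pulumi preview --refresh --expect-no-changes against production state surfaces console edits to managed rules as a non-zero exit before they become a standing exposure. Standalone rule resources track only the rules they declare, so pair this with a scheduled audit of the live rule set to catch rules added entirely out of band.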
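Collapse per-tenant ranges into a managed prefix list so a multi-tenant platform maintains them in one place and is not tempted to re-introduce broad CIDRs to work around RulesPerSecurityGroupLimitExceeded. Size the list deliberately: a referenced prefix list counts against the rules-per-group quota by its maximum entry count, not by its current contents.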
Align the network boundary with host-based authentication. Security groups decide what can reach 5432; pg_hba.conf decides who may authenticate. Map the same IP ranges to scram-sha-256 or certificate methods per the PostgreSQL Client Authentication documentation so the two layers enforce one boundary.
Keep browser clients off the database entirely. Frontends must route through an API tier governed by CORS & CSP Configuration; the database security group should never need a public ingress rule.
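The pull-request gate can be illustrated with a small check over the JSON produced by terraform show -json on a saved plan. This is a TypeScript stand-in for the idea, not a checkov or Conftest/OPA rule, and it assumes the plan's resource_changes entries carry the aws_security_group_rule and inline aws_security_group attributes under change.after; the function names are hypothetical.

```typescript
interface RuleAttrs {
  type?: string;
  from_port?: number;
  to_port?: number;
  protocol?: string;
  cidr_blocks?: string[] | null;
  ipv6_cidr_blocks?: string[] | null;
}

interface ResourceChange {
  address: string;
  type: string;
  change: {
    actions: string[];
    after: (RuleAttrs & { ingress?: RuleAttrs[] }) | null; // null on delete
  };
}

const OPEN_RANGES = new Set(["0.0.0.0/0", "::/0"]);

// True when the attributes admit the database port from an open range.
function exposesDbPort(a: RuleAttrs, port = 5432): boolean {
  const allProtocols = a.protocol === "-1" || a.protocol === "all";
  const inRange = (a.from_port ?? 0) <= port && port <= (a.to_port ?? 0);
  const cidrs = [...(a.cidr_blocks ?? []), ...(a.ipv6_cidr_blocks ?? [])];
  return (allProtocols || (a.protocol === "tcp" && inRange)) &&
    cidrs.some((c) => OPEN_RANGES.has(c));
}

// Addresses of planned resources that would open 5432 to the internet.
function violations(plan: { resource_changes?: ResourceChange[] }): string[] {
  const bad: string[] = [];
  for (const rc of plan.resource_changes ?? []) {
    const after = rc.change.after;
    if (!after) continue; // resource is being destroyed
    if (rc.type === "aws_security_group_rule" &&
        after.type === "ingress" && exposesDbPort(after)) {
      bad.push(rc.address);
    }
    if (rc.type === "aws_security_group" &&
        (after.ingress ?? []).some((r) => exposesDbPort(r))) {
      bad.push(rc.address);
    }
  }
  return bad;
}
```

In a pipeline, a non-empty result would fail the build and print the offending addresses, which is what makes the over-permissive rule un-mergeable rather than merely flagged.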

Security Group Hardening — the parent workflow for codified network boundaries
Pulumi IAM Policies for S3 Raster Access — the identity layer that pairs with these port rules
Terraform VPC Peering for Distributed GeoServer — the routing path that determines which CIDRs are legitimate
How to Structure Terraform Modules for PostGIS — provisioning the database these rules protect