Your Cloud Security Is Falling Apart Right Now

| 7 min read |
security cloud aws infrastructure

Everyone's scrambling to scale cloud infrastructure overnight. I've seen what happens when security gets deprioritized under pressure — at NATO exercises, at Decloud, at the fintech startup. Here's how to not become a headline.

Quick take

When you scale fast, security doesn’t degrade gradually. It collapses. Lock down IAM with permissions boundaries, make everything private by default in Terraform modules, turn on CloudTrail/Config/GuardDuty before you add a single new service, and treat 0.0.0.0/0 in a security group like a production incident.


I’m watching companies triple their AWS footprint in weeks right now. Entire engineering teams went remote overnight. New services are shipping without review. IAM policies are getting wildcard (*) actions slapped on because someone needed to unblock a deploy at 11pm.

I’ve seen this pattern before. Not in startups — in NATO cyber defense exercises. The scenario is always the same: operational tempo increases, shortcuts happen, and adversaries walk through the gaps those shortcuts create. The difference is that in a NATO exercise, you get a debrief. In production, you get a breach notification.

At Decloud, we built cloud infrastructure for companies that were already scaling hard. The ones who survived without security incidents weren’t the ones with the biggest security teams. They were the ones who made the secure path the easy path. That’s the entire thesis of this post.

The Real Problem Isn’t Speed

Everyone wants to blame the pace. “We’re moving too fast to do security properly.” Nonsense.

The problem is that most cloud environments are configured to be permissive by default. When things are calm, humans tighten the configuration manually. Reviews happen. Someone catches the overly broad IAM role before it ships. But when you’re scaling fast, those manual catches disappear. And now you’re running on whatever the defaults are.

If your defaults are permissive, you’re exposed. Period.

I dealt with this at the fintech startup when we were scaling our data infrastructure. The answer wasn’t more process. It was better defaults.

IAM: Where Most Breaches Actually Start

I’ll say it bluntly: if you’re not using IAM permissions boundaries right now, you’re gambling. Permissions boundaries are the ceiling on what any role can do, regardless of what policies are attached to it. They’re the single most underused security control in AWS.

Here’s a permissions boundary I enforce on every account that creates IAM roles:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSpecificServices",
      "Effect": "Allow",
      "Action": [
        "s3:*",
        "dynamodb:*",
        "sqs:*",
        "sns:*",
        "logs:*",
        "cloudwatch:*"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DenyIAMEscalation",
      "Effect": "Deny",
      "Action": [
        "iam:CreateUser",
        "iam:CreateRole",
        "iam:AttachRolePolicy",
        "iam:PutRolePolicy",
        "iam:CreateAccessKey"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DenyOrganizationChanges",
      "Effect": "Deny",
      "Action": "organizations:*",
      "Resource": "*"
    }
  ]
}

The key insight: even if a developer attaches AdministratorAccess to their role (and they will, at 11pm, to unblock a deploy), the permissions boundary prevents privilege escalation. They can’t create new roles, attach policies, or generate access keys. The blast radius is capped.
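The evaluation logic is worth internalizing: effective permissions are the intersection of the identity policy and the boundary, and an explicit deny beats everything. Here’s a toy model in Python — deliberately simplified, with none of the real IAM evaluator’s conditions, resource ARNs, or wildcard matching — just to make the intersection behavior concrete:

```python
# Toy model of IAM permissions-boundary evaluation.
# Real IAM evaluation is far richer (conditions, resource ARNs,
# wildcard expansion); this only illustrates intersection + explicit deny.

def is_allowed(action: str, identity_allows: set[str],
               boundary_allows: set[str], denies: set[str]) -> bool:
    """An action passes only if BOTH the identity policy and the
    permissions boundary allow it, and nothing explicitly denies it."""
    if action in denies:
        return False
    return action in identity_allows and action in boundary_allows

# Developer attaches broad permissions at 11pm...
identity = {"s3:GetObject", "iam:CreateRole", "iam:CreateAccessKey"}
# ...but the boundary never allowed IAM writes.
boundary = {"s3:GetObject", "dynamodb:Query"}
denies = {"iam:CreateRole", "iam:CreateAccessKey"}

print(is_allowed("s3:GetObject", identity, boundary, denies))    # True
print(is_allowed("iam:CreateRole", identity, boundary, denies))  # False
```

The second call is the whole point: the identity policy says yes, but the boundary never did, and the explicit deny seals it.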

At Decloud we had a customer who gave every Lambda execution role full S3 access because “it was easier.” One compromised function later, an attacker was exfiltrating data from every bucket in the account. Permissions boundaries would have contained that to the specific service’s buckets.

Security Groups: The 0.0.0.0/0 Problem

I run a script on every new project that searches for 0.0.0.0/0 in security group ingress rules. It always finds something. Always.
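A minimal version of that audit, sketched in Python. This is a naive regex pass, not my actual script — a real implementation should parse HCL properly (e.g. with python-hcl2) rather than pattern-match:

```python
# Naive sketch of the 0.0.0.0/0 audit: flag any .tf file that pairs
# an ingress block with a world-open CIDR. Regex matching on HCL is
# fragile; a production version should use a real HCL parser.
import re
from pathlib import Path

OPEN_CIDR = re.compile(r'cidr_blocks\s*=\s*\[[^\]]*"0\.0\.0\.0/0"')

def find_open_ingress(root: str) -> list[str]:
    hits = []
    for tf in Path(root).rglob("*.tf"):
        text = tf.read_text()
        # Only flag files that contain both an ingress block and an
        # open CIDR -- egress to 0.0.0.0/0 alone is often legitimate.
        if "ingress" in text and OPEN_CIDR.search(text):
            hits.append(str(tf))
    return sorted(hits)
```

Run it against your Terraform root, then remove or justify every hit.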

Here’s the Terraform pattern I enforce:

resource "aws_security_group" "app" {
  name_prefix = "app-"
  vpc_id      = var.vpc_id

  # No inline ingress rules. Force explicit definition.
  # This is intentional — default deny.

  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
    description = "HTTPS outbound only"
  }

  lifecycle {
    create_before_destroy = true
  }

  tags = {
    ManagedBy = "terraform"
    Review    = "required"
  }
}

# Separate rule resources — auditable, reviewable, deletable
resource "aws_security_group_rule" "app_from_alb" {
  type                     = "ingress"
  from_port                = 8080
  to_port                  = 8080
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.alb.id
  security_group_id        = aws_security_group.app.id
  description              = "ALB to app on 8080"
}

Notice: no ingress defined inline. Every ingress rule is a separate aws_security_group_rule resource with a description field. When someone adds a rule, it shows up clearly in the Terraform plan. When you need to audit, you can grep your codebase for aws_security_group_rule and get a complete inventory. Try doing that with inline rules.

This pattern came directly from a NATO exercise where we had to rapidly audit network access during a simulated attack. The team that had structured, auditable firewall rules found the compromised path in minutes. The team with ad-hoc rules was still mapping their network topology when the exercise ended.

Make Encryption Non-Negotiable

Every data store module should enforce encryption. Not suggest it. Enforce it.

resource "aws_db_instance" "main" {
  # engine, instance_class, storage, etc. omitted;
  # only the security-relevant settings are shown
  publicly_accessible = false
  storage_encrypted   = true
  deletion_protection = true
  kms_key_id          = var.kms_key_arn

  iam_database_authentication_enabled = true

  # No default security group — force explicit assignment
  vpc_security_group_ids = [var.db_security_group_id]
}

resource "aws_s3_bucket" "data" {
  bucket_prefix = "app-data-"
}

resource "aws_s3_bucket_public_access_block" "data" {
  bucket = aws_s3_bucket.data.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_server_side_encryption_configuration" "data" {
  bucket = aws_s3_bucket.data.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = var.kms_key_arn
    }
    bucket_key_enabled = true
  }
}

The S3 public access block is the one I see missing most often. People enable encryption but forget to block public access at the bucket level. Then someone uploads an object with a public ACL and you’re in the news.
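All four flags matter, and they fail independently. A quick way to see why: a lockdown check has to treat a missing flag the same as a disabled one. This sketch mirrors the shape of the GetPublicAccessBlock response, but it’s an illustration, not a boto3 call:

```python
# Sketch of a bucket lockdown check: all four public access block
# flags must be present AND enabled for the bucket to count as
# locked down. A missing flag is treated as a failure.
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "BlockPublicPolicy",
    "IgnorePublicAcls",
    "RestrictPublicBuckets",
)

def is_locked_down(public_access_block: dict) -> bool:
    """True only when every flag is present and set to True."""
    return all(public_access_block.get(flag) is True
               for flag in REQUIRED_FLAGS)

print(is_locked_down({f: True for f in REQUIRED_FLAGS}))  # True
# IgnorePublicAcls missing: a public-ACL upload still slips through.
print(is_locked_down({"BlockPublicAcls": True,
                      "BlockPublicPolicy": True,
                      "RestrictPublicBuckets": True}))     # False
```

Three out of four flags looks close to done and is still a public bucket waiting to happen.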

Detection: Turn It On Before You Scale

This is the mistake I see repeatedly. Teams add services, then plan to “add monitoring later.” Later never comes.

Before you provision a single new service, these should be running across every account in your organization:

# AWS Config rules — deploy via StackSets across all accounts
config_rules:
  - s3-bucket-public-read-prohibited
  - s3-bucket-public-write-prohibited
  - encrypted-volumes
  - cloudtrail-enabled
  - root-account-mfa-enabled
  - iam-user-no-policies-check
  - iam-root-access-key-check
  - restricted-ssh
  - vpc-flow-logs-enabled

CloudTrail, Config, and GuardDuty. All three. Non-negotiable. GuardDuty in particular has gotten good enough that it catches IAM anomalies — unusual API calls, credential usage from unexpected IPs — with minimal tuning.

The trick is filtering. GuardDuty will generate a steady stream of findings, and most of them are low severity. Set up an EventBridge rule that matches only HIGH and CRITICAL findings and forwards them to an SNS topic, then pipe that to a Slack channel or PagerDuty. Everything else goes to a weekly review.

At Decloud, we set this up as a baseline for every customer. The ones who actually reviewed the weekly findings caught configuration drift before it became exploitable. The ones who ignored it… well, some of them came to us specifically because they’d ignored it.

Remote Access: Kill the VPN, Use Session Manager

Everyone’s spinning up VPNs right now. I get it. But VPNs are a liability. They’re a single point of trust. Once you’re on the VPN, you’re “inside.” That model is broken.

For AWS access, use SSM Session Manager instead. It gives you shell access to EC2 instances through the AWS API. No SSH keys to manage. No inbound security group rules. Full session logging to CloudTrail and S3.

For application access, put services behind an identity-aware proxy. Verify identity and device state on every request. The network boundary isn’t your security boundary.

I learned this the hard way during a NATO exercise. The red team’s first move was always to compromise VPN credentials. Once inside, they had lateral movement across the entire network. The teams that had segmented access — per-service authentication, no implicit trust from network position — held up. The teams running flat networks behind a VPN got owned.

The Actual Checklist

I keep this short because nobody follows long checklists under pressure.

  1. IAM permissions boundaries on every account that creates roles. Deploy today.
  2. S3 public access blocks on every bucket. aws s3api put-public-access-block across all accounts. Takes five minutes.
  3. Security groups: audit for 0.0.0.0/0 ingress. Remove or justify every one.
  4. CloudTrail + Config + GuardDuty: enabled in every account, every region. Use AWS Organizations and StackSets.
  5. Encryption by default: KMS keys provisioned, Terraform modules enforce storage_encrypted = true and publicly_accessible = false.
  6. SSM Session Manager instead of SSH. Kill inbound port 22.
  7. Weekly access review: who has access to what, and why. Remove what’s not justified.

That’s it. Seven items. You can get through most of them in a day if you’re motivated.

Stop Treating Security as a Phase

The framing I keep hearing is “we’ll fix security after we stabilize.” That’s backwards. Security isn’t a phase you get to after scaling. It’s a property of how you scale.

If your Terraform modules are secure by default, every new service you deploy is secure. If your IAM boundaries are in place, every new role is constrained. If your detection is running, every new account is monitored.

The work is in the defaults. Get those right and scaling fast is just scaling fast — not scaling into a breach.