Building an AWS Security Group Auditor That Actually Works in Production

“Manually checking security groups across multiple AWS accounts is how breaches happen. You miss things. You always miss things.”

Manual security group reviews feel fine at one account. At ten, you’re genuinely flying blind — too many rules, too much context to hold in your head, no audit trail. I built this CLI tool to close that gap. This is the breakdown: what I built, the failure modes I hit, and what the production version actually looks like.

The Problem

Security groups are the first line of network defense for anything running on AWS. Overly permissive ingress rules, especially 0.0.0.0/0 on ports like 22 and 3389, are consistently among the top causes of cloud breaches. At scale you cannot eyeball this. You need a tool.

Requirements I locked in before writing a single line:

Scan all security groups across multiple AWS regions in one run
Identify permissive rules against a configurable risk policy
Generate reports in CSV, JSON, and HTML — different formats for different audiences
Output per-rule remediation guidance, not just a list of flags
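
To make the "configurable risk policy" and "per-rule remediation" requirements concrete, here is a minimal sketch of what the analysis layer could look like. It is not the code from the repo: RiskPolicy, rule_covers_port, find_open_rules, and the finding fields are names I'm assuming for illustration. The input shape follows what describe_security_groups returns (IpPermissions, IpRanges, Ipv6Ranges).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskPolicy:
    # Hypothetical policy shape; these defaults are illustrative, not the tool's.
    risky_ports: frozenset = frozenset({22, 3389})
    open_cidrs: frozenset = frozenset({"0.0.0.0/0", "::/0"})


def rule_covers_port(rule: dict, port: int) -> bool:
    """True if one IpPermissions entry allows traffic to `port`."""
    if rule.get("IpProtocol") == "-1":
        # "-1" means all protocols and all ports; FromPort/ToPort are absent.
        return True
    if rule.get("IpProtocol") not in ("tcp", "udp"):
        # For ICMP, FromPort/ToPort carry type/code, not ports.
        return False
    low, high = rule.get("FromPort"), rule.get("ToPort")
    return low is not None and high is not None and low <= port <= high


def find_open_rules(group: dict, policy: RiskPolicy = RiskPolicy()) -> list:
    """Return one finding per (risky port, world-open CIDR) pair in a group."""
    findings = []
    for rule in group.get("IpPermissions", []):
        cidrs = [r["CidrIp"] for r in rule.get("IpRanges", [])]
        cidrs += [r["CidrIpv6"] for r in rule.get("Ipv6Ranges", [])]
        for cidr in cidrs:
            if cidr not in policy.open_cidrs:
                continue
            for port in sorted(policy.risky_ports):
                if rule_covers_port(rule, port):
                    findings.append({
                        "group_id": group["GroupId"],
                        "port": port,
                        "cidr": cidr,
                        "remediation": f"Restrict port {port} to known CIDRs",
                    })
    return findings
```

The edge cases are the point of the sketch: a rule for ports 0-65535 exposes 22 even though 22 never appears in it, protocol "-1" has no port fields at all, and IPv6 ranges live under a separate key. A checker that only compares FromPort == 22 against IpRanges misses all three.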

Architecture

Three layers: collection (AWS API interaction), analysis (risk policy), output (report formatting).

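from pathlib import Path

def audit_security_groups():
    # Fail fast: verify credentials before making any other API call.
    if not aws_credentials_valid():
        raise ValueError("AWS credentials not configured or expired")

    # Idempotent setup: create the output directory if it is missing.
    Path("reports").mkdir(parents=True, exist_ok=True)

    vulnerable = find_insecure_groups()  # collection + analysis
    save_as_csv(vulnerable)              # output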

Credential validation runs first, before any API calls. boto3 will surface auth failures mid-execution otherwise — after you’ve already burned through dozens of calls — and that’s a shit debugging experience.

Failure Modes Worth Knowing

1. The typo that cost me 20 minutes

I saved my requirements file as requiremnets.txt. Transposed letters, invisible at a glance. pip kept failing and I spent 20 minutes convinced it was an environment issue before I actually looked at the filename.

Not a deep lesson. Just: your linter and spell-check should cover config files too, not just code. Boring mistake, easy fix, annoying as hell.

2. pytest fixtures and why you can’t skip conftest.py

def test_fetch_security_groups(ec2_client):
    # pytest fails at setup, before the body runs:
    # E   fixture 'ec2_client' not found
    ...

For security tooling specifically, fixture setup matters more than most people assume. You need mocked AWS responses that accurately represent what the real API returns, including pagination tokens and error shapes. The fix was proper fixtures in conftest.py with moto:

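# conftest.py
import boto3
import pytest
from moto import mock_aws  # moto 5.x; older releases use per-service decorators such as mock_ec2

@pytest.fixture
def ec2_client():
    with mock_aws():
        yield boto3.client("ec2", region_name="us-east-1")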

But the real issue isn’t the missing fixture — it’s that tests for security tools need to explicitly cover false negatives. A tool that misses a vulnerable rule is worse than no tool at all. You’re trusting it to tell you you’re clean. If it lies, you’re exposed and don’t know it.

3. Silent pagination failure: the one that actually matters

The initial scan was silently dropping security groups. AWS returns paginated results with a NextToken. If you don’t handle it you get page one, assume it’s everything, and ship a report missing half your attack surface.

def get_all_security_groups(ec2_client):
    groups = []
    paginator = ec2_client.get_paginator("describe_security_groups")
    for page in paginator.paginate():
        groups.extend(page["SecurityGroups"])
    return groups

Use the built-in paginator. Managing NextToken manually is error-prone and there’s no reason to do it.
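
This failure mode deserves its own regression test, because it is exactly the kind of false negative that stays invisible. The sketch below is an assumption about how such a test could look, not the test from the repo: StubPaginator and StubEC2 are hypothetical stand-ins I made up so the example runs without moto or AWS, and get_all_security_groups is repeated from above so the block is self-contained.

```python
class StubPaginator:
    """Hypothetical stand-in for a boto3 paginator: yields fixed pages."""

    def __init__(self, pages):
        self._pages = pages

    def paginate(self, **kwargs):
        yield from self._pages


class StubEC2:
    """Hypothetical stand-in for an EC2 client exposing only get_paginator."""

    def __init__(self, pages):
        self._pages = pages

    def get_paginator(self, operation_name):
        assert operation_name == "describe_security_groups"
        return StubPaginator(self._pages)


def get_all_security_groups(ec2_client):
    # Same function as above, repeated so this block runs on its own.
    groups = []
    paginator = ec2_client.get_paginator("describe_security_groups")
    for page in paginator.paginate():
        groups.extend(page["SecurityGroups"])
    return groups


def test_group_on_second_page_is_not_dropped():
    pages = [
        {"SecurityGroups": [{"GroupId": "sg-page1"}], "NextToken": "t1"},
        {"SecurityGroups": [{"GroupId": "sg-page2"}]},
    ]
    ids = [g["GroupId"] for g in get_all_security_groups(StubEC2(pages))]
    assert ids == ["sg-page1", "sg-page2"]
```

The stub only proves the collector consumes every page it is handed. It does not exercise real token handling, so it complements the moto-backed tests rather than replacing them.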

4. Never assume the filesystem

from pathlib import Path

# Crashes if 'reports/' doesn't exist
save_report("reports/output.csv")

# Correct: create the directory first (a no-op if it already exists)
Path("reports").mkdir(parents=True, exist_ok=True)
save_report("reports/output.csv")

Standard infrastructure engineering — never assume the environment is in the state you expect. Idempotent setup before every write.
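
For illustration, here is roughly how a writer can bake that setup step in so callers cannot forget it. This is a sketch under assumptions: save_reports, the file names, and the finding fields (group_id, port, cidr, remediation) are hypothetical, and it covers only CSV and JSON, not the HTML report.

```python
import csv
import json
from pathlib import Path


def save_reports(findings, out_dir="reports"):
    """Write findings as CSV and JSON, creating out_dir if it is missing."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # idempotent: safe on every run

    # Assumed column set; keys outside it are dropped from the CSV.
    fields = ["group_id", "port", "cidr", "remediation"]
    csv_path = out / "findings.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(findings)

    json_path = out / "findings.json"
    json_path.write_text(json.dumps(findings, indent=2), encoding="utf-8")
    return csv_path, json_path
```

Calling save_reports(findings) twice in a row works the same as calling it once, which is the property you want from anything that runs in cron or CI.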

Key Implementation Details

Validate credentials before you do anything:

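import boto3
from botocore.exceptions import BotoCoreError, ClientError

def aws_credentials_valid() -> bool:
    # GetCallerIdentity needs no IAM permissions, so a failure here means
    # missing, invalid, or expired credentials (or no connectivity).
    try:
        boto3.client("sts").get_caller_identity()
        return True
    except (BotoCoreError, ClientError):
        return False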

Rate limiting — AWS will throttle you on large accounts:

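import boto3
from botocore.config import Config

# Adaptive mode adds client-side rate limiting on top of exponential backoff.
config = Config(
    retries={"max_attempts": 10, "mode": "adaptive"}
)
ec2 = boto3.client("ec2", config=config)  # only clients created with it get the retries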

Exponential backoff is not optional. Add it upfront or debug a half-finished scan at 2am.

Multi-region — don’t hardcode regions:

def get_enabled_regions():
    ec2 = boto3.client("ec2", region_name="us-east-1")
    return [r["RegionName"] for r in ec2.describe_regions()["Regions"]]
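
Putting the pieces together, a per-region scan loop might look like the sketch below. It is an assumption about structure, not the repo's code: scan_regions and the client_factory parameter are hypothetical names, and the factory is injected so the loop can be tested without AWS. A region that fails is recorded in errors instead of being skipped silently.

```python
def scan_regions(regions, client_factory):
    """Collect security groups per region; record failures instead of hiding them.

    client_factory is a hypothetical callable: region name -> EC2 client.
    """
    groups, errors = [], {}
    for region in regions:
        region_groups = []
        try:
            client = client_factory(region)
            paginator = client.get_paginator("describe_security_groups")
            for page in paginator.paginate():
                for sg in page["SecurityGroups"]:
                    region_groups.append({**sg, "Region": region})
        except Exception as exc:
            # Broad for the sketch; with boto3 you would likely narrow this
            # to botocore.exceptions.ClientError and BotoCoreError.
            errors[region] = repr(exc)
            continue  # drop partial results so a half-scanned region is not reported as clean
        groups.extend(region_groups)
    return groups, errors


# Possible wiring with the helpers above (not executed here):
#   groups, errors = scan_regions(
#       get_enabled_regions(),
#       lambda region: boto3.client("ec2", region_name=region, config=config),
#   )
#   if errors: exit non-zero and list the failed regions in the report
```

Returning errors alongside groups forces the caller to decide what an incomplete scan means. For a security report, I'd treat any non-empty errors as a failed run.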

Results

Validated across multiple AWS accounts and organizations:

Metric	Result
Security Groups Audited	500+
AWS Accounts	12 (across 3 organizations)
Overly Permissive Rules Flagged	47
Report Generation Time	< 90 seconds per account

The 47 flagged rules were mostly two things: legacy groups with 0.0.0.0/0 on port 22 and 3389 that nobody had tightened since initial provisioning, and application groups where someone opened a broad range to unblock themselves during development and never went back. Classic.

What Production-Ready Actually Means for Security Tooling

A lot of security scripts out there work once, on one account, in one region. That’s not a tool — that’s a demo. Production-ready is more specific:

No silent failures — every code path that can fail must surface a clear error. A tool that crashes silently is genuinely dangerous.
Pagination handled everywhere — any AWS list or describe call that returns multiple pages uses a paginator. No exceptions, no “it worked on my account.”
Credentials checked before execution — fail fast, fail loud, before doing any work.
Idempotent setup — directories, config files, paths — created if absent, never assumed.
Tests cover false negatives — this is the one most people skip. If your tests only check that the tool runs, they’re not testing anything that matters.

Source

Available on GitHub. Full test suite, moto fixtures, and sample report output included.

Tags

#AWS #Python #Security #Infrastructure #CLI

About the Author

Elijah Udom (elijahu) is an Infrastructure & Cloud Engineer based in Lagos, Nigeria. AWS, Kubernetes, eBPF security, AI/ML infrastructure. Building in the open.
