Quick take
Serverless is brilliant for event-driven glue, scheduled jobs, and spiky APIs. It’s terrible for long-running work, monolithic handlers, and anything that chains functions synchronously. I’ve learned this the hard way at the fintech startup.
Serverless isn’t a product. It’s an architectural style – managed services plus event-driven compute. In 2018, that mostly means AWS Lambda behind API Gateway, wired to S3, DynamoDB, SQS, and whatever else Amazon ships this week.
The pitch is compelling: auto-scaling, pay-per-use, no servers to patch. And honestly? The pitch is mostly true. At the fintech startup we use Lambda for a bunch of our data ingestion and processing pipeline – financial news events hitting webhooks, getting enriched, fanning out to multiple downstream consumers. It’s a great fit for that kind of work. But the pitch leaves out the part where you trade one set of problems for another.
Short-lived execution. Limited local state. Tight coupling to your cloud provider’s managed services. Cold starts that make your p99 latency embarrassing.
Here’s what I’ve seen actually work, and what reliably bites.
Patterns That Work
Event processing
This is where serverless genuinely shines. One event in, one small effect out, done.
# S3 trigger - process one uploaded object
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    obj = s3.get_object(Bucket=bucket, Key=key)
    process(obj["Body"].read())  # process() is your domain logic
File transforms, queue consumers, webhook receivers, log pipelines, IoT ingestion. All of these map cleanly to the Lambda model because the unit of work is small and self-contained. No shared state, natural parallelism, clear inputs and outputs.
At the fintech startup, our financial data pipeline fits this pattern well. News articles and market events arrive as discrete events, each gets processed independently, and the system scales up during market hours without us thinking about capacity.
Scheduled jobs
No more EC2 instances sitting idle 23 hours a day just to run a nightly report.
# CloudWatch Events schedule (serverless.yml)
functions:
  daily_report:
    handler: reports.daily
    events:
      - schedule: cron(0 9 * * ? *)
Nightly reports, data cleanup, batch exports, periodic syncs. If the job fits within Lambda’s time limit, a schedule trigger is simpler and cheaper than anything else.
APIs with spiky traffic
This is where serverless saves you from yourself. Internal tools nobody uses until everyone uses them at once. New products where you genuinely don’t know if traffic will be 10 requests/day or 10,000. Campaign-driven spikes.
If your traffic pattern looks like a heartbeat monitor, Lambda + API Gateway handles it gracefully.
Glue logic and fan-out
The thin layer between systems. Forward events to multiple services. Transform payloads between formats. Enrich events with a quick lookup. Fire off notifications. This is plumbing work, and serverless is excellent plumbing.
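A minimal sketch of that transform-and-fan-out shape. The payload fields and the internal schema here are hypothetical, and the actual fan-out (SNS publishes, queue sends) is left as a comment:

```python
# Hypothetical glue handler: normalize an incoming webhook payload
# before fanning it out to downstream consumers.
def transform(payload):
    # Map the upstream vendor's field names onto our internal schema.
    return {
        "symbol": payload["ticker"].upper(),
        "headline": payload["title"].strip(),
        "published_at": payload["ts"],
    }

def handler(event, context):
    normalized = transform(event)
    # Fan-out would go here, e.g. one sns.publish() per downstream topic.
    return normalized
```

Keeping the transform a pure function makes the plumbing trivially unit-testable, which matters when you can't attach a debugger to production.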
Short batch with queue partitioning
Large jobs split into many small tasks, with SQS as the buffer. Reprocessing datasets in chunks, batch document conversion, ETL steps that fit the time limits. The queue does the heavy lifting on orchestration, each function just processes its piece.
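One way to sketch the splitting side, assuming boto3 and a pre-created queue URL; the chunk size is an arbitrary example you'd tune to your per-task runtime:

```python
import json

def chunk(items, size):
    # Split a large job into fixed-size pieces, one SQS message each.
    return [items[i:i + size] for i in range(0, len(items), size)]

def enqueue_chunks(sqs, queue_url, keys, size=100):
    # Each message should be small enough that one Lambda invocation
    # finishes comfortably inside the time limit.
    for piece in chunk(keys, size):
        sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(piece))
```

The consumer Lambda then handles exactly one message's worth of work, and SQS gives you retries and backpressure for free.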
Anti-Patterns That Will Hurt You
Long-running tasks
Lambda’s execution time limit is measured in minutes. If your job needs longer than that, you’re fighting the platform. I’ve watched teams try to work around this with recursive invocations and checkpoint-resume hacks. Don’t.
Split the work into small events via SQS. Orchestrate with Step Functions. Or just run it on ECS or EC2. Not everything needs to be serverless.
Monolithic handlers
One fat function that contains your entire API. Slow deploys. Inflated cold starts. Permission boundaries that make your security team cry. I’ve seen this pattern creep in because it feels easier at first – one function, one deploy, done. Then it grows. And grows.
One function per operation. Group by domain or bounded context. Share code through packages or Lambda Layers.
Synchronous call chains
Function calls function calls function. Each hop adds latency. Each hop amplifies failure modes. When the chain gets long enough, tracing a single request becomes archaeology.
Use async events (SNS, SQS). Orchestrate with Step Functions. Or just consolidate the small hops into one function – sometimes the simplest fix is admitting you over-decomposed.
VPC for everything
In 2018, putting Lambda in a VPC adds painful cold start overhead. Seconds, not milliseconds. Only do it when a function genuinely needs to reach private resources like an RDS instance.
Keep public APIs outside the VPC. Use managed services with public endpoints where you can.
Hidden state
Lambda reuses execution environments, but doesn’t guarantee it. I’ve seen bugs that only reproduce intermittently because someone cached state in a module-level variable and assumed it would persist. Sometimes it did. Sometimes it didn’t.
Treat every invocation as stateless. Persist to DynamoDB, S3, or Redis. Make handlers idempotent.
Cost and Performance – The Uncomfortable Truths
Pricing
Simple model, easy to misjudge. You pay per request and per 100ms increment of execution time, scaled by memory allocation. The wrinkle: more memory also means more CPU. Sometimes bumping memory from 128MB to 512MB makes a function finish 4x faster and actually costs less. Test this.
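A back-of-envelope sketch of that tradeoff, assuming the published 2018 on-demand rate of roughly $0.00001667 per GB-second with 100ms billing increments (check the pricing page before trusting these numbers):

```python
# Rough per-invocation Lambda cost under assumed 2018 pricing.
GB_SECOND_RATE = 0.00001667  # assumption: 2018 list price per GB-second

def invocation_cost(memory_mb, duration_ms):
    # Duration is rounded up to the next 100ms billing increment.
    billed_ms = -(-duration_ms // 100) * 100
    return (memory_mb / 1024) * (billed_ms / 1000) * GB_SECOND_RATE

# A CPU-bound function: 2400ms at 128MB vs 500ms at 512MB.
slow = invocation_cost(128, 2400)  # 0.30 GB-seconds billed
fast = invocation_cost(512, 500)   # 0.25 GB-seconds billed
```

In this example the 512MB configuration bills fewer GB-seconds despite costing 4x more per unit of time, because the function finishes nearly 5x faster.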
Cold starts
They’re real and they matter. Larger deployment packages and VPC-connected functions get hit hardest. Keep dependencies lean, minimize initialization, avoid the VPC unless you need it. For user-facing APIs, cold starts can push your p95 latency into uncomfortable territory.
Steady high volume is expensive
Here’s where the economics flip. If you’re running a high-throughput API with consistent traffic, containers or EC2 will be cheaper. Do the breakeven math before committing. Lambda’s per-invocation pricing adds up fast at sustained volume.
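A sketch of that breakeven math, using 2018 list prices as stated assumptions: $0.20 per million requests, ~$0.00001667 per GB-second, and an m5.large at roughly $0.096/hour. The traffic numbers are illustrative:

```python
# Assumed 2018 list prices - verify against current pricing pages.
REQUEST_RATE = 0.20 / 1_000_000   # per request
GB_SECOND_RATE = 0.00001667       # per GB-second

def lambda_monthly_cost(req_per_sec, memory_mb, duration_s):
    requests = req_per_sec * 86400 * 30
    compute = requests * (memory_mb / 1024) * duration_s * GB_SECOND_RATE
    return requests * REQUEST_RATE + compute

# A steady 200 req/s API, 200ms per request at 256MB:
lambda_cost = lambda_monthly_cost(200, 256, 0.2)  # ~$536/month
ec2_cost = 0.096 * 24 * 30                        # one m5.large: ~$69/month
```

Even allowing for several instances plus a load balancer on the EC2 side, sustained traffic at this volume favors fixed capacity by a wide margin. The math flips back the moment traffic goes spiky or drops to near-zero overnight.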
Operational Patterns Worth Getting Right
Idempotency
Event sources deliver at-least-once. Your handlers will see duplicates. Store an idempotency key in DynamoDB or Redis and check it before processing. This isn’t optional – it’s table stakes for any event-driven system.
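The shape of the check, as a minimal sketch. A dict stands in for DynamoDB here; in production you'd use a conditional PutItem on the key, which also closes the race between check and write:

```python
# In-memory stand-in for a DynamoDB/Redis idempotency table.
_seen = {}

def process_once(idempotency_key, work):
    if idempotency_key in _seen:
        # Duplicate delivery: skip the work, return the cached result.
        return _seen[idempotency_key]
    result = work()
    _seen[idempotency_key] = result
    return result
```

The key itself usually comes from the event source (a message ID, an S3 object version, an upstream request ID), not something you generate.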
Structured logging
You can’t SSH into a Lambda. Structured logs are how you debug production.
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

logger.info(json.dumps({
    "event": "order_created",
    "request_id": context.aws_request_id,
    "order_id": order_id,
}))
Correlation IDs
Accept a correlation ID from the caller or generate one at the edge. Pass it through every downstream call. When something breaks across three functions, two queues, and a database, this is the thread you pull to find the problem.
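A minimal sketch of the accept-or-generate step at the edge. The header name is an assumption; use whatever convention your callers follow:

```python
import uuid

def correlation_id(event):
    # Prefer the caller's ID (e.g. an API Gateway request header);
    # mint a fresh one at the edge if it's missing.
    headers = event.get("headers") or {}
    return headers.get("X-Correlation-Id") or str(uuid.uuid4())
```

Whatever ID comes out of this goes into every log line and every downstream message attribute, so a single grep reconstructs the request's path.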
Safe deployments
Use Lambda versions and aliases. Weighted aliases give you canary releases without touching client code. Ship with confidence, not with crossed fingers.
When Not To Use Serverless
Not every nail needs this hammer.
- Sub-100ms latency with strict tail guarantees. Cold starts kill you.
- Heavy compute or long-running batch jobs. The time limit is real.
- Workloads needing large local state. Lambda gives you almost nothing.
- Predictable, steady high-volume traffic. The economics don’t work.
Bottom Line
Serverless is a strong fit when the work is event-shaped, bursty, or glue-like. The operational payoff is real – we’ve seen it at the fintech startup for our data pipelines. But the constraints are design inputs, not problems to work around. Fight them, and the complexity doesn’t disappear. It just moves somewhere harder to debug.