Eleven days. That’s what we have left before GDPR enforcement kicks in on May 25th. At the fintech startup, where we process financial news and market data tied to user watchlists, portfolios, and reading behavior, “personal data” is basically everything. Every stock alert. Every saved article. Every search query. All of it maps back to a person.
I’ve spent the last few months turning GDPR from a legal document into running code. This is the technical side of that work — what we built, what surprised us, and the stuff I wish someone had written down earlier.
The scope problem
GDPR’s definition of personal data is wide. Wider than most engineers expect.
The obvious stuff — name, email, phone number — sure. But also: IP addresses, device fingerprints, cookie IDs, location data, behavioral analytics tied to any identifier. At the fintech startup, user watchlists and reading patterns count as personal data because they reflect individual interests and financial behavior. Even aggregated engagement metrics can qualify if they’re traceable back to a user.
Before writing any code, we needed a complete inventory. Not a spreadsheet someone fills out once and forgets. A living document that gets updated every time a new service touches user data.
Data mapping: the unsexy foundation
This is the part nobody wants to do but everything depends on. We mapped every system that stores or processes personal data, what fields it holds, why, and how long we keep it.
Here’s a simplified version of the format we used:
```yaml
systems:
  - name: users_service
    data_types: [email, name, password_hash, created_at]
    purpose: authentication
    retention: account_lifetime
  - name: analytics_db
    data_types: [user_id, ip_address, page_views, watchlist_interactions]
    purpose: product_improvement
    retention: 90_days
  - name: crm_vendor
    data_types: [email, name, subscription_tier]
    purpose: marketing
    processor: true
```
The hard part wasn’t the primary databases. It was everything else. Logs. Caches. Search indexes. Elasticsearch snapshots. Data replicated to analytics warehouses. Third-party tools we’d integrated months ago and half-forgotten about.
For each processing activity, we also recorded the legal basis. In fintech you get a mix: consent for marketing, contract necessity for core service delivery, legal obligation for financial record-keeping. Getting this wrong means you’re either over-consenting users (annoying) or under-complying (dangerous).
Consent: store the evidence, not just the boolean
A consented: true column isn’t enough. You need a full record: who consented, to what, when, how, and which version of the consent text they saw.
We store:
- User ID
- Purpose key (e.g., marketing_emails, analytics_tracking)
- Status and timestamp
- The mechanism (checkbox on signup, settings toggle, etc.)
- Version hash of the consent copy
- IP and user agent at the time of consent
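As a sketch, that record might look like the following. The field names, the dataclass shape, and the SHA-256 choice for the version hash are illustrative, not our exact schema:

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ConsentRecord:
    """Immutable evidence of one consent event (field names are illustrative)."""
    user_id: str
    purpose: str            # e.g. "marketing_emails"
    granted: bool
    timestamp: datetime
    mechanism: str          # e.g. "signup_checkbox", "settings_toggle"
    consent_text_hash: str  # version hash of the consent copy the user saw
    ip_address: str
    user_agent: str

def hash_consent_text(text: str) -> str:
    """Stable version hash of the consent copy, so we can prove which text was shown."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

record = ConsentRecord(
    user_id="u_123",
    purpose="marketing_emails",
    granted=True,
    timestamp=datetime(2018, 5, 14, 10, 30, tzinfo=timezone.utc),
    mechanism="signup_checkbox",
    consent_text_hash=hash_consent_text("You agree to receive marketing emails."),
    ip_address="203.0.113.7",
    user_agent="Mozilla/5.0",
)
```

Freezing the record matters: consent evidence is append-only, and a new decision or a new version of the copy gets a new row, never an update.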
Withdrawal has to be just as easy as opting in. One action, immediate effect, propagated to every downstream system that cares about that purpose. No “we’ll process your request in 5-7 business days.”
Pre-checked boxes are dead. If you need consent for something, the user has to actively choose it. We rebuilt our onboarding flow to make this work without making it miserable.
Rights handling: the real engineering work
GDPR gives users a set of rights. Each one needs a concrete technical path from request to fulfillment.
Data export (access + portability)
Users can ask for all their data. All of it. Across every system. In a machine-readable format.
For the fintech startup, that means account info, watchlists, reading history, alert configurations, saved articles, search history. We built an async export pipeline — you request it, we queue the job across services, assemble the result, and notify you with a time-limited download link.
```json
{
  "export_date": "2018-05-14T10:30:00Z",
  "user": {
    "email": "[email protected]",
    "name": "Jane Doe"
  },
  "watchlists": [],
  "reading_history": [],
  "alerts": [],
  "searches": []
}
```
The tricky bit is completeness. You have to hit every service. If one is down or slow, you need retry logic. And you need to log that the export happened without storing a copy of the exported data forever.
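A minimal sketch of that fan-out with retries. The service names and the lambda fetchers are stand-ins; in production each would call a real internal API:

```python
import json
from datetime import datetime, timezone

# Stand-ins for per-service fetchers; each would hit a real internal API.
SERVICES = {
    "watchlists": lambda uid: [],
    "reading_history": lambda uid: [],
    "alerts": lambda uid: [],
    "searches": lambda uid: [],
}

def build_export(user_id, profile, max_retries=3):
    """Assemble a machine-readable export, retrying each service before giving up."""
    export = {
        "export_date": datetime.now(timezone.utc).isoformat(),
        "user": profile,
    }
    for name, fetch in SERVICES.items():
        for attempt in range(max_retries):
            try:
                export[name] = fetch(user_id)
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # an incomplete export is not an export
    return json.dumps(export, indent=2)
```

The important property is that a persistently failing service fails the whole job loudly instead of silently producing a partial export.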
Rectification
Users can correct inaccurate data. Straightforward in the primary database. Less straightforward when that data has been replicated into analytics pipelines, search indexes, and third-party systems. We added a propagation layer that pushes corrections downstream.
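A toy version of that propagation, with in-memory dicts standing in for the primary store and the downstream copies (the store names are illustrative):

```python
# In-memory stand-ins for the primary store and its replicated copies.
primary = {"u_1": {"email": "old@example.com"}}
search_index = {"u_1": {"email": "old@example.com"}}
warehouse = {"u_1": {"email": "old@example.com"}}

def rectify(user_id, field, value):
    """Apply a correction to the primary store, then push it to every replica."""
    for store in (primary, search_index, warehouse):
        if user_id in store:
            store[user_id][field] = value

rectify("u_1", "email", "jane@example.com")
```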
Erasure (the hard one)
Erasure isn’t DELETE FROM users WHERE id = ?. It’s a distributed operation with exceptions.
Our erasure pipeline:
- Deletes profile and preference data from the primary store
- Anonymizes analytics events (replace user ID with a hash, strip IP)
- Respects legal retention for financial transaction records (we’re required to keep some billing data)
- Fires deletion requests to third-party processors
- Flags the user ID so restored backups don’t resurrect deleted data
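The steps above can be sketched like this, with in-memory structures standing in for the real stores and processor APIs (everything here is illustrative):

```python
import hashlib

# In-memory stand-ins for real services (illustrative only).
profiles = {"u_1": {"email": "jane@example.com"}}
events = [{"user_id": "u_1", "ip": "203.0.113.7", "page": "/news"}]
processor_requests = []
DELETED_USER_IDS = set()  # exclusion list, checked on any backup restore

def pseudonym(user_id):
    """One-way replacement for a user ID in retained analytics events."""
    return hashlib.sha256(("erasure:" + user_id).encode()).hexdigest()[:16]

def erase_user(user_id):
    """Distributed erasure with exceptions, mirroring the pipeline steps."""
    profiles.pop(user_id, None)              # 1. primary profile + preferences
    for event in events:                     # 2. anonymize analytics events
        if event.get("user_id") == user_id:
            event["user_id"] = pseudonym(user_id)
            event.pop("ip", None)            #    strip IP
    # 3. billing records retained under legal obligation (deliberately untouched)
    processor_requests.append(user_id)       # 4. fire third-party deletion requests
    DELETED_USER_IDS.add(user_id)            # 5. guard against backup resurrection

erase_user("u_1")
```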
That last point — backup resurrection — is the one that caught us off guard. If you delete a user today and restore a backup from last week, they’re back. We solved this with an exclusion list: a set of deleted user IDs checked on any restore operation.
Restriction and objection
Restriction means “stop processing but don’t delete.” We implemented this as a per-purpose flag that downstream services check before touching the data. Objection works similarly — a user can say “stop using my data for X” and we have to honor it immediately for that purpose.
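In sketch form, with a made-up in-memory flag store in place of our real one:

```python
# Per-user, per-purpose restriction flags (illustrative storage).
restricted = {("u_1", "analytics_tracking")}

def may_process(user_id, purpose):
    """Every downstream service checks this before touching the data."""
    return (user_id, purpose) not in restricted

def restrict(user_id, purpose):
    """Restriction or objection: flip the flag, effective immediately."""
    restricted.add((user_id, purpose))
```

The data stays put; only the right to use it for a given purpose is switched off.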
The backup problem
Backups deserve their own section because they’re where most GDPR implementations quietly fail.
We evaluated three approaches:
- Per-user encryption keys — encrypt user data with a key unique to them. When they request erasure, destroy the key. Data becomes unreadable. Elegant in theory, complex in practice for existing systems.
- Backup rotation — set backup retention windows short enough that deleted data ages out. We went with 30-day rotation.
- Exclusion lists — maintain a list of erased user IDs. On any restore, filter them out before the data hits production.
We ended up combining options 2 and 3. Short rotation windows plus exclusion lists on restore. Not perfect, but defensible and implementable in the time we had.
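The restore-side half of that combination is simple to state in code (shown with a made-up exclusion set):

```python
ERASED_IDS = {"u_42"}  # illustrative exclusion list of erased user IDs

def filter_restore(rows):
    """Drop erased users' rows before restored backup data reaches production."""
    return [row for row in rows if row.get("user_id") not in ERASED_IDS]

backup = [
    {"user_id": "u_42", "email": "x@example.com"},
    {"user_id": "u_7", "email": "y@example.com"},
]
restored = filter_restore(backup)  # u_42 stays deleted
```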
Privacy by design (the practical version)
This phrase gets thrown around a lot. Here’s what it actually meant for our codebase:
- Stop logging user emails in debug output. We caught this in four services.
- Pseudonymize analytics events at collection time, not as a batch job later.
- Enforce field-level access control so the marketing service can’t read financial data and vice versa.
- Encrypt PII at rest. Not the whole database — the specific sensitive columns.
- Collect less. We removed three analytics events that captured data we never actually used.
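Collection-time pseudonymization can be as small as a keyed hash applied before the event is written anywhere. A sketch, assuming a secret kept outside the analytics store (the key handling here is illustrative):

```python
import hashlib
import hmac

SECRET = b"rotate-me"  # illustrative keyed-hash secret, held outside analytics

def collect_event(user_id, event_type):
    """Pseudonymize at collection time: analytics never sees the raw user ID."""
    pid = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()[:16]
    return {"pseudo_id": pid, "event": event_type}
```

The same user still maps to the same pseudo-ID, so product analytics keep working, but reversing the mapping requires the secret.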
Operational readiness
GDPR expects you to prove accountability. That means audit trails and operational controls.
What we put in place:
- Access logging for every read of personal data (who accessed what, when, why)
- A processing activity register that maps to our data inventory
- Vendor inventory with data categories and processor agreements
- Internal SLA: fulfill data subject requests within 14 days, well inside the one-month legal window
- Automated retention enforcement — cron jobs that purge expired data, with logs proving they ran
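The retention-enforcement jobs reduce to something like this sketch, where the retention map mirrors the data inventory and the print stands in for real audit logging:

```python
from datetime import datetime, timedelta, timezone

RETENTION = {"analytics_db": timedelta(days=90)}  # mirrors the data inventory

def purge_expired(rows, system, now=None):
    """Drop rows past their retention window and log what was purged."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - RETENTION[system]
    kept = [row for row in rows if row["created_at"] >= cutoff]
    purged = len(rows) - len(kept)
    print(f"{now.isoformat()} purged {purged} rows from {system}")  # audit trail
    return kept
```

The log line is the point: the purge proving it ran is what turns a cron job into evidence of accountability.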
What I’d tell another engineer starting this
Do the data inventory first. Everything else is guessing without it.
Build the erasure pipeline before the export pipeline. Erasure is harder, touches more systems, and has more edge cases. If you can delete cleanly, export is straightforward by comparison.
Don’t underestimate third-party processors. We had vendors who couldn’t confirm their GDPR readiness two weeks before the deadline. That’s a risk that lands on you, not them.
Test the failure paths. What happens when one service is down during an erasure request? What about duplicate requests? Partial deletions? We found bugs in all of these during testing.
And automate from day one. Manual processing of data subject requests doesn’t scale past the first week. We learned that from our initial testing volume alone.
May 25th is coming fast. The regulation isn’t going to wait.