Practical Seed Data for Local Development: Realistic, Fast, and Respectful of Bandwidth

A practical playbook to create and manage realistic local seed data that saves dev time, respects privacy, and works on slow Indian connections.

Written by: Rohan Deshpande

[Image: a laptop screen showing code and a terminal with a developer typing. Credit: Markus Spiske / Unsplash]

If you’ve ever sat waiting ten minutes for a Postgres dump to restore before you can reproduce a bug, you know the friction bad test data introduces. Conversely, if your local DB is a tiny, sterile set of rows that never hits the real bugs, you also know the other problem: false confidence.

I want to share a practical approach I use with small teams in India to keep local development fast, realistic, and safe. It combines compact samples, anonymised snapshots, and a couple of scripts so a colleague can spin up a working environment in under two minutes—even on a home broadband connection.

Why seed data matters (and what usually goes wrong)

My constraints

The playbook

  1. Build a compact “golden snapshot”. Create a trimmed production-like snapshot: a dump with enough rows per table to reproduce edge cases (think 50–200 rows for most tables, 1–2k for high-volume ones like events). The goal: preserve relationships, common failure patterns, and realistic distributions (some users with many orders, many with none).

How I make it:
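In sketch form, assuming Postgres and illustrative table names (`users`, `orders` with a `user_id` foreign key, and a high-volume `events` table): sample the parent rows first, then pull children by key, so relationships survive the trim.

```shell
#!/usr/bin/env bash
# make_golden_snapshot.sh -- a sketch, not a drop-in tool; the table
# and column names (users, orders, user_id, events) are assumptions.
set -euo pipefail

# Emit SQL that samples parent rows first, then child rows by foreign
# key, so relationships and realistic distributions are preserved.
emit_sample_sql() {
  local user_rows="${1:-200}" event_rows="${2:-2000}"
  cat <<SQL
CREATE SCHEMA IF NOT EXISTS seed;
-- Parents: a random slice of users (some with many orders, many with none).
CREATE TABLE seed.users AS
  SELECT * FROM users ORDER BY random() LIMIT ${user_rows};
-- Children: only the orders that belong to sampled users.
CREATE TABLE seed.orders AS
  SELECT o.* FROM orders o JOIN seed.users u ON u.id = o.user_id;
-- High-volume table: cap harder, keep the most recent rows.
CREATE TABLE seed.events AS
  SELECT * FROM events ORDER BY created_at DESC LIMIT ${event_rows};
SQL
}

# To build the snapshot for real (against staging, never prod):
#   emit_sample_sql | psql staging
#   pg_dump --schema=seed staging | zstd -19 -o golden.zst
emit_sample_sql "$@"
```

Random sampling of parents is the crude part; if a table has rare-but-important shapes (e.g. users mid-refund), union in a hand-picked set of ids alongside the random slice.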

This “golden snapshot” becomes our canonical seed data.

  2. Anonymise and enforce privacy. Never store real personal data in a repo or a shared bucket. Replace obvious PII with deterministic fake equivalents so tests remain consistent:

A small script that runs on the snapshot to re-map PII is worth its weight; it can be deterministic so different team members get the same seeded values.
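The deterministic part can be as small as a hash truncation. The helpers below are a sketch (the `example.invalid` domain, the 12-character prefix, and the fake phone format are arbitrary choices); because they hash rather than randomise, every team member who runs them over the same snapshot gets identical fake values.

```shell
#!/usr/bin/env bash
# anonymise.sh -- sketch of deterministic PII re-mapping; the field
# formats below are illustrative, extend per your schema.
set -euo pipefail

# Same real email in, same fake email out -- on every machine.
anon_email() {
  local h
  h="$(printf '%s' "$1" | sha256sum | cut -c1-12)"
  printf 'user_%s@example.invalid\n' "$h"
}

# Phones get the same treatment: hash, keep digits, force a fake prefix.
anon_phone() {
  local h
  h="$(printf '%s' "$1" | sha256sum | tr -cd '0-9' | cut -c1-8)"
  printf '+91-00-%s\n' "$h"
}
```

In practice you would run these over the dumped rows (or emit UPDATE statements from them) before compressing the snapshot, so nothing real ever reaches the bucket.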

  3. Versioned, compressed artifacts, stored cheaply. Store the compressed snapshots in object storage with versioning. For teams in India, S3-compatible providers or a small ₹300–₹600/month VPS with MinIO works fine. Keep one “latest” and dated versions for releases.
  4. One-command restore with a small bootstrapper. Make onboarding a one-liner:

```shell
curl -sSLO https://my-bucket.company/golden.zst && zstd -d -c golden.zst | psql mydb
```

Wrap that in a shell script that:

I added a tiny progress bar and retry logic because many devs on 4 Mbps connections face flaky downloads.
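A sketch of that wrapper (the URL, database name, and retry count are placeholders): it resumes partial downloads with `curl -C -`, retries a few times with a pause, and only then drops and restores the database.

```shell
#!/usr/bin/env bash
# bootstrap_db.sh -- sketch; URL and db name are placeholders.
set -euo pipefail

URL="${URL:-https://my-bucket.company/golden.zst}"
DB="${DB:-mydb}"
RETRIES="${RETRIES:-5}"

# Resume partial downloads (-C -) and retry, so a flaky 4 Mbps link
# wastes as little re-transfer as possible.
fetch_with_retry() {
  local url="$1" out="$2" attempt=1
  until curl -fSL --progress-bar -C - -o "$out" "$url"; do
    if [ "$attempt" -ge "$RETRIES" ]; then
      echo "download failed after ${RETRIES} attempts" >&2
      return 1
    fi
    echo "attempt ${attempt}/${RETRIES} failed, retrying in 5s..." >&2
    sleep 5
    attempt=$((attempt + 1))
  done
}

main() {
  fetch_with_retry "$URL" golden.zst
  dropdb --if-exists "$DB"
  createdb "$DB"
  zstd -d -c golden.zst | psql -q "$DB"
  echo "restored ${DB} from golden snapshot"
}

# Only restore when invoked with "run", so the functions can be sourced.
if [ "${1:-}" = "run" ]; then
  main
fi
```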

  5. Keep a set of focused edge-case fixtures. Aside from the golden snapshot, maintain a folder of single-purpose fixtures for the weird stuff: a user with 10k orders, a payment failure sequence, or a multi-tenant conflict. These are small SQL files you can apply selectively when debugging.
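A tiny helper keeps those fixtures discoverable and easy to apply; the layout below (a `fixtures/` directory of `*.sql` files, applied with `psql`) is just one convention, not a requirement.

```shell
#!/usr/bin/env bash
# fixtures.sh -- sketch; the directory layout and fixture names are
# illustrative conventions.
set -euo pipefail

FIXTURE_DIR="${FIXTURE_DIR:-fixtures}"
# Override PSQL in CI/tests; ON_ERROR_STOP makes a broken fixture fail loudly.
PSQL="${PSQL:-psql mydb -v ON_ERROR_STOP=1}"

# One fixture name per line, without the .sql suffix.
list_fixtures() {
  for f in "$FIXTURE_DIR"/*.sql; do
    [ -e "$f" ] || continue
    basename "$f" .sql
  done
}

# Apply a single fixture on top of the restored golden snapshot.
apply_fixture() {
  local path="$FIXTURE_DIR/$1.sql"
  if [ ! -f "$path" ]; then
    echo "no such fixture: $1" >&2
    return 1
  fi
  # Deliberately unquoted: PSQL is a command plus its flags.
  $PSQL -f "$path"
}
```

After restoring the snapshot, `apply_fixture heavy_user_10k_orders` (name illustrative) layers that one scenario on top of the baseline.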

  6. Automate refreshes and prune drift. Schedule a weekly job that refreshes the golden snapshot from staging (not prod), re-runs anonymisation, recompresses, and uploads. Add a lightweight smoke test that restores the snapshot in CI to ensure the file isn’t corrupt.
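Sketched out, the weekly job is three moves: dump-and-anonymise, upload a dated copy plus a rolling “latest”, and a CI restore check. The `mc` MinIO client, the `myminio/seeds` bucket alias, the database names, and the `anonymise` filter are all placeholders here.

```shell
#!/usr/bin/env bash
# refresh_snapshot.sh -- sketch; `anonymise`, db names, and the
# myminio/seeds bucket alias stand in for your own setup.
set -euo pipefail

# A dated key plus a stable "latest" key, so releases can pin a version.
snapshot_keys() {
  local stamp="${1:-$(date +%F)}"
  printf 'golden-%s.zst\ngolden.zst\n' "$stamp"
}

refresh() {
  local dated
  dated="golden-$(date +%F).zst"
  # From staging, never prod; anonymise is your PII re-mapper.
  pg_dump staging | anonymise | zstd -19 -o "$dated"
  mc cp "$dated" "myminio/seeds/$dated"     # dated copy
  mc cp "$dated" "myminio/seeds/golden.zst" # rolling "latest"
}

# CI smoke test: restore into a scratch db so a corrupt upload fails fast.
smoke_test() {
  createdb smoke_db
  zstd -d -c golden.zst | psql -q smoke_db
  psql smoke_db -tAc 'SELECT count(*) FROM users;'  # users table assumed
  dropdb smoke_db
}
```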

Tradeoffs and the messy reality

Tips that saved me time

When to reach for the heavy artillery

If you’re debugging a production-only issue (race conditions under heavy load, rare data corruption), you will need larger dumps or a staging environment that mirrors prod scale. Treat those as special cases—don’t make every dev deal with them locally.

Conclusion

Good seed data is a balance: small enough to restore quickly on a slow connection, but rich enough to reveal real bugs. In my experience, a trimmed, anonymised golden snapshot plus a library of edge-case fixtures cuts the time to meaningful local debugging from tens of minutes to a couple of minutes. It’s not perfect, and it needs a small maintenance habit, but it elevates every developer’s day-to-day productivity—especially when your home internet decides to be temperamental.

If you want, I can share a starter anonymisation script and the sample queries I use to pick diverse rows.