A 10‑Minute Repro VM That Saves Me a Client Support Hour (and the day it lied)

How I maintain tiny, versioned VM images to reproduce client bugs fast: what I automate, the storage and bandwidth tricks I use in India, and the one failure that taught me to stop assuming.

Written by: Arjun Malhotra

Close-up of hands typing on a laptop keyboard with code visible on the screen
Photo by Christina @ wocintechchat.com on Unsplash

It was 9:15 p.m. on a Wednesday. The client’s staging server kept failing a POST to our payment callback. Logs showed a TLS error that only happened under load. They handed me a stack trace, a vague “it works on my machine”, and a request: “Can you reproduce this locally?” I pulled the code, ran the tests, stared at a green CI, then stared at my slow home internet. Two hours later I was still trying to match the exact distro, OpenSSL version and systemd behavior. The bug was only reproducible in an environment I didn’t have. I lost an hour to setup, then another hour to context switching.

After that week I decided to stop gambling time on ad-hoc local setups. I built one small habit that pays off on those nights: a tiny, versioned, reproducible VM for each client or major environment. It boots in about 10 minutes on my laptop and matches the client’s service-level environment closely enough to either reproduce the bug or explain to the client why it can’t be reproduced without more information.
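One common way to get a tiny, versioned VM per client is a copy-on-write overlay on top of a shared base image, so each client only costs the disk space of its differences. A minimal sketch, assuming QEMU with qcow2 images; the directory layout, the `<client>-v<N>.qcow2` naming, and the helper names are hypothetical:

```python
import subprocess
from pathlib import Path

# Hypothetical layout: read-only base images (one per distro release) live
# elsewhere; thin per-client overlays live here, one file per version.
OVERLAY_DIR = Path("images/overlays")


def overlay_path(client: str, version: int, overlay_dir: Path = OVERLAY_DIR) -> Path:
    """Versioned overlay file name, e.g. images/overlays/acme-v3.qcow2."""
    return overlay_dir / f"{client}-v{version}.qcow2"


def overlay_command(base: Path, overlay: Path) -> list[str]:
    """Build the qemu-img command that creates a qcow2 overlay backed by `base`."""
    return [
        "qemu-img", "create",
        "-f", "qcow2",               # format of the new overlay file
        "-b", str(base.resolve()),   # absolute path: a relative backing path is
                                     # resolved relative to the overlay's directory
        "-F", "qcow2",               # backing file format; recent QEMU expects it
        str(overlay),
    ]


def create_overlay(base: Path, client: str, version: int) -> Path:
    """Create the overlay on disk. Requires qemu-img on PATH."""
    overlay = overlay_path(client, version)
    overlay.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(overlay_command(base, overlay), check=True)
    return overlay
```

Booting the overlay rather than the base (for example `qemu-system-x86_64 -enable-kvm -m 2048 -drive file=images/overlays/acme-v3.qcow2,if=virtio`) leaves the base image untouched, so bumping the version number gives a fresh environment and a broken experiment is one deleted file away.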

Why a tiny VM, not Docker?

Containers are great. But in my experience with payment integrations, vendor drivers, and kernel-tied TLS quirks, Docker can hide the differences you actually need to test: every container shares the host’s kernel, and most images don’t run systemd at all. The problems that ate my hours were OS packages, systemd service ordering, specific OpenSSL builds and odd kernel behavior under cgroups. A minimal VM gives me an entire userspace and kernel to match, without spinning up a full cloud instance.
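One way to see the shared-kernel point directly: run the same small fingerprint script on the host, inside a container, and inside the VM. A minimal sketch; the script and its field names are hypothetical. In a container, `kernel` reports the host’s kernel release, because `uname` answers from the shared kernel, while the VM reports its own. Note that `ssl.OPENSSL_VERSION` is the OpenSSL this Python interpreter was built against, which is not necessarily the one your service links.

```python
import platform
import ssl
from pathlib import Path


def parse_os_release(text: str) -> dict[str, str]:
    """Parse /etc/os-release style KEY=value lines, stripping surrounding quotes."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def fingerprint() -> dict[str, str]:
    """Collect distro, kernel release, and Python's OpenSSL version for comparison."""
    os_release = Path("/etc/os-release")
    distro = parse_os_release(os_release.read_text()) if os_release.exists() else {}
    return {
        "distro": distro.get("PRETTY_NAME", "unknown"),
        # Inside a container this is the host's kernel, not anything in the image.
        "kernel": platform.release(),
        # OpenSSL linked into this Python's ssl module.
        "openssl": ssl.OPENSSL_VERSION,
    }


if __name__ == "__main__":
    for key, value in fingerprint().items():
        print(f"{key:8} {value}")
```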

What I keep in place

How I make it actually fast

A real example: reproducing a TLS handshake failure

One client had a TLS handshake failure that appeared only under heavy concurrent load. Booting an overlay with their exact distro and OpenSSL build reproduced it within minutes. The cause turned out to be a sysctl value (net.ipv4.tcp_tw_reuse) that their sysadmin had set differently. I patched the overlay, re-ran the repro, and pushed a small config change. No long setup, no long debug session.
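A sysctl comparison like that is easy to make routine. A minimal sketch, assuming the client can send `sysctl -a` output as a text file; the watch list and function names are hypothetical. Local values are read straight from /proc/sys, which exposes each sysctl name as a path with the dots replaced by slashes.

```python
from pathlib import Path
from typing import Optional

# Hypothetical watch list: settings worth comparing for TCP/TLS-under-load bugs.
WATCHED = [
    "net.ipv4.tcp_tw_reuse",
    "net.ipv4.ip_local_port_range",
    "net.core.somaxconn",
]


def normalize(value: str) -> str:
    """Collapse runs of spaces and tabs so multi-value settings compare equal."""
    return " ".join(value.split())


def read_sysctl(name: str, root: Path = Path("/proc/sys")) -> Optional[str]:
    """Read a sysctl via procfs: net.ipv4.tcp_tw_reuse -> /proc/sys/net/ipv4/tcp_tw_reuse."""
    try:
        return normalize(root.joinpath(*name.split(".")).read_text())
    except OSError:
        return None  # not present on this kernel, or not readable


def parse_sysctl_output(text: str) -> dict[str, str]:
    """Parse `sysctl -a` style 'name = value' lines."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            values[name.strip()] = normalize(value)
    return values


def diff_sysctls(
    local: dict[str, Optional[str]], client: dict[str, Optional[str]]
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {name: (local, client)} for every name whose values differ."""
    return {
        name: (local.get(name), client.get(name))
        for name in sorted(set(local) | set(client))
        if local.get(name) != client.get(name)
    }
```

Restricting both sides to the watch list keeps the output short: `diff_sysctls({n: read_sysctl(n) for n in WATCHED}, {n: client.get(n) for n in WATCHED})`, where `client` is the parsed file.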

The week it lied

There was one bug that broke my faith in small VMs: a race that depended on a CPU microcode quirk and a proprietary driver the client ran on their VM host. My reproducible VM was green; the client’s VM, running under KVM at their provider, failed. I spent a day on false confidence until the client sent me /proc/cpuinfo and dmesg output. The difference came down to a vendor microcode revision and a kernel module. My tiny VM had no hope of matching that without turning into a full hardware-capture project.
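A mechanical comparison of two small files might have pointed at a gap like this sooner than a day of green runs. A minimal sketch, assuming x86 guests (where /proc/cpuinfo usually carries a `microcode` line) and that both sides can capture /proc/cpuinfo and /proc/modules as text; dmesg is left out because it is much harder to diff mechanically. The function names are hypothetical.

```python
def microcode_revisions(cpuinfo_text: str) -> set[str]:
    """Distinct 'microcode' values in /proc/cpuinfo text (x86 only)."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


def module_names(proc_modules_text: str) -> set[str]:
    """Module names: the first whitespace-separated field of each /proc/modules line."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def compare_hosts(mine: dict[str, str], theirs: dict[str, str]) -> list[str]:
    """Readable differences between two {'cpuinfo': ..., 'modules': ...} captures."""
    notes = []
    mine_mc = microcode_revisions(mine["cpuinfo"])
    theirs_mc = microcode_revisions(theirs["cpuinfo"])
    if mine_mc != theirs_mc:
        notes.append(f"microcode differs: mine={sorted(mine_mc)} theirs={sorted(theirs_mc)}")
    # Modules the client's host loads that my VM does not (e.g. a vendor driver).
    only_theirs = module_names(theirs["modules"]) - module_names(mine["modules"])
    if only_theirs:
        notes.append(f"modules loaded only on their host: {sorted(only_theirs)}")
    return notes
```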

That week taught me two things:

The tradeoffs I accepted

How this changed my nights and meetings

I now lose far fewer evenings to environment setup. On an incident I can boot an environment and know within 10–15 minutes whether I have a reproducible case. That changes the conversation: “I can reproduce this in environment X; it fails at step Y” is a lot more useful than “I tried, can’t reproduce.”
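For a race like the TLS one, “reproducible” is usually a failure rate rather than a yes or no, so it helps to run the repro many times before deciding. A minimal sketch; the repro command (for example a hypothetical `./repro.sh` that exits non-zero when the bug fires), the run count, and the timeout are all assumptions:

```python
import subprocess


def repro_rate(cmd: list[str], runs: int = 20, timeout: float = 60.0) -> float:
    """Run `cmd` `runs` times and return the fraction of runs that failed.

    A run counts as a failure if it exits non-zero or exceeds `timeout` seconds.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = 0
    for _ in range(runs):
        try:
            result = subprocess.run(cmd, capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            failures += 1  # subprocess.run kills the child; a hang counts as a repro
            continue
        if result.returncode != 0:
            failures += 1
    return failures / runs
```

A result like `repro_rate(["./repro.sh"], runs=50) == 0.3` reads as “fails about 30% of the time in environment X”, which is a more useful thing to tell a client than a single pass or fail.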

In meetings, being able to say “let me boot the client overlay and demo it” cuts down on finger-pointing. It has also made engineers much more likely to try reproducing locally, because the friction is gone.

One takeaway

If you’re still spending your nights installing packages to match a client environment, stop. Build one small, versioned VM per client or environment: make it bootable in ten minutes, keep it lightweight, and automate the mundane parts (overlay creation, package caching, and sharing). It won’t match everything, but it will turn hours of setup into ten focused minutes, and that change is worth an extra SSD and a ₹300 VPS.