The ₹3,800 time server that stopped our flaky auth errors

I set up a cheap local NTP server (Raspberry Pi + chrony) to stop intermittent 'expired' tokens and flaky tests. What worked, what broke, and the real cost.

Written by: Arjun Malhotra

Photo by Claudio Schwarz on Unsplash

It was 2:17 AM. The on-call Slack had three threads: a few users seeing failed logins, an alert in PagerDuty for CI job failures with “signature expired”, and a frantic message from QA about UPI payments timing out at checkout. The stack traces all pointed to the same weird thing — token timestamps out of sync. But which clock?

We traced logs, checked TLS certificate validity, and re-ran the job locally. The CI runner’s VM showed the right time. A developer’s laptop was two minutes slow. One of our staging containers, restored from a snapshot that morning, was 90 seconds ahead. The outage wasn’t caused by a single server; it was a whole class of bugs cascading from clock skew.

Why clock drift keeps winning

I’d always treated time as plumbing: servers sync to pool.ntp.org and the rest is magic. That week I learned the plumbing is porous.

A few common things were conspiring:

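I could have made the tokens more lenient, or added bigger buffers in validation. I did neither. Those are band-aids: they hide the skew instead of removing it, and every extra second of tolerance is extra time a leaked token stays usable. The real fix was to make our environment agree on the time.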
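To see why leeway only hides the problem, here is a minimal Python sketch of the usual not-before/expiry check on a token. The function name `check_token_times`, the 60-second token lifetime, and the 120-second leeway are illustrative assumptions; the clock offsets are the slow laptop and the fast staging container from above. This is not our production validator.

```python
# Hypothetical validator: compares a token's validity window with the local clock.
def check_token_times(nbf: float, exp: float, now: float, leeway: float = 0.0) -> str:
    """Return "ok", "not_yet_valid" or "expired" for a token valid in [nbf, exp)."""
    if now + leeway < nbf:
        return "not_yet_valid"   # the issuer's clock is ahead of ours
    if now - leeway >= exp:
        return "expired"         # the issuer's clock is behind ours (or the token is old)
    return "ok"


TRUE_TIME = 1_000_000.0  # "real" time, as seen by a correctly synced validator
TTL = 60.0               # assumed short-lived token lifetime, in seconds

for label, issuer_offset in [("laptop 2 min slow", -120.0), ("container 90 s ahead", 90.0)]:
    nbf = TRUE_TIME + issuer_offset  # the issuer stamps the token using its own wrong clock
    exp = nbf + TTL
    strict = check_token_times(nbf, exp, TRUE_TIME)
    lenient = check_token_times(nbf, exp, TRUE_TIME, leeway=120.0)
    print(f"{label}: strict={strict}, leeway=120s -> {lenient}")

# Prints:
# laptop 2 min slow: strict=expired, leeway=120s -> ok
# container 90 s ahead: strict=not_yet_valid, leeway=120s -> ok
```

The 120-second leeway makes both cases pass, but it also widens the window in which any token is accepted from 60 seconds to 300 seconds. That is the security cost of treating skew as a validation problem instead of an environment problem.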

What I actually built (and how much it cost)

I wanted something low-friction that solved the immediate problem for people and CI on our local network. I put together a tiny local NTP server on a Raspberry Pi and told the team to use it as the primary time source.

What I bought:

The setup in brief:

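Why chrony: it corrects large offsets faster and copes with intermittent connectivity better than classic ntpd. It also slews the clock gradually instead of stepping it, unless you explicitly allow a step (for example with the makestep directive at boot), so running systems don’t see sudden time jumps that can break logs.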
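chrony’s `makestep threshold limit` directive is one way a freshly booted or snapshot-restored machine can jump straight to the right time instead of slewing toward it slowly. The Python sketch below only models that decision; it is not chrony’s code. The function and class names are hypothetical, and the 500 ppm slew rate is an assumed figure chosen for illustration, since real slew limits are configurable and differ between daemons.

```python
from dataclasses import dataclass


@dataclass
class Correction:
    action: str        # "step" (jump the clock) or "slew" (run it slightly fast/slow)
    duration_s: float  # how long until the offset is gone; 0 for a step


def plan_correction(offset_s: float, update_index: int,
                    step_threshold_s: float = 1.0, step_limit: int = 3,
                    slew_rate_ppm: float = 500.0) -> Correction:
    """Model of a makestep-style policy, e.g. `makestep 1.0 3` in chrony.conf."""
    # Step only for large offsets, and only during the first few clock updates
    # after start-up, so a long-running system never sees its clock jump.
    if abs(offset_s) > step_threshold_s and update_index <= step_limit:
        return Correction("step", 0.0)
    # Otherwise slew: at R ppm the clock gains or loses R microseconds per second.
    return Correction("slew", abs(offset_s) / (slew_rate_ppm * 1e-6))


# A snapshot restored 90 s ahead, corrected on the first update vs. much later.
print(plan_correction(90.0, update_index=1))
print(plan_correction(90.0, update_index=50))
```

On the first update the 90-second error is stepped away immediately. Caught later, at the assumed 500 ppm, slewing it out would take roughly 180,000 seconds (about 50 hours), which is why allowing a step at boot matters for VMs and containers restored from snapshots.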

The result was visible within days. The number of auth-related failures dropped from a few per week to near zero. The “signature expired” CI flakes went away because the VMs and runners were now syncing to the same local reference immediately at boot. Developers stopped seeing sporadic payment failures on devices tethered to the office Wi‑Fi.

The failure I didn’t expect (and the tradeoff I accepted)

This is where I admit the thing that bit me.

A month in, a prolonged power cut hit my house (Bengaluru monsoon season) and corrupted the Pi’s SD card; the cheap card I’d used was the weakest link. The NTP server simply stopped answering, without warning. Because DHCP handed out the Pi as the primary time source, devices with no other reliable upstream server drifted until they could reach pool.ntp.org directly, and for a few minutes we saw a handful of flaky tests again.
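The root problem was one preferred source with no fallback. Here is a bare-bones SNTP query, using the client packet and the offset/delay formulas from RFC 5905, that tries the local server first and falls back to pool.ntp.org. The hostname `timepi.lan` and all helper names are hypothetical, and this is a sketch for checking clocks by hand, not a replacement for listing several `server`/`pool` sources in chrony’s configuration.

```python
from __future__ import annotations

import socket
import struct
import time

NTP_TO_UNIX = 2_208_988_800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01


def ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to Unix seconds."""
    return seconds - NTP_TO_UNIX + fraction / 2**32


def offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> tuple[float, float]:
    """t1 client send, t2 server receive, t3 server send, t4 client receive."""
    offset = ((t2 - t1) + (t3 - t4)) / 2  # server minus local clock; positive = local is slow
    delay = (t4 - t1) - (t3 - t2)         # round trip minus the server's processing time
    return offset, delay


def query(server: str, timeout: float = 1.0) -> tuple[float, float]:
    packet = b"\x23" + 47 * b"\0"  # LI=0, VN=4, Mode=3 (client); all other fields zero
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(packet, (server, 123))
        data, _ = sock.recvfrom(512)
        t4 = time.time()
    if len(data) < 48:
        raise OSError(f"short NTP reply from {server}")
    # Receive (t2) and transmit (t3) timestamps occupy bytes 32-47 of the reply.
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", data[32:48])
    return offset_and_delay(t1, ntp_to_unix(rx_s, rx_f), ntp_to_unix(tx_s, tx_f), t4)


def first_reachable(servers: list[str]) -> tuple[str, float, float]:
    """Try each source in preference order; skip ones that time out or don't resolve."""
    for server in servers:
        try:
            offset, delay = query(server)
            return server, offset, delay
        except OSError:  # covers timeouts and DNS failures
            continue
    raise RuntimeError("no time source reachable")
```

Calling `first_reachable(["timepi.lan", "pool.ntp.org"])` returns whichever server answered first, plus the measured offset and round-trip delay; with the Pi down, the first query times out and the pool answers instead.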

The lessons:

Costs vs benefit in our setting

Total spend including UPS and a better SD card came to about ₹6,000. For a small team repeatedly debugging late-night auth flakiness, that’s cheap insurance. It buys two things that matter more than uptime numbers: deterministic debugging and fewer noisy, misleading failure modes.

If you run short-lived tokens, care about TOTP/OTP, or maintain CI jobs that boot from snapshots, local time consistency is an underrated dependency. Your logs become trustworthy again, temporal race conditions stop being ghosts, and you stop chasing timestamps in ten different places.
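TOTP is a clean illustration of why two minutes of drift matters. This sketch follows RFC 6238 (HMAC-SHA1, 30-second steps, RFC 4226 dynamic truncation). The `verify` helper and its `window` parameter are hypothetical names, though many TOTP verifiers accept a few adjacent time steps in a similar way.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP with HMAC-SHA1 and T0 = 0."""
    counter = int(unix_time // step)                     # time-step number T
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                              # RFC 4226 dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


def verify(secret: bytes, code: str, server_time: float,
           window: int = 1, step: int = 30, digits: int = 6) -> bool:
    """Accept codes from `window` steps either side of the server's current step."""
    return any(
        hmac.compare_digest(totp(secret, server_time + k * step, step, digits), code)
        for k in range(-window, window + 1)
    )


secret = b"12345678901234567890"             # RFC 6238 SHA-1 test key
server_now = 1_700_000_010.0
phone_code = totp(secret, server_now - 120)  # a device whose clock is 2 minutes slow
print(verify(secret, phone_code, server_now, window=1))  # 4 steps behind: outside ±1
print(verify(secret, phone_code, server_now, window=4))  # accepted, but only with ±4
```

A phone two minutes slow lands four 30-second steps behind the server. Widening the window to ±4 steps accepts it, but each code then stays acceptable for about four and a half minutes instead of about a minute and a half: exactly the kind of loosened rule worth avoiding.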

Takeaway

If intermittent “expired” tokens or snapshot-startup flakes have cost you late nights, set up a small local NTP server and make it the preferred source on your LAN and CI hosts. It won’t fix every clock-related problem — power, redundancy, and SD reliability matter — but it narrows the blast radius fast. My real lesson: fix the environment before loosening your security rules.