On‑Call Rotations That Don't Suck: A Practical Playbook for Small Indian Teams

Concrete practices for fair, sustainable on-call rotations for small engineering teams in India—reduce pager noise, protect deep work, and keep weekends human.

Written by: Rohan Deshpande


You don’t need a giant operations team to have a good on‑call culture. You do need rules, tools, and a bit of mercy.

I spent my first year of on‑call rotations treating every ping like a five‑alarm fire. Burnout followed. Over the next two years, on teams of 6–20 engineers in Bengaluru and remote across India, I helped build a lighter, more predictable approach that actually let people sleep and ship features. Here’s the practical playbook we used—what worked, what backfired, and the tradeoffs you’ll face.

Why on‑call rotations should be humane (and realistic)


Start with the right scope

Define what “page” means. We kept three severity levels.
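Your exact definitions will differ, but here is a minimal sketch of the kind of three‑level triage we mean. The level names, thresholds, and the `classify` helper are illustrative, not our actual alert rules:

```python
from enum import Enum

class Severity(Enum):
    P0 = "page immediately, any hour"           # customer-facing outage or data loss
    P1 = "notify; handle during working hours"  # degraded but not down
    P2 = "ticket only, no page"                 # low-impact, known, or cosmetic

def classify(error_rate: float, customer_facing: bool) -> Severity:
    """Toy routing rule: page a human only for serious, customer-visible failures."""
    if customer_facing and error_rate > 0.05:   # assumed: >5% of requests failing
        return Severity.P0
    if error_rate > 0.01:                       # assumed: elevated but can wait till morning
        return Severity.P1
    return Severity.P2

print(classify(error_rate=0.08, customer_facing=True).name)  # P0
```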

This simple triage reduced noisy pages by ~60% in our first month. Put these definitions in your runbook and enforce them in alert rules.

Invest 2 hours to write practical runbooks

A runbook that says “restart service” beats 20 Slack messages. For each P0/P1 path, include the same few essentials every time.
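To make that concrete, here is an illustrative skeleton of one runbook entry kept as structured data; every field name, URL, and instruction below is a placeholder, not our real template:

```python
# One runbook entry as structured data; render it into the one-page doc or
# link it straight from the alert. All values below are placeholders.
runbook_entry = {
    "alert": "checkout-api high 5xx rate",
    "severity": "P0",
    "dashboard": "https://grafana.example.com/d/checkout",
    "first_checks": [
        "Did a deploy go out in the last 30 minutes?",
        "Is the upstream payments provider reporting an incident?",
    ],
    "mitigation": "Roll back to the previous release (see deploy tool docs)",
    "escalate_to": "service owner listed on the rotation sheet",
}
```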

Runbooks pay dividends in less context switching and faster resolution. The downside: they need maintenance. Treat them as living docs—update after each incident.

Reduce noise, then automate escalation

Noisy alerts are morale killers. Start by setting sensible thresholds (not zero errors). Use aggregation windows and mute known flapping endpoints. Add a short delay before paging; many transient hiccups recover on their own before anyone needs to look.
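A “short delay before paging” can be as simple as requiring the bad condition to hold across several consecutive checks. A rough sketch; the window size and threshold are assumptions you would tune to your own traffic:

```python
from collections import deque

WINDOW = 5               # e.g. five consecutive one-minute checks
ERROR_THRESHOLD = 0.05   # assumed threshold, not a recommendation

recent_checks = deque(maxlen=WINDOW)

def should_page(current_error_rate: float) -> bool:
    """Page only when the error rate has stayed high for the whole window,
    so a single transient spike recovers without waking anyone up."""
    recent_checks.append(current_error_rate > ERROR_THRESHOLD)
    return len(recent_checks) == WINDOW and all(recent_checks)
```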

Automate escalation: if on‑call doesn’t ack in 10 minutes, escalate to the backup and then to the lead. It prevents endless 3am group pings.
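Most paging tools ship escalation policies that do this for you; if you are wiring it up yourself, the logic is roughly the loop below. The ten-minute timeout matches the rule above, but `notify` and `is_acknowledged` are placeholders for whatever alerting and ack mechanism you use:

```python
import time

ACK_TIMEOUT_MINUTES = 10
ESCALATION_CHAIN = ["oncall-primary", "oncall-backup", "team-lead"]

def notify(person: str, incident: str) -> None:
    # Placeholder: swap in your SMS / phone-call / chat integration.
    print(f"Paging {person} about: {incident}")

def is_acknowledged(incident: str) -> bool:
    # Placeholder: query your alert tool or incident tracker for an ack.
    return False

def escalate(incident: str) -> None:
    """Walk the chain, giving each person ACK_TIMEOUT_MINUTES to acknowledge."""
    for person in ESCALATION_CHAIN:
        notify(person, incident)
        deadline = time.time() + ACK_TIMEOUT_MINUTES * 60
        while time.time() < deadline:
            if is_acknowledged(incident):
                return
            time.sleep(30)  # poll for an ack every 30 seconds
    notify(ESCALATION_CHAIN[-1], f"STILL UNACKED after full chain: {incident}")
```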

Pick rotation cadence based on team size (tradeoffs)

The common options are weekly shifts, two‑week shifts, and longer stints. For teams of 8–20, we found two‑week rotations with a designated backup hit the sweet spot. Smaller teams (4–7 people) often prefer weekly so the load is spread more evenly. The tradeoff: longer rotations let you learn the system better; shorter rotations reduce the chance that any single person gets burned out.
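Whatever cadence you land on, generate the calendar rather than hand-assigning it; it keeps the load visibly fair and makes swaps explicit. A small sketch with made-up names and a two-week shift:

```python
from datetime import date, timedelta
from itertools import cycle

ENGINEERS = ["Asha", "Vikram", "Neha", "Ravi"]   # made-up names
SHIFT_DAYS = 14                                  # two-week rotation
START = date(2025, 1, 6)                         # arbitrary Monday

def rotation(shifts: int = 6):
    """Yield (start, end, primary, backup); the backup is simply the next person."""
    primaries = cycle(ENGINEERS)
    backups = cycle(ENGINEERS[1:] + ENGINEERS[:1])
    shift_start = START
    for _ in range(shifts):
        yield (shift_start,
               shift_start + timedelta(days=SHIFT_DAYS - 1),
               next(primaries),
               next(backups))
        shift_start += timedelta(days=SHIFT_DAYS)

for s, e, primary, backup in rotation():
    print(f"{s} to {e}: {primary} (backup: {backup})")
```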

Compensation and boundary setting (do both)

In India, the cultural default is “we’ll handle it”, and that’s a trap. Pay a clear on‑call stipend or give time off in lieu. Even a modest stipend (₹2,000–5,000/month) signals respect. Equally important: protect work hours. On‑call days should avoid scheduled deep‑work blocks and heavy meetings; keep them “meeting‑lite.”

Practical tooling that doesn’t bankrupt you

You don’t need enterprise pager tools to get started; cheaper options plus a bit of script glue go a long way.

A realistic tradeoff: cheaper tools bring more manual glue (scripts, cron jobs). That’s fine if you accept a bit of operational debt and prioritize automation later.
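That “manual glue” usually looks like the script below: a cron job that probes a health endpoint and posts to a chat webhook when it fails. The URLs are placeholders, and a real version would want de-duplication and retries before you trust it:

```python
#!/usr/bin/env python3
"""Run from cron, e.g.: */5 * * * * /opt/oncall/healthcheck.py"""
import json
import urllib.request

HEALTH_URL = "https://api.example.com/healthz"          # placeholder endpoint
SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX"  # placeholder incoming webhook

def post_alert(message: str) -> None:
    body = json.dumps({"text": message}).encode()
    req = urllib.request.Request(SLACK_WEBHOOK, data=body,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=10)

try:
    with urllib.request.urlopen(HEALTH_URL, timeout=10) as resp:
        if resp.status != 200:
            post_alert(f"healthz returned HTTP {resp.status}")
except Exception as exc:  # timeouts, DNS failures, connection resets, ...
    post_alert(f"healthz check failed: {exc}")
```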

Run lightweight postmortems and followups

After any P0/P1, do a 30–60 minute postmortem. Focus on four things: what happened, why the alert fired, what we will change to prevent recurrence, and who will do it. Keep blame out of the room. Track followups as sprint tasks. This is how reliability improves without hiring.

Cultural rules that matter

When you’ll still feel the pinch

If your team is understaffed or running many critical services, on‑call will remain a stressor. Hiring, better automation, and moving riskier services to managed platforms are the real long‑term fixes. Also: not every service deserves 24x7 coverage; accept measured risk for low‑impact systems.

A final, practical checklist to implement this week

- Define P0/P1/P2 in writing and encode the definitions in your alert rules.
- Write or update one‑page runbooks for your top P0/P1 paths.
- Add aggregation windows and a short paging delay to your noisiest alerts.
- Turn on auto‑escalation after 10 minutes without an ack.
- Pick a rotation cadence, publish the schedule, and name a backup for every shift.
- Agree on a stipend or time off in lieu, and keep on‑call days meeting‑lite.
- Book a 30–60 minute postmortem after every P0/P1 and track followups as sprint tasks.

On‑call rotations don’t have to be heroic. With clear scope, decent runbooks, a few automation rules, and fair boundaries, small teams can keep services reliable without people living on adrenaline. Do the small, boring work—your sleep schedule will thank you.

If you want, I can share the runbook template we used (short, one page) or a starter alert rule set that cut our midnight pages in half.