The 15‑Minute Incident Triage Playbook for Solo On‑Call Devs in India

A compact, practical incident triage playbook for solo on‑call engineers in India—what to do in the first 15 minutes, with real tradeoffs and templates you can use today.

Written by: Rohan Deshpande

[Image: a small engineering team gathered around a laptop, discussing an incident. Credit: Unsplash / Brooke Cagle]

If you’ve ever taken a 3 a.m. page and stared at a terminal while the coffee goes cold, this one’s for you. For small teams and solo on‑call engineers in India, incidents don’t wait for perfect conditions. You need a fast, repeatable way to decide whether to escalate, fix, or shelve the problem, and to do it before your phone battery dies or your network connection drops.

Here’s a tight, realistic incident triage playbook I actually use. It’s designed to be executed in 15 minutes and gets you to a safe state quickly. It assumes limited tooling (PagerDuty or simple SMS, Slack/WhatsApp, basic observability), variable mobile data, and the usual constraints of small teams.

Main goal: reduce blast radius and customer impact fast. Secondary goal: buy time for proper investigation.

The 15‑minute rhythm

Minute 0–2: Read the page, set context

Minute 2–5: Quick verification

Minute 5–8: Contain

Minute 8–12: Short checklist for the underlying cause

Minute 12–15: Decide and document
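The rhythm above can be sketched as a small lookup that tells you which phase you should be in, given how long ago the page fired. This is only an illustrative sketch: the `PHASES` table mirrors the minute ranges listed above, while the function names `current_phase` and `minutes_left_in_phase` are hypothetical helpers, not part of any paging tool.

```python
from typing import Optional

# Phase boundaries in minutes, copied from the rhythm above: (start, end, name).
# Each phase covers start <= elapsed < end.
PHASES = [
    (0, 2, "Read the page, set context"),
    (2, 5, "Quick verification"),
    (5, 8, "Contain"),
    (8, 12, "Short checklist for the underlying cause"),
    (12, 15, "Decide and document"),
]


def current_phase(elapsed_minutes: float) -> Optional[str]:
    """Return the phase name for the elapsed time, or None after minute 15."""
    if elapsed_minutes < 0:
        raise ValueError("elapsed_minutes must be non-negative")
    for start, end, name in PHASES:
        if start <= elapsed_minutes < end:
            return name
    return None  # Past the 15-minute window: you should have decided by now.


def minutes_left_in_phase(elapsed_minutes: float) -> Optional[float]:
    """Return minutes remaining in the current phase, or None after minute 15."""
    for start, end, _ in PHASES:
        if start <= elapsed_minutes < end:
            return end - elapsed_minutes
    return None


if __name__ == "__main__":
    # Hypothetical usage: 6.5 minutes after the page fired.
    print(current_phase(6.5), "-", minutes_left_in_phase(6.5), "min left")
```

A lookup like this is deliberately dumb: it does not decide anything for you, it only makes it obvious when you have overstayed a phase and should move on to containment or documentation.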

Why this works (and what you give up)

This playbook forces early decisions: contain or escalate, not deep diagnosis. The tradeoff is intentional. You may miss a nuanced root cause in the first 15 minutes, but you’ll stop user impact quickly and keep the blast radius small. For small teams, that’s usually the right trade.

Common real‑world constraints and how to handle them

A tiny incident log template you can copy (one line per update)
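One way to keep a one-line-per-update log consistent is to generate each line with a tiny helper. The sketch below is an assumption about what such a line might contain: the fields (timestamp in IST, status, action taken, next step), the ` | ` separator, and the function name `log_line` are all illustrative choices, not a prescribed format.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# India Standard Time is UTC+05:30 with no daylight saving.
IST = timezone(timedelta(hours=5, minutes=30), "IST")


def log_line(
    status: str,
    action: str,
    next_step: str,
    when: Optional[datetime] = None,
) -> str:
    """Format one incident update as a single pipe-separated line.

    The field set is a hypothetical example. Pass a timezone-aware datetime
    for `when`; a naive one would be interpreted as local system time.
    """
    when = when or datetime.now(IST)
    stamp = when.astimezone(IST).strftime("%Y-%m-%d %H:%M IST")
    fields = [stamp, status.upper(), action, f"next: {next_step}"]
    # Collapse whitespace and newlines so every update stays on one line.
    return " | ".join(" ".join(field.split()) for field in fields)


if __name__ == "__main__":
    # Hypothetical update, timestamped in UTC and rendered in IST.
    ts = datetime(2024, 1, 5, 21, 40, tzinfo=timezone.utc)
    print(log_line("mitigating", "rolled back deploy", "watch error rate", ts))
    # prints: 2024-01-06 03:10 IST | MITIGATING | rolled back deploy | next: watch error rate
```

Plain text lines like this paste cleanly into Slack, WhatsApp, or a notes file, which matters when you are updating from a phone on patchy mobile data.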

Why teams ignore this and why that’s a mistake

Teams often skip early containment, hoping to “fix it properly” on the first try. That mindset costs customers hours. The hardest cultural change is accepting that a temporary, visible mitigation (a rollback, a toggle) is a legitimate, responsible outcome. It’s not a failure; it’s risk management.

When not to follow the 15‑minute playbook

Wrap up like a human

If you’re a solo on‑call in a small Indian startup, the first 15 minutes matter more than the next 15 hours. This incident triage playbook is intentionally short and imperfect: good enough to stop the bleeding and give you time to think. Try it for a month, tweak the steps to your stack, and keep one honest tradeoff in your pocket: speed now, root cause later.

If you want, I can share a compact Slack status template and a one‑page runbook you can paste into your repo README. It’ll save you one late‑night panic, at least.