When a 10% CPU Spike Felt Like a Mystery — How Flamegraphs Solved It
How I used flamegraphs to find a sneaky CPU hotspot in production, with practical commands, tool choices, and real tradeoffs for Indian teams.
Written by: Rohan Deshpande
A few months ago our payments service started showing a 10% CPU increase on the busiest node. Logs were clean, traces were noisy but inconclusive, and the pager was politely persistent. We tried increasing instance size, sampling traces, and adding caching — none fixed the root cause. What did work was embarrassingly simple: a 30‑second flamegraph.
If you haven’t used flamegraphs much, they feel like a detective’s magnifying glass for CPU and latency hotspots. You get a single interactive SVG that shows where time is spent, grouped and prioritized visually. For teams in India — small budgets, conservative production access, and tight SLAs — flamegraphs are a low-cost, high-signal tool you should learn to run cautiously and quickly.
Why flamegraphs first
- They surface hotspots across stacks (C, Java, Python, Node) without chasing log noise.
- The output is sharable: an SVG you can drop in Slack or a PR and everyone understands.
- They’re sampling-based, so overhead is small and the data is statistically meaningful.
A quick, practical pipeline (Linux perf + Brendan Gregg’s scripts)
- Clone the scripts locally: git clone https://github.com/brendangregg/FlameGraph.git
- Sample a running process (30 seconds, 99 Hz sampling):
sudo perf record -F 99 -p <PID> -g -- sleep 30
- Convert to folded stacks and generate an SVG:
sudo perf script | ./FlameGraph/stackcollapse-perf.pl > out.folded
./FlameGraph/flamegraph.pl out.folded > flame.svg
Open flame.svg in a browser. Tall, wide boxes mean “this path eats CPU.” Look for unexpectedly large user-level frames (your app code), not just library noise.
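If you run captures often, it helps to wrap those three steps in a small script so nobody fat-fingers a flag at 2 a.m. A minimal sketch, assuming the FlameGraph repo is cloned next to it; the script name and defaults are my own invention:

#!/usr/bin/env bash
# capture_flame.sh <PID> [duration]: hypothetical wrapper around the perf pipeline above
set -euo pipefail
PID="$1"
DURATION="${2:-30}"                                        # default to a 30-second capture
OUT="flame-$(date +%Y%m%d-%H%M%S)"
sudo perf record -F 99 -p "$PID" -g -- sleep "$DURATION"   # sample at 99 Hz
sudo perf script | ./FlameGraph/stackcollapse-perf.pl > "$OUT.folded"
./FlameGraph/flamegraph.pl "$OUT.folded" > "$OUT.svg"
echo "wrote $OUT.svg (keep $OUT.folded for diffs later)"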
Language-specific helpers that save time
- Java: async-profiler (produces an SVG directly and handles JIT-compiled frames better)
./profiler.sh -d 30 -f cpu.svg <PID>
- Python: py-spy (attaches without root to processes running as your own user; a folded-stacks variant is sketched after this list)
py-spy record -o profile.svg --pid <PID> --duration 30
- Node: 0x or Clinic.js for V8-flavored profiles
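One tip that ties into the checklist further down: py-spy can also emit raw collapsed stacks instead of a finished SVG, which you can render with flamegraph.pl yourself and keep around for later diffs. A minimal sketch, assuming py-spy's raw output format and the FlameGraph repo from earlier (double-check the flag names against your py-spy version):

py-spy record --format raw -o app.folded --pid <PID> --duration 30
./FlameGraph/flamegraph.pl app.folded > app.svg    # same renderer as the perf pipeline above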
When flamegraphs are the wrong tool
- They won’t show why latency spikes happen when you’re waiting on external I/O (DB/network) — use distributed traces or tcpdump for that.
- For memory leaks, heap profilers are better; flamegraphs focus on CPU (though allocation flamegraphs exist; there's a sketch after this list).
- Very short-lived, rare spikes can be missed by sampling unless you increase duration or frequency.
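On that allocation point: async-profiler can sample allocation sites instead of CPU, which gets you partway toward a memory picture without a full heap dump. A minimal sketch, assuming async-profiler attached to a running JVM; depending on your version the output may be HTML rather than SVG, so treat the flags as a starting point:

./profiler.sh -d 30 -e alloc -f alloc.svg <PID>    # sample allocation sites for 30 seconds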
Real constraints and tradeoffs (what we learned)
- Permissions: perf and many eBPF tools need elevated privileges. In many Indian startups, developers can’t sudo on prod. Mitigation: run brief captures on staging or ask SRE for a narrow maintenance window. For same-user processes, tools like py-spy help.
- Overhead: sampling is light (a few percent typically), but on small instances (t2.micro / t3.small common in cost-conscious infra) even small overhead can matter. Always test on a staging clone or off-peak.
- Symbol clarity: stripped binaries or missing debug symbols give unreadable frames. For Java, native frames can dominate unless async-profiler is used; for C/C++ you may need debug symbols. Shipping debug symbols with the builds you deploy to staging is a good habit (a quick pre-capture check is sketched after this list).
- JIT and inlining: JITted frames can be confusing. Use language-aware profilers where possible (async-profiler, perftools) to get clearer results.
- Human coordination: a 30s capture during business hours may still affect customers. Getting quick SRE buy-in is part of the cost.
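On the symbols point, it's worth checking before a capture whether the binary is stripped and whether frame pointers are compiled in; if they aren't, perf can fall back to DWARF unwinding at the cost of larger samples. A minimal sketch; the binary path is a placeholder:

file /usr/local/bin/payments-api                                    # "not stripped" means frame names should resolve
sudo perf record -F 99 -p <PID> --call-graph dwarf -- sleep 30      # DWARF unwinding when frame pointers are omitted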
How I used the flamegraph
In our case the flamegraph quickly showed a large block inside a JSON marshaller path — not the DB or network as we’d suspected. A recent change had enabled a legacy logger to marshal entire payloads on every request. We reverted the logger change and the CPU normalized. The flamegraph saved us from an expensive horizontal scale-up and many noisy hypotheses.
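If you want a number to go with the picture, the folded file makes it easy to ask what fraction of samples touch a given frame. A rough sketch; the frame name here is hypothetical (ours was an internal marshaller class):

# each folded line looks like "main;handle;serialize 123" where the last field is the sample count
total=$(awk '{s+=$NF} END {print s}' out.folded)
hot=$(grep -i 'JsonMarshaller' out.folded | awk '{s+=$NF} END {print s}')
awk -v h="$hot" -v t="$total" 'BEGIN {printf "%.1f%% of samples hit that frame\n", 100*h/t}'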
Communicating results
One underrated advantage: flamegraphs are persuasion-ready. Instead of “I think the marshaller is slow”, you can paste an SVG into the incident chat and show a colleague exactly which call path dominates CPU. That helped us get quick approval for the rollback.
A small checklist before you run one in production
- Confirm a maintenance window or off-peak test.
- Prefer staging if the bug is reproducible there.
- Use short durations (10–60s) first; increase only if you need better signal.
- Capture distribution: run a couple of samples at different times.
- Save raw folded stacks; they’re useful for diffs later (see the sketch after this list).
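The FlameGraph repo ships a helper for exactly that diff use case. A minimal sketch, assuming you kept folded files from before and after a change; if I remember the color convention right, red frames grew and blue frames shrank:

./FlameGraph/difffolded.pl before.folded after.folded | ./FlameGraph/flamegraph.pl > diff.svg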
Takeaway (my position)
Flamegraphs aren’t a silver bullet, but they ought to be your default first instrument for mysterious CPU or latency hotspots. They’re cheap to run, produce a single actionable artifact, and force you to look at where time is actually spent. The downside is coordination and a tiny runtime cost — but compared to hours of blind guessing or a needless scale-up, they pay for themselves fast.
If you haven’t used them in a production incident, try this tonight on a staging replica: clone the FlameGraph repo, run a 30s profile, and open the SVG. The first time you see the actual hotspot laid out visually, you’ll understand why I keep one in my incident toolkit.