Cut Your Cloud Bill, Not Your Signals: Practical Log Retention for Small Indian Teams
A no-fluff playbook for small teams to cut cloud log storage costs with tiered retention, indexing tricks, and an India-friendly compliance lens.
Written by: Devika Iyer
We all joke about logs being the “single source of truth”—until the bill arrives. A few years ago our small product team in Bengaluru had a predictable panic: charts showing our logging costs climbing month after month. We were shipping features, not mountains of stored JSON. The surprise: most of what we kept was noise; it just felt safer to keep everything forever.
If you run a small Indian startup or support freelance clients who expect reliability without runaway cloud bills, this is a pragmatic playbook for sensible log retention. I’ll explain what saved us money, what broke, and the tradeoffs you should plan for.
Why log retention matters (and where teams mess up)
- Logs become cheap to generate and expensive to store and query. Services like Elasticsearch, CloudWatch Logs, or managed ELK can surprise you with index and query charges.
- Default habits—keep everything for 90 days, index everything, and retain raw logs indefinitely—compound costs.
- The key problem is not how many logs you produce, but how much of them are kept hot and searchable.
The simple principle: keep what’s useful hot, move the rest to cold (or delete), and index only the metadata you actually search.
A four-step, low-friction approach
- Classify logs by utility (do this first) Spend an afternoon and map your logs into three buckets:
- Hot (0–7 days): production errors, security alerts, on-call signals.
- Warm (7–30 days): aggregated application traces, recent request logs for debugging.
- Cold (30 days+): raw access logs, debug noiseful events, metrics-compatible archives.
This classification saved us the most money. For us, hot was 5% of volume, warm 25%, cold 70%. Your ratios will differ, but the exercise forces decisions.
- Stop indexing everything—index metadata only Indexing full text of every log is expensive. Index a small set of searchable fields: timestamp, request id, user id (if permitted), error code, service name. Store the full event blob in cheaper object storage (S3/GCS) and link it from the indexed record.
Tradeoff: you lose quick free-text search across the full payload. For most incidents we could rehydrate the full JSON from object storage when needed. That extra step is worth the monthly savings.
- Use lifecycle tiers aggressively Put raw logs in object storage with lifecycle rules:
- Move to S3 Standard → S3 Standard-IA after 30 days → Glacier Deep Archive after 90 days.
- Apply similar lifecycle in GCS or Azure Blob.
You only pay a tiny retrieval fee and slower access for cold archives, which is fine for audit or rare forensic needs. For us, moving cold logs to Glacier cut storage costs by ~70%. We did keep a small rolling window in the hot index for immediate debugging.
- Automate retention decisions and alert on anomalies Make retention part of CI/CD or logging pipeline config. Maintain a living doc that explains:
- Who can change retention
- How to request an exception (e.g., for a legal hold)
- How long different log types are retained
We added an alert that fires if daily logging volume rises 30% above baseline. That caught a silent loop that would have bled our budget.
India-specific constraints and compliance If you handle finance, payments, or GST-related workflows, remember some records may need longer retention. Under Indian tax regulations, businesses must retain certain financial records for up to 6 years. I’m not a lawyer—consult your compliance team. For many SaaS teams, 90 days of hot searchable logs plus cold archives satisfies both practical debugging and regulatory needs.
What broke (so you don’t learn the hard way)
- Overzealous deletion: We once mis-tagged a “security” stream and archived it too aggressively. Retrieving took time and a small but painful retrieval cost. Always have an exception process.
- Search latency during incident war rooms: Rehydrating blobs adds latency. We solved this by keeping 48 hours of raw payloads in hot storage for services that are high-risk.
- Cost of retrievals: Glacier retrievals can be cheap if infrequent, expensive if you suddenly need a big bulk restore. Test a restore process and budget for occasional restores.
Quick checklist to implement in a week
- Audit current logs and volumes (CloudWatch/ELK usage reports).
- Identify top 3 services producing most volume.
- Configure indexing to only keep essential searchable fields.
- Move raw logs to object storage and add lifecycle rules.
- Add monitoring alert for logging volume spikes.
- Document retention policy and exception flow.
Numbers that made sense for us After implementing this, our logging bill dropped by about 65%—roughly ₹18,000/month on a modest stack (CloudWatch + S3 + some Elasticsearch). Your numbers will vary, but even modest firms can find tens of thousands INR per month by removing low-value hot storage.
Final thought: it’s a policy problem, not a tooling problem There’s no silver-bullet product that fixes runaway logging costs if you don’t decide what’s important. The most valuable step is the conversation: agree on what deserves real-time attention, what can live cold, and who owns those choices. The tooling part—lifecycle rules, indexing decisions, object storage—are straightforward once policy exists.
If you want, I can sketch a minimal retention policy you can paste into your team handbook, or walk through how we restructured our CloudWatch + S3 pipeline step by step. Either way, be deliberate: keep the signals that matter, and stop paying for the noise.