Industry benchmark
Observability as a percentage of cloud spend
Industry research puts the 2026 median observability spend at 7 to 12 percent of total cloud infrastructure cost. Below 3 percent signals under-monitoring; above 20 percent typically signals overspend. The healthy range varies meaningfully by stack shape, and the FinOps Foundation framework treats observability as a first-class cost category worth deliberate management.
TL;DR
Industry median 7 to 12 percent of cloud infrastructure cost. Monoliths typically 4 to 7 percent; microservices 8 to 15 percent; complex multi-region 15 to 25 percent. Below 3 percent signals under-monitoring; above 20 percent typically signals overspend. Sources: CNCF, Gartner, FinOps Foundation, Flexera State of the Cloud.
The benchmark
What healthy observability spend looks like
Observability spend as a percentage of cloud infrastructure cost has emerged as the most useful single benchmark for cost-management decisions. The metric is straightforward (sum of observability vendor bills divided by sum of cloud infrastructure bills, both monthly), the variance across organisations is meaningful, and the interpretation is grounded in published industry research rather than vendor marketing.
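As a minimal sketch of the calculation described above, the ratio is one division over two monthly totals. The function name `observability_ratio`, the bill categories, and the dollar figures are hypothetical and chosen only for illustration.

```python
from typing import Mapping


def observability_ratio(observability_bills: Mapping[str, float],
                        cloud_bills: Mapping[str, float]) -> float:
    """Return observability spend as a percentage of cloud infrastructure spend.

    Both mappings hold bills for the same month, keyed by vendor or category.
    """
    cloud_total = sum(cloud_bills.values())
    if cloud_total <= 0:
        raise ValueError("cloud infrastructure total must be positive")
    return 100.0 * sum(observability_bills.values()) / cloud_total


# Hypothetical monthly figures in USD (not taken from any report).
observability = {"apm_vendor": 6_000.0, "log_vendor": 3_500.0, "cloud_native_monitoring": 500.0}
cloud = {"compute": 70_000.0, "storage": 20_000.0, "network": 10_000.0}

print(f"{observability_ratio(observability, cloud):.1f}%")  # prints 10.0%
```

Both totals should cover the same billing month, otherwise the ratio mixes periods and is not comparable to the published ranges.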
The Flexera State of the Cloud annual report has tracked observability-as-share-of-cloud since 2020, with the most recent report (2026) showing an industry median of 8 to 12 percent across surveyed enterprises. The CNCF annual survey publishes Kubernetes-specific observability patterns, with Kubernetes-heavy organisations running observability spend at 10 to 15 percent of total cloud cost on average. Gartner research on FinOps maturity, summarised in published press releases, shows that mature FinOps organisations track observability cost separately and typically run 7 to 10 percent of cloud spend on observability after structured optimisation programmes.
The variance by stack shape is meaningful. Monolithic applications running on traditional VMs typically have lower observability cost ratios (4 to 7 percent) because the monitoring footprint is straightforward (per-host metrics, application logs, basic APM). Microservices stacks have higher ratios (8 to 15 percent) because the per-service telemetry multiplies. Complex multi-region distributed systems can run 15 to 25 percent because the cross-region telemetry, distributed tracing across many services, and security observability across multiple environments all compound.
For organisations benchmarking against these ranges, the question is not just whether you fall inside the median range but whether your ratio is appropriate for your stack. A 15 percent ratio on a monolithic application is high and worth investigating; the same 15 percent on a Kubernetes-heavy multi-region distributed system is within the healthy range. The benchmark frames the conversation rather than dictating the answer.
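One way to encode the stack-specific reading is a small lookup over the ranges quoted above. This is a sketch rather than an official scoring method: the names `HEALTHY_RANGES` and `assess_ratio`, the stack labels, and the wording of the verdicts are all assumptions. The general 20 percent overspend signal is left out on purpose, because the multi-region range quoted above extends beyond it.

```python
# Ranges in percent of cloud infrastructure cost, as quoted in the text above.
HEALTHY_RANGES = {
    "monolith": (4.0, 7.0),
    "microservices": (8.0, 15.0),
    "multi_region": (15.0, 25.0),
}

# Below this ratio the text treats spend as a sign of under-monitoring.
UNDER_MONITORING_BELOW = 3.0


def assess_ratio(ratio_pct: float, stack_shape: str) -> str:
    """Compare a ratio against the typical range for the given stack shape."""
    low, high = HEALTHY_RANGES[stack_shape]
    if ratio_pct < UNDER_MONITORING_BELOW:
        return "possible under-monitoring"
    if ratio_pct < low:
        return "below typical range for this stack"
    if ratio_pct <= high:
        return "within typical range for this stack"
    return "above typical range for this stack, worth investigating"


# The same 15 percent reads differently depending on stack shape.
print(assess_ratio(15.0, "monolith"))
print(assess_ratio(15.0, "multi_region"))
```

The verdict is a prompt for investigation, not a pass or fail result.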
The under-monitoring risk
What 3 percent or less actually means
Observability spend below 3 percent of cloud infrastructure cost typically signals under-monitoring rather than efficient observability. The implicit costs are real but harder to measure than the visible savings: longer mean time to detection for incidents, longer root-cause investigations, capacity planning from lagging billing data rather than leading utilisation metrics, and security incidents that go undetected until a customer or regulator reports them.
The most common under-monitoring symptom is incidents being detected by customer reports rather than alerts. The team learns about outages from social media, support tickets, or status page submissions rather than from monitoring. The mean time to detection in this regime is typically 30 to 120 minutes, compared to 1 to 5 minutes for properly instrumented production systems. The business cost of the additional detection lag is usually substantial, particularly for revenue-critical workloads.
The second under-monitoring symptom is slow root-cause investigation. Without distributed tracing, transaction-level metrics, or comprehensive logs, on-call engineers have to reconstruct what happened from incomplete data. Investigation time stretches from minutes to hours; complex incidents that should resolve in a working day stretch across multiple days. The engineering productivity impact is real but rarely measured directly.
The third under-monitoring symptom is capacity planning by billing data. Teams that do not have leading utilisation metrics are forced to react to capacity issues rather than anticipate them. Auto-scaling thresholds are set conservatively to avoid surprises, which inflates infrastructure cost. Capacity expansions happen reactively, often during incident windows when the team is also dealing with other issues. The infrastructure cost of capacity overhead under reactive planning typically exceeds the observability spend that would prevent it.
The overspend signal
What 20 percent or more typically means
Uncontrolled log volume
Custom metric explosion
Premium vendor pricing without ROI
The FinOps approach
Treating observability as a FinOps category
The FinOps Foundation framework treats observability as a first-class cost category alongside compute, storage, network, and licence. The standard FinOps practices apply: showback to consuming teams, chargeback when organisationally appropriate, quarterly per-team budget targets, and structured optimisation programmes when teams exceed targets.
The first FinOps practice is showback. Aggregate observability cost by application team or product line, publish a quarterly dashboard showing per-team consumption, and create transparency about who drives the spend. Most organisations discover that 5 to 15 percent of teams produce 60 to 80 percent of observability cost, mirroring the typical pattern in any FinOps category. Once visibility exists, the conversation about cost management becomes structured rather than ad hoc.
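A showback report can start as a simple aggregation of tagged cost line items. The sketch below assumes each line item is already attributed to a team, which in practice depends on tagging discipline. The function names `showback` and `top_share`, the team names, and the figures are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def showback(line_items: Iterable[tuple[str, float]]) -> list[tuple[str, float]]:
    """Sum observability cost per team and rank teams from highest to lowest."""
    totals: dict[str, float] = defaultdict(float)
    for team, cost in line_items:
        totals[team] += cost
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def top_share(ranked: list[tuple[str, float]], top_fraction: float) -> float:
    """Return the share of total cost produced by the top fraction of teams."""
    total = sum(cost for _, cost in ranked)
    if total <= 0:
        raise ValueError("total cost must be positive")
    top_n = max(1, round(len(ranked) * top_fraction))
    return sum(cost for _, cost in ranked[:top_n]) / total


# Hypothetical quarterly line items in USD: (team, cost).
items = [
    ("payments", 45_000.0), ("payments", 15_000.0),
    ("search", 15_000.0), ("checkout", 10_000.0),
    ("auth", 10_000.0), ("billing", 5_000.0),
]

ranked = showback(items)
for team, cost in ranked:
    print(f"{team:<10} {cost:>10,.0f}")
print(f"top 20% of teams: {top_share(ranked, 0.20):.0%} of spend")
```

The concentration figure is what usually opens the conversation: it shows whether a handful of teams account for most of the bill.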
The second FinOps practice is per-team budget targets. Set quarterly budget targets per team based on team size and workload complexity. Teams that come in under target are recognised; teams that exceed target by more than 20 percent face structured optimisation engagement (cardinality audit, log-filtering implementation, vendor renegotiation if at scale). The budget targets create the economic incentive for application teams to manage their own observability cost rather than treating it as someone else's problem.
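The budget rule above reduces to a comparison per team against its target, with a 20 percent tolerance before optimisation engagement starts. This is an illustrative sketch: the function name `review_budgets`, the status labels, and the figures are assumptions, and how targets are derived from team size and workload complexity is left out.

```python
from typing import Mapping


def review_budgets(actuals: Mapping[str, float],
                   targets: Mapping[str, float],
                   tolerance: float = 0.20) -> dict[str, str]:
    """Classify each team's quarterly observability spend against its target."""
    statuses: dict[str, str] = {}
    for team, target in targets.items():
        actual = actuals.get(team, 0.0)
        if actual <= target:
            statuses[team] = "at or under target"
        elif actual <= target * (1.0 + tolerance):
            statuses[team] = "over target, within tolerance"
        else:
            # More than 20 percent over: cardinality audit, log filtering, etc.
            statuses[team] = "optimisation engagement"
    return statuses


# Hypothetical quarterly figures in USD.
targets = {"payments": 10_000.0, "search": 10_000.0, "checkout": 10_000.0}
actuals = {"payments": 13_000.0, "search": 11_000.0, "checkout": 9_000.0}

for team, status in review_budgets(actuals, targets).items():
    print(f"{team}: {status}")
```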
The third FinOps practice is structured optimisation programmes. When a team exceeds budget, the response is engineering work (audit, filter, sample, drop labels, configure indexing exclusions) rather than approval of a larger budget. The programmes typically run 1 to 3 quarters and recover 40 to 70 percent of the over-budget spend through structural changes rather than vendor negotiation alone. The combined result is that mature FinOps organisations run observability at 7 to 10 percent of cloud spend rather than the 12 to 18 percent that under-managed organisations report.
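To set expectations before a programme starts, the 40 to 70 percent recovery figure can be applied to a team's over-budget amount. This is a rough planning sketch, not a forecast. The function name `recovery_range` and the example figures are hypothetical.

```python
def recovery_range(actual: float, target: float,
                   low: float = 0.40, high: float = 0.70) -> tuple[float, float]:
    """Estimate the spend recovered by an optimisation programme.

    Applies the quoted recovery fractions to the over-budget amount only;
    returns (low estimate, high estimate).
    """
    overage = max(0.0, actual - target)
    return overage * low, overage * high


# Hypothetical team: 15,000 USD actual against a 10,000 USD quarterly target.
low_estimate, high_estimate = recovery_range(15_000.0, 10_000.0)
print(f"expected recovery: {low_estimate:,.0f} to {high_estimate:,.0f} USD")
```

A team at or under target has no overage, so the estimate is zero in both directions.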
Cost reduction levers
Three things to do if your ratio is too high
Source-side log filtering
Custom metric cardinality audit
APM trace sampling
Verify your ratio first