Industry benchmark

Observability as a percentage of cloud spend

Verified April 2026

Industry research puts the 2026 median observability spend at 7 to 12 percent of total cloud infrastructure cost. Below 3 percent signals under-monitoring; above 20 percent typically signals overspend. The healthy range varies meaningfully by stack shape, and the FinOps Foundation framework treats observability as a first-class cost category worth deliberate management.

TL;DR

Industry median 7 to 12 percent of cloud infrastructure cost. Monoliths typically 4 to 7 percent; microservices 8 to 15 percent; complex multi-region 15 to 25 percent. Below 3 percent signals under-monitoring; above 20 percent typically signals overspend. Sources: CNCF, Gartner, FinOps Foundation, Flexera State of the Cloud.

The benchmark

What healthy observability spend looks like

Observability spend as a percentage of cloud infrastructure cost has emerged as the most useful single benchmark for cost-management decisions. The metric is simple to compute (the sum of monthly observability vendor bills divided by the sum of monthly cloud infrastructure bills), it varies enough across organisations to be informative, and its interpretation rests on published industry research rather than vendor marketing.

The Flexera State of the Cloud annual report has tracked observability-as-share-of-cloud since 2020, with the most recent report (2026) showing an industry median of 8 to 12 percent across surveyed enterprises. The CNCF annual survey publishes Kubernetes-specific observability patterns, with Kubernetes-heavy organisations running observability spend at 10 to 15 percent of total cloud cost on average. Gartner research on FinOps maturity, summarised in published press releases, shows that mature FinOps organisations track observability cost separately and typically run 7 to 10 percent of cloud spend on observability after structured optimisation programmes.

Stack shape drives much of that variance. Monolithic applications running on traditional VMs typically have lower observability cost ratios (4 to 7 percent) because the monitoring footprint is simple: per-host metrics, application logs, and basic APM. Microservices stacks run higher (8 to 15 percent) because every additional service brings its own metrics, logs, and traces. Complex multi-region distributed systems can run 15 to 25 percent because cross-region telemetry, distributed tracing across many services, and security observability across multiple environments compound one another.

For organisations benchmarking against these ranges, the question is not just whether you fall inside the median but whether your ratio is appropriate for your stack. A 15 percent ratio on a monolithic application is high and worth investigating; the same 15 percent on a Kubernetes-heavy multi-region distributed system is within the healthy range. The benchmark frames the conversation rather than dictating the answer.
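The stack-shape reasoning above can be expressed as a small lookup. A minimal Python sketch, assuming the ranges quoted on this page and stack labels (`monolith`, `microservices`, `multi_region`) invented for illustration; it is a screening aid, not a verdict.

```python
# Healthy ranges by stack shape from this page, in percent of monthly cloud
# infrastructure cost. The dictionary keys are labels invented for this sketch.
STACK_RANGES = {
    "monolith": (4.0, 7.0),
    "microservices": (8.0, 15.0),
    "multi_region": (15.0, 25.0),
}

# Below this, the page treats the ratio as a likely under-monitoring signal.
UNDER_MONITORING_BELOW = 3.0


def assess_ratio(ratio_pct: float, stack: str) -> str:
    """Return a rough verdict for an observability-to-cloud ratio on a given stack."""
    low, high = STACK_RANGES[stack]
    if ratio_pct < UNDER_MONITORING_BELOW:
        return "likely under-monitored"
    if ratio_pct > high:
        return "above range for this stack: investigate"
    if ratio_pct < low:
        return "below range for this stack: check coverage"
    return "within range for this stack"


# The 15 percent example from the text, on two different stacks.
print(assess_ratio(15.0, "monolith"))      # above range for this stack: investigate
print(assess_ratio(15.0, "multi_region"))  # within range for this stack
```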

The under-monitoring risk

What 3 percent or less actually means

Observability spend below 3 percent of cloud infrastructure cost typically signals under-monitoring rather than efficient observability. The implicit costs are real but harder to measure than the visible savings: longer mean-time-to-detection of incidents, longer root-cause investigations, capacity planning by lagging billing data rather than leading utilisation metrics, and security incidents that go undetected until a customer or regulator reports them.

The most common under-monitoring symptom is incidents being detected by customer reports rather than alerts. The team learns about outages from social media, support tickets, or status page submissions rather than from monitoring. The mean-time-to-detection in this regime is typically 30 to 120 minutes, compared to 1 to 5 minutes for properly instrumented production systems. The business cost of the additional detection lag is usually substantial, particularly for revenue-critical workloads.
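One way to check for this symptom is to measure it from incident history. A minimal sketch, assuming a hypothetical `Incident` record with impact-start and detection times and a `detected_by` field; the three incidents are made up.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record shape: map these fields from your incident tracker export.
    start_min: float     # when customer impact began, in minutes
    detected_min: float  # when the team first learned of it, in minutes
    detected_by: str     # "alert" or "customer"


def detection_summary(incidents: list[Incident]) -> tuple[float, float]:
    """Return (share of incidents first reported by customers, mean time to detection)."""
    if not incidents:
        raise ValueError("need at least one incident")
    by_customer = sum(1 for i in incidents if i.detected_by == "customer")
    mttd = mean(i.detected_min - i.start_min for i in incidents)
    return by_customer / len(incidents), mttd


# Made-up quarter: one alert-detected incident, two customer-reported ones.
history = [
    Incident(0.0, 3.0, "alert"),
    Incident(10.0, 70.0, "customer"),
    Incident(100.0, 190.0, "customer"),
]
share, mttd = detection_summary(history)
print(f"{share:.0%} customer-detected, MTTD {mttd:.0f} min")  # 67% customer-detected, MTTD 51 min
```

A high customer-detected share combined with an MTTD in the tens of minutes matches the under-monitoring pattern described above.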

The second under-monitoring symptom is slow root-cause investigation. Without distributed tracing, transaction-level metrics, or comprehensive logs, on-call engineers have to reconstruct what happened from incomplete data. Investigation time stretches from minutes to hours; complex incidents that should resolve in a working day stretch across multiple days. The engineering productivity impact is real but rarely measured directly.

The third under-monitoring symptom is capacity planning by billing data. Teams that do not have leading utilisation metrics are forced to react to capacity issues rather than anticipate them. Auto-scaling thresholds are set conservatively to avoid surprises, which inflates infrastructure cost. Capacity expansions happen reactively, often during incident windows when the team is also dealing with other issues. The infrastructure cost of capacity overhead under reactive planning typically exceeds the observability spend that would prevent it.

The overspend signal

What 20 percent or more typically means

Uncontrolled log volume

The most common cause. A team that has not implemented source-side log filtering routinely produces 10x to 50x the log volume needed for operational purposes. The remedy is the source-side audit described in the log-management cost pages.

Custom metric explosion

Kubernetes-heavy stacks without cardinality discipline produce 10x to 100x the metric series operationally needed. On vendors that bill per active series (Grafana Cloud), the bill grows directly with series count. On per-host vendors (Datadog), the excess shows up as a custom metric overage line item.
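The compounding follows from how series are counted: each unique combination of label values is a separate series, so the worst case for one metric is the product of every label's distinct values. A sketch with hypothetical label counts; real series counts are the combinations actually observed, which are usually lower than this bound.

```python
from math import prod


def max_series(label_cardinalities: dict[str, int]) -> int:
    """Worst-case series count for one metric: the product of each label's distinct values."""
    return prod(label_cardinalities.values())


# Hypothetical request-count metric with three bounded labels.
base = {"service": 40, "endpoint": 25, "status_code": 6}
print(max_series(base))                         # 6000
# Adding one unbounded label, such as a user ID, multiplies the bound.
print(max_series({**base, "user_id": 10_000}))  # 60000000
```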

Premium vendor pricing without ROI

Some workloads land on Datadog or Splunk for non-cost reasons (existing dashboards, team familiarity, vendor relationships) without honestly evaluating whether the premium pricing is justified for the actual operational requirements. The remedy is the vendor-comparison exercise.

The FinOps approach

Treating observability as a FinOps category

The FinOps Foundation framework treats observability as a first-class cost category alongside compute, storage, network, and licence. The standard FinOps practices apply: showback to consuming teams, chargeback when organisationally appropriate, quarterly per-team budget targets, and structured optimisation programmes when teams exceed targets.

The first FinOps practice is showback. Aggregate observability cost by application team or product line, publish a quarterly dashboard showing per-team consumption, and create transparency about who drives the spend. Most organisations discover that 5 to 15 percent of teams produce 60 to 80 percent of observability cost, mirroring the typical pattern in any FinOps category. Once visibility exists, conversations about cost management become structured rather than ad hoc.
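A showback report is, at its core, a group-by over tagged cost line items. A minimal sketch, assuming every observability line item has already been attributed to a team (the team names and amounts are hypothetical), that ranks teams and finds the smallest set driving a given share of spend.

```python
from collections import defaultdict


def showback(line_items: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Aggregate observability cost per team, largest consumer first."""
    totals: dict[str, float] = defaultdict(float)
    for team, cost in line_items:
        totals[team] += cost
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def teams_driving_share(ranked: list[tuple[str, float]], share: float = 0.6) -> list[str]:
    """Smallest set of top-ranked teams whose combined cost reaches `share` of the total."""
    total = sum(cost for _, cost in ranked)
    running, top = 0.0, []
    for team, cost in ranked:
        top.append(team)
        running += cost
        if running >= share * total:
            break
    return top


# Hypothetical monthly line items, already tagged by owning team.
line_items = [
    ("checkout", 9000), ("search", 6000), ("checkout", 3000),
    ("billing", 1200), ("auth", 800), ("admin", 600), ("docs", 400),
]
ranked = showback(line_items)
print(ranked[0])                    # ('checkout', 12000.0)
print(teams_driving_share(ranked))  # ['checkout', 'search']
```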

The second FinOps practice is per-team budget targets. Set quarterly budget targets per team based on team size and workload complexity. Teams that come in under target are recognised; teams that exceed target by more than 20 percent face structured optimisation engagement (cardinality audit, log-filtering implementation, vendor renegotiation if at scale). The budget targets create the economic incentive for application teams to manage their own observability cost rather than treating it as someone else's problem.
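The over-target rule can be sketched as a simple comparison. This assumes hypothetical quarterly spend and target figures keyed by team, and treats "more than 20 percent" as strictly greater than a 0.20 overage.

```python
def over_budget_teams(spend: dict[str, float], targets: dict[str, float],
                      tolerance: float = 0.20) -> dict[str, float]:
    """Map each team whose spend exceeds its target by more than `tolerance`
    to its overage fraction. Every team is assumed to have a target."""
    flagged = {}
    for team, actual in spend.items():
        target = targets[team]
        overage = (actual - target) / target
        if overage > tolerance:
            flagged[team] = overage
    return flagged


# Hypothetical quarter.
spend = {"checkout": 36000, "search": 22000, "billing": 9000}
targets = {"checkout": 28000, "search": 20000, "billing": 10000}
print(over_budget_teams(spend, targets))  # only checkout, at about 29 percent over
```

Teams that are flagged move into the structured optimisation engagement; the rest stay on the normal quarterly review.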

The third FinOps practice is structured optimisation programmes. When a team exceeds budget, the response is engineering work (audit, filter, sample, drop labels, configure indexing exclusions) rather than budget approval increases. The programmes typically run 1 to 3 quarters and recover 40 to 70 percent of the over-budget spend through structural changes rather than vendor negotiation alone. The combined result is that mature FinOps organisations run observability at 7 to 10 percent of cloud spend rather than the 12 to 18 percent that under-managed organisations report.

Cost reduction levers

Three things to do if your ratio is too high

Source-side log filtering

The single highest-impact lever. Drop DEBUG and INFO logs at the application or log shipper. Reduces log volume by 60 to 80 percent on most workloads, which is typically the largest single observability line item.
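At the application, Python's standard `logging` module can enforce this before records reach the handler that ships them. A minimal sketch; the `SourceSideLevelFilter` class and its `info_allowlist` exception are illustrative, and for the plain case `handler.setLevel(logging.WARNING)` does the same job. Log shippers typically offer equivalent filter stages.

```python
import io
import logging


class SourceSideLevelFilter(logging.Filter):
    """Drop DEBUG and INFO records before the shipping handler emits them.

    `info_allowlist` is a hypothetical escape hatch: logger names whose INFO
    records still carry operational value and should be kept.
    """

    def __init__(self, info_allowlist: frozenset[str] = frozenset()):
        super().__init__()
        self.info_allowlist = info_allowlist

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True
        return record.levelno >= logging.INFO and record.name in self.info_allowlist


# Demo: an in-memory stream stands in for the handler that ships logs to a vendor.
stream = io.StringIO()
shipper = logging.StreamHandler(stream)
shipper.addFilter(SourceSideLevelFilter(info_allowlist=frozenset({"audit"})))

for name in ("checkout", "audit"):
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # the application still emits every level
    logger.propagate = False        # keep the demo output to this one handler
    logger.addHandler(shipper)

logging.getLogger("checkout").debug("cache miss")
logging.getLogger("checkout").info("request served")
logging.getLogger("checkout").warning("slow upstream")
logging.getLogger("audit").info("role changed")
print(stream.getvalue(), end="")  # prints "slow upstream" then "role changed"
```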

Custom metric cardinality audit

Quarterly audit of custom metric series count. Drop the highest-cardinality labels at the agent. Recovers 30 to 70 percent of custom metric cost on Datadog and a similar share of active series cost on Grafana Cloud.
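The audit itself reduces to counting distinct values per label and seeing how many series remain once the worst labels are removed. A sketch over a hypothetical in-memory list of active series; in a real pipeline the agent or a recording rule must also aggregate (for example, sum) the series that merge once a label is gone.

```python
from collections import defaultdict


def label_cardinality(series: list[dict[str, str]]) -> dict[str, int]:
    """Distinct values observed per label across the active series of one metric."""
    values: dict[str, set[str]] = defaultdict(set)
    for labels in series:
        for key, value in labels.items():
            values[key].add(value)
    return {key: len(vals) for key, vals in values.items()}


def series_after_dropping(series: list[dict[str, str]], drop: set[str]) -> int:
    """Series count once the `drop` labels are removed and identical label sets merge."""
    return len({
        tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        for labels in series
    })


# Hypothetical metric: 3 endpoints x 2 status codes x 10 pods = 60 series.
active = [
    {"endpoint": e, "status": s, "pod": f"pod-{p}"}
    for e in ("/cart", "/pay", "/search")
    for s in ("200", "500")
    for p in range(10)
]
card = label_cardinality(active)
top = sorted(card, key=card.get, reverse=True)[:1]  # the single worst label
before, after = len(active), series_after_dropping(active, set(top))
print(card, top, before, after)  # {'endpoint': 3, 'status': 2, 'pod': 10} ['pod'] 60 6
```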

APM trace sampling

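Sample APM traces at 5 to 10 percent rather than 100 percent. Preserves error-rate accuracy and percentile latency. Cuts trace volume cost by 90 to 95 percent. Use tail-based sampling at the OpenTelemetry Collector for the best operational outcome: the keep-or-drop decision is made after the whole trace has been seen, so error and slow traces can always be retained.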
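The decision logic behind tail-based sampling can be sketched in a few lines. This is an illustration in Python, not the OpenTelemetry Collector's `tail_sampling` processor (which expresses similar rules as configured policies); the 10 percent rate and 1,000 ms slow-trace threshold are assumptions. Because error traces are always kept, error rates computed from the retained traces are inflated unless healthy traces are weighted by 1/rate, so many teams read error rates from unsampled metrics instead.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Trace:
    trace_id: str
    has_error: bool
    duration_ms: float


def keep_trace(trace: Trace, rate: float = 0.10, slow_ms: float = 1000.0) -> bool:
    """Tail-based decision, made once the complete trace has been assembled."""
    if trace.has_error or trace.duration_ms >= slow_ms:
        return True  # errors and slow traces are always retained
    # Hash the trace ID so every collector replica reaches the same decision.
    digest = hashlib.sha256(trace.trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")  # effectively uniform in [0, 2**64)
    return bucket < rate * 2**64


# Synthetic healthy traces: about 10 percent should survive.
healthy = [Trace(f"{i:032x}", False, 20.0) for i in range(10_000)]
kept = sum(keep_trace(t) for t in healthy)
print(kept, "of", len(healthy), "healthy traces kept")
print(keep_trace(Trace("ab12", True, 15.0)))  # True: error traces are never dropped
```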

Verify your ratio first

Before spending engineering time on observability cost reduction, calculate your actual ratio. Sum your monthly observability vendor bills (Datadog plus New Relic plus Grafana Cloud plus Splunk plus CloudWatch as applicable). Divide by your monthly total cloud infrastructure cost (compute plus storage plus network). Compare against the benchmarks above. If you are inside the healthy range, optimisation may not be the highest-ROI engineering investment.
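The calculation is simple enough to script against exported bills. A minimal sketch with hypothetical monthly figures; which line items count as observability and which as infrastructure is your own classification, and the thresholds are the headline ones from this page.

```python
def observability_ratio(obs_bills: dict[str, float], infra_bills: dict[str, float]) -> float:
    """Monthly observability spend as a percentage of monthly cloud infrastructure spend."""
    infra = sum(infra_bills.values())
    if infra <= 0:
        raise ValueError("cloud infrastructure cost must be positive")
    return 100.0 * sum(obs_bills.values()) / infra


def position(ratio_pct: float) -> str:
    """Place a ratio against the headline thresholds on this page."""
    if ratio_pct < 3.0:
        return "under 3%: likely under-monitoring"
    if ratio_pct > 20.0:
        return "over 20%: likely overspend unless the stack is complex multi-region"
    if 7.0 <= ratio_pct <= 12.0:
        return "inside the 7-12% industry median"
    return "outside the median band: compare with your stack's range"


# Hypothetical month.
obs = {"Datadog": 18000, "CloudWatch": 4000, "Grafana Cloud": 3000}
infra = {"compute": 160000, "storage": 25000, "network": 15000}
ratio = observability_ratio(obs, infra)
print(ratio, position(ratio))  # 12.5 outside the median band: compare with your stack's range
```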

Cross-references

/benchmarks

Cost benchmarks by team size

/reduce-monitoring-costs

Twelve cost-reduction strategies

/hidden-costs

Hidden costs that never appear on a pricing page

/aws-monitoring-cost

AWS monitoring cost: native vs third-party

/log-management-cost-1tb

Log management cost for 1 TB/day

/calculator

Multi-vendor cost calculator

/comparison

Six-vendor comparison

/kubernetes-monitoring

Kubernetes monitoring cost mechanics

/methodology

How we research pricing

Frequently asked

What percentage of cloud spend should go to monitoring?

Industry research from CNCF, Gartner, FinOps Foundation, and Flexera puts the median observability spend at 7 to 12 percent of total cloud infrastructure cost in 2026. Healthy ranges vary by stack shape: monolithic applications typically run 4 to 7 percent, microservices and Kubernetes-heavy stacks run 8 to 15 percent, and complex multi-region distributed systems can run 15 to 25 percent. Below 3 percent typically signals under-monitoring (gaps that produce slower incident response and higher business impact). Above 20 percent often signals overspend (uncontrolled log volume, custom metric explosion, or premium vendor pricing without justification).

Why is observability cost growing as a percentage of cloud spend?

Three structural reasons. First, microservices and Kubernetes adoption multiplies the data sources to monitor; a single application that was one VM in 2018 might be 50 pods in 2026, producing 50x the telemetry volume for an equivalent business workload. Second, observability vendor pricing models that bill per-host, per-metric, or per-event compound aggressively as cardinality grows; the same workload pays more on the same vendor over time. Third, regulatory and compliance pressure has expanded the scope of monitoring (security telemetry, audit logs, compliance evidence) without proportional increases in operational ROI. Across the industry, observability is taking a growing share of cloud budget despite ongoing optimisation effort.

What is the FinOps Foundation guidance on observability cost?

The FinOps Foundation framework treats observability as a first-class FinOps category with the same showback, chargeback, and optimisation rigour applied to compute and storage. The standard guidance is to allocate observability cost to the application teams that drive the consumption (rather than absorbing centrally), to publish per-team cost dashboards quarterly, and to set per-team observability budget targets. Teams that exceed targets face cardinality-reduction or sampling-discipline projects rather than simple budget approval increases. The framework treats observability cost as a shared engineering responsibility, not an infrastructure team line item.

How do I benchmark my observability spend?

Three benchmarks to anchor against. First, the Flexera State of the Cloud annual report publishes industry-wide observability spend as a percentage of cloud spend (typically 8 to 12 percent in recent reports). Second, the CNCF annual survey publishes Kubernetes-specific observability spending patterns. Third, the FinOps Foundation publishes maturity benchmarks by company size and cloud spend tier. Compare your observability-as-percent-of-cloud against these benchmarks; if you are above the 75th percentile, the cost-management opportunity is real and worth investing 1 to 3 quarters of platform engineering attention in. If you are below the 25th percentile, the under-monitoring risk is real and worth investigating before claiming victory.
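Percentile placement needs a distribution of peer ratios, not just a published median. A sketch, assuming you have such a list (for example, ratios from business units inside your own organisation); the peer values below are made up.

```python
from bisect import bisect_left


def percentile_rank(value: float, peers: list[float]) -> float:
    """Percentage of peer ratios strictly below `value`."""
    ordered = sorted(peers)
    return 100.0 * bisect_left(ordered, value) / len(ordered)


def verdict(rank: float) -> str:
    """Apply the 25th/75th percentile rule from the answer above."""
    if rank > 75.0:
        return "above the 75th percentile: cost-management opportunity"
    if rank < 25.0:
        return "below the 25th percentile: check for under-monitoring"
    return "inside the interquartile range"


# Made-up peer ratios, in percent of cloud infrastructure cost.
peers = [4.5, 6.0, 7.5, 8.0, 9.0, 10.0, 11.0, 12.0, 14.0, 18.0]
print(percentile_rank(15.0, peers), verdict(percentile_rank(15.0, peers)))
print(percentile_rank(5.0, peers), verdict(percentile_rank(5.0, peers)))
```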

Is 15 percent of cloud spend on observability too high?

It depends on the stack. For Kubernetes-heavy microservices stacks with comprehensive APM, distributed tracing, and serious log volume, 15 percent is within the healthy range. For monolithic applications on traditional VMs, 15 percent is high enough to investigate. The most useful question is not the absolute percentage but whether the observability spend is delivering operational value: are incidents being detected faster, are root-cause investigations resolving more quickly, are application teams using the data for capacity planning and performance optimisation? Spending 15 percent on observability that nobody uses is overspend; spending 15 percent on observability that drives meaningful operational outcomes is a healthy investment.

What does under-monitoring at 3 percent look like?

Three common symptoms. First, incidents are detected by customer reports rather than by alerts (the team learns about outages from social media or support tickets, not from monitoring). Second, root-cause investigations take 4 to 24 hours rather than minutes because the necessary data was not captured. Third, capacity planning relies on lagging billing data rather than leading utilisation metrics. Below 3 percent observability spend, one or more of these symptoms is typically present, and the implicit cost (longer outages, slower investigations, capacity surprises) usually exceeds the savings from under-investing in observability.