Analyze mean time to repair (MTTR) and related incident metrics, including P95 repair times. Part of the DevTools Surf developer suite.
Use Cases
Analyze incident history to calculate MTTR, MTTF (Mean Time to Failure), and MTTD (Mean Time to Detect) from raw data.
Identify trends in incident recovery time by team, service, or incident type.
Calculate SLA compliance as the percentage of incidents whose repair time falls within agreed targets.
Generate an incident metrics report for post-mortem or quarterly engineering review.
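The per-service metrics in these use cases can be sketched in a few lines of Python. This is a minimal sketch, not the analyzer's implementation. The incident keys (service, severity, started, detected, resolved), the hour-based timestamps, and the SLA_TARGET_HOURS table are all hypothetical. The sketch groups incidents by service and computes MTTD, MTTR (measured from incident start, so detection time is included), and SLA compliance as the share of incidents resolved within their severity's target.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical SLA targets: maximum hours from incident start to resolution.
SLA_TARGET_HOURS = {"P1": 4, "P2": 24}


def service_metrics(incidents):
    """Return per-service MTTD, MTTR, and SLA compliance (%).

    Each incident is a dict with hypothetical keys: service, severity,
    started, detected, resolved (timestamps expressed in hours).
    """
    by_service = defaultdict(list)
    for inc in incidents:
        by_service[inc["service"]].append(inc)

    report = {}
    for service, items in by_service.items():
        # True when the incident was resolved within its severity's target.
        within_sla = [
            inc["resolved"] - inc["started"] <= SLA_TARGET_HOURS[inc["severity"]]
            for inc in items
        ]
        report[service] = {
            "mttd": mean(inc["detected"] - inc["started"] for inc in items),
            # Measured from incident start, so detection time is included.
            "mttr": mean(inc["resolved"] - inc["started"] for inc in items),
            "sla_pct": 100 * sum(within_sla) / len(within_sla),
        }
    return report


incidents = [
    {"service": "api", "severity": "P1", "started": 0, "detected": 1, "resolved": 3},
    {"service": "api", "severity": "P1", "started": 10, "detected": 10.5, "resolved": 16},
    {"service": "billing", "severity": "P2", "started": 5, "detected": 7, "resolved": 20},
]
print(service_metrics(incidents))
```

Here the second api incident takes 6 hours against a 4-hour P1 target, so the api service lands at 50% SLA compliance even though its MTTR (4.5 hours) is only slightly above target.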
Tips
Track MTTR by incident severity separately — P0/P1 MTTR and P3/P4 MTTR behave differently and combining them masks actionable trends.
Include detection time in MTTR — detection latency is often 50-70% of total MTTR and is frequently overlooked in incident reviews.
Calculate the median (P50) and P95 repair times alongside the mean: a single 72-hour incident can double the mean while the median stays healthy, and P95 shows how long the slowest incidents take.
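A minimal sketch of the severity split and the mean/median/P95 comparison, assuming each incident has been reduced to a hypothetical (severity, repair_hours) pair. The percentile helper interpolates linearly between sorted values, the same rule as NumPy's default; other definitions (for example nearest-rank) give somewhat different P95 values on small samples.

```python
from collections import defaultdict
from statistics import mean, median


def percentile(values, p):
    """Linear-interpolation percentile for p in 0..100 (NumPy's default rule)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def repair_stats(incidents):
    """Group (severity, repair_hours) pairs by severity and summarize each group."""
    groups = defaultdict(list)
    for severity, hours in incidents:
        groups[severity].append(hours)
    return {
        sev: {"mean": mean(h), "p50": median(h), "p95": percentile(h, 95)}
        for sev, h in groups.items()
    }


p1_hours = [2, 3, 4, 5, 6, 6, 7, 8, 9, 10]  # mean 6 hours on their own
incidents = [("P1", h) for h in p1_hours + [72]] + [("P3", h) for h in [24, 30, 48]]
stats = repair_stats(incidents)
print(stats["P1"])  # {'mean': 12, 'p50': 6, 'p95': 41.0}
```

The ten original P1 incidents average 6 hours; the single 72-hour outlier doubles the mean to 12 while the median stays at 6, and P95 jumps to 41 hours. Keeping P3 in its own group stops its day-scale repairs from inflating the P1 numbers.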
Fun Facts
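MTTR (Mean Time to Repair or Recover) originated as a maintainability metric in the military and aerospace reliability engineering that took shape in the 1950s, where maintainability prediction methods were standardized in MIL-HDBK-472 alongside the failure-rate prediction methods of MIL-HDBK-217.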
DORA's State of DevOps research has found that elite-performing engineering teams restore service after production incidents in less than one hour, while low performers can take anywhere from a week to more than six months: a gap of two or more orders of magnitude in the same metric.
MTTR (reported by DORA as "time to restore service") is one of the four DORA metrics, along with deployment frequency, lead time for changes, and change failure rate. Together they measure software delivery performance, which DORA's research has found to correlate with organizational performance and business outcomes.
FAQ
What's the difference between MTTR and MTTF?
MTTR (repair) is the average time to restore a system after a failure. MTTF (failure) is the average time a system runs before it fails. MTBF (between failures) = MTTF + MTTR. Use MTTF for reliability, MTTR for operability.
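The identity MTBF = MTTF + MTTR can be checked on a simple timeline. This sketch assumes each failure cycle is one up period (from the previous restore, or from the start of observation, until the failure) followed by one repair. The function name reliability_metrics and the (failed_at, restored_at) pair format are hypothetical, with times in hours.

```python
from statistics import mean


def reliability_metrics(observed_from, incidents):
    """Return (MTTF, MTTR, MTBF) in the same unit as the inputs.

    incidents: chronologically ordered (failed_at, restored_at) pairs.
    Each cycle is one up period (previous restore -> failure) followed
    by one repair (failure -> restore).
    """
    uptimes, repairs = [], []
    up_since = observed_from
    for failed_at, restored_at in incidents:
        uptimes.append(failed_at - up_since)      # time to failure
        repairs.append(restored_at - failed_at)   # time to repair
        up_since = restored_at
    mttf = mean(uptimes)
    mttr = mean(repairs)
    # Each cycle spans restore to restore, so MTBF = MTTF + MTTR by construction.
    mtbf = mean(u + r for u, r in zip(uptimes, repairs))
    return mttf, mttr, mtbf


print(reliability_metrics(0, [(10, 11), (31, 33), (63, 66)]))  # (20, 2, 22)
```

The uptimes are 10, 20, and 30 hours and the repairs 1, 2, and 3 hours, so MTTF is 20, MTTR is 2, and each full cycle averages 22 hours.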
Does it include detection time in MTTR?
Yes. The analyzer breaks MTTR into detection time (alert → acknowledgment), diagnosis time, and remediation time, and the breakdown shows which phase accounts for most of the MTTR.
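A minimal sketch of such a phase breakdown, assuming each incident record carries four ISO 8601 timestamps under the hypothetical keys alerted_at, acknowledged_at, diagnosed_at, and resolved_at. The phase boundaries follow the answer above (detection runs from alert to acknowledgment); the analyzer's actual field names and boundaries may differ.

```python
from datetime import datetime
from statistics import mean

# Hypothetical phase boundaries: (phase name, start key, end key).
PHASES = [
    ("detection", "alerted_at", "acknowledged_at"),
    ("diagnosis", "acknowledged_at", "diagnosed_at"),
    ("remediation", "diagnosed_at", "resolved_at"),
]


def minutes_between(start, end):
    """Minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


def phase_breakdown(incidents):
    """Return (mean minutes per phase, each phase's share of the mean total)."""
    means = {
        name: mean(minutes_between(i[start], i[end]) for i in incidents)
        for name, start, end in PHASES
    }
    # The sum of per-phase means equals the mean of per-incident totals.
    total = sum(means.values())
    shares = {name: m / total for name, m in means.items()}
    return means, shares


incidents = [
    {"alerted_at": "2024-05-01T10:00:00+00:00", "acknowledged_at": "2024-05-01T10:30:00+00:00",
     "diagnosed_at": "2024-05-01T11:00:00+00:00", "resolved_at": "2024-05-01T11:30:00+00:00"},
    {"alerted_at": "2024-05-01T12:00:00+00:00", "acknowledged_at": "2024-05-01T12:10:00+00:00",
     "diagnosed_at": "2024-05-01T12:30:00+00:00", "resolved_at": "2024-05-01T13:10:00+00:00"},
]
means, shares = phase_breakdown(incidents)
print(means)   # {'detection': 20.0, 'diagnosis': 25.0, 'remediation': 35.0}
print(shares)  # {'detection': 0.25, 'diagnosis': 0.3125, 'remediation': 0.4375}
```

In this made-up sample, remediation takes the largest share of the 80-minute average, so it would be the first phase to investigate.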
What input format does it accept?
CSV or JSON with incident start/end timestamps and optional severity/service fields. Supports ISO 8601 and Unix timestamp inputs, auto-detected on paste.
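Format auto-detection might look like the following sketch. It assumes JSON input is an array of objects and CSV input has a header row; the column names start, end, and severity are hypothetical. Purely numeric timestamps are treated as Unix seconds (not milliseconds), and naive ISO 8601 timestamps are assumed to be UTC.

```python
import csv
import io
import json
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 string or Unix timestamp (seconds) into an aware UTC datetime."""
    text = str(value).strip()
    try:
        # Assumption: numeric input is Unix seconds, not milliseconds.
        return datetime.fromtimestamp(float(text), tz=timezone.utc)
    except ValueError:
        pass
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assume naive means UTC
    return parsed


def load_incidents(raw):
    """Auto-detect a JSON array of objects or CSV with a header row."""
    text = raw.strip()
    if text.startswith("["):
        rows = json.loads(text)
    else:
        rows = list(csv.DictReader(io.StringIO(text)))
    return [
        {
            "start": parse_timestamp(row["start"]),
            "end": parse_timestamp(row["end"]),
            "severity": row.get("severity") or None,  # optional field
        }
        for row in rows
    ]


csv_rows = load_incidents("start,end,severity\n2024-05-01T10:00:00Z,1714561200,P1\n")
json_rows = load_incidents('[{"start": 1714557600, "end": "2024-05-01T11:00:00+00:00"}]')
print(csv_rows[0]["end"] - csv_rows[0]["start"])  # 1:00:00
```

Both inputs describe the same one-hour incident: the CSV row mixes an ISO 8601 start with a Unix end time, and the JSON row does the reverse.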