The goal is calm recovery

When something breaks, the product needs a clear path: protect users, identify the failing layer, contain the risk, ship the fix, and verify production behavior. Heroics are less useful than a boring checklist that works.

Incident playbook

1
Confirm impact
Check whether the issue affects public pages, dashboard access, connector sync, recommendations, or account security.
2
Contain risk
Roll back, disable the risky route, pause a sync path, or add a temporary guard if needed.
3
Fix the cause
Patch the smallest safe surface and keep unrelated refactors out of the incident.
4
Verify production
Check the affected path in production or preview before declaring the incident resolved.

After the fix

The follow-up matters: add a test if the issue can recur, update the runbook if the response path changed, and record durable decisions in the knowledge base. A fix that teaches the system is better than a fix that only teaches the person who shipped it.

Observability Testing and releases

When something breaks, the path should be boring.

The goal is calm recovery

Incident playbook

Confirm impact

Contain risk

Fix the cause

Verify production

After the fix

Next