No integration runs at 100% uptime. What matters when an outage hits is detection, communication and recovery; the rest is largely automated.
The most expensive thing during an outage is not panic; it is the question “who is going to fix this?” A runbook answers it in advance.
4-hour runbook
- 00:00 — Incident detected (alarm or manual).
- 00:05 — Incident acknowledged by on-call; panel checked.
- 00:15 — Customer impact estimated.
- 01:00 — Temporary workaround in place (manual mode).
- 04:00 — Permanent fix plan documented.
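The timeline above can also be kept as data, so that anyone joining mid-incident can see which steps are late. The following is a minimal sketch, not part of Buaze: the function name `overdue_steps` and the idea of tracking completed steps as a set of labels are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Milestones from the 4-hour runbook: (deadline after detection, step label).
RUNBOOK = [
    (timedelta(minutes=0), "Incident detected"),
    (timedelta(minutes=5), "On-call acknowledged, panel checked"),
    (timedelta(minutes=15), "Customer impact estimated"),
    (timedelta(hours=1), "Temporary workaround in place"),
    (timedelta(hours=4), "Permanent fix plan documented"),
]


def overdue_steps(detected_at, now, completed):
    """Return steps whose deadline has passed and that are not yet completed.

    `completed` is a set of step labels (hypothetical bookkeeping; a shared
    document or incident ticket would serve the same purpose).
    """
    elapsed = now - detected_at
    return [
        step
        for deadline, step in RUNBOOK
        if deadline <= elapsed and step not in completed
    ]


if __name__ == "__main__":
    detected = datetime(2024, 1, 1, 10, 0)
    done = {"Incident detected", "On-call acknowledged, panel checked"}
    # Twenty minutes in, the impact estimate is the only late step.
    print(overdue_steps(detected, datetime(2024, 1, 1, 10, 20), done))
```

Keeping the deadlines in one list means that testing the runbook amounts to walking the same list during a drill.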
Manual mode
If webhooks fail, the Buaze panel still works. For critical alerts, switch to a temporary manual mode: the team checks the panel hourly for low ratings from the last 4 hours. This bridges the gap until automation returns.
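If the team pulls review data out of the panel during manual mode (for example by copying rows or exporting them), the hourly check can be reduced to one filter. This is a sketch under stated assumptions: the field names `rating` and `created_at`, the ISO 8601 timestamp format, and the cutoff of 2 for a "low" rating are all hypothetical and should be adjusted to whatever the panel actually shows.

```python
from datetime import datetime, timedelta, timezone

LOW_RATING_MAX = 2             # assumption: ratings of 1 or 2 count as "low"
LOOKBACK = timedelta(hours=4)  # the 4-hour window from the manual mode procedure


def low_ratings_in_window(reviews, now):
    """Return reviews rated LOW_RATING_MAX or below within the lookback window.

    Each review is a dict with hypothetical keys `rating` (int) and
    `created_at` (ISO 8601 string with a UTC offset).
    """
    cutoff = now - LOOKBACK
    return [
        r
        for r in reviews
        if r["rating"] <= LOW_RATING_MAX
        and datetime.fromisoformat(r["created_at"]) >= cutoff
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    reviews = [
        {"id": "a", "rating": 1, "created_at": "2024-01-01T09:30:00+00:00"},
        {"id": "b", "rating": 5, "created_at": "2024-01-01T11:00:00+00:00"},
        {"id": "c", "rating": 2, "created_at": "2024-01-01T07:00:00+00:00"},
    ]
    for review in low_ratings_in_window(reviews, now):
        print(review["id"], review["rating"])
```

Because the check runs hourly over a 4-hour window, the same review appears in several consecutive checks. The overlap is deliberate: a skipped check does not lose alerts, but the team should note which reviews it has already handled.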
Communication
Customer-facing outage communication is rarely needed since Buaze talks to operators, not end customers. But for internal alignment, a status page or Slack channel removes the “who knows what” problem.
The cost of an outage is not measured in minutes; it is measured in the anxiety that comes from not knowing what is and is not working.
Postmortem
Within 24 hours of resolution, write a short postmortem. The discipline in the incident response guide applies: root cause, impact, action taken, prevention plan.
Checklist
- On-call rotation documented.
- Manual mode procedure exists.
- Status communication channel set.
- 4-hour runbook tested.
- Postmortem template ready.
- Alarm thresholds reviewed annually.
- Backup webhook destination (failover) defined.
FAQ
Is the outage on Buaze's side or mine?
Check the Buaze status page or contact support first. If Buaze is up, check your endpoint: if no payload reaches your logs, the issue is on your side.
Can I retrieve missed events later?
The webhook retry mechanism retries delivery for a defined window. Beyond that window, recover the missed events via CSV export.
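Recovery from a CSV export comes down to comparing the export against what your endpoint already received. The sketch below assumes the export has a unique identifier column, here called `review_id`, and that your side logged the IDs of delivered events. Both are assumptions, so check the actual export header before relying on it.

```python
import csv
import io


def missed_events(export_csv_text, received_ids, id_column="review_id"):
    """Return export rows whose ID was never received via webhook.

    `id_column` is a hypothetical column name; `received_ids` is the set of
    IDs your endpoint logged before and after the outage.
    """
    reader = csv.DictReader(io.StringIO(export_csv_text))
    return [row for row in reader if row[id_column] not in received_ids]


if __name__ == "__main__":
    export = "review_id,rating\nr1,5\nr2,1\nr3,2\n"
    already_received = {"r1"}
    for row in missed_events(export, already_received):
        # Replay each missed row through the same handler the webhook uses.
        print(row["review_id"], row["rating"])
```

Matching on an ID rather than on timestamps avoids double-processing events that a late retry delivered successfully.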
Can customers still leave reviews during an outage?
The QR flow and the panel are separate layers. Side flows like webhooks can fail while review collection continues.
Do multiple webhook destinations all fail together?
Usually not. Each destination has its own delivery flow; one failing does not affect the others.