Webhooks/Events Outage 2021-02-01

anon62031496 · 2 February 2021 00:50

Hey folks,
This is Amit from the API team. We wanted to share that our webhooks and event stream infrastructure suffered a failure around 6:00 am UTC on 2021-02-01. One of our Redis nodes used for storing events for webhooks was marked as dead/unreachable by our monitoring infrastructure and replaced. As a result, you might be missing about 9 hours of events from the events and webhooks endpoints, including any events not fetched previously. We recommend fetching the latest version of the resources you subscribe to at a regular interval to minimize impact.
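For anyone who wants a starting point for that periodic refetch, here is a minimal sketch in Python. It is not official client code: `reconcile`, `fetch_resource`, and the `cache` dictionary are hypothetical names, and `fetch_resource` stands in for whatever API call your integration already uses to load one resource by ID.

```python
from typing import Callable, Dict, Iterable, List

Resource = Dict[str, object]


def reconcile(
    subscribed_ids: Iterable[str],
    fetch_resource: Callable[[str], Resource],  # hypothetical: your own API call
    cache: Dict[str, Resource],
) -> List[str]:
    """Refetch every subscribed resource and return the IDs whose data changed."""
    changed: List[str] = []
    for resource_id in subscribed_ids:
        latest = fetch_resource(resource_id)
        # Compare against the last copy we stored; a difference means an
        # update reached the API that we never saw as an event.
        if cache.get(resource_id) != latest:
            cache[resource_id] = latest
            changed.append(resource_id)
    return changed
```

Running something like this on a timer (for example every few minutes, staying within your rate limits) means a gap in events only delays updates until the next pass instead of losing them.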

Clients see this failure as “sync_errors”. Though there is a gap in events, the webhook subscriptions themselves were not impacted. By 3:00 pm UTC, the webhooks and events infrastructure had recovered. The trigger was the same as in our previous webhooks incident (a Redis node failing its health check), but we’re still investigating the root cause.
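One way a client might react to a sync error is to treat it as a signal that events were lost and fall back to a full refetch. The sketch below assumes the error arrives as an event whose `type` field equals `"sync_error"`; that payload shape is an assumption for illustration, and `handle_event`, `apply_event`, and `resync` are hypothetical names, so check the actual payload your client receives.

```python
from typing import Callable, Dict


def handle_event(
    event: Dict[str, object],
    apply_event: Callable[[Dict[str, object]], None],  # hypothetical: normal processing
    resync: Callable[[], None],  # hypothetical: full refetch of subscribed resources
) -> str:
    """Route one incoming event; fall back to a full resync on a sync error."""
    # Assumed payload shape: the field name and value may differ in practice.
    if event.get("type") == "sync_error":
        resync()
        return "resynced"
    apply_event(event)
    return "applied"
```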

In order to address these issues we are taking a few steps:

A new Webhooks and Events infrastructure

In November 2020, following our previous incident, we started work on a new Webhooks and Events infrastructure that addresses the fundamental concerns about event durability. A dedicated team is working on this project, and the stability of this infrastructure is of utmost importance to us. We know how important it is for developers and applications to have stable events and webhooks infrastructure.

Increase the Visibility and Observability

We have added Webhooks and Event Streams to our service status page. Internally, we have added alerts to catch performance- and stability-related issues. While these alerts and monitors do not prevent incidents like this one, they allow us to pinpoint key areas to focus on in the new infrastructure.

While it gives us no joy to report outage news, we’ll strive to maintain transparency, and we’ll continue to invest in bringing you a new and better infrastructure for webhooks.
