Update on upcoming improvements to our webhooks system

Hey developers,

Earlier this year, we discussed some upcoming changes to our webhooks system. Today, we’re happy to share an update on our progress, as well as what you can expect in the coming months.

Background

As we mentioned in the original post, one of our goals in redesigning webhooks is to address durability concerns, especially to mitigate catastrophic incidents such as dropped events.

In short, many of our issues stem from our current configuration of Redis as the primary storage for events. As configured, Redis does not provide the durability we need, which opens the door to potential data loss. Since our current infrastructure complicates deploying a reconfigured Redis, we need a durable storage replacement moving forward.

MemoryDB for event storage

While we’re staying on track with our goals, we’ve taken a more expeditious approach to reducing the risk of data loss. Our team has been focused on making impactful repairs and optimizations to our existing infrastructure so that we can solve durability issues sooner.

With this approach, rather than migrating Redis to AWS DynamoDB, we’ve moved forward with AWS MemoryDB for Redis. MemoryDB provides a Redis-compatible, durable managed database service for our current architecture. Implementing MemoryDB guarantees that writes are persisted across database node restarts. This allows us to achieve both durability and recoverability guarantees for webhooks.
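
For anyone curious what "persisted across restarts" means in practice, here is a toy sketch in Python. It is not how MemoryDB or our webhooks system is implemented. It only contrasts a store that keeps events in memory with one that appends each event to a log before acknowledging the write. The class names, the log file, and the sample event are all made up for illustration.

```python
import json
import os
import tempfile


class VolatileEventStore:
    """Hypothetical store that keeps events only in process memory."""

    def __init__(self):
        self.events = []

    def write(self, event):
        self.events.append(event)
        return True  # acknowledged, but nothing has reached disk


class LoggedEventStore:
    """Hypothetical store that logs each event to disk before acknowledging."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.events = []
        # On startup, rebuild in-memory state by replaying the log.
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                self.events = [json.loads(line) for line in log if line.strip()]

    def write(self, event):
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(event) + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the record to disk before acking
        self.events.append(event)
        return True


def simulate_restart_demo():
    """Write one event to each store, simulate a restart, return what survives."""
    event = {"id": "evt_1", "type": "example.created"}  # made-up sample event

    volatile = VolatileEventStore()
    volatile.write(event)
    volatile = VolatileEventStore()  # simulated restart: memory is gone

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "events.log")
        logged = LoggedEventStore(path)
        logged.write(event)
        logged = LoggedEventStore(path)  # simulated restart: log is replayed
        return volatile.events, logged.events


if __name__ == "__main__":
    lost, recovered = simulate_restart_demo()
    print("volatile store after restart:", lost)
    print("logged store after restart:", recovered)
```

The only point of the sketch is the ordering: the logged store acknowledges a write after the record is on disk, so a restart can replay it. A managed service handles this at a very different scale, but the guarantee we care about for webhooks has the same shape.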

What to expect

Our team is already in the process of implementing the new storage solution, and we expect the move to be complete in January 2022. As previously mentioned, most changes are under the hood, and won’t require you to change the way your apps are built. Once we achieve durability, we expect the risk of catastrophic incidents and data loss to be virtually eliminated.

Again, we appreciate your patience as we work to improve the developer experience. We look forward to sharing more updates with you soon. In the meantime, feel free to share your questions and feedback here with us in this thread!

Very cool, thanks for the detailed info!

Mostly just curious: what’s happening during this coming Saturday’s downtime and is it related to the above?

Hey Phil, just to close the loop: the downtime yesterday was a MySQL upgrade, and was unrelated to our durability goals. Thanks for checking in :+1:

Great work everyone! Thanks for sharing the update. :+1: