A Problem with Webhooks & Events

Hello everyone,

On September 19th, a badly behaving app caused one of our event handling nodes to run out of memory. Hitting its memory limit caused the node to start evicting keys in order to keep itself alive, and the evicted keys included event/webhook subscriptions.

This means some webhooks may have been missing events since that date. Affected webhooks failed silently: the webhook simply stopped sending events, and /events streams returned a 412 error. From my understanding, not every failing webhook started failing on Sept 19th; some were affected at a later date.

We’re sorry this happened and we’re working on improving the infrastructure around our webhooks. :frowning:

If you think your webhooks may have been hit, the best way to resolve this is to re-create them.
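For anyone scripting the re-creation, here is a minimal sketch of the delete-then-create loop. It assumes each webhook record is a dict with `gid`, `resource`, and `target` keys, and that `delete` and `create` are hypothetical callables you supply that wrap whatever HTTP client you use; none of these names come from an official library.

```python
from typing import Callable, Iterable, List


def recreate_webhooks(
    webhooks: Iterable[dict],
    delete: Callable[[str], None],
    create: Callable[[str, str], str],
) -> List[str]:
    """Delete each suspect webhook and register it again.

    `delete(gid)` and `create(resource, target)` are hypothetical
    wrappers around your own API calls; `create` is assumed to
    return the identifier of the new webhook.
    """
    new_ids = []
    for hook in webhooks:
        # Remove the old subscription first so the same
        # resource/target pair can be registered again.
        delete(hook["gid"])
        new_ids.append(create(hook["resource"], hook["target"]))
    return new_ids
```

Events that were dropped while a webhook was silent are not replayed by this; it only restores delivery going forward.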

For /events users, you’ll get a 412 error if you’re affected. The solution is to request the /events endpoint again without a sync_token.
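A rough sketch of that recovery step for a polling client follows. The `fetch` callable is a hypothetical stand-in for your HTTP layer (it takes query parameters and returns a status code plus parsed JSON body), and the query parameter and response field named `sync` are assumptions about how the token is passed; adjust them to match what your client actually sends and receives.

```python
from typing import Callable, Optional, Tuple

# Hypothetical transport: query params in, (HTTP status, JSON body) out.
Fetch = Callable[[dict], Tuple[int, dict]]


def poll_events(
    fetch: Fetch, resource: str, sync_token: Optional[str]
) -> Tuple[list, Optional[str]]:
    """Poll /events once; on a 412, retry without the token."""
    params = {"resource": resource}
    if sync_token:
        params["sync"] = sync_token  # assumed parameter name
    status, body = fetch(params)

    if status == 412:
        # The stored token is no longer valid (for example, the
        # subscription was evicted). Ask again without a token.
        status, body = fetch({"resource": resource})
        # Assumption: the token-less reply carries a fresh token
        # in body["sync"], whatever its status code is.
        return [], body.get("sync")

    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    return body.get("data", []), body.get("sync", sync_token)
```

Starting over this way means any events between the last good poll and the reset are not returned, so a client that needs them would have to re-read the resource's current state.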

Again, we’re sorry that this occurred. We know the value of webhooks decreases dramatically if they’re unreliable and you may have got burned by this outage. Feel free to post questions or concerns below or reach out to us at devrel@asana.com if you need additional support.


Thanks for the info, @Ross_Grambo. I guess my hope would be twofold: that your engineers can find a way to make the event-processing infrastructure more robust, and that they can improve the diagnostics so that failures like this are caught and the team is alerted when they occur.


Happy to say this is already done. They put better monitoring in place and have a way to resolve it without anyone losing events. As long as those monitors work and the on-call engineer is able to react, no one should be affected if this happens again.


Excellent news, thanks!