"Sync_error" occurrences

Phil_Seeman · 20 September 2019 16:33

Hi guys,

About once a day (though sometimes I can go a few days), my Flowsana integration receives a webhook whose type is “sync_error”, with the message, “There was an error with the event queue, which may have resulted in missed events. If you are keeping resources in sync, you may need to manually re-fetch them.”

Granted, Flowsana receives many thousands of webhooks a day, so this is a pretty small error rate. Still, if it means someone’s workflow or rule does not work properly as a result, it’s a potential issue.

I can tell from the webhook which Flowsana user account had the error, but nothing further that I can see, and it’s not practical at that point for me to query every project and task for that account to try and figure out what might have been missed.

I’m wondering if you have any insights into what causes these sync errors and/or how we might mitigate the situation when it occurs. Thanks!

@Joe_Trollo @Matt_Bramlage @Jeff_Schneider @Ross_Grambo

Joe_Trollo · 19 February 2020 22:22

Hi Phil,

After adding logging and digging through deeper parts of the code, I can report that the source of sync errors is, unfortunately, an unavoidable occurrence: a Redis node dying. Webhooks (and event streams) are built on a multi-step, asynchronous pipeline that pushes data from where a user makes a change to where we send the webhook to your server. This pipeline involves piping temporary data about the events into and out of a Redis store. It’s unavoidable that these Redis nodes sometimes die, and when that happens they take the enqueued events with them.

It’s possible for us to change the behavior of webhook event delivery from “at most once” to “at least once” but it would involve a massive rearchitecting and reimplementation of our pipeline, which we’re unable to do at this time. However, I’m following up with other teams at Asana to see if there’s anything we can do to make these issues occur less frequently.

A recommendation I can make to help make it easier to recover: give each subscription its own URL/query parameters so that you can distinguish between different projects in the same account, e.g., receive webhooks at /webhooks/<account-id>/<resource-id> or /webhooks/<account-id>?resource=<resource-id>.
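As a rough sketch of that recommendation, the receiving side could recover the account and resource from the request URL, so a sync_error can be tied to one project instead of a whole account. The function name `resource_from_webhook_url` and the exact path layout are illustrative assumptions, not part of the Asana API:

```python
from urllib.parse import urlsplit, parse_qs


def resource_from_webhook_url(url):
    """Return (account_id, resource_id) for either URL layout, or None.

    Hypothetical helper: the two layouts mirror the examples in the post above.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "webhooks":
        return None
    account_id = segments[1]
    if len(segments) >= 3:
        # Layout 1: /webhooks/<account-id>/<resource-id>
        return account_id, segments[2]
    # Layout 2: /webhooks/<account-id>?resource=<resource-id>
    resource = parse_qs(parts.query).get("resource")
    if resource:
        return account_id, resource[0]
    return None
```

With this in place, a handler that sees a sync_error only needs to re-fetch the single resource named in the URL it was delivered to.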

Phil_Seeman · 20 February 2020 00:15

Thanks for the info, @Joe_Trollo - not the best of news but good to know the scoop.

Follow-up question:
Does this also apply to the Events API - i.e. if one were to use that instead of (or in addition to) webhooks, would that also potentially be missing the same events; or is this a webhooks-only phenomenon?

Joe_Trollo · 20 February 2020 00:25

This applies to event streams too; you’ll also encounter sync_error events there. The difference is that events wait in Redis until they’re fetched rather than being pulled from Redis and delivered promptly. When Redis dies, the event stream is entirely lost and we stop accumulating events for it. This means that you’re likely to lose more information from an event stream when this happens. (There are likely to be more events waiting in the queue, and we won’t collect events between when Redis goes down and when you make the next request.)
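Whether events arrive by webhook or by polling an event stream, a consumer can check each batch for the sync_error marker before processing it. The sketch below assumes each event is a dict and that the marker appears in an `action` field; that field name and the helper `split_sync_errors` are assumptions for illustration, so check them against the payloads you actually receive:

```python
def split_sync_errors(events):
    """Separate ordinary events from sync_error markers in one batch."""
    normal, sync_errors = [], []
    for event in events:
        # Assumption: the marker is carried in the "action" field.
        if event.get("action") == "sync_error":
            sync_errors.append(event)
        else:
            normal.append(event)
    return normal, sync_errors


batch = [
    {"action": "changed", "resource": {"gid": "123"}},
    {"action": "sync_error", "message": "There was an error with the event queue"},
]
normal, sync_errors = split_sync_errors(batch)
if sync_errors:
    # Events may have been dropped, so the tracked resource needs a manual re-fetch.
    print("re-fetch needed; events processed normally:", len(normal))
```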

Phil_Seeman · 20 February 2020 06:31

I’m going to do this. Not sure at the moment what I’ll do with that additional info, but we’ll see!

Much appreciated, that would be awesome!

I understand this is not something you can do right away, but I would think it’d be something for the API team’s pipeline? Those of us who rely on webhooks in our apps would really like to be able to count on them to be complete and accurate…

Joe_Trollo · 24 February 2020 20:14

That sort of redesign isn’t currently on our roadmap. We believe that we can get dramatically better reliability with significantly smaller investments in our infrastructure. For example, writing the events to two Redis nodes instead of one would square the probability of failure/event loss. If the failure rate of a single node is 1 in 10,000 now, the failure rate of two nodes would be 1 in 100,000,000.
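The arithmetic behind that estimate can be checked directly. This is only a sketch: it assumes the two nodes fail independently of each other, which the post does not state, and `loss_probability` is a name made up for the example:

```python
from fractions import Fraction


def loss_probability(single_node_failure, replicas):
    # Events are lost only if every replica fails.
    # Assumes node failures are independent, so the probabilities multiply.
    return single_node_failure ** replicas


p = Fraction(1, 10_000)
print(loss_probability(p, 1))  # 1/10000
print(loss_probability(p, 2))  # 1/100000000
```

If failures are correlated (for example, both nodes sharing a host or a network partition), the real improvement would be smaller than this figure.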

Phil_Seeman · 25 February 2020 16:36

That sounds great. I’m not partial to any particular fixes, just good to know you’ll be taking some action(s) to improve it!

Phil_Seeman · 27 April 2020 13:42

Hi @Joe_Trollo @Ross_Grambo,

Not sure if it means anything in particular but I’ve been getting a lot (compared to usual) of sync errors over the past 48 hours or so.

Have you had an opportunity to make any of those infrastructure changes yet?

Phil_Seeman · 22 June 2020 13:00

Hi @Joe_Trollo, @Ross_Grambo,

Wondering if there’s been any progress on implementing these changes re. sync errors, or what the status is? I continue to get sync errors in my Flowsana webhooks daily (sometimes once a day, sometimes more than once, but pretty much every day).

Thanks for any updates you might have to provide!

Ross_Grambo · 24 June 2020 22:04

I don’t have anything to report yet. We’re looking into the cause of daily failures.

Phil_Seeman · 14 July 2020 14:27

Hi @Ross_Grambo and @Joe_Trollo,

I’ve gotten a relatively higher number of sync errors of late: 14 over the past 24 hours as of this writing. Just wanted to mention it in case it helps in your sync_error investigation.

Ross_Grambo · 14 July 2020 17:21

Hey Phil,

The user that caused it cycled their PAT and caused it again. We did a user ban this time. The API team is planning a “fix” for this, where they cap out an event stream and either trash the stream if it gets too big or stop appending events to the queue. Neither sounds like a great option, but they’re better than more outages!

Phil_Seeman · 14 July 2020 17:35

@Ross_Grambo
Wow! Guess you weren’t expecting THAT to happen.

But unless I’m missing something, I don’t think my sync errors are related to yesterday’s outage. I say that because (1) during the outage I was getting NO webhooks, valid or sync_error; and (2) I’ve gotten 17 sync_errors today alone, well after the restoration of the outage.

Ross_Grambo · 14 July 2020 18:32

Oof! Sorry, webhooks were top of mind so I assumed this was related.

I’m checking with the API team now.

Ross_Grambo · 14 July 2020 19:04

Looks like this was not on their roadmap. They just added an investigation into it, with double encoding of events as the frontrunner for a solution.

Phil_Seeman · 10 August 2020 11:20

Hi @Ross_Grambo,

FYI my Flowsana app has gotten 21 sync_errors between 7:59 PM EST last night (8/9/20) and 6:34 AM today so far (no particular reason to think there won’t be more coming shortly).

Wondering if there has been any progress on the sync_error issue? It’s pretty critical for those of us who have apps built on top of the Asana webhook platform. Thanks!

Phil_Seeman · 10 August 2020 20:43

Hi @Ross_Grambo,

FYI still continuing to see lots more sync_errors than usual - 12 more today since my posting above, and counting.

Ross_Grambo · 11 August 2020 17:23

I pinged API oncall to take a look. Sorry for the delay!

Could I get Flowsana’s app ID, and any other app IDs having this issue?

Phil_Seeman · 11 August 2020 17:45

Thanks, @Ross_Grambo - I’ll DM you the App Id.

Phil_Seeman · 6 September 2020 17:19

Hi @Ross_Grambo,

FYI Flowsana is getting another raft of sync_errors today. It started at 6:54 am Pacific time. I’ve gotten 16 so far and still coming in, it seems. Let me know if I can provide any other info that might be helpful.
