Getting never-ending sync_errors as action in webhooks

Hi,

I want to use Asana webhooks to get task events. We have custom fields on our tasks and calculate a priority score from these fields.
Because I had to work on something else, at first I only created the webhooks (one for each project). After a while I looked into Kafka (where we store the events for processing) and we had only received the following payloads:

{
   "type": "RECEIVED_EVENT",
   "gid": "1388840432179486250",
   "data": {
      "events": [
         {
            "action": "sync_error",
            "message": "There was an error with the event queue, which may have resulted in missed events. If you are keeping resources in sync, you may need to manually re-fetch them.",
            "created_at": "2020-06-29T11:16:02.066Z"
         }
      ]
   }
}
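
For context, this is roughly how we register the webhooks and answer Asana’s handshake; a sketch assuming Flask and requests, with the token, target URL, and project GIDs as placeholders:

import requests
from flask import Flask, request, make_response

ASANA_TOKEN = "0/..."                              # personal access token (placeholder)
TARGET_URL = "https://example.com/webhooks/asana"  # our receiver (placeholder)
PROJECT_GIDS = ["1200000000000001"]                # one webhook per project (placeholders)

app = Flask(__name__)

@app.route("/webhooks/asana", methods=["POST"])
def receive():
    # Handshake: echo X-Hook-Secret back with a 200 so Asana activates the webhook.
    secret = request.headers.get("X-Hook-Secret")
    if secret:
        resp = make_response("", 200)
        resp.headers["X-Hook-Secret"] = secret
        return resp
    # Normal delivery: acknowledge right away; actual processing happens elsewhere.
    return "", 200

def create_webhooks():
    for gid in PROJECT_GIDS:
        r = requests.post(
            "https://app.asana.com/api/1.0/webhooks",
            headers={"Authorization": f"Bearer {ASANA_TOKEN}"},
            json={"data": {"resource": gid, "target": TARGET_URL}},
        )
        r.raise_for_status()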

The only threads I found on this topic said that they received just a few sync_errors per day (e.g. “Sync_error” occurrences). I observed the events over a few days, maybe 1-2 weeks, and as soon as we received the first sync_error, the only thing we received afterwards were sync_errors (as far as I can tell). After these few days we had gathered multiple thousands of sync_errors in Kafka, with occasional normal payloads (probably from some webhooks that were still working). The only solution was to recreate these webhooks.
The only thread I could find that resembles my case is this one: “Webhooks are constantly returning ‘sync_error’”, but it doesn’t look like it got resolved.

I wanted to ask whether this problem has only occurred for me (and in that one thread), and what I could do about it other than recreating the webhooks. Also, what would be the best way to recover the events that were lost in the meantime, if there is one? Thank you in advance!
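
If manually re-fetching (as the sync_error message itself suggests) is the way to go, I imagine it would look roughly like this for us; a sketch using plain requests, with the token and project GID as placeholders:

import requests

ASANA_TOKEN = "0/..."              # placeholder
PROJECT_GID = "1200000000000001"   # placeholder

def refetch_project_tasks(project_gid):
    # Pull every task in the project (with custom fields) to rebuild state
    # after missed events, following next_page offsets until exhausted.
    tasks, offset = [], None
    while True:
        params = {"limit": 100, "opt_fields": "name,custom_fields"}
        if offset:
            params["offset"] = offset
        r = requests.get(
            f"https://app.asana.com/api/1.0/projects/{project_gid}/tasks",
            headers={"Authorization": f"Bearer {ASANA_TOKEN}"},
            params=params,
        )
        r.raise_for_status()
        body = r.json()
        tasks.extend(body["data"])
        next_page = body.get("next_page")
        if not next_page:
            return tasks
        offset = next_page["offset"]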

Maybe you can check if there was some kind of issue when you created the hook? https://status.asana.com/

Oh, I wasn’t aware of this page. That could very well be the reason. I will monitor the webhooks for a few days and mark this thread as solved if it doesn’t happen again.
Thank you for your time and quick response!

I’ve gotten more sync_errors than usual in the past day or so, not related to the outage (I don’t think). But I get them along with many, many valid webhook events - you’re saying you’re ONLY getting sync_errors and not getting ANY valid events?

Sorry, the screenshot in the last post was a little buggy, and this one is still a little off too (I’m using the Nimbus Screenshot Tool for these extended screenshots). This is a screenshot of one page in Kafdrop (100 entries), and I could go on and on; it all looks like this:

[screenshot: a full Kafdrop page of sync_error payloads]
But since yesterday there have been no new sync_errors.

Okay, damn. I thought the issue was resolved, but since yesterday we have only been getting sync_errors again. After quickly looking through the newest 700 messages, they all seem to be sync_errors.
Here is a screenshot of the newest page:
[screenshot: the newest Kafdrop page, again all sync_error payloads]
Does anybody have a similar experience? I’m not quite sure what I should do now. I could implement something so that whenever a webhook gets a sync_error, it recreates the webhook for the resource the sync_error came from (roughly as sketched below), but I think I would lose quite a few messages in the meantime. Is anybody aware of this problem? Or am I doing something wrong? I could explain how I set up the webhooks, but then I could just copy the documentation :sweat_smile:
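
Something like this is what I have in mind; a sketch using plain requests, with the token and target URL as placeholders. Events that arrive between the DELETE and the new handshake would still be lost:

import requests

ASANA_TOKEN = "0/..."                              # placeholder
TARGET_URL = "https://example.com/webhooks/asana"  # placeholder

def recreate_webhook(webhook_gid, resource_gid):
    # Drop the broken webhook, then register a fresh one for the same resource.
    headers = {"Authorization": f"Bearer {ASANA_TOKEN}"}
    requests.delete(
        f"https://app.asana.com/api/1.0/webhooks/{webhook_gid}",
        headers=headers,
    ).raise_for_status()
    r = requests.post(
        "https://app.asana.com/api/1.0/webhooks",
        headers=headers,
        json={"data": {"resource": resource_gid, "target": TARGET_URL}},
    )
    r.raise_for_status()
    return r.json()["data"]["gid"]  # gid of the replacement webhook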

Hi @Ross_Grambo, any ideas?

So, nothing has changed since last Friday. I didn’t recreate the webhooks, and the last 100 messages were all sync_errors again. I didn’t bother looking through the last few pages because they probably look the same. Does someone have an idea what could be causing this?

Hmm… Perhaps your event queue is filling up? When you request events, do you request all of them and/or is the has_more field set?

Oh, I just noticed: we currently have the “old version” of the script running, which uses streams. If we oversubscribe a resource with streams, will the webhook also break? Because then this is probably the issue :sweat_smile:

What does the has_more field do?

Starting a stream means we will prepare data in a queue for you. Every time you request data from our API (either from /events directly, or via our client libraries, which I believe auto-request more data once you loop to the end of a stream), we return up to 100 records. The “has_more” field means there were more than 100 records in the queue waiting to be delivered, so you should either hit /events again or iterate through the stream fully.
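
A drain loop against /events might look roughly like this; a sketch in plain requests rather than our client libraries, with the token and resource GID as placeholders. Note that a first call (or an expired sync token) comes back as a 412 that carries a fresh sync token:

import requests

ASANA_TOKEN = "0/..."               # placeholder
RESOURCE_GID = "1200000000000001"   # placeholder

def poll_events(sync_token=None):
    # Keep polling while has_more is set so the queue never fills up.
    headers = {"Authorization": f"Bearer {ASANA_TOKEN}"}
    while True:
        params = {"resource": RESOURCE_GID}
        if sync_token:
            params["sync"] = sync_token
        r = requests.get("https://app.asana.com/api/1.0/events",
                         headers=headers, params=params)
        if r.status_code == 412:
            # First call or expired token: restart with the fresh token.
            sync_token = r.json()["sync"]
            continue
        r.raise_for_status()
        body = r.json()
        for event in body["data"]:
            handle(event)                # application-specific
        sync_token = body["sync"]
        if not body.get("has_more"):
            return sync_token            # queue drained; reuse token next time

def handle(event):
    print(event.get("action"), event.get("resource"))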

It shouldn’t break unless you don’t consume the data in the queue. If you’re only using webhooks (and not the /events endpoint), you just need to keep responding 200 to our requests.
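
On the receiving side, that can be as simple as acknowledging immediately and handing the payload off for later processing; a sketch assuming Flask and kafka-python (to match your Kafka setup), with the broker address and topic name as placeholders:

import json
from flask import Flask, request, make_response
from kafka import KafkaProducer

app = Flask(__name__)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # placeholder
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

@app.route("/webhooks/asana", methods=["POST"])
def receive():
    # Handshake: echo X-Hook-Secret back so the webhook gets activated.
    secret = request.headers.get("X-Hook-Secret")
    if secret:
        resp = make_response("", 200)
        resp.headers["X-Hook-Secret"] = secret
        return resp
    # Enqueue the payload and return 200 right away; a separate consumer
    # does the heavy processing so deliveries are never blocked.
    producer.send("asana-events", request.get_json(silent=True) or {})
    return "", 200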