Missing events and no "sync_error" received on that webhook endpoint.

Confirmed @JFrentz, I’m seeing webhook issues in my app as well. I think they may be coming through but very delayed, not sure. I did one test last night - I got the event but 5 hours late. I did two more tests this morning, have not received the events yet, will see if I get them delayed as well.

cc: @Jeff_Schneider @Matt_Bramlage @Ross_Grambo

Thanks for confirmation @Phil_Seeman :raised_hands:
We had a 4h delay this morning. The delay has been getting shorter and shorter during the day; we just recorded a 45 min delay on some events. This impacted our organization’s performance for the day.

Hey @JFrentz,

I just ran a few tests and did not see any delays - how is your current experience?

Looking good so far this morning. :+1:

Hi folks,

My name is Kem and I’m the newest member of the Developer Relations team here at Asana. I’m writing to provide some additional information on the webhooks issue from yesterday. As some of you have noticed, we experienced delays yesterday in sending out webhook notifications. In some cases, these delays were significant. The issue was resolved, and the webhooks started firing in a timely fashion again, by about 8:30 am PDT on Thursday.

First, I want to acknowledge and sincerely apologize for the disruptions incidents like this cause for your apps and users. Rest assured that we take reliability of webhooks very seriously, and are working diligently to resolve the underlying issues. Please see this post for more information about what we’re doing in this regard.

Second, I want to make a commitment to providing regular updates as we hit major milestones in this remediation effort. We value your feedback in this process, so please feel free to drop a comment with additional perspective on how these issues have been affecting you.

Finally, I want to address the fact that our status page (status.asana.com) showed no disruptions to service while these delays took place. This was simply because no system experienced an outage: our jobs queue, which works on tasks in the background to keep Asana responsive and is shared by many Asana systems including webhooks, got backed up. So while it was processing jobs, we had delays on some jobs like webhooks. We’ve identified the root cause, and are working with our infrastructure teams to mitigate this going forward. That being said, we recognize that we can do better to inform you about the status of webhooks – down, delayed, or operating as expected – and we’re looking into ways we might be able to do just that on status.asana.com.

Thank you for being a valued part of our developer community. We appreciate your patience as we work to empower your apps with top-notch webhooks performance, and welcome all feedback in the meantime.

Hi @Kem_Ozbek,

Welcome and thanks for this message! This is great to hear, on all of the aspects you address…

I know you’ve already heard some of my input on the impact these webhook outages and delays have on my Flowsana app and its customers, so I won’t beat that into the ground here. :slight_smile:

It will be really valuable to be kept informed about the progress of the webhook rebuild process; much appreciated for that commitment.

And I know you’ve already heard my feelings about status.asana.com, so it’s good to know you’re looking at how to provide info on delays. I get and appreciate it’s not simple since delays don’t generate an actual error condition (at what point of event backup do you call it an issue worth reporting on the status page - 5 minutes, an hour, …?), but it’s good to know you’re looking into how you might provide some feedback there. Again, thanks!
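
For anyone who wants to track delays on their own side in the meantime, here is a minimal sketch of measuring delivery delay at the receiving endpoint. It assumes each event in the payload carries an ISO 8601 `created_at` timestamp; the function names and the 5-minute threshold are illustrative placeholders, not anything Asana has specified.

```python
from datetime import datetime, timedelta, timezone

# Placeholder threshold: pick whatever delay matters for your app.
DELAY_THRESHOLD = timedelta(minutes=5)


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def delivery_delay(event: dict, received_at: datetime) -> timedelta:
    """Return how long after creation the event reached our endpoint.

    Assumes the event has a "created_at" field (an assumption here).
    """
    return received_at - parse_timestamp(event["created_at"])


def is_delayed(event: dict, received_at: datetime,
               threshold: timedelta = DELAY_THRESHOLD) -> bool:
    """True when the event arrived later than the chosen threshold."""
    return delivery_delay(event, received_at) > threshold
```

Calling `is_delayed(event, datetime.now(timezone.utc))` for each incoming event and logging the result would give a rough delay history to compare against the status page.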

Hi! We still had some delayed events last Friday 2021-05-27T19:00:00Z. This strongly affects our work processes because the lags range from hours to days.

Hi and welcome @Kem_Ozbek!
We haven’t received any webhooks this morning, for 1 hour now. Nothing on status.asana.com.

Still, 0 webhooks are sent to us.
Issue confirmed by Asana Support. Dev team troubleshooting.
Nothing is shown on the status page.

We are up and running again, ~2.5h downtime today.

@JFrentz,

This is a tough question to know the answer to, since you don’t really know what you may have missed, but can you tell if you lost events during the outage, or if you’re receiving all the events but delayed?

Just dropping in to confirm that we had delays with event distribution early this morning. The issue was resolved and webhooks returned to normal latencies for event delivery by about 6:25 am PDT.

I’m afraid I don’t have much more to add beyond that, but I will reiterate the commitments from my earlier post, including being clearer about latency issues on status.asana.com. Please stay tuned; we greatly appreciate your patience.

Today we experienced two periods of webhook silence.

UTC +2

  1. 07.04 - 07.35
  2. 08.25 - 09.16

All events seem to have been delivered since. No information on the status page.
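
For anyone wanting to spot windows like these automatically, here is a minimal sketch that scans the timestamps at which webhooks were received and reports gaps above a chosen length. The function name and the 30-minute default are illustrative assumptions, and a gap can of course also just mean nothing happened in the workspace.

```python
from datetime import datetime, timedelta


def find_silences(received_times, min_gap=timedelta(minutes=30)):
    """Return (start, end) pairs where no webhook arrived for at least min_gap."""
    ordered = sorted(received_times)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier >= min_gap
    ]
```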

Hi @JFrentz,

I had a delay issue from the day before yesterday to yesterday; Asana engineers are looking into that. Did you see anything over the past couple of days?

Not surprised to hear of more delays today.

@Kem_Ozbek & @Matt_Bramlage
Just woke up to a majority of Sync_errors instead of successful webhooks this morning.


Something seems very broken for the past 8 hours, since midnight CET.
Nothing on the status page. What’s up? :frowning:

Sync_errors continue to come in at a significantly higher volume than before. We now get sync_errors where we never got sync_errors before, and the total amount for the past 8 hours is more than the total of sync_errors for the past 8 months.

The sync_errors seem to have returned to normal since 09.30 CET. The flood of sync_errors lasted ~9h 30min for us.

29.6% of all events from the past 12h have been sync_errors. How can we retrieve and process them now? How can we make sure none of these events cause operational issues for our clients?

I’m also facing sync_errors. Most of our webhooks do not work properly.

@JFrentz sorry I meant to message you about this a few weeks ago but forgot. I’m not sure the following is relevant to your current situation, but FYI, I had a conversation with a member of the Asana API team recently and discovered that the majority of sync errors aren’t actually sync errors. If a webhook lies dormant - doesn’t get triggered - for a period of 2 to 3 days, it can then randomly send a spurious webhook that’s labeled as a “sync error” but is really just an extra bogus webhook that can be ignored.

There can also be valid sync errors which can occur when the webhook cache (which as they’ve announced the API team is in the process of rewriting to be more robust) fails. So it’s possible that’s what you just got hit with. But these true sync errors are actually a lot less likely to occur than the “non sync error sync error” I described above. (The API team plans to fix the bogus sync error issue in the course of their webhook rewrite work.)
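
To make the distinction concrete, here is a rough sketch of how a receiver might separate sync errors from regular events and decide whether a re-fetch is worth doing. It assumes sync errors arrive as events whose `action` is `"sync_error"`, and it uses the "2 to 3 days" dormancy figure above as a heuristic only; the function names and the cutoff are hypothetical, not an official rule.

```python
from datetime import datetime, timedelta, timezone

# Heuristic cutoff taken from the "2 to 3 days" dormancy described above.
DORMANCY = timedelta(days=2)


def split_events(events):
    """Separate sync_error entries from regular events.

    Assumes a sync error is an event with action == "sync_error".
    """
    sync_errors = [e for e in events if e.get("action") == "sync_error"]
    regular = [e for e in events if e.get("action") != "sync_error"]
    return regular, sync_errors


def needs_resync(last_event_at, now, dormancy=DORMANCY):
    """Guess whether a sync_error is likely genuine.

    A sync_error on a webhook that was active recently is treated as real,
    so the watched resource should be re-fetched. One on a webhook that
    has been dormant longer than `dormancy` is treated as likely spurious.
    """
    if last_event_at is None:
        return True  # no history to judge by, so play it safe
    return now - last_event_at < dormancy
```

When `needs_resync` returns true, the safe recovery is to re-fetch the current state of the resource the webhook watches rather than trying to replay individual events.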

Hey everyone, my name is Andrew and I’m part of the Developer Relations team here at Asana. Just wanted to share a quick update on the incident from last week.

Last Friday morning, some users reported sync errors affecting their webhooks. A sync error is caused at the event delivery stage when we know there are events that we need to deliver to your webhook target, but for some reason we can’t find the events we want to deliver on that webhook’s cursor.

Our team looked into the incident, and pinpointed the source to a specific deployment made earlier that day. In particular, this deployment included an unintended side effect where the cursor ID that we read events from for delivery did not sync with the ID that we wrote events to. Our team rolled back the deployment early Friday afternoon, but is still investigating the issue.
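
As a purely illustrative toy model of this failure mode (not Asana's actual implementation; every name here is made up for the example), the sketch below writes events under one cursor ID and reads them under another, so delivery finds nothing even though events are known to be pending.

```python
class EventStore:
    """Toy in-memory store that keeps pending events per cursor ID."""

    def __init__(self):
        self._queues = {}

    def write(self, cursor_id, event):
        self._queues.setdefault(cursor_id, []).append(event)

    def read(self, cursor_id):
        # Remove and return everything stored under this cursor ID.
        return self._queues.pop(cursor_id, [])


def deliver(store, read_cursor_id, pending_count):
    """Return the events to deliver, or a sync_error marker.

    If events are known to be pending but none are found under the
    cursor being read, report a sync error instead of delivering nothing.
    """
    events = store.read(read_cursor_id)
    if pending_count > 0 and not events:
        return [{"action": "sync_error"}]
    return events
```

In this model the events are not lost: once reads use the same cursor ID as writes again, the next delivery picks them up.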

I sincerely apologize for the disruptions that this incident has caused. As I receive the full report from the team (including impact and actions for further mitigation), I’ll be sure to share it with you all. Thank you for your patience, and as always, feel free to continue sharing your comments and feedback here in this thread.

Thank you for your patience everyone. Just swinging by again to follow up on my post from last week.

During the initial investigation, our team found that a small percentage of our webhook delivery calls gave sync errors during the incident. This meant that events weren’t delivered properly and were otherwise “missing.” Note that since we had rolled back the deployment that Friday afternoon, it’s possible that you may have seen the missing events in the next sync of each webhook.

Our team is currently having a retrospective on the incident, with the goal of creating action items and process changes to:

  1. Identify the issue sooner (and roll back any problems faster)
  2. Prevent issues like this from happening in the first place

Again, thank you everyone for your patience in all this! Please keep sharing your experiences as we work to optimize our webhooks’ durability and performance.