Has anyone set up an ETL from Asana to a data warehouse?

I’m looking to set up a method to extract all domain data from Asana and load it into an external data warehouse. I would want all users, projects, portfolios, tasks, comments, messages, and goals. I would also want all field values (don’t need the gid for enum options, just the value) for all objects. I do not need synchronous data, but do want to update it at least 1x/day.

So far, I’m thinking about two options, but curious if I am missing something or if anyone has experience setting this up:

1. Use Domain Export

I have thought about setting up a service to request, await, and then read/transform data from the domain export. My main issues with this are:

  1. Can’t send a datetime with the request to filter for new info, so I’d be processing a large file every time. This isn’t so much a concern on the transform/load side, but as our organization continues to grow, I assume the generation of the file will get slower (it’s already taking ~1-2 hrs).
  2. It seems like there’s a cooldown period for requesting the export. I’ve only done it manually in-platform so far, but it locks me from requesting again for 5 days.
  3. The data is a bit messy and I haven’t found any documentation about how to read the data (I’ve figured out some of it, but the model is different compared to what I’m used to when calling the API).

2. Listening to event stream

I’ve also thought about calling the event endpoint on our workspace once a day and just processing all the day’s events. This seems like the most efficient option (i.e., I’d do one domain export, transform/load all that data into my warehouse, and then call the event endpoint daily and process individual events that way going forward).

That said, I’m worried that I won’t get all the data I want (from my first paragraph) or I’ll have to do a lot of work to maintain this (vs. just scripting the processing of the export). E.g., will I have to listen for changes on the workspace, then take certain of those changes (e.g., project added, goal added, etc.) and chain together other calls to get events on those resources. I do see this in the documentation:

For example, a subscription to a project will contain events for tasks contained within the project.

I’m unclear if that means that I can just set up one listener for the workspace and have it return all events on any resource contained in that workspace (e.g., projects, their tasks, subtasks, sub-subtasks, etc.).

Thanks in advance for the guidance!

1 Like