Question about search api + paging

We plan to use your search API for very specific queries, like “unassigned tasks” or “floating tasks” (not in any project).

My question is…

Can I trust “sort_by”?
You mention that we can use the sort_by parameter, and that the default value is “modified_at”.
I tried it, and here are some of the results I got:
(query 1)
“2019-07-12T06:40:56.938Z”
“2019-07-12T06:40:54.111Z”
“2019-07-12T06:40:51.856Z”
“2019-07-12T06:40:52.185Z” <---- ???
“2019-07-12T06:40:39.872Z”

So, it looks like I can’t use that method for paging.

You also mention that we should use created_at for paging, instead of modified_at.
I tried it with the same data, it looks ok.
(query 2)
“2019-07-12T06:40:54.111Z”
“2019-07-12T06:40:52.550Z”
“2019-07-12T06:40:52.185Z”
“2019-07-12T06:40:51.856Z”
“2019-07-12T06:40:49.754Z”

Is it always “safe” to use created_at? I tried a few queries and the results look fine, but I’m not 100% sure they always will be.

I really prefer to use the modified_at, as I can compare with my local cache and stop querying when I reach an in-cache task. I can’t do that using created_at ordering.

Did I find a bug, or is it a known issue that we must live with?

In short: yes, it is always safe to use created_at to “paginate” in the search API.

The apparent discrepancy in modified_at arises because we actually have two “modified at” values per task. There is the raw modified_at field we expose in the API, which reflects the last time the object was updated in the database. Additionally, there’s a “user visible modification time”, which reflects the last time the object was updated in a way that would appear as a change to users of the Asana web app. The two exist because certain changes to objects are not directly visible, and so aren’t useful when web app users want to know what’s been modified recently.

Our search API operates the same way as our web app advanced search feature, and the web app search uses the “user visible modification time” which is not always in sync with the true “modified at” time. What’s happening here is that the search cluster has filtered and ordered the results by one of these timestamp fields, but the API has returned the other timestamp in the JSON object, giving the appearance of out-of-order/inconsistent results.
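
To make the mismatch concrete, here is a minimal sketch of the effect described above. It is not Asana's implementation: the field name `userVisibleModifiedAt` and its values are made up for illustration, while the modified_at values are the ones from query 1 in the question.

```javascript
// Hypothetical data. `userVisibleModifiedAt` is an assumed name for the
// internal timestamp; in this sketch only `modified_at` is returned to callers.
const tasks = [
  { gid: "5", modified_at: "2019-07-12T06:40:39.872Z", userVisibleModifiedAt: "2019-07-12T06:40:39.872Z" },
  // An internal-only update bumped modified_at after the last user-visible edit.
  { gid: "4", modified_at: "2019-07-12T06:40:52.185Z", userVisibleModifiedAt: "2019-07-12T06:40:50.000Z" },
  { gid: "3", modified_at: "2019-07-12T06:40:51.856Z", userVisibleModifiedAt: "2019-07-12T06:40:51.856Z" },
  { gid: "2", modified_at: "2019-07-12T06:40:54.111Z", userVisibleModifiedAt: "2019-07-12T06:40:54.111Z" },
  { gid: "1", modified_at: "2019-07-12T06:40:56.938Z", userVisibleModifiedAt: "2019-07-12T06:40:56.938Z" },
];

// Search side: order by the hidden timestamp, newest first.
// ISO 8601 UTC strings in the same format compare correctly as plain strings.
const searchOrder = [...tasks].sort((a, b) =>
  b.userVisibleModifiedAt.localeCompare(a.userVisibleModifiedAt)
);

// API side: only modified_at is serialized into the response.
const returned = searchOrder.map((t) => t.modified_at);

// True when every value is less than or equal to the one before it.
function isDescending(values) {
  return values.every((v, i) => i === 0 || values[i - 1] >= v);
}

console.log(returned);
console.log(isDescending(returned)); // false: 06:40:52.185Z follows 06:40:51.856Z
```

The list is correctly ordered by the key the search used, but looks out of order when judged by the only timestamp the caller can see.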

Do you plan any changes to this mechanism? I want to use modified_at for pagination, because if I use created_at I will always have to fetch all the tasks from scratch, and I have thousands of them.
If I could use modified_at for pagination, I would only fetch the tasks changed since the last query date, not all the tasks in the projects.

Do you have any news here?

Hi @Nikita_Popov,

No, we do not plan to change this. However, you can both paginate by created_at and filter by modified_at in the same request. This will allow you to perform stable pagination through the search results and see only things that have changed recently.

OK, that is very sad.
I am not sure that your option will help me; I would still need to fetch all the tasks. It will not return them in the order I need: if I modify a task, it will not move within the results, because the sort is on created_at and the task keeps its original position. So I need to fetch all the tasks, as before.

Hi @Joe_Trollo,

IIUC the API generally doesn’t return the “user visible modification time” at all, but only the internal modification time? Has it been considered to add it to the task model?

I think this might make sense as at least in some cases API callers require similar behavior to that of a web client user (not being interested in internal updates but only user visible ones).

Also, I have a similar problem to the one @Nikita_Popov mentions in this thread, and having such a field (which matches the search modified_at order) would solve it.

BTW- which type of “modified_at” does the search filter use? Same one as ordering (user visible)?

Please let me know if I understood anything incorrectly…

Thanks!

Hi.
Have your plans regarding proper pagination on modified_at changed?
It is very inconvenient to paginate through 50000+ tasks just to get the most recently changed ones in the right order.

CC’ing @Ross_Grambo here; Joe is no longer with Asana.

Hey @Nikita_Popov,

No changes have been made here. I may be out of the loop, but it seems like using events to track which tasks have been updated may be a better approach here.

Out of curiosity, what are these changes to objects that are not directly visible? Can you give an example?

cc: @Ross_Grambo as Joe is no longer with Asana.

These would be things that are irrelevant for an API user. One example is an item’s rank, which determines how it’s ordered in a given list or board. We consider ranks to be an implementation detail.

An item’s rank could change when someone drags a task, or when a different task is added that causes ranks to be re-shuffled. While the object was not directly edited, it was still modified by internal processes.

To close the loop on this thread, I think Joe’s suggestion of sorting by created_at and filtering on modified_at is the right approach here. Here’s some pseudo code that hopefully gives the right impression:

// Only look at items modified since the previous sync
const last_modified_at = global.previous_sync;
// The timestamps in this thread are ISO 8601 strings, so use toISOString() rather than Date.now()
global.previous_sync = new Date().toISOString(); // Reset sync before waiting on requests
let pagination_created_at = new Date().toISOString(); // Start paging from now

// Paginate over created_at (using the last item's created_at)
while (pagination_created_at != null) {
  const modified_items = request("/search?modified_at.after=" + last_modified_at + "&created_at.before=" + pagination_created_at + "&sort_by=created_at");
  for (const item of modified_items) {
    ...
  }

  // If there are no items left, we're done paging
  if (modified_items.length == 0) {
    pagination_created_at = null;
  } else {
    pagination_created_at = modified_items[modified_items.length - 1].created_at;
  }
}

// To get unmodified items, use the same logic as above but with:
// "modified_at.before=" + last_modified_at

Hope this is helpful to anyone who wasn’t sure how to page on created_at!

Sorting by created_at and filtering by modified_at is neither strict nor stable.
You can fetch the first 100 results under these conditions and process them, and while you are processing them somebody changes another task (not in the current list) whose created_at value is earlier than any in this list.
So what will you get?
The first task under these conditions will be the new one, and the whole pagination will shift by one. So when you fetch the second page, it will contain one task that you have already processed. Not good.
Sorting on modified_at is the most logical, convenient, and useful option, and all services offer it. I do not understand why you cannot, or do not want to, do this.

This would not cause an issue here, as you’re filtering by modified_at.after={a snapshot in time}. If someone edits a task, that task will be moved to modified_at.before={a snapshot in time}. You can then do a modified_at.before at the end and de-dupe/catch those live edits if you want to.
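
A rough sketch of why a created_at cursor does not shift pages the way an offset would. Everything here is hypothetical: `makeFakeSearch` is an in-memory stand-in rather than the real search endpoint, the page size of 2 and the modified_at values are made up, and the created_at values are borrowed from query 2 in the question. Results are also keyed by gid as a simple de-dupe safety net.

```javascript
// Hypothetical in-memory stand-in for the search endpoint (not the real API).
// It filters on modified_at, pages on created_at, and returns newest first.
function makeFakeSearch(store, pageSize) {
  return ({ modifiedAfter, createdBefore }) =>
    store
      .filter((t) => t.modified_at > modifiedAfter && t.created_at < createdBefore)
      .sort((a, b) => b.created_at.localeCompare(a.created_at))
      .slice(0, pageSize)
      .map((t) => ({ ...t }));
}

// Walk the results with a created_at cursor and keep one entry per gid.
function collectModified(search, modifiedAfter, startBefore, afterEachPage = () => {}) {
  const byGid = new Map();
  let fetched = 0;
  let cursor = startBefore;
  while (cursor !== null) {
    const page = search({ modifiedAfter, createdBefore: cursor });
    fetched += page.length;
    for (const task of page) byGid.set(task.gid, task); // de-dupe by gid
    // The cursor only moves backwards in time, so earlier items never reappear.
    // Caveat: tasks sharing the exact same created_at as the cursor are skipped.
    cursor = page.length === 0 ? null : page[page.length - 1].created_at;
    afterEachPage();
  }
  return { tasks: [...byGid.values()], fetched };
}

const store = [
  { gid: "t1", created_at: "2019-07-12T06:40:54.111Z", modified_at: "2019-07-12T07:00:00.000Z" },
  { gid: "t2", created_at: "2019-07-12T06:40:52.550Z", modified_at: "2019-07-12T07:10:00.000Z" },
  { gid: "t3", created_at: "2019-07-12T06:40:52.185Z", modified_at: "2019-07-12T07:20:00.000Z" },
  { gid: "t4", created_at: "2019-07-12T06:40:51.856Z", modified_at: "2019-07-12T06:40:51.856Z" },
  { gid: "t5", created_at: "2019-07-12T06:40:49.754Z", modified_at: "2019-07-12T07:30:00.000Z" },
];

// Simulate a live edit: after the first page is fetched, someone edits t4,
// which was created earlier than anything on that page.
let edited = false;
const liveEdit = () => {
  if (!edited) {
    store[3].modified_at = "2019-07-12T08:00:00.000Z";
    edited = true;
  }
};

const result = collectModified(
  makeFakeSearch(store, 2),
  "2019-07-12T06:45:00.000Z", // previous sync time
  "2019-07-13T00:00:00.000Z", // start paging from "now"
  liveEdit
);
console.log(result.tasks.map((t) => t.gid), result.fetched);
```

In this run the edited task simply shows up once on a later page, and nothing already processed is returned again. An edit to a task created later than the current cursor would be missed by this pass; it would only be picked up on a later sync whose modified_at.after value is older than that edit.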

The ideal way to manually page is to ensure your filtering windows are small enough and your results are never over 100 items in the first place.
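
One way to keep windows small, sketched under assumptions: split a created_at window in half until each window returns fewer than the cap. `makeWindowSearch` is a hypothetical in-memory stand-in, the cap of 100 is taken from the figure mentioned in this thread, and mapping the window bounds onto created_at.after / created_at.before query parameters is an assumption for illustration.

```javascript
const LIMIT = 100; // assumed cap on results per request

// Hypothetical stand-in for a search restricted to createdMs in [fromMs, toMs).
// Like a capped endpoint, it silently truncates at `limit` items.
function makeWindowSearch(store, limit) {
  return (fromMs, toMs) =>
    store.filter((t) => t.createdMs >= fromMs && t.createdMs < toMs).slice(0, limit);
}

// Halve the window until each request returns fewer than `limit` items,
// so that no window can have been truncated.
function fetchWindow(search, fromMs, toMs, limit) {
  const items = search(fromMs, toMs);
  // A 1 ms window cannot be split further; its result may still be truncated.
  if (items.length < limit || toMs - fromMs <= 1) return items;
  const mid = Math.floor((fromMs + toMs) / 2);
  return [
    ...fetchWindow(search, fromMs, mid, limit),
    ...fetchWindow(search, mid, toMs, limit),
  ];
}

// 250 made-up tasks, one created per second.
const base = Date.UTC(2019, 6, 12, 6, 0, 0);
const allTasks = Array.from({ length: 250 }, (_, i) => ({
  gid: String(i),
  createdMs: base + i * 1000,
}));

const fetchedTasks = fetchWindow(makeWindowSearch(allTasks, LIMIT), base, base + 250000, LIMIT);
console.log(fetchedTasks.length);
```

The trade-off is extra requests: every window that hits the cap is fetched once and then discarded in favour of its two halves.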

We do allow sorting on modified_at, and you’re free to use it if you want to (note that the sorting uses the user_visible_modification_time timestamp). We’re simply saying that we don’t recommend sorting by modified_at, as it’s difficult to get stable results when people live-edit mid-pagination.

I think the question is more: What API functionality do you expect if something is changed mid-pagination? Different APIs handle this in different ways. We haven’t done anything special to handle this use case, but I’d love to hear if you have a suggestion for what that should look like.