Hi all,
I'm implementing a data sync using Zendesk's Incremental Export APIs (specifically the cursor-based version), and I want to confirm that my approach is sound. I'm primarily interested in making the incremental sync recoverable: if everything blows up in the middle of paging over incremental requests and I lose a valid cursor, I want to be able to fall back to a timestamp. My idea is to use the timestamp recorded right before a successful end_of_stream response as a recovery point.
Here's the pattern I'm thinking of:
Initial Full Sync:
Before fetching any data at all, I record a start_time timestamp. I then perform a full sync using the standard REST APIs to pull all existing records.
Recording the timestamp before any requests means I'll catch anything that changes during the full sync.
First incremental sync or recovery from timestamp (no cursor):
I start the first incremental sync using either last_recovery_time or start_time, and loop through the results until end_of_stream.
If no records are returned: I can retry using last_recovery_time or start_time (the API doesn’t return a cursor in this case).
If there’s an error: I can retry from the same timestamp.
If successful: I store the after_cursor and record a valid_recovery_time just before the successful end_of_stream response.
With cursor:
I use the stored cursor to continue syncing until end_of_stream.
On failure: I discard the cursor and can fall back to valid_recovery_time, or start_time if no recovery time is available.
On success: I store the new after_cursor and record a valid_recovery_time just before the successful end_of_stream response.
I'll probably subtract a few extra minutes from the timestamps as a buffer, especially in the case of an error recovery.
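To make this concrete, here's a rough Python sketch of the loop I have in mind. The endpoint, credentials, and process_records are placeholders, and the state handling is simplified:

```python
import time
import requests

# Hypothetical endpoint/credentials; the recovery flow is what matters here.
URL = "https://example.zendesk.com/api/v2/incremental/tickets/cursor.json"
AUTH = ("user@example.com/token", "api_token")
BUFFER_SECONDS = 300  # pad recovered timestamps backwards a few minutes

def incremental_sync(state):
    """One incremental pass; `state` holds after_cursor, valid_recovery_time, start_time."""
    if state.get("after_cursor"):
        params = {"cursor": state["after_cursor"]}
    else:
        recovery = state.get("valid_recovery_time") or state["start_time"]
        params = {"start_time": recovery - BUFFER_SECONDS}

    while True:
        # Timestamp taken just before the request: if this turns out to be the
        # successful end_of_stream response, it becomes the new recovery point.
        pre_request_time = int(time.time())
        try:
            resp = requests.get(URL, params=params, auth=AUTH, timeout=30)
            resp.raise_for_status()
        except requests.RequestException:
            state["after_cursor"] = None  # discard; next pass falls back to timestamp
            raise

        page = resp.json()
        process_records(page["tickets"])  # placeholder for the export step

        if page["end_of_stream"]:
            if page.get("after_cursor"):  # no cursor comes back for empty results
                state["after_cursor"] = page["after_cursor"]
            state["valid_recovery_time"] = pre_request_time
            return
        params = {"cursor": page["after_cursor"]}
```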
Is this a correct way to recover? Are there any pitfalls or edge cases I should be aware of?
Thanks!
It does sound like you have a solid understanding of the Incremental Exports API with cursor-based pagination.
However, we do not recommend continuously syncing with the Incremental Exports API. The intended usage is: make the initial request, grab all data from the start time up to one minute before the request time, and export that data accordingly. Then, periodically or when needed, make another call for more data.
Full process: make the initial call with start_time > receive the first page of up to 1000 results > record the after_cursor value > check the returned end_of_stream value > if false, make the next call with the cursor; if true, end the loop and return.
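In code, that loop looks roughly like this (a minimal sketch for a tickets export; the credentials and the export step are hypothetical):

```python
import requests

url = "https://example.zendesk.com/api/v2/incremental/tickets/cursor.json"
auth = ("user@example.com/token", "api_token")  # hypothetical credentials
params = {"start_time": 1665655380}  # initial call uses a Unix start_time

while True:
    page = requests.get(url, params=params, auth=auth, timeout=30).json()
    export(page["tickets"])                    # placeholder export step
    if page["end_of_stream"]:                  # true: caught up, end the loop
        break
    params = {"cursor": page["after_cursor"]}  # false: fetch the next page
```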
As for error handling, there also isn't necessarily a way to buffer: if the start_time you provide is 10/13/22 10:03:00 AM and you're making that call on 4/7/25 at 11:06:32 AM, but an update was made at 11:06:00 AM, you will not get that update. For both tickets and ticket events, no updates within one minute of the request time are returned. This behavior is intentional, to prevent race conditions.
If there is an issue while exporting, I would recommend just reusing the last after_cursor value. Otherwise, as you said, you will have to look at the data you have successfully exported, record the time value of the last result received, and make an entirely new initial request with that as your start_time.
If you're really worried about having an issue in the middle of a 1000-result page, you could also change how many results you receive per page (the per_page parameter); 1000 is just the default and the maximum.
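Putting both of those together, a sketch of the cursorless recovery (exported_tickets stands for whatever records you've already saved):

```python
from datetime import datetime

# Recovering without a cursor: take the updated_at of the last record that was
# successfully exported and use it as the start_time of a fresh initial call.
last_updated = max(t["updated_at"] for t in exported_tickets)  # e.g. "2025-04-07T15:06:00Z"
start_time = int(datetime.fromisoformat(last_updated.replace("Z", "+00:00")).timestamp())

# per_page shrinks the page size so a mid-page failure loses fewer results
# (1000 is both the default and the maximum).
params = {"start_time": start_time, "per_page": 200}
```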