First, this is the wrong section of the community; you want this one: monday Apps & Developers - monday Community Forum
One thing you can do: make the first request with items_page, then for subsequent requests use next_items_page, which skips resolving the whole board and returns items directly. This is faster than re-resolving the board each time.
Depending on the size of your boards, you could also return just the item ID for each item (fetching the full limit of 500 is fine when it's just the ID; low risk of errors there), then spin up workers that fetch batches of 100 of those 500 IDs with a simple items
query (not inside board.items_page). You can fetch the second page of IDs while the workers are fetching item details. Since you don't want an infinite pool of workers anyway, your only performance "cost" is the short delay of fetching the first page of IDs. I suspect getting your second page of IDs will take less time than fetching 100 items, so at that point your queue is unlikely to drain.
The advantage of the second strategy over your current one is that you also don't need any of your logic for junk requests or for cancelling at the end - you'll only be fetching valid items, so it may be simpler in that regard.
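A rough sketch of that two-stage idea (shown here in Python asyncio; the fetch functions are fake stand-ins for the real monday.com API calls, and the cursor arithmetic only simulates pagination - adapt to your platform):

```python
# Sketch of the two-stage pipeline: one producer walks the ID cursor,
# a pool of workers fetches item details in batches of 100.
# fetch_id_page / fetch_items are HYPOTHETICAL stand-ins for real API calls.
import asyncio

PAGE_SIZE = 500   # IDs returned per cursor page
BATCH_SIZE = 100  # items requested per details query

ALL_IDS = [str(n) for n in range(1, 4001)]  # simulate a ~4000-item board

async def fetch_id_page(cursor):
    """Stand-in for items_page / next_items_page: returns (ids, next_cursor)."""
    start = cursor or 0
    page = ALL_IDS[start:start + PAGE_SIZE]
    nxt = start + PAGE_SIZE
    return page, (nxt if nxt < len(ALL_IDS) else None)

async def fetch_items(ids):
    """Stand-in for the items(ids: [...]) query: returns item dicts."""
    return [{"id": i} for i in ids]

async def run_pipeline(workers: int = 5):
    queue: asyncio.Queue = asyncio.Queue()
    results = []

    async def worker():
        while True:
            batch = await queue.get()
            if batch is None:        # sentinel: no more work
                queue.task_done()
                return
            results.extend(await fetch_items(batch))
            queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]

    # Producer: queue 100-ID batches as each page of IDs arrives, so the
    # workers fetch details while the next page of IDs is being fetched.
    cursor = None
    while True:
        ids, cursor = await fetch_id_page(cursor)
        for i in range(0, len(ids), BATCH_SIZE):
            queue.put_nowait(ids[i:i + BATCH_SIZE])
        if cursor is None:
            break

    for _ in tasks:
        queue.put_nowait(None)       # one sentinel per worker
    await queue.join()
    await asyncio.gather(*tasks)
    return results

fetched = asyncio.run(run_pipeline())
```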
This is a great idea, thanks. The primary board is not absurdly big, but it's still fairly large at ~4000 items. I definitely want to try your second idea, and I like that it eliminates the extraneous requests.
I much appreciate you finding this in the wrong section and helping out.
Glad that makes sense to you.
I'd combine both strategies: next_items_page with just IDs will be faster and also costs less API complexity than items_page, so you will hopefully be able to make more requests per minute. (I assume you may run into complexity limits.)
Yes, that is what I ended up doing. The initial board query spawns a request for each items_page, fetching item IDs only and marching through the cursors 500 IDs at a time. These requests then spawn batches of items requests, each asking for 100 explicit item IDs. All of these requests are fed into a worker pool asynchronously. The cursor requests stay just ahead of the item-details requests.
Overall, it is faster than the method I had before, and I think much cleaner as well logically. Thanks for the suggestion!
Was this in Node.js? Just curious about platform details. I also couldn't tell if you used next_items_page to get subsequent pages.
It's in Python, all async/await, no threads. Yes, I'm using next_items_page. The first query requests a board_id with info about groups, columns, and 500 items via items_page:
boards (ids: [{boards}]) {{
name
id
columns{{
id
title
type
}}
groups {{
id
title
}}
items_page (limit: {limit}){{
cursor
items {{
id
}}
}}
}}
Then subsequent requests are spawned off via next_items_page:
next_items_page (limit:{limit}, cursor: "{cursor}") {{
cursor
items {{
id
}}
}}
Meanwhile, the item IDs retrieved from items_page in the first request, and from the next_items_page requests, get fed directly into queries to the items endpoint:
items (ids: [{items}], limit: {limit}) {{
id
name
group {{ id }}
column_values {{
id
text
value
}}
}}
Ignore the double curly braces; they come from the string templates I made for my queries in Python. They're needed to escape the braces, since Python uses single braces for template variables.
Yup, I get the {{}} part. That said, I'd research GraphQL variables. You can create a plain (not template) string and just pass a variables object along with the query, and the server takes care of substitution. Saves you all the heartache of trying to escape and quote things.
But I'm not sure your use case is big enough to warrant this, since your only variable data is the boards and items, and boards is just a single ID.
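For illustration, here's roughly what the variables approach looks like (the query shape mirrors the items query above; the `[ID!]` variable type and payload layout are assumptions to check against the monday docs - only the JSON body construction is shown, not the HTTP call):

```python
# Sketch: the items query rewritten with GraphQL variables instead of
# Python string templates. No brace escaping needed; the server does the
# substitution. The [ID!] type is an assumption to verify against the docs.
import json

ITEMS_QUERY = """
query ($ids: [ID!], $limit: Int) {
  items (ids: $ids, limit: $limit) {
    id
    name
    group { id }
    column_values { id text value }
  }
}
"""

def build_payload(item_ids, limit):
    # POST this dict as JSON; $ids and $limit are filled in server-side.
    return {"query": ITEMS_QUERY, "variables": {"ids": item_ids, "limit": limit}}

payload = build_payload(["123", "456"], 100)
body = json.dumps(payload)  # the JSON body you'd send to the API endpoint
```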
FYI, right now IDs are numeric strings from the API and can be provided as an Int in the query. But I wouldn't count on this forever - I suspect IDs will eventually be required to be true strings. May as well write with that assumption in mind today.