Lesson 28: Updating large sets of data
Sometimes we need to update thousands of resources in a single action, as expressed in the following comment (posted on a community group about WordPress):
I find that for a lot of clients I'm working with large sets of data (10,000+ product variations for 1 product, or 13,000+ media files) ... inevitably the clients want to be able to bulk edit lots of things at once - like tag 2000 media files with the same tag.
In this tutorial lesson we will explore ways to tackle this task.
If updating thousands of resources at once makes the system crash, the solution is simple: instead of executing the GraphQL query just once for thousands of resources, we can execute it hundreds of times, for dozens of resources each time.
The following bash script first finds the total number of comments via commentCount, then divides that total into segments of the size given by the env var $ENTRIES_TO_PROCESS, calculates the pagination parameters for each segment, and executes the GraphQL query once per segment (simply retrieving the comments in that segment):
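As a minimal sketch of the segment arithmetic only (not the full script), the calculation might look like the following. It assumes the total has already been fetched via commentCount and is passed in as an argument; the function names segment_offsets, segment_count and segment_variables are hypothetical, chosen for illustration. It is written in POSIX-compatible shell, so it also runs under bash.

```shell
# Sketch only. The total would normally come from the commentCount field;
# here it is passed in as the first argument. Function names are hypothetical.

# Print the offset of each segment, one per line.
segment_offsets() {
  so_total=$1
  so_limit=$2
  # A zero or negative segment size would never advance the offset.
  if [ "$so_limit" -le 0 ]; then
    echo "segment size must be a positive integer" >&2
    return 1
  fi
  so_offset=0
  while [ "$so_offset" -lt "$so_total" ]; do
    echo "$so_offset"
    so_offset=$((so_offset + so_limit))
  done
}

# Print the number of segments: the ceiling of total / limit.
segment_count() {
  if [ "$2" -le 0 ]; then
    echo "segment size must be a positive integer" >&2
    return 1
  fi
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Print one JSON object of pagination variables per segment.
segment_variables() {
  segment_offsets "$1" "$2" | while read -r sv_offset; do
    printf '{"limit":%d,"offset":%d}\n' "$2" "$sv_offset"
  done
}
```

For example, segment_variables "$TOTAL" "$ENTRIES_TO_PROCESS" with a total of 10 and a segment size of 4 prints three lines, with offsets 0, 4 and 8. Each line could then be sent as the variables of one request (for instance with curl) to the site's GraphQL endpoint, which is site-specific and therefore left out of the sketch.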
Because the solution above involves bash scripting, it must be executed via the CLI (or some admin panel or tool), limiting its use.
We can replicate the same logic in the GraphQL query itself, which allows us to execute it directly within WordPress (and even to store it as a GraphQL Persisted Query).
The GraphQL query below executes itself recursively. When first invoked, it:
Divides the total number of resources to update into segments (calculated using the provided $limit variable)
Executes itself via a new HTTP request for each of the segments (passing over the corresponding $offset as a variable), thus updating only a subset of all resources at a given time
The GraphQL query is recursive because the HTTP requests point to the same URL as the current request, with the $offset variable for that segment added. The URL, together with the body, method and headers, is retrieved from the current HTTP request (via the HTTP Request via Schema extension).
The $async argument passed to _sendHTTPRequests has been set to false, so that the HTTP requests are executed one after the other. In addition, the optional variable $delay lets us indicate how many milliseconds to wait before sending each request.
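The same "one after the other, with a pause" behavior can be illustrated outside GraphQL with a small shell sketch. This is not how _sendHTTPRequests is implemented; it only mirrors the idea. The function name run_segments_sequentially is hypothetical, the handler is any command that accepts the offset as its last argument, and the sketch assumes a sleep that accepts fractional seconds (GNU and BSD sleep do, while POSIX only guarantees whole seconds).

```shell
# Sketch only: call a handler once per segment, strictly in sequence,
# waiting delay_ms milliseconds before each call.
# Usage: run_segments_sequentially TOTAL LIMIT DELAY_MS HANDLER [ARGS...]
run_segments_sequentially() {
  rs_total=$1
  rs_limit=$2
  rs_delay_ms=$3
  shift 3
  if [ "$rs_limit" -le 0 ]; then
    echo "segment size must be a positive integer" >&2
    return 1
  fi
  # Convert milliseconds to a decimal number of seconds, e.g. 1500 -> 1.500.
  rs_delay_s=$(printf '%d.%03d' $((rs_delay_ms / 1000)) $((rs_delay_ms % 1000)))
  rs_offset=0
  while [ "$rs_offset" -lt "$rs_total" ]; do
    if [ "$rs_delay_ms" -gt 0 ]; then
      sleep "$rs_delay_s"
    fi
    # The next segment starts only after this call has returned.
    # Stopping at the first failure is a choice made for this sketch.
    "$@" "$rs_offset" || return 1
    rs_offset=$((rs_offset + rs_limit))
  done
}
```

For example, run_segments_sequentially 13000 50 500 echo "would update from offset" prints one line per segment, half a second apart; replacing echo with a command that sends the actual request turns the dry run into a real update.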
Once all resources have been updated, the execution of the GraphQL query reaches the end and terminates: