Checkpoints
Checkpoints are a simple yet powerful way of storing state between stages in a query, or between multiple runs of a query.
Usage
Setting a checkpoint
... --checkpoint "api_next_page:result.nextPage"
Using a checkpoint
$CHECKPOINTS.{CHECKPOINT NAME}$
You can access any checkpoint value using the token format above. The token refers to a checkpoint that was set with the --checkpoint flag.
api get https://www.{AUTHENTICATED-API}.com?nextPage=$CHECKPOINTS.api_next_page$
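To make the token format concrete, here is a minimal Python sketch that models how a checkpoint token could be expanded before a query runs. It is an illustration only, not the tool's implementation: the function name expand_checkpoints, the in-memory dictionary, and the set of characters allowed in a checkpoint name are all assumptions. An unset checkpoint expands to an empty string, matching the first-run behavior described under Scheduled Queries.

```python
import re

# Matches tokens of the form $CHECKPOINTS.name$.
# Assumption: names consist of letters, digits, and underscores.
TOKEN = re.compile(r"\$CHECKPOINTS\.([A-Za-z0-9_]+)\$")


def expand_checkpoints(query: str, checkpoints: dict) -> str:
    """Replace each checkpoint token with its stored value, or "" if unset."""
    return TOKEN.sub(lambda m: str(checkpoints.get(m.group(1), "")), query)


query = "api get https://www.{AUTHENTICATED-API}.com?nextPage=$CHECKPOINTS.api_next_page$"

# With a stored checkpoint, the token is replaced by its value.
print(expand_checkpoints(query, {"api_next_page": "42"}))

# With no stored checkpoint, the token expands to an empty string.
print(expand_checkpoints(query, {}))
```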
Use cases
Scheduled Queries
Checkpoints are useful for avoiding duplicate work in scheduled queries. A scheduled query can store a value from a stage's results as a checkpoint, which is then used in the next scheduled run of the query. Examples of checkpoint values include the last visited page, a timestamp, or a hash.
Example:
api get https://www.{AUTHENTICATED-API}.com?nextPage=$CHECKPOINTS.api_next_page$ --checkpoint "api_next_page:result.nextPage"
The first time the query runs, the nextPage value will be empty. The --checkpoint flag then sets a checkpoint named api_next_page to the value of the result.nextPage column in the first row of results. The next time the query runs, the checkpoint is retrieved using $CHECKPOINTS.api_next_page$, and nextPage uses the checkpoint value.
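The run-to-run behavior described above can be modeled with a short Python sketch. This is a simulation under stated assumptions, not the product's code: fake_api stands in for the authenticated API and is purely hypothetical, apply_checkpoint_flag is an invented helper that mimics the "name:column" form of the --checkpoint flag, and the store dictionary stands in for whatever persistence the scheduler provides.

```python
def apply_checkpoint_flag(spec: str, rows: list, store: dict) -> None:
    """Model --checkpoint "name:column": save the column's value from the first row."""
    name, column = spec.split(":", 1)
    if rows and column in rows[0]:
        store[name] = rows[0][column]


def fake_api(next_page: str) -> list:
    """Hypothetical paged API: returns one row holding an item and the next page pointer."""
    page = int(next_page) if next_page else 1  # an empty pointer means "start at page 1"
    return [{"result.item": f"item-{page}", "result.nextPage": str(page + 1)}]


store = {}  # stands in for checkpoint storage that persists between scheduled runs
seen = []

for run in range(3):
    # Equivalent of $CHECKPOINTS.api_next_page$: empty on the first run.
    next_page = store.get("api_next_page", "")
    rows = fake_api(next_page)
    seen.append(rows[0]["result.item"])
    apply_checkpoint_flag("api_next_page:result.nextPage", rows, store)

print(seen)   # each run fetched a different page
print(store)  # the pointer the next scheduled run would start from
```

Because each run starts from the pointer saved by the previous run, no page is fetched twice.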
Between Stages
Although there are other ways to maintain state across stages (see --enrich, --labelStage, --appendStage, etc.), checkpoints provide a simple way to set a value from a stage's results and access that value in a later stage.
Example:
api get https://www.{AUTHENTICATED-API}.com?nextPage=$CHECKPOINTS.api_next_page$ --checkpoint "api_next_page:result.nextPage"
|| ...
|| ...
|| addcolumn api_next_page $CHECKPOINTS.api_next_page$