Checkpoints

Checkpoints are a simple yet powerful way of storing state between stages in a query, or between multiple query runs.

Usage

Setting a checkpoint

... --checkpoint "api_next_page:result.nextPage"

Using a checkpoint

$CHECKPOINTS.{CHECKPOINT NAME}$

You can access any checkpoint value with the token format above, where {CHECKPOINT NAME} is the name given to the checkpoint when it was set with the --checkpoint flag.

api get https://www.{AUTHENTICATED-API}.com?nextPage=$CHECKPOINTS.api_next_page$
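
To make the token substitution concrete, the following Python sketch models how a $CHECKPOINTS.{CHECKPOINT NAME}$ token might be expanded before a stage runs. It is an illustration only, not the product's implementation: the function name expand_checkpoints, the dictionary standing in for the checkpoint store, the set of characters allowed in a checkpoint name, and the api.example.com URL are all assumptions. An unset checkpoint expands to an empty string here, in line with the first-run behavior described under Scheduled Queries.

```python
import re

# Assumption: checkpoint names consist of letters, digits, and underscores.
TOKEN = re.compile(r"\$CHECKPOINTS\.([A-Za-z0-9_]+)\$")


def expand_checkpoints(text, store):
    """Replace each $CHECKPOINTS.name$ token with its stored value.

    `store` is a hypothetical dict standing in for wherever checkpoints are kept.
    A checkpoint that has not been set expands to an empty string.
    """
    return TOKEN.sub(lambda m: str(store.get(m.group(1), "")), text)


# Hypothetical stage text; the URL is a placeholder.
stage = "api get https://api.example.com?nextPage=$CHECKPOINTS.api_next_page$"

# No checkpoint set yet: the token collapses to an empty value.
print(expand_checkpoints(stage, {}))

# Checkpoint set by an earlier run or stage: the stored value is substituted.
print(expand_checkpoints(stage, {"api_next_page": "page-2"}))
```

Modeling the store as a plain dictionary keeps the sketch small; where checkpoints are actually persisted is not specified here.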

Use cases

Scheduled Queries

Checkpoints are useful for avoiding duplicate work in scheduled queries. A scheduled query can store a value from a stage's results as a checkpoint, which is then used in the next scheduled run of the query. Examples of checkpoint values include the last visited page, a timestamp, or a hash.

Example:

api get https://www.{AUTHENTICATED-API}.com?nextPage=$CHECKPOINTS.api_next_page$ --checkpoint "api_next_page:result.nextPage"

The first time the query runs, the api_next_page checkpoint has not been set, so the nextPage value is empty. The --checkpoint flag then sets a checkpoint named api_next_page to the value of the result.nextPage column in the first row of results. The next time the query runs, $CHECKPOINTS.api_next_page$ resolves to the stored value, which is passed as nextPage.
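
The run-to-run behavior can be modeled with a short Python sketch. It is a simplified illustration under stated assumptions, not the product's implementation: the function names, the dictionary used as the checkpoint store, and the sample rows are hypothetical, and leaving the checkpoint unchanged when a run returns no rows is an assumption rather than documented behavior.

```python
def parse_checkpoint_spec(spec):
    """Split a 'name:column' spec, as passed to --checkpoint, into its two parts."""
    name, _, column = spec.partition(":")
    if not name or not column:
        raise ValueError(f"expected 'name:column', got {spec!r}")
    return name, column


def apply_checkpoint(spec, rows, store):
    """Store the named column from the first result row under the checkpoint name.

    `rows` is a list of dicts keyed by column name; `store` is a hypothetical
    dict standing in for the checkpoint store.
    Assumption: an empty result set leaves the existing checkpoint untouched.
    """
    name, column = parse_checkpoint_spec(spec)
    if rows and column in rows[0]:
        store[name] = rows[0][column]
    return store


store = {}
spec = "api_next_page:result.nextPage"

# First scheduled run: nothing stored yet, so nextPage would be sent empty.
print(store.get("api_next_page", ""))
apply_checkpoint(spec, [{"result.nextPage": "page-2"}, {"result.nextPage": "unused"}], store)

# Second scheduled run: the value saved by the first run is available.
print(store.get("api_next_page", ""))
apply_checkpoint(spec, [{"result.nextPage": "page-3"}], store)
```

Only the first row is read, so later rows in the same result set do not affect the checkpoint.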

Between Stages

Although there are other ways to maintain state across stages (see --enrich, --labelStage, --appendStage, etc.), checkpoints provide a simple way to capture a value from one stage's results and access it in a later stage.

Example:

api get https://www.{AUTHENTICATED-API}.com?nextPage=$CHECKPOINTS.api_next_page$ --checkpoint "api_next_page:result.nextPage"
|| ...
|| ...
|| addcolumn api_next_page $CHECKPOINTS.api_next_page$