Skip to main content

Building a data feed of GitHub issues to Cribl

Building a data feed from a query is a common use case of crul. In this example, we'll deploy a crul query that builds a dynamic data set from a REST API and delivers it to Cribl on a scheduled interval. We'll also explore how the diff command can be used to construct data feeds that only send new results.

Use case

We want to capture newly generated issues from a few Github repos and push those to Cribl once an hour.

concepts

Building our query

Our first step will be writing a query that gets back a set of issues. With the api command, this is straightforward.

api get https://api.github.com/repos/nodejs/node/issues?per_page=100

basic query

The api command is very powerful, and can handle authentication of various forms (including tokens, keys, OAuth, etc.), paginate, set checkpoints, set custom headers and dat payloads, and much more!

To now expand this query to issues from multiple repos we'll use the seed command with a few specific repositories. We could also dynamically fetch repos from a particular organization using an api request, or use the thaw command to use repositories previously described in a csv/json file and uploaded to the cellar.

seed '[
{"repository":"https://api.github.com/repos/nodejs/node"},
{"repository":"https://api.github.com/repos/python/cpython"},
{"repository":"https://api.github.com/repos/rust-lang/rust"},
]'
|| api get $repository$/issues?per_page=100

Great! Now we have the most recent 300 issues (max) from each the three provided repositories.

expansion

Advanced: Pagination

Many APIs return paginated responses, this means that not all results are available in a single request, the response however will include a pointer to the next set of results. This pointer could be a hash value, a page, an offset, or an explicit link. The crul api command can handle many types of pagination using the --pagination.* set of flags.

Advanced: Authentication

Many APIs require some form of authentication. This can be a token, an OAuth flow, or another mechanism. The api command is able to send requests with custom headers and data payloads, and also includes a --bearer flag among other auth related flags to support many forms of API authentication.

The oauth command also works for a number of supported OAuth providers, as well as custom Client Credentials flows.

Configuring a Store, Scheduling and Sending Incremental Diffs to Cribl

We now need to export this data set to Cribl and schedule it to run on a 1 hour interval. However, if we simply schedule the above query, we'll likely send many duplicate issues to Cribl as most repos will not create more than 100 issues per hour. This is why we'll need to use the diff command to only send over new issues that have not already been sent.

Since this diff/export/schedule step is so common, we can use crul's scheduled export wizard. Let's walk through each of the steps.

Step 1: Select a store or create a new one

The first step of the scheduled export wizard is to select a store, or configure a new one. Let's assume we are going to create a new one. If you have an existing store (possibly created by the export wizard previously?) you can simply select it from the dropdown.

store

Step 2: Select a schedule interval

Select a time interval for this query to run on. It will run shortly after creation and then on the set interval.

expansion

CAUTION! If you pick a short interval (less than 5 mins) you may run into issues with the crul cache. Ensure that stages in your query that you do not wish to be cached set the --cache flag to false. Example: api ... --cache false. When in doubt, set --cache false on all stages.

Step 3: Select a diff strategy

There are a few diff strategies.

diff

Rotating diff

The rotating diff is the most commonly used diff strategy. It works by comparing the current diff file to the current results, then returning the results that do not exist in the diff file. Finally, it overwrites the diff file with all the current results, including ones that already existing in the diff file.

Content of old diff:

{"col": "value1"}
{"col": "value2"}

Results prior to diff command:

{"col": "value1"}
{"col": "value2"}
{"col": "value3"}

Results after diff command:

{"col": "value3"}

Contents of diff file after diff command:

{"col": "value1"}
{"col": "value2"}
{"col": "value3"}

Appending diff

Store ALL results and append new ones. This can lead to big, growing, diff files and is NOT recommended unless the results sets are small and/or your are cleaning the diff file regularly of older entries.

No diff

Send the whole set of results each time.

Step 4: Confirm and deploy!

Check that the details look correct and hit Submit to deploy the data feed. It will start running on a schedule.

Note that crul must be continually running for the scheduled queries to run. We recommend using the docker image as a long running service when creating data feeds.

diff

Summary

We've now seen how simple it is to convert a query into a data feed, that populates one of 30+ destinations on a schedule, while maintaining a diff to ensure that only new result are sent over.

Any crul query can be turned into a feed using these steps! Have a web page that you would like turned into a data feed? No problem! Need to turn a REST API into a data feed? We got you!

Advanced

Looks like that wizard added a few stages to my query - how do those work?

diff

With the exception of the scheduling step, the wizard really only adds 2 stages to our data retrieval query.

The first added stage is the diff command, which compares the current results to previously sent results and only returns the new ones.

The second added stage is the freeze command, which delivers our data to a configured store. See exporting results.

Once you understand these commands, you can construct powerful data feeds yourself and schedule them manually, or just use the wizard!

Join our community

Come hang out and ask us any questions. Many of the features and fixes in crul come from our user requests!

Join our Discord