Key Commands

The crul query language is extensive and supports many specific operations for transforming a data set into what you need. However, a much smaller set of key commands covers most of what you will use when authoring queries.

These key commands fall into four categories. Each category and its key commands are described in more detail below.

Key Data Retrieval Commands​

Data retrieval commands are used to retrieve data from a web source, such as an API, web page, etc.

api​

The api command allows you to make REST requests to an endpoint, and returns the response in a tabular form. Currently this command supports XML, CSV, and JSON response formats, and will return other formats as raw data.

api get https://api.github.com/orgs/netflix/members

open​

The open command will open a web page, fulfill network requests, render JavaScript, and so on, then process the page's content into a tabular data set.

open https://news.ycombinator.com

requests​

The requests command will open a web page, and monitor network requests. Once the page has fully loaded, the response will include a rich data set including request sources and destinations, full request and response payloads, timing data, and more.

requests https://news.ycombinator.com

addcolumn​

The addcolumn command can be used to add a new column to a data set, and can include tokens to create a new column containing the values of other columns in a row:

addcolumn newcolumn "the oldcolumn value is: $oldcolumn$"
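To make the token behavior concrete, here is a minimal Python sketch of the idea, not crul's actual implementation. The `add_column` helper and the sample rows are hypothetical, and leaving unknown tokens untouched is an assumption.

```python
import re

def add_column(rows, name, template):
    """Return new rows with `name` set to `template`, filling $column$ tokens per row."""
    def render(row):
        # Replace each $column$ token with that row's value.
        # Assumption: tokens naming a missing column are left as-is.
        return re.sub(
            r"\$([\w.]+)\$",
            lambda m: str(row.get(m.group(1), m.group(0))),
            template,
        )
    return [{**row, name: render(row)} for row in rows]

# Hypothetical sample rows
rows = [{"oldcolumn": "a"}, {"oldcolumn": "b"}]
result = add_column(rows, "newcolumn", "the oldcolumn value is: $oldcolumn$")
```

Each output row keeps its original columns and gains `newcolumn`, with the token replaced by that row's own `oldcolumn` value.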

seed​

The seed command allows you to provide an array of JSON objects, each of which will become a row to process.

seed '[{"col1": "val1", "col2": "val2"},{"col1": "val3", "col2": "val4"}]'

Key Data Filtering Commands​

Data filtering commands are used to limit our data set according to keywords and specific values.

find​

The find command will do a text search for the provided string in each row. A row is included in the results if the string exists somewhere in the row.

open https://news.ycombinator.com
|| find comments

filter​

The filter command allows you to run more complex expressions that compare column values in each row to one another or to specific values, combined with boolean logic. See the filter expressions documentation for more details on constructing filters.

open https://news.ycombinator.com
|| filter '(nodeName == "A")'

table​

The table command will only include the provided columns. This is a great way to reduce a data set for better performance, or when you want a clean final set of results.

open https://news.ycombinator.com
|| table nodeName attributes.href innerText
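Conceptually, table is a column projection. The Python sketch below illustrates that idea only. The `table` helper and the sample row are hypothetical, and returning `None` for a missing column is an assumption rather than documented crul behavior.

```python
def table(rows, *columns):
    """Keep only the named columns in each row, in the order given."""
    # Assumption: a column missing from a row is emitted as None.
    return [{column: row.get(column) for column in columns} for row in rows]

# Hypothetical flattened row, similar in shape to what a page might produce
rows = [
    {
        "nodeName": "A",
        "attributes.href": "https://example.com",
        "innerText": "link",
        "extra": 1,
    }
]
result = table(rows, "nodeName", "attributes.href", "innerText")
```

The `extra` column is dropped, which is what makes later stages cheaper to run and the final results easier to read.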

head

The head command can be used to include only the first N rows of results. This is helpful when developing queries, as it allows you to limit expansion while testing. For example, you may have a result set with 20 links that you want to expand, but to test, you might first use ... || head 1 prior to the api/open/requests command to limit the expansion to only the first link.

devices
|| head 3

unique​

The unique command will remove rows that have a duplicate value in a particular column.

open https://news.ycombinator.com
|| unique nodeName
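As an illustration of de-duplicating on one column, here is a small Python sketch. The `unique` helper and sample rows are hypothetical, and keeping the first row seen for each value is an assumption about ordering, not documented crul behavior.

```python
def unique(rows, column):
    """Keep one row per distinct value of `column`."""
    seen = set()
    kept = []
    for row in rows:
        value = row.get(column)
        # Assumption: the first row seen for each value is the one kept.
        if value not in seen:
            seen.add(value)
            kept.append(row)
    return kept

# Hypothetical sample rows
rows = [
    {"nodeName": "A", "innerText": "one"},
    {"nodeName": "TD", "innerText": "two"},
    {"nodeName": "A", "innerText": "three"},
]
result = unique(rows, "nodeName")
```

Only the `nodeName` column is compared, so the third row is dropped even though its `innerText` differs.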

contains​

The contains command can be used for finer-grained filtering than the find command. You specify a column to check for a particular string, and rows containing that string in the provided column will be included in the results. Also see the regex command for similar functionality with regex patterns, and the excludes command for the opposite use case (removing rows that contain a string).

devices
|| contains name "iP"
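The difference between find and contains can be sketched in Python as a whole-row search versus a single-column search. Both helpers and the device rows below are hypothetical, and case-sensitive matching is an assumption.

```python
def find(rows, needle):
    """Keep rows where `needle` appears in any column value."""
    # Assumption: matching is a case-sensitive substring test.
    return [row for row in rows if any(needle in str(v) for v in row.values())]

def contains(rows, column, needle):
    """Keep rows where `needle` appears in the given column only."""
    return [row for row in rows if needle in str(row.get(column, ""))]

# Hypothetical sample rows
rows = [
    {"name": "iPhone 13", "type": "phone"},
    {"name": "Pixel 5", "type": "iPhone rival"},
]
found = find(rows, "iP")
matched = contains(rows, "name", "iP")
```

Here `find` keeps both rows because the string appears somewhere in each, while `contains` keeps only the row whose `name` column has it.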

Key Data Import/Export Commands​

Data import/export commands are used either to start a query by importing data from an existing local or external data set, or to export results to local or external stores (such as Kafka, Amazon S3, etc.).

freeze​

The freeze command can store results locally in a file, which can later be thawed (see below). If frozen results with the same name already exist, they are overwritten. Alternatively, the --store flag pushes results to a preconfigured store, such as a Kafka topic or an Amazon S3 bucket.

open https://news.ycombinator.com
|| freeze hn_home_page_raw

thaw​

The thaw command will fill a query pipeline with results that were previously frozen locally. It is also possible to thaw frozen results from read/write external stores, such as Amazon S3.

Also, if you upload your own JSON, NDJSON, or CSV file to the cellar, you will be able to fill a query pipeline with it using the thaw command.

open https://news.ycombinator.com
|| freeze hn_home_page_raw
|| thaw hn_home_page_raw

Key Data Processing Commands​

Data processing commands are used to convert/flatten, merge, or compare data sets.

normalize​

The normalize command takes a column containing an array and expands each element of the array into its own row, while keeping the top-level columns in the new rows. If you have a large number of columns with keys like data.0.thing1, data.1.thing1, data.2.thing1 (note the array index: 0, 1, 2, ...), you can normalize the data column.

api get https://pokeapi.co/api/v2/pokemon
|| normalize results
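The expansion can be pictured with the following Python sketch, which works on nested rows rather than crul's flattened columns. The `normalize` helper and the hand-written sample row are hypothetical, and the naming of the output columns is an assumption.

```python
def normalize(rows, column):
    """Expand each element of the array in `column` into its own row."""
    expanded = []
    for row in rows:
        # Top-level columns are carried into every new row.
        top_level = {k: v for k, v in row.items() if k != column}
        for element in row.get(column, []):
            if isinstance(element, dict):
                # Assumption: object keys become columns of the new row.
                expanded.append({**top_level, **element})
            else:
                # Scalars stay under the original column name.
                expanded.append({**top_level, column: element})
    return expanded

# Hand-written sample row with an array column
rows = [{"count": 2, "results": [{"name": "bulbasaur"}, {"name": "ivysaur"}]}]
result = normalize(rows, "results")
```

One input row with a two-element array becomes two rows, each of which still carries the top-level `count` column.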

join​

The join command can be used to join two datasets based on a shared key. It is based on the concept of labeled stages, which means that you must first label an older stage before joining it with the results of the current stage.

seed '[{"shared": "hi", "unique1": "value1"}]' --labelStage "joinme"
|| seed '[{"shared": "hi", "unique2": "value2"}]'
|| join shared joinme
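A rough Python picture of joining the current stage with a labeled stage on a shared key follows. The `join` helper is hypothetical. Treating it as an inner join, and letting the current stage's values win on column conflicts, are assumptions rather than documented crul behavior.

```python
def join(current, labeled, key):
    """Merge rows from `current` and `labeled` that share the same `key` value."""
    # Index the labeled stage by key for fast lookup.
    index = {}
    for row in labeled:
        index.setdefault(row.get(key), []).append(row)

    joined = []
    for row in current:
        # Assumption: rows without a match are dropped (inner join).
        for match in index.get(row.get(key), []):
            joined.append({**match, **row})
    return joined

# The same rows as the two seed stages above
labeled_stage = [{"shared": "hi", "unique1": "value1"}]
current_stage = [{"shared": "hi", "unique2": "value2"}]
result = join(current_stage, labeled_stage, "shared")
```

The single output row carries `shared`, `unique1`, and `unique2`, combining the columns of both stages.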

sort​

The sort command will sort results by the provided column. Use the --order flag to choose whether the results are sorted in ascending or descending order.

devices
|| sort viewport.width

diff​

The diff command can be used to get results that don't exist in a previous set of results. This is particularly powerful in combination with the freeze command, as it allows for only new results to be pushed.

seed '[{"shared": "hi"}]'
|| freeze todiff
|| seed '[{"shared": "hi"},{"new": "hello"}]'
|| diff todiff
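The idea of keeping only rows that are absent from an earlier result set can be sketched in Python as below. The `diff` helper is hypothetical, and comparing whole rows for equality is an assumption about how crul decides that a row is new.

```python
import json

def diff(current, previous):
    """Return rows of `current` that do not appear in `previous`."""
    # Serialize rows with sorted keys so that key order does not matter.
    seen = {json.dumps(row, sort_keys=True) for row in previous}
    return [row for row in current if json.dumps(row, sort_keys=True) not in seen]

# The same rows as the frozen and current seed stages above
previous = [{"shared": "hi"}]
current = [{"shared": "hi"}, {"new": "hello"}]
result = diff(current, previous)
```

Only the `{"new": "hello"}` row survives, which is why pairing diff with freeze lets a scheduled query push only new results.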