Query any Webpage or API
Access, crawl, enrich, and clean data from anywhere to anywhere.


Lights, camera, crul'n IT! ✨
Four example queries, shown below in order:
- Latest Hacker News Comments (Web)
- Latest Hacker News Comments (API)
- Shopify Product Prices (Web)
- Latest Tweets (API)
open https://news.ycombinator.com/news
|| find comments
|| filter "(nodeName == 'A') and (parentElement.attributes.class == 'subline')"
|| open $attributes.href$
|| filter "(attributes.class == 'comment')"

api get https://hacker-news.firebaseio.com/v0/maxitem.json?print=pretty
|| math "min = data - 10"
|| range $min$ $data$ item
|| api get https://hacker-news.firebaseio.com/v0/item/$item$.json
|| filter "(type == 'comment')"

open https://www.tentree.ca/collections/mens-shorts --html --hashtml
|| filter "(attributes.class == 'justify-between product-attr-container mt-2 relative')"
|| sequence
|| html innerHTML
|| filter "(_html.nodeName == 'H3') or (_html.attributes.class == 'flex' or _html.attributes.class == 'text-discount-price')"
|| excludes _html.innerHTML "line-through"
|| table _html.innerText outerHTMLHash _sequence
|| groupBy outerHTMLHash
|| rename _group.0._html.innerText product
|| rename _group.1._html.innerText price
|| sort _group.0._sequence --order "ascending"
|| addcolumn time $TIMESTAMP.ISO$
|| table product price time

api get "https://api.twitter.com/2/tweets/search/recent?query=NASA&tweet.fields=created_at,author_id,public_metrics&max_results=100"
--bearer "$CREDENTIALS.twitter_bearer$"
--pagination.max 5
--pagination.next "meta.next_token"
--pagination.url "https://api.twitter.com/2/tweets/search/recent?query=NASA&tweet.fields=created_at,author_id,public_metrics&max_results=100&next_token=$pagination.next$"
|| normalize data

Dreamy Webpage and API data feeds. 🌜
- Export Crul Results to Kafka/etc.
- Scheduled Export of Crul Results to Kafka/etc.
- Download Crul Results to CSV/JSON
- Retrieve Crul Results By API
...
|| freeze --store "kafka-prod" --kafka.topic "hn_comments"

Configure queries to run on a set interval and export to the destination of your choice. Use the diff command to maintain rolling diffs of only new content, or push the entire result set on each run.
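
For example, a scheduled version of the Hacker News API query above could append a diff stage before the freeze, so only previously unseen comments are pushed on each run (the bare diff usage here is a sketch based on the description above; check the command's options for keyed diffs):
api get https://hacker-news.firebaseio.com/v0/maxitem.json?print=pretty
|| math "min = data - 10"
|| range $min$ $data$ item
|| api get https://hacker-news.firebaseio.com/v0/item/$item$.json
|| filter "(type == 'comment')"
|| diff
|| freeze --store "kafka-prod" --kafka.topic "hn_comments"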

Download results to a local CSV or JSON file and process them further in the data analysis tool of your choice.

Use the crul API to dispatch queries and read results.
curl -X 'POST' 'http://localhost:1968/v1/sirp/query/runner/dispatch' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-H "Authorization: crul {token}" \
-d '{ "query": "devices" }'
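
Reading results back works the same way over the REST API; the results path below is illustrative only (an assumption, not a documented endpoint), reusing the same token-based Authorization header as the dispatch call:
curl -X 'GET' 'http://localhost:1968/v1/sirp/query/runner/results/{queryId}' \
-H 'accept: application/json' \
-H "Authorization: crul {token}"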
Yeah, crul can do a whole lot... 🎨
- Push incremental changes: compare results against the previous run and retain only what is new.
- Send data to 30+ stores: send your data to Amazon S3, Kafka, Splunk, and a whole lot more.
- Client authentication for OAuth: authenticate with multiple OAuth providers to access protected API data.
- Domain throttling: control how often a domain is accessed based on custom rate limit policies.
- Run queries with your favorite language: dispatch queries and read results with auth keys and the REST API.