Expanding links from a webpage (Hacker News)

A common use case for the crul query language is using expanding stages to open many pages linked from a single webpage and return the results as one consolidated data set. For example, a recipe site might list many recipes in a recipe directory. We can use crul to get the links to every recipe in the directory, then expand (open) each of those links and filter for the ingredients, or for whatever else we need.

Example: Hacker News Comments

Full Query

open https://news.ycombinator.com/news
|| find comments
|| filter "(nodeName == 'A') and (parentElement.attributes.class == 'subline')"
|| open $attributes.href$
|| filter "(attributes.class == 'comment')"

Stages 1-3: Filtering for specific links

open https://news.ycombinator.com/news
|| find comments
|| filter "(nodeName == 'A') and (parentElement.attributes.class == 'subline')"

The first stage opens the Hacker News front page and processes it into a tabular structure. Think of crul as a browser that opens this page, renders the content, fulfills network requests, and so on, then converts the rendered content into a tabular format.
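To make "a page as a table" concrete, here is a rough Python analogue that flattens HTML into one row per element. It is only a sketch: the column names (nodeName, attributes.class, attributes.href, parentElement.attributes.class, innerText) are assumptions modeled on the filter expressions in this example, not crul's actual schema. Unlike crul, it also does no rendering and runs no JavaScript.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not kept on the open-element stack.
VOID_TAGS = {"br", "img", "input", "meta", "link", "hr"}


class TableBuilder(HTMLParser):
    """Flatten an HTML document into one row (a dict) per element.

    Column names are illustrative assumptions, not crul's real table schema.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self.stack = []  # rows for elements that are currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self.stack[-1] if self.stack else None
        row = {
            "nodeName": tag.upper(),
            "attributes.class": attrs.get("class"),
            "attributes.href": attrs.get("href"),
            "parentElement.attributes.class": parent["attributes.class"] if parent else None,
            "innerText": "",
        }
        self.rows.append(row)
        if tag not in VOID_TAGS:
            self.stack.append(row)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element (tolerates unclosed tags).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i]["nodeName"] == tag.upper():
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text counts toward every open ancestor, similar to innerText.
        for row in self.stack:
            row["innerText"] += data


def to_table(html):
    builder = TableBuilder()
    builder.feed(html)
    builder.close()
    return builder.rows


rows = to_table('<span class="subline"><a href="item?id=1">3 comments</a></span>')
for row in rows:
    print(row)
```

Each element becomes a row that records its parent's class. That is what lets a later filter ask about parentElement.attributes.class without walking the DOM again.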

Next, we find the keyword "comments". This narrows our data set down to only the rows that contain the string "comments" somewhere in their values.
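A rough Python analogue of the find stage follows. It assumes each row is a dict of column values and that matching is a case-sensitive substring search over every non-empty value. Both points are assumptions, and crul's actual matching rules may differ. The sample rows are made up.

```python
def find(rows, keyword):
    """Keep rows where any non-empty value contains keyword (sketch of a find stage)."""
    return [
        row
        for row in rows
        if any(keyword in str(value) for value in row.values() if value is not None)
    ]


# Made-up rows resembling elements from the Hacker News front page.
rows = [
    {"nodeName": "A", "innerText": "3 comments", "attributes.href": "item?id=1"},
    {"nodeName": "A", "innerText": "hide", "attributes.href": "hide?id=1"},
    {"nodeName": "SPAN", "innerText": "42 points", "attributes.href": None},
]

found = find(rows, "comments")
print(found)
```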

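We then provide a filter expression that narrows the result set down to just the links to comment sections. It keeps only anchor elements (nodeName == 'A') whose parent element has the class subline. On Hacker News, that is the line beneath each post title that holds the comments link. We now have a list of links to comment pages to pass into the next, expanding stage.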
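The filter expression can be read as a predicate applied to each row. Below is a Python sketch of that reading. It uses the same assumed column names as the filter string, and the rows are made-up examples of what the find stage might leave behind. The find step matches the wrapping TD and SPAN too, because their text includes "comments"; the filter is what narrows the set to the link itself.

```python
def is_comments_link(row):
    # Python reading of the crul filter:
    # (nodeName == 'A') and (parentElement.attributes.class == 'subline')
    return row.get("nodeName") == "A" and row.get("parentElement.attributes.class") == "subline"


def filter_rows(rows, predicate):
    return [row for row in rows if predicate(row)]


# Made-up rows that all contain "comments" somewhere in their values.
found = [
    {"nodeName": "TD", "attributes.class": "subtext",
     "parentElement.attributes.class": None, "attributes.href": None,
     "innerText": "42 points by alice | hide | 3 comments"},
    {"nodeName": "SPAN", "attributes.class": "subline",
     "parentElement.attributes.class": "subtext", "attributes.href": None,
     "innerText": "42 points by alice | hide | 3 comments"},
    {"nodeName": "A", "attributes.class": None,
     "parentElement.attributes.class": "subline", "attributes.href": "item?id=1",
     "innerText": "3 comments"},
]

links = filter_rows(found, is_comments_link)
print([row["attributes.href"] for row in links])
```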

Stages 4-5: Opening links and filtering for comments

...
|| open $attributes.href$
|| filter "(attributes.class == 'comment')"

With our list of links to comments, we open each link asynchronously (throttled by the domain policy and limited by the available browser workers). We then filter the results to include only elements on each page whose class is comment, that is, the individual comments.
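The expanding step can be sketched in Python with asyncio. Every link is opened concurrently, a per-domain semaphore caps how many requests run at once, and the rows from all pages are merged and filtered. This is only an analogue under stated assumptions:

- fetch_rows is a hypothetical coroutine standing in for "open a URL and return its table".
- per_domain_limit is a hypothetical stand-in for crul's domain policy and worker limits.
- Relative links are assumed to resolve against the Hacker News base URL.

```python
import asyncio
from urllib.parse import urljoin, urlparse

BASE_URL = "https://news.ycombinator.com/"  # assumed base for relative hrefs


async def expand(hrefs, fetch_rows, per_domain_limit=2):
    """Open every href concurrently (capped per domain) and merge the comment rows."""
    semaphores = {}  # one semaphore per domain

    async def open_one(href):
        url = urljoin(BASE_URL, href)  # e.g. "item?id=1" -> full URL
        domain = urlparse(url).netloc
        if domain not in semaphores:
            semaphores[domain] = asyncio.Semaphore(per_domain_limit)
        async with semaphores[domain]:
            rows = await fetch_rows(url)
        # Tag each row with its source page so merged results stay traceable.
        return [dict(row, source=url) for row in rows]

    pages = await asyncio.gather(*(open_one(h) for h in hrefs))
    # Flatten all pages into one data set, keeping only comment elements.
    return [row for page in pages for row in page if row.get("attributes.class") == "comment"]


async def fake_fetch(url):
    """Hypothetical stand-in for opening a page; returns two made-up rows."""
    await asyncio.sleep(0.01)
    return [
        {"attributes.class": "comment", "innerText": f"a comment on {url}"},
        {"attributes.class": "pagetop", "innerText": "navigation"},
    ]


result = asyncio.run(expand(["item?id=1", "item?id=2"], fake_fetch))
for row in result:
    print(row["source"], row["innerText"])
```

A semaphore keyed by domain means links on other sites would be throttled independently. Since asyncio.gather preserves input order, the merged data set follows the order of the original links.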

We now have a data set containing most of the comments from the top posts on Hacker News.

Note: Some comments may be missing because they sit in expandable sections that this query does not open, but handling those is beyond the scope of this example!