parseArticle

parseArticle [column]

Extracts an article from raw HTML.

arguments:

column

The column containing the raw HTML to extract the article from. If using the open command with the --html or --hashtml flag, the valid HTML will be contained in the outerHTML column. (type: string)

examples:

Query

open https://www.crul.com/blog/2023-03-07-tales-hn-front-page --html
|| filter "(nodeName == 'HTML')"
|| parseArticle outerHTML

Results prior to parseArticle stage:

... | outerHTML
... | <html>....</html>

Results after parseArticle stage:

article.title | article.byline | ... | article.textContent | ...
Tales from the other side of the Hacker News front page | crul team | ... | First off, we're so grateful to all of you who took the time to read our Hacker News post and try... | ...
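
To make the input and output shape of this stage more concrete, here is a rough Python sketch of turning one HTML string into article-style fields. It is not how parseArticle is implemented, and it assumes a very naive rule (the `<title>` text plus the text of every `<p>` element). The names `ArticleSketch` and `extract_article` are hypothetical; only the `article.title` and `article.textContent` column names come from the results above.

```python
from html.parser import HTMLParser


class ArticleSketch(HTMLParser):
    """Naive collector: <title> text plus the text of each <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._buf = []

    def _flush_paragraph(self):
        # Collapse runs of whitespace and keep non-empty paragraphs only.
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            if self._in_p:  # previous <p> was never closed
                self._flush_paragraph()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            self._flush_paragraph()

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buf.append(data)


def extract_article(outer_html: str) -> dict:
    """Hypothetical helper: map one outerHTML value to article-like columns."""
    parser = ArticleSketch()
    parser.feed(outer_html)
    parser.close()
    if parser._in_p:  # flush a trailing unclosed paragraph
        parser._flush_paragraph()
    return {
        "article.title": parser.title.strip(),
        "article.textContent": "\n".join(parser.paragraphs),
    }
```

Calling `extract_article` on a row's outerHTML value returns a dictionary keyed by the column names. A real article extractor would also strip navigation, ads, and other page chrome, which this sketch does not attempt.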

flags:

--hash

Add a hash column to the results

support

AMI_ENTERPRISE AMI_FREE AMI_PRO BINARY_ENTERPRISE BINARY_FREE BINARY_PRO DESKTOP_ENTERPRISE DESKTOP_FREE DESKTOP_PRO DOCKER_ENTERPRISE DOCKER_FREE DOCKER_PRO