
· 2 min read
crul team

Hi HN, we’re Carl and Nic, the creators of crul, and we’ve been hard at work for the last year and a half building our dream of turning the web into a dataset. In a nutshell, crul is a tool for querying and building web and API data feeds from anywhere to anywhere.

With crul you can crawl and transform web pages into CSV tables, explore and dynamically query APIs, filter and organize data, and push datasets to third-party data lakes and analytics tools. Here’s a demo video (we’ve been told Nic sounds like John Mayer, lol).

We’ve personally struggled wrangling data from the web using puppeteer/playwright/selenium, jq, or cobbled-together Python scripts, client libraries, and schedulers to consume APIs. The reality is that shit is hard, doesn’t scale (classic blocking for-loop or async saturation), and comes with thorny maintenance/security issues. The tools we love to hate.
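The "blocking for-loop or async saturation" trade-off is a real one: fetch URLs one at a time and you wait forever; fire them all at once and you saturate the event loop (or get yourself rate-limited). A minimal Python sketch (not crul code — just an illustration of the problem space, with a simulated fetch standing in for real HTTP) shows the usual middle ground, bounding concurrency with a semaphore:

```python
import asyncio
import time

async def fetch(url: str) -> str:
    # Stand-in for a real HTTP request; sleeps to simulate network latency.
    await asyncio.sleep(0.1)
    return f"payload from {url}"

async def crawl(urls, max_concurrency: int = 5):
    # The semaphore bounds in-flight requests, avoiding both extremes:
    # a blocking for-loop (effective concurrency of 1) and unbounded
    # async saturation (concurrency of len(urls)).
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url):
        async with sem:
            return await fetch(url)

    return await asyncio.gather(*(bounded_fetch(u) for u in urls))

urls = [f"https://example.com/page/{i}" for i in range(20)]
start = time.perf_counter()
results = asyncio.run(crawl(urls, max_concurrency=5))
elapsed = time.perf_counter() - start
print(len(results), round(elapsed, 1))
```

With 20 URLs at ~0.1 s each, a sequential loop would take ~2 s, while 5-way bounded concurrency finishes in ~0.4 s — and this is the plumbing you end up rewriting for every scraper, which is the maintenance burden the paragraph above is complaining about.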

Crul’s value prop is simple: query any webpage or API for free.

At its core, crul is based on the foundational linked nature of web/API content. It consists of a purpose-built map/expand/reduce engine for hierarchical web/API content (kind of like Postman, but with a membership to Gold's Gym) with a familiar parser expression grammar that naturally gets the job done (and layered caching to make it quick to fix when it doesn’t work on the first try). There’s a boatload of other features: domain policies, a scheduler, checkpoints, templates, a REST API, a web UI, a vault, OAuth for third parties, and 20+ stores to send your data to.

Our goal is to open source crul as time and resources permit. At the end of the day it’s just the two of us trying to figure things out as we go! We’re just getting started.

Crul is one bad mother#^@%*& and the web is finally yours!

Download crul for free as a macOS desktop application or as a Docker image and let us know if you love it or hate it. And come say hello to us on our Slack channel - we’re a friendly bunch!

Nic and Carl - (Crul early days)

P.S. Every download helps so give it a spin!

· 4 min read
crul team

Hi xsplunk, we’re Carl Yestrau and Nic Stone, the creators of crul. We have been hard at work for the last year and a half building our dream of turning the web into a dataset. We couldn’t think of a closer-to-home community to spill the beans with first than our xsplunk friends.

We are deeply sorry for those of you who have been impacted by the industry-wide reduction in force and the ensuing psychological and financial trauma. Remember: time heals, we are always here to listen, and you don’t have to be alone. ❤️‍ Our journey has not been easy. To be frank, it’s been hella rough.

We appreciate having the chance to reach out to our closest and most trusted crew. We'd love for you to take a look at what we've been up to by downloading crul for free and letting us know if you love it or hate it. If you've got use-case ideas or feedback, we'd love to hear it. Maybe we can work together again.

In a nutshell, crul is a tool for building web and API data feeds from anywhere to anywhere. The name crul comes from a mashup of the word "crawl" (as in web crawling) and “curl”, one of our favorite tools for interacting with the web. We certainly have gone back and forth on potential confusion, but we just loved the mashup!

With crul you can crawl and transform web pages into CSV tables, explore and dynamically query APIs, filter and organize data, and push datasets to third-party data lakes and analytics tools. Here’s a demo video (you can hear Nic jam’n like John Mayer, according to Raitz):

We’ve personally struggled wrangling data from the web and APIs as data inputs using puppeteer/playwright/selenium, or cobbling together Python scripts and client libraries to consume APIs. The reality is that shit is a real pain in the ass, doesn’t scale (classic blocking for-loop or async saturation), and comes with thorny maintenance/security issues (can’t touch this!).

At its core, crul is a purpose-built map/expand/reduce engine for hierarchical web/API content (kind of like Postman with a membership to Gold’s Gym in the 90291) and a familiar parser expression grammar that naturally gets the job done. The query language pays homage to an intuitive command, argument, flag, and piping syntax with a dedicated Parser Expression Grammar (PEG). Using the PEG, queries are converted into query plans, which are composed of stages. Stages are composed of commands, each of which reduces, maps, or expands, and which execute with parallelization. You can think of crul as an accordion, and we represent that with the ability to pipe commands to subsequent commands with the double-pipe “||” operator. Cool hey, we think so.
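To make the accordion concrete, here’s a sketch of what a piped query looks like (the command names below are illustrative, not a reference to crul’s actual built-ins — only the `||` piping between stages is described above): each stage takes the previous stage’s result set, and each command either maps it row-for-row, expands it (e.g. following links, fanning one row out into many), or reduces it back down.

```
open https://example.com/blog        # expand: crawl the page into rows of content
|| filter "nodeName == 'a'"          # reduce: keep only the link rows
|| table attributes.href             # map: reshape each row into a column
```

Each `||` boundary is where the engine can parallelize: the expand stage fans out, and the later stages squeeze the result back into a flat table — the accordion.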

There’s a boatload of other features, including a query optimizer/cache, domain throttle policies, a query job scheduler, checkpoints, a vault, OAuth for third-party APIs, a headless browser cluster, a UI, and 20+ stores to send your data to.

As a bootstrapped company we need to pick our battles, so the OSS rollout is going to take some time, planning, and money. Our long-term vision is to open source the core with the help of our friends at the Linux Foundation, alongside a commercial offering.

Our underlying software vision was to really understand the dynamics of the state machine in a fluid and flexible way, so yes, some logical layers do contain Node.js and are loosely typed. Heavy lifting is managed by Go/C/C++ contrib packages, and the literal REST is a series of microservices that can be run independently or grafted together. Command language bindings are currently limited to Node.js, and we have yet to invest in a standard IPC protocol for custom commands. We really want to define our software roadmap around hardening the core for 2.0.0 before flipping to OSS.

Crul is one bad mother#^@%*& and the web is finally yours!

Nic and Carl - (Crul early days)

P.S. Crul, Inc. is a registered Delaware C-Corp with a stock plan (409A), provisional patents, a trademark, no debt, and a working product. The doors are wide open to the old skool crew in an unusual and creative way — yes, you can become an employee and future shareholder. We have a lot of work ahead of us, but we got this!

· One min read
crul team

We are extremely excited to introduce you to crul 1.0.1!

We are Carl and Nic, the creators of crul, and we have been hard at work for the last year and a half building our dream of turning the web into a dataset.

The name crul comes from a mashup of the word "crawl" (as in web crawling) and "curl", one of our favorite tools for interacting with the web. We certainly have gone back and forth on potential confusion, but we just loved the mashup!

With crul, you can rapidly interact with webpages and APIs to build datasets, expand links asynchronously for custom crawling operations, and process and structure datasets for scheduled delivery to third party systems.

We aim to make web data accessible and usable to anyone who wants it.

We are so excited to help you unlock the power of crul for your web data use cases!

Carl and Nic