
5 posts tagged with "introduction"


· 5 min read
crul team

Intro

Many of you have probably already tried using OpenAI's API to interact with available AI models. crul's natural API interaction and expansion principles make it easy to create generative AI workflows that dynamically generate and chain together prompts.

Although this post is limited to OpenAI's API and ChatGPT-related models, you can use the same concepts to chain together or distribute prompts across multiple models.

First contact

Let's start small and run a single prompt.

Note: The OpenAI API requires authentication, so you'll first need to configure a basic credential named openai containing your OpenAI API key.

Running a single prompt

This query uses the prompt command to run a simple prompt using OpenAI's API.

prompt "Write a haiku about APIs"

Simple prompt

Note: If you rerun this query, you'll get back the cached results. You can bypass the cache using the --cache false global flag.
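For example, a quick sketch of re-running the earlier haiku prompt while bypassing the cache (flag placement at the end of the stage is our assumption here; the flag itself is the global --cache flag mentioned above):

prompt "Write a haiku about APIs" --cache false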

Another way to run this query is with a template from crul's query library. You can think of this template as a macro for the expanded query described in the Template explanation at the end of this post. This template will show you the underlying implementation of the prompt command using the api command.

Chaining prompts

One prompt is cool enough, but let's chain some prompts together with some expansion. In this query, we'll prompt for a list of 5 cities, then split the comma-separated response and normalize it into 5 rows, then use a different prompt that includes the values ($response$) from the first prompt.

prompt "give me a list of 5 cities on earth, comma separated. like Paris, Los Angeles,etc."
|| split response ","
|| normalize response
|| prompt "what is the distance in miles to $response$ from San Francisco. Respond only with the numeric value. So if the answer is 3000 miles, respond with just 3000, no miles, commas or other punctuation or units of measurement"

Chained prompt

Seed prompts from web content

We can manually fill in the prompt, or generate one or many from a previous set of results. For example, let's get all the headlines from the Hacker News homepage, then ask OpenAI to create a haiku based on each headline. Notice the $headline$ token in the prompt, which lets us create dynamic prompts that depend on the previous stage.

open https://news.ycombinator.com/news
|| filter "(nodeName == 'A' and parentElement.attributes.class == 'titleline')"
|| rename innerText headline
|| prompt "Write a haiku about the following headline: $headline$"

Note: This query's outbound requests will be throttled by domain policies, which default to 1 request per second per domain. This is also the throttle for the OpenAI API as of this blog post, so all good there!

HN Haikus

That's kind of cute! Let's try a more complex prompt and translate each title to French.

Translate web content

What's different about the next query is that we'll merge the headlines into a single row containing a @@@ delimited string to pass into a single request. This isn't necessary, but it reduces the number of requests we make in the prompt command's stage.

The last two stages (extract and normalize) of the query will extract the JSON response and expand it into rows.

For similar queries, you'll need to write your prompt so that the model understands the structure of the data you're providing it with.

open https://news.ycombinator.com/news
|| filter "(nodeName == 'A' and parentElement.attributes.class == 'titleline')"
|| mergecolumn innerText --asArray false --delimiter "@@@"
|| rename innerText headlines
|| prompt "Respond only with an array of json objects containing the original headline in a headline key and the translation in a translation key. Translate each of the headlines contained in the following @@@ delimited string to French: $headlines$"
|| extract response
|| normalize

HN Translations

Using different models

The prompt command currently defaults to OpenAI's gpt-3.5-turbo model. To override this, use the prompt command's --prompt.model flag.
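For example, a minimal sketch of overriding the model (the model name here is just an illustration; substitute any chat model your OpenAI account can access):

prompt "Write a haiku about APIs" --prompt.model "gpt-4"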

Summary

Have fun playing with API and web content as seeds for LLM prompts! The possibilities are endless, and crul is a great tool for quickly trying out new ideas and creating ML-powered workflows.

Possible next steps using crul?

  • Schedule the query and export the results to a data store of your choice to automate a workflow.

  • Use the responses as inputs to another API request.

  • Download the results to CSV/JSON for further processing elsewhere.

We're working on a few improvements to transform web content into JSON-friendly strings. Until then, you may run into classic (annoying) issues with string escaping in the --data object.

Please let us know if you run into any issues!

Pretty cool, no? Or terrifying, you decide.

Template explanation

The Prompt OpenAI template is essentially the query below. It's a pretty standard authenticated API request using the crul language, where we set headers for Content-Type and Authorization and provide a data payload.

To run this query, you'll need to replace the $template...$ tokens with explicit values or tokens of your own. For example, $template.apiKey$ could be replaced with $CREDENTIALS.openai$.

api post https://api.openai.com/v1/chat/completions
--headers '{
  "Content-Type": "application/json",
  "Authorization": "Bearer $template.apiKey$"
}'
--data '{
  "model": "gpt-3.5-turbo",
  "messages": [{"role": "user", "content": "$template.prompt$"}]
}'

· 7 min read
crul team

First off, we're so grateful to all of you who took the time to read our Hacker News post and try out crul. It's our 1.0.x, and we had no idea what to expect; seeing so many of you jump in has really fueled our spirit.

We know we have a lot to improve, but we hope you found it useful and intriguing. We could really use your help in understanding more about your use cases, and what interested you about crul.

Alright, let's go back in time and behind the scenes. 🎞️

Tuesday February 21st, 10AM:

We call off our planned launch and decide to push it a week. Why? Nic becomes convinced we shouldn't leave potential Windows and Linux users out to dry. Carl agrees. Native binaries could be tough, but a Docker image? Possible. Let's give it a shot. After all, you only launch (for the first time) once - YOL(FTFT)O?

Our new launch date? One week from now - February 28th, 8AM.

Wednesday February 22nd, 3PM:

First successful build of our Docker image! 🐳 Lots of tweaks and CI/CD still needed, but we are now feeling good about our planned release date. Carl adds some nice enhancements to our query bar command suggest section and clears up a few long-standing bugs with the query bar.

Friday February 24th, 6PM:

Both our Docker and Mac OS 1.0.3 builds are done; we upload them and start tying up loose ends on the crul.com homepage and enhancing the documentation. 🏁

Saturday February 25th, 9AM:

Our account page looks pretty... rough. Before you all saw it, it was just 4 big ugly purple buttons; it got the job done, but we could make it look better. Carl calls for a refresh. 🧼

By the end of the day it's looking much more polished. Nice work!

Meanwhile we are tweaking our post for Hacker News; we want it to sound like it's coming from us, two engineers who love what they are building but are figuring things out as we go. 🗺️

Sunday February 26th, 9AM:

Today we'll run end to end tests, proofread things (far from done - we're still catching things!), keep adding documentation, examples, etc. Get a little rest as well, we hope to have a big week ahead. 🛏️

Monday February 27th, 10AM:

There's something up with upgrades; it's inconsistent, but it does fail sometimes. Ouch! 😱 Nic spends most of the day diagnosing it, no luck. We decide to move forward; we'll fix it in time for the next upgrade.

Tuesday February 28th, 7AM:

We're up early, at least for Nic; Carl likes to get the worm. 🪱 Just getting everything in place ahead of submitting.

Tuesday February 28th, 8AM:

Submit! Post a message to our Slack, "Hey check this out - we're on Hacker News!"

Major OOPS, stay tuned.

Tuesday February 28th, 9:30AM:

Hm, we're still in shownew. It's been an hour and a half and we haven't moved to the show section of HN, which is just a click off the home page and should get us a little more traffic. We've got ~7 upvotes and a comment? What's going on? 😬

We draft an email for dang/Hacker News.

Tuesday February 28th, 10:10AM:

We get a response from HN: we screwed up badly by posting to our Slack channel; just a couple of quick upvotes from our Slack friends and we were flagged. We really screwed up here, but fortunately it's not egregious, and we get a second chance. 🤭🤭🤭

I can't reiterate enough: do NOT post the link to your HN post after submission! It's clearly outlined not to solicit upvotes or comments, but we took this too literally. We convinced ourselves that a simple "Hey check this out - we're on Hacker News!" did not explicitly solicit interaction, but, of course, it goes against the spirit of the submission rules - we were dumb and lucky, don't make the same mistake!

Tuesday February 28th, 10:15AM:

We hit the show section, traffic starts coming in, upvotes start coming in, comments start coming in.

OMG!

Tuesday February 28th, 10:30AM:

We're getting good steady traffic for the first time - ever? So exciting, we're glued to the post and our traffic metrics! We also get on the front page!!! 🤩

Tuesday February 28th, 2:00PM:

Still in the 15-25 rank range on the home page, lots of discussion, sign ups and inbound, it's exciting but nerve-wracking. 😹

Tuesday February 28th, 8:00PM:

Still (!) in the 15-25 rank range on the home page, still (!) steady traffic and downloads, we are so thrilled. We're adding tickets to our backlog like crazy based on what we are hearing back. 😊

Wednesday March 1st, 8:00AM:

Can't fully remember if we were still on the front page 24 hours later, but we're pretty sure - it's a bit of a blur and I doubt either of us slept well from the excitement. The post is still doing well, we are still getting traffic and plenty of sign ups. 🚘

Wednesday March 1st, 10:00AM:

Responding to tickets and questions and trying to wrap our heads around how to figure out what people found cool about crul, and whether it lived up to their expectations. 🤷‍♂️

|| timestamp (Present day)

The last week has been a whirlwind 🌀, and it is still ongoing now as our sign ups keep ticking up. We hope you enjoyed a little taste of the behind the scenes. 🎬

What's next?

We've bootstrapped crul and are excited to continue developing it into our larger vision, but we truly need your help. We built what we wanted to build, what we thought was cool and interesting and powerful, but now we could really use your perspective. 👀

What did/do you want to do with crul? Were you able to? What made sense, what was confusing? Would you be willing to share your thoughts here?

We would also love to chat with you, and if you've got 15 minutes to tell us your thoughts and maybe hear some epic stories from Carl, you'll be first in line for a commemorative "i used crul before it was cool" mug! ☕

Lastly, if you have an enterprise use case, we would love to collaborate with you as a design partner. We're a lot of fun to work with, dedicated, and care deeply about solving your problems. Hell, we'll probably make it free if you ask nicely! 😉

TL;DR: Launched on Hacker News, did well, was exciting, now what? Help us out?

Stumbling along as we go,

Nic and Carl (Founders of Crul, Inc.)

· 2 min read
crul team

Hi HN, we’re Carl and Nic, the creators of crul, and we’ve been hard at work for the last year and a half building our dream of turning the web into a dataset. In a nutshell, crul is a tool for querying and building web and API data feeds from anywhere to anywhere.

With crul you can crawl and transform web pages into CSV tables, explore and dynamically query APIs, filter and organize data, and push data sets to third party data lakes and analytics tools. Here’s a demo video; we’ve been told Nic sounds like John Mayer (lol).

We’ve personally struggled wrangling data from the web using puppeteer/playwright/selenium, jq, or cobbling together python scripts, client libraries, and schedulers to consume APIs. The reality is that shit is hard, doesn’t scale (classic blocking for-loop or async saturation), and comes with thorny maintenance/security issues. The tools we love to hate.

Crul’s value prop is simple: Query any Webpage or API for free.

At its core, crul is based on the foundational linked nature of Web/API content. It consists of a purpose-built map/expand/reduce engine for hierarchical Web/API content (kind of like Postman but with a membership to Gold's Gym) with a familiar parser expression grammar that naturally gets the job done (and layered caching to make it quick to fix when it doesn’t on the first try). There’s a boatload of other features like domain policies, a scheduler, checkpoints, templates, a REST API, a Web UI, a vault, OAuth for third parties, and 20+ stores to send your data to.

Our goal is to open source crul as time and resources permit. At the end of the day it’s just the two of us trying to figure things out as we go! We’re just getting started.

Crul is one bad mother#^@%*& and the web is finally yours!

Download crul for free as a Mac OS desktop application or as a Docker image and let us know if you love it or hate it. And come say hello to us on our slack channel - we’re a friendly bunch!

Nic and Carl - (Crul early days)

P.S. Every download helps so give it a spin!

· 4 min read
crul team

Hi xsplunk, we’re Carl Yestrau and Nic Stone, the creators of crul. We have been hard at work for the last year and a half building our dream of turning the web into a dataset. We couldn’t think of a more close-to-home community to first spill the beans with than our xsplunk friends.

We are truly sorry for those of you who have been impacted by the industry-wide reduction in force and the ensuing psychological and financial trauma. Remember, time heals, we are always here to listen, and you don’t have to be alone. ❤️‍ Our journey has not been easy. To be frank, it’s been hella rough.

We appreciate having the chance to reach out to our closest and most trusted crew. We'd love for you to take a look at what we've been up to by downloading crul for free and letting us know if you love it or hate it. If you've got use case ideas or feedback, we'd love to hear it. Maybe we can work together again.

In a nutshell, crul is a tool for building web and API data feeds from anywhere to anywhere. The name crul comes from a mashup of the word "crawl" (as in web crawling) and “curl”, one of our favorite tools for interacting with the web. We certainly have gone back and forth on potential confusion, but we just loved the mashup!

With crul you can crawl and transform web pages into CSV tables, explore and dynamically query APIs, filter and organize data, and push data sets to third party data lakes and analytics tools. Here’s a demo video (you can hear Nic jam’n like John Mayer according to Raitz):

We’ve personally struggled wrangling data from the web and APIs as data inputs using puppeteer/playwright/selenium or cobbling together python scripts and client libraries to consume APIs. The reality is that shit is a real pain in the ass, doesn’t scale (classic blocking for-loop or async saturation), and comes with thorny maintenance/security issues (can’t touch this!).

At its core, crul is a purpose-built map/expand/reduce engine for hierarchical web/API content (kind of like Postman with a membership to Gold’s Gym in the 90291) and a familiar parser expression grammar that naturally gets the job done. The query language pays homage to an intuitive command, argument, flag, and piping syntax with a dedicated Parser Expression Grammar (PEG). Using the PEG, queries are converted into query plans composed of stages. Stages are composed of commands, which are either reducing, mapping, or expanding, and are executed with parallelization. You can think of crul as an accordion, and we represent that with the ability to pipe commands to subsequent commands with the double “||” operator. Cool, hey? We think so.
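To make that piping shape concrete, here's a tiny illustrative query reusing the same open/filter/rename pattern we show elsewhere on this blog, with each stage feeding its rows into the next:

open https://news.ycombinator.com/news
|| filter "(nodeName == 'A' and parentElement.attributes.class == 'titleline')"
|| rename innerText headline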

There’s a boatload of other features including a query optimizer/cache, domain throttle policies, a query job scheduler, checkpoints, a vault, OAuth for third-party APIs, a headless browser cluster, a UI, and 20+ stores to send your data to.

As a bootstrapped company we need to pick our battles, so the OSS rollout is going to take some time, planning, and money. Our long-term vision is to open source the core with the help of our friends at the Linux Foundation, along with a commercial offering.

Our underlying software vision was to really understand the dynamics of the state machine in a fluid and flexible way, so yes, some logical layers do contain Node.js and are loosely typed. Heavy lifting is managed by Golang/C/C++ contrib packages, and the literal REST is a series of microservices that can be run independently or grafted together. Command language bindings are currently limited to Node.js, and we have yet to invest in a standard IPC protocol for custom commands. We really want to define our software roadmap based on hardening the core for 2.0.0 before flipping to OSS.

Crul is one bad mother#^@%*& and the web is finally yours!

Nic and Carl - (Crul early days)

P.S. Crul, Inc. is a registered Delaware C-Corp with a stock plan (409a), provisional patents, trademark, no debts and working product. The doors are wide open to the old skool crew in an unusual and creative way, yes you can become an employee and future shareholder. We have a lot of work ahead of us but we got this!

· One min read
crul team

We are extremely excited to introduce you to crul 1.0.1!

We are Carl and Nic, the creators of crul, and we have been hard at work for the last year and a half building our dream of turning the web into a dataset.

The name crul comes from a mashup of the word "crawl" (as in web crawling) and "curl", one of our favorite tools for interacting with the web. We certainly have gone back and forth on potential confusion, but we just loved the mashup!

With crul, you can rapidly interact with webpages and APIs to build datasets, expand links asynchronously for custom crawling operations, and process and structure datasets for scheduled delivery to third party systems.

We aim to make web data accessible and usable to anyone who wants it.

We are so excited to help you unlock the power of crul for your web data use cases!

Carl and Nic