Vector Embeddings

crul can generate vector embeddings, load them into a vector database such as Pinecone, and semantically query that database. Three commands support this functionality:

  • The vectorize command, which transforms crul results into vector embeddings using the OpenAI API.
  • The vectorload command, which loads vectors into a vector database such as Pinecone.
  • The vectorquery command, which queries a vector database such as Pinecone.

Note: These vector-related commands require authentication.

  • For the vectorize command, you'll need to configure an openai credential containing your OpenAI API key with the name openai.
  • For the vectorload and vectorquery commands you'll need to configure a pinecone credential containing your Pinecone API key with the name pinecone.

How it works

The two main commands to understand are the vectorize command and the vectorquery command.

  • The vectorize command can take any crul results, whether from an API, webpage, cellar file, or other source, and transform them into vector embeddings using the OpenAI embeddings endpoint. From here, you can use the api command to push the vector embeddings to a vector database or, if Pinecone is your vector database, simply use the vectorload command.

  • The vectorquery command can be used to semantically query an existing Pinecone vector database. This can be a database whose vector embeddings were loaded using crul and the api/vectorload commands, or an existing vector database that already contains vectors.
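To make "semantically query" more concrete, here is a minimal Python sketch of the general idea behind a vector query: the query text and the stored items are both represented as vectors, and the items are ranked by how similar their vectors are to the query vector. This is not crul's or Pinecone's implementation. The function names, the tiny 3-dimensional vectors, and the choice of cosine similarity as the metric are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_query(query_vector, index, top_k=2):
    """Return the top_k (id, score) pairs most similar to the query vector."""
    scored = [
        (item_id, cosine_similarity(query_vector, vector))
        for item_id, vector in index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Hypothetical toy embeddings; real embeddings have far more dimensions.
index = {
    "headline-1": [0.9, 0.1, 0.0],
    "headline-2": [0.1, 0.9, 0.0],
    "headline-3": [0.7, 0.3, 0.1],
}

# Hypothetical embedding of the query text.
query_vector = [1.0, 0.0, 0.0]

results = semantic_query(query_vector, index)
for item_id, score in results:
    print(item_id, round(score, 3))
```

In this sketch the two stored vectors that point in nearly the same direction as the query rank first. A hosted vector database applies the same ranking idea at much larger scale.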

Need support for another vector database? Let us know!

Let's take a look at some examples.

Examples

vectorize only

Query

devices
|| vectorize name

vectorquery only

Query

vectorquery "Headlines relating to California" --pinecone.index "{INDEX}.pinecone.io"

vectorize and vectorload

Query

devices
|| vectorize name
|| vectorload --pinecone.index "{INDEX}.pinecone.io"

vectorize, vectorload and vectorquery

Query

This example demonstrates all three commands at once. We first use the open command to get back a list of headlines, then vectorize the results, vectorload them into a Pinecone vector database index, and finally vectorquery that index with a semantic search for "Headlines relating to California".

open https://news.ycombinator.com/news
|| filter "(nodeName == 'A' and parentElement.attributes.class == 'titleline')"
|| rename innerText headline
|| vectorize headline
|| vectorload --pinecone.index "{INDEX}.pinecone.io"
|| vectorquery "Headlines relating to California" --pinecone.index "{INDEX}.pinecone.io"