Vector Embeddings
crul can generate vector embeddings, load them into a vector database such as Pinecone, and semantically query a vector database. Three commands support this functionality:
- The vectorize command, which transforms crul results into vector embeddings using the OpenAI API.
- The vectorload command, which loads vectors into a vector database such as Pinecone.
- The vectorquery command, which queries a vector database such as Pinecone.
Note: These vector-related commands require authentication.
- For the vectorize command, you'll need to configure an openai credential containing your OpenAI API key with the name openai.
- For the vectorload and vectorquery commands, you'll need to configure a pinecone credential containing your Pinecone API key with the name pinecone.
How it works
The two main commands to understand are the vectorize command and the vectorquery command.
The vectorize command can take any crul results, whether from an API, webpage, cellar file, or other source, and transform them into vector embeddings using the OpenAI embeddings endpoint. From here, you can use the api command to push the vector embeddings to a vector database, or, if Pinecone is your vector database, simply use the vectorload command.
The vectorquery command can be used to semantically query an existing Pinecone vector database. This can be a database whose vector embeddings were loaded using crul and the api/vectorload commands, or an existing vector database that already contains vectors.
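To make "semantically query" more concrete, here is a minimal Python sketch of the general idea: texts are stored alongside their embedding vectors, and a query vector is compared against them by similarity. It is an illustration only, not how crul or Pinecone is implemented. The three-dimensional vectors, the headline texts, and the `query` helper are all hypothetical; real embeddings have far more dimensions, and a Pinecone index may be configured with a metric other than cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "index": text mapped to a made-up 3-dimensional embedding.
# In practice the vectors would come from an embeddings model.
index = {
    "California wildfire season begins": [0.9, 0.1, 0.0],
    "New JavaScript framework released": [0.0, 0.2, 0.9],
    "Los Angeles housing prices rise": [0.8, 0.3, 0.1],
}


def query(index, query_vector, top_k=2):
    """Return the top_k texts whose vectors are most similar to query_vector."""
    scored = [
        (cosine_similarity(vector, query_vector), text)
        for text, vector in index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:top_k]]


# Hypothetical embedding of a query such as "Headlines relating to California".
results = query(index, [1.0, 0.2, 0.0])
print(results)
```

In this toy setup the two California-related headlines rank highest because their vectors point in nearly the same direction as the query vector, even though the match is not based on shared keywords.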
Need support for another vector database? Let us know!
Let's take a look at some examples.
Examples
vectorize only
Query
devices
|| vectorize name
vectorquery only
Query
vectorquery "Headlines relating to California" --pinecone.index "{INDEX}.pinecone.io"
vectorize and vectorload
Query
devices
|| vectorize name
|| vectorload --pinecone.index "{INDEX}.pinecone.io"
vectorize, vectorload and vectorquery
Query
This example demonstrates all three commands at once. We first use the open command to get a list of headlines, then vectorize the results, vectorload them into a Pinecone vector database index, and finally vectorquery that index with a semantic search for "Headlines relating to California".
open https://news.ycombinator.com/news
|| filter "(nodeName == 'A' and parentElement.attributes.class == 'titleline')"
|| rename innerText headline
|| vectorize headline
|| vectorload --pinecone.index "{INDEX}.pinecone.io"
|| vectorquery "Headlines relating to California" --pinecone.index "{INDEX}.pinecone.io"