
Building a data feed of vector embeddings from Monday.com to Pinecone

Building a data feed from a query is a common use case for crul. In this example, we'll deploy a crul query that transforms GraphQL API responses from Monday.com into vector embeddings, which are then loaded into a Pinecone vector database on a set schedule.

Vector loading

Use case

We want to get board items from Monday.com, transform them into vector embeddings, then load them into a Pinecone vector database. We'll also demonstrate how to semantically query the embeddings from crul!

If you're not familiar with Monday.com, think of it as an all-in-one work management platform that helps teams streamline their workflow, collaborate seamlessly, and manage complex projects effectively.

Monday Board

Prerequisites

We will need a global API token for the Monday.com API. This API token will need to be configured as a custom credential.

We will also need an API key for the Pinecone REST API. This API key will also need to be configured as a Pinecone credential.

Finally, we'll need a Pinecone index configured with the same number of dimensions (1536) as the vector embeddings generated by the vectorize command. Keep track of the index name, as you will need it in later stages.

Creating a Pinecone index
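The dimension requirement is worth checking before anything is loaded, since an index rejects vectors whose length does not match its configured dimension. The following is a minimal Python sketch of such a check, not part of crul. The row shape (`id` and `values` keys) and the `check_dimensions` helper are assumptions made for illustration.

```python
# Dimension stated above for the embeddings produced by the vectorize command.
EXPECTED_DIMENSION = 1536


def check_dimensions(rows, expected=EXPECTED_DIMENSION):
    """Return the ids of rows whose embedding length differs from `expected`.

    `rows` is a hypothetical list of dicts with "id" and "values" keys.
    """
    return [row["id"] for row in rows if len(row["values"]) != expected]


rows = [
    {"id": "item-1", "values": [0.0] * 1536},
    {"id": "item-2", "values": [0.0] * 768},  # wrong length for a 1536-dimension index
]
mismatched = check_dimensions(rows)
print(mismatched)
```

Running a check like this on a sample of rows can make a misconfigured index obvious before a scheduled feed starts failing.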

The query

Here is the entire query. We'll walk through it step by step next.

graphql https://api.monday.com/v2/ 'query {
  boards {
    items {
      name
    }
  }
}'
--headers '{ "Authorization": "$CREDENTIALS.monday$" }'
|| normalize boards
|| normalize items
|| diff vectorized_board_items --update true
|| vectorize item
|| vectorload --pinecone.index "{INDEX}.pinecone.io"

Building our query

Retrieving board items

We first need to retrieve board items from our Monday.com account. The Monday.com API uses GraphQL to access resources, so we'll use the graphql command to list all current board items.

The normalize commands will convert the GraphQL response into a tabular format.

graphql https://api.monday.com/v2/ 'query {
  boards {
    items {
      name
    }
  }
}'
--headers '{ "Authorization": "$CREDENTIALS.monday$" }'
|| normalize boards
|| normalize items

Board items
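To make the effect of the two normalize stages concrete, here is a rough Python sketch of the same flattening idea: each list-valued column is expanded so that every element becomes its own row. This is an illustration of the concept only, not crul's implementation, and the exact column names crul produces may differ. The `normalize` function and the sample item names are made up for this example.

```python
def normalize(rows, column):
    """Expand a list-valued column so each element becomes its own row."""
    out = []
    for row in rows:
        # Keep every other column on the expanded rows.
        rest = {k: v for k, v in row.items() if k != column}
        for element in row.get(column, []):
            if isinstance(element, dict):
                out.append({**rest, **element})
            else:
                out.append({**rest, column: element})
    return out


# Hypothetical response with the same shape as the GraphQL query above.
response = {
    "data": {
        "boards": [
            {"items": [{"name": "Book venue"}, {"name": "Send invites"}]},
            {"items": [{"name": "Draft agenda"}]},
        ]
    }
}

boards = normalize([response["data"]], "boards")  # one row per board
items = normalize(boards, "items")                # one row per board item
print(items)
```

After both passes, the nested boards-and-items structure has become a flat table with one row per board item, which is the shape the later stages operate on.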

Converting board items to vector embeddings

The vectorize command turns the values in a given column into vector embeddings using OpenAI. This stage converts the item column to vector embeddings.

...
|| vectorize item

Vector embeddings

You can override the default model, or use the api/curl commands to construct a query to a different embedding API.

Loading vector embeddings

The vectorload command loads all rows (which should contain vector embeddings) into a Pinecone index.

...
|| vectorload --pinecone.index "{INDEX}.pinecone.io"

Vector loading

Test by querying our vector database

We can test whether our embeddings were loaded correctly by querying the vector database directly from crul with a semantic query, using the vectorquery command. Run this query in a new tab, as it does not depend on the previous stages.

vectorquery "Tasks relating to our upcoming user conference" --pinecone.index "{INDEX}.pinecone.io"

Vector query
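For intuition about what a semantic query does, the Python sketch below ranks stored vectors by cosine similarity to a query vector and returns the closest matches. It is a toy illustration under stated assumptions, not how vectorquery or Pinecone is implemented: a real index typically uses approximate nearest-neighbor search, the similarity metric is chosen when the index is created, and the three-dimensional vectors and item names here are invented stand-ins for 1536-dimension embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, stored, k=2):
    """Return the k item names whose vectors are most similar to `query`."""
    scored = [(cosine_similarity(query, vec), item) for item, vec in stored.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:k]]


# Invented toy vectors standing in for real embeddings.
stored = {
    "Book conference venue": [0.9, 0.1, 0.0],
    "Fix login bug": [0.0, 0.2, 0.9],
    "Print conference badges": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # stand-in for the embedded query text
results = top_k(query, stored, k=2)
print(results)
```

The query text is embedded with the same model as the stored items, which is why the two can be compared directly.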

Optional diffing

We may want to incorporate a diff to avoid sending the same embeddings multiple times. We can do this using the diff command, which maintains a set of board items that have already been converted to vector embeddings and loaded into a Pinecone index.

The diff command can be configured with a number of different strategies for maintaining and updating the diff. See the diff command for more details.

We'll want to run it prior to the vectorize command, so that we only vectorize new items.

...
|| diff vectorized_board_items --update true
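The idea behind this stage can be sketched in a few lines of Python: keep a record of what has already been processed, pass through only unseen rows, and optionally add them to the record. This is a conceptual sketch only. crul persists the diff under the name given in the query, whereas this example uses an in-memory set, and keying on a `name` column is an assumption made for illustration.

```python
def diff_new_rows(rows, seen, key="name", update=True):
    """Return rows whose `key` value is not in `seen`.

    When `update` is true, newly seen values are added to `seen` so that
    the next run skips them.
    """
    new_rows = []
    for row in rows:
        value = row[key]
        if value not in seen:
            new_rows.append(row)
            if update:
                seen.add(value)
    return new_rows


seen = set()  # stand-in for the persisted diff state

first_run = diff_new_rows([{"name": "Book venue"}, {"name": "Send invites"}], seen)
second_run = diff_new_rows([{"name": "Book venue"}, {"name": "Draft agenda"}], seen)
print(first_run)
print(second_run)
```

Because only the rows that survive the diff reach the vectorize stage, items that were already loaded are not sent to the embedding model again.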

Advanced

We could also use checkpoints combined with dates to retrieve only new and/or edited items, based on the current timestamp $TIMESTAMP$ or $TIMESTAMP.ISO$. Once we have made the request, we can save the checkpoint, then access and reuse it the next time the query runs.

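This achieves a similar result to diff, and the two can be used in combination. It is an important consideration if the number of boards or items is expected to keep growing, because the diff stage runs after retrieval, so every run of the query above still requests all items.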
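The checkpoint pattern can be sketched in Python as follows: remember when the last run happened, keep only items changed since then, and save the new timestamp for the next run. This illustrates the general technique rather than crul's checkpoint syntax. The `updated_at` field, the `state` dictionary, and the `run` helper are assumptions for this example, and in practice the filter would ideally be applied in the API request itself so that fewer items are retrieved.

```python
from datetime import datetime, timezone


def items_since(items, checkpoint):
    """Keep items whose ISO 8601 `updated_at` is later than the checkpoint."""
    return [
        item for item in items
        if datetime.fromisoformat(item["updated_at"]) > checkpoint
    ]


def run(items, state, now):
    """Process items changed since the saved checkpoint, then advance it."""
    # On the first run there is no checkpoint, so everything is considered new.
    checkpoint = state.get("last_run", datetime.min.replace(tzinfo=timezone.utc))
    fresh = items_since(items, checkpoint)
    state["last_run"] = now  # saved for the next scheduled run
    return fresh


state = {}  # stand-in for a persisted checkpoint
items = [
    {"name": "Book venue", "updated_at": "2024-01-01T10:00:00+00:00"},
    {"name": "Send invites", "updated_at": "2024-01-03T09:30:00+00:00"},
]

first = run(items, state, now=datetime(2024, 1, 2, tzinfo=timezone.utc))
second = run(items, state, now=datetime(2024, 1, 4, tzinfo=timezone.utc))
print([i["name"] for i in first])
print([i["name"] for i in second])
```

Note that in this sketch the first run returns every item, including one stamped after the first checkpoint, so the second run returns it again. That overlap is one reason a checkpoint and a diff can be worth combining.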

Scheduling

The last step is to schedule our query to run on a set interval. First let's copy the query. Then, from the menu, navigate to the scheduler admin page.

Add scheduled query

Provide a name, the query, add the desired interval, and submit.

Now our query will run on the set interval and populate our vector database index!

Summary

We've now seen how simple it is to create a query that transforms GraphQL API responses into vector embeddings and loads them into a Pinecone vector database. We can also query that database, all from within crul!

It's not just Pinecone! Send to one of 30+ stores using the freeze command, or construct your own api/curl command to make generic REST requests to another destination or vector database.

Any crul query can be turned into a feed of vector embeddings using these steps! Have a web page that you would like turned into vector embeddings? No problem! Need to turn a REST API into a data feed? We got you!
