10 posts tagged with "release"

Run Postman Collections and curl Enhancements - All About Our 1.9 Release

June 10, 2024 · One min read

crul team

Run Postman Collections directly. Re-use your favorite curl command snippets.

Postman Collections

Postman Collections are a powerful way to organize API requests and model API workflows. Use the postman command to run existing Postman collections within crul to fuel your data pipeline.

curl Enhancements

With our improved curl command emulator, you can run your favorite curl command snippets and have the results automatically converted to a tabular dataset. XML, CSV, JSON, YAML and Parquet are supported out of the box!

And much much more...

Read more of more...

Join our Community

Come hang out and ask us any questions. Some of the features and fixes in this release come from your requests!

Join our discord

Crul’n IT,
Nic and Carl

Read Parquet Files, Dataset JSON Schema Validation - All About Our 1.8 Release

April 29, 2024 · 2 min read

crul team

Read Parquet files from your object store, transform and validate against JSON schemas for export to your Data Lake. Are you interested in something? Let us know!

Get Crul 1.8

Read Parquet Files

The curl and api commands can now automatically read Apache Parquet files in compressed (gz) or uncompressed formats. Get your Data Lake exploration or migration game on.

Dataset JSON Schema Validation

The new jsonvalidate command validates datasets against predefined JSON Schemas, either local or remote. OCSF transformation and validation have never been easier.

Command Bulk Operations

Enhance search performance by applying multiple operations in one call, now supported by the addcolumn, rename, replace and tonumber commands.

Parquet Write Compression Options

Use the compression algorithm of your choice when freezing results in Apache Parquet format. Support for snappy, gzip, lzo, brotli, lz4, zstd.

And much much more...

Read more of more...

Join our Community

Come hang out and ask us any questions. Some of the features and fixes in this release come from your requests!

Join our discord

Crul’n IT,
Nic and Carl

More Business Apps and Stores, Parquet export - All About Our 1.7 Release

November 2, 2023 · 2 min read

crul team

Authenticate and query more Business Apps for export to your Data Lake in Parquet format. Are we missing something? Let us know!

Get Crul 1.7

More Business Apps for Data Lakes

Authenticate with 20+ Business App services, now including Okta for Custom Authorization Servers, Microsoft 365 and Zoom.

Export in Parquet Data File Format

Schedule and store incremental diffs in Apache Parquet format. Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk.

And much much more...

Read more of more...

Join our Community

Come hang out and ask us any questions. Some of the features and fixes in this release come from your requests!

Join our discord

Crul’n IT,
Nic and Carl

Business Apps for Data Lakes, Query Variables, API Pagination - All About Our 1.6 Release

October 1, 2023 · 2 min read

crul team

We've packed plenty of your requests into this one! Authenticate and query Business Apps for export to your Data Lake. Maintain state across query stages using the new --variable flag. Paginate through REST and SOAP APIs with ease. Are we missing something? Let us know!

Get Crul 1.6

Business Apps for Data Lakes

Authenticate and query Workday, Salesforce, Reltio, Mulesoft and Zoom. Store Business App event data to your preferred Data Lake.

Query Variables

You will often find yourself needing a value from a few stages ago that has been filtered out while developing queries. Some examples would be credentials, dates, or other bits of state. This release introduces query variables for simple stage level state setting and retrieval.

API Pagination

Paginate any type of REST and SOAP API specification.

Global controls

Disable caching globally for certain development and production use cases. Turn off Domain Throttling for increased outbound request throughput - use at your own risk!

And much much more...

Read more of more...

Join our Community

Come hang out and ask us any questions. Some of the features and fixes in this release come from your requests!

Join our discord

Crul’n IT,
Nic and Carl

Tidy your data

August 18, 2023 · 6 min read

crul team

Getting clean data is hard enough, but sometimes, you need more than clean data. Just like in a house or an apartment, pushing the clutter into the closet doesn't make it go away. You need to tidy your data.

It's one thing to have tabular data with nice columns and logical rows. It's another to have data that is ready for analysis. Tidy data is organized in a way that makes it easy to manipulate, visualize, and model. Tidy data is data that is easy to use.

What is tidy data?

The tidy data concept was introduced by Hadley Wickham in his 2014 paper, Tidy Data. And it still rings true today.

Simply put, tidy data is a dataset where:

Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.

Although this structure is not always necessary for high quality analysis, it can often make a big difference in the ease of analysis.

Tidy data in crul

The de facto data format in crul is the table, and the query language makes it easy to normalize (similar to flattening), rename, table, and untable data dynamically. Other common operations like joining and appending are also possible.

Often to get tidy data you'll need to "melt" your columns. This is where crul's new melt command shines. More on this shortly, but first, what does it mean to "melt" data?

Melting data

Melting data is the process of taking columns and turning them into rows. This is often necessary when you have a dataset that has multiple columns that represent the same thing. For example, if you have a dataset that has a column for each year, you might want to melt the data so that you have a single column for the year and a single column for the value.

Data prior to melting:

song	artist	2019	2020	2021	2022	2023
song1	artist1	100	200	300	400	500
song2	artist2	200	300	400	500	600
song3	artist3	300	400	500	600	700

Data after melting:

song	artist	year	plays
song1	artist1	2019	100
song1	artist1	2020	200
song1	artist1	2021	300
...	...	...	...
song2	artist2	2022	500
...	...	...	...
song3	artist3	2023	700

With the data melted, you can easily analyze, visualize, and model it by year.
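To make the transformation concrete, here is a minimal sketch of melting in plain Python rather than crul. The melt function, its parameter names, and the in-memory rows are assumptions made for illustration; they mirror the play-count table above and are not crul's implementation.

```python
def melt(rows, id_columns, var_name="year", value_name="plays"):
    """Turn every non-id column of each row into its own (variable, value) row."""
    melted = []
    for row in rows:
        for column, value in row.items():
            if column in id_columns:
                continue  # id columns are repeated on every output row instead
            record = {key: row[key] for key in id_columns}
            record[var_name] = column
            record[value_name] = value
            melted.append(record)
    return melted


# The wide table from the example above: one column per year.
wide = [
    {"song": "song1", "artist": "artist1", "2019": 100, "2020": 200, "2021": 300, "2022": 400, "2023": 500},
    {"song": "song2", "artist": "artist2", "2019": 200, "2020": 300, "2021": 400, "2022": 500, "2023": 600},
    {"song": "song3", "artist": "artist3", "2019": 300, "2020": 400, "2021": 500, "2022": 600, "2023": 700},
]

tidy = melt(wide, id_columns=("song", "artist"))
for record in tidy[:3]:
    print(record)
```

Three songs with five year columns each become fifteen rows, one per song and year.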

Melting data in crul

The melt command in crul makes it easy to melt data. It takes a list of columns to melt. It then melts the data in those columns, and keeps the rest. You can then rename the columns to whatever you want, and continue processing, download as a csv or json file, or push to a third party store (like an S3 bucket).

Let's see an example of melting the data from the previous example.

We'll assume our data is in a file called plays.csv that we have uploaded to the cellar. It will be the same as the data in the previous example.

thaw plays.csv
|| melt 2019 2020 2021 2022 2023
|| rename column year

You can also provide wildcards to the melt command. For example, if you wanted to melt all columns that start with 20, you could do the following:

thaw plays.csv
|| melt 20*
|| rename column year
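As a rough illustration of what a wildcard like 20* selects, the Python sketch below matches column names with shell-style globbing. That crul's wildcards behave like Python's fnmatch is an assumption, and select_columns is a hypothetical helper, not part of crul.

```python
from fnmatch import fnmatchcase


def select_columns(columns, pattern):
    """Return the column names that match a shell-style wildcard pattern."""
    return [name for name in columns if fnmatchcase(name, pattern)]


columns = ["song", "artist", "2019", "2020", "2021", "2022", "2023"]

# Only the year columns start with "20", so only they would be melted.
to_melt = select_columns(columns, "20*")
print(to_melt)
```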

More examples from the tidy data paper

Let's take two examples from the tidy data paper and see how we can melt the data in crul.

Example 1: Billboard top 100

We'll start with the billboard charts dataset from the tidy data paper. You can find the dataset here.

We'll first upload that csv to the cellar so we can thaw it into our pipeline.

thaw billboard.csv

Raw Billboard

Notice that we have observations in our columns, specifically the billboard rank at different weeks in columns x1st.week, x2nd.week, etc. This is not tidy!

Let's melt all columns that fit the regex pattern x.* (x1st.week, x2nd.week, etc.).

thaw billboard.csv
|| melt x.*
|| rename value.week rank
|| rename column week

Tidy Billboard

From here we can do a little more cleanup and renaming of columns, construct timestamps, or process otherwise, but our data is now effectively "molten".
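One possible cleanup step is turning week labels such as x1st.week into plain integers. The Python sketch below does this outside crul; the week_number helper is hypothetical, and the assumption that later columns follow the same naming (for example x10th.week) is not confirmed by the text above.

```python
import re

# Matches labels like "x1st.week" or "x2nd.week" and captures the number.
WEEK_COLUMN = re.compile(r"^x(\d+)(?:st|nd|rd|th)\.week$")


def week_number(column_name):
    """Extract the integer week from a label such as 'x2nd.week'."""
    match = WEEK_COLUMN.match(column_name)
    if match is None:
        raise ValueError(f"not a week column: {column_name!r}")
    return int(match.group(1))


# "x10th.week" is an assumed label that follows the same pattern.
labels = ["x1st.week", "x2nd.week", "x3rd.week", "x10th.week"]
weeks = [week_number(label) for label in labels]
print(weeks)
```

With an integer week and a chart entry date, constructing a timestamp for each observation becomes simple date arithmetic.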

Example 2: Tuberculosis

thaw tb.csv

Raw TB

Notice that we have observations in our columns, specifically the number of cases for different categories/dates in columns new_sp_m04, new_sp_m514, etc. This is not tidy!

Let's melt all columns that fit the regex pattern new_sp.* (new_sp_m04, new_sp_m514, etc.).

We are also using the untable command to remove an unwanted row that will match our pattern.

Finally, we use a combination of the fillEmpty and filter commands to filter out null values. This is optional. In fact, you might want these empty values in your results for analysis, or you may want to fill them with a different default and leave them in!

thaw tb
|| untable new_sp
|| melt new_sp.*
|| fillEmpty --filler "EMPTY"
|| filter "(value != 'EMPTY')"

Tidy TB

From here we can do a little more cleanup and renaming of columns, construct timestamps, or process otherwise, but our data is now effectively "molten".
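The fill-then-filter step used above can be sketched in plain Python as follows. The sample rows, the country and year column names, and both helper functions are made up for illustration; only the "EMPTY" sentinel and the value column come from the query above.

```python
EMPTY = "EMPTY"


def fill_empty(rows, filler=EMPTY):
    """Replace missing values (None or empty string) with a sentinel."""
    return [
        {key: (filler if value in (None, "") else value) for key, value in row.items()}
        for row in rows
    ]


def drop_filled(rows, column="value", filler=EMPTY):
    """Keep only the rows whose column does not hold the sentinel."""
    return [row for row in rows if row[column] != filler]


# Hypothetical melted rows; a count of 0 is real data and must survive.
melted = [
    {"country": "AA", "year": 2000, "column": "new_sp_m04", "value": 0},
    {"country": "AA", "year": 2000, "column": "new_sp_m514", "value": None},
    {"country": "BB", "year": 2000, "column": "new_sp_m04", "value": ""},
    {"country": "BB", "year": 2000, "column": "new_sp_m514", "value": 3},
]

kept = drop_filled(fill_empty(melted))
print(kept)
```

Using an explicit sentinel keeps genuine zero counts distinct from missing observations, which is why the filter compares against "EMPTY" rather than testing for falsy values.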

Why use crul for tidy data?

The advantage of using crul for tidy data is the ability to both access the data and process it quickly in one place. Crul's caching tiers make it easy to iteratively design your dataset. You can also configure a schedule to automatically build data sets and optionally push them to one or more of 30+ common stores.

You can take advantage of other powerful commands in combination with the melt command. For example, incorporate semi-synthetic data generation with the synthesize command, incorporate prompting with the prompt command, or enrich/seed your data sets from web or API content with the open and api commands.

Happy melting!

Join our Community

Come hang out and ask us any questions.

Join our discord

SOAP API support, XML file uploads, melt your data - All About Our 1.5 Release

August 13, 2023 · One min read

crul team

Turn SOAP APIs into data sets with ease. Tidy up your data with the melt command. Support for XML data sources and uploads. Are we missing something? Let us know!

Get Crul 1.5

SOAP API support

Make SOAP API requests using the soap command. Transform SOAP APIs into clean, tabular data sets with the benefits of crul's caching, domain policies, and powerful command library.

Melt data

Apply Tidy Data principles through the new melt command. Perfect for nested data when the normalize command isn't quite what you are looking for.

And much much more...

Read more of more...

Join our Community

Come hang out and ask us any questions. Some of the features and fixes in this release come from your requests!

Join our discord

Crul’n IT,
Nic and Carl

Data Processing with JavaScript and Syntax Highlighting - All About Our 1.4 Release

June 30, 2023 · One min read

crul team

Map, reduce, expand your mind and data using JavaScript and JSON. Are we missing something? Let us know!

Get Crul 1.4

Data Processing with JavaScript

Embed JavaScript for custom data processing within a crul query using the evaluate command. Include your favorite JavaScript data manipulation libraries like mathjs, Lodash or load custom ESM module libraries.

Syntax Highlighting

Beautiful syntax highlighting support for all your JSONs and JavaScripts in the crul query bar. Use the triple back tick ```javascript/json notation and look like a pro.

And much much more...

Read more of more...

Join our Community

Come hang out and ask us any questions. Some of the features and fixes in this release come from your requests!

Join our discord

Crul’n IT,
Nic and Carl

Vector Embeddings, Semantic Search, run curl, Query GraphQL, API Docs - All About Our 1.3 Release

June 12, 2023 · 2 min read

crul team

We're going multi-dimensional in this release: query GraphQL APIs, run curl scripts, generate Vector Embeddings, query using Semantic Search, and explore the developer API Documentation. Are we missing something? Let us know!

Get Crul 1.3

Vector Embeddings and Semantic Search

Dynamically generate Vector embeddings from API and Web data. Persist vector embeddings into a vector database such as pinecone, and semantically query on the fly. Query pre-populated vector databases for performant search.

Run curl Scripts and Query GraphQL APIs

Take your curl and GraphQL games to the next level... Run, paginate, cache, schedule and securely authenticate curl and GraphQL queries. Transform and persist curl and GraphQL differential results to 30+ stores. Generate synthetic datasets and Vector Embeddings using curl and GraphQL results data. Use generative AI to create curl/GraphQL scripts/queries to run.

Developer API Documentation

The crul API allows for programmatic access to core crul services and resources. This includes dispatching queries and results retrieval, as well as create, read, update, delete (CRUD) operations on core crul resources such as scheduled queries, credentials, domain policies, and more.

And much much more...

Read more of more...

Join our Community

Come hang out and ask us any questions. Some of the features and fixes in this release come from your requests!

Join our discord

Crul’n IT,
Nic and Carl

Chainable GPT Prompts, Synthetic Data Generation - All About Our 1.2 Release

May 4, 2023 · 2 min read

crul team

Whoa, we've included Prompts and Synthetic Data Generation in this release. Are we missing something? Let us know!

Get Crul 1.2

Chainable GPT Prompts

Integrate GPT into your data pipeline. Send, chain and reuse prompts with fine-grained sampling, likelihood, penalty, bias and model control.

Seed prompts with API and Web data. Recursively generate prompts from other prompts. Lose your mind.

Synthetic Data Generation

Create synthetic data sets using real data in combination with fully synthesized values. Use natural language prompts describing the synthetic data sets you would like to generate.

And much much more...

Read more of more...

Join our Community

Come hang out and ask us any questions. Some of the features and fixes in this release come from your requests!

Join our discord

Crul’n IT,
Nic and Carl

Proxy, Browser Stealth, Fetch ZIP Archives, HTML Content Extraction - All About Our 1.1 Release

April 4, 2023 · 2 min read

crul team

Holy moly, what a wild month it has been! The ticker keeps ticking on downloads and we’ve included our users' top requests in this release. Are we missing something? Let us know!

Get Crul 1.1

API/Web Proxy Support

A proxy server can now be used as a gateway between crul and the internet for core Web (open, requests, scrape, form) and API (api) commands. Proxy servers are a useful way to provide varying levels of functionality, security, and privacy depending on your use case, needs, or company policy.

Support for http, https, socks4, socks5 and pac proxies comes out of the box. Party at Crul’s - bring your own proxy!

Related Links: Proxy web page request || Proxy API request

Headless Browser Stealth

Remotely controlling a web browser leaves traces called browser fingerprints. Browser fingerprints can be used to distinguish a remotely controlled browser from a normal user controlled web browser.

In Crul 1.1.0, ‘Browser Stealth’ mode is enabled by default, hiding the browser's remote control state by erasing the browser fingerprint that is associated with a non-human user.

Fetch ZIP Archives to Scan and Extract

ZIP archives can now be fetched (api command) with the ability to scan metadata and read/serialize specific files into tabular format.

New commands for HTML content extraction

The new parseHTMLTable command allows for simplified conversion of HTML tables into Crul’s tabular format. Easy to process and export as a csv! The new parseArticle command can turn web pages containing an article into a standardized data structure of article content, headline, author etc.

Related Links: HTML table extraction || HTML article extraction

And much much more...

Read more of more...

Join our Community

We started up the Crul discord channel recently, come hang out and ask us any questions. Some of the features and fixes in this release come from your requests!

Join our discord

Crul’n IT,
Nic and Carl
