A list of tools that generally make life easier when working with data.
Things I use and recommend
- jq is a lightweight and flexible command-line JSON processor.
- csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats.
- gron transforms JSON into discrete assignments to make it easier to grep for what you want and see the absolute ‘path’ to it. It eases the exploration of APIs that return large blobs of JSON but have terrible documentation. Its primary purpose is to make it easy to find the path to a value in a deeply nested JSON blob when you don’t already know the structure; much of jq’s power is unlocked only once you know that structure.
- yq is a lightweight and portable command-line YAML processor. It uses jq-like syntax but works with YAML files as well as JSON. I find it useful for working with the output of terraform show -json ~/tmp/tfplan.
- q is a command-line tool that allows direct execution of SQL-like queries on CSVs/TSVs (and any other tabular text files).
- vscode-edit-csv is an extension for Visual Studio Code that allows you to edit CSV files with an Excel-like table UI.
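
To make the gron idea above concrete, here is a minimal Python sketch of flattening nested JSON into one assignment per line. It is not gron's implementation; the function name flatten and the sample document are made up for illustration, and it assumes every key is a simple identifier (the real tool also handles keys that need quoting).

```python
import json


def flatten(value, path="json"):
    """Yield one 'path = value;' line per node of a parsed JSON value."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        # Sort keys so the output is stable and easy to diff.
        for key in sorted(value):
            # Assumption: keys are simple identifiers, so dot notation is safe.
            yield from flatten(value[key], f"{path}.{key}")
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Scalars are printed as JSON literals (strings keep their quotes).
        yield f"{path} = {json.dumps(value)};"


# Hypothetical sample document, standing in for an undocumented API response.
doc = json.loads('{"user": {"name": "Ada", "tags": ["admin", "dev"]}}')
lines = list(flatten(doc))
print("\n".join(lines))

# Once every value sits on its own line with its full path, a plain
# substring search (the grep step) is enough to locate it.
matches = [line for line in lines if "dev" in line]
print(matches)
```

Because each line carries the full path, the search result shows where the value lives, which is the piece of knowledge a jq filter needs up front.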
Things I haven’t tried yet
- Data Retriever automates the first steps in the data analysis pipeline by downloading, cleaning, and standardizing datasets, and importing them into relational databases, flat files, or programming languages.
- VisiData is a terminal interface for exploring and arranging tabular data.
- SQLFluff is a dialect-flexible and configurable SQL linter.
- angle-grinder allows you to parse, aggregate, sum, average, min/max, percentile, and sort your data. You can see it, live-updating, in your terminal.
- immudb is a database with built-in cryptographic proof and verification. It can track changes in sensitive data, and the integrity of the history is protected by the clients, without the need to trust the server. It can operate as a key-value store or as a relational database (SQL).
- lux is a Python library that facilitates fast and easy data exploration by automating the visualization and data analysis process. By simply printing out a dataframe in a Jupyter notebook, Lux recommends a set of visualizations highlighting interesting trends and patterns in the dataset. Visualizations are displayed via an interactive widget that enables users to quickly browse through large collections of visualizations and make sense of their data.
- NocoDB turns any MySQL, PostgreSQL, SQL Server, SQLite, or MariaDB database into a smart spreadsheet.
- hadolint is a Dockerfile linter that also uses ShellCheck to check inline Bash code.
- Nushell is a new shell inspired by PowerShell, functional programming, and modern CLI tools.
- jc JSONifies the output of many CLI tools and file types for easier parsing in scripts. See the Parsers section of its README for supported commands and file types.
- jtbl accepts piped JSON data from stdin and outputs a text table representation to stdout.
- bpfcc-tools contains various Linux kernel tracing tools. For example, execsnoop can list all executed processes while it runs.
- htmlq is like jq, but for HTML.
- RESTler is a stateful REST API fuzzer.
- Turbolift is a simple tool to help apply changes across many GitHub repositories simultaneously. Perhaps similar to clustergit?
- sqlean provides extra functions for SQLite.
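
Since I haven't tried jtbl yet, here is only a rough Python sketch of the kind of transformation its description suggests: a list of flat JSON objects rendered as an aligned text table. The function name render_table, the layout, and the sample rows are my own assumptions, not jtbl's actual behaviour or output format.

```python
import json


def render_table(rows):
    """Render a list of flat JSON objects as an aligned plain-text table."""
    # Collect column names in the order they are first seen.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    # Stringify every cell; a missing key becomes an empty cell.
    cells = [[str(row.get(col, "")) for col in columns] for row in rows]

    # Each column is as wide as its longest entry, header included.
    widths = [
        max([len(col)] + [len(line[i]) for line in cells])
        for i, col in enumerate(columns)
    ]

    def fmt(values):
        # Pad each value to its column width and trim trailing spaces.
        return "  ".join(v.ljust(w) for v, w in zip(values, widths)).rstrip()

    lines = [fmt(columns), fmt(["-" * w for w in widths])]
    lines += [fmt(line) for line in cells]
    return "\n".join(lines)


# Hypothetical input; a real tool would read this from stdin.
rows = json.loads(
    '[{"name": "jq", "format": "JSON"}, {"name": "csvkit", "format": "CSV"}]'
)
table = render_table(rows)
print(table)
```

The sketch only handles flat objects; nested values are simply stringified, which is one of the design questions a real tool has to answer.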