pq
A Parquet Swiss Army Knife. Inspect, query, transform, and view Parquet files from the command line.
# What's in this file?
$ pq info events.parquet
File: events.parquet
Size: 1.2 GiB
Rows: 48,291,037
Row Groups: 12
Columns: 8
Compression: ZSTD
# What columns does it have?
$ pq schema events.parquet
Schema (8 columns):
├── id: int64
├── event: string
├── user_id: int32
├── ts: timestamp(us)
├── city: string
├── device: string
├── duration_ms: int32
╰── payload: struct
    ├── action: string
    ╰── metadata: map<string, string>
# Peek at the data
$ pq head events.parquet -n 3
╭────┬───────┬─────────┬──────────────────────┬──────────╮
│ id ┆ event ┆ user_id ┆ ts                   ┆ city     │
╞════╪═══════╪═════════╪══════════════════════╪══════════╡
│ 1  ┆ click ┆ 402     ┆ 2025-01-15T08:23:11Z ┆ Seattle  │
├╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 2  ┆ view  ┆ 117     ┆ 2025-01-15T08:23:14Z ┆ Portland │
├╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 3  ┆ click ┆ 892     ┆ 2025-01-15T08:23:19Z ┆ Denver   │
╰────┴───────┴─────────┴──────────────────────┴──────────╯
# SQL queries - reference files directly in FROM
$ pq sql "SELECT city, count(*) n FROM 'events.parquet' GROUP BY city ORDER BY n DESC LIMIT 3"
╭──────────┬───────╮
│ city     ┆ n     │
╞══════════╪═══════╡
│ Seattle  ┆ 12485 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ Portland ┆ 9712  │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ Denver   ┆ 8033  │
╰──────────┴───────╯
# jq expressions
$ pq jq events.parquet '{city, event}' | head -2
{"city":"Seattle","event":"click"}
{"city":"Portland","event":"view"}
# Works with remote files too - lazily fetches only the bytes it needs
$ pq count "https://example.com/big-dataset.parquet"
2964624
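Reading only part of a remote file is possible because of how Parquet files are laid out: the metadata footer sits at the end of the file, followed by a 4-byte little-endian footer length and the magic bytes PAR1. A reader can fetch just the tail of a file to learn its schema and row count. The sketch below is not pq's implementation, which this page does not describe. It only illustrates the layout on a local file with standard shell tools, assuming the common unencrypted layout that ends in PAR1. `parquet_footer_len` is a hypothetical helper name.

```shell
# Hypothetical helper: print the footer (metadata) length of a local Parquet file.
parquet_footer_len() {
  # The file must end with the 4-byte magic "PAR1".
  magic=$(tail -c 4 "$1")
  [ "$magic" = "PAR1" ] || { echo "not a Parquet file: $1" >&2; return 1; }

  # The 4 bytes before the magic hold the footer length, little-endian.
  # od -tu1 prints each byte as an unsigned decimal, so the result does not
  # depend on the byte order of the machine running the script.
  set -- $(tail -c 8 "$1" | head -c 4 | od -An -tu1)
  [ $# -eq 4 ] || { echo "file too short" >&2; return 1; }

  echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}
```

Running `parquet_footer_len events.parquet` prints the number of metadata bytes that sit just before the final 8 bytes of the file.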
Features
Remote Files
HTTPS, S3, GCS, and Azure URLs work everywhere; pq lazily fetches only the bytes it needs
Output Formats
Pretty tables in a terminal, JSONL when piped, plus JSON, CSV, and plain TSV
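Switching the default format between a terminal and a pipe is a common CLI convention. How pq detects this is not stated here; the sketch below shows the usual shell approach, which tests whether standard output is a terminal. `pick_format` and `PQ_FORMAT` are hypothetical names used only for illustration.

```shell
# Hypothetical helper: choose a default output format the way many CLIs do.
# It sets PQ_FORMAT, an illustrative variable that pq itself does not read.
pick_format() {
  if [ -t 1 ]; then
    PQ_FORMAT=table   # stdout is a terminal: human-readable table
  else
    PQ_FORMAT=jsonl   # stdout is a pipe or file: one JSON object per line
  fi
}

# Redirecting the call makes stdout a non-terminal, so this selects jsonl.
pick_format > /dev/null
echo "$PQ_FORMAT"
```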
Install
Homebrew (macOS/Linux)
brew install joewalnes/tap/pq
Download binary
Pre-built binaries for macOS and Linux:
# macOS (Apple Silicon)
curl -Lo pq https://github.com/joewalnes/pq/releases/latest/download/pq-darwin-arm64
# Linux (x86_64)
curl -Lo pq https://github.com/joewalnes/pq/releases/latest/download/pq-linux-amd64
# Linux (ARM64)
curl -Lo pq https://github.com/joewalnes/pq/releases/latest/download/pq-linux-arm64
Then make it executable and move it to your PATH:
chmod +x pq
sudo mv pq /usr/local/bin/
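Picking among the three downloads above can be scripted. The following is a minimal sketch, not an official installer. `pq_asset_name` and `pq_download_url` are hypothetical helper names. The asset names and URL prefix are copied from the commands above, and the `uname` values (Darwin, Linux, arm64, x86_64, aarch64) are the ones those platforms typically report.

```shell
# Hypothetical helper: map an OS name and CPU name (as printed by `uname -s`
# and `uname -m`) to one of the three asset names listed above.
pq_asset_name() {
  case "$1/$2" in
    Darwin/arm64)              echo "pq-darwin-arm64" ;;
    Linux/x86_64)              echo "pq-linux-amd64" ;;
    Linux/aarch64|Linux/arm64) echo "pq-linux-arm64" ;;
    *)
      # Any other combination has no binary listed on this page.
      echo "no pre-built binary listed for $1/$2" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: build the full download URL for a platform.
pq_download_url() {
  asset=$(pq_asset_name "$1" "$2") || return 1
  echo "https://github.com/joewalnes/pq/releases/latest/download/$asset"
}

# Typical use (not run here, since it needs network access):
#   curl -Lo pq "$(pq_download_url "$(uname -s)" "$(uname -m)")"
```

On platforms without a listed binary the helpers return a non-zero status, in which case Homebrew or a source build remains the route.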
From source
Requires Rust 1.75+:
git clone https://github.com/joewalnes/pq.git
cd pq
make install # builds release binary, copies to ~/.local/bin/pq
Getting started
- Interactive Viewer - navigate data with the TUI
- Getting Started tutorial - import, inspect, query, export
- CLI Reference - every command and flag