CLI Reference #
Auto-generated - do not edit by hand. Run ./docs/generate-cli-reference.sh to regenerate.
pq #
A Parquet Swiss Army Knife - inspect, query, transform, and view Parquet files
Usage: pq [OPTIONS] <COMMAND>
Viewer:
view Interactive TUI data viewer (default)
Metadata:
info Display file summary (size, rows, schema, compression)
schema Display schema (tree, json-schema, arrow, ddl, pyarrow)
stats Display column statistics (min, max, nulls, distinct)
layout Display physical layout (row groups, pages)
validate Validate file integrity
Data:
cat Dump rows
head Show first N rows
tail Show last N rows
sample Show random N rows
count Fast row count
grep Search rows by regex
Query:
sql Execute SQL via DataFusion
jq Apply jq expressions
Transform:
select Project columns
slice Extract row range
merge Combine files
split Split file
I/O:
import Import CSV/JSON/JSONL to Parquet
export Export Parquet to CSV/JSON/JSONL
Options:
-f, --format <OUTPUT_FORMAT> Output format (table, json, jsonl, csv, plain) [possible values:
json, jsonl, csv, table, plain]
--color <COLOR> Color output [default: auto] [possible values: auto, always, never]
-q, --quiet Suppress non-essential output
-v, --verbose Increase verbosity
-h, --help Print help
-V, --version Print version
Examples:
pq data.parquet # open in TUI viewer
pq info data.parquet
pq cat data.parquet --limit 100
pq sql "SELECT count(*) FROM 'data.parquet'"
pq jq data.parquet '.name'
pq view #
Interactive TUI data viewer (default when a file is given without a subcommand)
Usage: pq view [OPTIONS] <FILE>
Arguments:
<FILE> Parquet file path
Options:
-f, --format <OUTPUT_FORMAT> Output format (table, json, jsonl, csv, plain) [possible values:
json, jsonl, csv, table, plain]
--color <COLOR> Color output [default: auto] [possible values: auto, always, never]
-q, --quiet Suppress non-essential output
-v, --verbose Increase verbosity
-h, --help Print help
pq info #
Display file summary (size, rows, schema, compression, metadata)
Usage: pq info [OPTIONS] <FILES>...
Arguments:
<FILES>... Parquet file path(s)
Options:
-f, --format <OUTPUT_FORMAT> Output format (table, json, jsonl, csv, plain) [possible values:
json, jsonl, csv, table, plain]
--color <COLOR> Color output [default: auto] [possible values: auto, always, never]
-q, --quiet Suppress non-essential output
-v, --verbose Increase verbosity
-h, --help Print help
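A usage sketch, not part of the generated help (file names are hypothetical):

```shell
# summarize a single file
pq info data.parquet

# summarize several files at once, as JSON for scripting
pq info part-0.parquet part-1.parquet -f json
```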
pq schema #
Display schema in various formats.
Styles:
tree Indented tree (default)
json JSON object
json-schema JSON Schema
arrow Arrow type names
ddl PostgreSQL-compatible CREATE TABLE
pyarrow Python PyArrow schema constructor
Usage: pq schema [OPTIONS] <FILES>...
Arguments:
<FILES>...
Parquet file path(s)
Options:
-s, --style <STYLE>
Schema style
Possible values:
- tree
- json
- json-schema
- arrow
- ddl: PostgreSQL-compatible DDL (CREATE TABLE)
- pyarrow
[default: tree]
-f, --format <OUTPUT_FORMAT>
Output format (table, json, jsonl, csv, plain)
[possible values: json, jsonl, csv, table, plain]
--color <COLOR>
Color output
[default: auto]
[possible values: auto, always, never]
-q, --quiet
Suppress non-essential output
-v, --verbose
Increase verbosity
-h, --help
Print help (see a summary with '-h')
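A usage sketch, not part of the generated help (the file name is hypothetical):

```shell
# default indented tree
pq schema data.parquet

# PostgreSQL-compatible CREATE TABLE statement
pq schema data.parquet -s ddl

# PyArrow schema constructor, handy when writing a Python loader
pq schema data.parquet -s pyarrow
```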
pq stats #
Display column statistics (min, max, nulls, distinct)
Usage: pq stats [OPTIONS] <FILES>...
Arguments:
<FILES>... Parquet file path(s)
Options:
--describe Include data-level statistics (min, max, mean, stddev, distinct,
top-K)
--top <TOP> Number of top frequent values to show per column (with
--describe) [default: 5]
--sample-size <SAMPLE_SIZE> Maximum rows to read for --describe (0 = all rows) [default:
100000]
-f, --format <OUTPUT_FORMAT> Output format (table, json, jsonl, csv, plain) [possible values:
json, jsonl, csv, table, plain]
--color <COLOR> Color output [default: auto] [possible values: auto, always,
never]
-q, --quiet Suppress non-essential output
-v, --verbose Increase verbosity
-h, --help Print help
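A usage sketch, not part of the generated help (the file name is hypothetical):

```shell
# metadata-level statistics only (fast, no data scan)
pq stats data.parquet

# add data-level stats: scan at most 50,000 rows, show top 3 values per column
pq stats data.parquet --describe --sample-size 50000 --top 3
```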
pq layout #
Display physical layout (row groups, column chunks, pages)
Usage: pq layout [OPTIONS] <FILES>...
Arguments:
<FILES>... Parquet file path(s)
Options:
-f, --format <OUTPUT_FORMAT> Output format (table, json, jsonl, csv, plain) [possible values:
json, jsonl, csv, table, plain]
--color <COLOR> Color output [default: auto] [possible values: auto, always, never]
-q, --quiet Suppress non-essential output
-v, --verbose Increase verbosity
-h, --help Print help
pq validate #
Validate Parquet file integrity (footer, schema, statistics)
Usage: pq validate [OPTIONS] <FILES>...
Arguments:
<FILES>... Parquet file path(s)
Options:
-f, --format <OUTPUT_FORMAT> Output format (table, json, jsonl, csv, plain) [possible values:
json, jsonl, csv, table, plain]
--color <COLOR> Color output [default: auto] [possible values: auto, always, never]
-q, --quiet Suppress non-essential output
-v, --verbose Increase verbosity
-h, --help Print help
pq cat #
Dump rows from a Parquet file
Usage: pq cat [OPTIONS] <FILES>...
Arguments:
<FILES>... Parquet file path(s)
Options:
-l, --limit <LIMIT> Maximum number of rows to output
-o, --offset <OFFSET> Number of rows to skip
-c, --columns <COLUMNS> Columns to include (comma-separated)
-w, --where <WHERE_CLAUSE> SQL WHERE clause to filter rows
--jq <JQ> jq expression to apply to each row
-O, --output <OUTPUT> Write output to a file (format auto-detected from extension:
.parquet, .json, .jsonl, .csv)
-f, --format <OUTPUT_FORMAT> Output format (table, json, jsonl, csv, plain) [possible values:
json, jsonl, csv, table, plain]
--color <COLOR> Color output [default: auto] [possible values: auto, always, never]
-q, --quiet Suppress non-essential output
-v, --verbose Increase verbosity
-h, --help Print help
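A usage sketch, not part of the generated help; file and column names (`id`, `name`, `age`) are hypothetical:

```shell
# filter with a SQL WHERE clause, project two columns, write to a new file
pq cat data.parquet -w "age > 30" -c id,name -O adults.parquet

# page through rows 100-199 as JSONL
pq cat data.parquet --offset 100 --limit 100 -f jsonl
```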
pq head #
Show first N rows (default 10)
Usage: pq head [OPTIONS] <FILES>...
Arguments:
<FILES>... Parquet file path(s)
Options:
-n, --lines <LINES> Number of rows to show [default: 10]
-c, --columns <COLUMNS> Columns to include (comma-separated)
-f, --format <OUTPUT_FORMAT> Output format (table, json, jsonl, csv, plain) [possible values:
json, jsonl, csv, table, plain]
--color <COLOR> Color output [default: auto] [possible values: auto, always, never]
-q, --quiet Suppress non-essential output
-v, --verbose Increase verbosity
-h, --help Print help
pq tail #
Show last N rows (default 10)
Usage: pq tail [OPTIONS] <FILES>...
Arguments:
<FILES>... Parquet file path(s)
Options:
-n, --lines <LINES> Number of rows to show [default: 10]
-c, --columns <COLUMNS> Columns to include (comma-separated)
-f, --format <OUTPUT_FORMAT> Output format (table, json, jsonl, csv, plain) [possible values:
json, jsonl, csv, table, plain]
--color <COLOR> Color output [default: auto] [possible values: auto, always, never]
-q, --quiet Suppress non-essential output
-v, --verbose Increase verbosity
-h, --help Print help
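A usage sketch covering both commands, not part of the generated help (file and column names are hypothetical):

```shell
# first and last 5 rows of selected columns
pq head data.parquet -n 5 -c id,name
pq tail data.parquet -n 5 -c id,name
```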
pq sample #
Show random N rows
Usage: pq sample [OPTIONS] <FILES>...
Arguments:
<FILES>... Parquet file path(s)
Options:
-n, --lines <LINES> Number of rows to sample [default: 10]
--seed <SEED> Random seed for reproducibility
-c, --columns <COLUMNS> Columns to include (comma-separated)
-f, --format <OUTPUT_FORMAT> Output format (table, json, jsonl, csv, plain) [possible values:
json, jsonl, csv, table, plain]
--color <COLOR> Color output [default: auto] [possible values: auto, always, never]
-q, --quiet Suppress non-essential output
-v, --verbose Increase verbosity
-h, --help Print help
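A usage sketch, not part of the generated help (the file name is hypothetical):

```shell
# a 20-row random sample; fixing the seed makes the sample reproducible
pq sample data.parquet -n 20 --seed 42
```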
pq count #
Fast row count (metadata-only when possible)
Usage: pq count [OPTIONS] [FILES]...
Arguments:
[FILES]... Parquet file paths
Options:
-f, --format <OUTPUT_FORMAT> Output format (table, json, jsonl, csv, plain) [possible values:
json, jsonl, csv, table, plain]
--color <COLOR> Color output [default: auto] [possible values: auto, always, never]
-q, --quiet Suppress non-essential output
-v, --verbose Increase verbosity
-h, --help Print help
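A usage sketch, not part of the generated help (the directory name is hypothetical; the glob is expanded by the shell):

```shell
# row counts across many files, as CSV for downstream scripts
pq count logs/*.parquet -f csv
```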
pq grep #
Search rows matching a regex pattern across all columns.
By default, searches all string-representable columns. Use -c to restrict
to specific columns. Returns matching rows in the output format.
Examples:
pq grep data.parquet 'error|warn' # regex across all columns
pq grep data.parquet 'alice' -i # case-insensitive
pq grep data.parquet '404' -c status,code # search specific columns
pq grep data.parquet 'timeout' --limit 10 # first 10 matches
Usage: pq grep [OPTIONS] <FILES>... <PATTERN>
Arguments:
<FILES>...
Parquet file path(s)
<PATTERN>
Regex pattern to search for
Options:
-c, --columns <COLUMNS>
Columns to search (comma-separated; default: all)
-l, --limit <LIMIT>
Maximum number of matching rows to return
-i, --ignore-case
Case-insensitive matching
-f, --format <OUTPUT_FORMAT>
Output format (table, json, jsonl, csv, plain)
[possible values: json, jsonl, csv, table, plain]
--color <COLOR>
Color output
[default: auto]
[possible values: auto, always, never]
-q, --quiet
Suppress non-essential output
-v, --verbose
Increase verbosity
-h, --help
Print help (see a summary with '-h')
pq sql #
Execute SQL queries on Parquet files using Apache DataFusion.
Files are referenced directly in the FROM clause using single-quoted paths.
Glob patterns (e.g., 'logs/*.parquet') are supported.
Examples:
pq sql "SELECT * FROM 'data.parquet' LIMIT 10"
pq sql "SELECT city, count(*) FROM 'data.parquet' GROUP BY city"
pq sql "SELECT a.id, b.name FROM 'a.parquet' a JOIN 'b.parquet' b ON a.id = b.id"
pq sql "SELECT * FROM 'logs/*.parquet' WHERE level = 'ERROR'"
SQL reference: https://datafusion.apache.org/user-guide/sql/index.html
Usage: pq sql [OPTIONS] [QUERY]
Arguments:
[QUERY]
SQL query (files can be referenced directly in FROM clause)
Options:
-o, --output <OUTPUT>
Write output to a file (format auto-detected from extension: .parquet, .json, .jsonl,
.csv)
-f, --format <OUTPUT_FORMAT>
Output format (table, json, jsonl, csv, plain)
[possible values: json, jsonl, csv, table, plain]
--color <COLOR>
Color output
[default: auto]
[possible values: auto, always, never]
-q, --quiet
Suppress non-essential output
-v, --verbose
Increase verbosity
-h, --help
Print help (see a summary with '-h')
pq jq #
Apply jq expressions to Parquet data.
Each row is processed as a JSON object. Use --slurp to collect all rows
into an array first.
Examples:
pq jq data.parquet '.name' # extract field
pq jq data.parquet '{name, age}' -r # construct objects
pq jq data.parquet 'select(.age > 30)' # filter rows
pq jq data.parquet '[.orders[].price] | add' # nested aggregation
pq jq data.parquet 'group_by(.city) | map({city: .[0].city, n: length})' -s
jq reference: https://jqlang.github.io/jq/manual/
Usage: pq jq [OPTIONS] <FILES>... <FILTER>
Arguments:
<FILES>...
Parquet file path(s)
<FILTER>
jq filter expression
Options:
-s, --slurp
Read all rows into an array before filtering
-r, --raw-output
Output raw strings without JSON quoting
-o, --output <OUTPUT>
Write output to a file (format auto-detected from extension: .parquet, .json, .jsonl,
.csv)
-f, --format <OUTPUT_FORMAT>
Output format (table, json, jsonl, csv, plain)
[possible values: json, jsonl, csv, table, plain]
--color <COLOR>
Color output
[default: auto]
[possible values: auto, always, never]
-q, --quiet
Suppress non-essential output
-v, --verbose
Increase verbosity
-h, --help
Print help (see a summary with '-h')
pq select #
Project columns into a new Parquet file.
Examples:
pq select data.parquet -c id,name -o subset.parquet
pq select data.parquet -c 'id,name,address' -o subset.parquet
Usage: pq select [OPTIONS] --columns <COLUMNS> --output <OUTPUT> <FILE>
Arguments:
<FILE>
Input Parquet file
Options:
-c, --columns <COLUMNS>
Columns to include (comma-separated)
-o, --output <OUTPUT>
Output file path
-f, --format <OUTPUT_FORMAT>
Output format (table, json, jsonl, csv, plain)
[possible values: json, jsonl, csv, table, plain]
--color <COLOR>
Color output
[default: auto]
[possible values: auto, always, never]
-q, --quiet
Suppress non-essential output
-v, --verbose
Increase verbosity
-h, --help
Print help (see a summary with '-h')
pq slice #
Extract a row range into a new Parquet file
Usage: pq slice [OPTIONS] --limit <LIMIT> --output <OUTPUT> <FILE>
Arguments:
<FILE> Input Parquet file
Options:
--offset <OFFSET> Start offset [default: 0]
--limit <LIMIT> Number of rows to extract
-o, --output <OUTPUT> Output file path
-f, --format <OUTPUT_FORMAT> Output format (table, json, jsonl, csv, plain) [possible values:
json, jsonl, csv, table, plain]
--color <COLOR> Color output [default: auto] [possible values: auto, always, never]
-q, --quiet Suppress non-essential output
-v, --verbose Increase verbosity
-h, --help Print help
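A usage sketch, not part of the generated help (file names are hypothetical):

```shell
# extract rows 1000-1999 into a new Parquet file
pq slice data.parquet --offset 1000 --limit 1000 -o window.parquet
```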
pq merge #
Combine multiple Parquet files into one
Usage: pq merge [OPTIONS] --output <OUTPUT> [FILES]...
Arguments:
[FILES]... Input Parquet files
Options:
-o, --output <OUTPUT> Output file path
--schema-mode <SCHEMA_MODE> Schema reconciliation mode [default: strict] [possible values:
strict, union, intersect]
-f, --format <OUTPUT_FORMAT> Output format (table, json, jsonl, csv, plain) [possible values:
json, jsonl, csv, table, plain]
--color <COLOR> Color output [default: auto] [possible values: auto, always,
never]
-q, --quiet Suppress non-essential output
-v, --verbose Increase verbosity
-h, --help Print help
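A usage sketch, not part of the generated help (file names are hypothetical; the glob is expanded by the shell):

```shell
# default strict mode: input schemas must match exactly
pq merge part-*.parquet -o all.parquet

# keep only the columns present in every input
pq merge part-*.parquet -o all.parquet --schema-mode intersect
```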
pq split #
Split a Parquet file into multiple files
Usage: pq split [OPTIONS] --output <OUTPUT> <FILE>
Arguments:
<FILE> Input Parquet file
Options:
--rows <ROWS> Number of rows per output file
--partition-by <PARTITION_BY> Column(s) to partition by (comma-separated, Hive-style output)
-o, --output <OUTPUT> Output directory
-f, --format <OUTPUT_FORMAT> Output format (table, json, jsonl, csv, plain) [possible
values: json, jsonl, csv, table, plain]
--color <COLOR> Color output [default: auto] [possible values: auto, always,
never]
-q, --quiet Suppress non-essential output
-v, --verbose Increase verbosity
-h, --help Print help
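A usage sketch, not part of the generated help; the file name and the `city` column are hypothetical:

```shell
# fixed-size chunks of one million rows each
pq split data.parquet --rows 1000000 -o chunks/

# Hive-style partitions, e.g. chunks/city=.../ directories
pq split data.parquet --partition-by city -o chunks/
```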
pq import #
Import CSV/JSON/JSONL into Parquet format
Usage: pq import [OPTIONS] --output <OUTPUT> <INPUT>
Arguments:
<INPUT> Input file (JSON, JSONL, or CSV)
Options:
-o, --output <OUTPUT> Output Parquet file path
-F, --input-format <INPUT_FORMAT> Input format (auto-detected from extension if not specified)
[possible values: json, jsonl, csv]
-f, --format <OUTPUT_FORMAT> Output format (table, json, jsonl, csv, plain) [possible
values: json, jsonl, csv, table, plain]
--color <COLOR> Color output [default: auto] [possible values: auto, always,
never]
-q, --quiet Suppress non-essential output
-v, --verbose Increase verbosity
-h, --help Print help
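A usage sketch, not part of the generated help (file names are hypothetical):

```shell
# input format auto-detected from the .csv extension
pq import users.csv -o users.parquet

# force the input format when the extension is ambiguous
pq import dump.txt -F jsonl -o dump.parquet
```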
pq export #
Export Parquet data to CSV, JSON, or JSONL
Usage: pq export [OPTIONS] <FILES>...
Arguments:
<FILES>... Parquet file path(s)
Options:
-o, --output <OUTPUT> Output file path (default: stdout)
-l, --limit <LIMIT> Maximum number of rows to export
-f, --format <OUTPUT_FORMAT> Output format (table, json, jsonl, csv, plain) [possible values:
json, jsonl, csv, table, plain]
--color <COLOR> Color output [default: auto] [possible values: auto, always, never]
-q, --quiet Suppress non-essential output
-v, --verbose Increase verbosity
-h, --help Print help
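A usage sketch, not part of the generated help (file names are hypothetical):

```shell
# first 1000 rows as CSV on stdout
pq export data.parquet -l 1000 -f csv

# full file as JSONL on disk
pq export data.parquet -f jsonl -o data.jsonl
```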
pq completions #
Generate shell completions
Add to your ~/.zshrc (zsh):
eval "$(pq completions zsh)"
Add to your ~/.bashrc (bash):
eval "$(pq completions bash)"
Add to your ~/.config/fish/config.fish (fish):
pq completions fish | source
Usage: pq completions [OPTIONS] <SHELL>
Arguments:
<SHELL>
Shell to generate completions for
[possible values: bash, elvish, fish, powershell, zsh]
Options:
-f, --format <OUTPUT_FORMAT>
Output format (table, json, jsonl, csv, plain)
[possible values: json, jsonl, csv, table, plain]
--color <COLOR>
Color output
[default: auto]
[possible values: auto, always, never]
-q, --quiet
Suppress non-essential output
-v, --verbose
Increase verbosity
-h, --help
Print help (see a summary with '-h')