Example Data
These public Parquet files are hosted at data.pqtool.dev for trying out pq. No
download is needed: pq reads remote files lazily, fetching only the bytes it
needs.
Available files
| File | Rows | Size | Description |
|---|---|---|---|
| orders-10k.parquet | 10,000 | 2 MB | Tiny - instant for any command |
| orders-100k.parquet | 100,000 | 19 MB | Small - good for SQL and jq tutorials |
| orders-100m.parquet | 100,000,000 | 16 GB | Large - demonstrates lazy loading and range requests |
All three files share the same schema: a synthetic e-commerce orders dataset with 30 columns covering a wide range of Parquet types.
Quick start
```sh
# Inspect the schema (fetches only metadata, ~1 KB)
pq schema "https://data.pqtool.dev/orders-10k.parquet"

# Preview rows
pq head "https://data.pqtool.dev/orders-10k.parquet"

# SQL query
pq sql "SELECT status, count(*) n
FROM 'https://data.pqtool.dev/orders-100k.parquet'
GROUP BY status ORDER BY n DESC"

# Count 100 million rows (reads only the footer, ~600 bytes)
pq count "https://data.pqtool.dev/orders-100m.parquet"
```
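To make the footer-only read more concrete, here is a minimal sketch of how a client can locate a Parquet footer without downloading the file. It relies on the Parquet file layout, in which the last 8 bytes are a 4-byte little-endian footer length followed by the magic string `PAR1`. The helper name `parquet_footer_len` is hypothetical and not part of pq, and this is not necessarily how pq implements it.

```sh
# Decode the 8-byte Parquet trailer read from stdin:
# a 4-byte little-endian footer length followed by the magic "PAR1".
# (Hypothetical helper for illustration; not part of pq.)
parquet_footer_len() {
  # Turn the raw bytes into eight unsigned decimal numbers.
  set -- $(od -An -v -tu1)
  [ "$#" -eq 8 ] || { echo "expected 8 bytes, got $#" >&2; return 1; }
  # Bytes 5-8 must spell "PAR1" (80 65 82 49 in ASCII).
  [ "$5 $6 $7 $8" = "80 65 82 49" ] || { echo "not a Parquet trailer" >&2; return 1; }
  # Assemble the little-endian length from bytes 1-4.
  echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# Self-check with a synthetic trailer encoding a 600-byte footer.
printf '\130\002\000\000PAR1' | parquet_footer_len   # prints 600

# Against a hosted file (assumes the server honours HTTP Range requests):
#   curl -s -r -8 "https://data.pqtool.dev/orders-100m.parquet" | parquet_footer_len
```

The commented `curl -r -8` call asks for only the last 8 bytes of the file. A second range request for the footer itself would then fetch the row counts, which is why counting rows does not require reading any column data.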
Schema
```
Schema (30 columns):
├── order_id: int64
├── user_id: int32
├── order_date: date32
├── created_at: timestamp(us)
├── updated_at: timestamp(us) (nullable)
├── status: string
├── is_priority: boolean
├── is_returning_customer: boolean
├── subtotal: float64
├── tax_amount: float32
├── discount_pct: float64
├── total_amount: float64
├── currency: string
├── item_count: int16
├── customer_name: string
├── email: string
├── age: int8 (nullable)
├── loyalty_points: uint32 (nullable)
├── weight_kg: decimal128(10,3)
├── session_token: binary(16)
├── shipping_address: struct
│   ├── street: string
│   ├── city: string
│   ├── state: string
│   ├── zip: string
│   └── country: string
├── billing_address: struct (nullable)
│   ├── street: string
│   ├── city: string
│   ├── state: string
│   ├── zip: string
│   └── country: string
├── tags: list<string> (nullable)
├── ratings: list<int8> (nullable)
├── line_items: list<struct>
│   ├── product_id: int32
│   ├── product_name: string
│   ├── quantity: int16
│   ├── unit_price: float64
│   └── category: string
├── payment: struct
│   ├── method: string
│   ├── card_last_four: string (nullable)
│   └── processor_response: struct
│       ├── code: string
│       ├── message: string
│       └── risk_score: float32
├── metadata: map<string, string>
├── notes: string (nullable)
├── referral_source: string (nullable)
└── session_duration_ms: int64
```
The dataset includes primitives (int, float, bool, string), temporal types
(date, timestamp), nested structs (including 3-level nesting via
payment.processor_response), lists of scalars and structs, maps, decimals,
fixed-size binary, and columns with varying null rates (5%-80%).
Download locally
If you want a local copy:
```sh
curl -Lo orders-10k.parquet https://data.pqtool.dev/orders-10k.parquet
curl -Lo orders-100k.parquet https://data.pqtool.dev/orders-100k.parquet
```
The 16 GB file is best used remotely via pq's lazy loading - there's no need to download it.
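After downloading a local copy, a quick sanity check can catch a truncated or failed transfer. The sketch below relies on the fact that a Parquet file begins and ends with the 4-byte magic string `PAR1`. The helpers `hex` and `is_parquet` are hypothetical names introduced for illustration, not pq commands, and the check does not validate the file contents.

```sh
# Print a byte stream as lowercase hex with no separators.
hex() { od -An -v -tx1 | tr -d ' \n'; }

# Succeed if FILE starts and ends with the Parquet magic bytes "PAR1".
# (Hypothetical helper; a cheap sanity check, not a full validation.)
is_parquet() {
  [ -f "$1" ] || return 1
  magic=50415231   # "PAR1" in hex
  [ "$(head -c 4 "$1" | hex)" = "$magic" ] &&
    [ "$(tail -c 4 "$1" | hex)" = "$magic" ]
}

# Check the two files downloaded above, if they exist.
for f in orders-10k.parquet orders-100k.parquet; do
  if is_parquet "$f"; then
    echo "$f: looks like a complete Parquet file"
  else
    echo "$f: missing, truncated, or not Parquet"
  fi
done
```

A download that was cut off partway usually fails the trailing-magic check, because the footer and its closing `PAR1` are the last bytes written.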