CLI

Inspect Mosaic files from the terminal with the mosaic binary, a native, JVM-free toolkit built on the read-only MosaicReader API.

Install

# run from source
cargo run -p paimon-mosaic-cli -- schema data.mosaic

# install the `mosaic` binary
cargo install --path cli
mosaic schema data.mosaic

Commands

Command	Shows	Reads
`schema`	column names, Arrow types, nullability, bucket	footer only
`meta`	row groups, rows, per-column stats	footer + index
`footer`	magic, version, buckets, row groups, compression	footer only
`buckets`	per-bucket layout and member columns	footer + index
`pages`	per-column encoding + slot size	bucket data
`dictionary`	dictionary entries of a dict column	bucket data
`column-size`	on-disk bytes per column	footer + index + paged directories
`cat` / `head`	cat: all rows by default (`-n` to limit); head: first 10; `-c`, `--where`	column data
`count`	total row count	footer + index
`convert`	import CSV or JSON lines into a new Mosaic file	writes file

All inspection and query commands accept --json; convert does not, since it writes a file. cat scans all rows by default (-n to limit); head prints 10 rows by default. cat, head, pages and column-size take a comma-separated column list (-c a,b); dictionary takes a single column (-c <col>).
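For scripting, the flags above can be assembled programmatically. The following is a minimal sketch, not part of the CLI: the helper names mosaic_args and run_mosaic are hypothetical, it uses only the flags documented on this page, and it assumes the binary accepts them in any order after the file path.

```python
import subprocess

def mosaic_args(command, path, columns=None, limit=None, where=None, as_json=True):
    """Build an argv list for the mosaic CLI from the flags documented above."""
    args = ["mosaic", command, path]
    if columns:
        args += ["-c", ",".join(columns)]   # -c a,b
    if limit is not None:
        args += ["-n", str(limit)]          # -n <rows>
    if where:
        args += ["--where", where]          # a single condition, e.g. "kind=a"
    if as_json:
        args.append("--json")
    return args

def run_mosaic(args):
    """Run the CLI and return stdout as text (assumes `mosaic` is on PATH)."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout

print(mosaic_args("cat", "data.mosaic", columns=["name", "score"], limit=2))
```

Passing an argv list rather than a shell string avoids quoting problems when a --where value contains characters such as > or <.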

schema

Columns, Arrow types, nullability and bucket assignment, in original input order. Footer only.

$ mosaic schema data.mosaic
5 columns, 4 buckets
  id: Int32 not null [bucket 0]
  name: Utf8 [bucket 2]
  kind: Utf8 [bucket 1]
  score: Float64 [bucket 3]
  flag: Int32 [bucket 0]

$ mosaic schema data.mosaic --json
{"columns":5,"buckets":4,"fields":[{"name":"id","type":"Int32","nullable":false,"bucket":0}, ...]}
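As an illustration of consuming the --json form, the sketch below groups columns by bucket. The sample string is assembled by hand from the text output above, because the JSON line shown here is elided; the nullable values for columns not marked "not null" are an assumption.

```python
import json
from collections import defaultdict

# Hand-assembled sample; only the first field appears verbatim in the docs.
sample = """{"columns":5,"buckets":4,"fields":[
  {"name":"id","type":"Int32","nullable":false,"bucket":0},
  {"name":"name","type":"Utf8","nullable":true,"bucket":2},
  {"name":"kind","type":"Utf8","nullable":true,"bucket":1},
  {"name":"score","type":"Float64","nullable":true,"bucket":3},
  {"name":"flag","type":"Int32","nullable":true,"bucket":0}
]}"""

def columns_by_bucket(schema):
    """Map each bucket index to the sorted names of its member columns."""
    groups = defaultdict(list)
    for field in schema["fields"]:
        groups[field["bucket"]].append(field["name"])
    return {bucket: sorted(names) for bucket, names in sorted(groups.items())}

schema = json.loads(sample)
print(columns_by_bucket(schema))
# {0: ['flag', 'id'], 1: ['kind'], 2: ['name'], 3: ['score']}
```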

footer

The 32-byte file footer: magic, format version, bucket count, row groups and compression.

$ mosaic footer data.mosaic
magic=MOSA version=1 buckets=4 row_groups=1 compression=zstd
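The text form is a single line of key=value pairs, so it is easy to pick apart in a script. A minimal sketch follows; parse_footer_line is a hypothetical helper, and for anything beyond a quick check the --json output is probably the safer input, though its key names are not shown on this page.

```python
def parse_footer_line(line):
    """Split 'key=value' pairs on whitespace and convert the numeric fields."""
    fields = dict(pair.split("=", 1) for pair in line.split())
    for key in ("version", "buckets", "row_groups"):
        fields[key] = int(fields[key])
    return fields

footer = parse_footer_line("magic=MOSA version=1 buckets=4 row_groups=1 compression=zstd")
print(footer["magic"], footer["buckets"], footer["compression"])
# MOSA 4 zstd
```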

buckets

Per row group, each bucket's layout (empty / monolithic / paged), on-disk size and member columns. Mosaic groups columns into buckets by name order. Monolithic buckets also report uncompressed size and ratio.

$ mosaic buckets data.mosaic
row group 0:
    bucket 0: paged 373B [flag, id]
    bucket 1: monolithic 27B (uncompressed 59 B, 2.19x) [kind]
    bucket 2: paged 220B [name]
    bucket 3: paged 542B [score]
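The ratio reported for a monolithic bucket appears to be uncompressed size divided by on-disk size. The sketch below checks that reading against the numbers in the sample output; the formula is inferred from those figures, not taken from the Mosaic source.

```python
def compression_ratio(uncompressed_bytes, on_disk_bytes):
    """Uncompressed size divided by on-disk size (inferred definition)."""
    return uncompressed_bytes / on_disk_bytes

ratio = compression_ratio(59, 27)
print(f"{ratio:.2f}x")
# 2.19x
```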

pages

Per-column physical encoding (plain / const / dict / all_null) and on-disk slot size.

$ mosaic pages data.mosaic
row group 0:
    flag: bucket 0 encoding=const slot=16B
    id: bucket 0 encoding=plain slot=349B
    kind: bucket 1 encoding=dict slot=28B
    name: bucket 2 encoding=plain slot=216B
    score: bucket 3 encoding=plain slot=538B

dictionary

Dump the dictionary of a dict-encoded column. Non-dict columns report as such.

$ mosaic dictionary data.mosaic -c kind
row group 0: 3 entries
    0: a
    1: b
    2: c

$ mosaic dictionary data.mosaic -c kind --json
{"column":"kind","row_groups":[["a","b","c"]]}
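To show what the dictionary is for, the sketch below loads the JSON line above and decodes a list of integer codes back into values. The codes are made up for illustration; how Mosaic stores codes on disk is not described on this page.

```python
import json

doc = json.loads('{"column":"kind","row_groups":[["a","b","c"]]}')

def decode(codes, entries):
    """Replace each dictionary code with the entry at that index."""
    return [entries[code] for code in codes]

entries = doc["row_groups"][0]   # dictionary of row group 0
codes = [0, 1, 2, 0]             # hypothetical codes, not read from a file
print(decode(codes, entries))
# ['a', 'b', 'c', 'a']
```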

column-size

On-disk bytes per column. Paged buckets give exact per-column sizes; a multi-column monolithic bucket is split evenly and marked (approx).

$ mosaic column-size data.mosaic
  id: 349 B
  name: 216 B
  kind: 28 B
  total: 593 B
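The even split for a multi-column monolithic bucket can be pictured as below. This is a sketch of the stated rule only: the function name, the bucket size and the column names are hypothetical, and how the CLI rounds the shares is not documented here.

```python
def split_monolithic(bucket_bytes, columns):
    """Attribute an equal share of the bucket to each member column, flagged approximate."""
    share = bucket_bytes / len(columns)
    return {name: {"bytes": share, "approx": len(columns) > 1} for name in columns}

# Hypothetical 60-byte monolithic bucket holding two columns.
print(split_monolithic(60, ["a", "b"]))
# {'a': {'bytes': 30.0, 'approx': True}, 'b': {'bytes': 30.0, 'approx': True}}
```

The shares always sum to the bucket's on-disk size, so the total stays exact even though the per-column figures are estimates.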

cat / head

cat scans all rows by default (-n to limit); head prints the first 10 rows by default. -c projects columns. --where filters rows with a single condition using one of = != > >= < <=. Integers and floats compare exactly, so =0.3 matches only a stored 0.3. Date32 columns accept an epoch-day number or YYYY-MM-DD. Row groups whose stats exclude the predicate are skipped. --json emits newline-delimited JSON.

$ mosaic cat data.mosaic -n 2
+----+--------+------+-------+------+
| id | name   | kind | score | flag |
+----+--------+------+-------+------+
| 0  | user_0 | a    | 0     | 7    |
| 1  | user_1 | b    | 1.5   | 7    |
+----+--------+------+-------+------+

$ mosaic cat data.mosaic -n 2 -c name,score   # projection

$ mosaic cat data.mosaic -n 2 --json
{"id":0,"name":"user_0","kind":"a","score":0,"flag":7}
{"id":1,"name":"user_1","kind":"b","score":1.5,"flag":7}

$ mosaic cat data.mosaic --where "kind=a"   # all matching rows
$ mosaic head data.mosaic --json             # preview rows
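Two of the --where rules are easy to trip over: exact float comparison and the two Date32 literal forms. The Python sketch below illustrates both. date32_literal is a hypothetical helper that mirrors the documented behaviour, not the CLI's parser; it relies on Date32 being a count of days since 1970-01-01, as in Arrow.

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def date32_literal(text):
    """Accept an epoch-day integer or a YYYY-MM-DD string; return days since 1970-01-01."""
    try:
        return int(text)
    except ValueError:
        return (date.fromisoformat(text) - EPOCH).days

# Exact comparison: a value computed as 0.1 + 0.2 is not the stored double 0.3.
print(0.1 + 0.2 == 0.3)
# False

# Both literal forms name the same day.
print(date32_literal("2024-01-01"), date32_literal("19723"))
# 19723 19723
```

For float columns, a range condition such as >= or <= is usually more robust than = when the stored values come from arithmetic.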

count

Total row count across all row groups.

$ mosaic count data.mosaic
200

convert

Import a CSV (with header) or JSON lines (one object per line) into a new Mosaic file; the schema is inferred. --stats id,score builds min/max stats for those columns, which cat --where then uses to skip non-matching row groups. Refuses to replace an existing output unless --overwrite is given.

$ mosaic convert data.csv -o data.mosaic --stats id
wrote data.mosaic (200 rows, 5 columns)
$ mosaic convert data.ndjson -o data.mosaic
wrote data.mosaic (200 rows, 5 columns)
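The way min/max stats let a reader skip row groups can be sketched as follows. This is the generic min/max pruning technique under stated assumptions, not Mosaic's implementation: may_match and the row group ranges are hypothetical, and nulls are ignored.

```python
def may_match(op, value, lo, hi):
    """Return False only when the range [lo, hi] proves no row satisfies `column op value`."""
    if op == "=":
        return lo <= value <= hi
    if op == "!=":
        return not (lo == hi == value)   # excluded only if every row equals value
    if op == ">":
        return hi > value
    if op == ">=":
        return hi >= value
    if op == "<":
        return lo < value
    if op == "<=":
        return lo <= value
    raise ValueError(f"unsupported operator: {op}")

# Hypothetical id stats for two row groups.
row_groups = [{"min": 0, "max": 99}, {"min": 100, "max": 199}]
kept = [i for i, rg in enumerate(row_groups) if may_match(">=", 150, rg["min"], rg["max"])]
print(kept)
# [1]
```

A row group that passes this check still has to be scanned; the stats only rule groups out, they never confirm a match.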

Embedding instead

For C/C++ or Java callers, embed the format directly via the ffi (mosaic.h) or jni crates rather than shelling out to this CLI.
