PK Clustering Override

PK Clustering Override #

By default, data files in a primary key table are physically sorted by the primary key. This is optimal for point lookups but can hurt scan performance when queries filter on non-primary-key columns.

PK Clustering Override mode changes the physical sort order of data files from the primary key to user-specified clustering columns. This significantly improves scan performance for queries that filter or group by clustering columns, while still maintaining primary key uniqueness through deletion vectors.

Quick Start #

CREATE TABLE my_table (
    id BIGINT,
    dt STRING,
    city STRING,
    amount DOUBLE,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'pk-clustering-override' = 'true',
    'clustering.columns' = 'city',
    'deletion-vectors.enabled' = 'true',
    'bucket' = '4'
);

After this, data files within each bucket will be physically sorted by city instead of id. Queries like SELECT * FROM my_table WHERE city = 'Beijing' can skip irrelevant data files by checking their min/max statistics on the clustering column.

Requirements #

Option Requirement
pk-clustering-override true
clustering.columns Must be set (one or more non-primary-key columns)
deletion-vectors.enabled Must be true
merge-engine deduplicate (default) or first-row only

When to Use #

PK Clustering Override is beneficial when:

  • Analytical queries frequently filter or aggregate on non-primary-key columns (e.g., WHERE city = 'Beijing').
  • The table uses deduplicate or first-row merge engine.
Although data files are no longer sorted by the primary key, filtering on bucket-key fields (which default to the primary key) still benefits from bucket pruning. The query engine can skip entire buckets that do not contain matching values, so queries like WHERE id = 12345 remain efficient.

Unsupported modes:

  • Merge engine: partial-update or aggregation.
  • Changelog producer: lookup or full-compaction.
  • Configue: sequence.fields or record-level.expire-time.
Edit This Page
Copyright © 2025 The Apache Software Foundation. Apache Paimon, Paimon, and its feather logo are trademarks of The Apache Software Foundation.