BTree Index
BTree index builds a logical B-tree structure over SST files, enabling efficient point lookups and range queries on scalar columns.
Use it for selective predicates on scalar columns such as identifiers, timestamps, categories with
many distinct values, or other columns frequently used in WHERE clauses.
Supported predicate shapes include:
| Predicate | Example |
|---|---|
| Equality | name = 'a200' |
| IN | name IN ('a200', 'a300') |
| Range | price >= 10 AND price < 100 |
| Null checks | name IS NOT NULL |
| AND / OR combinations | name = 'a200' OR name = 'a300' |
LIKE, startsWith, contains, and NOT IN predicates may still need broader index file reads.
Use btree-index.fallback-scan-max-size to cap fallback range and string scans by the total
candidate index file size.
For keyword-style text retrieval, use Full-Text Index instead.
Build BTree Index
-- Create BTree index on 'name' column
CALL sys.create_global_index(
table => 'db.my_table',
index_column => 'name',
index_type => 'btree'
);
You can build only selected partitions:
CALL sys.create_global_index(
table => 'db.my_table',
index_column => 'name',
index_type => 'btree',
partitions => 'dt=2026-06-18;dt=2026-06-19'
);
BTree Options
| Option | Default | Description |
|---|---|---|
sorted-index.records-per-range | 10000000 | Expected number of records per sorted global index file for BTree and Bitmap builds. |
sorted-index.build.max-parallelism | 4096 | Maximum Flink or Spark parallelism for building sorted global indexes. |
btree-index.block-size | 64 kb | Block size used by BTree index files. |
btree-index.cache-size | 128 mb | Cache size used by BTree index readers. |
btree-index.fallback-scan-max-size | 256 mb | Maximum total size of candidate BTree global index files to allow fallback index scans. Set to 0 b to disable fallback scans. |
btree-index.compression | none | Compression algorithm used by BTree index blocks. |
The legacy btree-index.records-per-range and
btree-index.build.max-parallelism keys are still recognized as fallback keys.
Query with BTree Index
Once a BTree index is built, it is automatically used during scan when a filter predicate matches the indexed column.
- SQL
- Python SDK
SELECT * FROM my_table WHERE name IN ('a200', 'a300');
from pypaimon.common.predicate_builder import PredicateBuilder
table = catalog.get_table("db.my_table")
read_builder = table.new_read_builder()
read_builder = read_builder.with_filter(
PredicateBuilder(table.fields)
.is_in("name", ["a200", "a300"])
)
scan = read_builder.new_scan()
read = read_builder.new_read()
pa_table = read.to_arrow(scan.plan().splits())
print(pa_table)