This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.
Global Index #
Overview #
Global Index is a powerful indexing mechanism for Data Evolution (append) tables. It enables efficient row-level lookups and filtering without full-table scans. Paimon supports multiple global index types:
- BTree Index: A B-tree based index for scalar column lookups. Supports equality, IN, range predicates, and can be combined across multiple columns with AND/OR logic.
- Vector Index: An approximate nearest neighbor (ANN) index powered by DiskANN for vector similarity search.
Global indexes work on top of Data Evolution tables. To use global indexes, your table must have:
'bucket' = '-1'(unaware-bucket mode)'row-tracking.enabled' = 'true''data-evolution.enabled' = 'true'
Prerequisites #
Create a table with the required properties:
CREATE TABLE my_table (
id INT,
name STRING,
embedding ARRAY<FLOAT>
) TBLPROPERTIES (
'bucket' = '-1',
'row-tracking.enabled' = 'true',
'data-evolution.enabled' = 'true',
'global-index.enabled' = 'true'
);
BTree Index #
BTree index builds a logical B-tree structure over SST files, enabling efficient point lookups and range queries on scalar columns.
Build BTree Index
-- Create BTree index on 'name' column
CALL sys.create_global_index(
table => 'db.my_table',
index_column => 'name',
index_type => 'btree'
);
Query with BTree Index
Once a BTree index is built, it is automatically used during scan when a filter predicate matches the indexed column.
SELECT * FROM my_table WHERE name IN ('a200', 'a300');
Vector Index #
Vector Index provides approximate nearest neighbor (ANN) search based on the DiskANN algorithm. It is suitable for vector similarity search scenarios such as recommendation systems, image retrieval, and RAG (Retrieval Augmented Generation) applications.
Build Vector Index
-- Create Lumina vector index on 'embedding' column
CALL sys.create_global_index(
table => 'db.my_table',
index_column => 'embedding',
index_type => 'lumina-vector-ann',
options => 'lumina.index.dimension=128'
);
Vector Search
-- Search for top-5 nearest neighbors
SELECT * FROM vector_search('my_table', 'embedding', array(1.0f, 2.0f, 3.0f), 5);
Table table = catalog.getTable(identifier);
// Step 1: Build vector search
float[] queryVector = {1.0f, 2.0f, 3.0f};
GlobalIndexResult result = table.newVectorSearchBuilder()
.withVector(queryVector)
.withLimit(5)
.withVectorColumn("embedding")
.executeLocal();
// Step 2: Read matching rows using the search result
ReadBuilder readBuilder = table.newReadBuilder();
TableScan.Plan plan = readBuilder.newScan().withGlobalIndexResult(result).plan();
try (RecordReader<InternalRow> reader = readBuilder.newRead().createReader(plan)) {
reader.forEachRemaining(row -> {
System.out.println("id=" + row.getInt(0) + ", name=" + row.getString(1));
});
}