Skip to main content

Global Index

Overview

Global Index is a powerful indexing mechanism for Data Evolution (append) tables. It enables efficient row-level lookups and filtering without full-table scans. Paimon supports multiple global index types:

  • BTree Index: A B-tree based index for scalar column lookups. Supports equality, IN, range predicates, and can be combined across multiple columns with AND/OR logic.
  • Vector Index: An approximate nearest neighbor (ANN) index powered by DiskANN for vector similarity search.
  • Full-Text Index: A full-text search index powered by Tantivy for text retrieval. Supports term matching and relevance scoring.

Global indexes work on top of Data Evolution tables. To use global indexes, your table must have:

  • 'bucket' = '-1' (unaware-bucket mode)
  • 'row-tracking.enabled' = 'true'
  • 'data-evolution.enabled' = 'true'

Global index queries may not be exact when the index only covers part of the table data. If a query predicate matches the index, Paimon returns only the results from the indexed portion. Matching records in data that has not been indexed yet will not be returned.

Prerequisites

Create a table with the required properties:

CREATE TABLE my_table (
id INT,
name STRING,
embedding ARRAY<FLOAT>,
content STRING
) TBLPROPERTIES (
'bucket' = '-1',
'row-tracking.enabled' = 'true',
'data-evolution.enabled' = 'true',
'global-index.enabled' = 'true'
);

BTree Index

BTree index builds a logical B-tree structure over SST files, enabling efficient point lookups and range queries on scalar columns.

Build BTree Index

-- Create BTree index on 'name' column
CALL sys.create_global_index(
table => 'db.my_table',
index_column => 'name',
index_type => 'btree'
);

Query with BTree Index

Once a BTree index is built, it is automatically used during scan when a filter predicate matches the indexed column.

SELECT * FROM my_table WHERE name IN ('a200', 'a300');

Vector Index

Vector Index provides approximate nearest neighbor (ANN) search based on the DiskANN algorithm. It is suitable for vector similarity search scenarios such as recommendation systems, image retrieval, and RAG (Retrieval Augmented Generation) applications.

Build Vector Index

-- Create Lumina vector index on 'embedding' column
CALL sys.create_global_index(
table => 'db.my_table',
index_column => 'embedding',
index_type => 'lumina',
options => 'lumina.index.dimension=128'
);

The legacy index type lumina-vector-ann is still accepted for existing tables and SQL compatibility.

Vector Search

-- Search for top-5 nearest neighbors
SELECT * FROM vector_search('my_table', 'embedding', array(1.0f, 2.0f, 3.0f), 5);

Full-Text Index

Full-Text Index provides text search capabilities powered by Tantivy. It is suitable for text retrieval scenarios such as document search, log analysis, and content-based filtering.

Build Full-Text Index

-- Create full-text index on 'content' column
CALL sys.create_global_index(
table => 'db.my_table',
index_column => 'content',
index_type => 'tantivy-fulltext'
);

Full-Text Search

-- Search for top-10 documents matching the query
SELECT * FROM full_text_search('my_table', 'content', 'paimon lake format', 10);