Bitmap Index

A bitmap index maps each distinct scalar value to a compressed 64-bit row-id bitmap. Use it for enum-like dimensions and tag columns where queries often use equality, IN, complement, or null predicates, for example status, country, tenant_type, or business tags.

Compared with the BTree index, the bitmap index is optimized for set operations over exact row-id bitmaps. It is usually a better fit for low-cardinality or medium-cardinality dimensions, especially when IN, NOT IN, or != predicates are common. The BTree index remains the better choice for range predicates and for high-cardinality columns where each value matches only a few rows and range pruning is important.

Supported predicate shapes include:

| Predicate | Example |
|---|---|
| Equality | tag = 'vip' |
| IN | tag IN ('vip', 'trial') |
| Not equal | tag != 'blocked' |
| NOT IN | tag NOT IN ('blocked', 'test') |
| Null checks | tag IS NULL, tag IS NOT NULL |
| String predicates | tag LIKE 'vip%', tag LIKE '%vip%', tag startsWith 'vip', tag contains 'vip' |
| Range predicates | tag >= 'a', tag BETWEEN 'a' AND 'm' |
| AND / OR combinations | tag = 'vip' OR tag = 'trial' |

Equality, IN, null checks, and string prefix predicates use direct dictionary lookup. Other predicates, such as endsWith, contains, general LIKE, and range predicates, are answered by scanning bitmap dictionary entries, but only when the total size of candidate bitmap index files is within the configured fallback scan budget (bitmap-index.fallback-scan-max-size). If the budget is exceeded, Paimon falls back to other matching indexes or a regular table scan.
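
The routing rule above can be sketched as a small decision function. This is only an illustration of the described behavior, not Paimon's planner: the predicate kind labels, the `Plan` enum, and `plan_predicate` are hypothetical names, and complement predicates (!=, NOT IN) are left out because the paragraph above does not classify them.

```python
from enum import Enum
from typing import List


class Plan(Enum):
    DIRECT_LOOKUP = "direct dictionary lookup"
    FALLBACK_SCAN = "fallback dictionary scan"
    OTHER_INDEX_OR_TABLE_SCAN = "other matching index or regular table scan"


# Hypothetical labels for the predicate shapes named in the text.
DIRECT_KINDS = {"eq", "in", "is_null", "is_not_null", "starts_with"}
FALLBACK_KINDS = {"ends_with", "contains", "like", "range"}


def plan_predicate(kind: str, candidate_file_bytes: List[int], budget_bytes: int) -> Plan:
    """Pick an evaluation path for one predicate on the indexed column."""
    if kind in DIRECT_KINDS:
        return Plan.DIRECT_LOOKUP
    if kind not in FALLBACK_KINDS:
        raise ValueError(f"predicate kind not covered by this sketch: {kind}")
    # A budget of 0 disables fallback scans entirely.
    if budget_bytes > 0 and sum(candidate_file_bytes) <= budget_bytes:
        return Plan.FALLBACK_SCAN
    return Plan.OTHER_INDEX_OR_TABLE_SCAN
```

With the default 256 mb budget, two candidate files of 100 MiB each would still qualify for a fallback scan under this sketch, while files of 200 MiB and 100 MiB would not.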

Build Bitmap Index

-- Create bitmap index on 'tag' column
CALL sys.create_global_index(
    table => 'db.my_table',
    index_column => 'tag',
    index_type => 'bitmap'
);

You can build only selected partitions:

CALL sys.create_global_index(
    table => 'db.my_table',
    index_column => 'tag',
    index_type => 'bitmap',
    partitions => 'dt=2026-06-18;dt=2026-06-19'
);

Bitmap indexes share the sorted index build path with BTree indexes. Use sorted-index.records-per-range to control the expected records per generated index file, and sorted-index.build.max-parallelism to cap Flink or Spark build parallelism. The legacy btree-index.records-per-range and btree-index.build.max-parallelism keys are still recognized as fallback keys.

Bitmap Options

| Option | Default | Description |
|---|---|---|
| sorted-index.records-per-range | 10000000 | Expected number of records per sorted global index file for BTree and Bitmap builds. |
| sorted-index.build.max-parallelism | 4096 | Maximum Flink or Spark parallelism for building sorted global indexes. |
| bitmap-index.dictionary-block-size | 16 kb | Target size of dictionary blocks in bitmap global index files. Smaller blocks reduce dictionary read amplification for high-cardinality columns; larger blocks reduce dictionary block index size. |
| bitmap-index.compression | none | Compression algorithm for bitmap dictionary blocks and the dictionary block index. Supported values are the same block codecs as BTree index, such as none, lz4, lzo, and zstd. |
| bitmap-index.compression-level | 1 | Compression level used by codecs that support levels, such as zstd. |
| bitmap-index.fallback-scan-max-size | 256 mb | Maximum total size of bitmap global index files in one reader to allow fallback dictionary scans for predicates that cannot use direct bitmap lookup. Set to 0 b to disable fallback scans. |

Query with Bitmap Index

Once a bitmap index is built, it is automatically used during scan when a filter predicate matches the indexed column.

SELECT * FROM my_table WHERE tag IN ('vip', 'trial');

For complement predicates such as tag != 'blocked' or tag NOT IN ('blocked', 'test'), bitmap index evaluates the complement against each index file's own non-null row-id bitmap. This keeps results correct when one logical query unions multiple bitmap index files.
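
To make the per-file complement concrete, here is a minimal sketch that uses Python sets in place of Roaring bitmaps. The file contents, row ids, and the `not_in` helper are invented for illustration and do not reflect Paimon's internal API.

```python
# Each dict stands in for one bitmap index file; Python sets stand in for
# Roaring row-id bitmaps. All names and row ids here are illustrative.
FILE_A = {
    "non_null": {0, 1, 2},  # row 3 is null, so it is absent from this set
    "values": {"vip": {0, 1}, "blocked": {2}},
}
FILE_B = {
    "non_null": {10, 11, 12},
    "values": {"trial": {10, 11}, "test": {12}},
}


def not_in(files, excluded):
    """Rows whose value is non-null and not one of `excluded`."""
    result = set()
    for f in files:
        matched = set()
        for value in excluded:
            matched |= f["values"].get(value, set())
        # Complement inside this file only, then union into the result.
        result |= f["non_null"] - matched
    return result


rows = not_in([FILE_A, FILE_B], {"blocked", "test"})
print(sorted(rows))  # [0, 1, 10, 11]
```

Because each complement is taken against that file's own non-null rows, the null row 3 never enters the result, and a file never contributes row ids that belong to another file.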

File Format

A bitmap global index file stores exact row-id bitmaps and a block-indexed dictionary. The reader opens a file by reading only its fixed-length footer. Null row sets, non-null row sets, and the dictionary block index are loaded lazily when a matching predicate needs them. Point lookups then read the dictionary block containing the target value and the corresponding bitmap block. The format is footer-driven: the footer stores the version, magic number, and offsets to all metadata blocks.

+----------------------------------------------+
| null rows bitmap block |
+----------------------------------------------+
| non-null rows bitmap block |
+----------------------------------------------+
| value bitmap and dictionary payload blocks |
+----------------------------------------------+
| ... |
+----------------------------------------------+
| dictionary block index |
+----------------------------------------------+
| footer |
+----------------------------------------------+

Each bitmap block is a serialized RoaringNavigableMap64. Bitmap blocks are not wrapped with an additional compression layer because Roaring already stores row ids compactly. Value bitmap blocks are written for non-null values in serialized-key order. Dictionary blocks are emitted as groups reach the configured target size, so the physical payload area can contain both value bitmap blocks and dictionary blocks. Readers use the stored offset and length fields and do not require value bitmap blocks or dictionary blocks to be physically contiguous.

Each dictionary block stores sorted value entries. Keys are serialized with the same key serializer as BTree global index keys, and are ordered by serialized bytes:

+----------------------------------------------+
| entry count (var-length int) |
+----------------------------------------------+
| key length (var-length int), key bytes |
| bitmap block offset (var-length long) |
| bitmap block length (var-length int) |
+----------------------------------------------+
| ... |
+----------------------------------------------+

The dictionary block body above is stored as a BTree-style compressed block. The configured dictionary compression is attempted when writing, and the uncompressed body is kept if compression does not save enough space. A 5-byte block trailer follows each dictionary block body and records the actual compression type and CRC. The stored dictionary block length is the block body length and does not include the trailer.
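
As an illustration of the entry layout, the sketch below encodes and decodes an uncompressed dictionary block body. It assumes an LEB128-style variable-length encoding (7 payload bits per byte, least-significant group first, high bit as continuation flag); the text above does not specify the exact encoding, so treat that as an assumption. Compression and the 5-byte trailer are deliberately left out, and all function names are hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[bytes, int, int]  # (key bytes, bitmap block offset, bitmap block length)


def write_varint(out: bytearray, value: int) -> None:
    """Append a non-negative integer in the assumed LEB128-style encoding."""
    if value < 0:
        raise ValueError("negative values are not supported in this sketch")
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next position)."""
    result = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode_block(entries: List[Entry]) -> bytes:
    out = bytearray()
    write_varint(out, len(entries))  # entry count
    for key, offset, length in entries:
        write_varint(out, len(key))  # key length
        out += key                   # key bytes
        write_varint(out, offset)    # bitmap block offset
        write_varint(out, length)    # bitmap block length
    return bytes(out)


def decode_block(body: bytes) -> List[Entry]:
    count, pos = read_varint(body, 0)
    entries = []
    for _ in range(count):
        key_len, pos = read_varint(body, pos)
        key = body[pos:pos + key_len]
        pos += key_len
        offset, pos = read_varint(body, pos)
        length, pos = read_varint(body, pos)
        entries.append((key, offset, length))
    return entries
```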

The dictionary block index stores one entry per dictionary block and is small enough to be loaded in full the first time a lookup needs it:

+----------------------------------------------+
| block count (var-length int) |
+----------------------------------------------+
| first key length (var-length int), key bytes |
| dictionary block offset (var-length long) |
| dictionary block length (var-length int) |
+----------------------------------------------+
| ... |
+----------------------------------------------+

The dictionary block index uses the same compressed-block encoding and 5-byte trailer as dictionary blocks. The footer's dictionary block index length also excludes this trailer.

The footer has fixed length and points to the main metadata blocks:

+----------------------------------------------+
| null rows block offset (long) |
| null rows block length (int) |
| non-null rows block offset (long) |
| non-null rows block length (int) |
| dictionary block index offset (long) |
| dictionary block index length (int) |
| value count (int) |
| version (int) = 1 |
| magic (int) |
+----------------------------------------------+
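
Given the field widths listed above (three long/int pairs followed by three ints), the footer occupies 48 bytes. The following sketch parses it with Python's `struct` module. The big-endian byte order and the magic value are assumptions made for the example, since neither is stated here, and `read_footer` is a hypothetical helper.

```python
import struct
from typing import NamedTuple

# long,int x3 followed by three ints; ">" means big-endian with no padding (assumed).
FOOTER = struct.Struct(">qiqiqiiii")
HYPOTHETICAL_MAGIC = 0x424D4958  # placeholder value, not the real magic number


class Footer(NamedTuple):
    null_offset: int
    null_length: int
    non_null_offset: int
    non_null_length: int
    dict_index_offset: int
    dict_index_length: int
    value_count: int
    version: int
    magic: int


def read_footer(file_bytes: bytes) -> Footer:
    """Parse the fixed-length footer from the tail of an index file."""
    if len(file_bytes) < FOOTER.size:
        raise ValueError("file is shorter than the footer")
    footer = Footer(*FOOTER.unpack(file_bytes[-FOOTER.size:]))
    if footer.magic != HYPOTHETICAL_MAGIC:
        raise ValueError("unexpected magic number")
    if footer.version != 1:
        raise ValueError(f"unsupported version: {footer.version}")
    return footer
```

Reading from the end of the file is what lets a reader open the file without touching any bitmap or dictionary block first.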

The sorted dictionary block index lets equality and IN predicates locate candidate values by binary search without deserializing every dictionary block or value bitmap in the file. Null checks can read only the needed null/non-null bitmap.
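
The two-level lookup can be sketched with the standard `bisect` module. The example assumes keys compare as unsigned lexicographic byte strings, which is how Python compares `bytes`; the in-memory structures and the `lookup` helper are hypothetical stand-ins for the on-disk blocks.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

# One list per dictionary block: sorted (key, bitmap reference) pairs.
Block = List[Tuple[bytes, int]]


def find_block(first_keys: List[bytes], target: bytes) -> Optional[int]:
    """Index of the only block that could hold `target`, or None."""
    idx = bisect_right(first_keys, target) - 1
    return idx if idx >= 0 else None


def lookup(first_keys: List[bytes], blocks: List[Block], target: bytes) -> Optional[int]:
    """Return the bitmap reference for `target`, reading at most one block."""
    idx = find_block(first_keys, target)
    if idx is None:
        return None  # target sorts before every key in the file
    keys = [key for key, _ in blocks[idx]]
    pos = bisect_left(keys, target)
    if pos < len(keys) and keys[pos] == target:
        return blocks[idx][pos][1]
    return None


BLOCKS = [
    [(b"apple", 1), (b"blocked", 2)],
    [(b"trial", 3), (b"vip", 4)],
]
FIRST_KEYS = [block[0][0] for block in BLOCKS]
print(lookup(FIRST_KEYS, BLOCKS, b"vip"))  # 4
```

An IN predicate would repeat this lookup once per listed value and union the resulting bitmaps.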

Each bitmap index file also stores manifest-level metadata with the logical minimum non-null key, maximum non-null key, and whether the file contains null values. The scanner uses this metadata to skip impossible files before opening bitmap index files, similar to BTree global index file pruning.

+----------------------------------------------+
| first key length (int), first key bytes |
| last key length (int), last key bytes |
| has nulls (byte) |
| metadata version (byte) = 1 |
| null key flags (byte) |
+----------------------------------------------+
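
File pruning from this metadata can be sketched as simple range checks. The `FileMeta` class is a hypothetical in-memory view of the layout above; modeling an absent first or last key as `None` is an assumption about what the null key flags convey.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileMeta:
    first_key: Optional[bytes]  # minimum non-null key, None if the file has none (assumed)
    last_key: Optional[bytes]   # maximum non-null key, None if the file has none (assumed)
    has_nulls: bool


def may_contain_equal(meta: FileMeta, key: bytes) -> bool:
    """False means the file can be skipped for `column = key`."""
    if meta.first_key is None or meta.last_key is None:
        return False
    return meta.first_key <= key <= meta.last_key


def may_contain_null(meta: FileMeta) -> bool:
    """False means the file can be skipped for `column IS NULL`."""
    return meta.has_nulls
```

A `True` result only means the file has to be opened; the dictionary lookup inside the file still decides whether the value is actually present.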

For predicates that cannot be resolved by point or prefix lookup, the reader may scan all dictionary blocks and read matching bitmap blocks. This fallback is guarded by bitmap-index.fallback-scan-max-size, which compares against the total size of candidate bitmap index files handled by the reader. This keeps point lookup read amplification bounded for high-cardinality tag or dimension columns while allowing small bitmap indexes to answer broader predicates directly.
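
A guarded fallback scan might look like the sketch below, again with Python sets standing in for Roaring bitmaps. The dict layout, the `size_bytes` field, and `fallback_scan` are hypothetical; returning `None` stands for "use another index or a regular table scan instead".

```python
from typing import Callable, List, Optional, Set


def fallback_scan(
    files: List[dict],
    matches: Callable[[bytes], bool],
    budget_bytes: int,
) -> Optional[Set[int]]:
    """Scan every dictionary entry if the candidate files fit the budget."""
    total_size = sum(f["size_bytes"] for f in files)
    if budget_bytes <= 0 or total_size > budget_bytes:
        return None  # budget disabled or exceeded: caller picks another plan
    rows: Set[int] = set()
    for f in files:
        for key, bitmap in f["dictionary"]:
            if matches(key):
                rows |= bitmap
    return rows


FILES = [
    {"size_bytes": 100, "dictionary": [(b"trial", {1}), (b"vip", {2, 3})]},
    {"size_bytes": 50, "dictionary": [(b"vip_gold", {9})]},
]
# Equivalent of: tag contains 'vip'
print(sorted(fallback_scan(FILES, lambda k: b"vip" in k, budget_bytes=1000)))  # [2, 3, 9]
```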