Table Index
This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.

Table index #

Table index files are stored in the index directory.

Dynamic Bucket Index #

The dynamic bucket index stores the correspondence between the hash value of the primary key and the bucket.

Its structure is very simple; the file stores only hash values:

HASH_VALUE | HASH_VALUE | HASH_VALUE | HASH_VALUE | …

Each HASH_VALUE is the hash value of a primary key, stored as a 4-byte BIG_ENDIAN integer.
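The layout above can be decoded with a few lines of Python. This is a minimal sketch, not Paimon's own reader: the function names `read_hash_values` and `write_hash_values` are hypothetical, and treating each value as a signed 32-bit integer is an assumption (the text only states 4 bytes, BIG_ENDIAN).

```python
import struct


def read_hash_values(data: bytes) -> list[int]:
    """Decode a dynamic bucket index payload into a list of hash values."""
    if len(data) % 4 != 0:
        raise ValueError("payload length must be a multiple of 4 bytes")
    count = len(data) // 4
    # ">" selects big-endian; "i" is a signed 32-bit integer (assumption).
    return list(struct.unpack(f">{count}i", data))


def write_hash_values(hashes: list[int]) -> bytes:
    """Encode hash values back into the same flat big-endian layout."""
    return struct.pack(f">{len(hashes)}i", *hashes)
```

Because the file has no header or separators, the number of hash values is simply the file length divided by 4.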

Deletion Vectors #

A deletion file stores the positions of deleted records for each data file. For a primary key table, each bucket has one deletion file.

The deletion file is a binary file, and the format is as follows:

  • First, the version is recorded as a single byte. The current version is 1.
  • Then, each entry is recorded in sequence as <size of serialized bin, serialized bin, checksum of serialized bin>.
  • The size and checksum are BIG_ENDIAN integers.
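The container format described above can be walked as follows. This is a sketch under stated assumptions: the function name `split_deletion_file` is hypothetical, size and checksum are read as signed 32-bit integers, and entries are read back to back until the end of the file. This page does not say how a reader locates the bin for a particular data file or which checksum algorithm is used, so the sketch neither seeks by offset nor verifies the checksum.

```python
import struct


def split_deletion_file(data: bytes) -> list[tuple[bytes, int]]:
    """Split a deletion file into (serialized_bin, checksum) pairs."""
    if not data:
        raise ValueError("empty deletion file")
    version = data[0]  # first byte is the format version
    if version != 1:
        raise ValueError(f"unsupported version: {version}")

    entries = []
    offset = 1
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated size field")
        # Size of the serialized bin, big-endian int (signedness assumed).
        (size,) = struct.unpack_from(">i", data, offset)
        offset += 4
        if size < 0 or offset + size + 4 > len(data):
            raise ValueError("truncated entry")
        serialized_bin = data[offset:offset + size]
        offset += size
        # Checksum of the serialized bin, returned as stored (not verified).
        (checksum,) = struct.unpack_from(">i", data, offset)
        offset += 4
        entries.append((serialized_bin, checksum))
    return entries
```

Each returned `serialized_bin` still starts with its magic number and must be decoded according to the bitmap format in use.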

For each serialized bin, the serialization format is determined by deletion-vectors.bitmap64. By default, Paimon uses a 32-bit bitmap to store deleted records; if deletion-vectors.bitmap64 is set to true, a 64-bit bitmap is used instead. The two bitmaps are serialized differently. Note that only the 64-bit bitmap implementation is compatible with Iceberg.

Serialized bin for 32-bit bitmap (default):

  • First, a constant magic number is recorded as an int (BIG_ENDIAN). The current magic number is 1581511376.
  • Then, a 32-bit serialized bitmap is recorded, which is a RoaringBitmap (org.roaringbitmap.RoaringBitmap).
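As an illustration of the 32-bit layout, the following sketch checks the magic number and returns the remaining bytes, which would be the serialized RoaringBitmap. The names `MAGIC_32` and `strip_magic_32` are hypothetical; decoding the bitmap itself is left to a Roaring library and is not shown here.

```python
import struct

MAGIC_32 = 1581511376  # magic number quoted in the text (0x5E43F2D0)


def strip_magic_32(serialized_bin: bytes) -> bytes:
    """Validate the big-endian magic number and return the bitmap bytes."""
    if len(serialized_bin) < 4:
        raise ValueError("serialized bin is shorter than the magic number")
    # The magic number is a big-endian int at the start of the bin.
    (magic,) = struct.unpack_from(">i", serialized_bin, 0)
    if magic != MAGIC_32:
        raise ValueError(f"unexpected magic number: {magic}")
    # Everything after the magic number is the serialized RoaringBitmap.
    return serialized_bin[4:]
```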

Serialized bin for 64-bit bitmap:

  • First, a constant magic number is recorded as an int (LITTLE_ENDIAN). The current magic number is 1681511377.
  • Then, a 64-bit serialized bitmap is recorded. It supports positive 64-bit positions (the most significant bit must be 0), but is optimized for cases where most positions fit in 32 bits by using an array of 32-bit Roaring bitmaps. The internal bitmap array grows as needed to accommodate the largest position. The 64-bit bitmap is serialized as follows:
    • First, the size of the bitmap array is recorded as a long (LITTLE_ENDIAN).
    • Then, for each bitmap in the array, its index is recorded as an int (LITTLE_ENDIAN), followed by the bitmap's serialized bytes, in sequence.
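The fixed-size prefix of the 64-bit layout can be read as shown below. This is a sketch only: `MAGIC_64` and `read_bitmap64_header` are hypothetical names, and the integers are assumed to be signed. The description above does not mention a length prefix for the individual bitmaps, so walking past the first index would require a Roaring decoder that reports how many bytes it consumed; the sketch stops before that point.

```python
import struct
from typing import Optional

MAGIC_64 = 1681511377  # magic number quoted in the text (0x6439D3D1)


def read_bitmap64_header(serialized_bin: bytes) -> tuple[int, Optional[int]]:
    """Return (bitmap_count, index_of_first_bitmap or None if empty)."""
    if len(serialized_bin) < 12:
        raise ValueError("serialized bin is shorter than the 12-byte header")
    # "<" is little-endian without padding: 4-byte int, then 8-byte long.
    magic, count = struct.unpack_from("<iq", serialized_bin, 0)
    if magic != MAGIC_64:
        raise ValueError(f"unexpected magic number: {magic}")
    if count == 0:
        return count, None
    if len(serialized_bin) < 16:
        raise ValueError("missing index of the first bitmap")
    # Each array entry starts with its index as a little-endian int.
    (first_index,) = struct.unpack_from("<i", serialized_bin, 12)
    return count, first_index
```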