Table Index
This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.

Table index

Table index files are stored in the index directory.

Dynamic Bucket Index

The dynamic bucket index stores the correspondence between the hash value of the primary key and the bucket.

Its structure is simple: the file contains only hash values, written one after another:

HASH_VALUE | HASH_VALUE | HASH_VALUE | HASH_VALUE | …

HASH_VALUE is the hash value of the primary key, stored as a 4-byte BIG_ENDIAN integer.
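
The layout above can be decoded with a few lines of code. The following is a minimal sketch, not Paimon's own reader: the function names `read_hash_values` and `write_hash_values` are hypothetical, and it assumes each hash is a signed 32-bit integer in big-endian byte order with no header or trailer in the file.

```python
import struct
from typing import List


def read_hash_values(data: bytes) -> List[int]:
    """Decode the body of a dynamic bucket index file into hash values."""
    if len(data) % 4 != 0:
        raise ValueError("index file length must be a multiple of 4 bytes")
    count = len(data) // 4
    # ">" selects big-endian byte order; "i" is a signed 32-bit integer
    # (signedness is an assumption of this sketch).
    return list(struct.unpack(f">{count}i", data))


def write_hash_values(hashes: List[int]) -> bytes:
    """Encode hash values back into the same flat 4-byte layout."""
    return struct.pack(f">{len(hashes)}i", *hashes)
```

For example, `read_hash_values(write_hash_values([1, -2]))` round-trips the two values, and the encoded form is exactly 8 bytes long.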

Deletion Vectors

A deletion file stores the positions of deleted records for each data file. In a primary key table, each bucket has one deletion file.

The deletion file is a binary file, and the format is as follows:

  • First, the version is recorded as a single byte. The current version is 1.
  • Then, <size of serialized bin, serialized bin, checksum of serialized bin> entries are recorded in sequence.
  • The size and the checksum are BIG_ENDIAN integers.
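
The outer layout described in the list above can be sketched as follows. This is an illustration under stated assumptions, not Paimon's implementation: the function names are hypothetical, the size and checksum are taken to be 4 bytes each, entries are read back to back until the end of the file, and the checksum is assumed to be a CRC32 of the serialized bin, which the description above does not specify.

```python
import struct
import zlib
from typing import List

VERSION = 1  # current version, as stated in the format description


def write_deletion_file(bins: List[bytes]) -> bytes:
    """Encode serialized bins as: version byte, then <size, bin, checksum>."""
    out = bytearray([VERSION])
    for blob in bins:
        out += struct.pack(">i", len(blob))        # size, big-endian int
        out += blob                                # serialized bin
        out += struct.pack(">I", zlib.crc32(blob)) # checksum (CRC32 assumed)
    return bytes(out)


def read_deletion_file(data: bytes) -> List[bytes]:
    """Split a deletion file into its serialized bins, verifying checksums."""
    if not data or data[0] != VERSION:
        raise ValueError("unsupported deletion file version")
    bins = []
    pos = 1
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated size field")
        (size,) = struct.unpack_from(">i", data, pos)
        pos += 4
        if size < 0 or pos + size + 4 > len(data):
            raise ValueError("truncated or invalid serialized bin")
        blob = data[pos:pos + size]
        pos += size
        (checksum,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if checksum != zlib.crc32(blob):
            raise ValueError("checksum mismatch")
        bins.append(blob)
    return bins
```

Reading sequentially is a simplification chosen for the sketch; a real reader may instead seek directly to a known offset and length for one data file's bin.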

For each serialized bin:

  • First, a constant magic number is recorded as an int (BIG_ENDIAN). The current magic number is 1581511376.
  • Then, the serialized bitmap is recorded. It is a RoaringBitmap (org.roaringbitmap.RoaringBitmap).
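
The structure of a single serialized bin can be sketched in the same way. The helper names `wrap_bitmap` and `unwrap_bitmap` are hypothetical, and the sketch treats the RoaringBitmap payload as opaque bytes: decoding the payload itself requires a RoaringBitmap library and is not shown here.

```python
import struct

MAGIC_NUMBER = 1581511376  # constant from the format description


def wrap_bitmap(bitmap_bytes: bytes) -> bytes:
    """Prefix an already serialized bitmap with the big-endian magic number."""
    return struct.pack(">i", MAGIC_NUMBER) + bitmap_bytes


def unwrap_bitmap(bin_bytes: bytes) -> bytes:
    """Check the magic number and return the serialized bitmap payload."""
    if len(bin_bytes) < 4:
        raise ValueError("serialized bin is shorter than the magic number")
    (magic,) = struct.unpack_from(">i", bin_bytes, 0)
    if magic != MAGIC_NUMBER:
        raise ValueError(f"unexpected magic number: {magic}")
    # Everything after the first 4 bytes is the serialized RoaringBitmap.
    return bin_bytes[4:]
```

Checking the magic number before handing the payload to a bitmap decoder gives an early, clear failure when a reader is pointed at the wrong offset.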