This documentation is for an unreleased version of Apache Paimon.
Table Index
Table index files are stored in the index directory.
Dynamic Bucket Index
The dynamic bucket index stores the mapping between primary-key hash values and buckets.
Its structure is very simple: the file contains only hash values, stored one after another:
HASH_VALUE | HASH_VALUE | HASH_VALUE | HASH_VALUE | …
HASH_VALUE is the hash value of the primary key, stored as a 4-byte big-endian integer.
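The following is a minimal Python sketch of this layout. It assumes each hash is a signed 32-bit integer (Java `int` semantics) and that values are written back to back with no header or separators. The helper names `write_hash_index` and `read_hash_index` are hypothetical and are not part of Paimon's API.

```python
import struct

def write_hash_index(hashes):
    """Pack primary-key hash values as consecutive 4-byte big-endian ints."""
    # '>' = big-endian, 'i' = signed 32-bit int (assumed Java int semantics)
    return b"".join(struct.pack(">i", h) for h in hashes)

def read_hash_index(data):
    """Decode the index file contents back into a list of hash values."""
    if len(data) % 4 != 0:
        raise ValueError("index file length must be a multiple of 4 bytes")
    return [h for (h,) in struct.iter_unpack(">i", data)]

blob = write_hash_index([42, -7, 2147483647])
print(len(blob))              # 12
print(read_hash_index(blob))  # [42, -7, 2147483647]
```

Because every entry has a fixed width, a reader can compute the number of hashes as the file length divided by 4 without any extra metadata.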
Deletion Vectors
A deletion file stores the positions of deleted records for each data file. In a primary key table, each bucket has one deletion file.
The deletion file is a binary file with the following format:
- First, a single byte records the version. The current version is 1.
- Then, one entry per data file is recorded in sequence, each as <size of serialized bin, serialized bin, checksum of serialized bin>.
- Size and checksum are big-endian integers.
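The Python sketch below writes and reads this layout and treats each serialized bin as opaque bytes. The section does not name the checksum algorithm, so the sketch assumes CRC32 (via `zlib.crc32`). Treat that choice, along with the hypothetical helpers `write_deletion_file` and `read_deletion_file`, as illustrative only.

```python
import struct
import zlib

VERSION = 1  # current deletion file version, stored in one byte

def checksum(data: bytes) -> int:
    # ASSUMPTION: CRC32; the format description does not name the algorithm.
    return zlib.crc32(data) & 0xFFFFFFFF

def write_deletion_file(bins):
    """Version byte, then <size, bin, checksum> for each serialized bin."""
    out = bytearray([VERSION])
    for b in bins:
        out += struct.pack(">i", len(b))       # size: 4-byte big-endian int
        out += b                               # serialized bin, copied as-is
        out += struct.pack(">I", checksum(b))  # checksum: 4 bytes, big-endian
    return bytes(out)

def read_deletion_file(data: bytes):
    """Parse the layout back into a list of bins, verifying each checksum."""
    if not data or data[0] != VERSION:
        raise ValueError("missing or unsupported version byte")
    pos, bins = 1, []
    while pos < len(data):
        (size,) = struct.unpack_from(">i", data, pos)
        pos += 4
        b = data[pos:pos + size]
        pos += size
        (crc,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if len(b) != size or crc != checksum(b):
            raise ValueError("truncated bin or checksum mismatch")
        bins.append(b)
    return bins

data = write_deletion_file([b"bin-a", b"bin-b"])
print(len(data))                 # 27
print(read_deletion_file(data))  # [b'bin-a', b'bin-b']
```

Each entry costs 8 bytes of framing (size plus checksum) on top of the bin itself. The size prefix lets a reader skip to the next entry without decoding the bitmap.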
Each serialized bin is laid out as follows:
- First, a constant magic number stored as a big-endian int. The current magic number is 1581511376.
- Then, the serialized bitmap, which is a RoaringBitmap (org.roaringbitmap.RoaringBitmap).
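The following is a Python sketch of the bin layout. It assumes `bitmap_bytes` already holds a bitmap serialized by a RoaringBitmap library; decoding those bytes into row positions is out of scope here. The helper names `wrap_bitmap` and `unwrap_bitmap` are hypothetical.

```python
import struct

MAGIC_NUMBER = 1581511376  # constant magic number written before every bitmap

def wrap_bitmap(bitmap_bytes: bytes) -> bytes:
    """Build a serialized bin: 4-byte big-endian magic number + bitmap bytes."""
    return struct.pack(">i", MAGIC_NUMBER) + bitmap_bytes

def unwrap_bitmap(serialized_bin: bytes) -> bytes:
    """Check the magic number and return the raw serialized bitmap bytes."""
    (magic,) = struct.unpack_from(">i", serialized_bin, 0)
    if magic != MAGIC_NUMBER:
        raise ValueError(f"unexpected magic number {magic}")
    # Turning these bytes into row positions requires a RoaringBitmap library.
    return serialized_bin[4:]

placeholder = b"\x01\x02"  # placeholder bytes, NOT a valid RoaringBitmap
print(unwrap_bitmap(wrap_bitmap(placeholder)))  # b'\x01\x02'
```

Checking the magic number before handing the remaining bytes to a bitmap decoder gives an early, cheap signal that a bin is malformed or uses an unexpected format.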