Deletion Vectors
This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.

Overview

The Deletion Vectors mode is designed to balance read efficiency and write efficiency.

In this mode, writing incurs additional overhead (looking up the LSM tree and generating the corresponding deletion file), but at read time the deletion vectors are applied to the data files directly, which avoids the cost of merging different files.

Furthermore, data reading concurrency is no longer limited, and non-primary key columns can also be used for filter push down. Generally speaking, this mode greatly improves read performance without losing much write performance.

Usage

To enable the Deletion Vectors mode, set the table option 'deletion-vectors.enabled' = 'true'.
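
As a minimal sketch, the option can be set in the WITH clause when the table is created. The example assumes Flink SQL with a Paimon catalog already in use; the table name `orders` and its columns are hypothetical and only serve as an illustration.

```sql
-- Hypothetical primary-key table; only the option in the WITH clause
-- comes from this page.
CREATE TABLE orders (
    order_id BIGINT,
    customer STRING,
    amount   DECIMAL(10, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'deletion-vectors.enabled' = 'true'
);
```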

Limitations

  • changelog-producer needs to be none or lookup.
  • changelog-producer.lookup-wait can’t be false.
  • merge-engine can’t be first-row, because reading a first-row table already requires no merging, so deletion vectors are not needed.
  • This mode filters out the data in level-0, so time travel reads of an APPEND snapshot will see delayed data.
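
The constraints above can be spelled out in the table options. The following is a sketch of one compatible combination, again assuming Flink SQL with a Paimon catalog and a hypothetical table `orders_dv`. It picks `lookup` as the changelog producer and `deduplicate` as an example of a merge engine other than first-row; `changelog-producer.lookup-wait` is simply left unset rather than set to false.

```sql
-- Hypothetical table; the option values are one combination that
-- satisfies the limitations listed above.
CREATE TABLE orders_dv (
    order_id BIGINT,
    status   STRING,
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'deletion-vectors.enabled' = 'true',
    'changelog-producer' = 'lookup',   -- must be none or lookup
    'merge-engine' = 'deduplicate'     -- must not be first-row
);
```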
Copyright © 2024 The Apache Software Foundation. Apache Paimon, Paimon, and its feather logo are trademarks of The Apache Software Foundation.