Read Optimized

Overview

Primary key tables use a ‘MergeOnRead’ design. When reading data, multiple layers of LSM data are merged, and read parallelism is limited by the number of buckets. Although Paimon’s merge is efficient, read performance still cannot match that of an ordinary append-only table.

We recommend that you use Deletion Vectors mode.
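
As a rough sketch, Deletion Vectors mode is switched on per table through the 'deletion-vectors.enabled' option. The table name, columns, and bucket count below are hypothetical, and the statement assumes a Flink SQL session whose current catalog is already a Paimon catalog.

```sql
-- Hypothetical table; assumes the current catalog is a Paimon catalog.
CREATE TABLE orders (
    order_id BIGINT,
    amount   DECIMAL(10, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'bucket' = '4',
    -- Enable Deletion Vectors mode for this primary key table.
    'deletion-vectors.enabled' = 'true'
);
```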

If you don’t want to use Deletion Vectors mode, and in certain scenarios you need queries to be fast but can accept reading somewhat older data, you can also:

1. Configure ‘compaction.optimization-interval’ when writing data. For streaming jobs, optimized compaction is then performed periodically; for batch jobs, it is performed when the job ends. (Alternatively, configure 'full-compaction.delta-commits'. Its disadvantage is that compaction can only run synchronously, which reduces write efficiency.)
2. Query the read-optimized system table. Reading only the optimized files avoids merging records with the same key, which improves read performance.
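
The two steps above might look like the following in Flink SQL. This is a minimal sketch, not a recommended setup: the table name, columns, and the one-hour interval are hypothetical, and the session is assumed to use a Paimon catalog. The read-optimized system table is addressed by appending `$ro` to the table name.

```sql
-- Step 1: hypothetical table written with periodic optimized compaction.
CREATE TABLE page_views (
    user_id BIGINT,
    views   BIGINT,
    PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
    -- Interval chosen for illustration only; tune it to the data latency you can accept.
    'compaction.optimization-interval' = '1 h'
);

-- Step 2: read only the optimized files through the read-optimized system table.
-- Results can lag behind the latest writes by up to the compaction interval.
SELECT * FROM page_views$ro;

-- For comparison, the regular table returns the latest data but merges at read time.
SELECT * FROM page_views;
```

A shorter interval keeps the `$ro` view fresher at the cost of more frequent compaction. Some engines may require quoting the `$ro` name (for example with backticks).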

You can flexibly balance query performance and data latency when reading.