This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.

Compaction #

When more and more records are written into the LSM tree, the number of sorted runs will increase. Because querying an LSM tree requires all sorted runs to be combined, too many sorted runs will result in a poor query performance, or even out of memory.

To limit the number of sorted runs, we have to merge several sorted runs into one big sorted run once in a while. This procedure is called compaction.

However, compaction is a resource intensive procedure which consumes a certain amount of CPU time and disk IO, so too frequent compaction may in turn result in slower writes. It is a trade-off between query and write performance. Paimon currently adopts a compaction strategy similar to Rocksdb’s universal compaction.

Compaction solves:

Reduce Level 0 files to avoid poor query performance.
Produce changelog via changelog-producer.
Produce deletion vectors for MOW mode.
Snapshot Expiration, Tag Expiration, Partitions Expiration.

Limitation:

There can only be one job working on the same partition’s compaction, otherwise it will cause conflicts and one side will throw an exception failure.

Writing performance is almost always affected by compaction, so its tuning is crucial.

Asynchronous Compaction #

Compaction is inherently asynchronous, but if you want it to be completely asynchronous without blocking writes, expecting a mode for maximum writing throughput, the compaction can be done slowly and not in a hurry. You can use the following strategies for your table:

num-sorted-run.stop-trigger = 2147483647
sort-spill-threshold = 10
lookup-wait = false

This configuration will generate more files during peak write periods and gradually merge them for optimal read performance during low write periods.

Dedicated compaction job #

In general, if you expect multiple jobs to be written to the same table, you need to separate the compaction. You can use dedicated compaction job.

Record-Level expire #

In compaction, you can configure record-Level expire time to expire records, you should configure:

'record-level.expire-time': time retain for records.
'record-level.time-field': time field for record level expire.

Expiration happens in compaction, and there is no strong guarantee to expire records in time. You can trigger a full compaction manually to expire records which were not expired in time.

Full Compaction #

Paimon Compaction uses Universal-Compaction. By default, when there is too much incremental data, Full Compaction will be automatically performed. You don’t usually have to worry about it.

Paimon also provides a configuration that allows for regular execution of Full Compaction.

‘compaction.optimization-interval’: Implying how often to perform an optimization full compaction, this configuration is used to ensure the query timeliness of the read-optimized system table.
‘full-compaction.delta-commits’: Full compaction will be constantly triggered after delta commits. Its disadvantage is that it can only perform compaction synchronously, which will affect writing efficiency.

Lookup Compaction #

When primary key table is configured with lookup changelog producer or first-row merge-engine or has enabled deletion vectors for MOW mode, Paimon will use a radical compaction strategy to force compacting level 0 files to higher levels for every compaction trigger.

Paimon also provides configurations to optimize the frequency of this compaction.

‘lookup-compact’: compact mode used for lookup compaction. Possible values: radical, will use ForceUpLevel0Compaction strategy to radically compact new files; gentle, will use UniversalCompaction strategy to gently compact new files;
‘lookup-compact.max-interval’: The max interval for a forced L0 lookup compaction to be triggered in gentle mode. This option is only valid when lookup-compact mode is gentle.

By configuring ‘lookup-compact’ as gentle, new files in L0 will not be compacted immediately, this may greatly reduce the overall resource usage at the expense of worse data freshness in certain cases.

Compaction Options #

Number of Sorted Runs to Pause Writing #

When the number of sorted runs is small, Paimon writers will perform compaction asynchronously in separated threads, so records can be continuously written into the table. However, to avoid unbounded growth of sorted runs, writers will pause writing when the number of sorted runs hits the threshold. The following table property determines the threshold.

Option	Required	Default	Type	Description
num-sorted-run.stop-trigger	No	(none)	Integer	The number of sorted runs that trigger the stopping of writes, the default value is 'num-sorted-run.compaction-trigger' + 3.

Write stalls will become less frequent when num-sorted-run.stop-trigger becomes larger, thus improving writing performance. However, if this value becomes too large, more memory and CPU time will be needed when querying the table. If you are concerned about the OOM problem, please configure the following option. Its value depends on your memory size.

Option	Required	Default	Type	Description
sort-spill-threshold	No	(none)	Integer	If the maximum number of sort readers exceeds this value, a spill will be attempted. This prevents too many readers from consuming too much memory and causing OOM.

Number of Sorted Runs to Trigger Compaction #

Paimon uses LSM tree which supports a large number of updates. LSM organizes files in several sorted runs. When querying records from an LSM tree, all sorted runs must be combined to produce a complete view of all records.

One can easily see that too many sorted runs will result in poor query performance. To keep the number of sorted runs in a reasonable range, Paimon writers will automatically perform compactions. The following table property determines the minimum number of sorted runs to trigger a compaction.

Option	Required	Default	Type	Description
num-sorted-run.compaction-trigger	No	5	Integer	The sorted run number to trigger compaction. Includes level0 files (one file one sorted run) and high-level runs (one level one sorted run).

Compaction will become less frequent when num-sorted-run.compaction-trigger becomes larger, thus improving writing performance. However, if this value becomes too large, more memory and CPU time will be needed when querying the table. This is a trade-off between writing and query performance.