This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.
Expiring Partitions
You can set `partition.expiration-time` when creating a partitioned table. The Paimon streaming sink periodically checks the status of partitions and deletes those that have expired.
To control how expiration is determined, set `partition.expiration-strategy` when creating the partitioned table. The strategy determines how the partition time is extracted and compared with the current time to decide whether the partition has outlived `partition.expiration-time`. The supported strategies are:
- `values-time`: Compares the time extracted from the partition value with the current time. This is the default strategy.
- `update-time`: Compares the last update time of the partition with the current time. This strategy fits the following scenarios:
  - Your partition value is not in a date format.
  - You only want to keep data that has been updated in the last n days/months/years.
  - Data initialization imports a large amount of historical data.
Note: After a partition expires, it is logically deleted and its data can no longer be queried from the latest snapshot. The files are not physically deleted from the file system right away; that happens when the corresponding snapshot expires. See Expire Snapshots.
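As a rough illustration of the `values-time` check described above, the following Python sketch parses a partition value and compares it with the current time minus the expiration time. It is not Paimon's implementation (Paimon is written in Java and takes Java-style patterns such as `yyyyMMdd`); the function name `is_expired_by_value` and the use of the `strptime` format `%Y%m%d` as a stand-in for `yyyyMMdd` are assumptions made for this example.

```python
from datetime import datetime, timedelta


def is_expired_by_value(partition_value: str, strptime_format: str,
                        expiration: timedelta, now: datetime) -> bool:
    """Return True if the time parsed from the partition value is older than the cutoff.

    Hypothetical helper for illustration only; not part of Paimon.
    """
    partition_time = datetime.strptime(partition_value, strptime_format)
    cutoff = now - expiration
    return partition_time < cutoff


now = datetime(2024, 7, 9)       # assumed "current time" (midnight)
seven_days = timedelta(days=7)   # stands in for 'partition.expiration-time' = '7 d'

print(is_expired_by_value("20240701", "%Y%m%d", seven_days, now))  # True
print(is_expired_by_value("20240702", "%Y%m%d", seven_days, now))  # False
```

A non-date value such as `par-1` cannot be parsed this way (`strptime` raises `ValueError`), which is the kind of partition the `update-time` strategy is meant for.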
An example with a single partition field, using the `values-time` strategy:
CREATE TABLE t (...) PARTITIONED BY (dt) WITH (
'partition.expiration-time' = '7 d',
'partition.expiration-check-interval' = '1 d',
'partition.timestamp-formatter' = 'yyyyMMdd' -- this is required in `values-time` strategy.
);
-- Suppose the current date is 2024-07-09: partitions dated before 2024-07-02 will expire.
insert into t values('pk', '20240701');
-- An example for multiple partition fields
CREATE TABLE t (...) PARTITIONED BY (other_key, dt) WITH (
'partition.expiration-time' = '7 d',
'partition.expiration-check-interval' = '1 d',
'partition.timestamp-formatter' = 'yyyyMMdd',
'partition.timestamp-pattern' = '$dt'
);
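The role of `partition.timestamp-pattern` in the multi-field example can be pictured as placeholder substitution: each `$field` reference is replaced by that field's partition value, and the resulting string is then parsed with the formatter. The sketch below is a simplified Python illustration under that assumption, not Paimon's code; `resolve_timestamp_pattern` and the `year`/`month`/`day` fields in the second call are hypothetical.

```python
def resolve_timestamp_pattern(pattern: str, partition: dict) -> str:
    """Replace each `$field` placeholder in the pattern with the partition's value."""
    resolved = pattern
    # Substitute longer field names first so `$dt` cannot clobber a field like `$dt_hour`.
    for field in sorted(partition, key=len, reverse=True):
        resolved = resolved.replace("$" + field, str(partition[field]))
    return resolved


# Table partitioned by (other_key, dt) with 'partition.timestamp-pattern' = '$dt'.
print(resolve_timestamp_pattern("$dt", {"other_key": "a", "dt": "20240701"}))
# 20240701

# Hypothetical table partitioned by (year, month, day).
print(resolve_timestamp_pattern("$year-$month-$day",
                                {"year": "2024", "month": "07", "day": "01"}))
# 2024-07-01
```

With `'$dt'` as the pattern, only the `dt` field contributes to the partition time, so `other_key` has no effect on when a partition expires.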
An example using the `update-time` strategy:
CREATE TABLE t (...) PARTITIONED BY (dt) WITH (
'partition.expiration-time' = '7 d',
'partition.expiration-check-interval' = '1 d',
'partition.expiration-strategy' = 'update-time'
);
-- The last update time of the partition is now, so it will not expire.
insert into t values('pk', '2024-01-01');
-- Non-date partition values are supported.
insert into t values('pk', 'par-1');
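The `update-time` check does not look at the partition value at all, which is why values such as `par-1` work. A minimal Python sketch, assuming the last update time of each partition is already known (the `last_update_times` mapping, the `par-0` partition, and the `expired_by_update_time` function are hypothetical, not Paimon APIs):

```python
from datetime import datetime, timedelta


def expired_by_update_time(last_update_times: dict, expiration: timedelta,
                           now: datetime) -> list:
    """Return the partitions whose last update is older than now - expiration."""
    cutoff = now - expiration
    return sorted(p for p, updated in last_update_times.items() if updated < cutoff)


now = datetime(2024, 7, 9)  # assumed "current time"
last_update_times = {
    "2024-01-01": datetime(2024, 7, 9),  # old date in the value, but just written
    "par-1": datetime(2024, 7, 8),       # non-date value, updated yesterday
    "par-0": datetime(2024, 6, 1),       # hypothetical partition, untouched for weeks
}

print(expired_by_update_time(last_update_times, timedelta(days=7), now))
# ['par-0']
```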
More options:
Option | Default | Type | Description |
---|---|---|---|
partition.expiration-strategy | values-time | String | Specifies the expiration strategy for partition expiration. Possible values: values-time, update-time. |
partition.expiration-check-interval | 1 h | Duration | The check interval of partition expiration. |
partition.expiration-time | (none) | Duration | The expiration interval of a partition. A partition expires once its lifetime exceeds this value. Partition time is extracted from the partition value. |
partition.timestamp-formatter | (none) | String | The formatter used to parse a timestamp from a string. It can be used with 'partition.timestamp-pattern' to create a formatter using the specified value. |
partition.timestamp-pattern | (none) | String | A pattern for extracting a timestamp from partitions. The formatter pattern is defined by 'partition.timestamp-formatter'. |
end-input.check-partition-expire | false | Boolean | Whether to check partition expiration after a batch-mode or bounded-stream job finishes. |