Expire Partition
This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.

Expiring Partitions

You can set partition.expiration-time when creating a partitioned table. The Paimon streaming sink then periodically checks the status of partitions and deletes the ones that have expired.

How Paimon determines whether a partition has expired: you can set partition.expiration-strategy when creating a partitioned table. This strategy determines how the partition time is extracted and compared with the current time to decide whether the partition has outlived partition.expiration-time. The supported values are:

  • values-time: The strategy compares the time extracted from the partition value with the current time. This is the default strategy.
  • update-time: The strategy compares the last update time of the partition with the current time. This strategy suits the following scenarios:
    • Your partition values are not date-formatted.
    • You only want to keep data that has been updated in the last n days/months/years.
    • Data initialization imports a large amount of historical data.
Note: After a partition expires, it is logically deleted and its data can no longer be queried from the latest snapshot. The files in the file system are not physically deleted right away, however; that depends on when the corresponding snapshot expires. See Expire Snapshots.

An example with a single partition field:

values-time strategy.

CREATE TABLE t (...) PARTITIONED BY (dt) WITH (
    'partition.expiration-time' = '7 d',
    'partition.expiration-check-interval' = '1 d',
    'partition.timestamp-formatter' = 'yyyyMMdd'   -- this is required in `values-time` strategy.
);
-- Suppose today is 2024-07-09: partitions dated before 2024-07-02 will expire.
-- The partition value must match the 'yyyyMMdd' formatter configured above.
insert into t values('pk', '20240701');

-- An example for multiple partition fields
CREATE TABLE t (...) PARTITIONED BY (other_key, dt) WITH (
    'partition.expiration-time' = '7 d',
    'partition.expiration-check-interval' = '1 d',
    'partition.timestamp-formatter' = 'yyyyMMdd',
    'partition.timestamp-pattern' = '$dt'
);
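
When the partition time is spread across several fields, the pattern and the formatter work together: the pattern assembles a timestamp string from the fields, and the formatter parses it. The following is a minimal sketch, assuming hypothetical partition columns named year, month, day and hour that hold zero-padded strings; the table name t_hourly is illustrative as well.

```sql
-- Hypothetical table; column names are quoted because they are reserved words in Flink SQL.
CREATE TABLE t_hourly (...) PARTITIONED BY (`year`, `month`, `day`, `hour`) WITH (
    'partition.expiration-time' = '7 d',
    'partition.expiration-check-interval' = '1 h',
    -- Assemble a timestamp string from the four partition fields.
    'partition.timestamp-pattern' = '$year-$month-$day $hour:00:00',
    -- Parse the assembled string; this matches one of the default formatters.
    'partition.timestamp-formatter' = 'yyyy-MM-dd HH:mm:ss'
);
```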

update-time strategy.

CREATE TABLE t (...) PARTITIONED BY (dt) WITH (
    'partition.expiration-time' = '7 d',
    'partition.expiration-check-interval' = '1 d',
    'partition.expiration-strategy' = 'update-time'
);

-- The last update time of the partition is now, so it will not expire.
insert into t values('pk', '2024-01-01');
-- Partition values that are not date-formatted are supported.
insert into t values('pk', 'par-1');

More options:

Option Default Type Description

partition.expiration-strategy
values-time String Specifies the expiration strategy for partition expiration. Possible values:
  • values-time: The strategy compares the time extracted from the partition value with the current time.
  • update-time: The strategy compares the last update time of the partition with the current time.

partition.expiration-check-interval
1 h Duration The check interval of partition expiration.

partition.expiration-time
(none) Duration The expiration interval of a partition. A partition will be expired if its lifetime exceeds this value. Partition time is extracted from the partition value.

partition.timestamp-formatter
(none) String The formatter used to parse a timestamp from a string. It can be used with 'partition.timestamp-pattern' to create a formatter using the specified value.
  • The default formatters are 'yyyy-MM-dd HH:mm:ss' and 'yyyy-MM-dd'.
  • Supports multiple partition fields like '$year-$month-$day $hour:00:00'.
  • The timestamp-formatter is compatible with Java's DateTimeFormatter.

partition.timestamp-pattern
(none) String You can specify a pattern to get a timestamp from partitions. The formatter pattern is defined by 'partition.timestamp-formatter'.
  • By default, the timestamp is read from the first field.
  • If the timestamp in the partition is a single field called 'dt', you can use '$dt'.
  • If it is spread across multiple fields for year, month, day, and hour, you can use '$year-$month-$day $hour:00:00'.
  • If the timestamp is in fields dt and hour, you can use '$dt $hour:00:00'.

end-input.check-partition-expire
false Boolean Whether to check partition expiration after a batch-mode or bounded-stream job finishes.
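
For tables written by batch or bounded-stream jobs, the periodic check described for the streaming sink may not get a chance to run, which is the case end-input.check-partition-expire covers. The following is a minimal sketch, assuming the option can be set in the WITH clause like the other options on this page; the table name t_batch is illustrative.

```sql
-- Hypothetical table written by batch jobs only.
CREATE TABLE t_batch (...) PARTITIONED BY (dt) WITH (
    'partition.expiration-time' = '7 d',
    'partition.timestamp-formatter' = 'yyyyMMdd',
    -- Check for expired partitions when a batch-mode or bounded-stream job finishes.
    'end-input.check-partition-expire' = 'true'
);
```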