Manage Tags

Paimon's snapshots provide an easy way to query historical data. In most scenarios, however, a job generates many snapshots, and the table expires old ones according to its configuration. Snapshot expiration also deletes old data files, so the historical data of expired snapshots can no longer be queried.

To solve this problem, you can create a tag based on a snapshot. The tag retains the manifests and data files of that snapshot. A typical usage is to create a tag every day, which preserves each day's historical data for batch reading.

Automatic Creation

Paimon supports automatic creation of tags in the writing job.

Step 1: Choose Creation Mode

You can set the creation mode with the table option 'tag.automatic-creation'. Supported values are:

  • process-time: Create tags based on the machine's time.
  • watermark: Create tags based on the watermark of the sink input.
  • batch: In a batch processing scenario, create a tag after the current task completes.

If you choose watermark and the watermark is not in the UTC time zone, configure 'sink.watermark-time-zone' to specify its time zone.
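
As a minimal sketch of the watermark mode, both options can be set on an existing table from Flink SQL. The table name t and the time zone 'Asia/Shanghai' are placeholders chosen for illustration, and the sketch assumes the job writing to t defines a watermark on its input.

```sql
-- Flink SQL
-- Hypothetical table `t`; replace the time zone with the one your watermark uses.
ALTER TABLE t SET (
    'tag.automatic-creation' = 'watermark',
    'sink.watermark-time-zone' = 'Asia/Shanghai'
);
```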

Step 2: Choose Creation Period

Choose how frequently tags are generated. 'tag.creation-period' accepts 'daily', 'hourly', and 'two-hours'.

If you need to wait for late data, you can configure a delay with 'tag.creation-delay'.
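
For instance, an hourly schedule that waits ten minutes for late data could be configured as sketched below. The table name t is a placeholder, and the delay value is illustrative; it follows the '10 m' duration style used in the example later in this section.

```sql
-- Flink SQL
-- Hypothetical table `t`: create one tag per hour, 10 minutes after the hour ends.
ALTER TABLE t SET (
    'tag.creation-period' = 'hourly',
    'tag.creation-delay' = '10 m'
);
```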

Step 3: Automatic Deletion of Tags

You can configure 'tag.num-retained-max' or 'tag.default-time-retained' to delete tags automatically.
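
The two retention options could be set as follows. This is only a sketch: the table name t, the count of 30, and the duration '7 d' are illustrative values, and the duration string is assumed to use the same format as 'tag.creation-delay'.

```sql
-- Flink SQL
-- Count-based: keep at most 30 tags on the hypothetical table `t`.
ALTER TABLE t SET ('tag.num-retained-max' = '30');

-- Time-based: give newly created tags a default retention time of 7 days.
ALTER TABLE t SET ('tag.default-time-retained' = '7 d');
```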

For example, configure a table to create a tag at 0:10 every day and retain at most 90 tags (about 3 months of daily tags):

-- Flink SQL
CREATE TABLE t (
    k INT PRIMARY KEY NOT ENFORCED,
    f0 INT,
    ...
) WITH (
    'tag.automatic-creation' = 'process-time',
    'tag.creation-period' = 'daily',
    'tag.creation-delay' = '10 m',
    'tag.num-retained-max' = '90'
);

INSERT INTO t SELECT ...;

-- Spark SQL

-- Read latest snapshot
SELECT * FROM t;

-- Read Tag snapshot
SELECT * FROM t VERSION AS OF '2023-07-26';

-- Read Incremental between Tags
SELECT * FROM paimon_incremental_query('t', '2023-07-25', '2023-07-26');

See Query Tables for more Spark query examples.
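
Tags can also be read from Flink SQL. The sketch below assumes the scan options 'scan.tag-name' and 'incremental-between' are available in your Paimon version, and it reuses the table t and the tag names from the example above. It is intended for batch mode.

```sql
-- Flink SQL (batch mode)

-- Read the data of a single tag
SELECT * FROM t /*+ OPTIONS('scan.tag-name' = '2023-07-26') */;

-- Read the incremental changes between two tags
SELECT * FROM t /*+ OPTIONS('incremental-between' = '2023-07-25,2023-07-26') */;
```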

Create Tags

You can create a tag with a given name and snapshot ID.

Run the following command:

CALL sys.create_tag(`table` => 'database_name.table_name', tag => 'tag_name', [snapshot_id => <snapshot-id>]);

If snapshot_id is not set, it defaults to the latest snapshot.
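
One possible workflow, sketched in Flink SQL: look up a snapshot ID in the snapshots system table, create the tag, then check the tags system table. The database default, the table t, the tag name my_tag, and the snapshot ID 10 are all hypothetical, and the sketch assumes the t$snapshots and t$tags system tables are available in your Paimon version.

```sql
-- List snapshots to pick a snapshot ID
SELECT snapshot_id, commit_time FROM t$snapshots;

-- Tag snapshot 10 (hypothetical ID taken from the query above)
CALL sys.create_tag(`table` => 'default.t', tag => 'my_tag', snapshot_id => CAST(10 AS BIGINT));

-- Confirm that the tag now exists
SELECT tag_name, snapshot_id FROM t$tags;
```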

Delete Tags

You can delete a tag by its name.

Run the following command:

CALL sys.delete_tag(`table` => 'database_name.table_name', tag => 'tag_name');

Rollback to Tag

Roll back a table to a specific tag. All snapshots and tags whose snapshot ID is larger than that of the tag are deleted, along with their data.

Run the following command:

CALL sys.rollback_to(`table` => 'database_name.table_name', tag => 'tag_name');
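
Because a rollback is destructive, it can help to check what would be removed before running it. The sketch below uses Flink SQL with hypothetical names (database default, table t, tag my_tag, snapshot ID 10) and assumes the t$tags and t$snapshots system tables are available in your Paimon version.

```sql
-- Find the snapshot ID that the tag points to
SELECT snapshot_id FROM t$tags WHERE tag_name = 'my_tag';

-- Suppose the query returned 10: these snapshots would be deleted by the rollback
SELECT snapshot_id, commit_time FROM t$snapshots WHERE snapshot_id > 10;

-- Tags that point to later snapshots would be deleted as well
SELECT tag_name, snapshot_id FROM t$tags WHERE snapshot_id > 10;

-- Roll back once the lists above look acceptable
CALL sys.rollback_to(`table` => 'default.t', tag => 'my_tag');
```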