Procedures

This section introduces all available Spark procedures for Paimon.

Each entry lists the procedure name, an explanation with its arguments, and an example.
compact
To compact files. Arguments:
  • table: the target table identifier. Cannot be empty.
  • partitions: partition filter. "," means "AND" and ";" means "OR". For example, to compact the single partition with date=01 and day=01, write 'date=01,day=01'. Leave empty to compact all partitions. Cannot be used together with "where".
  • where: partition predicate. Leave empty to compact all partitions. Cannot be used together with "partitions".
  • order_strategy: 'order', 'zorder', 'hilbert', or 'none'. Defaults to 'none' when left empty.
  • order_by: the columns to sort by. Leave empty if 'order_strategy' is 'none'.
Example:
    SET spark.sql.shuffle.partitions=10; -- set the compact parallelism
    CALL sys.compact(table => 'T', partitions => 'p=0;p=1', order_strategy => 'zorder', order_by => 'a,b')
    CALL sys.compact(table => 'T', where => 'p>0 and p<3', order_strategy => 'zorder', order_by => 'a,b')
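The "," and ";" separators in the partitions filter can be combined. The sketch below assumes a table 'default.T' partitioned by date and day (both names are illustrative), and assumes that ";" separates whole partition specifications, each of which is a ","-joined list of key=value pairs.

```sql
-- Compact every partition without sorting: only the required argument is given.
CALL sys.compact(table => 'default.T');

-- Compact two partitions: (date=01 AND day=01) OR (date=02 AND day=01).
CALL sys.compact(table => 'default.T', partitions => 'date=01,day=01;date=02,day=01');
```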
expire_snapshots
To expire snapshots. Arguments:
  • table: the target table identifier. Cannot be empty.
  • retain_max: the maximum number of completed snapshots to retain.
  • retain_min: the minimum number of completed snapshots to retain.
  • older_than: the timestamp before which snapshots will be removed.
  • max_deletes: the maximum number of snapshots that can be deleted at once.
Example:
    CALL sys.expire_snapshots(table => 'default.T', retain_max => 10)
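The optional arguments can presumably be combined in a single call. The following is a minimal sketch using only the integer arguments listed above; the specific values are arbitrary and chosen for illustration.

```sql
-- Keep between 2 and 10 completed snapshots, and delete at most 5 snapshots in this run.
CALL sys.expire_snapshots(table => 'default.T', retain_max => 10, retain_min => 2, max_deletes => 5);
```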
create_tag
To create a tag based on a given snapshot. Arguments:
  • table: the target table identifier. Cannot be empty.
  • tag: name of the new tag. Cannot be empty.
  • snapshot (Long): id of the snapshot that the new tag is based on. Leave empty to use the latest snapshot.
  • time_retained: the maximum time to retain the newly created tag.
Example:
    -- based on snapshot 10, retained for 1 day
    CALL sys.create_tag(table => 'default.T', tag => 'my_tag', snapshot => 10, time_retained => '1 d')
    -- based on the latest snapshot
    CALL sys.create_tag(table => 'default.T', tag => 'my_tag')
delete_tag
To delete a tag. Arguments:
  • table: the target table identifier. Cannot be empty.
  • tag: name of the tag to delete. To delete multiple tags, separate the names with ','.
Example:
    CALL sys.delete_tag(table => 'default.T', tag => 'my_tag')
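Since the tag argument accepts a ','-delimited list, several tags can be removed in one call. The tag names below are hypothetical.

```sql
-- Delete two tags in a single call by joining their names with ','.
CALL sys.delete_tag(table => 'default.T', tag => 'my_tag1,my_tag2');
```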
rollback
To roll back the target table to a specific version. Arguments:
  • table: the target table identifier. Cannot be empty.
  • version: id of the snapshot or name of the tag to roll back to.
Example:
    CALL sys.rollback(table => 'default.T', version => 'my_tag')
    CALL sys.rollback(table => 'default.T', version => 10)
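One way to use tags and rollback together is to tag the table before a risky change and roll back to that tag if the change goes wrong. This is a sketch of such a workflow, not a prescribed procedure; the tag name 'before_backfill' is hypothetical.

```sql
-- Tag the latest snapshot before making the change.
CALL sys.create_tag(table => 'default.T', tag => 'before_backfill');

-- ... run the change against default.T ...

-- If the result is unwanted, return the table to the tagged state.
CALL sys.rollback(table => 'default.T', version => 'before_backfill');
```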
remove_orphan_files
To remove orphan data files and metadata files. Arguments:
  • table: the target table identifier. Cannot be empty.
  • older_than: to avoid deleting newly written files, this procedure only deletes orphan files older than 1 day by default. Use this argument to change the cutoff time.
Example:
    CALL sys.remove_orphan_files(table => 'default.T', older_than => '2023-10-31 12:00:00')
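Because older_than is optional, the call can also rely on the default cutoff described above. A minimal sketch:

```sql
-- Omit older_than to remove only orphan files older than the default of 1 day.
CALL sys.remove_orphan_files(table => 'default.T');
```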
    Copyright © 2024 The Apache Software Foundation. Apache Paimon, Paimon, and its feather logo are trademarks of The Apache Software Foundation.