Iceberg Migration #
Apache Iceberg tables stored in the Parquet file format can be migrated to Apache Paimon. Migrating an Iceberg table to a Paimon table permanently removes the original Iceberg table, so back up your data if you still need the original table. The migrated Paimon table will be an append table.
We strongly recommend backing up the Iceberg table data before migrating, because the migration is not atomic. If it is interrupted partway through, you may lose data.
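Because the migration is not atomic, a copy of the table directory taken beforehand is the simplest safeguard. The following is a minimal sketch, assuming the Iceberg table lives on a local or mounted file system. The function name backup_table_dir and the paths in the usage comment are hypothetical, and a table on HDFS or object storage would need that system's own copy tool instead of cp.

```shell
# Copy an Iceberg table directory (data and metadata) to a timestamped backup.
# Assumption: the table is stored on a local or mounted file system.
backup_table_dir() {
  src=$1        # table directory, e.g. <warehouse>/<db>/<table>
  dest_root=$2  # directory that will hold the backups

  if [ ! -d "$src" ]; then
    echo "source directory not found: $src" >&2
    return 1
  fi

  dest="$dest_root/$(basename "$src").$(date +%Y%m%d%H%M%S)"
  mkdir -p "$dest_root" || return 1

  # -R copies recursively; -p preserves timestamps and permissions.
  cp -Rp "$src" "$dest" || return 1

  # Print the backup location so the caller can record it.
  printf '%s\n' "$dest"
}

# Usage (hypothetical paths):
#   backup_table_dir /path/to/iceberg/warehouse/iceberg_db/iceberg_tbl /path/to/backups
```

Run the backup only while no writer is committing to the table; otherwise the copy may capture a half-written snapshot.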
Migrate Iceberg Table #
Currently, you can use a Paimon catalog with MigrateIcebergTableProcedure or MigrateIcebergTableAction to migrate the data referenced by the latest snapshot of an Iceberg table to a Paimon table.
Iceberg tables managed by hadoop-catalog or hive-catalog can be migrated to Paimon. The Paimon catalog, whatever its type, must have access to the file system where the Iceberg metadata and data files are located. For example, an Iceberg table managed by hadoop-catalog can be migrated to a Paimon table in a Hive catalog if their warehouses are in the same file system.
During migration, Iceberg data files marked as DELETED are ignored. Only data files referenced by manifest entries with status 'EXISTING' or 'ADDED' are migrated to Paimon. Note that migrating Iceberg tables with delete files (deletion vectors, position delete files, equality delete files, etc.) is not yet supported.
Currently, only the Parquet format is supported for Iceberg migration.
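Since only Parquet is supported, it can help to look for other file formats before starting. The sketch below is a rough pre-flight check, not part of Paimon: it assumes a local or mounted file system and the default Iceberg layout with data files under <table_dir>/data, and the function name list_non_parquet_files is hypothetical. It inspects files on disk rather than the snapshot, so it may also report files that the latest snapshot no longer references.

```shell
# List data files under an Iceberg table directory that are not Parquet.
# Assumptions: local or mounted file system, data files under <table_dir>/data.
# Hidden files (for example checksum side files) are skipped.
list_non_parquet_files() {
  find "$1/data" -type f ! -name '*.parquet' ! -name '.*' | sort
}

# Usage (hypothetical path); empty output means every data file is Parquet:
#   list_non_parquet_files /path/to/iceberg/warehouse/iceberg_db/iceberg_tbl
```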
MigrateIcebergTableProcedure #
You can run the following command to migrate an iceberg table to a paimon table.
-- Use named arguments
CALL sys.migrate_iceberg_table(source_table => 'database_name.table_name', iceberg_options => 'iceberg_options', options => 'paimon_options', parallelism => parallelism);
-- Use indexed arguments
CALL sys.migrate_iceberg_table('source_table', 'iceberg_options', 'options', parallelism);
- source_table: string, required. The source Iceberg table to migrate.
- iceberg_options: string, required. The configuration of the migration; multiple configuration items are separated by commas.
- options: string, optional. Additional options for the target Paimon table.
- parallelism: integer, optional. The parallelism of the migration job.
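The iceberg_options value is a single comma-separated string, and a stray space after a comma is easy to introduce when typing it by hand. As a small illustration, the sketch below assembles the string from individual key=value pairs. The helper name join_options is hypothetical and not part of Paimon; the keys and path are the placeholders used elsewhere on this page.

```shell
# Join key=value pairs into the comma-separated form used by iceberg_options.
# The function body runs in a subshell, so the IFS change does not leak
# into the calling shell.
join_options() (
  IFS=','
  printf '%s\n' "$*"
)

iceberg_options=$(join_options \
  "metadata.iceberg.storage=hadoop-catalog" \
  "iceberg_warehouse=/path/to/iceberg/warehouse")

echo "$iceberg_options"
# prints: metadata.iceberg.storage=hadoop-catalog,iceberg_warehouse=/path/to/iceberg/warehouse
```

The resulting string can be pasted into the procedure call or passed, quoted, as the value of --iceberg_options for the action.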
hadoop-catalog #
To migrate an Iceberg table managed by hadoop-catalog, set metadata.iceberg.storage=hadoop-catalog and iceberg_warehouse in iceberg_options. Example:
CREATE CATALOG paimon_catalog WITH ('type' = 'paimon', 'warehouse' = '/path/to/paimon/warehouse');
USE CATALOG paimon_catalog;
CALL sys.migrate_iceberg_table(
source_table => 'iceberg_db.iceberg_tbl',
iceberg_options => 'metadata.iceberg.storage=hadoop-catalog,iceberg_warehouse=/path/to/iceberg/warehouse'
);
If you want the metadata of the migrated Paimon table to be managed by Hive, you can create a Paimon catalog backed by Hive for the migration. Example:
CREATE CATALOG paimon_catalog WITH (
'type' = 'paimon',
'metastore' = 'hive',
'uri' = 'thrift://<host>:<port>',
'warehouse' = '/path/to/paimon/warehouse'
);
USE CATALOG paimon_catalog;
CALL sys.migrate_iceberg_table(
source_table => 'iceberg_db.iceberg_tbl',
iceberg_options => 'metadata.iceberg.storage=hadoop-catalog,iceberg_warehouse=/path/to/iceberg/warehouse'
);
hive-catalog #
To migrate an Iceberg table managed by hive-catalog, set metadata.iceberg.storage=hive-catalog and provide information about the Hive Metastore used by the Iceberg table in iceberg_options.
Option | Default | Type | Description |
---|---|---|---|
metadata.iceberg.uri | none | String | Hive metastore uri for Iceberg Hive catalog. |
metadata.iceberg.hive-conf-dir | none | String | hive-conf-dir for Iceberg Hive catalog. |
metadata.iceberg.hadoop-conf-dir | none | String | hadoop-conf-dir for Iceberg Hive catalog. |
metadata.iceberg.hive-client-class | org.apache.hadoop.hive.metastore.HiveMetaStoreClient | String | Hive client class name for Iceberg Hive Catalog. |
Example:
CREATE CATALOG paimon_catalog WITH (
'type' = 'paimon',
'metastore' = 'hive',
'uri' = 'thrift://<host>:<port>',
'warehouse' = '/path/to/paimon/warehouse'
);
USE CATALOG paimon_catalog;
CALL sys.migrate_iceberg_table(
source_table => 'iceberg_db.iceberg_tbl',
iceberg_options => 'metadata.iceberg.storage=hive-catalog,metadata.iceberg.uri=thrift://<host>:<port>'
);
MigrateIcebergTableAction #
You can also use a Flink action for the migration:
<FLINK_HOME>/bin/flink run \
/path/to/paimon-flink-action-1.1-SNAPSHOT.jar \
migrate_iceberg_table \
--table <icebergDatabase.icebergTable> \
--iceberg_options <iceberg-conf [,iceberg-conf ...]> \
[--parallelism <parallelism>] \
[--catalog_conf <paimon-catalog-conf> [--catalog_conf <paimon-catalog-conf> ...]] \
[--options <paimon-table-conf [,paimon-table-conf ...]> ]
Example:
<FLINK_HOME>/bin/flink run \
/path/to/paimon-flink-action-1.1-SNAPSHOT.jar \
migrate_iceberg_table \
--table iceberg_db.iceberg_tbl \
--iceberg_options metadata.iceberg.storage=hive-catalog,metadata.iceberg.uri=thrift://localhost:9083 \
--parallelism 6 \
--catalog_conf warehouse=/path/to/paimon/warehouse \
--catalog_conf metastore=hive \
--catalog_conf uri=thrift://localhost:9083