Hive Table Migration

Apache Hive tables stored in the ORC or Parquet file format can be migrated to Paimon. When data is migrated to a Paimon table, the original Hive table permanently disappears, so back up your data if you still need the original table. The migrated table will be an unaware-bucket append-only table.

Using a Paimon Hive catalog, you can fully migrate a table from Hive to Paimon with either the Migrate Table Procedure or the Migrate File Procedure:

Migrate Table Procedure: use it when the Paimon table does not exist yet. The procedure upgrades the Hive table to a Paimon table, and the Hive table disappears once it completes.
Migrate File Procedure: use it when the Paimon table already exists. The procedure migrates the files of the Hive table into the Paimon table. Note that the Hive table also disappears once it completes.

Both procedures currently support only the Hive file formats “orc” and “parquet”. If your table files use another format, such as Avro, the procedures will fail. Avro support is planned for the future. For now, make sure your table files are in “orc” or “parquet” format.
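Before calling either procedure, you can check the table's storage format. The Python sketch below is a hypothetical helper; migratable_format is not part of Paimon. It assumes you have copied the InputFormat class name from the output of Hive's DESCRIBE FORMATTED <hive_database>.<hive_tablename>. It maps a few standard Hive InputFormat classes to a file format and rejects anything other than ORC or Parquet.

```python
# Hypothetical pre-check, not part of Paimon: decide whether a Hive table can be
# migrated, given the InputFormat class shown by Hive's DESCRIBE FORMATTED.
SUPPORTED_FORMATS = {"orc", "parquet"}

# InputFormat classes Hive uses for a few common storage formats.
INPUT_FORMAT_TO_FILE_FORMAT = {
    "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat": "orc",
    "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat": "parquet",
    "org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat": "avro",
    "org.apache.hadoop.mapred.TextInputFormat": "textfile",
}


def migratable_format(input_format_class: str) -> str:
    """Return "orc" or "parquet"; raise ValueError for any other format."""
    fmt = INPUT_FORMAT_TO_FILE_FORMAT.get(input_format_class.strip(), "unknown")
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"cannot migrate: file format is {fmt!r}, expected orc or parquet")
    return fmt


if __name__ == "__main__":
    # Value copied from the "InputFormat:" row of DESCRIBE FORMATTED default.hivetable
    print(migratable_format("org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"))  # orc
```

Unknown classes map to "unknown" and are rejected. This keeps the check conservative.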

We highly recommend backing up the Hive table data before migrating, because the migration is not atomic. If it is interrupted partway through, you may lose data.

Migration Examples

Migrate Hive Table

Command:

CALL sys.migrate_table('hive', '<hive_database>.<hive_tablename>', '<paimon_tableconf>');

Example

CREATE CATALOG PAIMON WITH ('type'='paimon', 'metastore' = 'hive', 'uri' = 'thrift://localhost:9083', 'warehouse'='/path/to/warehouse/');

USE CATALOG PAIMON;

CALL sys.migrate_table('hive', 'default.hivetable', 'file.format=orc');

After the call, “hivetable” is fully converted to Paimon format, and reading or writing the table the old “Hive way” will fail. You can set table properties during the migration through the third argument of sys.migrate_table('hive', '<hive_database>.<hive_tablename>', '<paimon_tableconf>'). Separate multiple properties with “,”. For example:

CALL sys.migrate_table('hive', 'my_db.wait_to_upgrate', 'file.format=orc,read.batch-size=2096,write-only=true');
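When migrating many tables, it can help to generate these statements from a script. The Python sketch below uses a hypothetical helper, migrate_table_call, which is not a Paimon API. It joins key=value pairs with “,” to form the third argument. As an assumption of this sketch, it rejects commas and single quotes inside keys or values, since they would corrupt the joined string or the SQL literal.

```python
from typing import Dict


def migrate_table_call(source_type: str, hive_table: str, table_conf: Dict[str, str]) -> str:
    """Build a CALL sys.migrate_table statement with comma-separated table properties."""
    for key, value in table_conf.items():
        # Pairs are joined with ',' and wrapped in a SQL string literal, so a
        # ',' or a single quote inside a key or value would corrupt the argument.
        if any(ch in key + value for ch in ",'"):
            raise ValueError(f"unsupported character in option {key}={value}")
    conf = ",".join(f"{key}={value}" for key, value in table_conf.items())
    return f"CALL sys.migrate_table('{source_type}', '{hive_table}', '{conf}');"


if __name__ == "__main__":
    print(migrate_table_call(
        "hive",
        "default.hivetable",
        {"file.format": "orc", "read.batch-size": "2096", "write-only": "true"},
    ))
    # CALL sys.migrate_table('hive', 'default.hivetable', 'file.format=orc,read.batch-size=2096,write-only=true');
```

Python dicts preserve insertion order, so the properties appear in the order you list them.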

If your Flink version is below 1.17, you can use the Flink action to achieve the same result:

<FLINK_HOME>/bin/flink run \
/path/to/paimon-flink-action-0.6.0-incubating.jar \
migrate_table \
--warehouse <warehouse-path> \
--source_type hive \
--table <database.table-name> \
[--catalog_conf <paimon-catalog-conf> [--catalog_conf <paimon-catalog-conf> ...]] \
[--options <paimon-table-conf [,paimon-table-conf ...]>]

Example:

<FLINK_HOME>/bin/flink run ./paimon-flink-action-0.6.0-incubating.jar migrate_table \
--warehouse /path/to/warehouse \
--catalog_conf uri=thrift://localhost:9083 \
--catalog_conf metastore=hive \
--source_type hive \
--table default.hive_or_paimon
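The action's argument conventions differ by flag. Each catalog option gets its own --catalog_conf flag, while all table options share one comma-separated --options value. The Python sketch below builds the argument list for the command above. The helper name migrate_table_action_args and the /opt/flink path are hypothetical. The resulting list could be passed to subprocess.run(args, check=True) to submit the job.

```python
import shlex
from typing import Dict, List, Optional


def migrate_table_action_args(
    flink_home: str,
    action_jar: str,
    warehouse: str,
    table: str,
    catalog_conf: Dict[str, str],
    table_conf: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build the argument list for the migrate_table Flink action."""
    args = [f"{flink_home}/bin/flink", "run", action_jar, "migrate_table",
            "--warehouse", warehouse, "--source_type", "hive", "--table", table]
    # Each catalog option gets its own --catalog_conf flag.
    for key, value in catalog_conf.items():
        args += ["--catalog_conf", f"{key}={value}"]
    # All table options go into a single comma-separated --options value.
    if table_conf:
        args += ["--options", ",".join(f"{k}={v}" for k, v in table_conf.items())]
    return args


if __name__ == "__main__":
    args = migrate_table_action_args(
        "/opt/flink",  # hypothetical FLINK_HOME
        "/path/to/paimon-flink-action-0.6.0-incubating.jar",
        "/path/to/warehouse",
        "default.hive_or_paimon",
        {"uri": "thrift://localhost:9083", "metastore": "hive"},
    )
    # Print a shell-quoted command line for inspection before running it.
    print(shlex.join(args))
```

Building a list rather than a single string avoids shell-quoting problems with paths or option values.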

Migrate Hive File

Command:

CALL sys.migrate_file('hive', '<hive_database>.<hive_table_name>', '<paimon_database>.<paimon_tablename>');

Example

CREATE CATALOG PAIMON WITH ('type'='paimon', 'metastore' = 'hive', 'uri' = 'thrift://localhost:9083', 'warehouse'='/path/to/warehouse/');

USE CATALOG PAIMON;

CALL sys.migrate_file('hive', 'default.hivetable', 'default.paimontable');

After the call, “hivetable” disappears, and all of its files are moved and renamed into the Paimon table directory. “paimontable” must have the same partition keys as “hivetable” and must be in unaware-bucket mode.
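Because migrate_file moves files irreversibly, it is worth checking the target table first. The Python sketch below is a hypothetical pre-flight check; check_migrate_file_target is not a Paimon API. It assumes you have already looked up both tables' partition keys and the Paimon table's effective bucket setting, for example from each table's DDL. In Paimon, an append-only table with 'bucket' = '-1' runs in unaware-bucket mode. The sketch compares partition keys in order, which is the stricter reading of “same partition keys”.

```python
from typing import Sequence


def check_migrate_file_target(
    hive_partition_keys: Sequence[str],
    paimon_partition_keys: Sequence[str],
    paimon_bucket: int,
) -> None:
    """Raise ValueError if the Paimon table cannot receive the Hive table's files."""
    # Compared in order: this sketch takes the stricter reading of "same partition keys".
    if list(hive_partition_keys) != list(paimon_partition_keys):
        raise ValueError(
            f"partition keys differ: hive={list(hive_partition_keys)}, "
            f"paimon={list(paimon_partition_keys)}"
        )
    # Unaware-bucket mode corresponds to 'bucket' = '-1'.
    if paimon_bucket != -1:
        raise ValueError(f"paimon table must use bucket = -1, found {paimon_bucket}")


if __name__ == "__main__":
    check_migrate_file_target(["dt"], ["dt"], -1)  # passes silently
    print("ok to call sys.migrate_file")
```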