Hive Table Migration
Apache Hive supports the ORC and Parquet file formats, both of which can be migrated to Paimon. When you migrate data to a Paimon table, the original table permanently disappears, so back up your data if you still need the original table. The migrated table is an append table.
You can use the Paimon Hive catalog with the Migrate Table Procedure and the Migrate File Procedure to fully migrate a table from Hive to Paimon. You can also use the Paimon Hive catalog with the Migrate Database Procedure to migrate all tables in a database to Paimon.
- Migrate Table Procedure: the Paimon table does not exist yet. Use this procedure to upgrade a Hive table to a Paimon table. The Hive table disappears once the action completes.
- Migrate Database Procedure: the Paimon tables do not exist yet. Use this procedure to upgrade all Hive tables in a database to Paimon tables. All Hive tables disappear once the action completes.
- Migrate File Procedure: the Paimon table already exists. Use this procedure to migrate files from a Hive table into the Paimon table. Note that the Hive table also disappears once the action completes.
These three procedures currently support Hive tables in the “orc”, “parquet”, and “avro” file formats.
We strongly recommend backing up Hive table data before migrating, because the migration is not atomic. If it is interrupted, you may lose data.
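One way to take the recommended backup is to copy the table's directory before calling any procedure. The sketch below assumes the Hive table lives on HDFS; the helper name `backup_cmd` and both paths are hypothetical. It only prints the copy command so it can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch: build (not run) an HDFS copy command that backs up a Hive table
# directory before migration. The paths used here are hypothetical placeholders.

# Print the copy command for a source table directory and a backup directory.
backup_cmd() {
  src="$1"
  dst="$2"
  if [ -z "$src" ] || [ -z "$dst" ]; then
    echo "usage: backup_cmd <table-dir> <backup-dir>" >&2
    return 1
  fi
  # -p preserves timestamps, ownership and permissions of the copied files.
  printf 'hdfs dfs -cp -p %s %s\n' "$src" "$dst"
}

# Dry run: print the command for an assumed warehouse layout.
backup_cmd /user/hive/warehouse/my_db.db/hivetable /backup/my_db/hivetable
```

After checking the printed command, it can be executed by piping the output to `sh`. For large tables a distributed copy tool may be a better fit than a single `cp`.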
Example for Migration
Migrate Hive Table
Command:
CALL sys.migrate_table('hive', '<hive_database>.<hive_tablename>', '<paimon_tableconf>');
Example
CREATE CATALOG PAIMON WITH ('type'='paimon', 'metastore' = 'hive', 'uri' = 'thrift://localhost:9083', 'warehouse'='/path/to/warehouse/');
USE CATALOG PAIMON;
CALL sys.migrate_table(connector => 'hive', source_table => 'default.hivetable', options => 'file.format=orc');
After the call, “hivetable” is fully converted to the Paimon format. Reading or writing the table the old “Hive way” will fail.
You can set table properties during the migration through the options argument of sys.migrate_table:
CALL sys.migrate_table(
connector => 'hive',
source_table => 'my_db.wait_to_upgrate',
options => 'file.format=orc,read.batch-size=2096,write-only=true'
);
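The options value above is a comma-separated list of key=value pairs. When the properties come from a script, a small helper can assemble that string and print the resulting statement. This is a minimal sketch; `join_options` is a hypothetical helper name and the property values are the ones from the example above.

```shell
#!/bin/sh
# Sketch: join key=value arguments with commas, the format used by the
# options argument in the call above.
join_options() {
  # "$*" joins the arguments with the first character of IFS;
  # the subshell keeps the IFS change from leaking to the caller.
  ( IFS=,; printf '%s\n' "$*" )
}

opts=$(join_options file.format=orc read.batch-size=2096 write-only=true)

# Print the statement so it can be pasted into a Flink SQL session.
printf "CALL sys.migrate_table(connector => 'hive', source_table => 'my_db.wait_to_upgrate', options => '%s');\n" "$opts"
```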
If your Flink version is below 1.17, you can use the Flink action instead:
<FLINK_HOME>/bin/flink run \
/path/to/paimon-flink-action-1.0.0.jar \
migrate_table \
--warehouse <warehouse-path> \
--source_type hive \
--table <database.table-name> \
[--catalog_conf <paimon-catalog-conf> [--catalog_conf <paimon-catalog-conf> ...]] \
[--options <paimon-table-conf [,paimon-table-conf ...]> ]
Example:
<FLINK_HOME>/bin/flink run ./paimon-flink-action-1.0.0.jar migrate_table \
--warehouse /path/to/warehouse \
--catalog_conf uri=thrift://localhost:9083 \
--catalog_conf metastore=hive \
--source_type hive \
--table default.hive_or_paimon
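When the action is launched from a script, it can help to assemble the command from variables and print it before running it. The sketch below does only that. The function name `build_migrate_table_cmd`, the `ACTION_JAR` variable, and the default `/opt/flink` location are assumptions for illustration; the flags are the ones shown in the command above.

```shell
#!/bin/sh
# Sketch: assemble the migrate_table action command and print it (dry run).
# FLINK_HOME and ACTION_JAR fall back to placeholder defaults if unset.

build_migrate_table_cmd() {
  warehouse="$1"
  table="$2"
  shift 2
  cmd="${FLINK_HOME:-/opt/flink}/bin/flink run"
  cmd="$cmd ${ACTION_JAR:-/path/to/paimon-flink-action-1.0.0.jar} migrate_table"
  cmd="$cmd --warehouse $warehouse --source_type hive --table $table"
  # Each remaining argument becomes one --catalog_conf key=value pair.
  for conf in "$@"; do
    cmd="$cmd --catalog_conf $conf"
  done
  printf '%s\n' "$cmd"
}

build_migrate_table_cmd /path/to/warehouse default.hivetable \
  uri=thrift://localhost:9083 metastore=hive
```

Printing first makes it easy to confirm the table name before the original Hive table is replaced.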
Migrate Hive Database
Command:
CALL sys.migrate_database('hive', '<hive_database>', '<paimon_tableconf>');
Example
CREATE CATALOG PAIMON WITH ('type'='paimon', 'metastore' = 'hive', 'uri' = 'thrift://localhost:9083', 'warehouse'='/path/to/warehouse/');
USE CATALOG PAIMON;
CALL sys.migrate_database(connector => 'hive', source_database => 'default', options => 'file.format=orc');
After the call, all tables in the “default” database are fully converted to the Paimon format. Reading or writing these tables the old “Hive way” will fail.
You can set table properties during the migration through the options argument of sys.migrate_database:
CALL sys.migrate_database(
connector => 'hive',
source_database => 'my_db',
options => 'file.format=orc,read.batch-size=2096,write-only=true'
);
If your Flink version is below 1.17, you can use the Flink action instead:
<FLINK_HOME>/bin/flink run \
/path/to/paimon-flink-action-1.0.0.jar \
migrate_database \
--warehouse <warehouse-path> \
--source_type hive \
--database <database> \
[--catalog_conf <paimon-catalog-conf> [--catalog_conf <paimon-catalog-conf> ...]] \
[--options <paimon-table-conf [,paimon-table-conf ...]> ]
Example:
<FLINK_HOME>/bin/flink run ./paimon-flink-action-1.0.0.jar migrate_database \
--warehouse /path/to/warehouse \
--catalog_conf uri=thrift://localhost:9083 \
--catalog_conf metastore=hive \
--source_type hive \
--database default
Migrate Hive File
Command:
CALL sys.migrate_file('hive', '<hive_database>.<hive_table_name>', '<paimon_database>.<paimon_tablename>');
Example
CREATE CATALOG PAIMON WITH ('type'='paimon', 'metastore' = 'hive', 'uri' = 'thrift://localhost:9083', 'warehouse'='/path/to/warehouse/');
USE CATALOG PAIMON;
CALL sys.migrate_file(connector => 'hive', source_table => 'default.hivetable', target_table => 'default.paimontable');
After the call, “hivetable” disappears, and all of its files are moved into the Paimon table directory and renamed. “paimontable” must have the same partition keys as “hivetable”, and “paimontable” should be in unaware-bucket mode.
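Because the target table must exist before sys.migrate_file is called, the whole flow can be kept in one SQL script. The sketch below writes such a script to a file. The column list and the partition key `dt` are hypothetical and must match the real Hive table. Creating the table without a primary key and with 'bucket' = '-1' is assumed here to be the way to get unaware-bucket mode; check this against your Paimon version.

```shell
#!/bin/sh
# Sketch: write a Flink SQL script that creates the target table and then
# migrates the Hive files into it. Columns and partition key are hypothetical.

write_migrate_file_sql() {
  out="$1"
  # Quoted heredoc delimiter: the SQL is written verbatim, backticks included.
  cat > "$out" <<'EOF'
CREATE CATALOG PAIMON WITH ('type'='paimon', 'metastore' = 'hive', 'uri' = 'thrift://localhost:9083', 'warehouse'='/path/to/warehouse/');
USE CATALOG PAIMON;
-- Assumption: no primary key plus 'bucket' = '-1' gives an unaware-bucket append table.
CREATE TABLE IF NOT EXISTS `default`.paimontable (
  id BIGINT,
  name STRING,
  dt STRING
) PARTITIONED BY (dt) WITH ('bucket' = '-1');
CALL sys.migrate_file(connector => 'hive', source_table => 'default.hivetable', target_table => 'default.paimontable');
EOF
}

write_migrate_file_sql "${TMPDIR:-/tmp}/migrate_file.sql"
```

The generated file can then be submitted with the Flink SQL client, for example `<FLINK_HOME>/bin/sql-client.sh -f /tmp/migrate_file.sql`.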