This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.
Hive Table Migration
Apache Hive tables stored in the ORC or Parquet file format can be migrated to Paimon. Migrating data into a Paimon table permanently removes the original Hive table, so back up your data first if you still need the original. The migrated table is an append table.
You can use the Paimon Hive catalog with the Migrate Table Procedure to migrate a single table from Hive to Paimon, or with the Migrate Database Procedure to migrate every table in a database to Paimon.
- Migrate Table Procedure: use this when the Paimon table does not exist yet. The procedure upgrades the Hive table to a Paimon table, and the Hive table disappears once the action completes.
- Migrate Database Procedure: use this when the Paimon tables do not exist yet. The procedure upgrades all Hive tables in the database to Paimon tables, and all Hive tables disappear once the action completes.
These procedures currently support Hive tables in the “orc”, “parquet”, and “avro” file formats.
We strongly recommend backing up Hive table data before migrating, because the migration is not atomic. If it is interrupted partway through, you may lose data.
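One way to take such a backup is to copy the table's data directory aside before calling the procedure. The sketch below assumes the Hive table's files live on HDFS; the paths `/user/hive/warehouse/hivetable` and `/backups/hive` are hypothetical placeholders, and the helper names are made up for illustration. It copies data files only, not metastore metadata. By default it only prints the copy command; set `DRY_RUN=0` to run it.

```shell
# Sketch only: copy a Hive table's data directory aside before migrating.
# All paths used here are hypothetical placeholders.

# backup_path TABLE_DIR BACKUP_ROOT STAMP -> prints the destination directory
backup_path() (
  name="$(basename "${1%/}")"   # last path component, e.g. "hivetable"
  printf '%s/%s_%s\n' "${2%/}" "$name" "$3"
)

# backup_table_dir TABLE_DIR BACKUP_ROOT [STAMP]
# Prints the copy command by default; set DRY_RUN=0 to actually run it.
backup_table_dir() (
  stamp="${3:-$(date +%Y%m%d%H%M%S)}"
  dest="$(backup_path "$1" "$2" "$stamp")"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'hdfs dfs -cp -p %s %s\n' "$1" "$dest"
  else
    hdfs dfs -cp -p "$1" "$dest"
  fi
)

backup_table_dir /user/hive/warehouse/hivetable /backups/hive 20240101
```

With the arguments shown, the dry run prints `hdfs dfs -cp -p /user/hive/warehouse/hivetable /backups/hive/hivetable_20240101`.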
Migrate Hive Table
CREATE CATALOG PAIMON WITH (
'type'='paimon',
'metastore' = 'hive',
'uri' = 'thrift://localhost:9083',
'warehouse'='/path/to/warehouse/');
USE CATALOG PAIMON;
CALL sys.migrate_table(
connector => 'hive',
source_table => 'default.hivetable',
-- Optionally specify a target table; if it already exists,
-- the files are migrated directly into it
-- target_table => 'default.paimontarget',
-- Set delete_origin to false to keep the original hivetable
-- delete_origin => false,
options => 'file.format=orc');
<FLINK_HOME>/bin/flink run ./paimon-flink-action-1.1-SNAPSHOT.jar \
migrate_table \
--warehouse /path/to/warehouse \
--catalog_conf uri=thrift://localhost:9083 \
--catalog_conf metastore=hive \
--source_type hive \
--table default.hive_or_paimon
After the call completes, “hivetable” is fully converted to the Paimon format. Reading or writing the table the old “Hive way” will fail.
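If the original Hive table has to stay in place, the commented-out `target_table` and `delete_origin` arguments in the SQL example above can be enabled. The following is a minimal sketch that renders such a script from the shell; the function name `render_migrate_sql` is hypothetical, and the catalog settings are copied from the example above rather than taken from any real deployment.

```shell
# Sketch only: print a Flink SQL script that migrates a Hive table into a
# separate Paimon table and keeps the original Hive table.
# render_migrate_sql SOURCE_TABLE TARGET_TABLE
render_migrate_sql() {
  cat <<EOF
CREATE CATALOG PAIMON WITH (
  'type' = 'paimon',
  'metastore' = 'hive',
  'uri' = 'thrift://localhost:9083',
  'warehouse' = '/path/to/warehouse/');
USE CATALOG PAIMON;
CALL sys.migrate_table(
  connector => 'hive',
  source_table => '$1',
  target_table => '$2',
  delete_origin => false,
  options => 'file.format=orc');
EOF
}

render_migrate_sql default.hivetable default.paimontarget
```

The output can be redirected to a file and submitted with the Flink SQL client, for example `<FLINK_HOME>/bin/sql-client.sh -f ./migrate_keep_origin.sql`, where the file name is again only an illustration.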
Migrate Hive Database
CREATE CATALOG PAIMON WITH (
'type'='paimon',
'metastore' = 'hive',
'uri' = 'thrift://localhost:9083',
'warehouse'='/path/to/warehouse/');
USE CATALOG PAIMON;
CALL sys.migrate_database(
connector => 'hive',
source_database => 'default',
options => 'file.format=orc');
<FLINK_HOME>/bin/flink run \
/path/to/paimon-flink-action-1.1-SNAPSHOT.jar \
migrate_database \
--warehouse <warehouse-path> \
--source_type hive \
--database <database> \
[--catalog_conf <paimon-catalog-conf> [--catalog_conf <paimon-catalog-conf> ...]] \
[--options <paimon-table-conf [,paimon-table-conf ...]> ]
Example:
<FLINK_HOME>/bin/flink run ./paimon-flink-action-1.1-SNAPSHOT.jar migrate_database \
--warehouse /path/to/warehouse \
--catalog_conf uri=thrift://localhost:9083 \
--catalog_conf metastore=hive \
--source_type hive \
--database default
After the call completes, all tables in the “default” database are fully converted to the Paimon format. Reading or writing those tables the old “Hive way” will fail.
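The synopsis above allows `--catalog_conf` to be repeated, once per catalog option. A minimal sketch of assembling such a command line from a list of key=value pairs follows. The function name, the `/opt/flink` default for `FLINK_HOME`, and the `PAIMON_ACTION_JAR` variable are assumptions made for illustration, not Paimon or Flink settings. The function only prints the command so it can be reviewed before anything is run.

```shell
# Sketch only: build a migrate_database command with one --catalog_conf
# flag per key=value pair.
# build_migrate_database_cmd WAREHOUSE DATABASE [KEY=VALUE ...]
build_migrate_database_cmd() (
  flink_bin="${FLINK_HOME:-/opt/flink}/bin/flink"
  jar="${PAIMON_ACTION_JAR:-/path/to/paimon-flink-action-1.1-SNAPSHOT.jar}"
  warehouse="$1"
  database="$2"
  shift 2
  cmd="$flink_bin run $jar migrate_database"
  cmd="$cmd --warehouse $warehouse --source_type hive --database $database"
  # Every remaining argument becomes its own --catalog_conf flag.
  for conf in "$@"; do
    cmd="$cmd --catalog_conf $conf"
  done
  printf '%s\n' "$cmd"
)

build_migrate_database_cmd /path/to/warehouse default \
  uri=thrift://localhost:9083 metastore=hive
```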