Hive #
This documentation is a guide for using Paimon in Hive.
Version #
Paimon currently supports Hive 3.1, 2.3, 2.2, 2.1 and 2.1-cdh-6.3.
Execution Engine #
Paimon currently supports the MR and Tez execution engines for Hive read, and the MR execution engine for Hive write.

NOTE: If you use Beeline, please restart the Hive cluster.
Installation #
Download the jar file for the corresponding version.
Hive Version | Jar |
---|---|
Hive 3.1 | paimon-hive-connector-3.1-0.7.0-incubating.jar |
Hive 2.3 | paimon-hive-connector-2.3-0.7.0-incubating.jar |
Hive 2.2 | paimon-hive-connector-2.2-0.7.0-incubating.jar |
Hive 2.1 | paimon-hive-connector-2.1-0.7.0-incubating.jar |
Hive 2.1-cdh-6.3 | paimon-hive-connector-2.1-cdh-6.3-0.7.0-incubating.jar |
You can also manually build the bundled jar from the source code. To do so, clone the git repository and build with the following command.
mvn clean install -DskipTests
You can find the Hive connector jar in `./paimon-hive/paimon-hive-connector-<hive-version>/target/paimon-hive-connector-<hive-version>-0.7.0-incubating.jar`.
There are several ways to add this jar to Hive.
- You can create an `auxlib` folder under the root directory of Hive, and copy `paimon-hive-connector-0.7.0-incubating.jar` into `auxlib`.
- You can also copy this jar to a path accessible by Hive, then use `add jar /path/to/paimon-hive-connector-0.7.0-incubating.jar` to enable Paimon support in Hive. Note that this method is not recommended: if you're using the MR execution engine and running a join statement, you may encounter the exception `org.apache.hive.com.esotericsoftware.kryo.kryoexception: unable to find class`.
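The recommended `auxlib` approach above can be sketched as a few shell commands. This is a self-contained illustration only: a temporary directory stands in for your Hive installation root, and an empty placeholder file stands in for the downloaded connector jar.

```shell
# Sketch of the auxlib installation step. HIVE_HOME and the jar are
# placeholders: a temporary directory and an empty file stand in for a
# real Hive installation and the downloaded connector jar.
HIVE_HOME="$(mktemp -d)"                              # in practice: your Hive root directory
JAR="paimon-hive-connector-3.1-0.7.0-incubating.jar"
touch "$JAR"                                          # stands in for the real downloaded jar

mkdir -p "$HIVE_HOME/auxlib"                          # create auxlib under the Hive root
cp "$JAR" "$HIVE_HOME/auxlib/"                        # Hive picks up jars placed here on startup
```

In a real deployment, point `HIVE_HOME` at your actual installation and restart Hive (or the whole cluster, if using Beeline) so the jar is loaded.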
NOTE:

- If you are using HDFS, make sure that the environment variable `HADOOP_HOME` or `HADOOP_CONF_DIR` is set.
- With Hive CBO enabled, some queries may return incorrect results, such as querying a `struct` type with a `not null` predicate. You can disable CBO with the `set hive.cbo.enable=false;` command.
Flink SQL: with Paimon Hive Catalog #
By using the Paimon Hive catalog, you can create, drop, select, and insert into Paimon tables from Flink. These operations directly affect the corresponding Hive metastore. Tables created in this way can also be accessed directly from Hive.
Step 1: Prepare Flink Hive Connector Bundled Jar
See creating a catalog with Hive metastore.
Step 2: Create Test Data with Flink SQL
Execute the following Flink SQL script in Flink SQL client to define a Paimon Hive catalog and create a table.
-- Flink SQL CLI
-- Define paimon Hive catalog
CREATE CATALOG my_hive WITH (
'type' = 'paimon',
'metastore' = 'hive',
-- 'uri' = 'thrift://<hive-metastore-host-name>:<port>', default use 'hive.metastore.uris' in HiveConf
-- 'hive-conf-dir' = '...', this is recommended in the kerberos environment
-- 'hadoop-conf-dir' = '...', this is recommended in the kerberos environment
-- 'warehouse' = 'hdfs:///path/to/table/store/warehouse', default use 'hive.metastore.warehouse.dir' in HiveConf
);
-- Use paimon Hive catalog
USE CATALOG my_hive;
-- Create a table in paimon Hive catalog (use "default" database by default)
CREATE TABLE test_table (
a int,
b string
);
-- Insert records into test table
INSERT INTO test_table VALUES (1, 'Table'), (2, 'Store');
-- Read records from test table
SELECT * FROM test_table;
/*
+---+-------+
| a | b |
+---+-------+
| 1 | Table |
| 2 | Store |
+---+-------+
*/
Hive SQL: access Paimon Tables already in Hive metastore #
Run the following Hive SQL in Hive CLI to access the created table.
-- Assume that paimon-hive-connector-<hive-version>-0.7.0-incubating.jar is already in auxlib directory.
-- List tables in Hive
-- (you might need to switch to "default" database if you're not there by default)
SHOW TABLES;
/*
OK
test_table
*/
-- Read records from test_table
SELECT a, b FROM test_table ORDER BY a;
/*
OK
1 Table
2 Store
*/
-- Insert records into test table
-- Note: the Tez engine does not support Hive write; only the MR engine is supported.
INSERT INTO test_table VALUES (3, 'Paimon');
SELECT a, b FROM test_table ORDER BY a;
/*
OK
1 Table
2 Store
3 Paimon
*/
-- time travel
SET paimon.scan.snapshot-id=1;
SELECT a, b FROM test_table ORDER BY a;
/*
OK
1 Table
2 Store
*/
SET paimon.scan.snapshot-id=null;
Hive SQL: create new Paimon Tables #
You can create new paimon tables in Hive. Run the following Hive SQL in Hive CLI.
-- Assume that paimon-hive-connector-0.7.0-incubating.jar is already in auxlib directory.
-- Let's create a new paimon table.
SET hive.metastore.warehouse.dir=warehouse_path;
CREATE TABLE hive_test_table(
a INT COMMENT 'The a field',
b STRING COMMENT 'The b field'
)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler';
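As a follow-up sketch (assuming the MR execution engine and the connector jar in `auxlib`), the newly created table can be written to and queried like any other Paimon table; the values here are illustrative.

```sql
-- Hypothetical usage of the table created above (writes require the MR engine).
INSERT INTO hive_test_table VALUES (1, 'Paimon');
SELECT a, b FROM hive_test_table;
```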
Hive SQL: access Paimon Tables by External Table #
To access existing Paimon tables, you can also register them as external tables in Hive. Run the following Hive SQL in Hive CLI.
-- Assume that paimon-hive-connector-0.7.0-incubating.jar is already in auxlib directory.
-- Let's use the test_table created in the above section.
-- To create an external table, you don't need to specify any column or table properties.
-- Pointing the location to the path of table is enough.
CREATE EXTERNAL TABLE external_test_table
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
LOCATION '/path/to/table/store/warehouse/default.db/test_table';
-- In addition to setting the location as above, you can also place the location setting in TBLPROPERTIES
-- to avoid Hive accessing Paimon's location through its own file system when creating tables.
-- This method is effective in scenarios using object storage, such as S3.
CREATE EXTERNAL TABLE external_test_table
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
TBLPROPERTIES (
  'paimon_location' = 's3://xxxxx/path/to/table/store/warehouse/default.db/test_table'
);
-- Read records from external_test_table
SELECT a, b FROM external_test_table ORDER BY a;
/*
OK
1 Table
2 Store
*/
-- Insert records into test table
INSERT INTO external_test_table VALUES (3, 'Paimon');
SELECT a, b FROM external_test_table ORDER BY a;
/*
OK
1 Table
2 Store
3 Paimon
*/
Hive Type Conversion #
This section lists all supported type conversions between Hive and Paimon. All Hive data types are available in the package `org.apache.hadoop.hive.serde2.typeinfo`.
Hive Data Type | Paimon Data Type | Atomic Type |
---|---|---|
StructTypeInfo | RowType | false |
MapTypeInfo | MapType | false |
ListTypeInfo | ArrayType | false |
PrimitiveTypeInfo("boolean") | BooleanType | true |
PrimitiveTypeInfo("tinyint") | TinyIntType | true |
PrimitiveTypeInfo("smallint") | SmallIntType | true |
PrimitiveTypeInfo("int") | IntType | true |
PrimitiveTypeInfo("bigint") | BigIntType | true |
PrimitiveTypeInfo("float") | FloatType | true |
PrimitiveTypeInfo("double") | DoubleType | true |
CharTypeInfo(length) | CharType(length) | true |
PrimitiveTypeInfo("string") | VarCharType(VarCharType.MAX_LENGTH) | true |
VarcharTypeInfo(length) | VarCharType(length), length is less than VarCharType.MAX_LENGTH | true |
PrimitiveTypeInfo("date") | DateType | true |
PrimitiveTypeInfo("timestamp") | TimestampType | true |
DecimalTypeInfo(precision, scale) | DecimalType(precision, scale) | true |
PrimitiveTypeInfo("binary") | VarBinaryType, BinaryType | true |