Hive #
This documentation is a guide for using Paimon in Hive.
Version #
Paimon currently supports Hive 3.1, 2.3, 2.2, 2.1 and 2.1-cdh-6.3.
Execution Engine #
Paimon currently supports the MR and Tez execution engines for Hive Read, and the MR execution engine for Hive Write. Note: if you use Beeline, please restart the Hive cluster.
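For example, you can choose the engine per session with the standard Hive property hive.execution.engine (a minimal sketch; mr and tez are Hive's built-in engine names):

-- Use MR for both reading and writing Paimon tables
SET hive.execution.engine=mr;
-- Or use Tez for read-only queries
SET hive.execution.engine=tez;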
Installation #
Download the jar file with corresponding version.
Version | Jar
---|---
Hive 3.1 | paimon-hive-connector-3.1-0.9.0.jar
Hive 2.3 | paimon-hive-connector-2.3-0.9.0.jar
Hive 2.2 | paimon-hive-connector-2.2-0.9.0.jar
Hive 2.1 | paimon-hive-connector-2.1-0.9.0.jar
Hive 2.1-cdh-6.3 | paimon-hive-connector-2.1-cdh-6.3-0.9.0.jar
You can also build the bundled jar manually from the source code. To build from source, clone the git repository, then build the bundled jar with the following command:
mvn clean install -DskipTests
You can find the Hive connector jar in ./paimon-hive/paimon-hive-connector-<hive-version>/target/paimon-hive-connector-<hive-version>-0.9.0.jar.
There are several ways to add this jar to Hive.
- You can create an auxlib folder under the root directory of Hive, and copy paimon-hive-connector-0.9.0.jar into auxlib.
- You can also copy this jar to a path accessible by Hive, then use add jar /path/to/paimon-hive-connector-0.9.0.jar to enable Paimon support in Hive (see the session sketch below). Note that this method is not recommended. If you're using the MR execution engine and running a join statement, you may be faced with the exception org.apache.hive.com.esotericsoftware.kryo.kryoexception: unable to find class.
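If you take the add jar route, a quick session sketch (the jar path below is a placeholder) looks like this:

-- Register the connector jar for the current session
ADD JAR /path/to/paimon-hive-connector-3.1-0.9.0.jar;
-- Verify that the jar is visible to the session
LIST JARS;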
NOTE:
- If you are using HDFS:
  - Make sure that the environment variable HADOOP_HOME or HADOOP_CONF_DIR is set.
  - You can set paimon.hadoop-load-default-config=false to disable loading the default values from core-default.xml and hdfs-default.xml, which may lead to a smaller split size.
- With Hive CBO enabled, some queries may return incorrect results, for example querying a struct type with a not null predicate. You can disable CBO with the set hive.cbo.enable=false; command (see the session sketch below).
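The session-level workarounds above can be applied with plain SET commands (values taken from the note; adjust to your environment):

-- Skip loading core-default.xml / hdfs-default.xml to reduce split size
SET paimon.hadoop-load-default-config=false;
-- Disable CBO if struct + NOT NULL queries return wrong results
SET hive.cbo.enable=false;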
Hive SQL: access Paimon Tables already in Hive metastore #
Run the following Hive SQL in Hive CLI to access the created table.
-- Assume that paimon-hive-connector-<hive-version>-0.9.0.jar is already in the auxlib directory.
-- List tables in Hive
-- (you might need to switch to "default" database if you're not there by default)
SHOW TABLES;
/*
OK
test_table
*/
-- Read records from test_table
SELECT a, b FROM test_table ORDER BY a;
/*
OK
1 Table
2 Store
*/
-- Insert records into test table
-- Limitations:
-- Only INSERT INTO is supported, INSERT OVERWRITE is not supported
-- It is recommended to write to a non-primary-key table
-- Writing to a primary-key table may result in a large number of small files
INSERT INTO test_table VALUES (3, 'Paimon');
SELECT a, b FROM test_table ORDER BY a;
/*
OK
1 Table
2 Store
3 Paimon
*/
-- time travel
SET paimon.scan.snapshot-id=1;
SELECT a, b FROM test_table ORDER BY a;
/*
OK
1 Table
2 Store
3 Paimon
*/
SET paimon.scan.snapshot-id=null;
Hive SQL: create new Paimon Tables #
You can create new Paimon tables in Hive. Run the following Hive SQL in Hive CLI.
-- Assume that paimon-hive-connector-0.9.0.jar is already in the auxlib directory.
-- Let's create a new paimon table.
SET hive.metastore.warehouse.dir=warehouse_path;
CREATE TABLE hive_test_table(
a INT COMMENT 'The a field',
b STRING COMMENT 'The b field'
)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler';
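As a quick follow-up (the values are illustrative), the new table can be written to and queried like the managed table in the previous section:

-- Write a couple of rows and read them back
INSERT INTO hive_test_table VALUES (1, 'Hi'), (2, 'Paimon');
SELECT a, b FROM hive_test_table ORDER BY a;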
Hive SQL: access Paimon Tables by External Table #
To access existing Paimon tables, you can also register them as external tables in Hive. Run the following Hive SQL in Hive CLI.
-- Assume that paimon-hive-connector-0.9.0.jar is already in the auxlib directory.
-- Let's use the test_table created in the above section.
-- To create an external table, you don't need to specify any columns or table properties.
-- Pointing the location to the path of the table is enough.
CREATE EXTERNAL TABLE external_test_table
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
LOCATION '/path/to/table/store/warehouse/default.db/test_table';
-- In addition to setting the location as above, you can also place the location setting in TBLPROPERTIES
-- to avoid Hive accessing Paimon's location through its own file system when creating tables.
-- This method is effective in scenarios using object storage, such as S3.
CREATE EXTERNAL TABLE external_test_table
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
TBLPROPERTIES (
'paimon_location' ='s3://xxxxx/path/to/table/store/warehouse/default.db/test_table'
);
-- Read records from external_test_table
SELECT a, b FROM external_test_table ORDER BY a;
/*
OK
1 Table
2 Store
*/
-- Insert records into test table
INSERT INTO external_test_table VALUES (3, 'Paimon');
SELECT a, b FROM external_test_table ORDER BY a;
/*
OK
1 Table
2 Store
3 Paimon
*/
Hive Type Conversion #
This section lists all supported type conversions between Hive and Paimon.
All Hive data types are available in the package org.apache.hadoop.hive.serde2.typeinfo.
Hive Data Type | Paimon Data Type | Atomic Type
---|---|---
StructTypeInfo | RowType | false
MapTypeInfo | MapType | false
ListTypeInfo | ArrayType | false
PrimitiveTypeInfo("boolean") | BooleanType | true
PrimitiveTypeInfo("tinyint") | TinyIntType | true
PrimitiveTypeInfo("smallint") | SmallIntType | true
PrimitiveTypeInfo("int") | IntType | true
PrimitiveTypeInfo("bigint") | BigIntType | true
PrimitiveTypeInfo("float") | FloatType | true
PrimitiveTypeInfo("double") | DoubleType | true
CharTypeInfo(length) | CharType(length) | true
PrimitiveTypeInfo("string") | VarCharType(VarCharType.MAX_LENGTH) | true
VarcharTypeInfo(length) | VarCharType(length), length is less than VarCharType.MAX_LENGTH | true
PrimitiveTypeInfo("date") | DateType | true
PrimitiveTypeInfo("timestamp") | TimestampType | true
DecimalTypeInfo(precision, scale) | DecimalType(precision, scale) | true
PrimitiveTypeInfo("binary") | VarBinaryType, BinaryType | true
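As an illustration of several mappings above, the following hypothetical DDL creates a Paimon table through Hive whose columns exercise both atomic and nested types (INT -> IntType, STRING -> VarCharType, DECIMAL(10, 2) -> DecimalType(10, 2), TIMESTAMP -> TimestampType, ARRAY<INT> -> ArrayType, MAP<STRING, INT> -> MapType):

-- Hypothetical table covering several of the type mappings listed above
CREATE TABLE type_demo_table(
    id INT,
    name STRING,
    price DECIMAL(10, 2),
    updated_at TIMESTAMP,
    tags ARRAY<INT>,
    attrs MAP<STRING, INT>
)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler';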