This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.
Presto #
This documentation is a guide for using Paimon in Presto.
Version #
Paimon currently supports Presto 0.236 and above.
Preparing Paimon Jar File #
Presto Version | Plugin Package |
---|---|
[0.236, 0.268) | paimon-presto-0.236-1.0-SNAPSHOT-plugin.tar.gz |
[0.268, 0.273) | paimon-presto-0.268-1.0-SNAPSHOT-plugin.tar.gz |
[0.273, latest] | paimon-presto-0.273-1.0-SNAPSHOT-plugin.tar.gz |
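If you select the package from a deployment script, the version ranges in the table above can be encoded in a small helper. The following is a minimal sketch in POSIX shell, not part of Paimon: `paimon_presto_module` is a hypothetical name, and it assumes Presto versions of the plain `0.NNN` form (anything else is rejected).

```shell
# Hypothetical helper: map a Presto version such as 0.274 to the
# paimon-presto module name (0.236, 0.268 or 0.273), following the
# ranges [0.236, 0.268), [0.268, 0.273) and [0.273, latest].
paimon_presto_module() {
  version="$1"

  # Accept only versions of the form 0.NNN (an assumption about the input).
  case "$version" in
    0.*) minor="${version#0.}" ;;
    *) echo "unsupported version format: $version" >&2; return 1 ;;
  esac
  case "$minor" in
    ''|*[!0-9]*) echo "unsupported version format: $version" >&2; return 1 ;;
  esac

  if [ "$minor" -lt 236 ]; then
    echo "Presto $version is older than 0.236 and is not supported" >&2
    return 1
  elif [ "$minor" -lt 268 ]; then
    echo "0.236"
  elif [ "$minor" -lt 273 ]; then
    echo "0.268"
  else
    echo "0.273"
  fi
}

# Example: prints paimon-presto-0.273-1.0-SNAPSHOT-plugin.tar.gz
module="$(paimon_presto_module 0.274)" &&
  echo "paimon-presto-${module}-1.0-SNAPSHOT-plugin.tar.gz"
```

The printed module name is the one that appears in the package file name, which is also the value expected for PRESTO_VERSION when installing the connector.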
You can also build the plugin package manually from source.
To build from source, clone the git repository.
Then build the Presto connector plugin with the following command:
mvn clean install -DskipTests
After packaging completes, choose the package that matches your Presto version:
Presto Version | Package |
---|---|
[0.236, 0.268) | ./paimon-presto-0.236/target/paimon-presto-0.236-1.0-SNAPSHOT-plugin.tar.gz |
[0.268, 0.273) | ./paimon-presto-0.268/target/paimon-presto-0.268-1.0-SNAPSHOT-plugin.tar.gz |
[0.273, latest] | ./paimon-presto-0.273/target/paimon-presto-0.273-1.0-SNAPSHOT-plugin.tar.gz |
Different versions of Hive and Hadoop are supported as well. Note, however, that the connector uses Presto-shaded Hive and Hadoop packages to avoid dependency conflicts, so check the available versions of those shaded packages when choosing values for -Dhive.apache.version and -Dhadoop.apache2.version.
Both Hive 2 and 3, as well as Hadoop 2 and 3, are supported.
For example, if your Presto version is 0.274 and your Hive and Hadoop versions are 2.x, you could run:
mvn clean install -DskipTests -am -pl paimon-presto-0.273 -Dpresto.version=0.274 -Dhadoop.apache2.version=2.7.4-9 -Dhive.apache.version=1.2.2-2
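When building for several version combinations, the command can be assembled from variables. A minimal sketch, assuming you already know which module (0.236, 0.268 or 0.273) matches your Presto version; `paimon_presto_build_cmd` and its argument order are hypothetical. It only prints the command, so you can review it before running it.

```shell
# Hypothetical helper: print the Maven command that builds one paimon-presto
# module against specific Presto, shaded Hadoop and shaded Hive versions.
# Usage: paimon_presto_build_cmd MODULE PRESTO_VER HADOOP_APACHE2_VER HIVE_APACHE_VER
paimon_presto_build_cmd() {
  if [ "$#" -ne 4 ]; then
    echo "usage: paimon_presto_build_cmd MODULE PRESTO_VER HADOOP_VER HIVE_VER" >&2
    return 1
  fi
  # -pl selects the module, -am also builds the modules it depends on.
  printf 'mvn clean install -DskipTests -am -pl paimon-presto-%s -Dpresto.version=%s -Dhadoop.apache2.version=%s -Dhive.apache.version=%s\n' \
    "$1" "$2" "$3" "$4"
}

# Prints the same command as the Presto 0.274 example.
paimon_presto_build_cmd 0.273 0.274 2.7.4-9 1.2.2-2
```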
Tmp Dir #
Paimon unzips some jars into the temporary directory for code generation. By default, Presto uses '/tmp' as the temporary directory, but '/tmp' may be cleaned up periodically.
To avoid this, set the following JVM system property when Presto starts:
-Djava.io.tmpdir=/path/to/other/tmpdir
This lets Paimon use a temporary directory that will not be deleted unexpectedly.
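Presto reads JVM options from `etc/jvm.config`, one option per line, so that is a natural place for this setting. A minimal sketch, assuming the standard `${PRESTO_HOME}/etc/jvm.config` layout; the helper name `set_presto_tmpdir` and the example directory `/data/presto/tmp` are hypothetical. It leaves an existing `-Djava.io.tmpdir` entry untouched.

```shell
# Hypothetical helper: point the Presto JVM (and therefore Paimon's codegen
# jar extraction) at a temp directory that is not periodically cleaned.
# Usage: set_presto_tmpdir PRESTO_HOME TMPDIR
set_presto_tmpdir() {
  presto_home="$1"
  tmp_dir="$2"
  jvm_config="$presto_home/etc/jvm.config"

  # Create the new temp directory and make sure etc/jvm.config exists.
  mkdir -p "$tmp_dir" "$presto_home/etc" || return 1
  touch "$jvm_config" || return 1

  if grep -q '^-Djava\.io\.tmpdir=' "$jvm_config"; then
    echo "java.io.tmpdir is already set in $jvm_config; leaving it unchanged" >&2
    return 0
  fi

  # Ensure the file ends with a newline before appending a new option line.
  if [ -s "$jvm_config" ] && [ -n "$(tail -c 1 "$jvm_config")" ]; then
    echo >> "$jvm_config"
  fi
  printf '%s\n' "-Djava.io.tmpdir=$tmp_dir" >> "$jvm_config"
}

# Example:
# set_presto_tmpdir "${PRESTO_HOME}" /data/presto/tmp
```

Restart Presto afterwards so the JVM picks up the option.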
Configure Paimon Catalog #
Install Paimon Connector #
tar -zxf paimon-presto-${PRESTO_VERSION}/target/paimon-presto-${PRESTO_VERSION}-${PAIMON_VERSION}-plugin.tar.gz -C ${PRESTO_HOME}/plugin
Note that PRESTO_VERSION here is the module name, not your exact Presto version; it must be one of 0.236, 0.268, or 0.273.
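The install step can be wrapped so that it fails early on a wrong module name or a missing package. A minimal sketch; `install_paimon_plugin` is a hypothetical helper, its arguments mirror PRESTO_VERSION (the module name), PAIMON_VERSION and PRESTO_HOME from the command above, and it assumes it is run from the root of the built source tree.

```shell
# Hypothetical helper: unpack a built paimon-presto plugin package into
# Presto's plugin directory.
# Usage: install_paimon_plugin MODULE PAIMON_VERSION PRESTO_HOME
install_paimon_plugin() {
  module="$1"          # module name: 0.236, 0.268 or 0.273
  paimon_version="$2"  # e.g. 1.0-SNAPSHOT
  presto_home="$3"

  case "$module" in
    0.236|0.268|0.273) ;;
    *) echo "unknown module: $module (expected 0.236, 0.268 or 0.273)" >&2; return 1 ;;
  esac

  pkg="paimon-presto-$module/target/paimon-presto-$module-$paimon_version-plugin.tar.gz"
  if [ ! -f "$pkg" ]; then
    echo "package not found: $pkg (did the Maven build succeed?)" >&2
    return 1
  fi

  mkdir -p "$presto_home/plugin" &&
    tar -zxf "$pkg" -C "$presto_home/plugin"
}

# Example:
# install_paimon_plugin 0.273 1.0-SNAPSHOT "${PRESTO_HOME}"
```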
Configuration #
cd ${PRESTO_HOME}
mkdir -p etc/catalog
Query FileSystem table:
vim etc/catalog/paimon.properties
and set the following config:
connector.name=paimon
# set your filesystem path, such as hdfs://namenode01:8020/path and s3://${YOUR_S3_BUCKET}/path
warehouse=${YOUR_FS_PATH}
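In scripted deployments, the catalog file can be written non-interactively instead of being edited by hand. A minimal sketch; `write_paimon_catalog` is a hypothetical helper, it overwrites any existing paimon.properties, and the warehouse argument is whatever you would put in ${YOUR_FS_PATH}.

```shell
# Hypothetical helper: write etc/catalog/paimon.properties for a
# filesystem warehouse (HDFS, S3, ...).
# Usage: write_paimon_catalog PRESTO_HOME WAREHOUSE_PATH
write_paimon_catalog() {
  presto_home="$1"
  warehouse="$2"

  if [ -z "$warehouse" ]; then
    echo "warehouse path must not be empty" >&2
    return 1
  fi

  mkdir -p "$presto_home/etc/catalog" || return 1
  # Overwrites any existing paimon.properties.
  cat > "$presto_home/etc/catalog/paimon.properties" <<EOF
connector.name=paimon
warehouse=$warehouse
EOF
}

# Example:
# write_paimon_catalog "${PRESTO_HOME}" hdfs://namenode01:8020/path
```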
If you are using HDFS, you also need to configure it in one of the following ways:
- set the environment variable HADOOP_HOME;
- set the environment variable HADOOP_CONF_DIR;
- set hadoop-conf-dir in the catalog properties.
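A short pre-flight check can confirm that at least one of these options is in place before Presto starts. A minimal sketch; `check_paimon_hdfs_config` is hypothetical, it inspects the environment of the shell it runs in (which should be the one that launches Presto), and it only checks that a setting is present, not that the Hadoop configuration it points to is valid.

```shell
# Hypothetical pre-flight check: succeed if HADOOP_HOME or HADOOP_CONF_DIR
# is set, or if the catalog file defines hadoop-conf-dir.
# Usage: check_paimon_hdfs_config PATH_TO_PAIMON_PROPERTIES
check_paimon_hdfs_config() {
  props="$1"
  if [ -n "${HADOOP_HOME:-}" ] || [ -n "${HADOOP_CONF_DIR:-}" ]; then
    return 0
  fi
  if [ -f "$props" ] && grep -q '^hadoop-conf-dir=' "$props"; then
    return 0
  fi
  echo "no HDFS config found: set HADOOP_HOME, HADOOP_CONF_DIR or hadoop-conf-dir" >&2
  return 1
}

# Example:
# check_paimon_hdfs_config "${PRESTO_HOME}/etc/catalog/paimon.properties"
```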
If you are using S3, add paimon-s3-${PAIMON_VERSION}.jar to ${PRESTO_HOME}/plugin/paimon and additionally configure the following properties in paimon.properties:
s3.endpoint=${YOUR_ENDPOINTS}
s3.access-key=${YOUR_AK}
s3.secret-key=${YOUR_SK}
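Both S3 steps can be scripted together. A minimal sketch; `setup_paimon_s3` is a hypothetical helper, the path to the paimon-s3 jar is passed in, and the credentials are read from the environment variables S3_ENDPOINT, S3_ACCESS_KEY and S3_SECRET_KEY (hypothetical names chosen here) so they are not typed as command-line arguments. It appends to paimon.properties, so run it only once per catalog file.

```shell
# Hypothetical helper: copy the paimon-s3 jar into the Paimon plugin directory
# and append the S3 settings to the catalog file.
# Usage: S3_ENDPOINT=... S3_ACCESS_KEY=... S3_SECRET_KEY=... \
#          setup_paimon_s3 PRESTO_HOME PATH_TO_PAIMON_S3_JAR
setup_paimon_s3() {
  presto_home="$1"
  s3_jar="$2"
  props="$presto_home/etc/catalog/paimon.properties"

  if [ -z "${S3_ENDPOINT:-}" ] || [ -z "${S3_ACCESS_KEY:-}" ] || [ -z "${S3_SECRET_KEY:-}" ]; then
    echo "set S3_ENDPOINT, S3_ACCESS_KEY and S3_SECRET_KEY first" >&2
    return 1
  fi
  if [ ! -f "$s3_jar" ]; then
    echo "jar not found: $s3_jar" >&2
    return 1
  fi

  mkdir -p "$presto_home/plugin/paimon" "$presto_home/etc/catalog" || return 1
  cp "$s3_jar" "$presto_home/plugin/paimon/" || return 1
  {
    echo "s3.endpoint=$S3_ENDPOINT"
    echo "s3.access-key=$S3_ACCESS_KEY"
    echo "s3.secret-key=$S3_SECRET_KEY"
  } >> "$props"
}
```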
Query HiveCatalog table:
vim etc/catalog/paimon.properties
and set the following config:
connector.name=paimon
# set your filesystem path, such as hdfs://namenode01:8020/path and s3://${YOUR_S3_BUCKET}/path
warehouse=${YOUR_FS_PATH}
metastore=hive
uri=thrift://${YOUR_HIVE_METASTORE}:9083
Kerberos #
When using Kerberos authentication, you can configure the principal and keytab file in the catalog properties:
security.kerberos.login.principal=hadoop-user
security.kerberos.login.keytab=/etc/presto/hdfs.keytab
Keytab files must be distributed to every node in the cluster that runs Presto.
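Copying the keytab to each node can be done with a simple loop. A minimal sketch, assuming password-less scp access to every node and that the keytab should live at the same path everywhere (so one security.kerberos.login.keytab value works on all nodes); `distribute_keytab` and the host names are hypothetical. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Hypothetical helper: copy a keytab to the same path on every Presto node.
# Usage: distribute_keytab KEYTAB_PATH HOST [HOST ...]
# Set DRY_RUN=1 to print the scp commands instead of running them.
distribute_keytab() {
  keytab="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: distribute_keytab KEYTAB_PATH HOST [HOST ...]" >&2
    return 1
  fi
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "scp $keytab $host:$keytab"
    else
      # Same absolute path on each node, matching security.kerberos.login.keytab.
      scp "$keytab" "$host:$keytab" || return 1
    fi
  done
}

# Example:
# distribute_keytab /etc/presto/hdfs.keytab presto-coordinator presto-worker-1
```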
Create Schema #
CREATE SCHEMA paimon.test_db;
Create Table #
CREATE TABLE paimon.test_db.orders (
order_key bigint,
order_status varchar,
total_price decimal(18,4),
order_date date
)
WITH (
file_format = 'ORC',
primary_key = ARRAY['order_key','order_date'],
partitioned_by = ARRAY['order_date'],
bucket = '2',
bucket_key = 'order_key',
changelog_producer = 'input'
);
Add Column #
CREATE TABLE paimon.test_db.orders (
order_key bigint,
order_status varchar,
total_price decimal(18,4),
order_date date
)
WITH (
file_format = 'ORC',
primary_key = ARRAY['order_key','order_date'],
partitioned_by = ARRAY['order_date'],
bucket = '2',
bucket_key = 'order_key',
changelog_producer = 'input'
);
ALTER TABLE paimon.test_db.orders ADD COLUMN shipping_address varchar;
Query #
SELECT * FROM paimon.test_db.orders;
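To run the statements in this section from a script rather than an interactive session, the Presto CLI's `--execute` option can be used. A minimal sketch, assuming the CLI is on the PATH as `presto` and the coordinator listens on localhost:8080 unless PRESTO_SERVER says otherwise (both assumptions about your deployment); `paimon_query` is a hypothetical helper.

```shell
# Hypothetical helper: run one SQL statement against the paimon catalog
# through the Presto CLI (assumed to be installed as "presto").
# Usage: paimon_query "SQL"   (PRESTO_SERVER defaults to localhost:8080)
paimon_query() {
  if [ "$#" -ne 1 ]; then
    echo 'usage: paimon_query "SQL"' >&2
    return 1
  fi
  presto --server "${PRESTO_SERVER:-localhost:8080}" --catalog paimon --execute "$1"
}

# Example:
# paimon_query "SELECT * FROM paimon.test_db.orders"
```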
Presto to Paimon type mapping #
This section lists all supported type conversions between Presto and Paimon.
All Presto data types are available in the package com.facebook.presto.common.type.
Presto Data Type | Paimon Data Type | Atomic Type |
---|---|---|
RowType | RowType | false |
MapType | MapType | false |
ArrayType | ArrayType | false |
BooleanType | BooleanType | true |
TinyintType | TinyIntType | true |
SmallintType | SmallIntType | true |
IntegerType | IntType | true |
BigintType | BigIntType | true |
RealType | FloatType | true |
DoubleType | DoubleType | true |
CharType(length) | CharType(length) | true |
VarCharType(VarCharType.MAX_LENGTH) | VarCharType(VarCharType.MAX_LENGTH) | true |
VarCharType(length) | VarCharType(length), length is less than VarCharType.MAX_LENGTH | true |
DateType | DateType | true |
TimestampType | TimestampType | true |
DecimalType(precision, scale) | DecimalType(precision, scale) | true |
VarBinaryType(length) | VarBinaryType(length) | true |
TimestampWithTimeZoneType | LocalZonedTimestampType | true |