Trino #
This documentation is a guide for using Paimon in Trino.
Version #
Paimon currently supports Trino 440.
Filesystem #
From version 0.8, Paimon shares the Trino filesystem for all actions, which means you should configure the Trino filesystem before using trino-paimon. You can find information about how to configure filesystems for Trino on the Trino official website.
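For example, if your warehouse is on S3, a minimal sketch of the catalog properties using Trino's native S3 filesystem support might look like the following (property names come from Trino's filesystem documentation; the endpoint, region, and credentials are placeholders you must replace, and you should verify the properties against your Trino version):
# sketch: enable Trino's native S3 filesystem for the Paimon catalog
fs.native-s3.enabled=true
s3.endpoint=https://s3.us-east-1.amazonaws.com
s3.region=us-east-1
s3.aws-access-key=<access-key>
s3.aws-secret-key=<secret-key>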
Preparing Paimon Jar File #
You can manually build a bundled jar from the source code. However, there are a few preliminary steps to take before compiling:
- To build from the source code, clone the git repository.
- Install JDK 21 locally and configure JDK 21 as a global environment variable.
Then, you can build the bundled jar with the following command:
mvn clean install -DskipTests
You can find the Trino connector jar in ./paimon-trino-<trino-version>/target/paimon-trino-<trino-version>-1.0-SNAPSHOT-plugin.tar.gz.
We use hadoop-apache as a dependency for Hadoop, and the default Hadoop dependency typically supports both Hadoop 2 and Hadoop 3. If you encounter an unsupported scenario, you can specify the corresponding Apache Hadoop version.
For example, if you want to use Hadoop 3.3.5-1, you can use the following command to build the jar:
mvn clean install -DskipTests -Dhadoop.apache.version=3.3.5-1
Configure Paimon Catalog #
Install Paimon Connector #
tar -zxf paimon-trino-<trino-version>-1.0-SNAPSHOT-plugin.tar.gz -C ${TRINO_HOME}/plugin
NOTE: For JDK 21, when deploying Trino, you should add the following JVM options:
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED
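For example, assuming a standard Trino installation layout, these options go into ${TRINO_HOME}/etc/jvm.config, one option per line, alongside the JVM flags already there:
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED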
Configure #
Catalogs are registered by creating a catalog properties file in the etc/catalog directory. For example, create etc/catalog/paimon.properties with the following contents to mount the paimon connector as the paimon catalog:
connector.name=paimon
warehouse=file:/tmp/warehouse
If you are using HDFS, choose one of the following ways to configure your HDFS:
- set environment variable HADOOP_HOME.
- set environment variable HADOOP_CONF_DIR.
- configure hadoop-conf-dir in the properties.
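For example, a minimal sketch of paimon.properties for an HDFS warehouse using the third option (the namenode address and the Hadoop config path are placeholder assumptions; point them at your own cluster):
connector.name=paimon
warehouse=hdfs://namenode:8020/path/to/warehouse
hadoop-conf-dir=/etc/hadoop/conf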
If you are using a Hadoop filesystem, you can still use trino-hdfs and trino-hive to configure it. For example, if you use OSS as storage, you can write the following in paimon.properties according to the Trino reference:
hive.config.resources=/path/to/core-site.xml
Then, configure core-site.xml according to the Jindo reference.
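As a sketch, a core-site.xml for OSS via JindoSDK typically sets properties along these lines (the property names and the JindoOssFileSystem class are taken from the JindoSDK documentation, the endpoint and credentials are placeholders; verify the exact keys against your Jindo version):
<configuration>
    <!-- sketch: OSS filesystem backed by JindoSDK; values are placeholders -->
    <property>
        <name>fs.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
    </property>
    <property>
        <name>fs.oss.endpoint</name>
        <value>oss-cn-hangzhou.aliyuncs.com</value>
    </property>
    <property>
        <name>fs.oss.accessKeyId</name>
        <value><your-access-key-id></value>
    </property>
    <property>
        <name>fs.oss.accessKeySecret</name>
        <value><your-access-key-secret></value>
    </property>
</configuration>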
Kerberos #
You can configure the kerberos keytab file in the properties when using KERBEROS authentication.
security.kerberos.login.principal=hadoop-user
security.kerberos.login.keytab=/etc/trino/hdfs.keytab
Keytab files must be distributed to every node in the cluster that runs Trino.
Create Schema #
CREATE SCHEMA paimon.test_db;
Create Table #
CREATE TABLE paimon.test_db.orders (
    order_key bigint,
    order_status varchar,
    total_price decimal(18,4),
    order_date date
)
WITH (
    file_format = 'ORC',
    primary_key = ARRAY['order_key','order_date'],
    partitioned_by = ARRAY['order_date'],
    bucket = '2',
    bucket_key = 'order_key',
    changelog_producer = 'input'
);
Add Column #
ALTER TABLE paimon.test_db.orders ADD COLUMN shipping_address varchar;
Query #
SELECT * FROM paimon.test_db.orders;
Query with Time Traveling #
-- read the snapshot from specified timestamp
SELECT * FROM t FOR TIMESTAMP AS OF TIMESTAMP '2023-01-01 00:00:00 Asia/Shanghai';
-- read the snapshot with id 1L (use snapshot id as version)
SELECT * FROM t FOR VERSION AS OF 1;
-- read tag 'my-tag'
SELECT * FROM t FOR VERSION AS OF 'my-tag';
If a tag's name is a number and equals a snapshot id, the VERSION AS OF syntax considers the tag first. For example, if you have a tag named '1' based on snapshot 2, the statement SELECT * FROM paimon.test_db.orders FOR VERSION AS OF '1' actually queries snapshot 2 instead of snapshot 1.
Insert #
INSERT INTO paimon.test_db.orders VALUES (.....);
Supports:
- primary key table with fixed bucket.
- non-primary-key table with bucket -1.
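For example, against the orders table created above, a sketch that names the columns explicitly (the values are illustrative):
INSERT INTO paimon.test_db.orders (order_key, order_status, total_price, order_date)
VALUES (1, 'OPEN', DECIMAL '100.0000', DATE '2023-01-01');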
Trino to Paimon type mapping #
This section lists all supported type conversions between Trino and Paimon. All Trino's data types are available in the package io.trino.spi.type.
Trino Data Type | Paimon Data Type | Atomic Type
---|---|---
RowType | RowType | false
MapType | MapType | false
ArrayType | ArrayType | false
BooleanType | BooleanType | true
TinyintType | TinyIntType | true
SmallintType | SmallIntType | true
IntegerType | IntType | true
BigintType | BigIntType | true
RealType | FloatType | true
DoubleType | DoubleType | true
CharType(length) | CharType(length) | true
VarCharType(VarCharType.MAX_LENGTH) | VarCharType(VarCharType.MAX_LENGTH) | true
VarCharType(length) | VarCharType(length), length is less than VarCharType.MAX_LENGTH | true
DateType | DateType | true
TimestampType | TimestampType | true
DecimalType(precision, scale) | DecimalType(precision, scale) | true
VarBinaryType(length) | VarBinaryType(length) | true
TimestampWithTimeZoneType | LocalZonedTimestampType | true
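As an illustration, a hypothetical table named type_demo shows a few of these mappings in a CREATE TABLE statement (whether every type is accepted in DDL depends on the connector version):
CREATE TABLE paimon.test_db.type_demo (
    c_int integer,                    -- maps to Paimon IntType
    c_varchar varchar(32),            -- maps to Paimon VarCharType(32)
    c_decimal decimal(10,2),          -- maps to Paimon DecimalType(10, 2)
    c_ts timestamp(6) with time zone  -- maps to Paimon LocalZonedTimestampType
);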
Tmp Dir #
Paimon will unzip some jars to the tmp directory for codegen. By default, Trino uses '/tmp' as the temporary directory, but '/tmp' may be periodically deleted.
You can set this JVM option when Trino starts (for example, in etc/jvm.config) to let Paimon use a secure temporary directory:
-Djava.io.tmpdir=/path/to/other/tmpdir