This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.

Trino #

This documentation is a guide for using Paimon in Trino.

Version #

Paimon currently supports Trino 440.

Filesystem #

Starting with version 0.8, Paimon shares the Trino filesystem for all operations, so you must configure the Trino filesystem before using trino-paimon. The official Trino website describes how to configure filesystems for Trino.

Preparing Paimon Jar File #

Download

You can also manually build a bundled jar from the source code. However, there are a few preliminary steps that need to be taken before compiling:

Clone the git repository that contains the source code.
Install JDK 21 locally and expose it through a global environment variable (for example, JAVA_HOME).

Then you can build the bundled jar with the following command:

mvn clean install -DskipTests

You can find the Trino connector package at ./paimon-trino-<trino-version>/target/paimon-trino-<trino-version>-1.1-SNAPSHOT-plugin.tar.gz.

We use hadoop-apache as a dependency for Hadoop, and the default Hadoop dependency typically supports both Hadoop 2 and Hadoop 3. If you encounter an unsupported scenario, you can specify the corresponding Apache Hadoop version.

For example, if you want to use Hadoop 3.3.5-1, you can use the following command to build the jar:

mvn clean install -DskipTests -Dhadoop.apache.version=3.3.5-1

Configure Paimon Catalog #

Install Paimon Connector #

tar -zxf paimon-trino-<trino-version>-1.1-SNAPSHOT-plugin.tar.gz -C ${TRINO_HOME}/plugin

NOTE: When deploying Trino on JDK 21, add the following JVM options: --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED
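
As a minimal sketch, the two JVM options from the note can be appended to Trino's JVM configuration file. This assumes the file is at ${TRINO_HOME}/etc/jvm.config, as in a default Trino layout; the fallback to a scratch directory is only there so the snippet can be tried outside a real installation.

```shell
# Assumption: TRINO_HOME points at the Trino installation.
# A scratch directory is used when it is unset, so nothing real is modified.
TRINO_HOME="${TRINO_HOME:-$(mktemp -d)}"
JVM_CONFIG="${TRINO_HOME}/etc/jvm.config"

mkdir -p "$(dirname "${JVM_CONFIG}")"
touch "${JVM_CONFIG}"

# Append each option only if the exact line is not already present.
for opt in \
    "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED" \
    "--add-opens=java.base/java.nio=ALL-UNNAMED"; do
    # -x matches whole lines, -F treats the option as a literal string,
    # and -- stops grep from parsing the option as one of its own flags.
    grep -qxF -- "${opt}" "${JVM_CONFIG}" || printf '%s\n' "${opt}" >> "${JVM_CONFIG}"
done
```

Running the snippet twice leaves the file unchanged the second time, because each option is checked before it is appended. Trino must be restarted for changes to jvm.config to take effect.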

Configure #

Catalogs are registered by creating a catalog properties file in the etc/catalog directory. For example, create etc/catalog/paimon.properties with the following contents to mount the paimon connector as the paimon catalog:

connector.name=paimon
warehouse=file:/tmp/warehouse

If you are using HDFS, choose one of the following ways to configure it:

Set the environment variable HADOOP_HOME.
Set the environment variable HADOOP_CONF_DIR.
Configure hadoop-conf-dir in the catalog properties file.
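
For the third option, a catalog file might be written as in the sketch below. The hdfs:// warehouse URI and the /etc/hadoop/conf directory are placeholders rather than values from this guide, and TRINO_HOME falls back to a scratch directory so the snippet can be run without touching a real installation.

```shell
# Assumption: TRINO_HOME points at the Trino installation.
# A scratch directory is used when it is unset.
TRINO_HOME="${TRINO_HOME:-$(mktemp -d)}"
CATALOG_FILE="${TRINO_HOME}/etc/catalog/paimon.properties"

mkdir -p "$(dirname "${CATALOG_FILE}")"

# Note: this overwrites any existing paimon.properties.
cat > "${CATALOG_FILE}" <<'EOF'
connector.name=paimon
# Hypothetical HDFS warehouse location; replace with your own.
warehouse=hdfs://namenode:8020/path/to/warehouse
# Hypothetical directory holding core-site.xml and hdfs-site.xml.
hadoop-conf-dir=/etc/hadoop/conf
EOF
```

Setting hadoop-conf-dir in the catalog file keeps the Hadoop configuration scoped to this catalog, whereas HADOOP_HOME and HADOOP_CONF_DIR apply to the whole Trino process environment.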

If you are using a Hadoop filesystem, you can still use trino-hdfs and trino-hive to configure it. For example, if you use OSS as storage, you can add the following to paimon.properties, according to the Trino Reference:

hive.config.resources=/path/to/core-site.xml

Then configure core-site.xml according to the Jindo Reference.

Kerberos #

When using KERBEROS authentication, you can configure the Kerberos principal and keytab file in the catalog properties:

security.kerberos.login.principal=hadoop-user
security.kerberos.login.keytab=/etc/trino/hdfs.keytab

Keytab files must be distributed to every node in the cluster that runs Trino.

Create Schema #

CREATE SCHEMA paimon.test_db;

Create Table #

CREATE TABLE paimon.test_db.orders (
    order_key bigint,
    order_status varchar,
    total_price decimal(18,4),
    order_date date
)
WITH (
    file_format = 'ORC',
    primary_key = ARRAY['order_key','order_date'],
    partitioned_by = ARRAY['order_date'],
    bucket = '2',
    bucket_key = 'order_key',
    changelog_producer = 'input'
);

Add Column #

CREATE TABLE paimon.test_db.orders (
    order_key bigint,
    order_status varchar,
    total_price decimal(18,4),
    order_date date
)
WITH (
    file_format = 'ORC',
    primary_key = ARRAY['order_key','order_date'],
    partitioned_by = ARRAY['order_date'],
    bucket = '2',
    bucket_key = 'order_key',
    changelog_producer = 'input'
);

ALTER TABLE paimon.test_db.orders ADD COLUMN shipping_address varchar;

Query #

SELECT * FROM paimon.test_db.orders;

Query with Time Traveling #

-- read the snapshot from specified timestamp
SELECT * FROM t FOR TIMESTAMP AS OF TIMESTAMP '2023-01-01 00:00:00 Asia/Shanghai';

-- read the snapshot with id 1 (use snapshot id as version)
SELECT * FROM t FOR VERSION AS OF 1;

-- read tag 'my-tag'
SELECT * FROM t FOR VERSION AS OF 'my-tag';

If a tag's name is a number that equals a snapshot id, the VERSION AS OF syntax resolves the tag first. For example, if you have a tag named '1' based on snapshot 2, the statement SELECT * FROM paimon.test_db.orders FOR VERSION AS OF '1' queries snapshot 2 rather than snapshot 1.

Insert #

INSERT INTO paimon.test_db.orders VALUES (.....);

Inserts are supported for:

Primary key tables with a fixed bucket number.
Non-primary-key tables with bucket set to -1.

Trino to Paimon type mapping #

This section lists all supported type conversions between Trino and Paimon. All Trino data types are available in the package io.trino.spi.type.

Trino Data Type	Paimon Data Type	Atomic Type
`RowType`	`RowType`	false
`MapType`	`MapType`	false
`ArrayType`	`ArrayType`	false
`BooleanType`	`BooleanType`	true
`TinyintType`	`TinyIntType`	true
`SmallintType`	`SmallIntType`	true
`IntegerType`	`IntType`	true
`BigintType`	`BigIntType`	true
`RealType`	`FloatType`	true
`DoubleType`	`DoubleType`	true
`CharType(length)`	`CharType(length)`	true
`VarCharType(VarCharType.MAX_LENGTH)`	`VarCharType(VarCharType.MAX_LENGTH)`	true
`VarCharType(length)`	`VarCharType(length), length is less than VarCharType.MAX_LENGTH`	true
`DateType`	`DateType`	true
`TimestampType`	`TimestampType`	true
`DecimalType(precision, scale)`	`DecimalType(precision, scale)`	true
`VarBinaryType(length)`	`VarBinaryType(length)`	true
`TimestampWithTimeZoneType`	`LocalZonedTimestampType`	true

Tmp Dir #

Paimon unzips some jars to the temporary directory for codegen. By default, Trino uses '/tmp' as the temporary directory, but the contents of '/tmp' may be periodically deleted.

You can set the following JVM option when Trino starts:

-Djava.io.tmpdir=/path/to/other/tmpdir

This lets Paimon use a safe temporary directory that is not periodically cleaned.
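
One way to set this up is sketched below: create a dedicated directory and register it in Trino's JVM configuration. The PAIMON_TMP_DIR variable and the var/paimon-tmp location are hypothetical names chosen for illustration, and the snippet assumes the JVM options live in ${TRINO_HOME}/etc/jvm.config.

```shell
# Assumption: TRINO_HOME points at the Trino installation.
# A scratch directory is used when it is unset.
TRINO_HOME="${TRINO_HOME:-$(mktemp -d)}"
# Hypothetical location for the temporary directory; pick any path that
# is not subject to periodic cleanup.
PAIMON_TMP_DIR="${PAIMON_TMP_DIR:-${TRINO_HOME}/var/paimon-tmp}"
JVM_CONFIG="${TRINO_HOME}/etc/jvm.config"

mkdir -p "${PAIMON_TMP_DIR}" "$(dirname "${JVM_CONFIG}")"
# Restrict access to the user that runs Trino.
chmod 700 "${PAIMON_TMP_DIR}"
touch "${JVM_CONFIG}"

# Add the option only if no java.io.tmpdir setting exists yet.
grep -q -- '^-Djava\.io\.tmpdir=' "${JVM_CONFIG}" \
    || printf '%s\n' "-Djava.io.tmpdir=${PAIMON_TMP_DIR}" >> "${JVM_CONFIG}"
```

The directory has to exist on every Trino node and be writable by the Trino process, since each node unpacks the jars locally.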