Trino #

This documentation is a guide for using Paimon in Trino.

Version #

Paimon currently supports Trino 420 and above.

Filesystem #

Starting with version 0.8, Paimon shares the Trino filesystem for all operations, so you must configure the Trino filesystem before using trino-paimon. See the official Trino website for information on how to configure filesystems for Trino.

Preparing Paimon Jar File #

Download from master: https://paimon.apache.org/docs/master/project/download/

You can also build a bundled jar manually from the source code. A few preliminary steps are required before compiling:

Clone the git repository.
Install JDK 17 locally and configure it as a global environment variable.

Then you can build the bundled jar with the following command:

mvn clean install -DskipTests

You can find the Trino connector plugin package at ./paimon-trino-<trino-version>/target/paimon-trino-<trino-version>-0.8.2-plugin.tar.gz.
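To illustrate the path pattern above, the following shell sketch assembles the expected archive location for a given module. The function name paimon_plugin_archive and its arguments are hypothetical helpers, not part of the Paimon build. The default version 0.8.2 is taken from the path shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the expected location of the plugin archive
# produced by the Maven build, relative to the repository root.
#   $1 - Trino module version (for example 420 or 427)
#   $2 - Paimon version (defaults to 0.8.2, the version used in this guide)
paimon_plugin_archive() {
    trino_version="$1"
    paimon_version="${2:-0.8.2}"
    printf '%s\n' "./paimon-trino-${trino_version}/target/paimon-trino-${trino_version}-${paimon_version}-plugin.tar.gz"
}

# Example: the archive expected for the Trino 427 module.
paimon_plugin_archive 427
```

The printed path can be passed to ls or tar to confirm that the build produced the archive.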

We use hadoop-apache as a dependency for Hadoop, and the default Hadoop dependency typically supports both Hadoop 2 and Hadoop 3. If you encounter an unsupported scenario, you can specify the corresponding Apache Hadoop version.

For example, if you want to use Hadoop 3.3.5-1, you can use the following command to build the jar:

mvn clean install -DskipTests -Dhadoop.apache.version=3.3.5-1

Tmp Dir #

Paimon unzips some jars into the temporary directory for code generation. By default, Trino uses '/tmp' as the temporary directory, but the contents of '/tmp' may be deleted periodically.

You can set this JVM system property when Trino starts:

-Djava.io.tmpdir=/path/to/other/tmpdir

This lets Paimon use a secure temporary directory.
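As a minimal sketch of how such a directory might be prepared, the script below creates a dedicated directory, restricts its permissions, and prints the matching JVM option. The variable PAIMON_TMP and its default scratch location are assumptions for illustration. In a real deployment, point it at a persistent path owned by the user that runs Trino.

```shell
#!/bin/sh
# Assumed location for the dedicated directory; the default is a scratch
# path so the sketch can be tried without touching a real installation.
PAIMON_TMP="${PAIMON_TMP:-$(mktemp -d)/paimon-tmp}"

# Create the directory and restrict access to the current user,
# which should be the user that runs Trino.
mkdir -p "$PAIMON_TMP"
chmod 700 "$PAIMON_TMP"

# Print the JVM option; it can be added to Trino's etc/jvm.config,
# which holds one JVM option per line.
JVM_OPTION="-Djava.io.tmpdir=$PAIMON_TMP"
printf '%s\n' "$JVM_OPTION"
```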

Configure Paimon Catalog #

Install Paimon Connector #

tar -zxf paimon-trino-<trino-version>-0.8.2-plugin.tar.gz -C ${TRINO_HOME}/plugin

The variable trino-version is the module name and must be one of 420 or 427.

NOTE: For JDK 17, add the following JVM options when deploying Trino: --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED
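One possible way to apply the two options from the note is to append them to Trino's etc/jvm.config, which lists one JVM option per line. The sketch below does this idempotently. The variable JVM_CONFIG is an assumption and defaults to a scratch file, so set it to your real etc/jvm.config path before relying on it.

```shell
#!/bin/sh
# Assumed location of jvm.config; defaults to a scratch file for safe testing.
JVM_CONFIG="${JVM_CONFIG:-$(mktemp -d)/jvm.config}"
touch "$JVM_CONFIG"

# Append each option only if the exact line is not already present,
# so running the script twice does not duplicate entries.
for opt in \
    "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED" \
    "--add-opens=java.base/java.nio=ALL-UNNAMED"
do
    grep -qxF -e "$opt" "$JVM_CONFIG" || printf '%s\n' "$opt" >> "$JVM_CONFIG"
done
```

Trino must be restarted for changes to jvm.config to take effect.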

Configure #

Catalogs are registered by creating a catalog properties file in the etc/catalog directory. For example, create etc/catalog/paimon.properties with the following contents to mount the paimon connector as the paimon catalog:

connector.name=paimon
warehouse=file:/tmp/warehouse
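The same file can be created from a script. This is a minimal sketch that writes exactly the two properties from the example above. The variable TRINO_ETC is an assumption and defaults to a scratch directory, so the sketch does not overwrite a real configuration.

```shell
#!/bin/sh
# Assumed location of Trino's etc directory; defaults to a scratch directory.
TRINO_ETC="${TRINO_ETC:-$(mktemp -d)/etc}"
CATALOG_FILE="$TRINO_ETC/catalog/paimon.properties"

mkdir -p "$TRINO_ETC/catalog"

# The file name (paimon.properties) determines the catalog name (paimon).
cat > "$CATALOG_FILE" <<'EOF'
connector.name=paimon
warehouse=file:/tmp/warehouse
EOF

echo "Wrote $CATALOG_FILE"
```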

If you are using HDFS, choose one of the following ways to configure your HDFS:

Set the environment variable HADOOP_HOME.
Set the environment variable HADOOP_CONF_DIR.
Configure hadoop-conf-dir in the properties.
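The choice between these options could be scripted roughly as follows. This is only a sketch under assumptions: the function configure_hdfs is a hypothetical helper, the path /etc/hadoop/conf is a placeholder, and the rule "use the property only when neither environment variable is set" is a convention chosen for this example, not documented connector behavior.

```shell
#!/bin/sh
# Hypothetical helper: decide how the Hadoop configuration is supplied.
#   $1 - path to the catalog properties file
#   $2 - Hadoop configuration directory to record if no variable is set
# Prints "environment" or "property" to report which option is in use.
configure_hdfs() {
    catalog_file="$1"
    conf_dir="$2"
    if [ -n "${HADOOP_HOME:-}" ] || [ -n "${HADOOP_CONF_DIR:-}" ]; then
        # One of the environment variables is set; leave the file untouched.
        echo "environment"
    else
        # Neither variable is set; add hadoop-conf-dir unless already present.
        grep -q '^hadoop-conf-dir=' "$catalog_file" 2>/dev/null ||
            printf '%s\n' "hadoop-conf-dir=$conf_dir" >> "$catalog_file"
        echo "property"
    fi
}

# Example against a scratch file; the directory is a placeholder path.
configure_hdfs "$(mktemp)" /etc/hadoop/conf
```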

If you are using a Hadoop filesystem, you can still use trino-hdfs and trino-hive to configure it. For example, if you use OSS as storage, you can add the following to paimon.properties according to the Trino Reference:

hive.config.resources=/path/to/core-site.xml

Then configure core-site.xml according to the Jindo Reference.

Kerberos #

When using KERBEROS authentication, you can configure the Kerberos principal and keytab file in the properties.

security.kerberos.login.principal=hadoop-user
security.kerberos.login.keytab=/etc/trino/hdfs.keytab

Keytab files must be distributed to every node in the cluster that runs Trino.

Create Schema #

CREATE SCHEMA paimon.test_db;

Create Table #

CREATE TABLE paimon.test_db.orders (
    order_key bigint,
    order_status varchar,
    total_price decimal(18,4),
    order_date date
)
WITH (
    file_format = 'ORC',
    primary_key = ARRAY['order_key','order_date'],
    partitioned_by = ARRAY['order_date'],
    bucket = '2',
    bucket_key = 'order_key',
    changelog_producer = 'input'
);

Add Column #

CREATE TABLE paimon.test_db.orders (
    order_key bigint,
    order_status varchar,
    total_price decimal(18,4),
    order_date date
)
WITH (
    file_format = 'ORC',
    primary_key = ARRAY['order_key','order_date'],
    partitioned_by = ARRAY['order_date'],
    bucket = '2',
    bucket_key = 'order_key',
    changelog_producer = 'input'
);

ALTER TABLE paimon.test_db.orders ADD COLUMN shipping_address varchar;

Query #

SELECT * FROM paimon.test_db.orders;

Query with Time Traveling #

Supported in Trino version 420 and above.

-- read the snapshot from specified timestamp
SELECT * FROM t FOR TIMESTAMP AS OF TIMESTAMP '2023-01-01 00:00:00 Asia/Shanghai';

-- read the snapshot with id 1L (use snapshot id as version)
SELECT * FROM t FOR VERSION AS OF 1;

Trino to Paimon type mapping #

This section lists all supported type conversions between Trino and Paimon. All of Trino's data types are available in the package io.trino.spi.type.

Trino Data Type	Paimon Data Type	Atomic Type
`RowType`	`RowType`	false
`MapType`	`MapType`	false
`ArrayType`	`ArrayType`	false
`BooleanType`	`BooleanType`	true
`TinyintType`	`TinyIntType`	true
`SmallintType`	`SmallIntType`	true
`IntegerType`	`IntType`	true
`BigintType`	`BigIntType`	true
`RealType`	`FloatType`	true
`DoubleType`	`DoubleType`	true
`CharType(length)`	`CharType(length)`	true
`VarCharType(VarCharType.MAX_LENGTH)`	`VarCharType(VarCharType.MAX_LENGTH)`	true
`VarCharType(length)`	`VarCharType(length), length is less than VarCharType.MAX_LENGTH`	true
`DateType`	`DateType`	true
`TimestampType`	`TimestampType`	true
`DecimalType(precision, scale)`	`DecimalType(precision, scale)`	true
`VarBinaryType(length)`	`VarBinaryType(length)`	true
`TimestampWithTimeZoneType`	`LocalZonedTimestampType`	true