This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.

Trino #

This documentation is a guide for using Paimon in Trino.

Version #

Paimon currently supports Trino 358 and above.

Preparing Paimon Jar File #

| Version      | Package                                      |
|--------------|----------------------------------------------|
| [358, 368)   | paimon-trino-358-0.8-SNAPSHOT-plugin.tar.gz  |
| [368, 369)   | paimon-trino-368-0.8-SNAPSHOT-plugin.tar.gz  |
| [369, 370)   | paimon-trino-369-0.8-SNAPSHOT-plugin.tar.gz  |
| [370, 388)   | paimon-trino-370-0.8-SNAPSHOT-plugin.tar.gz  |
| [388, 393)   | paimon-trino-388-0.8-SNAPSHOT-plugin.tar.gz  |
| [393, 422)   | paimon-trino-393-0.8-SNAPSHOT-plugin.tar.gz  |
| [422, latest] | paimon-trino-422-0.8-SNAPSHOT-plugin.tar.gz |

You can also manually build a bundled jar from the source code. However, there are a few preliminary steps that need to be taken before compiling:

  • To build from the source code, clone the git repository.
  • Install JDK11 and JDK17 locally, and configure JDK11 as a global environment variable.
  • Configure the toolchains.xml file in ${{ MAVEN_HOME }} with the following content.
 <toolchains>
    <toolchain>
        <type>jdk</type>
        <provides>
            <version>17</version>
            <vendor>adopt</vendor>
        </provides>
        <configuration>
            <jdkHome>${{ JAVA_HOME }}</jdkHome>
        </configuration>
    </toolchain>
 </toolchains>          

Then, you can build the bundled jar with the following command:

mvn clean install -DskipTests

You can find the Trino connector jar in ./paimon-trino-<trino-version>/target/paimon-trino-<trino-version>-0.8-SNAPSHOT-plugin.tar.gz.

We use hadoop-apache as a dependency for Hadoop, and the default Hadoop dependency typically supports both Hadoop 2 and Hadoop 3. If you encounter an unsupported scenario, you can specify the corresponding Apache Hadoop version.

For example, if you want to use Hadoop 3.3.5-1, you can use the following command to build the jar:

mvn clean install -DskipTests -Dhadoop.apache.version=3.3.5-1

Tmp Dir #

Paimon will unzip some jars to the tmp directory for codegen. By default, Trino will use '/tmp' as the temporary directory, but '/tmp' may be periodically deleted.

You can configure this environment variable when Trino starts:

-Djava.io.tmpdir=/path/to/other/tmpdir

This lets Paimon use a stable temporary directory that is not cleaned up unexpectedly.
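For example, in a standard Trino installation, JVM options are listed one per line in etc/jvm.config, so the flag can be appended there (the path /data/paimon-tmp below is only an example):

```
-Djava.io.tmpdir=/data/paimon-tmp
```

Make sure the chosen directory exists and is writable by the user running the Trino process.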

Configure Paimon Catalog #

Install Paimon Connector #

tar -zxf paimon-trino-<trino-version>-0.8-SNAPSHOT-plugin.tar.gz -C ${TRINO_HOME}/plugin

The variable trino-version is the module name and must be one of 358, 368, 369, 370, 388, 393, or 422.

NOTE: For JDK 17, when deploying Trino, you should add the JVM options: --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED

Configure #

Catalogs are registered by creating a catalog properties file in the etc/catalog directory. For example, create etc/catalog/paimon.properties with the following contents to mount the paimon connector as the paimon catalog:

connector.name=paimon
warehouse=file:/tmp/warehouse

If you are using HDFS, choose one of the following ways to configure your HDFS:

  • set environment variable HADOOP_HOME.
  • set environment variable HADOOP_CONF_DIR.
  • configure hadoop-conf-dir in the properties.
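For example, taking the third option, a catalog properties file pointing at an HDFS warehouse might look like the following (the warehouse URI and the configuration directory path are example values):

```
connector.name=paimon
warehouse=hdfs://namenode:8020/path/to/warehouse
hadoop-conf-dir=/etc/hadoop/conf
```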

Kerberos #

You can configure the Kerberos keytab file in the properties when using KERBEROS authentication.

security.kerberos.login.principal=hadoop-user
security.kerberos.login.keytab=/etc/trino/hdfs.keytab

Keytab files must be distributed to every node in the cluster that runs Trino.

Create Schema #

CREATE SCHEMA paimon.test_db;

Create Table #

CREATE TABLE paimon.test_db.orders (
    order_key bigint,
    order_status varchar,
    total_price decimal(18,4),
    order_date date
)
WITH (
    file_format = 'ORC',
    primary_key = ARRAY['order_key','order_date'],
    partitioned_by = ARRAY['order_date'],
    bucket = '2',
    bucket_key = 'order_key',
    changelog_producer = 'input'
)

Add Column #

CREATE TABLE paimon.test_db.orders (
    order_key bigint,
    order_status varchar,
    total_price decimal(18,4),
    order_date date
)
WITH (
    file_format = 'ORC',
    primary_key = ARRAY['order_key','order_date'],
    partitioned_by = ARRAY['order_date'],
    bucket = '2',
    bucket_key = 'order_key',
    changelog_producer = 'input'
)

ALTER TABLE paimon.test_db.orders ADD COLUMN shipping_address varchar;

Query #

SELECT * FROM paimon.test_db.orders;

Query with Time Traveling #

-- read the snapshot from specified timestamp
SELECT * FROM t FOR TIMESTAMP AS OF TIMESTAMP '2023-01-01 00:00:00 Asia/Shanghai';

-- read the snapshot with id 1L (use snapshot id as version)
SELECT * FROM t FOR VERSION AS OF 1;
-- read the snapshot from specified timestamp with a long value in unix milliseconds
SET SESSION paimon.scan_timestamp_millis=1679486589444;
SELECT * FROM t;

Trino to Paimon type mapping #

This section lists all supported type conversion between Trino and Paimon. All Trino’s data types are available in package io.trino.spi.type.

| Trino Data Type | Paimon Data Type | Atomic Type |
|---|---|---|
| RowType | RowType | false |
| MapType | MapType | false |
| ArrayType | ArrayType | false |
| BooleanType | BooleanType | true |
| TinyintType | TinyIntType | true |
| SmallintType | SmallIntType | true |
| IntegerType | IntType | true |
| BigintType | BigIntType | true |
| RealType | FloatType | true |
| DoubleType | DoubleType | true |
| CharType(length) | CharType(length) | true |
| VarCharType(VarCharType.MAX_LENGTH) | VarCharType(VarCharType.MAX_LENGTH) | true |
| VarCharType(length) | VarCharType(length), length is less than VarCharType.MAX_LENGTH | true |
| DateType | DateType | true |
| TimestampType | TimestampType | true |
| DecimalType(precision, scale) | DecimalType(precision, scale) | true |
| VarBinaryType(length) | VarBinaryType(length) | true |
| TimestampWithTimeZoneType | LocalZonedTimestampType | true |
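As an illustration, a hypothetical table that exercises several of these mappings could be declared with standard Trino types (the table and column names are examples, not part of the Paimon API):

```sql
CREATE TABLE paimon.test_db.type_demo (
    c_int integer,                     -- IntegerType -> IntType
    c_bigint bigint,                   -- BigintType -> BigIntType
    c_real real,                       -- RealType -> FloatType
    c_varchar varchar(32),             -- VarCharType(32) -> VarCharType(32)
    c_date date,                       -- DateType -> DateType
    c_ts timestamp,                    -- TimestampType -> TimestampType
    c_tstz timestamp with time zone,   -- TimestampWithTimeZoneType -> LocalZonedTimestampType
    c_arr array(integer)               -- ArrayType -> ArrayType
);
```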