Presto #

This documentation is a guide for using Paimon in Presto.

Version #

Paimon currently supports Presto 0.236 and above.

Preparing Paimon Jar File #

Download from master: https://paimon.apache.org/docs/master/project/download/

You can also manually build a bundled jar from the source code.

To build from the source code, clone the git repository.
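
For example, assuming the connector sources live in the apache/paimon-presto repository:

git clone https://github.com/apache/paimon-presto.git
cd paimon-presto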

Build the Presto connector plugin with the following command:

mvn clean install -DskipTests

After the packaging is complete, you can choose the corresponding connector based on your own Presto version:

| Version         | Package                                                                          |
|-----------------|----------------------------------------------------------------------------------|
| [0.236, 0.268)  | ./paimon-presto-0.236/target/paimon-presto-0.236-0.7.0-incubating-plugin.tar.gz  |
| [0.268, 0.273)  | ./paimon-presto-0.268/target/paimon-presto-0.268-0.7.0-incubating-plugin.tar.gz  |
| [0.273, latest] | ./paimon-presto-0.273/target/paimon-presto-0.273-0.7.0-incubating-plugin.tar.gz  |

Of course, we also support different versions of Hive and Hadoop. But note that we utilize Presto-shaded versions of Hive and Hadoop packages to address dependency conflicts. You can check the following two links to select the appropriate versions of Hive and Hadoop:

hadoop-apache2

hive-apache

Both Hive 2 and 3, as well as Hadoop 2 and 3, are supported.

For example, if your Presto version is 0.274 and your Hive and Hadoop versions are 2.x, you could run:

mvn clean install -DskipTests -am -pl paimon-presto-0.273 -Dpresto.version=0.274 -Dhadoop.apache2.version=2.7.4-9 -Dhive.apache.version=1.2.2-2

Tmp Dir #

Paimon will unzip some jars to the tmp directory for codegen. By default, Presto will use '/tmp' as the temporary directory, but '/tmp' may be periodically deleted.

You can configure this JVM option when Presto starts:

-Djava.io.tmpdir=/path/to/other/tmpdir

This lets Paimon use a temporary directory that will not be deleted unexpectedly.
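
For example, you could append the option to ${PRESTO_HOME}/etc/jvm.config and restart Presto (the directory below is just an illustration):

-Djava.io.tmpdir=/data/presto/tmp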

Configure Paimon Catalog #

Install Paimon Connector #

tar -zxf paimon-presto-${PRESTO_VERSION}/target/paimon-presto-${PRESTO_VERSION}-${PAIMON_VERSION}-plugin.tar.gz -C ${PRESTO_HOME}/plugin

Note that the PRESTO_VERSION variable is the module name and must be one of 0.236, 0.268, or 0.273.
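
For example, for the Presto 0.273 module and Paimon 0.7.0-incubating (the versions shown in the table above), the command would be:

tar -zxf paimon-presto-0.273/target/paimon-presto-0.273-0.7.0-incubating-plugin.tar.gz -C ${PRESTO_HOME}/plugin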

Configuration #

cd ${PRESTO_HOME}
mkdir -p etc/catalog

Then create the catalog file etc/catalog/paimon.properties and set the following config:

connector.name=paimon
# set your filesystem path, such as hdfs://namenode01:8020/path and s3://${YOUR_S3_BUCKET}/path
warehouse=${YOUR_FS_PATH}

If you are using the HDFS filesystem, you also need to configure your HDFS in one of the following ways:

  • set the environment variable HADOOP_HOME.
  • set the environment variable HADOOP_CONF_DIR.
  • configure hadoop-conf-dir in the catalog properties (see the example after this list).
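
For example, a minimal paimon.properties using the third option could look like this (the Hadoop configuration directory below is illustrative):

connector.name=paimon
warehouse=hdfs://namenode01:8020/path
hadoop-conf-dir=/etc/hadoop/conf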

If you are using the S3 filesystem, you also need to add paimon-s3-${PAIMON_VERSION}.jar to ${PRESTO_HOME}/plugin/paimon and configure the following properties in paimon.properties:

s3.endpoint=${YOUR_ENDPOINTS}
s3.access-key=${YOUR_AK}
s3.secret-key=${YOUR_SK}
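
Putting it together, a paimon.properties for an S3 warehouse might look like the following sketch (bucket, endpoint, and credentials are placeholders):

connector.name=paimon
warehouse=s3://${YOUR_S3_BUCKET}/path
s3.endpoint=${YOUR_ENDPOINTS}
s3.access-key=${YOUR_AK}
s3.secret-key=${YOUR_SK}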

To query a HiveCatalog table, edit the catalog properties file:

vim etc/catalog/paimon.properties

and set the following config:

connector.name=paimon
# set your filesystem path, such as hdfs://namenode01:8020/path and s3://${YOUR_S3_BUCKET}/path
warehouse=${YOUR_FS_PATH}
metastore=hive
uri=thrift://${YOUR_HIVE_METASTORE}:9083

Kerberos #

When using Kerberos authentication, you can configure the principal and keytab file in the catalog properties:

security.kerberos.login.principal=hadoop-user
security.kerberos.login.keytab=/etc/presto/hdfs.keytab

Keytab files must be distributed to every node in the cluster that runs Presto.

Create Schema #

CREATE SCHEMA paimon.test_db;

Create Table #

CREATE TABLE paimon.test_db.orders (
    order_key bigint,
    order_status varchar,
    total_price decimal(18,4),
    order_date date
)
WITH (
    file_format = 'ORC',
    primary_key = ARRAY['order_key','order_date'],
    partitioned_by = ARRAY['order_date'],
    bucket = '2',
    bucket_key = 'order_key',
    changelog_producer = 'input'
)
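
After the table is created, you can inspect its schema with a standard Presto statement, for example:

DESCRIBE paimon.test_db.orders;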

Add Column #

CREATE TABLE paimon.test_db.orders (
    order_key bigint,
    order_status varchar,
    total_price decimal(18,4),
    order_date date
)
WITH (
    file_format = 'ORC',
    primary_key = ARRAY['order_key','order_date'],
    partitioned_by = ARRAY['order_date'],
    bucket = '2',
    bucket_key = 'order_key',
    changelog_producer = 'input'
)

ALTER TABLE paimon.test_db.orders ADD COLUMN shipping_address varchar;

Query #

SELECT * FROM paimon.default.MyTable
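
For example, to query the orders table created earlier with a partition filter (the date value is illustrative):

SELECT order_key, total_price
FROM paimon.test_db.orders
WHERE order_date = DATE '2023-01-01';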

Presto to Paimon type mapping #

This section lists all supported type conversions between Presto and Paimon. All Presto data types are available in the package com.facebook.presto.common.type.

| Presto Data Type                    | Paimon Data Type                                                | Atomic Type |
|-------------------------------------|-----------------------------------------------------------------|-------------|
| RowType                             | RowType                                                         | false       |
| MapType                             | MapType                                                         | false       |
| ArrayType                           | ArrayType                                                       | false       |
| BooleanType                         | BooleanType                                                     | true        |
| TinyintType                         | TinyIntType                                                     | true        |
| SmallintType                        | SmallIntType                                                    | true        |
| IntegerType                         | IntType                                                         | true        |
| BigintType                          | BigIntType                                                      | true        |
| RealType                            | FloatType                                                       | true        |
| DoubleType                          | DoubleType                                                      | true        |
| CharType(length)                    | CharType(length)                                                | true        |
| VarCharType(VarCharType.MAX_LENGTH) | VarCharType(VarCharType.MAX_LENGTH)                             | true        |
| VarCharType(length)                 | VarCharType(length), length is less than VarCharType.MAX_LENGTH | true        |
| DateType                            | DateType                                                        | true        |
| TimestampType                       | TimestampType                                                   | true        |
| DecimalType(precision, scale)       | DecimalType(precision, scale)                                   | true        |
| VarBinaryType(length)               | VarBinaryType(length)                                           | true        |
| TimestampWithTimeZoneType           | LocalZonedTimestampType                                         | true        |