S3 #

Download paimon-s3-0.9.0.jar.
If you have already configured S3 access through Flink (via Flink FileSystem), you can skip the following configuration.

Put paimon-s3-0.9.0.jar into the lib directory of your Flink home, and create a catalog:

CREATE CATALOG my_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 's3://<bucket>/<path>',
    's3.endpoint' = 'your-endpoint-hostname',
    's3.access-key' = 'xxx',
    's3.secret-key' = 'yyy'
);

If you have already configured S3 access through Spark (via Hadoop FileSystem), you can skip the following configuration.

Place paimon-s3-0.9.0.jar together with paimon-spark-0.9.0.jar under Spark’s jars directory, and start spark-sql like this:

spark-sql \
  --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
  --conf spark.sql.catalog.paimon.warehouse=s3://<bucket>/<path> \
  --conf spark.sql.catalog.paimon.s3.endpoint=your-endpoint-hostname \
  --conf spark.sql.catalog.paimon.s3.access-key=xxx \
  --conf spark.sql.catalog.paimon.s3.secret-key=yyy

If you have already configured S3 access through Hive (via Hadoop FileSystem), you can skip the following configuration.

NOTE: Make sure that the Hive metastore can access S3.

Place paimon-s3-0.9.0.jar together with paimon-hive-connector-0.9.0.jar under Hive’s auxlib directory, then set the S3 options in your Hive session:

SET paimon.s3.endpoint=your-endpoint-hostname;
SET paimon.s3.access-key=xxx;
SET paimon.s3.secret-key=yyy;

Then read the table from the Hive metastore. The table can be created by Flink or Spark; see Catalog with Hive Metastore.

SELECT * FROM test_table;
SELECT COUNT(1) FROM test_table;
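
The Flink, Spark, and Hive configurations above set the same three S3 options and differ only in how each key is prefixed. As a rough illustration, the Python sketch below renders one options dictionary in each engine's syntax. The helper names are hypothetical and not part of Paimon; the prefixes are taken only from the examples on this page, and the values are the same placeholders.

```python
# Hypothetical helpers (not part of Paimon): they only mirror the key
# prefixes shown in the Flink, Spark, and Hive examples on this page.
S3_OPTIONS = {
    "s3.endpoint": "your-endpoint-hostname",
    "s3.access-key": "xxx",
    "s3.secret-key": "yyy",
}


def flink_catalog_options(options):
    """Flink: keys are used as-is inside CREATE CATALOG ... WITH (...)."""
    return [f"'{key}' = '{value}'" for key, value in options.items()]


def spark_conf_flags(options, catalog="paimon"):
    """Spark: each key is prefixed with spark.sql.catalog.<catalog>."""
    return [
        f"--conf spark.sql.catalog.{catalog}.{key}={value}"
        for key, value in options.items()
    ]


def hive_set_statements(options):
    """Hive: each key is prefixed with paimon. in a SET statement."""
    return [f"SET paimon.{key}={value};" for key, value in options.items()]


if __name__ == "__main__":
    lines = (
        flink_catalog_options(S3_OPTIONS)
        + spark_conf_flags(S3_OPTIONS)
        + hive_set_statements(S3_OPTIONS)
    )
    for line in lines:
        print(line)
```

Keeping the options in one place like this can make it easier to check that every engine points at the same endpoint and credentials.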

Paimon uses the shared Trino filesystem as its basic read and write system.

Please refer to Trino S3 to configure the S3 filesystem in Trino.

S3 Compliant Object Stores #

The S3 filesystem also supports S3-compliant object stores such as MinIO, Tencent’s COS, and IBM’s Cloud Object Storage. Just set your endpoint to that of the object store service provider.

s3.endpoint: your-endpoint-hostname

Configure Path Style Access #

Some S3-compliant object stores might not have virtual-host-style addressing enabled by default, for example a standalone MinIO used for testing. In such cases, you must set the following property to enable path-style access.

s3.path.style.access: true
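
To make the difference between the two addressing styles concrete, here is a minimal Python sketch of how the request URL for an object is formed in each case. Virtual-host style puts the bucket in the hostname, which requires the store to resolve per-bucket hostnames; path style keeps the hostname fixed and puts the bucket first in the path. The function name, endpoint, bucket, and key are illustrative assumptions, not anything defined by Paimon.

```python
from urllib.parse import quote


def object_url(endpoint, bucket, key, path_style, scheme="https"):
    """Build the URL for an object under either S3 addressing style.

    Illustrative only: endpoint, bucket, and key are made-up values.
    """
    encoded_key = quote(key)  # percent-encode the key, keeping "/" separators
    if path_style:
        # Path style: bucket is the first path segment.
        return f"{scheme}://{endpoint}/{bucket}/{encoded_key}"
    # Virtual-host style: bucket becomes part of the hostname.
    return f"{scheme}://{bucket}.{endpoint}/{encoded_key}"


if __name__ == "__main__":
    print(object_url("minio.example.com:9000", "warehouse", "db/t/data.orc", True))
    print(object_url("minio.example.com:9000", "warehouse", "db/t/data.orc", False))
```

A standalone test server usually answers on a single hostname only, which is why the path-style form is the one that works there.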

S3A Performance #

Tune Performance for S3AFileSystem.

If you encounter the following exception:

Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool.

Try raising the maximum number of S3A connections in the catalog options: fs.s3a.connection.maximum=1000.
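
As a hedged sketch of where this option sits, the Python snippet below renders a Flink CREATE CATALOG statement with the connection limit added alongside the other catalog options. The render_create_catalog helper is hypothetical and not part of Paimon or Flink; the option keys and placeholder values are the ones used earlier on this page, and 1000 is simply the value suggested above, not a tuned recommendation.

```python
def render_create_catalog(name, options):
    """Render a Flink SQL CREATE CATALOG statement from an options dict.

    Hypothetical helper for illustration; it does no escaping of values.
    """
    body = ",\n".join(f"    '{key}' = '{value}'" for key, value in options.items())
    return f"CREATE CATALOG {name} WITH (\n{body}\n);"


ddl = render_create_catalog(
    "my_catalog",
    {
        "type": "paimon",
        "warehouse": "s3://<bucket>/<path>",
        "s3.endpoint": "your-endpoint-hostname",
        "s3.access-key": "xxx",
        "s3.secret-key": "yyy",
        # Larger S3A connection pool, as suggested for the timeout above.
        "fs.s3a.connection.maximum": "1000",
    },
)

if __name__ == "__main__":
    print(ddl)
```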

Copyright © 2024 The Apache Software Foundation. Apache Paimon, Paimon, and its feather logo are trademarks of The Apache Software Foundation.