
S3 #

Download paimon-s3-0.4.0-incubating.jar.
Flink #

If you have already configured S3 access through Flink (via the Flink FileSystem), you can skip the following configuration.

Put paimon-s3-0.4.0-incubating.jar into the lib directory of your Flink home, then create a catalog:

CREATE CATALOG my_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 's3://path/to/warehouse',
    's3.endpoint' = 'your-endpoint-hostname',
    's3.access-key' = 'xxx',
    's3.secret-key' = 'yyy'
);
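
Once the catalog is created, you can switch to it and create tables as usual. A minimal sketch in Flink SQL; the table name, columns, and sample row are illustrative:

-- switch to the Paimon catalog created above
USE CATALOG my_catalog;

-- my_table and its schema are only an example
CREATE TABLE my_table (
    id BIGINT,
    name STRING,
    PRIMARY KEY (id) NOT ENFORCED
);

INSERT INTO my_table VALUES (1, 'paimon');
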
Spark #

If you have already configured S3 access through Spark (via the Hadoop FileSystem), you can skip the following configuration.

Place paimon-s3-0.4.0-incubating.jar together with paimon-spark-0.4.0-incubating.jar under Spark’s jars directory, then start spark-sql as follows:

spark-sql \
  --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
  --conf spark.sql.catalog.paimon.warehouse=s3://<bucket>/<path> \
  --conf spark.sql.catalog.paimon.s3.endpoint=your-endpoint-hostname \
  --conf spark.sql.catalog.paimon.s3.access-key=xxx \
  --conf spark.sql.catalog.paimon.s3.secret-key=yyy
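
Once the shell is up, you can create and query Paimon tables through the configured catalog. A minimal sketch; the fully qualified table name paimon.default.my_table is illustrative:

-- create an example table in the paimon catalog's default database
CREATE TABLE paimon.default.my_table (
    id BIGINT,
    name STRING
);

INSERT INTO paimon.default.my_table VALUES (1, 'paimon');
SELECT * FROM paimon.default.my_table;
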
Hive #

If you have already configured S3 access through Hive (via the Hadoop FileSystem), you can skip the following configuration.

NOTE: You need to ensure that the Hive metastore can access S3.

Place paimon-s3-0.4.0-incubating.jar together with paimon-hive-connector-0.4.0-incubating.jar under Hive’s auxlib directory, then set the S3 options in your Hive session:

SET paimon.s3.endpoint=your-endpoint-hostname;
SET paimon.s3.access-key=xxx;
SET paimon.s3.secret-key=yyy;

Then you can read tables from the Hive metastore; the tables can be created by Flink or Spark. See Catalog with Hive Metastore.

SELECT * FROM test_table;
SELECT COUNT(1) FROM test_table;

Trino #

Place paimon-s3-0.4.0-incubating.jar together with paimon-trino-0.4.0-incubating.jar under the plugin/paimon directory.

Add the following options to etc/catalog/paimon.properties:

s3.endpoint=your-endpoint-hostname
s3.access-key=xxx
s3.secret-key=yyy
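
After restarting Trino, Paimon tables can be queried through the new catalog. A minimal sketch, assuming the catalog is named paimon (matching the properties file name) and that a test_table exists in the default schema:

SELECT * FROM paimon.default.test_table;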

S3 Compliant Object Stores #

The S3 filesystem also supports S3-compliant object stores such as MinIO, Tencent’s COS, and IBM’s Cloud Object Storage. Just configure your endpoint to that of the object store provider.

s3.endpoint: your-endpoint-hostname

Configure Path Style Access #

Some S3-compliant object stores might not have virtual-host-style addressing enabled by default, for example when using a standalone MinIO for testing purposes. In such cases, you will have to set the following property to enable path-style access.

s3.path.style.access: true
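
Putting the endpoint and path-style options together, here is a minimal Flink SQL sketch for a standalone MinIO setup. The endpoint hostname and port are placeholders, and it assumes s3.path.style.access can be passed as a catalog option like the other s3.* keys:

CREATE CATALOG minio_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 's3://path/to/warehouse',
    's3.endpoint' = 'http://minio-host:9000',
    's3.access-key' = 'xxx',
    's3.secret-key' = 'yyy',
    's3.path.style.access' = 'true'
);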

S3A Performance #

See Tune Performance for guidance on tuning S3AFileSystem.

If you encounter the following exception:

Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool.

Try configuring fs.s3a.connection.maximum=1000 in the catalog options.
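
For example, a minimal Flink SQL sketch extending the catalog definition shown earlier; only the fs.s3a.connection.maximum line is new:

CREATE CATALOG my_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 's3://path/to/warehouse',
    's3.endpoint' = 'your-endpoint-hostname',
    's3.access-key' = 'xxx',
    's3.secret-key' = 'yyy',
    -- raise the S3A connection pool limit to avoid pool timeouts
    'fs.s3a.connection.maximum' = '1000'
);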