S3 #
Download paimon-s3-0.9.0.jar. If you have already configured S3 access through Flink (via Flink FileSystem), you can skip the following configuration.
Put paimon-s3-0.9.0.jar into the lib directory of your Flink home, and create a catalog:
CREATE CATALOG my_catalog WITH (
'type' = 'paimon',
'warehouse' = 's3://<bucket>/<path>',
's3.endpoint' = 'your-endpoint-hostname',
's3.access-key' = 'xxx',
's3.secret-key' = 'yyy'
);
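The catalog can then be used like any other Flink catalog. The table and data below are hypothetical, a minimal sketch only:
USE CATALOG my_catalog;
-- hypothetical table for illustration
CREATE TABLE word_count (word STRING PRIMARY KEY NOT ENFORCED, cnt BIGINT);
INSERT INTO word_count VALUES ('hello', 1);
SELECT * FROM word_count;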
If you have already configured S3 access through Spark (via Hadoop FileSystem), you can skip the following configuration.
Place paimon-s3-0.9.0.jar together with paimon-spark-0.9.0.jar under Spark’s jars directory, and start like:
spark-sql \
--conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
--conf spark.sql.catalog.paimon.warehouse=s3://<bucket>/<path> \
--conf spark.sql.catalog.paimon.s3.endpoint=your-endpoint-hostname \
--conf spark.sql.catalog.paimon.s3.access-key=xxx \
--conf spark.sql.catalog.paimon.s3.secret-key=yyy
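Once spark-sql is up, the catalog registered above (named paimon) can be used with fully qualified table names. The table and data below are hypothetical, a minimal sketch only:
-- hypothetical table, qualified with the catalog name configured above
CREATE TABLE paimon.default.word_count (word STRING, cnt BIGINT);
INSERT INTO paimon.default.word_count VALUES ('hello', 1);
SELECT * FROM paimon.default.word_count;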
If you have already configured S3 access through Hive (via Hadoop FileSystem), you can skip the following configuration.
NOTE: You need to ensure that the Hive metastore can access S3.
Place paimon-s3-0.9.0.jar together with paimon-hive-connector-0.9.0.jar under Hive’s auxlib directory, and start like:
SET paimon.s3.endpoint=your-endpoint-hostname;
SET paimon.s3.access-key=xxx;
SET paimon.s3.secret-key=yyy;
Then read tables from the Hive metastore. The tables can be created by Flink or Spark; see Catalog with Hive Metastore.
SELECT * FROM test_table;
SELECT COUNT(1) FROM test_table;
Paimon uses the shared Trino filesystem as its basic read and write system. Please refer to Trino S3 to configure the S3 filesystem in Trino.
S3 Compliant Object Stores #
The S3 Filesystem also supports using S3 compliant object stores such as MinIO, Tencent’s COS and IBM’s Cloud Object Storage. Just configure your endpoint to the provider of the object store service.
s3.endpoint: your-endpoint-hostname
Configure Path Style Access #
Some S3 compliant object stores might not have virtual host style addressing enabled by default, for example when using standalone MinIO for testing purposes. In such cases, you will have to provide the property to enable path style access.
s3.path.style.access: true
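For example, assuming a standalone MinIO deployment, both options can be passed in the Flink catalog definition from the first section; the endpoint and catalog name below are placeholders, a minimal sketch only:
-- hypothetical MinIO endpoint with path style access enabled
CREATE CATALOG minio_catalog WITH (
'type' = 'paimon',
'warehouse' = 's3://<bucket>/<path>',
's3.endpoint' = 'http://localhost:9000',
's3.access-key' = 'xxx',
's3.secret-key' = 'yyy',
's3.path.style.access' = 'true'
);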
S3A Performance #
Tune Performance for S3AFileSystem.
If you encounter the following exception:
Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool.
Try configuring this in the catalog options: fs.s3a.connection.maximum=1000.
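For example, in the Flink catalog definition shown earlier, the option can be added to the WITH clause; a minimal sketch, reusing the placeholder values from above:
CREATE CATALOG my_catalog WITH (
'type' = 'paimon',
'warehouse' = 's3://<bucket>/<path>',
's3.endpoint' = 'your-endpoint-hostname',
's3.access-key' = 'xxx',
's3.secret-key' = 'yyy',
'fs.s3a.connection.maximum' = '1000'
);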