S3 #
Flink #
Download paimon-s3-0.4.0-incubating.jar. If you have already configured S3 access through Flink (via Flink FileSystem), you can skip the following configuration.
Put paimon-s3-0.4.0-incubating.jar into the lib directory of your Flink home, and create a catalog:
CREATE CATALOG my_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 's3://path/to/warehouse',
    's3.endpoint' = 'your-endpoint-hostname',
    's3.access-key' = 'xxx',
    's3.secret-key' = 'yyy'
);
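Once the catalog is created you can switch to it and work with tables as usual. A minimal usage sketch in Flink SQL follows; the table and column names are illustrative, not part of the setup above:

```sql
-- Switch to the catalog created above (table/column names are made up)
USE CATALOG my_catalog;

CREATE TABLE word_count (
    word STRING PRIMARY KEY NOT ENFORCED,
    cnt BIGINT
);

INSERT INTO word_count VALUES ('hello', 1);
SELECT * FROM word_count;
```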
Spark #
If you have already configured S3 access through Spark (via Hadoop FileSystem), you can skip the following configuration.
Place paimon-s3-0.4.0-incubating.jar together with paimon-spark-0.4.0-incubating.jar under Spark's jars directory, and start like this:
spark-sql \
--conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
--conf spark.sql.catalog.paimon.warehouse=s3://<bucket>/<path> \
--conf spark.sql.catalog.paimon.s3.endpoint=your-endpoint-hostname \
--conf spark.sql.catalog.paimon.s3.access-key=xxx \
--conf spark.sql.catalog.paimon.s3.secret-key=yyy
Hive #
If you have already configured S3 access through Hive (via Hadoop FileSystem), you can skip the following configuration.
NOTE: You need to ensure that the Hive metastore can access S3.
Place paimon-s3-0.4.0-incubating.jar together with paimon-hive-connector-0.4.0-incubating.jar under Hive's auxlib directory, and start like this:
SET paimon.s3.endpoint=your-endpoint-hostname;
SET paimon.s3.access-key=xxx;
SET paimon.s3.secret-key=yyy;
Then read tables from the Hive metastore. Tables can be created by Flink or Spark; see Catalog with Hive Metastore.
SELECT * FROM test_table;
SELECT COUNT(1) FROM test_table;
Trino #
Place paimon-s3-0.4.0-incubating.jar together with paimon-trino-0.4.0-incubating.jar under the plugin/paimon directory.
Add options in etc/catalog/paimon.properties:
s3.endpoint=your-endpoint-hostname
s3.access-key=xxx
s3.secret-key=yyy
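With the plugin installed and the properties file in place, Trino can query Paimon tables through that catalog. A usage sketch, assuming the catalog is named paimon and a table test_table exists in the default schema:

```sql
-- Query a Paimon table from Trino (catalog/schema/table names are illustrative)
SELECT * FROM paimon.default.test_table LIMIT 10;
```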
S3 Compliant Object Stores #
The S3 Filesystem also supports S3-compliant object stores such as MinIO, Tencent's COS, and IBM's Cloud Object Storage. Just configure your endpoint to point to your object store provider:
s3.endpoint: your-endpoint-hostname
Configure Path Style Access #
Some S3-compliant object stores might not have virtual host style addressing enabled by default, for example when using standalone MinIO for testing purposes. In such cases, you will have to provide the following property to enable path style access.
s3.path.style.access: true
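Putting both options together, a Flink catalog pointing at a standalone MinIO instance might look like the following sketch (the endpoint, bucket, and credentials are placeholders):

```sql
CREATE CATALOG minio_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 's3://my-bucket/warehouse',
    's3.endpoint' = 'http://minio-host:9000',
    's3.access-key' = 'xxx',
    's3.secret-key' = 'yyy',
    's3.path.style.access' = 'true'
);
```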
S3A Performance #
Tune Performance for S3AFileSystem.
If you encounter the following exception:
Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool.
Try to configure this in catalog options: fs.s3a.connection.maximum=1000.
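For example, assuming a Flink catalog, the connection pool option can be passed alongside the other catalog properties (warehouse path and credentials below are placeholders):

```sql
CREATE CATALOG my_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 's3://path/to/warehouse',
    's3.access-key' = 'xxx',
    's3.secret-key' = 'yyy',
    -- enlarge the S3A connection pool to avoid pool timeouts
    'fs.s3a.connection.maximum' = '1000'
);
```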