Iceberg Ecosystems #
AWS Athena #
AWS Athena may use an old manifest reader that looks up Iceberg manifests by file name. To stay compatible, let Paimon produce legacy Iceberg manifest list files by enabling 'metadata.iceberg.manifest-legacy-version'.
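For instance, the option can be enabled on an existing table with Flink SQL. This is a sketch; the catalog and table names are placeholders:

```sql
-- Produce legacy-format Iceberg manifest list files so that Athena's
-- older manifest reader can consume them.
ALTER TABLE paimon_catalog.`default`.my_table SET (
    'metadata.iceberg.manifest-legacy-version' = 'true'
);
```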
DuckDB #
DuckDB may expect data files to be placed under the root/data
directory, while Paimon usually places them directly under the root
directory. To achieve compatibility, you can configure this table option:
'data-file.path-directory' = 'data'.
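A minimal sketch of setting this option at table creation time (the catalog name, table name, and columns are illustrative):

```sql
-- Write data files under the table's data/ subdirectory instead of
-- directly under the table root, for DuckDB compatibility.
CREATE TABLE paimon_catalog.`default`.my_table (
    kind STRING,
    name STRING
) WITH (
    'data-file.path-directory' = 'data'
);
```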
Trino Iceberg #
In this example, we use the Trino Iceberg connector to access a Paimon table through an Iceberg Hive catalog. Before trying out this example, make sure that you have configured the Trino Iceberg connector. See Trino’s documentation for more information.
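As a rough sketch (the file name and metastore address are assumptions, not part of this example), a Trino catalog file such as etc/catalog/iceberg.properties pointing at the same Hive metastore might look like:

```properties
connector.name=iceberg
iceberg.catalog.type=hive_metastore
hive.metastore.uri=thrift://<host>:<port>
```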
Let’s first create a Paimon table with Iceberg compatibility enabled, using Flink SQL.
CREATE CATALOG paimon_catalog WITH (
'type' = 'paimon',
'warehouse' = '<path-to-warehouse>'
);
CREATE TABLE paimon_catalog.`default`.animals (
kind STRING,
name STRING
) WITH (
'metadata.iceberg.storage' = 'hive-catalog',
'metadata.iceberg.uri' = 'thrift://<host>:<port>'
);
INSERT INTO paimon_catalog.`default`.animals VALUES ('mammal', 'cat'), ('mammal', 'dog'), ('reptile', 'snake'), ('reptile', 'lizard');
Start spark-sql
with the following command line.
spark-sql --jars <path-to-paimon-jar> \
--conf spark.sql.catalog.paimon_catalog=org.apache.paimon.spark.SparkCatalog \
--conf spark.sql.catalog.paimon_catalog.warehouse=<path-to-warehouse> \
--packages org.apache.iceberg:iceberg-spark-runtime-<iceberg-version> \
--conf spark.sql.catalog.iceberg_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.iceberg_catalog.type=hadoop \
--conf spark.sql.catalog.iceberg_catalog.warehouse=<path-to-warehouse>/iceberg \
--conf spark.sql.catalog.iceberg_catalog.cache-enabled=false \
--conf spark.sql.extensions=org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
Iceberg catalog caching is disabled (cache-enabled=false) so that you can quickly see the result.
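As a quick sanity check: if the table stores its Iceberg metadata with 'metadata.iceberg.storage' = 'hadoop-catalog' instead of the Hive catalog used in this example, the hadoop-type iceberg_catalog configured above can read it directly. A sketch:

```sql
-- Read the Paimon table's Iceberg metadata through the Iceberg Hadoop catalog.
SELECT * FROM iceberg_catalog.`default`.animals WHERE kind = 'mammal';
```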
Alternatively, run the following Spark SQL to create the Paimon table and insert data.
CREATE TABLE paimon_catalog.`default`.animals (
kind STRING,
name STRING
) TBLPROPERTIES (
'metadata.iceberg.storage' = 'hive-catalog',
'metadata.iceberg.uri' = 'thrift://<host>:<port>'
);
INSERT INTO paimon_catalog.`default`.animals VALUES ('mammal', 'cat'), ('mammal', 'dog'), ('reptile', 'snake'), ('reptile', 'lizard');
Start Trino using the Iceberg catalog and query the Paimon table.
SELECT * FROM animals WHERE kind = 'mammal';
/*
kind | name
--------+------
mammal | cat
mammal | dog
*/