This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.

Apache Paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations. Paimon combines a lake format with an LSM (log-structured merge-tree) structure, bringing realtime streaming updates into the lake architecture.
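To make the LSM idea concrete, here is a toy, in-memory Python sketch of a generic log-structured merge-tree. Writes land in a mutable buffer, a full buffer is flushed as an immutable sorted run, reads check newer data first, and compaction merges runs so that the newest value for each key wins. This is not Paimon's implementation: the class `TinyLSM` and its methods are hypothetical, and a real system persists runs as files and merges them with a streaming sort-merge.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Hypothetical toy LSM tree: an in-memory buffer plus immutable sorted runs."""

    def __init__(self, buffer_limit: int = 2) -> None:
        self.buffer_limit = buffer_limit
        self.memtable: Dict[str, str] = {}
        # Each run is a list of (key, value) pairs sorted by key; newest run is last.
        self.runs: List[List[Tuple[str, str]]] = []

    def put(self, key: str, value: str) -> None:
        # Updates are absorbed by the buffer; nothing already flushed is rewritten.
        self.memtable[key] = value
        if len(self.memtable) >= self.buffer_limit:
            self.flush()

    def flush(self) -> None:
        # Persist the buffer as a new sorted, immutable run.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        # Search the newest run first so later updates shadow older ones.
        for run in reversed(self.runs):
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None

    def compact(self) -> None:
        # Merge all runs oldest-to-newest so the newest value per key survives.
        merged: Dict[str, str] = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

For example, with `buffer_limit=2`, writing `a=1, b=1, a=2, c=1` produces two runs; `get("a")` returns `"2"` from the newer run, and `compact()` collapses them into a single run `[("a", "2"), ("b", "1"), ("c", "1")]`.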

Paimon offers the following core capabilities:

Realtime updates:
- Primary key tables support large-scale updates with very high write performance, typically ingested through Flink streaming.
- Configurable merge engines let you decide how records are updated: deduplicate (keep the last row), partial-update, aggregation, or first-row.
- A configurable changelog producer generates correct and complete changelogs for updates under each merge engine, simplifying your streaming analytics.
Large-Scale Append Data Processing:
- Append tables (tables without a primary key) provide large-scale batch and streaming processing, with automatic merging of small files.
- Data compaction with z-order sorting optimizes file layout, and data skipping based on indexes such as min/max enables fast queries.
Data Lake Capabilities:
- Scalable metadata: supports petabyte-scale datasets and a large number of partitions.
- Supports ACID transactions, time travel, and schema evolution.
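The merge engines listed under realtime updates differ in how rows that share a primary key are combined. The following Python sketch models that behavior over an in-memory list of records in arrival order. It is an illustration under simplifying assumptions, not Paimon's code. The engine names mirror those above; the function `merge_rows`, the record shape, and the rule that aggregation fields without a function keep the last non-null value are assumptions of this sketch.

```python
import operator
from typing import Callable, Dict, Iterable, Optional, Tuple

Row = Dict[str, Optional[int]]


def merge_rows(
    records: Iterable[Tuple[str, Row]],
    engine: str,
    agg: Optional[Dict[str, Callable[[int, int], int]]] = None,
) -> Dict[str, Row]:
    """Hypothetical model: fold records with the same key according to `engine`."""
    result: Dict[str, Row] = {}
    for key, row in records:
        if key not in result:
            result[key] = dict(row)  # first record for this key
            continue
        current = result[key]
        if engine == "deduplicate":
            result[key] = dict(row)  # the last row wins
        elif engine == "first-row":
            pass  # keep the first row seen, ignore later ones
        elif engine == "partial-update":
            for field, value in row.items():
                if value is not None:  # null fields do not overwrite
                    current[field] = value
        elif engine == "aggregation":
            funcs = agg or {}
            for field, value in row.items():
                if value is None:
                    continue
                if field in funcs and current.get(field) is not None:
                    current[field] = funcs[field](current[field], value)
                else:
                    current[field] = value  # assumed: last non-null value
        else:
            raise ValueError(f"unknown merge engine: {engine}")
    return result


events = [
    ("k1", {"a": 1, "b": None}),
    ("k1", {"a": None, "b": 5}),
    ("k1", {"a": 3, "b": None}),
]
```

With these `events`, deduplicate yields `{"a": 3, "b": None}`, first-row keeps `{"a": 1, "b": None}`, partial-update fills in fields to get `{"a": 3, "b": 5}`, and aggregation with `agg={"a": operator.add}` sums `a` to give `{"a": 4, "b": 5}`.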

Try Paimon

If you’re interested in trying out Paimon, check out our quick start guide for Flink or Spark. It provides a step-by-step introduction to the APIs and guides you through real applications.

Get Help with Paimon

If you get stuck, you can subscribe to the user mailing list (user-subscribe@paimon.apache.org). Paimon tracks issues on GitHub, where you can also create an issue, and prefers to receive contributions as pull requests.