Apache Paimon™

A lake format for building a real-time lakehouse architecture with Flink and Spark, for both streaming and batch operations. Paimon combines the lake format with an LSM (log-structured merge-tree) structure, bringing real-time streaming updates into the lake architecture.

One Storage for All Your Data

Key Features

Real-time Updates
Primary-key tables support real-time streaming updates over large volumes of data, with updated data available for query within 1 minute.

Flexible Updates
Define a merge engine to control how records with the same primary key are combined: deduplicate to keep the last row, partial-update, aggregate records, or keep the first row. You decide.
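The four merge behaviors named above can be illustrated with a small conceptual sketch. This is plain Python, not Paimon's API: the function names (`deduplicate`, `first_row`, `partial_update`, `aggregate_sum`, `merge`) and the sample rows are hypothetical, and the sketch assumes that partial-update means "non-null fields overwrite earlier values" and that aggregation is a simple sum.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple

Row = Dict[str, Optional[int]]
Engine = Callable[[Row, Row], Row]


def deduplicate(old: Row, new: Row) -> Row:
    """Keep the last row seen for a key."""
    return new


def first_row(old: Row, new: Row) -> Row:
    """Keep the first row seen for a key."""
    return old


def partial_update(old: Row, new: Row) -> Row:
    """Overwrite only the fields that are non-null in the new row (assumed semantics)."""
    merged = dict(old)
    for field, value in new.items():
        if value is not None:
            merged[field] = value
    return merged


def aggregate_sum(old: Row, new: Row) -> Row:
    """Sum each field across rows (one possible aggregation, chosen for illustration)."""
    merged = dict(old)
    for field, value in new.items():
        merged[field] = merged.get(field, 0) + value
    return merged


def merge(records: Iterable[Tuple[Hashable, Row]], engine: Engine) -> Dict[Hashable, Row]:
    """Fold records that share a primary key using the chosen engine."""
    table: Dict[Hashable, Row] = {}
    for key, row in records:
        table[key] = engine(table[key], row) if key in table else row
    return table


# Hypothetical sample data: three writes to primary key 1.
updates = [
    (1, {"price": 10, "qty": None}),
    (1, {"price": None, "qty": 3}),
    (1, {"price": 12, "qty": None}),
]

print(merge(updates, deduplicate))
print(merge(updates, first_row))
print(merge(updates, partial_update))
print(merge([(1, {"total": 5}), (1, {"total": 7})], aggregate_sum))
```

In a real table the engine is chosen through table configuration rather than application code; the sketch only shows how the same stream of writes yields different final rows depending on that choice.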

Change-tracking Updates
Define a changelog-producer to emit a correct and complete changelog for updates under any merge engine, simplifying your streaming analytics.
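To make "correct and complete changelog" concrete, here is a minimal sketch of turning a stream of upserts into change records. It is a conceptual illustration, not Paimon's implementation: the function `produce_changelog` is hypothetical, a value of `None` is assumed to mean a delete, and the row kinds `+I`, `-U`, `+U`, `-D` follow the insert / update-before / update-after / delete convention used by Flink.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Change = Tuple[str, Hashable, int]


def produce_changelog(upserts: Iterable[Tuple[Hashable, Optional[int]]]) -> List[Change]:
    """Compare each write with the current value and emit full change records."""
    state: Dict[Hashable, int] = {}
    changelog: List[Change] = []
    for key, value in upserts:
        if value is None:
            # Assumed convention: None requests a delete of the key.
            if key in state:
                changelog.append(("-D", key, state.pop(key)))
        elif key not in state:
            changelog.append(("+I", key, value))
            state[key] = value
        elif state[key] != value:
            # A complete changelog carries both the old and the new value.
            changelog.append(("-U", key, state[key]))
            changelog.append(("+U", key, value))
            state[key] = value
    return changelog


# Hypothetical sample stream of writes.
for change in produce_changelog([("a", 1), ("a", 2), ("b", 5), ("a", None)]):
    print(change)
```

Downstream streaming jobs benefit from the `-U` records because they can retract the old value from an aggregate before applying the new one.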

Append Data Processing
Append tables (tables without a primary key) provide large-scale batch and streaming processing, and support compaction with z-order sorting.
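Z-order sorting places rows that are close in several columns near each other on disk. A minimal sketch of the idea for two non-negative integer columns follows; the helper `z_value` and the 16-bit width are assumptions made for illustration and do not reflect how Paimon encodes its sort keys.

```python
from typing import List, Tuple


def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into a single Morton (z-order) code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # bits of y go to odd positions
    return z


def z_order_sort(points: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Sort points by their interleaved code so neighbors in (x, y) tend to cluster."""
    return sorted(points, key=lambda p: z_value(p[0], p[1]))


# Hypothetical sample points.
print(z_order_sort([(1, 1), (0, 1), (1, 0), (0, 0)]))
```

Because the sort key mixes bits from both columns, a filter on either column tends to touch fewer files than it would after sorting by one column alone.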

Query Data Skipping
Indexes such as min-max let queries skip irrelevant files, delivering high-performance queries. Support for more index types is being added.
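Min-max data skipping can be sketched in a few lines: each file records the smallest and largest value of a column, and a range query reads only files whose range overlaps the filter. The `FileStats` class, the `files_to_scan` function, and the file names below are hypothetical and stand in for whatever metadata the format actually stores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileStats:
    name: str
    min_value: int
    max_value: int


def files_to_scan(files: List[FileStats], lo: int, hi: int) -> List[str]:
    """Return the files whose [min, max] range overlaps the query range [lo, hi]."""
    return [f.name for f in files if f.max_value >= lo and f.min_value <= hi]


# Hypothetical per-file statistics for one column.
stats = [
    FileStats("file-1", 0, 9),
    FileStats("file-2", 10, 19),
    FileStats("file-3", 20, 29),
]

# A filter such as "value BETWEEN 12 AND 15" only needs the second file.
print(files_to_scan(stats, 12, 15))
```

Skipping is most effective when values are clustered across files, since overlapping ranges force more files to be read.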

Data Lake Capabilities
Low cost, high reliability, scalable metadata, time travel, and full schema evolution: all the advantages of data lake storage.

Join the Community