A lake format for building a Realtime Lakehouse Architecture with Flink and Spark, covering both streaming and batch operations. It innovatively combines the lake format with an LSM (log-structured merge-tree) structure, bringing real-time streaming updates into the lake architecture.
Primary-key tables support real-time streaming updates over large volumes of data, and updated data can be queried within 1 minute.
Flexible Updates
Define a merge engine to control how records with the same primary key are combined. Deduplicate to keep the last row, partially update columns, aggregate records, or keep the first row: you decide.
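To make the four behaviors concrete, here is a minimal in-memory sketch in Python. It is an illustration under simplifying assumptions, not the format's actual implementation: the `merge` function, the dict-based row format, and the "sum every non-key column" aggregation are all hypothetical, and rows are assumed to arrive in order.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]
ENGINES = {"deduplicate", "first-row", "partial-update", "aggregate"}


def merge(rows: Iterable[Row], key: str, engine: str) -> Dict[Any, Row]:
    """Fold rows, in arrival order, into one row per primary key (hypothetical helper)."""
    if engine not in ENGINES:
        raise ValueError(f"unknown merge engine: {engine}")
    table: Dict[Any, Row] = {}
    for row in rows:
        k = row[key]
        old = table.get(k)
        if old is None:
            # First record seen for this key: store it as-is.
            table[k] = dict(row)
        elif engine == "deduplicate":
            # Keep only the latest row.
            table[k] = dict(row)
        elif engine == "first-row":
            # Keep the earliest row and ignore later ones.
            continue
        elif engine == "partial-update":
            # Overwrite only the columns that carry a non-null value.
            updates = {c: v for c, v in row.items() if v is not None}
            table[k] = {**old, **updates}
        else:
            # "aggregate": assumed here to sum every non-key column.
            table[k] = {c: (v if c == key else old[c] + v) for c, v in row.items()}
    return table


updates = [
    {"id": 1, "name": "a", "qty": 2},
    {"id": 1, "name": None, "qty": 5},
]
print(merge(updates, "id", "partial-update"))
```

With `partial-update`, the null `name` in the second record leaves the stored name untouched while `qty` is overwritten; with `deduplicate`, the second record would replace the first entirely.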
Change-tracking Updates
Define a changelog-producer to produce a correct and complete changelog for the updates of any merge engine, simplifying your streaming analytics.
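One way to picture a complete changelog is as the difference between two merged table states. The sketch below is a conceptual illustration only: the `changelog` function is hypothetical, it assumes both states fit in memory and keys are sortable, and it uses the `+I`, `-U`, `+U`, `-D` row kinds familiar from Flink to label inserts, update-before, update-after, and deletes.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def changelog(before: Dict[Any, Row], after: Dict[Any, Row]) -> List[Tuple[str, Row]]:
    """Derive changelog records from two table states keyed by primary key."""
    out: List[Tuple[str, Row]] = []
    for k in sorted(before.keys() | after.keys()):
        old, new = before.get(k), after.get(k)
        if old is None:
            out.append(("+I", new))      # key appeared: insert
        elif new is None:
            out.append(("-D", old))      # key disappeared: delete
        elif old != new:
            out.append(("-U", old))      # retract the previous value
            out.append(("+U", new))      # emit the new value
    return out


before = {1: {"id": 1, "qty": 2}, 2: {"id": 2, "qty": 9}}
after = {1: {"id": 1, "qty": 7}, 3: {"id": 3, "qty": 4}}
for kind, row in changelog(before, after):
    print(kind, row)
```

Emitting the `-U` record alongside `+U` is what lets a downstream aggregation retract the old value before applying the new one.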
Append Data Processing
Append tables (tables without a primary key) provide large-scale batch and streaming processing. They support compaction with z-order sorting.
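Z-order sorting places rows that are close in several columns near each other on disk by interleaving the bits of those column values into a single sort key. The following is a minimal two-column sketch, assuming non-negative integers that fit in 16 bits; `z_value` is a hypothetical helper, not an API of the format.

```python
from typing import List, Tuple


def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y (x takes the even bit positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_order_sort(points: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Sort (x, y) pairs by their interleaved key."""
    return sorted(points, key=lambda p: z_value(p[0], p[1]))


print(z_order_sort([(1, 1), (0, 0), (2, 0), (0, 1), (1, 0)]))
```

Because neither column dominates the sort key, files written in this order tend to have narrow value ranges in both columns, which benefits range filters on either one.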
Query Data Skipping
Indexes such as min-max let queries filter out irrelevant files for high performance. Support for more index types is being added.
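Min-max skipping can be sketched as an interval-overlap test: a file is read only if its recorded value range can contain rows matching the filter. The `FileStats` class and `files_to_read` function below are hypothetical names used for illustration, assuming one integer column and an inclusive range predicate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileStats:
    name: str
    min_value: int  # smallest value of the column in this file
    max_value: int  # largest value of the column in this file


def files_to_read(files: List[FileStats], lo: int, hi: int) -> List[str]:
    """Keep only files whose [min, max] range overlaps the query range [lo, hi]."""
    return [f.name for f in files if f.max_value >= lo and f.min_value <= hi]


files = [FileStats("a", 0, 9), FileStats("b", 10, 19), FileStats("c", 20, 29)]
print(files_to_read(files, 12, 15))
```

Only file `b` overlaps the range 12 to 15, so the other two files are never opened. The tighter each file's range, the more files a filter can skip.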
Data Lake Capabilities
Low cost, high reliability, scalable metadata, time travel, and full schema evolution: all the advantages of data lake storage.