This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.
Streaming #
You can write to the Append table in streaming mode in a very flexible way through Flink, or read the Append table through Flink, using it like a queue. The only difference is that its latency is in minutes. Its advantages are very low cost and the ability to push down filters and projection.
Automatic small file merging #
In a streaming write job without a bucket definition, there is no compaction in the writer. Instead, a Compact Coordinator scans for small files and passes compaction tasks to Compact Workers. In streaming mode, if you run an INSERT SQL statement in Flink, the topology will include these operators.
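As a minimal sketch (table and column names are hypothetical), a streaming insert into an Append table without a bucket definition, which would produce this topology:

```sql
-- Streaming mode so the writer runs continuously.
SET 'execution.runtime-mode' = 'streaming';

-- 'bucket' = '-1' means no bucket definition, so small-file
-- compaction is handled by the Compact Coordinator / Compact Workers.
CREATE TABLE my_append_table (
    id BIGINT,
    data STRING
) WITH (
    'bucket' = '-1'
);

-- Hypothetical upstream table; this continuous insert builds the
-- writer + Compact Coordinator + Compact Worker topology.
INSERT INTO my_append_table SELECT id, data FROM source_table;
```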
Do not worry about backpressure: compaction never causes backpressure.
If you set `write-only` to true, the Compact Coordinator and Compact Worker will be removed from the topology.
Automatic compaction is only supported in Flink streaming mode. You can also start a dedicated compaction job with the Paimon Flink action, and disable compaction in all other jobs by setting `write-only`.
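A sketch of this setup (the table name is hypothetical, and the `sys.compact` procedure call is an assumption; check the dedicated compaction documentation for the exact syntax in your version):

```sql
-- In the writing jobs: disable compaction entirely.
ALTER TABLE my_append_table SET ('write-only' = 'true');

-- In a separate Flink job: trigger compaction for the table
-- (assumes the sys.compact procedure is available in your
--  Flink and Paimon versions).
CALL sys.compact(`table` => 'my_database.my_append_table');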
Streaming Query #
You can stream the Append table and use it like a Message Queue. As with primary key tables, there are two options for streaming reads:
- By default, streaming read produces the latest snapshot of the table upon first startup, and then continues to read the latest incremental records.
- You can specify `scan.mode`, `scan.snapshot-id`, `scan.timestamp-millis` or `scan.file-creation-time-millis` to stream-read incremental changes only.
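For example (hypothetical table name), an incremental-only streaming read using Flink's dynamic table options hint:

```sql
-- Read only newly written records, skipping the initial snapshot.
SELECT * FROM my_append_table /*+ OPTIONS('scan.mode' = 'latest') */;
```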
Similar to Flink-Kafka, order is not guaranteed by default. If your data has an ordering requirement, you also need to consider defining a `bucket-key`; see Bucketed Append.
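A minimal sketch, with hypothetical column names, of a Bucketed Append definition in which records sharing the same bucket key land in the same bucket and so keep their relative order:

```sql
CREATE TABLE bucketed_append (
    product_id BIGINT,
    price DOUBLE,
    event_time TIMESTAMP(3)
) WITH (
    'bucket' = '8',             -- fixed number of buckets
    'bucket-key' = 'product_id' -- same key always maps to the same bucket
);
```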