Metrics

Paimon Metrics

Paimon provides a metrics system that measures read and write behavior, such as how many manifest files were scanned in the last planning, how long the last commit took, and how many files the last compaction deleted.

In Paimon’s metrics system, metrics are updated and reported at different levels of granularity. Currently, table and bucket levels are provided, so you can get metrics per table or per bucket.

The Paimon metrics system provides three types of metrics: Gauge, Counter, and Histogram.

  • Gauge: Provides a value of any type at a point in time.
  • Counter: Used to count values by incrementing and decrementing.
  • Histogram: Measures the statistical distribution of a set of values, including the min, max, mean, standard deviation, and percentiles.
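
As a rough illustration of the three types, the following Python sketch models each one with a small class. The class names, the `window` parameter, and the nearest-rank 95th percentile are assumptions made for this example; this is not Paimon's implementation.

```python
import math
import statistics
from collections import deque


class Gauge:
    """Reports whatever the supplier returns at the moment it is read."""

    def __init__(self, supplier):
        self._supplier = supplier

    def value(self):
        return self._supplier()


class Counter:
    """Keeps a running count that can be incremented or decremented."""

    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n

    def dec(self, n=1):
        self.count -= n


class Histogram:
    """Keeps the most recent `window` samples and summarizes them."""

    def __init__(self, window=100):
        # Hypothetical window size; older samples are dropped automatically.
        self._samples = deque(maxlen=window)

    def update(self, value):
        self._samples.append(value)

    def summary(self):
        data = sorted(self._samples)
        if not data:
            raise ValueError("no samples recorded")
        # Nearest-rank percentile: the smallest sample with at least
        # 95% of the samples at or below it.
        rank = max(1, math.ceil(0.95 * len(data)))
        return {
            "min": data[0],
            "max": data[-1],
            "mean": statistics.fmean(data),
            "stddev": statistics.pstdev(data),
            "p95": data[rank - 1],
        }
```

A gauge such as lastScanDuration would hold one value from the most recent operation, while a histogram such as scanDuration would summarize the last few of them.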

Paimon provides built-in metrics for commit, scan, write, and compaction operations. These metrics can be bridged to any compute engine that supports it, such as Flink or Spark.

Metrics List

Below are the lists of Paimon's built-in metrics, grouped into scan metrics, commit metrics, write metrics, write buffer metrics, and compaction metrics.

Scan Metrics

Metrics Name | Level | Type | Description
lastScanDuration | Table | Gauge | The time it took to complete the last scan.
scanDuration | Table | Histogram | Distribution of the time taken by the last few scans.
lastScannedManifests | Table | Gauge | Number of manifest files scanned in the last scan.
lastSkippedByPartitionAndStats | Table | Gauge | Number of table files skipped by the partition filter and value/key stats in the last scan.
lastSkippedByBucketAndLevelFilter | Table | Gauge | Number of table files skipped by the bucket, bucket key, and level filters in the last scan.
lastSkippedByWholeBucketFilesFilter | Table | Gauge | Number of table files skipped by the bucket-level value filter (primary key tables only) in the last scan.
lastScanSkippedTableFiles | Table | Gauge | Total number of table files skipped in the last scan.
lastScanResultedTableFiles | Table | Gauge | Number of table files in the result of the last scan.

Commit Metrics

Metrics Name | Level | Type | Description
lastCommitDuration | Table | Gauge | The time it took to complete the last commit.
commitDuration | Table | Histogram | Distribution of the time taken by the last few commits.
lastCommitAttempts | Table | Gauge | Number of attempts made by the last commit.
lastTableFilesAdded | Table | Gauge | Number of table files added in the last commit, including newly created data files and compacted-after files.
lastTableFilesDeleted | Table | Gauge | Number of table files deleted in the last commit, which come from compacted-before files.
lastTableFilesAppended | Table | Gauge | Number of table files appended in the last commit, that is, the newly created data files.
lastTableFilesCommitCompacted | Table | Gauge | Number of table files compacted in the last commit, including compacted-before and compacted-after files.
lastChangelogFilesAppended | Table | Gauge | Number of changelog files appended in the last commit.
lastChangelogFileCommitCompacted | Table | Gauge | Number of changelog files compacted in the last commit.
lastGeneratedSnapshots | Table | Gauge | Number of snapshot files generated in the last commit, either 1 or 2.
lastDeltaRecordsAppended | Table | Gauge | Number of delta records in the last commit with APPEND commit kind.
lastChangelogRecordsAppended | Table | Gauge | Number of changelog records in the last commit with APPEND commit kind.
lastDeltaRecordsCommitCompacted | Table | Gauge | Number of delta records in the last commit with COMPACT commit kind.
lastChangelogRecordsCommitCompacted | Table | Gauge | Number of changelog records in the last commit with COMPACT commit kind.
lastPartitionsWritten | Table | Gauge | Number of partitions written in the last commit.
lastBucketsWritten | Table | Gauge | Number of buckets written in the last commit.
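
Read literally, the descriptions of the four table-file gauges relate to each other through three counts: newly created data files, compacted-before files, and compacted-after files. The Python sketch below works through one example under that reading. The function name and the input numbers are hypothetical, and how Paimon computes these values internally is not shown on this page.

```python
def commit_file_gauges(appended, compacted_before, compacted_after):
    """Derive the four table-file gauges from one commit's file counts.

    The formulas follow the wording of the descriptions in the commit
    metrics list; they are an interpretation, not Paimon source code.
    """
    return {
        # Newly created data files only.
        "lastTableFilesAppended": appended,
        # Newly created data files plus compacted-after files.
        "lastTableFilesAdded": appended + compacted_after,
        # Deleted files come from compacted-before files.
        "lastTableFilesDeleted": compacted_before,
        # Compacted-before plus compacted-after files.
        "lastTableFilesCommitCompacted": compacted_before + compacted_after,
    }


# Hypothetical commit: 3 new data files, and a compaction that
# rewrote 5 existing files into 1.
gauges = commit_file_gauges(appended=3, compacted_before=5, compacted_after=1)
for name, value in gauges.items():
    print(f"{name} = {value}")
```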

Write Metrics

Metrics Name | Level | Type | Description
writeRecordCount | Bucket | Counter | Total number of records written into the bucket.
flushCostMillis | Bucket | Histogram | Distribution of the time taken by the last few write buffer flushes.
prepareCommitCostMillis | Bucket | Histogram | Distribution of the time taken by the last few calls to `prepareCommit`.

Write Buffer Metrics

Metrics Name | Level | Type | Description
bufferPreemptCount | Table | Gauge | Total number of memory preemptions.
usedWriteBufferSizeByte | Table | Gauge | Currently used write buffer size, in bytes.
totalWriteBufferSizeByte | Table | Gauge | Total configured write buffer size, in bytes.

Compaction Metrics

Metrics Name | Level | Type | Description
level0FileCount | Bucket | Gauge | Number of level 0 files. This count grows if asynchronous compaction cannot be done in time.
lastCompactionDuration | Bucket | Gauge | The time it took to complete the last compaction.
compactionDuration | Bucket | Histogram | Distribution of the time taken by the last few compactions.
lastTableFilesCompactedBefore | Bucket | Gauge | Number of files deleted in the last compaction.
lastTableFilesCompactedAfter | Bucket | Gauge | Number of files added in the last compaction.
lastChangelogFilesCompacted | Bucket | Gauge | Number of changelog files compacted in the last compaction.
lastRewriteInputFileSize | Bucket | Gauge | Size of the files deleted in the last compaction.
lastRewriteOutputFileSize | Bucket | Gauge | Size of the files added in the last compaction.
lastRewriteChangelogFileSize | Bucket | Gauge | Size of the changelog files compacted in the last compaction.

Paimon bridges its metrics to Flink’s metrics system, so they can be reported by Flink, and the lifecycle of the metric groups is managed by Flink.

When using Flink to access Paimon, join <scope>.<infix>.<metric_name> to get the complete metric identifier. The metric_name values are listed in Metrics List.

For example, the identifier of the metric lastPartitionsWritten for the table word_count in a Flink job named insert_word_count is:

localhost.taskmanager.localhost:60340-775a20.insert_word_count.Global Committer : word_count.0.paimon.table.word_count.commit.lastPartitionsWritten.
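
The joining rule can be sketched in a few lines of Python that rebuild the identifier above from its parts. The helper names `commit_scope` and `metric_identifier` are hypothetical; the scope layout follows the commit metrics pattern `<host>.taskmanager.<tm_id>.<job_name>.<committer_operator_name>.<subtask_index>`.

```python
def commit_scope(host, tm_id, job_name, operator_name, subtask_index):
    """Build a task manager scope in the layout used for commit metrics."""
    return ".".join(
        [host, "taskmanager", tm_id, job_name, operator_name, str(subtask_index)]
    )


def metric_identifier(scope, infix, metric_name):
    """Join <scope>.<infix>.<metric_name> into one identifier."""
    return ".".join([scope, infix, metric_name])


# Values taken from the example identifier above.
scope = commit_scope(
    host="localhost",
    tm_id="localhost:60340-775a20",
    job_name="insert_word_count",
    operator_name="Global Committer : word_count",
    subtask_index=0,
)
infix = "paimon.table.word_count.commit"
identifier = metric_identifier(scope, infix, "lastPartitionsWritten")
print(identifier)
```

Note that the operator name keeps its spaces and colon in this form of the identifier.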

In the Flink Web UI, under the committer operator’s metrics, it is shown as:

0.Global_Committer___word_count.paimon.table.word_count.commit.lastPartitionsWritten.
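
Comparing the two example strings suggests that the Web UI name starts with the subtask index, followed by the operator name with each space and colon replaced by an underscore. The Python sketch below reproduces the string above under that assumption. The rule is inferred only from this one example, not from a documented Flink behavior, and `web_ui_metric_name` is a hypothetical helper.

```python
import re


def web_ui_metric_name(subtask_index, operator_name, infix, metric_name):
    """Approximate the name displayed in the Web UI for an operator metric."""
    # Assumption: every space and colon in the operator name becomes "_".
    safe_operator = re.sub(r"[ :]", "_", operator_name)
    return f"{subtask_index}.{safe_operator}.{infix}.{metric_name}"


ui_name = web_ui_metric_name(
    subtask_index=0,
    operator_name="Global Committer : word_count",
    infix="paimon.table.word_count.commit",
    metric_name="lastPartitionsWritten",
)
print(ui_name)
```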

  1. Please refer to System Scope to understand Flink scope.
  2. Scan metrics are only supported by Flink versions >= 1.18.

Metrics Group | Scope | Infix
Scan Metrics | <host>.jobmanager.<job_name> | <source_operator_name>.coordinator.enumerator.paimon.table.<table_name>.scan
Commit Metrics | <host>.taskmanager.<tm_id>.<job_name>.<committer_operator_name>.<subtask_index> | paimon.table.<table_name>.commit
Write Metrics | <host>.taskmanager.<tm_id>.<job_name>.<writer_operator_name>.<subtask_index> | paimon.table.<table_name>.partition.<partition_string>.bucket.<bucket_index>.writer
Write Buffer Metrics | <host>.taskmanager.<tm_id>.<job_name>.<writer_operator_name>.<subtask_index> | paimon.table.<table_name>.writeBuffer
Compaction Metrics | <host>.taskmanager.<tm_id>.<job_name>.<writer_operator_name>.<subtask_index> | paimon.table.<table_name>.partition.<partition_string>.bucket.<bucket_index>.compaction
Flink Source Metrics | <host>.taskmanager.<tm_id>.<job_name>.<source_operator_name>.<subtask_index> | -
Flink Sink Metrics | <host>.taskmanager.<tm_id>.<job_name>.<committer_operator_name>.<subtask_index> | -
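
The infix patterns differ by level: table-level groups end right after the table name, while bucket-level groups also carry the partition and bucket. The Python sketch below fills in the patterns listed above. The helper names are hypothetical, and the partition string `dt=20240101`, the bucket index, and the operator name are made-up values, since the exact format of `<partition_string>` is not specified here.

```python
def table_infix(table_name, group):
    """Infix for table-level groups such as commit or writeBuffer."""
    return f"paimon.table.{table_name}.{group}"


def bucket_infix(table_name, partition_string, bucket_index, group):
    """Infix for bucket-level groups such as writer or compaction."""
    return (
        f"paimon.table.{table_name}"
        f".partition.{partition_string}"
        f".bucket.{bucket_index}.{group}"
    )


def scan_infix(source_operator_name, table_name):
    """Infix for scan metrics, reported under the job manager scope."""
    return (
        f"{source_operator_name}.coordinator.enumerator"
        f".paimon.table.{table_name}.scan"
    )


commit = table_infix("word_count", "commit")
# "dt=20240101" and bucket 3 are hypothetical values for illustration.
writer = bucket_infix("word_count", "dt=20240101", 3, "writer")
# "my_source" is a hypothetical source operator name.
scan = scan_infix("my_source", "word_count")
print(commit)
print(writer)
print(scan)
```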

When reading and writing with Flink, Paimon implements some key standard Flink connector metrics to measure source latency and sink output; see FLIP-33: Standardize Connector Metrics. The implemented Flink source and sink metrics are listed below.

Metrics Name | Level | Type | Description
currentEmitEventTimeLag | Flink Source Operator | Gauge | Time difference between sending the record out of the source and file creation.
currentFetchEventTimeLag | Flink Source Operator | Gauge | Time difference between reading the data file and file creation.
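
Both source gauges are differences against the file creation time, so they can be illustrated with simple arithmetic. The Python sketch below uses made-up timestamps and assumes millisecond units, which this page does not state; `event_time_lag_ms` is a hypothetical helper.

```python
def event_time_lag_ms(event_ms, file_creation_ms):
    """Lag between an event (fetch or emit) and the file's creation time."""
    return event_ms - file_creation_ms


# Hypothetical timestamps in epoch milliseconds.
file_creation_ms = 1_700_000_000_000
fetch_ms = file_creation_ms + 1_500  # source reads the data file
emit_ms = file_creation_ms + 1_800   # source sends the record downstream

current_fetch_lag = event_time_lag_ms(fetch_ms, file_creation_ms)
current_emit_lag = event_time_lag_ms(emit_ms, file_creation_ms)
print(current_fetch_lag, current_emit_lag)
```

In this example the emit lag is larger than the fetch lag because the record is sent out after the file is read.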
Note that if you specify consumer-id in your streaming query, the source metrics are reported at the level of the reader operator, which comes after the Monitor operator.

Metrics Name | Level | Type | Description
numBytesOut | Table | Counter | The total number of output bytes.
numBytesOutPerSecond | Table | Meter | The output bytes per second.
numRecordsOut | Table | Counter | The total number of output records.
numRecordsOutPerSecond | Table | Meter | The output records per second.