This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.
Configuration #
CoreOptions #
Core options for Paimon.
Key | Default | Type | Description |
---|---|---|---|
async-file-write |
true | Boolean | Whether to enable asynchronous IO writing when writing files. |
auto-create |
false | Boolean | Whether to create underlying storage when reading and writing the table. |
branch |
"main" | String | Specify branch name. |
bucket |
-1 | Integer | Bucket number for file store. It must be either -1 (dynamic bucket mode) or greater than 0 (fixed bucket mode). |
bucket-key |
(none) | String | Specify the Paimon distribution policy. Data is assigned to each bucket according to the hash value of bucket-key. To specify multiple fields, separate them with ','. If not specified, the primary key will be used; if there is no primary key, the full row will be used. |
cache-page-size |
64 kb | MemorySize | Memory page size for caching. |
changelog-file.prefix |
"changelog-" | String | Specify the file name prefix of changelog files. |
changelog-producer |
none | Enum |
Whether to double write to a changelog file. This changelog file keeps the details of data changes and can be read directly during streaming reads. This can be applied to tables with primary keys. Possible values:
|
changelog-producer.row-deduplicate |
false | Boolean | Whether to generate -U, +U changelog for the same record. This configuration is only valid when changelog-producer is lookup or full-compaction. |
changelog-producer.row-deduplicate-ignore-fields |
(none) | String | Fields that are ignored for comparison while generating -U, +U changelog for the same record. This configuration is only valid when changelog-producer.row-deduplicate is true. |
changelog.num-retained.max |
(none) | Integer | The maximum number of completed changelogs to retain. Should be greater than or equal to the minimum number. |
changelog.num-retained.min |
(none) | Integer | The minimum number of completed changelogs to retain. Should be greater than or equal to 1. |
changelog.time-retained |
(none) | Duration | The maximum time to retain completed changelogs. |
commit.callback.#.param |
(none) | String | Parameter string for the constructor of class #. Callback class should parse the parameter by itself. |
commit.callbacks |
(none) | String | A list of commit callback classes to be called after a successful commit. Class names are separated by commas (example: com.test.CallbackA,com.sample.CallbackB). |
commit.force-compact |
false | Boolean | Whether to force a compaction before commit. |
commit.force-create-snapshot |
false | Boolean | Whether to force create snapshot on commit. |
commit.max-retries |
10 | Integer | Maximum number of retries when a commit fails. |
commit.timeout |
(none) | Duration | Timeout for retrying when a commit fails. |
commit.user-prefix |
(none) | String | Specifies the commit user prefix. |
compaction.max-size-amplification-percent |
200 | Integer | The size amplification is defined as the amount (in percentage) of additional storage needed to store a single byte of data in the merge tree for changelog mode table. |
compaction.max.file-num |
(none) | Integer | For file set [f_0,...,f_N], the maximum file number to trigger a compaction for append-only table, even if sum(size(f_i)) < targetFileSize. This value avoids accumulating too many small files.
|
compaction.min.file-num |
5 | Integer | For file set [f_0,...,f_N], the minimum file number which satisfies sum(size(f_i)) >= targetFileSize to trigger a compaction for append-only table. This value avoids compacting almost-full files, which is not cost-effective. |
compaction.optimization-interval |
(none) | Duration | How often to perform an optimization compaction. This configuration is used to ensure the query timeliness of the read-optimized system table. |
compaction.size-ratio |
1 | Integer | Percentage flexibility while comparing sorted run size for changelog mode table. If the candidate sorted run(s) size is 1% smaller than the next sorted run's size, then include next sorted run into this candidate set. |
consumer-id |
(none) | String | Consumer id for recording the offset of consumption in the storage. |
consumer.expiration-time |
(none) | Duration | The expiration interval of consumer files. A consumer file will be expired if its lifetime after last modification is over this value. |
consumer.ignore-progress |
false | Boolean | Whether to ignore consumer progress for the newly started job. |
consumer.mode |
exactly-once | Enum |
Specify the consumer consistency mode for table. Possible values:
|
continuous.discovery-interval |
10 s | Duration | The discovery interval of continuous reading. |
cross-partition-upsert.bootstrap-parallelism |
10 | Integer | The parallelism for bootstrap in a single task for cross partition upsert. |
cross-partition-upsert.index-ttl |
(none) | Duration | The TTL of the RocksDB index for cross partition upsert (primary keys do not contain all partition fields). This avoids maintaining too many index entries, which would lead to steadily worse performance, but note that it may also cause data duplication. |
data-file.external-paths |
(none) | String | The external paths where the data of this table will be written, multiple elements separated by commas. |
data-file.external-paths.specific-fs |
(none) | String | The specific file system of the external path when data-file.external-paths.strategy is set to specific-fs. It should be the prefix scheme of the external path. Currently s3 and oss are supported. |
data-file.external-paths.strategy |
none | Enum |
The strategy of selecting an external path when writing data. Possible values:
|
data-file.path-directory |
(none) | String | Specify the path directory of data files. |
data-file.prefix |
"data-" | String | Specify the file name prefix of data files. |
data-file.thin-mode |
false | Boolean | Enable data file thin mode to avoid duplicate columns storage. |
delete-file.thread-num |
(none) | Integer | The maximum number of files deleted concurrently. Defaults to the number of processors available to the Java virtual machine. |
delete.force-produce-changelog |
false | Boolean | Force producing changelog in DELETE SQL. Alternatively, you can use 'streaming-read-overwrite' to read changelog from overwrite commits. |
deletion-vector.index-file.target-size |
2 mb | MemorySize | The target size of deletion vector index file. |
deletion-vectors.enabled |
false | Boolean | Whether to enable deletion vectors mode. In this mode, index files containing deletion vectors are generated when data is written, which marks the data for deletion. During read operations, by applying these index files, merging can be avoided. |
dynamic-bucket.assigner-parallelism |
(none) | Integer | Parallelism of the assigner operator for dynamic bucket mode. It is related to the number of initialized buckets; a value that is too small leads to insufficient processing speed of the assigner. |
dynamic-bucket.initial-buckets |
(none) | Integer | Initial buckets for a partition in assigner operator for dynamic bucket mode. |
dynamic-bucket.target-row-num |
2000000 | Long | If bucket is -1, a primary key table uses dynamic bucket mode, and this option controls the target row number for one bucket. |
dynamic-partition-overwrite |
true | Boolean | Whether to overwrite only the dynamic partitions when overwriting a partitioned table with dynamic partition columns. Works only when the table has partition keys. |
end-input.check-partition-expire |
false | Boolean | Whether to check partition expiration at endInput. Used in batch mode or with a bounded stream. |
fields.default-aggregate-function |
(none) | String | Default aggregate function of all fields for partial-update and aggregate merge function. |
file-index.in-manifest-threshold |
500 bytes | MemorySize | The threshold to store file index bytes in manifest. |
file-index.read.enabled |
true | Boolean | Whether to enable reading the file index. |
file-reader-async-threshold |
10 mb | MemorySize | The threshold for reading files asynchronously. |
file.block-size |
(none) | MemorySize | File block size of the format. The default is 64 MB for an ORC stripe and 128 MB for a Parquet row group. |
file.compression |
"zstd" | String | Default file compression. For faster read and write, it is recommended to use zstd. |
file.compression.per.level |
 | Map | Define different compression policies for different levels. You can add the conf like this: 'file.compression.per.level' = '0:lz4,1:zstd'. |
file.compression.zstd-level |
1 | Integer | Default file compression zstd level. For higher compression rates, it can be configured to 9, but the read and write speed will significantly decrease. |
file.format |
"parquet" | String | Specify the message format of data files. Currently orc, parquet and avro are supported. |
file.format.per.level |
 | Map | Define different file formats for different levels. You can add the conf like this: 'file.format.per.level' = '0:avro,3:parquet'. If the file format for a level is not provided, the default format set by `file.format` is used. |
file.suffix.include.compression |
false | Boolean | Whether to add file compression type in the file name of data file and changelog file. |
force-lookup |
false | Boolean | Whether to force the use of lookup for compaction. |
full-compaction.delta-commits |
(none) | Integer | Full compaction will be triggered every time this number of delta commits is reached. |
ignore-delete |
false | Boolean | Whether to ignore delete records. |
incremental-between |
(none) | String | Read incremental changes between start snapshot (exclusive) and end snapshot (inclusive), for example, '5,10' means changes between snapshot 5 and snapshot 10. |
incremental-between-scan-mode |
auto | Enum |
Scan kind when reading incremental changes between start snapshot (exclusive) and end snapshot (inclusive). Possible values:
|
incremental-between-timestamp |
(none) | String | Read incremental changes between start timestamp (exclusive) and end timestamp (inclusive), for example, 't1,t2' means changes between timestamp t1 and timestamp t2. |
incremental-to-auto-tag |
(none) | String | Used to specify the end tag (inclusive), and Paimon will find an earlier tag and return changes between them. If the tag doesn't exist or the earlier tag doesn't exist, return empty. |
local-merge-buffer-size |
(none) | MemorySize | Local merge will buffer and merge input records before they're shuffled by bucket and written into sink. The buffer will be flushed when it is full. Mainly to resolve data skew on primary keys. We recommend starting with 64 mb when trying out this feature. |
local-sort.max-num-file-handles |
128 | Integer | The maximal fan-in for external merge sort. It limits the number of file handles. If it is too small, it may cause intermediate merging. If it is too large, too many files are opened at the same time, which consumes memory and leads to random reading. |
lookup-wait |
true | Boolean | When a lookup is needed, commit will wait for the compaction performed by lookup. |
lookup.cache-file-retention |
1 h | Duration | The cached files retention time for lookup. After the file expires, if there is a need for access, it will be re-read from the DFS to build an index on the local disk. |
lookup.cache-max-disk-size |
infinite | MemorySize | Max disk size for lookup cache, you can use this option to limit the use of local disks. |
lookup.cache-max-memory-size |
256 mb | MemorySize | Max memory size for lookup cache. |
lookup.cache-spill-compression |
"zstd" | String | Spill compression for lookup cache. Currently zstd, none, lz4 and lzo are supported. |
lookup.cache.bloom.filter.enabled |
true | Boolean | Whether to enable the bloom filter for lookup cache. |
lookup.cache.bloom.filter.fpp |
0.05 | Double | Define the default false positive probability for lookup cache bloom filters. |
lookup.cache.high-priority-pool-ratio |
0.25 | Double | The fraction of cache memory that is reserved for high-priority data like index, filter. |
lookup.hash-load-factor |
0.75 | Float | The index load factor for lookup. |
lookup.local-file-type |
sort | Enum |
The local file type for lookup. Possible values:
|
manifest.compression |
"zstd" | String | Default file compression for manifest. |
manifest.delete-file-drop-stats |
false | Boolean | For DELETE manifest entries in a manifest file, drop stats to reduce memory and storage. The default value is false only for compatibility with old readers. |
manifest.format |
"avro" | String | Specify the message format of manifest files. |
manifest.full-compaction-threshold-size |
16 mb | MemorySize | The size threshold for triggering full compaction of manifest. |
manifest.merge-min-count |
30 | Integer | To avoid frequent manifest merges, this parameter specifies the minimum number of ManifestFileMeta to merge. |
manifest.target-file-size |
8 mb | MemorySize | Suggested file size of a manifest file. |
merge-engine |
deduplicate | Enum |
Specify the merge engine for table with primary key. Possible values:
|
metadata.stats-dense-store |
true | Boolean | Whether to store statistics densely in metadata (manifest files), which significantly reduces the storage size of metadata when the none statistics mode is set. Note: when this mode is enabled with 'metadata.stats-mode:none', the Paimon SDK in the reading engine must be version 0.9.1, or 1.0.0 or higher. |
metadata.stats-mode |
"truncate(16)" | String | The mode of metadata stats collection. Available modes are none, counts, truncate(16) and full.
|
metastore.partitioned-table |
false | Boolean | Whether to create this table as a partitioned table in metastore. For example, if you want to list all partitions of a Paimon table in Hive, you need to create this table as a partitioned table in Hive metastore. This config option does not affect the default filesystem metastore. |
metastore.tag-to-partition |
(none) | String | Whether to create this table as a partitioned table for mapping non-partitioned table tags in metastore. This allows the Hive engine to view this table in a partitioned table view and use partitioning field to read specific partitions (specific tags). |
metastore.tag-to-partition.preview |
none | Enum |
Whether to preview tag of generated snapshots in metastore. This allows the Hive engine to query specific tag before creation. Possible values:
|
num-levels |
(none) | Integer | Total number of levels. For example, 3 levels means levels 0, 1 and 2. |
num-sorted-run.compaction-trigger |
5 | Integer | The sorted run number to trigger compaction. Includes level0 files (one file one sorted run) and high-level runs (one level one sorted run). |
num-sorted-run.stop-trigger |
(none) | Integer | The number of sorted runs that triggers the stopping of writes. The default value is 'num-sorted-run.compaction-trigger' + 3. |
object-location |
(none) | String | The object location for object table. |
page-size |
64 kb | MemorySize | Memory page size. |
parquet.enable.dictionary |
(none) | Integer | Turn off the dictionary encoding for all fields in parquet. |
partial-update.remove-record-on-delete |
false | Boolean | Whether to remove the whole row in partial-update engine when -D records are received. |
partial-update.remove-record-on-sequence-group |
(none) | String | When -D records of the given sequence groups are received, remove the whole row. |
partition |
(none) | String | Define partition by table options. The partition cannot be defined in both DDL and table options at the same time. |
partition.default-name |
"__DEFAULT_PARTITION__" | String | The default partition name in case the dynamic partition column value is null/empty string. |
partition.end-input-to-done |
false | Boolean | Whether to mark the done status at end of input to indicate that the data is ready. |
partition.expiration-check-interval |
1 h | Duration | The check interval of partition expiration. |
partition.expiration-max-num |
100 | Integer | The maximum number of partitions deleted by one partition expiration. |
partition.expiration-strategy |
values-time | Enum |
The strategy determines how to extract the partition time and compare it with the current time. Possible values:
|
partition.expiration-time |
(none) | Duration | The expiration interval of a partition. A partition will be expired if its lifetime is over this value. Partition time is extracted from the partition value. |
partition.legacy-name |
true | Boolean | The legacy partition name uses `toString` for all types. If false, cast to string is used for all types. |
partition.mark-done-action |
"success-file" | String | The action that marks a partition as done, notifying the downstream application that the partition has finished writing and is ready to be read. 1. 'success-file': add '_success' file to directory. 2. 'done-partition': add 'xxx.done' partition to metastore. 3. 'mark-event': mark partition event to metastore. 4. 'http-report': report partition mark done to remote http server. 5. 'custom': use policy class to create a mark-partition policy. Multiple actions can be configured at the same time: 'done-partition,success-file,mark-event,custom'. |
partition.mark-done-action.custom.class |
(none) | String | The partition mark done class, which must implement the PartitionMarkDoneAction interface. Only works with the custom mark-done-action. |
partition.mark-done-action.http.params |
(none) | String | Http client request parameters, written to the request body. This can only be used by the http-report partition mark done action. |
partition.mark-done-action.http.timeout |
5 s | Duration | Http client connection timeout, this can only be used by http-report partition mark done action. |
partition.mark-done-action.http.url |
(none) | String | The mark done action will report the partition to this remote http server. This can only be used by the http-report partition mark done action. |
partition.timestamp-formatter |
(none) | String | The formatter to format timestamp from string. It can be used with 'partition.timestamp-pattern' to create a formatter using the specified value.
|
partition.timestamp-pattern |
(none) | String | You can specify a pattern to get a timestamp from partitions. The formatter pattern is defined by 'partition.timestamp-formatter'.
|
primary-key |
(none) | String | Define primary key by table options. The primary key cannot be defined in both DDL and table options at the same time. |
read.batch-size |
1024 | Integer | Read batch size for any file format that supports it. |
record-level.expire-time |
(none) | Duration | Record level expire time for primary key table. Expiration happens in compaction, so there is no strong guarantee that records expire in time. You must also specify 'record-level.time-field'. |
record-level.time-field |
(none) | String | Time field for record level expire. It supports the following types: `timestamps in seconds with INT`,`timestamps in seconds with BIGINT`, `timestamps in milliseconds with BIGINT` or `timestamp`. |
rowkind.field |
(none) | String | The field that generates the row kind for primary key table, the row kind determines which data is '+I', '-U', '+U' or '-D'. |
scan.bounded.watermark |
(none) | Long | End condition "watermark" for bounded streaming mode. Stream reading will end when a larger watermark snapshot is encountered. |
scan.fallback-branch |
(none) | String | When a batch job queries from a table, if a partition does not exist in the current branch, the reader will try to get this partition from this fallback branch. |
scan.file-creation-time-millis |
(none) | Long | After configuring this time, only the data files created after this time will be read. It is independent of snapshots, but it is imprecise filtering (depending on whether or not compaction occurs). |
scan.manifest.parallelism |
(none) | Integer | The parallelism of scanning manifest files. The default value is the number of CPU processors. Note: scaling up this parameter increases memory usage while scanning manifest files. Consider reducing it if an out-of-memory exception occurs while scanning. |
scan.max-splits-per-task |
10 | Integer | Maximum number of splits cached for one task while scanning. If the number of splits cached in the enumerator is greater than the number of tasks multiplied by this value, the scanner pauses scanning. |
scan.mode |
default | Enum |
Specify the scanning behavior of the source. Possible values:
|
scan.plan-sort-partition |
false | Boolean | Whether to sort plan files by partition fields. This allows you to read according to the partition order, even if your partition writes are out of order. It is recommended for streaming reads of 'append-only' tables. By default, streaming read will read the full snapshot first. To avoid out-of-order reading of partitions, you can enable this option. |
scan.snapshot-id |
(none) | Long | Optional snapshot id used in case of "from-snapshot" or "from-snapshot-full" scan mode |
scan.tag-name |
(none) | String | Optional tag name used in case of "from-snapshot" scan mode. |
scan.timestamp |
(none) | String | Optional timestamp used in case of "from-timestamp" scan mode. It is automatically converted to a timestamp in Unix milliseconds, using the local time zone. |
scan.timestamp-millis |
(none) | Long | Optional timestamp used in case of "from-timestamp" scan mode. If there is no snapshot earlier than this time, the earliest snapshot will be chosen. |
scan.watermark |
(none) | Long | Optional watermark used in case of "from-snapshot" scan mode. If there is no snapshot later than this watermark, an exception is thrown. |
sequence.field |
(none) | String | The field that generates the sequence number for primary key table, the sequence number determines which data is the most recent. |
sequence.field.sort-order |
ascending | Enum |
Specify the order of sequence.field. Possible values:
|
sink.watermark-time-zone |
"UTC" | String | The time zone to parse the long watermark value to TIMESTAMP value. The default value is 'UTC', which means the watermark is defined on TIMESTAMP column or not defined. If the watermark is defined on TIMESTAMP_LTZ column, the time zone of watermark is user configured time zone, the value should be the user configured local time zone. The option value is either a full name such as 'America/Los_Angeles', or a custom timezone id such as 'GMT-08:00'. |
snapshot.clean-empty-directories |
false | Boolean | Whether to try to clean empty directories when expiring snapshots, if enabled, please note:
|
snapshot.expire.execution-mode |
sync | Enum |
Specifies the execution mode of expire. Possible values:
|
snapshot.expire.limit |
50 | Integer | The maximum number of snapshots allowed to expire at a time. |
snapshot.num-retained.max |
infinite | Integer | The maximum number of completed snapshots to retain. Should be greater than or equal to the minimum number. |
snapshot.num-retained.min |
10 | Integer | The minimum number of completed snapshots to retain. Should be greater than or equal to 1. |
snapshot.time-retained |
1 h | Duration | The maximum time of completed snapshots to retain. |
snapshot.watermark-idle-timeout |
(none) | Duration | In watermarking, if a source remains idle beyond the specified timeout duration, it triggers snapshot advancement and facilitates tag creation. |
sort-compaction.local-sample.magnification |
1000 | Integer | The magnification of local sample for sort-compaction. The size of local sample is sink parallelism * magnification. |
sort-compaction.range-strategy |
QUANTITY | Enum |
The range strategy of sort compaction. The default value is quantity.
If the data size allocated to the sorting tasks is uneven, which may lead to performance bottlenecks, this config can be set to size. Possible values:
|
sort-engine |
loser-tree | Enum |
Specify the sort engine for table with primary key. Possible values:
|
sort-spill-buffer-size |
64 mb | MemorySize | Amount of data to spill records to disk in spilled sort. |
sort-spill-threshold |
(none) | Integer | If the maximum number of sort readers exceeds this value, a spill will be attempted. This prevents too many readers from consuming too much memory and causing OOM. |
source.split.open-file-cost |
4 mb | MemorySize | Open file cost of a source file. It is used to avoid reading too many files with a source split, which can be very slow. |
source.split.target-size |
128 mb | MemorySize | Target size of a source split when scanning a bucket. |
spill-compression |
"zstd" | String | Compression for spill. Currently zstd and lzo are supported. |
spill-compression.zstd-level |
1 | Integer | Default spill compression zstd level. For higher compression rates, it can be configured to 9, but the read and write speed will significantly decrease. |
streaming-read-mode |
(none) | Enum |
The mode of streaming read that specifies to read the data of table file or log. Possible values:
|
streaming-read-overwrite |
false | Boolean | Whether to read the changes from overwrite in streaming mode. Cannot be set to true when changelog producer is full-compaction or lookup because it will read duplicated changes. |
streaming.read.snapshot.delay |
(none) | Duration | The delay duration of stream read when scanning incremental snapshots. |
tag.automatic-completion |
false | Boolean | Whether to automatically complete missing tags. |
tag.automatic-creation |
none | Enum |
Whether to create tags automatically, and how to generate them. Possible values:
|
tag.batch.customized-name |
(none) | String | Use customized name when creating tags in Batch mode. |
tag.callback.#.param |
(none) | String | Parameter string for the constructor of class #. Callback class should parse the parameter by itself. |
tag.callbacks |
(none) | String | A list of callback classes to be called after a tag is created successfully. Class names are separated by commas (example: com.test.CallbackA,com.sample.CallbackB). |
tag.create-success-file |
false | Boolean | Whether to create a tag success file for newly created tags. |
tag.creation-delay |
0 ms | Duration | The delay after the period ends before a tag is created. This allows some late data to enter the tag. |
tag.creation-period |
daily | Enum |
What frequency is used to generate tags. Possible values:
|
tag.creation-period-duration |
(none) | Duration | The period duration for automatic tag creation. If set, tag.creation-period is ignored. |
tag.default-time-retained |
(none) | Duration | The default maximum time retained for newly created tags. It affects both auto-created tags and manually created (by procedure) tags. |
tag.num-retained-max |
(none) | Integer | The maximum number of tags to retain. It only affects auto-created tags. |
tag.period-formatter |
with_dashes | Enum |
The date format for tag periods. Possible values:
|
target-file-size |
(none) | MemorySize | Target size of a file.
|
type |
table | Enum |
Type of the table. Possible values:
|
write-buffer-for-append |
false | Boolean | This option only works for append-only tables. Whether writes use a write buffer to avoid out-of-memory errors. |
write-buffer-size |
256 mb | MemorySize | Amount of data to build up in memory before converting to a sorted on-disk file. |
write-buffer-spill.max-disk-size |
infinite | MemorySize | The maximum disk space to use for write buffer spill. This only works when write buffer spill is enabled. |
write-buffer-spillable |
(none) | Boolean | Whether the write buffer can be spillable. Enabled by default when using object storage. |
write-manifest-cache |
0 bytes | MemorySize | Cache size for reading manifest files for write initialization. |
write-max-writers-to-spill |
10 | Integer | For batch append inserts, if the number of writers is greater than this option, the buffer cache and spill function are enabled to avoid out-of-memory errors. |
write-only |
false | Boolean | If set to true, compactions and snapshot expiration will be skipped. This option is used along with dedicated compact jobs. |
write.batch-size |
1024 | Integer | Write batch size for any file format that supports it. |
zorder.var-length-contribution |
8 | Integer | The number of bytes that types (CHAR, VARCHAR, BINARY, VARBINARY) contribute to the zorder sort. |
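The `file.format.per.level` entry in the table above describes a per-level map with a fallback to `file.format`. The following is a minimal Python sketch of that lookup rule, not Paimon's actual implementation; the helper names `parse_per_level` and `format_for_level` are hypothetical and exist only for illustration.

```python
from typing import Dict


def parse_per_level(value: str) -> Dict[int, str]:
    """Parse a 'level:value' list such as '0:avro,3:parquet' into {0: 'avro', 3: 'parquet'}."""
    result = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty option value or a trailing comma
        level, _, name = item.partition(":")
        result[int(level)] = name.strip()
    return result


def format_for_level(options: Dict[str, str], level: int) -> str:
    """Return the per-level format if configured, else the 'file.format' default."""
    per_level = parse_per_level(options.get("file.format.per.level", ""))
    # "parquet" is the documented default of 'file.format'.
    default_format = options.get("file.format", "parquet")
    return per_level.get(level, default_format)


options = {"file.format": "orc", "file.format.per.level": "0:avro,3:parquet"}
print([format_for_level(options, level) for level in range(4)])
# prints ['avro', 'orc', 'orc', 'parquet']
```

Levels 1 and 2 have no entry in the map, so they fall back to the value of `file.format`. The same shape of lookup would apply to `file.compression.per.level` with `file.compression` as the fallback, assuming it follows the same rule.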
CatalogOptions #
Options for Paimon catalog.
Key | Default | Type | Description |
---|---|---|---|
cache-enabled |
true | Boolean | Controls whether the catalog will cache databases, tables, manifests and partitions. |
cache.expiration-interval |
10 min | Duration | Controls the duration for which databases and tables in the catalog are cached. |
cache.manifest.max-memory |
(none) | MemorySize | Controls the maximum memory to cache manifest content. |
cache.manifest.small-file-memory |
128 mb | MemorySize | Controls the cache memory to cache small manifest files. |
cache.manifest.small-file-threshold |
1 mb | MemorySize | Controls the threshold of small manifest file. |
cache.partition.max-num |
0 | Long | Controls the maximum number of partitions cached in the catalog. |
cache.snapshot.max-num-per-table |
20 | Integer | Controls the maximum number of snapshots per table cached in the catalog. |
case-sensitive |
(none) | Boolean | Indicates whether this catalog is case-sensitive. |
client-pool-size |
2 | Integer | Configure the size of the connection pool. |
format-table.enabled |
true | Boolean | Whether to support format tables, format table corresponds to a regular csv, parquet or orc table, allowing read and write operations. However, during these processes, it does not connect to the metastore; hence, newly added partitions will not be reflected in the metastore and need to be manually added as separate partition operations. |
lock-acquire-timeout |
8 min | Duration | The maximum time to wait for acquiring the lock. |
lock-check-max-sleep |
8 s | Duration | The maximum sleep time when retrying to check the lock. |
lock.enabled |
(none) | Boolean | Enable Catalog Lock. |
lock.type |
(none) | String | The Lock Type for Catalog, such as 'hive', 'zookeeper'. |
metastore |
"filesystem" | String | Metastore of paimon catalog, supports filesystem, hive and jdbc. |
sync-all-properties |
false | Boolean | Sync all table properties to the Hive metastore. |
table.type |
managed | Enum |
Type of table. Possible values:
|
uri |
(none) | String | Uri of metastore server. |
warehouse |
(none) | String | The warehouse root path of catalog. |
HiveCatalogOptions #
Options for Hive catalog.
Key | Default | Type | Description |
---|---|---|---|
client-pool-cache.eviction-interval-ms |
300000 | Long | The eviction interval of the client pool cache, in milliseconds. |
client-pool-cache.keys |
(none) | String | Specify client cache key, multiple elements separated by commas.
|
hadoop-conf-dir |
(none) | String | File directory of core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml. Currently, only local file system paths are supported. If not configured, Paimon tries to load from the 'HADOOP_CONF_DIR' or 'HADOOP_HOME' system environment. Configuration priority: 1. 'hadoop-conf-dir' 2. HADOOP_CONF_DIR 3. HADOOP_HOME/conf 4. HADOOP_HOME/etc/hadoop. |
hive-conf-dir |
(none) | String | File directory of hive-site.xml, used to create HiveMetastoreClient and for security authentication such as Kerberos, LDAP, Ranger and so on. If not configured, Paimon tries to load from the 'HIVE_CONF_DIR' env. |
location-in-properties |
false | Boolean | Whether to set the location in the properties of the Hive table/database. If you don't want to access the location through the filesystem of Hive when using an object storage such as s3 or oss, you can set this option to true. |
metastore.client.class |
"org.apache.hadoop.hive.metastore.HiveMetaStoreClient" | String | Class name of Hive metastore client. NOTE: This class must directly implement org.apache.hadoop.hive.metastore.IMetaStoreClient. |
JdbcCatalogOptions #
Options for Jdbc catalog.
Key | Default | Type | Description |
---|---|---|---|
catalog-key |
"jdbc" | String | Custom jdbc catalog store key. |
lock-key-max-length |
255 | Integer | Set the maximum length of the lock key. The 'lock-key' is composed by concatenating three fields: 'catalog-key', 'database', and 'table'. |
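As an illustration of how `catalog-key` and `lock-key-max-length` relate, here is a minimal Python sketch that builds a lock key from the three fields and checks it against the limit. The table above does not say how the fields are joined or what happens when the limit is exceeded, so the `.` separator, the `ValueError`, and the function name `build_lock_key` are all assumptions made for this sketch.

```python
LOCK_KEY_MAX_LENGTH = 255  # documented default of 'lock-key-max-length'


def build_lock_key(catalog_key, database, table,
                   max_length=LOCK_KEY_MAX_LENGTH, separator="."):
    """Concatenate 'catalog-key', 'database' and 'table' into one lock key.

    Assumptions (not confirmed by the documentation): the fields are joined
    with '.', and a key longer than max_length is rejected with ValueError.
    """
    lock_key = separator.join([catalog_key, database, table])
    if len(lock_key) > max_length:
        raise ValueError(
            "lock key has %d characters, limit is %d" % (len(lock_key), max_length)
        )
    return lock_key


print(build_lock_key("jdbc", "my_db", "my_table"))
# prints jdbc.my_db.my_table
```

The practical takeaway is that long database and table names count against the same limit as the catalog key, so the limit may need to be raised for deployments with long identifiers.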
FlinkCatalogOptions #
Flink catalog options for paimon.
Key | Default | Type | Description |
---|---|---|---|
default-database |
"default" | String | |
disable-create-table-in-default-db |
false | Boolean | If true, creating tables in the default database is not allowed. Default is false. |
FlinkConnectorOptions #
Flink connector options for paimon.
Key | Default | Type | Description |
---|---|---|---|
end-input.watermark |
(none) | Long | Optional endInput watermark used in case of batch mode or bounded stream. |
lookup.async |
false | Boolean | Whether to enable async lookup join. |
lookup.async-thread-number |
16 | Integer | The thread number for lookup async. |
lookup.bootstrap-parallelism |
4 | Integer | The parallelism for bootstrap in a single task for lookup join. |
lookup.cache |
AUTO | Enum |
The cache mode of lookup join. Possible values:
|
lookup.dynamic-partition |
(none) | String | Specifies the dynamic partition for lookup. Currently supports 'max_pt()' and 'max_two_pt()'. |
lookup.dynamic-partition.refresh-interval |
1 h | Duration | Refresh interval of the dynamic partition for lookup. On each refresh, all partitions are scanned to obtain the corresponding partition. |
lookup.refresh.async |
false | Boolean | Whether to refresh lookup table in an async thread. |
lookup.refresh.async.pending-snapshot-count |
5 | Integer | If the pending snapshot count exceeds the threshold, the lookup operator will refresh the table synchronously. |
lookup.refresh.time-periods-blacklist |
(none) | String | The blacklist contains several time periods. During these time periods, refreshing the lookup table's cache is forbidden. The blacklist format is start1->end1,start2->end2,... and the time format is yyyy-MM-dd HH:mm. Only used when the lookup table is in FULL cache mode. |
partition.idle-time-to-done |
(none) | Duration | If a partition receives no new data for this duration, it is marked as done to indicate that its data is ready. |
partition.idle-time-to-report-statistic |
1 h | Duration | If a partition receives no new data for this duration, its partition statistics start to be reported to the Hive metastore (HMS). |
partition.time-interval |
(none) | Duration | The time interval of partitions. For example, use '1 d' for daily partitions and '1 h' for hourly partitions. |
precommit-compact |
false | Boolean | If true, a compact coordinator and worker operator are added after the writer operator, in order to compact several changelog files (for primary key tables) or newly created data files (for unaware bucket tables) from the same partition into large ones, which can decrease the number of small files. |
scan.infer-parallelism |
true | Boolean | If false, the source parallelism is set by the global parallelism. Otherwise, the source parallelism is inferred from the number of splits (batch mode) or the number of buckets (streaming mode). |
scan.infer-parallelism.max |
1024 | Integer | If scan.infer-parallelism is true, this option limits the inferred source parallelism. |
scan.parallelism |
(none) | Integer | Define a custom parallelism for the scan source. By default, if this option is not defined, the planner will derive the parallelism for each statement individually by also considering the global configuration. If the user enables scan.infer-parallelism, the planner will use the inferred parallelism. |
scan.remove-normalize |
false | Boolean | Whether to force the removal of the normalize node during streaming reads. Note: This is dangerous and is likely to cause data errors if the downstream computes aggregations and the input is not a complete changelog. |
scan.split-enumerator.batch-size |
10 | Integer | How many splits should be assigned to a subtask per batch in StaticFileStoreSplitEnumerator, to avoid exceeding the `akka.framesize` limit. |
scan.split-enumerator.mode |
fair | Enum |
The mode used by StaticFileStoreSplitEnumerator to assign splits. Possible values:
|
scan.watermark.alignment.group |
(none) | String | A group of sources to align watermarks. |
scan.watermark.alignment.max-drift |
(none) | Duration | Maximal drift to align watermarks, before we pause consuming from the source/task/partition. |
scan.watermark.alignment.update-interval |
1 s | Duration | How often tasks should notify coordinator about the current watermark and how often the coordinator should announce the maximal aligned watermark. |
scan.watermark.emit.strategy |
on-event | Enum |
Emit strategy for watermark generation. Possible values:
|
scan.watermark.idle-timeout |
(none) | Duration | If no records flow in a partition of a stream for that amount of time, then that partition is considered "idle" and will not hold back the progress of watermarks in downstream operators. |
sink.clustering.by-columns |
(none) | String | Specifies the column name(s) used for comparison during range partitioning, in the format 'columnName1,columnName2'. If not set or set to an empty string, the range partitioning feature is not enabled. This option is effective only for bucket unaware tables without primary keys in batch execution mode. |
sink.clustering.sample-factor |
100 | Integer | Specifies the sample factor. Let S represent the total number of samples, F represent the sample factor, and P represent the sink parallelism, then S=F×P. The minimum allowed sample factor is 20. |
sink.clustering.sort-in-cluster |
true | Boolean | Indicates whether to further sort the data belonging to each sink task after range partitioning. |
sink.clustering.strategy |
"auto" | String | Specifies the comparison algorithm used for range partitioning, including 'zorder', 'hilbert', and 'order', corresponding to the z-order curve algorithm, hilbert curve algorithm, and basic type comparison algorithm, respectively. When not configured, the algorithm is determined automatically based on the number of columns in 'sink.clustering.by-columns'. 'order' is used for 1 column, 'zorder' for fewer than 5 columns, and 'hilbert' for 5 or more columns. |
sink.committer-cpu |
1.0 | Double | Sink committer cpu to control cpu cores of global committer. |
sink.committer-memory |
(none) | MemorySize | Sink committer memory to control heap memory of global committer. |
sink.committer-operator-chaining |
true | Boolean | Allow the sink committer and writer operators to be chained together. |
sink.cross-partition.managed-memory |
256 mb | MemorySize | Weight of managed memory for RocksDB in cross-partition update, Flink will compute the memory size according to the weight, the actual memory used depends on the running environment. |
sink.managed.writer-buffer-memory |
256 mb | MemorySize | Weight of writer buffer in managed memory, Flink will compute the memory size for writer according to the weight, the actual memory used depends on the running environment. |
sink.operator-uid.suffix |
(none) | String | Set the uid suffix for the writer, dynamic bucket assigner and committer operators. The uid format is ${UID_PREFIX}_${TABLE_NAME}_${USER_UID_SUFFIX}. If the uid suffix is not set, flink will automatically generate the operator uid, which may be incompatible when the topology changes. |
sink.parallelism |
(none) | Integer | Defines a custom parallelism for the sink. By default, if this option is not defined, the planner will derive the parallelism for each statement individually by also considering the global configuration. |
sink.savepoint.auto-tag |
false | Boolean | If true, a tag will be automatically created for the snapshot created by flink savepoint. |
sink.use-managed-memory-allocator |
false | Boolean | If true, flink sink will use managed memory for merge tree; otherwise, it will create an independent memory allocator. |
source.checkpoint-align.enabled |
false | Boolean | Whether to align the Flink checkpoint with the snapshot of the Paimon table. If true, a checkpoint will only be made if a snapshot is consumed. |
source.checkpoint-align.timeout |
30 s | Duration | If the new snapshot has not been generated when the checkpoint starts to trigger, the enumerator will block the checkpoint and wait for the new snapshot. This option sets the maximum waiting time to avoid waiting indefinitely. If the wait times out, the checkpoint will fail. Note that it should be set smaller than the checkpoint timeout. |
source.operator-uid.suffix |
(none) | String | Set the uid suffix for the source operators. After setting, the uid format is ${UID_PREFIX}_${TABLE_NAME}_${USER_UID_SUFFIX}. If the uid suffix is not set, flink will automatically generate the operator uid, which may be incompatible when the topology changes. |
streaming-read.shuffle-bucket-with-partition |
true | Boolean | Whether to shuffle by partition and bucket during streaming reads. |
unaware-bucket.compaction.parallelism |
(none) | Integer | Defines a custom parallelism for the unaware-bucket table compaction job. By default, if this option is not defined, the planner will derive the parallelism for each statement individually by also considering the global configuration. |
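The rules stated for `sink.clustering.strategy` and `sink.clustering.sample-factor` in the table above can be checked with a small sketch. It is not Paimon's implementation; the function names are hypothetical and the code only encodes the column-count rule ('order' for 1 column, 'zorder' for fewer than 5, 'hilbert' for 5 or more) and the formula S = F × P with a minimum sample factor of 20.

```python
def pick_clustering_strategy(by_columns: str) -> str:
    """Return the strategy the documented 'auto' rule would select."""
    columns = [c.strip() for c in by_columns.split(",") if c.strip()]
    if not columns:
        # An unset or empty column list means range partitioning is disabled.
        raise ValueError("sink.clustering.by-columns is empty")
    if len(columns) == 1:
        return "order"
    if len(columns) < 5:
        return "zorder"
    return "hilbert"


def total_samples(sample_factor: int, sink_parallelism: int) -> int:
    """Compute S = F x P, rejecting sample factors below the documented minimum."""
    if sample_factor < 20:
        raise ValueError("the minimum allowed sample factor is 20")
    return sample_factor * sink_parallelism


print(pick_clustering_strategy("user_id,event_time"))
print(total_samples(100, 8))
```

With the default sample factor of 100 and a sink parallelism of 8, the total number of samples is 800.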
SparkCatalogOptions #
Spark catalog options for paimon.
Key | Default | Type | Description |
---|---|---|---|
catalog.create-underlying-session-catalog |
false | Boolean | If true, create and use an underlying session catalog instead of the default session catalog when using SparkGenericCatalog. |
defaultDatabase |
"default" | String | The default database name. |
SparkConnectorOptions #
Spark connector options for paimon.
Key | Default | Type | Description |
---|---|---|---|
read.changelog |
false | Boolean | Whether to read rows in the form of a changelog (a rowkind column is added to each row to represent its change type). |
read.stream.maxBytesPerTrigger |
(none) | Long | The maximum number of bytes returned in a single batch. |
read.stream.maxFilesPerTrigger |
(none) | Integer | The maximum number of files returned in a single batch. |
read.stream.maxRowsPerTrigger |
(none) | Long | The maximum number of rows returned in a single batch. |
read.stream.maxTriggerDelayMs |
(none) | Long | The maximum delay between two adjacent batches, which is used together with read.stream.minRowsPerTrigger to create MinRowsReadLimit. |
read.stream.minRowsPerTrigger |
(none) | Long | The minimum number of rows returned in a single batch, which is used together with read.stream.maxTriggerDelayMs to create MinRowsReadLimit. |
write.merge-schema |
false | Boolean | If true, merge the data schema and the table schema automatically before writing data. |
write.merge-schema.explicit-cast |
false | Boolean | If true, allow merging data types if the two types meet the rules for explicit casting. |
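The table states that `read.stream.minRowsPerTrigger` and `read.stream.maxTriggerDelayMs` are used together. The sketch below illustrates one plausible reading of that combination: a batch is triggered once enough rows are available, or once the maximum delay has elapsed. This is an assumption based on the option descriptions, not a statement of how Paimon or Spark implements the read limit, and the function name is hypothetical.

```python
def should_trigger_batch(
    available_rows: int,
    elapsed_ms: int,
    min_rows_per_trigger: int,
    max_trigger_delay_ms: int,
) -> bool:
    """Assumed rule: fire when enough rows are available or the delay has expired."""
    enough_rows = available_rows >= min_rows_per_trigger
    delay_expired = elapsed_ms >= max_trigger_delay_ms
    return enough_rows or delay_expired


# Too few rows and the delay has not expired yet: keep waiting.
print(should_trigger_batch(500, 2_000, 1_000, 10_000))
# Still too few rows, but the delay has expired: trigger anyway.
print(should_trigger_batch(500, 10_000, 1_000, 10_000))
```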
ORC Options #
Key | Default | Type | Description |
---|---|---|---|
orc.column.encoding.direct |
(none) | Integer | Comma-separated list of fields for which dictionary encoding is to be skipped in orc. |
orc.dictionary.key.threshold |
0.8 | Double | If the number of distinct keys in a dictionary is greater than this fraction of the total number of non-null rows, turn off dictionary encoding in orc. Use 0 to always disable dictionary encoding. Use 1 to always use dictionary encoding. |
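The threshold rule for `orc.dictionary.key.threshold` is easiest to see with numbers. The following sketch, with a hypothetical function name, applies the comparison exactly as described above: dictionary encoding is turned off when the number of distinct keys is greater than the threshold fraction of the non-null rows. It is an illustration of the description, not ORC's actual writer code.

```python
def keeps_dictionary_encoding(
    distinct_keys: int, non_null_rows: int, threshold: float = 0.8
) -> bool:
    """Return False when distinct keys exceed threshold * non-null rows."""
    return not (distinct_keys > threshold * non_null_rows)


# 900 distinct values in 1000 rows exceeds 0.8 * 1000, so the dictionary is dropped.
print(keeps_dictionary_encoding(900, 1000))
# 500 distinct values in 1000 rows stays below the limit, so the dictionary is kept.
print(keeps_dictionary_encoding(500, 1000))
```

The two boundary settings follow from the same comparison: with a threshold of 0, any column with at least one distinct key exceeds the limit, and with a threshold of 1 the distinct count can never exceed the number of non-null rows.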
RocksDB Options #
The following options allow users to finely adjust RocksDB for better performance. You can either specify them in table properties or in dynamic table hints.
Key | Default | Type | Description |
---|---|---|---|
lookup.cache-rows |
10000 | Long | The maximum number of rows to store in the cache. |
lookup.continuous.discovery-interval |
(none) | Duration | The discovery interval of lookup continuous reading. This is used as an SQL hint. If it's not configured, the lookup function will fall back to 'continuous.discovery-interval'. |
rocksdb.block.blocksize |
4 kb | MemorySize | The approximate size (in bytes) of user data packed per block. The default blocksize is '4KB'. |
rocksdb.block.cache-size |
128 mb | MemorySize | The amount of the cache for data blocks in RocksDB. |
rocksdb.block.metadata-blocksize |
4 kb | MemorySize | Approximate size of partitioned metadata packed per block. Currently applied to indexes block when partitioned index/filters option is enabled. The default blocksize is '4KB'. |
rocksdb.bloom-filter.bits-per-key |
10.0 | Double | Bits per key that the bloom filter will use. This only takes effect when the bloom filter is used. The default value is 10.0. |
rocksdb.bloom-filter.block-based-mode |
false | Boolean | If true, RocksDB will use a block-based filter instead of a full filter. This only takes effect when the bloom filter is used. The default value is 'false'. |
rocksdb.compaction.level.max-size-level-base |
256 mb | MemorySize | The upper-bound of the total size of level base files in bytes. The default value is '256MB'. |
rocksdb.compaction.level.target-file-size-base |
64 mb | MemorySize | The target file size for compaction, which determines a level-1 file size. The default value is '64MB'. |
rocksdb.compaction.level.use-dynamic-size |
false | Boolean | If true, RocksDB will pick target size of each level dynamically. From an empty DB, RocksDB would make last level the base level, which means merging L0 data into the last level, until it exceeds max_bytes_for_level_base. And then repeat this process for second last level and so on. The default value is 'false'. For more information, please refer to RocksDB's doc. |
rocksdb.compaction.style |
LEVEL | Enum |
The specified compaction style for DB. Candidate compaction style is LEVEL, FIFO, UNIVERSAL or NONE, and Flink chooses 'LEVEL' as default style. Possible values:
|
rocksdb.compression.type |
LZ4_COMPRESSION | Enum |
The compression type. Possible values:
|
rocksdb.files.open |
-1 | Integer | The maximum number of open files (per stateful operator) that can be used by the DB, '-1' means no limit. The default value is '-1'. |
rocksdb.thread.num |
2 | Integer | The maximum number of concurrent background flush and compaction jobs (per stateful operator). The default value is '2'. |
rocksdb.use-bloom-filter |
false | Boolean | If true, every newly created SST file will contain a Bloom filter. It is disabled by default. |
rocksdb.writebuffer.count |
2 | Integer | The maximum number of write buffers that are built up in memory. The default value is '2'. |
rocksdb.writebuffer.number-to-merge |
1 | Integer | The minimum number of write buffers that will be merged together before writing to storage. The default value is '1'. |
rocksdb.writebuffer.size |
64 mb | MemorySize | The amount of data built up in memory (backed by an unsorted log on disk) before converting to a sorted on-disk file. The default writebuffer size is '64MB'. |
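As noted at the top of this section, these options can be given in table properties or in dynamic table hints. In Flink SQL, a dynamic table hint has the form `/*+ OPTIONS('key' = 'value') */` after the table name. The sketch below renders such a hint from a dictionary of options; the helper name and the table name `dim_table` are hypothetical, and the chosen option values are examples only.

```python
def render_options_hint(options: dict) -> str:
    """Render a Flink SQL dynamic table hint from option key/value pairs."""

    def quote(value: object) -> str:
        # SQL string literals escape a single quote by doubling it.
        return "'" + str(value).replace("'", "''") + "'"

    pairs = ", ".join(f"{quote(k)} = {quote(v)}" for k, v in options.items())
    return f"/*+ OPTIONS({pairs}) */"


hint = render_options_hint(
    {
        "rocksdb.use-bloom-filter": "true",
        "rocksdb.block.cache-size": "256 mb",
    }
)
# dim_table is a placeholder table name used only for this illustration.
query = f"SELECT * FROM dim_table {hint}"
print(query)
```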