T
- The type of objects written by the constructed ParquetWriter.
SELF
- The type of this builder that is returned by builder methods.

public abstract static class ParquetWriter.Builder<T,SELF extends ParquetWriter.Builder<T,SELF>> extends Object
Object models should extend this builder to provide writer configuration options.
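
The `SELF` type parameter and the abstract `self()` method follow the self-typed builder idiom: base-class setters return the subclass type, so calls keep chaining into subclass-specific options. The following is a minimal sketch of that idiom in plain Java, not Parquet code. `WriterBuilder`, `TextWriterBuilder`, `withLineSeparator`, `describe`, and the default values are hypothetical names invented for illustration.

```java
class SelfTypedBuilderDemo {
    // Hypothetical stand-in for a base builder; NOT a Parquet class.
    abstract static class WriterBuilder<T, SELF extends WriterBuilder<T, SELF>> {
        protected int pageSize = 1024;        // illustrative default only
        protected boolean validation = false;

        // Each concrete subclass returns itself, so chained calls keep the subclass type.
        protected abstract SELF self();

        public SELF withPageSize(int pageSize) {
            this.pageSize = pageSize;
            return self();
        }

        public SELF enableValidation() {
            this.validation = true;
            return self();
        }
    }

    // Hypothetical object-model builder that adds one option of its own.
    static final class TextWriterBuilder extends WriterBuilder<String, TextWriterBuilder> {
        private String lineSeparator = "\n";

        @Override
        protected TextWriterBuilder self() {
            return this;
        }

        public TextWriterBuilder withLineSeparator(String sep) {
            this.lineSeparator = sep;
            return self();
        }

        public String describe() {
            return "pageSize=" + pageSize + ", validation=" + validation
                + ", crlf=" + lineSeparator.equals("\r\n");
        }
    }

    public static void main(String[] args) {
        String summary = new TextWriterBuilder()
            .withPageSize(4096)         // returns TextWriterBuilder, not the base type
            .enableValidation()
            .withLineSeparator("\r\n")  // reachable only because setters return SELF
            .describe();
        System.out.println(summary);
    }
}
```

Without the `SELF` parameter, `withPageSize` would return the base type and the call to `withLineSeparator` would not compile.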
Modifier | Constructor and Description |
---|---|
protected | Builder(org.apache.parquet.io.OutputFile path) |
Modifier and Type | Method and Description |
---|---|
ParquetWriter<T> | build() Build a ParquetWriter with the accumulated configuration. |
SELF | config(String property, String value) Set a property that will be available to the read path. |
SELF | enableDictionaryEncoding() Enables dictionary encoding for the constructed writer. |
SELF | enablePageWriteChecksum() Enables writing page level checksums for the constructed writer. |
SELF | enableValidation() Enables validation for the constructed writer. |
protected abstract org.apache.parquet.hadoop.api.WriteSupport<T> | getWriteSupport(org.apache.hadoop.conf.Configuration conf) |
protected abstract SELF | self() |
SELF | withBloomFilterEnabled(boolean enabled) Sets the bloom filter enabled/disabled. |
SELF | withBloomFilterEnabled(String columnPath, boolean enabled) Sets the bloom filter enabled/disabled for the specified column. |
SELF | withBloomFilterFPP(String columnPath, double fpp) |
SELF | withBloomFilterNDV(String columnPath, long ndv) Sets the NDV (number of distinct values) for the specified column. |
SELF | withByteStreamSplitEncoding(boolean enableByteStreamSplit) |
SELF | withColumnIndexTruncateLength(int length) Sets the length to be used for truncating binary values in a binary column index. |
SELF | withCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName codecName) Set the compression codec used by the constructed writer. |
SELF | withConf(org.apache.hadoop.conf.Configuration conf) Set the Configuration used by the constructed writer. |
SELF | withDictionaryEncoding(boolean enableDictionary) Enable or disable dictionary encoding for the constructed writer. |
SELF | withDictionaryEncoding(String columnPath, boolean enableDictionary) Enable or disable dictionary encoding of the specified column for the constructed writer. |
SELF | withDictionaryPageSize(int dictionaryPageSize) Set the Parquet format dictionary page size used by the constructed writer. |
SELF | withMaxPaddingSize(int maxPaddingSize) Set the maximum amount of padding, in bytes, that will be used to align row groups with blocks in the underlying filesystem. |
SELF | withMaxRowCountForPageSizeCheck(int max) Sets the maximum number of rows to write before a page size check is done. |
SELF | withMinRowCountForPageSizeCheck(int min) Sets the minimum number of rows to write before a page size check is done. |
SELF | withPageRowCountLimit(int rowCount) Sets the Parquet format page row count limit used by the constructed writer. |
SELF | withPageSize(int pageSize) Set the Parquet format page size used by the constructed writer. |
SELF | withPageWriteChecksumEnabled(boolean enablePageWriteChecksum) Enables writing page level checksums for the constructed writer. |
SELF | withRowGroupSize(int rowGroupSize) Deprecated. Use withRowGroupSize(long) instead. |
SELF | withRowGroupSize(long rowGroupSize) Set the Parquet format row group size used by the constructed writer. |
SELF | withStatisticsTruncateLength(int length) Sets the length which the min/max binary values in row groups are truncated to. |
SELF | withValidation(boolean enableValidation) Enable or disable validation for the constructed writer. |
SELF | withWriteMode(org.apache.parquet.hadoop.ParquetFileWriter.Mode mode) Set the write mode used when creating the backing file for this writer. |
SELF | withWriterVersion(org.apache.parquet.column.ParquetProperties.WriterVersion version) Set the format version used by the constructed writer. |
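
The bloom filter options above take an NDV (number of distinct values) and an FPP (false positive probability) per column, and together these determine how large a filter has to be. As a rough illustration of that trade-off, the sketch below applies the textbook sizing formula for a standard Bloom filter, m = -n ln(p) / (ln 2)^2. This is an assumption made for illustration: Parquet's own bloom filter implementation may size its filters differently, and `BloomSizingSketch` and `estimateBits` are hypothetical names, not part of the Parquet API.

```java
class BloomSizingSketch {
    /**
     * Textbook bit count for a standard Bloom filter: m = -n * ln(p) / (ln 2)^2.
     * Illustrative only; not necessarily the formula Parquet applies.
     */
    static long estimateBits(long ndv, double fpp) {
        if (ndv <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("need ndv > 0 and 0 < fpp < 1");
        }
        double ln2 = Math.log(2.0);
        return (long) Math.ceil(-ndv * Math.log(fpp) / (ln2 * ln2));
    }

    public static void main(String[] args) {
        long ndv = 1_000_000L;
        // A lower false positive probability costs more bits per distinct value.
        for (double fpp : new double[] {0.1, 0.01, 0.001}) {
            long bits = estimateBits(ndv, fpp);
            System.out.printf("ndv=%d fpp=%.3f -> ~%d KiB%n", ndv, fpp, bits / 8 / 1024);
        }
    }
}
```

Under this estimate, the filter size grows linearly with the NDV and logarithmically as the FPP shrinks, so an NDV set far above the real cardinality of a column mostly wastes space.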
protected abstract SELF self()

protected abstract org.apache.parquet.hadoop.api.WriteSupport<T> getWriteSupport(org.apache.hadoop.conf.Configuration conf)
conf
- a configuration

public SELF withConf(org.apache.hadoop.conf.Configuration conf)
Set the Configuration used by the constructed writer.
conf
- a Configuration

public SELF withWriteMode(org.apache.parquet.hadoop.ParquetFileWriter.Mode mode)
Set the write mode used when creating the backing file for this writer.
mode
- a ParquetFileWriter.Mode

public SELF withCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName codecName)
Set the compression codec used by the constructed writer.
codecName
- a CompressionCodecName

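@Deprecated public SELF withRowGroupSize(int rowGroupSize)
Deprecated. Use withRowGroupSize(long) instead.
rowGroupSize
- an integer size in bytes

public SELF withRowGroupSize(long rowGroupSize)
Set the Parquet format row group size used by the constructed writer.
rowGroupSize
- an integer size in bytes

public SELF withPageSize(int pageSize)
Set the Parquet format page size used by the constructed writer.
pageSize
- an integer size in bytes

public SELF withPageRowCountLimit(int rowCount)
Sets the Parquet format page row count limit used by the constructed writer.
rowCount
- limit for the number of rows stored in a page

public SELF withDictionaryPageSize(int dictionaryPageSize)
Set the Parquet format dictionary page size used by the constructed writer.
dictionaryPageSize
- an integer size in bytes

public SELF withMaxPaddingSize(int maxPaddingSize)
Set the maximum amount of padding, in bytes, that will be used to align row groups with blocks in the underlying filesystem.
maxPaddingSize
- an integer size in bytes

public SELF enableDictionaryEncoding()
Enables dictionary encoding for the constructed writer.
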
public SELF withDictionaryEncoding(boolean enableDictionary)
Enable or disable dictionary encoding for the constructed writer.
enableDictionary
- whether dictionary encoding should be enabled

public SELF withByteStreamSplitEncoding(boolean enableByteStreamSplit)

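public SELF withDictionaryEncoding(String columnPath, boolean enableDictionary)
Enable or disable dictionary encoding of the specified column for the constructed writer.
columnPath
- the path of the column (dot-string)
enableDictionary
- whether dictionary encoding should be enabled

public SELF enableValidation()
Enables validation for the constructed writer.
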
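
Several options come in two overloads, one writer-wide and one keyed by a dot-string column path, for example `withDictionaryEncoding(boolean)` and `withDictionaryEncoding(String, boolean)`. The sketch below shows one plausible way such settings can combine, with a per-column value taking precedence over the writer-wide default. This precedence is an assumption for illustration, as are the hypothetical `ColumnSetting` class and the column names. It is not the library's implementation.

```java
import java.util.HashMap;
import java.util.Map;

class ColumnSettingSketch {
    /** Hypothetical holder: one default value plus overrides keyed by dot-string column path. */
    static final class ColumnSetting<V> {
        private final Map<String, V> overrides = new HashMap<>();
        private V defaultValue;

        ColumnSetting(V defaultValue) {
            this.defaultValue = defaultValue;
        }

        void setDefault(V value) {
            this.defaultValue = value;
        }

        void setForColumn(String columnPath, V value) {
            overrides.put(columnPath, value);
        }

        // A column-specific value wins; otherwise fall back to the default.
        V get(String columnPath) {
            return overrides.getOrDefault(columnPath, defaultValue);
        }
    }

    public static void main(String[] args) {
        ColumnSetting<Boolean> dictionary = new ColumnSetting<>(true);
        dictionary.setDefault(false);                        // analogous to withDictionaryEncoding(false)
        dictionary.setForColumn("user.address.city", true);  // analogous to withDictionaryEncoding("user.address.city", true)
        System.out.println("user.address.city: " + dictionary.get("user.address.city"));
        System.out.println("user.id: " + dictionary.get("user.id"));
    }
}
```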
public SELF withValidation(boolean enableValidation)
Enable or disable validation for the constructed writer.
enableValidation
- whether validation should be enabled

public SELF withWriterVersion(org.apache.parquet.column.ParquetProperties.WriterVersion version)
Set the format version used by the constructed writer.
version
- a WriterVersion

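public SELF enablePageWriteChecksum()
Enables writing page level checksums for the constructed writer.

public SELF withPageWriteChecksumEnabled(boolean enablePageWriteChecksum)
Enables writing page level checksums for the constructed writer.
enablePageWriteChecksum
- whether page checksums should be written out

public SELF withBloomFilterNDV(String columnPath, long ndv)
Sets the NDV (number of distinct values) for the specified column.
columnPath
- the path of the column (dot-string)
ndv
- the NDV of the column

public SELF withBloomFilterEnabled(boolean enabled)
Sets the bloom filter enabled/disabled.
enabled
- whether to write bloom filters

public SELF withBloomFilterEnabled(String columnPath, boolean enabled)
Sets the bloom filter enabled/disabled for the specified column. See withBloomFilterEnabled(boolean).
columnPath
- the path of the column (dot-string)
enabled
- whether to write a bloom filter for the column

public SELF withMinRowCountForPageSizeCheck(int min)
Sets the minimum number of rows to write before a page size check is done.
min
- writes at least `min` rows before invoking a page size check

public SELF withMaxRowCountForPageSizeCheck(int max)
Sets the maximum number of rows to write before a page size check is done.
max
- makes a page size check after `max` rows have been written

public SELF withColumnIndexTruncateLength(int length)
Sets the length to be used for truncating binary values in a binary column index.
length
- the length to truncate to

public SELF withStatisticsTruncateLength(int length)
Sets the length which the min/max binary values in row groups are truncated to.
length
- the length to truncate to

public SELF config(String property, String value)
Set a property that will be available to the read path.
property
- a String property name
value
- a String property value

public ParquetWriter<T> build() throws IOException
Build a ParquetWriter with the accumulated configuration.
Returns a ParquetWriter instance.
IOException
- if there is an error while creating the writer

Copyright © 2023–2024 The Apache Software Foundation. All rights reserved.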