public class ParquetFileReader extends Object implements Closeable
NOTE: The file was copied and modified to support VectoredReadable.
Modifier and Type | Field and Description |
---|---|
protected ParquetInputStream | f |
Constructor and Description |
---|
ParquetFileReader(org.apache.parquet.io.InputFile file, org.apache.parquet.ParquetReadOptions options, RoaringBitmap32 selection) |
Modifier and Type | Method and Description |
---|---|
void | appendTo(org.apache.parquet.hadoop.ParquetFileWriter writer) |
void | close() |
org.apache.parquet.hadoop.BloomFilterReader | getBloomFilterDataReader(org.apache.parquet.hadoop.metadata.BlockMetaData block) |
org.apache.parquet.hadoop.BloomFilterReader | getBloomFilterDataReader(int blockIndex) |
org.apache.parquet.internal.filter2.columnindex.ColumnIndexStore | getColumnIndexStore(int blockIndex) |
org.apache.parquet.hadoop.DictionaryPageReader | getDictionaryReader(org.apache.parquet.hadoop.metadata.BlockMetaData block) |
org.apache.parquet.hadoop.DictionaryPageReader | getDictionaryReader(int blockIndex) |
String | getFile() |
org.apache.parquet.hadoop.metadata.FileMetaData | getFileMetaData() |
long | getFilteredRecordCount() |
org.apache.parquet.hadoop.metadata.ParquetMetadata | getFooter() |
org.apache.parquet.column.page.DictionaryPageReadStore | getNextDictionaryReader(): Returns a DictionaryPageReadStore for the row group that would be returned by calling readNextRowGroup() or skipped by calling skipNextRowGroup(). |
org.apache.hadoop.fs.Path | getPath(): Deprecated. Will be removed in 2.0.0; use getFile() instead. |
long | getRecordCount() |
List<org.apache.parquet.hadoop.metadata.BlockMetaData> | getRowGroups() |
org.apache.parquet.column.values.bloomfilter.BloomFilter | readBloomFilter(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData meta): Reads Bloom filter data for the given column chunk. |
org.apache.parquet.internal.column.columnindex.ColumnIndex | readColumnIndex(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData column) |
org.apache.parquet.column.page.PageReadStore | readFilteredRowGroup(int blockIndex): Reads all the columns requested from the specified row group. |
org.apache.parquet.hadoop.ColumnChunkPageReadStore | readFilteredRowGroup(int blockIndex, RowRanges rowRanges): Reads all the columns requested from the specified row group. |
org.apache.parquet.column.page.PageReadStore | readNextFilteredRowGroup(): Reads all the columns requested from the row group at the current file position. |
org.apache.parquet.column.page.PageReadStore | readNextRowGroup(): Reads all the columns requested from the row group at the current file position. |
org.apache.parquet.internal.column.columnindex.OffsetIndex | readOffsetIndex(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData column) |
org.apache.parquet.column.page.PageReadStore | readRowGroup(int blockIndex): Reads all the columns requested from the row group at the specified block. |
boolean | rowGroupsFiltered() |
void | setRequestedSchema(org.apache.parquet.schema.MessageType projection) |
boolean | skipNextRowGroup() |
protected final ParquetInputStream f
public ParquetFileReader(org.apache.parquet.io.InputFile file, org.apache.parquet.ParquetReadOptions options, @Nullable RoaringBitmap32 selection) throws IOException
Throws:
IOException
public org.apache.parquet.hadoop.metadata.ParquetMetadata getFooter()
public org.apache.parquet.hadoop.metadata.FileMetaData getFileMetaData()
public long getRecordCount()
public long getFilteredRecordCount()
@Deprecated public org.apache.hadoop.fs.Path getPath()
Deprecated. Will be removed in 2.0.0; use getFile() instead.
public String getFile()
public boolean rowGroupsFiltered()
public List<org.apache.parquet.hadoop.metadata.BlockMetaData> getRowGroups()
public void setRequestedSchema(org.apache.parquet.schema.MessageType projection)
public void appendTo(org.apache.parquet.hadoop.ParquetFileWriter writer) throws IOException
Throws:
IOException
public org.apache.parquet.column.page.PageReadStore readRowGroup(int blockIndex) throws IOException
Reads all the columns requested from the row group at the specified block.
Parameters:
blockIndex - the index of the requested block
Throws:
IOException - if an error occurs while reading
public org.apache.parquet.column.page.PageReadStore readNextRowGroup() throws IOException
Reads all the columns requested from the row group at the current file position.
Throws:
IOException - if an error occurs while reading
public org.apache.parquet.column.page.PageReadStore readFilteredRowGroup(int blockIndex) throws IOException
Reads all the columns requested from the specified row group.
Parameters:
blockIndex - the index of the requested block
Throws:
IOException - if an error occurs while reading
public org.apache.parquet.hadoop.ColumnChunkPageReadStore readFilteredRowGroup(int blockIndex, RowRanges rowRanges) throws IOException
Reads all the columns requested from the specified row group. It may skip specific pages based on the rowRanges passed in. Because rows are not aligned among the pages of the different columns, row synchronization might be required. See the documentation of the class SynchronizingColumnReader for details.
Parameters:
blockIndex - the index of the requested block
rowRanges - the row ranges to be read from the requested block
Throws:
IOException - if an error occurs while reading
IllegalArgumentException - if the blockIndex is invalid or the rowRanges is null
public org.apache.parquet.column.page.PageReadStore readNextFilteredRowGroup() throws IOException
Reads all the columns requested from the row group at the current file position.
Throws:
IOException - if an error occurs while reading
public org.apache.parquet.internal.filter2.columnindex.ColumnIndexStore getColumnIndexStore(int blockIndex)
public boolean skipNextRowGroup()
public org.apache.parquet.column.page.DictionaryPageReadStore getNextDictionaryReader()
Returns a DictionaryPageReadStore for the row group that would be returned by calling readNextRowGroup() or skipped by calling skipNextRowGroup().
public org.apache.parquet.hadoop.DictionaryPageReader getDictionaryReader(int blockIndex)
public org.apache.parquet.hadoop.DictionaryPageReader getDictionaryReader(org.apache.parquet.hadoop.metadata.BlockMetaData block)
public org.apache.parquet.hadoop.BloomFilterReader getBloomFilterDataReader(int blockIndex)
public org.apache.parquet.hadoop.BloomFilterReader getBloomFilterDataReader(org.apache.parquet.hadoop.metadata.BlockMetaData block)
public org.apache.parquet.column.values.bloomfilter.BloomFilter readBloomFilter(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData meta) throws IOException
Reads Bloom filter data for the given column chunk.
Parameters:
meta - a column's ColumnChunkMetaData to read the Bloom filter from
Throws:
IOException - if there is an error while reading the Bloom filter
@InterfaceAudience.Private public org.apache.parquet.internal.column.columnindex.ColumnIndex readColumnIndex(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData column) throws IOException
Parameters:
column - the column chunk for which the column index is to be returned
Returns:
the column index, or null if there is no index
Throws:
IOException - if any I/O error occurs while reading the file
@InterfaceAudience.Private public org.apache.parquet.internal.column.columnindex.OffsetIndex readOffsetIndex(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData column) throws IOException
Parameters:
column - the column chunk for which the offset index is to be returned
Returns:
the offset index, or null if there is no index
Throws:
IOException - if any I/O error occurs while reading the file
public void close() throws IOException
Specified by:
close in interface Closeable
Specified by:
close in interface AutoCloseable
Throws:
IOException
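The description of getNextDictionaryReader() implies a peek-then-decide loop: inspect the dictionary of the upcoming row group, then either read the group with readNextRowGroup() or pass over it with skipNextRowGroup(). The following is a minimal, self-contained sketch of that control flow only. It does not use the Parquet classes: RowGroupCursor and InMemoryCursor are hypothetical stand-ins that borrow the three method names, and the end-of-file behavior (null from the peek and read calls, false from the skip call) is an assumption of this sketch, not something this page states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RowGroupSkipSketch {

    /** Hypothetical stand-in that mirrors three ParquetFileReader method names. */
    interface RowGroupCursor {
        Set<String> getNextDictionaryReader(); // assumption: null once no row group is left
        List<String> readNextRowGroup();       // assumption: null once no row group is left
        boolean skipNextRowGroup();            // assumption: false if nothing was skipped
    }

    /** In-memory cursor; each "row group" is just a list of string values. */
    static final class InMemoryCursor implements RowGroupCursor {
        private final List<List<String>> groups;
        private int current = 0;

        InMemoryCursor(List<List<String>> groups) {
            this.groups = groups;
        }

        @Override
        public Set<String> getNextDictionaryReader() {
            // Peeks at the upcoming group without advancing the position.
            if (current >= groups.size()) {
                return null;
            }
            return new HashSet<>(groups.get(current));
        }

        @Override
        public List<String> readNextRowGroup() {
            // Returns the upcoming group and advances past it.
            if (current >= groups.size()) {
                return null;
            }
            return groups.get(current++);
        }

        @Override
        public boolean skipNextRowGroup() {
            // Advances past the upcoming group without returning its rows.
            if (current >= groups.size()) {
                return false;
            }
            current++;
            return true;
        }
    }

    /** Reads only the row groups whose dictionary contains the wanted value. */
    static List<String> readMatchingGroups(RowGroupCursor cursor, String wanted) {
        List<String> rows = new ArrayList<>();
        Set<String> dictionary;
        while ((dictionary = cursor.getNextDictionaryReader()) != null) {
            if (dictionary.contains(wanted)) {
                rows.addAll(cursor.readNextRowGroup());
            } else {
                cursor.skipNextRowGroup();
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> groups = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c", "d"),
                Arrays.asList("b", "e"));
        // The middle group has no "b" in its dictionary, so it is skipped.
        System.out.println(readMatchingGroups(new InMemoryCursor(groups), "b")); // prints [a, b, b, e]
    }
}
```

Because the peek call leaves the position unchanged, every iteration must end in exactly one read or skip; otherwise the loop would never advance. With the real reader, the same loop would presumably sit inside a try-with-resources block, since the class implements Closeable.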
Copyright © 2023–2025 The Apache Software Foundation. All rights reserved.