ParquetFileReader (Paimon : 1.1-SNAPSHOT API)

java.lang.Object
- org.apache.parquet.hadoop.ParquetFileReader

All Implemented Interfaces:

Closeable, AutoCloseable
```
public class ParquetFileReader
extends Object
implements Closeable
```
Internal implementation of the Parquet file reader as a block container.
NOTE: The file was copied and modified to support VectoredReadable.

Field Summary

Fields
Modifier and Type	Field and Description
`protected ParquetInputStream`	`f`

Constructor Summary

Constructors
Constructor and Description
`ParquetFileReader(org.apache.parquet.io.InputFile file, org.apache.parquet.ParquetReadOptions options, FileIndexResult fileIndexResult)`

Method Summary

Modifier and Type	Method and Description
`void`	`appendTo(org.apache.parquet.hadoop.ParquetFileWriter writer)`
`void`	`close()`
`org.apache.parquet.hadoop.BloomFilterReader`	`getBloomFilterDataReader(org.apache.parquet.hadoop.metadata.BlockMetaData block)`
`org.apache.parquet.hadoop.BloomFilterReader`	`getBloomFilterDataReader(int blockIndex)`
`org.apache.parquet.internal.filter2.columnindex.ColumnIndexStore`	`getColumnIndexStore(int blockIndex)`
`org.apache.parquet.hadoop.DictionaryPageReader`	`getDictionaryReader(org.apache.parquet.hadoop.metadata.BlockMetaData block)`
`org.apache.parquet.hadoop.DictionaryPageReader`	`getDictionaryReader(int blockIndex)`
`String`	`getFile()`
`org.apache.parquet.hadoop.metadata.FileMetaData`	`getFileMetaData()`
`long`	`getFilteredRecordCount()`
`org.apache.parquet.hadoop.metadata.ParquetMetadata`	`getFooter()`
`org.apache.parquet.column.page.DictionaryPageReadStore`	`getNextDictionaryReader()` Returns a `DictionaryPageReadStore` for the row group that would be returned by calling `readNextRowGroup()` or skipped by calling `skipNextRowGroup()`.
`org.apache.hadoop.fs.Path`	`getPath()` Deprecated. Will be removed in 2.0.0; use `getFile()` instead.
`long`	`getRecordCount()`
`List<org.apache.parquet.hadoop.metadata.BlockMetaData>`	`getRowGroups()`
`org.apache.parquet.column.values.bloomfilter.BloomFilter`	`readBloomFilter(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData meta)` Reads Bloom filter data for the given column chunk.
`org.apache.parquet.internal.column.columnindex.ColumnIndex`	`readColumnIndex(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData column)`
`org.apache.parquet.column.page.PageReadStore`	`readFilteredRowGroup(int blockIndex)` Reads all the columns requested from the specified row group.
`org.apache.parquet.hadoop.ColumnChunkPageReadStore`	`readFilteredRowGroup(int blockIndex, RowRanges rowRanges)` Reads all the columns requested from the specified row group.
`org.apache.parquet.column.page.PageReadStore`	`readNextFilteredRowGroup()` Reads all the columns requested from the row group at the current file position.
`org.apache.parquet.column.page.PageReadStore`	`readNextRowGroup()` Reads all the columns requested from the row group at the current file position.
`org.apache.parquet.internal.column.columnindex.OffsetIndex`	`readOffsetIndex(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData column)`
`org.apache.parquet.column.page.PageReadStore`	`readRowGroup(int blockIndex)` Reads all the columns requested from the row group at the specified block.
`boolean`	`rowGroupsFiltered()`
`void`	`setRequestedSchema(org.apache.parquet.schema.MessageType projection)`
`boolean`	`skipNextRowGroup()`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - f
```
protected final ParquetInputStream f
```
- Constructor Detail
  - ParquetFileReader
```
public ParquetFileReader(org.apache.parquet.io.InputFile file,
                         org.apache.parquet.ParquetReadOptions options,
                         FileIndexResult fileIndexResult)
                  throws IOException
```
    Throws:
    
    IOException
- Method Detail
  - getFooter
```
public org.apache.parquet.hadoop.metadata.ParquetMetadata getFooter()
```
  - getFileMetaData
```
public org.apache.parquet.hadoop.metadata.FileMetaData getFileMetaData()
```
  - getRecordCount
```
public long getRecordCount()
```
  - getFilteredRecordCount
```
public long getFilteredRecordCount()
```
  - getPath
```
@Deprecated
public org.apache.hadoop.fs.Path getPath()
```
    Deprecated. Will be removed in 2.0.0; use getFile() instead.
    
    Returns:
    
    the path for this file
  - getFile
```
public String getFile()
```
  - rowGroupsFiltered
```
public boolean rowGroupsFiltered()
```
  - getRowGroups
```
public List<org.apache.parquet.hadoop.metadata.BlockMetaData> getRowGroups()
```
  - setRequestedSchema
```
public void setRequestedSchema(org.apache.parquet.schema.MessageType projection)
```
  - appendTo
```
public void appendTo(org.apache.parquet.hadoop.ParquetFileWriter writer)
              throws IOException
```
    Throws:
    
    IOException
  - readRowGroup
```
public org.apache.parquet.column.page.PageReadStore readRowGroup(int blockIndex)
                                                          throws IOException
```
    Reads all the columns requested from the row group at the specified block.
    
    Parameters:
    
    blockIndex - the index of the requested block
    
    Returns:
    
    the PageReadStore which can provide PageReaders for each column.
    
    Throws:
    
    IOException - if an error occurs while reading
  - readNextRowGroup
```
public org.apache.parquet.column.page.PageReadStore readNextRowGroup()
                                                              throws IOException
```
    Reads all the columns requested from the row group at the current file position.
    
    Returns:
    
    the PageReadStore which can provide PageReaders for each column.
    
    Throws:
    
    IOException - if an error occurs while reading
  - readFilteredRowGroup
```
public org.apache.parquet.column.page.PageReadStore readFilteredRowGroup(int blockIndex)
                                                                  throws IOException
```
    Reads all the columns requested from the specified row group. It may skip specific pages based on the column indexes, according to the actual filter. Because rows are not aligned across the pages of different columns, row synchronization might be required. See the documentation of the class SynchronizingColumnReader for details.
    
    Parameters:
    
    blockIndex - the index of the requested block
    
    Returns:
    
    the PageReadStore which can provide PageReaders for each column or null if there are no rows in this block
    
    Throws:
    
    IOException - if an error occurs while reading
  - readFilteredRowGroup
```
public org.apache.parquet.hadoop.ColumnChunkPageReadStore readFilteredRowGroup(int blockIndex,
                                                                               RowRanges rowRanges)
                                                                        throws IOException
```
    Reads all the columns requested from the specified row group. It may skip specific pages based on the rowRanges passed in. Because rows are not aligned across the pages of different columns, row synchronization might be required. See the documentation of the class SynchronizingColumnReader for details.
    
    Parameters:
    
    blockIndex - the index of the requested block
    
    rowRanges - the row ranges to be read from the requested block
    
    Returns:
    
    the PageReadStore which can provide PageReaders for each column or null if there are no rows in this block
    
    Throws:
    
    IOException - if an error occurs while reading
    
    IllegalArgumentException - if the blockIndex is invalid or the rowRanges is null
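    The page skipping described above comes down to an interval-overlap test. A page spans the rows from its first row index up to the row before the next page starts. That page must be read only if this span intersects one of the requested row ranges. The sketch below is illustrative only and is not Paimon or parquet-mr code. `Range` and `pagesToRead` are hypothetical names, and `Range` is a simplified stand-in for `RowRanges` with inclusive bounds.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only, not Paimon or parquet-mr code. Range and pagesToRead are
// hypothetical names chosen for this illustration.
class PageSelectionSketch {

    /** Inclusive row range [from, to], a simplified stand-in for RowRanges. */
    static final class Range {
        final long from;
        final long to;

        Range(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    /**
     * Returns the indexes of the pages whose rows intersect at least one requested range.
     *
     * @param firstRowIndexes first row index of each page, ascending (as an offset index records it)
     * @param rowCount        number of rows in the row group
     * @param ranges          requested row ranges
     */
    static List<Integer> pagesToRead(long[] firstRowIndexes, long rowCount, List<Range> ranges) {
        List<Integer> pages = new ArrayList<>();
        for (int i = 0; i < firstRowIndexes.length; i++) {
            long pageFirst = firstRowIndexes[i];
            // A page ends one row before the next page starts, or at the last row of the row group.
            long pageLast = i + 1 < firstRowIndexes.length ? firstRowIndexes[i + 1] - 1 : rowCount - 1;
            for (Range r : ranges) {
                // Two inclusive intervals overlap when neither ends before the other starts.
                if (r.from <= pageLast && pageFirst <= r.to) {
                    pages.add(i);
                    break;
                }
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        long[] firstRows = {0, 100, 250, 400}; // four pages in a 500-row row group
        List<Range> ranges = List.of(new Range(120, 130), new Range(420, 450));
        System.out.println(pagesToRead(firstRows, 500, ranges)); // prints [1, 3]
    }
}
```

    Pages 0 and 2 are never touched, which is the saving the filtered read aims for.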
  - readNextFilteredRowGroup
```
public org.apache.parquet.column.page.PageReadStore readNextFilteredRowGroup()
                                                                      throws IOException
```
    Reads all the columns requested from the row group at the current file position. It may skip specific pages based on the column indexes, according to the actual filter. Because rows are not aligned across the pages of different columns, row synchronization might be required. See the documentation of the class SynchronizingColumnReader for details.
    
    Returns:
    
    the PageReadStore which can provide PageReaders for each column
    
    Throws:
    
    IOException - if an error occurs while reading
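    Why row synchronization is needed: each column chunk paginates independently. The same target row can therefore fall at different offsets in different columns. The sketch below locates a target row in each column, returning the page to open and how many rows to skip inside it. It is a conceptual model, not the SynchronizingColumnReader implementation. `StartPosition` and `locate` are hypothetical. It assumes flat, non-repeated columns, where each row contributes exactly one entry (a value or a null) to a page.

```java
import java.util.Arrays;

// Conceptual sketch only, not Paimon or parquet-mr code. StartPosition and locate are
// hypothetical names. Assumes flat, non-repeated columns, where each row contributes
// exactly one entry (value or null) to a page.
class RowSyncSketch {

    /** Where a column starts reading: the page to open and how many rows to skip inside it. */
    static final class StartPosition {
        final int page;
        final long rowsToSkip;

        StartPosition(int page, long rowsToSkip) {
            this.page = page;
            this.rowsToSkip = rowsToSkip;
        }
    }

    /** Locates targetRow in a column whose pages start at the given ascending row indexes (first is 0). */
    static StartPosition locate(long[] firstRowIndexes, long targetRow) {
        int idx = Arrays.binarySearch(firstRowIndexes, targetRow);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the containing page is insertionPoint - 1.
        int page = idx >= 0 ? idx : -idx - 2;
        return new StartPosition(page, targetRow - firstRowIndexes[page]);
    }

    public static void main(String[] args) {
        long[] columnA = {0, 100, 250, 400}; // page boundaries of column A
        long[] columnB = {0, 200, 300};      // column B paginates differently
        long target = 120;                   // first row selected by the filter
        for (long[] column : new long[][] {columnA, columnB}) {
            StartPosition p = locate(column, target);
            System.out.println("page " + p.page + ", skip " + p.rowsToSkip + " rows");
        }
    }
}
```

    For row 120, column A opens page 1 and skips 20 rows, while column B opens page 0 and skips 120 rows. Both columns still have to emit row 120 together.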
  - getColumnIndexStore
```
public org.apache.parquet.internal.filter2.columnindex.ColumnIndexStore getColumnIndexStore(int blockIndex)
```
  - skipNextRowGroup
```
public boolean skipNextRowGroup()
```
  - getNextDictionaryReader
```
public org.apache.parquet.column.page.DictionaryPageReadStore getNextDictionaryReader()
```
    Returns a DictionaryPageReadStore for the row group that would be returned by calling readNextRowGroup() or skipped by calling skipNextRowGroup().
    
    Returns:
    
    a DictionaryPageReadStore for the next row group
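    Because the dictionary describes the row group that the next read or skip would act on, a caller can inspect it first. It can then skip row groups that cannot match an equality predicate. The sketch below models that contract with a hypothetical in-memory `Cursor` whose `nextDictionary`, `readNext` and `skipNext` stand in for `getNextDictionaryReader()`, `readNextRowGroup()` and `skipNextRowGroup()`. It is not Paimon or parquet-mr code. It also assumes every value of the column is dictionary-encoded. A real reader can only rely on the dictionary when every data page of the column chunk uses it, since writers may fall back to plain encoding.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Conceptual sketch only, not Paimon or parquet-mr code. RowGroup, Cursor and countEquals
// are hypothetical. Cursor mimics the documented contract: nextDictionary() describes the
// row group that the next readNext() would return or the next skipNext() would skip.
class DictionarySkipSketch {

    /** One row group reduced to a single string column: its values and their dictionary. */
    static final class RowGroup {
        final Set<String> dictionary;
        final List<String> values;

        RowGroup(String... values) {
            this.values = Arrays.asList(values);
            this.dictionary = new HashSet<>(this.values); // assumes a fully dictionary-encoded chunk
        }
    }

    static final class Cursor {
        private final List<RowGroup> groups;
        private int current = 0;
        int readCount = 0; // row groups whose data was actually read

        Cursor(List<RowGroup> groups) {
            this.groups = groups;
        }

        Set<String> nextDictionary() {
            return current < groups.size() ? groups.get(current).dictionary : null;
        }

        List<String> readNext() {
            if (current >= groups.size()) {
                return null;
            }
            readCount++;
            return groups.get(current++).values;
        }

        boolean skipNext() {
            if (current >= groups.size()) {
                return false;
            }
            current++;
            return true;
        }
    }

    /** Counts values equal to target, skipping row groups whose dictionary rules it out. */
    static int countEquals(Cursor cursor, String target) {
        int count = 0;
        Set<String> dictionary;
        while ((dictionary = cursor.nextDictionary()) != null) {
            if (!dictionary.contains(target)) {
                cursor.skipNext(); // target cannot occur here, so the data is never read
                continue;
            }
            for (String v : cursor.readNext()) {
                if (v.equals(target)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Cursor cursor = new Cursor(Arrays.asList(
                new RowGroup("a", "b", "a"),
                new RowGroup("c", "c"),
                new RowGroup("a")));
        // prints "3 matches, 2 row groups read"
        System.out.println(countEquals(cursor, "a") + " matches, " + cursor.readCount + " row groups read");
    }
}
```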
  - getDictionaryReader
```
public org.apache.parquet.hadoop.DictionaryPageReader getDictionaryReader(int blockIndex)
```
  - getDictionaryReader
```
public org.apache.parquet.hadoop.DictionaryPageReader getDictionaryReader(org.apache.parquet.hadoop.metadata.BlockMetaData block)
```
  - getBloomFilterDataReader
```
public org.apache.parquet.hadoop.BloomFilterReader getBloomFilterDataReader(int blockIndex)
```
  - getBloomFilterDataReader
```
public org.apache.parquet.hadoop.BloomFilterReader getBloomFilterDataReader(org.apache.parquet.hadoop.metadata.BlockMetaData block)
```
  - readBloomFilter
```
public org.apache.parquet.column.values.bloomfilter.BloomFilter readBloomFilter(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData meta)
                                                                         throws IOException
```
    Reads Bloom filter data for the given column chunk.
    
    Parameters:
    
    meta - a column's ColumnChunkMetaData to read the Bloom filter from
    
    Returns:
    
    a BloomFilter object.
    
    Throws:
    
    IOException - if there is an error while reading the Bloom filter.
  - readColumnIndex
```
@InterfaceAudience.Private
public org.apache.parquet.internal.column.columnindex.ColumnIndex readColumnIndex(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData column)
                                                                                                      throws IOException
```
    Parameters:
    
    column - the column chunk which the column index is to be returned for
    
    Returns:
    
    the column index for the specified column chunk or null if there is no index
    
    Throws:
    
    IOException - if any I/O error occurs during reading the file
  - readOffsetIndex
```
@InterfaceAudience.Private
public org.apache.parquet.internal.column.columnindex.OffsetIndex readOffsetIndex(org.apache.parquet.hadoop.metadata.ColumnChunkMetaData column)
                                                                                                      throws IOException
```
    Parameters:
    
    column - the column chunk which the offset index is to be returned for
    
    Returns:
    
    the offset index for the specified column chunk or null if there is no index
    
    Throws:
    
    IOException - if any I/O error occurs during reading the file
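    In the Parquet page index, a column index stores per-page min/max statistics, and an offset index stores each page's location and first row index. Combined, they turn a predicate into the row ranges that filtered reads work with. The sketch below shows this for an equality predicate on a single column. Pages whose [min, max] excludes the value are dropped, the survivors are mapped to row spans, and adjacent spans are merged. It is a simplified model, not Paimon's or parquet-mr's filtering code. `rowRangesForEquals` is hypothetical, and it assumes one non-null `long` column with inclusive `{from, to}` ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only, not Paimon or parquet-mr code. rowRangesForEquals is hypothetical.
// Assumes one non-null long column; each range is an inclusive {from, to} pair.
class ColumnIndexPruningSketch {

    static List<long[]> rowRangesForEquals(long[] pageMin, long[] pageMax,
                                           long[] firstRowIndexes, long rowCount, long value) {
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < pageMin.length; i++) {
            // Column index: skip pages whose [min, max] cannot contain the value.
            if (value < pageMin[i] || value > pageMax[i]) {
                continue;
            }
            // Offset index: translate the page into the rows it covers.
            long from = firstRowIndexes[i];
            long to = i + 1 < firstRowIndexes.length ? firstRowIndexes[i + 1] - 1 : rowCount - 1;
            long[] last = ranges.isEmpty() ? null : ranges.get(ranges.size() - 1);
            if (last != null && last[1] + 1 == from) {
                last[1] = to; // adjacent to the previous kept page: extend that range
            } else {
                ranges.add(new long[] {from, to});
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        long[] min = {1, 50, 120, 300};
        long[] max = {60, 110, 200, 400};
        long[] firstRows = {0, 100, 250, 400};
        for (long[] r : rowRangesForEquals(min, max, firstRows, 500, 55)) {
            System.out.println("rows " + r[0] + ".." + r[1]); // prints "rows 0..249"
        }
    }
}
```

    A real reader also has to handle null pages and predicates over several columns, and it intersects or unions the per-column results. The sketch leaves those out.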
  - close
```
public void close()
           throws IOException
```
    Specified by:
    
    close in interface Closeable
    
    Specified by:
    
    close in interface AutoCloseable
    
    Throws:
    
    IOException
