@Public public interface ReadBuilder extends Serializable
TableScan
and TableRead
.
Example of distributed reading:
// 1. Create a ReadBuilder (Serializable)
Table table = catalog.getTable(...);
ReadBuilder builder = table.newReadBuilder()
.withFilter(...)
.withReadType(...);
// 2. Plan splits in 'Coordinator' (or named 'Driver'):
List<Split> splits = builder.newScan().plan().splits();
// 3. Distribute these splits to different tasks
// 4. Read a split in task
TableRead read = builder.newRead();
RecordReader<InternalRow> reader = read.createReader(split);
reader.forEachRemaining(...);
newStreamScan()
will create a stream scan, which can perform continuously planning:
TableScan scan = builder.newStreamScan();
while (true) {
List<Split> splits = scan.plan().splits();
...
}
NOTE: InternalRow
cannot be saved in memory. It may be reused internally, so you need
to convert it into your own data structure or copy it.
Modifier and Type | Method and Description |
---|---|
ReadBuilder |
dropStats()
Delete stats in scan plan result.
|
TableRead |
newRead()
|
TableScan |
newScan()
Create a
TableScan to perform batch planning. |
StreamTableScan |
newStreamScan()
Create a
TableScan to perform streaming planning. |
RowType |
readType()
Returns read row type.
|
String |
tableName()
A name to identify the table.
|
ReadBuilder |
withBucketFilter(Filter<Integer> bucketFilter)
Push bucket filter.
|
default ReadBuilder |
withFilter(List<Predicate> predicates)
Apply filters to the readers to decrease the number of produced records.
|
ReadBuilder |
withFilter(Predicate predicate)
Push filters, will filter the data as much as possible, but it is not guaranteed that it is a
complete filter.
|
ReadBuilder |
withLimit(int limit)
the row number pushed down.
|
ReadBuilder |
withPartitionFilter(Map<String,String> partitionSpec)
Push partition filter.
|
ReadBuilder |
withProjection(int[] projection)
Apply projection to the reader, if you need nested row pruning, use
withReadType(RowType) instead. |
default ReadBuilder |
withProjection(int[][] projection)
Deprecated.
|
ReadBuilder |
withReadType(RowType readType)
Push read row type to the reader, support nested row pruning.
|
ReadBuilder |
withShard(int indexOfThisSubtask,
int numberOfParallelSubtasks)
Specify the shard to be read, and allocate sharded files to read records.
|
String tableName()
RowType readType()
default ReadBuilder withFilter(List<Predicate> predicates)
This interface filters records as much as possible, however some produced records may not satisfy all predicates. Users need to recheck all records.
ReadBuilder withFilter(Predicate predicate)
ReadBuilder withPartitionFilter(Map<String,String> partitionSpec)
ReadBuilder withBucketFilter(Filter<Integer> bucketFilter)
withShard(int, int)
.
Reason: Bucket filtering and sharding are different logical mechanisms for selecting subsets of table data. Applying both methods simultaneously introduces conflicting selection criteria.
ReadBuilder withReadType(RowType readType)
readType
- read row type, can be a pruned type from Table.rowType()
ReadBuilder withProjection(int[] projection)
withReadType(RowType)
instead.@Deprecated default ReadBuilder withProjection(int[][] projection)
ReadBuilder withLimit(int limit)
ReadBuilder withShard(int indexOfThisSubtask, int numberOfParallelSubtasks)
withBucketFilter(Filter)
.
Reason: Sharding and bucket filtering are different logical mechanisms for selecting subsets of table data. Applying both methods simultaneously introduces conflicting selection criteria.
ReadBuilder dropStats()
StreamTableScan newStreamScan()
TableScan
to perform streaming planning.TableRead newRead()
Copyright © 2023–2024 The Apache Software Foundation. All rights reserved.