@Public public interface ReadBuilder extends Serializable
TableScan
and TableRead
.
Example of distributed reading:
// 1. Create a ReadBuilder (Serializable)
Table table = catalog.getTable(...);
ReadBuilder builder = table.newReadBuilder()
.withFilter(...)
.withProjection(...);
// 2. Plan splits in 'Coordinator' (or named 'Driver'):
List<Split> splits = builder.newScan().plan().splits();
// 3. Distribute these splits to different tasks
// 4. Read a split in task
TableRead read = builder.newRead();
RecordReader<InternalRow> reader = read.createReader(split);
reader.forEachRemaining(...);
newStreamScan()
will create a stream scan, which can perform continuously planning:
TableScan scan = builder.newStreamScan();
while (true) {
List<Split> splits = scan.plan().splits();
...
}
NOTE: InternalRow
cannot be saved in memory. It may be reused internally, so you need
to convert it into your own data structure or copy it.
Modifier and Type | Method and Description |
---|---|
TableRead |
newRead()
|
TableScan |
newScan()
Create a
TableScan to perform batch planning. |
StreamTableScan |
newStreamScan()
Create a
TableScan to perform streaming planning. |
RowType |
readType()
Returns read row type, projected by
withProjection(int[]) . |
String |
tableName()
A name to identify the table.
|
ReadBuilder |
withBucketFilter(Filter<Integer> bucketFilter)
Push bucket filter.
|
default ReadBuilder |
withFilter(List<Predicate> predicates)
Apply filters to the readers to decrease the number of produced records.
|
ReadBuilder |
withFilter(Predicate predicate)
Push filters, will filter the data as much as possible, but it is not guaranteed that it is a
complete filter.
|
ReadBuilder |
withLimit(int limit)
the row number pushed down.
|
ReadBuilder |
withPartitionFilter(Map<String,String> partitionSpec)
Push partition filter.
|
default ReadBuilder |
withProjection(int[] projection)
Apply projection to the reader.
|
ReadBuilder |
withProjection(int[][] projection)
Push nested projection.
|
ReadBuilder |
withShard(int indexOfThisSubtask,
int numberOfParallelSubtasks)
Specify the shard to be read, and allocate sharded files to read records.
|
String tableName()
RowType readType()
withProjection(int[])
.default ReadBuilder withFilter(List<Predicate> predicates)
This interface filters records as much as possible, however some produced records may not satisfy all predicates. Users need to recheck all records.
ReadBuilder withFilter(Predicate predicate)
ReadBuilder withPartitionFilter(Map<String,String> partitionSpec)
ReadBuilder withBucketFilter(Filter<Integer> bucketFilter)
withShard(int, int)
.
Reason: Bucket filtering and sharding are different logical mechanisms for selecting subsets of table data. Applying both methods simultaneously introduces conflicting selection criteria.
default ReadBuilder withProjection(int[] projection)
NOTE: Nested row projection is currently not supported.
ReadBuilder withProjection(int[][] projection)
[[0, 2, 1], ...]
specifies to include the 2nd
field of the 3rd field of the 1st field in the top-level row.ReadBuilder withLimit(int limit)
ReadBuilder withShard(int indexOfThisSubtask, int numberOfParallelSubtasks)
withBucketFilter(Filter)
.
Reason: Sharding and bucket filtering are different logical mechanisms for selecting subsets of table data. Applying both methods simultaneously introduces conflicting selection criteria.
StreamTableScan newStreamScan()
TableScan
to perform streaming planning.TableRead newRead()
Copyright © 2023–2024 The Apache Software Foundation. All rights reserved.