public static class FormatReaderMapping.Builder extends Object
Builder for FormatReaderMapping.
Constructor and Description |
---|
Builder(FileFormatDiscover formatDiscover, List<DataField> readTableFields, java.util.function.Function<TableSchema,List<DataField>> fieldsExtractor, List<Predicate> filters) |
Modifier and Type | Method and Description |
---|---|
FormatReaderMapping | build(String formatIdentifier, TableSchema tableSchema, TableSchema dataSchema) There are three steps here to build FormatReaderMapping: |
public Builder(FileFormatDiscover formatDiscover, List<DataField> readTableFields, java.util.function.Function<TableSchema,List<DataField>> fieldsExtractor, @Nullable List<Predicate> filters)
public FormatReaderMapping build(String formatIdentifier, TableSchema tableSchema, TableSchema dataSchema)
There are three steps here to build FormatReaderMapping:
1. Calculate readDataFields, the fields we intend to read from the data schema. At the same time, generate indexCastMapping, which maps each index in readDataFields to the corresponding index in the data schema.
2. Calculate the mapping that trims _KEY_ fields. For example, we want _KEY_a, _KEY_b, _FIELD_SEQUENCE, _ROW_KIND, a, b, c, d, e, f, g from the data, but we do not need to read both _KEY_a and a, or both _KEY_b and b, at the same time, so we trim the duplicates.
   Read before: _KEY_a, _KEY_b, _FIELD_SEQUENCE, _ROW_KIND, a, b, c, d, e, f, g
   Read after: a, b, _FIELD_SEQUENCE, _ROW_KIND, c, d, e, f, g
   The mapping is [0,1,2,3,0,1,4,5,6,7,8]; it converts the [read after] columns to the [read before] columns.
3. We want to read far fewer fields than readDataFields, so we drop the partition fields. We generate partitionMappingAndFieldsWithoutPartitionPair, which reduces the set of fields actually read and tells us how to map them back.
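
The _KEY_ trimming in step 2 can be illustrated with a minimal, standalone sketch. It is not the library's implementation: the class `TrimKeySketch`, the holder `Trimmed`, and the method `trimKeyFields` are hypothetical names, and the sketch matches fields by name only (stripping the `_KEY_` prefix), which is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of step 2; not part of FormatReaderMapping.
class TrimKeySketch {

    // Prefix taken from the example in the step description.
    static final String KEY_PREFIX = "_KEY_";

    /** Holds the trimmed column list and the index mapping back to the requested columns. */
    static final class Trimmed {
        final List<String> readAfter; // columns that actually need to be read
        final int[] mapping;          // mapping[i] = position in readAfter that supplies requested column i

        Trimmed(List<String> readAfter, int[] mapping) {
            this.readAfter = readAfter;
            this.mapping = mapping;
        }
    }

    /** Assumption: a key column "_KEY_x" and a value column "x" carry the same data. */
    static Trimmed trimKeyFields(List<String> readBefore) {
        List<String> readAfter = new ArrayList<>();
        Map<String, Integer> positionByName = new HashMap<>();
        int[] mapping = new int[readBefore.size()];

        for (int i = 0; i < readBefore.size(); i++) {
            String name = readBefore.get(i);
            // Strip the key prefix so "_KEY_a" and "a" resolve to the same physical column.
            String base = name.startsWith(KEY_PREFIX)
                    ? name.substring(KEY_PREFIX.length())
                    : name;
            Integer pos = positionByName.get(base);
            if (pos == null) {
                // First time this column is seen: it must be read.
                pos = readAfter.size();
                readAfter.add(base);
                positionByName.put(base, pos);
            }
            mapping[i] = pos;
        }
        return new Trimmed(readAfter, mapping);
    }

    public static void main(String[] args) {
        Trimmed t = trimKeyFields(Arrays.asList(
                "_KEY_a", "_KEY_b", "_FIELD_SEQUENCE", "_ROW_KIND",
                "a", "b", "c", "d", "e", "f", "g"));
        System.out.println(t.readAfter);
        System.out.println(Arrays.toString(t.mapping));
    }
}
```

Run on the column list from the example, the sketch yields the read-after columns a, b, _FIELD_SEQUENCE, _ROW_KIND, c, d, e, f, g and the mapping [0,1,2,3,0,1,4,5,6,7,8], matching the values given in step 2.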
Copyright © 2023–2025 The Apache Software Foundation. All rights reserved.