This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.
Data Types
A data type describes the logical type of a value in the table ecosystem. It can be used to declare input and/or output types of operations.
Paimon supports the following data types:
| Data Type | Description |
|---|---|
| `BOOLEAN` | Data type of a boolean with a (possibly) three-valued logic of TRUE, FALSE, and UNKNOWN. |
| `CHAR`, `CHAR(n)` | Data type of a fixed-length character string. The type can be declared using CHAR(n) where n is the number of code points. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1. |
| `VARCHAR`, `VARCHAR(n)`, `STRING` | Data type of a variable-length character string. The type can be declared using VARCHAR(n) where n is the maximum number of code points. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1. STRING is a synonym for VARCHAR(2147483647). |
| `BINARY`, `BINARY(n)` | Data type of a fixed-length binary string (a sequence of bytes). The type can be declared using BINARY(n) where n is the number of bytes. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1. |
| `VARBINARY`, `VARBINARY(n)`, `BYTES` | Data type of a variable-length binary string (a sequence of bytes). The type can be declared using VARBINARY(n) where n is the maximum number of bytes. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1. BYTES is a synonym for VARBINARY(2147483647). |
| `DECIMAL`, `DECIMAL(p)`, `DECIMAL(p, s)` | Data type of a decimal number with fixed precision and scale. The type can be declared using DECIMAL(p, s) where p is the total number of digits in a number (precision) and s is the number of digits to the right of the decimal point (scale). p must have a value between 1 and 38 (both inclusive). s must have a value between 0 and p (both inclusive). The default value for p is 10. The default value for s is 0. |
| `TINYINT` | Data type of a 1-byte signed integer with values from -128 to 127. |
| `SMALLINT` | Data type of a 2-byte signed integer with values from -32,768 to 32,767. |
| `INT` | Data type of a 4-byte signed integer with values from -2,147,483,648 to 2,147,483,647. |
| `BIGINT` | Data type of an 8-byte signed integer with values from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. |
| `FLOAT` | Data type of a 4-byte single precision floating point number. Compared to the SQL standard, the type does not take parameters. |
| `DOUBLE` | Data type of an 8-byte double precision floating point number. |
| `DATE` | Data type of a date consisting of year-month-day with values ranging from 0000-01-01 to 9999-12-31. Compared to the SQL standard, the range starts at year 0000. |
| `TIME`, `TIME(p)` | Data type of a time without time zone consisting of hour:minute:second[.fractional] with up to nanosecond precision and values ranging from 00:00:00.000000000 to 23:59:59.999999999. The type can be declared using TIME(p) where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 0. |
| `TIMESTAMP`, `TIMESTAMP(p)` | Data type of a timestamp without time zone consisting of year-month-day hour:minute:second[.fractional] with up to nanosecond precision and values ranging from 0000-01-01 00:00:00.000000000 to 9999-12-31 23:59:59.999999999. The type can be declared using TIMESTAMP(p) where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 6. |
| `TIMESTAMP WITH TIME ZONE`, `TIMESTAMP(p) WITH TIME ZONE` | Data type of a timestamp with time zone consisting of year-month-day hour:minute:second[.fractional] zone with up to nanosecond precision and values ranging from 0000-01-01 00:00:00.000000000 +14:59 to 9999-12-31 23:59:59.999999999 -14:59. This type fills the gap between time zone free and time zone mandatory timestamp types by allowing the interpretation of UTC timestamps according to the configured session time zone. A conversion from and to int describes the number of seconds since epoch. A conversion from and to long describes the number of milliseconds since epoch. |
| `ARRAY<t>` | Data type of an array of elements with the same subtype. Compared to the SQL standard, the maximum cardinality of an array cannot be specified but is fixed at 2,147,483,647. Also, any valid type is supported as a subtype. The type can be declared using ARRAY<t> where t is the data type of the contained elements. |
| `MAP<kt, vt>` | Data type of an associative array that maps keys (including NULL) to values (including NULL). A map cannot contain duplicate keys; each key can map to at most one value. There is no restriction of element types; it is the responsibility of the user to ensure uniqueness. The type can be declared using MAP<kt, vt> where kt is the data type of the key elements and vt is the data type of the value elements. |
| `MULTISET<t>` | Data type of a multiset (a bag). Unlike a set, it allows for multiple instances of each of its elements, which share a common subtype. Each unique value (including NULL) is mapped to some multiplicity. There is no restriction of element types; it is the responsibility of the user to ensure uniqueness. The type can be declared using MULTISET<t> where t is the data type of the contained elements. |
| `ROW<n0 t0, n1 t1, ...>`, `ROW<n0 t0 'd0', n1 t1 'd1', ...>` | Data type of a sequence of fields. A field consists of a field name, field type, and an optional description. The most specific type of a row of a table is a row type. In this case, each column of the row corresponds to the field of the row type that has the same ordinal position as the column. Compared to the SQL standard, an optional field description simplifies the handling of complex structures. A row type is similar to the STRUCT type known from other non-standard-compliant frameworks. The type can be declared using ROW<n0 t0 'd0', n1 t1 'd1', ...> where n is the unique name of a field, t is the logical type of a field, and d is the description of a field. |
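
The parameter rules in the table (allowed ranges and defaults for lengths, precisions, and scales) can be summarized in a short sketch. The Python below is purely illustrative and is not part of Paimon: the names `SINGLE_PARAM_RULES`, `resolve_param`, `resolve_decimal`, and `signed_range` are hypothetical helpers introduced here, and only the numeric bounds and defaults are taken from the table.

```python
# Illustrative sketch only: these helpers are hypothetical and not a Paimon API.
# The bounds and defaults mirror the rules stated in the table above.

INT32_MAX = 2_147_483_647

# Type name -> (minimum, maximum, default) for its single parameter.
SINGLE_PARAM_RULES = {
    "CHAR": (1, INT32_MAX, 1),
    "VARCHAR": (1, INT32_MAX, 1),
    "BINARY": (1, INT32_MAX, 1),
    "VARBINARY": (1, INT32_MAX, 1),
    "TIME": (0, 9, 0),
    "TIMESTAMP": (0, 9, 6),
}


def resolve_param(type_name, value=None):
    """Return the effective length or precision, applying the default if omitted."""
    low, high, default = SINGLE_PARAM_RULES[type_name]
    if value is None:
        return default
    if not low <= value <= high:
        raise ValueError(
            f"{type_name}({value}) is out of range: expected {low}..{high}"
        )
    return value


def resolve_decimal(precision=None, scale=None):
    """Return (p, s) for DECIMAL, applying defaults p=10 and s=0."""
    p = 10 if precision is None else precision
    s = 0 if scale is None else scale
    if not 1 <= p <= 38:
        raise ValueError(f"DECIMAL precision {p} is out of range: expected 1..38")
    if not 0 <= s <= p:
        raise ValueError(f"DECIMAL scale {s} is out of range: expected 0..{p}")
    return p, s


def signed_range(num_bytes):
    """Return (min, max) of a two's-complement signed integer of the given width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


if __name__ == "__main__":
    print(resolve_param("VARCHAR"))     # VARCHAR with no length
    print(resolve_param("TIMESTAMP"))   # TIMESTAMP with no precision
    print(resolve_decimal(12))          # DECIMAL(12)
    print(signed_range(1))              # TINYINT
    print(signed_range(4))              # INT
```

Under these assumptions, `STRING` corresponds to `resolve_param("VARCHAR", INT32_MAX)` and `BYTES` to `resolve_param("VARBINARY", INT32_MAX)`, matching the synonyms listed in the table.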