Vector Storage

Overview

As AI workloads grow, efficient vector storage has become increasingly important. Paimon provides storage optimized specifically for vector data.

Paimon stores vector columns in dedicated files using the Vortex columnar format, which is optimized for vector workloads and offers a high compression ratio and fast scans.

Vector Data Type

Paimon supports defining columns of type VECTOR<t, n>, which represents a fixed-length, dense vector column, where:

  • t: The element type. Supported types: BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE;
  • n: The vector dimension, which must be a positive integer not exceeding 2,147,483,647.

Compared to variable-length arrays, dense vectors provide:

  • More natural semantic constraints, preventing mismatched lengths and null elements at the storage layer;
  • Better point-lookup performance, eliminating offset array storage and access;
  • Closer alignment with type representations in specialized vector engines, avoiding memory copies and type conversions.
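The point-lookup benefit can be illustrated with a minimal sketch in plain Python. It is not Paimon or Vortex code; the function names and the flattened-buffer layout are assumptions made for illustration. A variable-length layout needs an offsets array to find a row, while a dense layout derives the row's position from the dimension alone.

```python
from typing import List, Sequence


def lookup_variable(values: Sequence[float], offsets: Sequence[int], row: int) -> List[float]:
    """Variable-length layout: offsets[row] and offsets[row + 1] bound the row."""
    return list(values[offsets[row]:offsets[row + 1]])


def lookup_dense(values: Sequence[float], dim: int, row: int) -> List[float]:
    """Dense layout: row i always starts at i * dim, so no offsets are stored or read."""
    start = row * dim
    return list(values[start:start + dim])


# Three 4-dimensional vectors flattened into one buffer (illustrative data).
values = [float(i) for i in range(12)]
offsets = [0, 4, 8, 12]  # required only by the variable-length layout

# Both layouts return the same row; the dense one needs no offsets array.
assert lookup_variable(values, offsets, 1) == lookup_dense(values, 4, 1)
```

In the dense case the storage layer also cannot hold a row of the wrong length, because a row's length is never stored separately from the type.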

Notes:

  • Columns of VECTOR type cannot be used as primary key columns, partition columns, or for sorting.
  • A VECTOR value may be null as a whole, but a non-null VECTOR value must not contain null elements.
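The constraints on VECTOR<t, n> values can be sketched as a client-side check. The helper validate_vector below is hypothetical and is not part of any Paimon API; it only mirrors the rules stated above (a bounded positive dimension, an exact length, and no null elements in a non-null vector), and assumes the column itself is nullable.

```python
from typing import Optional, Sequence

MAX_DIM = 2_147_483_647  # upper bound on n stated in this section


def validate_vector(value: Optional[Sequence[object]], dim: int) -> None:
    """Raise ValueError if `value` cannot be stored as a vector of dimension `dim`."""
    if not 1 <= dim <= MAX_DIM:
        raise ValueError(f"dimension must be in [1, {MAX_DIM}], got {dim}")
    if value is None:
        return  # assumption: the vector as a whole may be null
    if len(value) != dim:
        raise ValueError(f"expected {dim} elements, got {len(value)}")
    if any(element is None for element in value):
        raise ValueError("a non-null vector must not contain null elements")


validate_vector([0.1, 0.2, 0.3], 3)  # passes silently
validate_vector(None, 3)             # passes: the whole value is null
```

Running such a check before writing can surface a mismatched length early instead of at write time.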

Dedicated Vector File Storage

Paimon stores vector columns in separate .vector.vortex files within Data Evolution tables, keeping scalar and vector data independently optimized.

File layout:

table/
├── bucket-0/
│   ├── data-uuid-0.parquet          # Scalar columns (id, name, ...)
│   ├── data-uuid-1.blob             # Blob data
│   ├── data-uuid-2.vector.vortex    # Vector columns in Vortex format
│   └── ...
├── manifest/
├── schema/
└── snapshot/

Option                     Description
vector.file.format         File format for dedicated vector files. Set to vortex to enable dedicated vector storage.
vector.target-file-size    Target file size for vector files. Defaults to 10 * 'target-file-size'.
row-tracking.enabled       Must be true for dedicated vector storage.
data-evolution.enabled     Must be true for dedicated vector storage.
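The option requirements above can be sketched as a pre-flight check on a table's option map. Both helpers are hypothetical and not part of Paimon; they only restate the table: vector.file.format set to vortex, the two flags set to true, and a default vector file size of ten times target-file-size. The byte value used in the example is illustrative, not a documented default.

```python
from typing import Dict, List

REQUIRED_TRUE = ("row-tracking.enabled", "data-evolution.enabled")


def check_vector_options(options: Dict[str, str]) -> List[str]:
    """Return a list of problems that would prevent dedicated vector storage."""
    problems = []
    if options.get("vector.file.format") != "vortex":
        problems.append("vector.file.format must be 'vortex'")
    for key in REQUIRED_TRUE:
        if options.get(key, "").strip().lower() != "true":
            problems.append(f"{key} must be 'true'")
    return problems


def default_vector_target_file_size(target_file_size_bytes: int) -> int:
    """Default for vector.target-file-size: 10 * target-file-size."""
    return 10 * target_file_size_bytes


options = {
    "vector.file.format": "vortex",
    "row-tracking.enabled": "true",
    "data-evolution.enabled": "true",
}
assert check_vector_options(options) == []

# Illustrative value only: a 128 MiB target-file-size gives a 1280 MiB vector target.
assert default_vector_target_file_size(128 * 1024 * 1024) == 1280 * 1024 * 1024
```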

Create Table

The recommended way to create a vector table in SQL is to declare the column as an ARRAY and add the comment directive __VECTOR_FIELD;{dim} to it, where {dim} is the vector dimension. Paimon automatically converts the ARRAY type to VECTOR and registers the field in the vector-field option.

-- Comment directive: __VECTOR_FIELD;{dim}; optional comment
CREATE TABLE vector_table (
    id BIGINT,
    embed ARRAY<FLOAT> COMMENT '__VECTOR_FIELD;128; product embedding'
) WITH (
    'vector.file.format' = 'vortex',
    'row-tracking.enabled' = 'true',
    'data-evolution.enabled' = 'true'
);

-- Multiple vector columns
CREATE TABLE multi_vector_table (
    id BIGINT,
    embed1 ARRAY<FLOAT> COMMENT '__VECTOR_FIELD;128',
    embed2 ARRAY<FLOAT> COMMENT '__VECTOR_FIELD;768'
) WITH (
    'vector.file.format' = 'vortex',
    'row-tracking.enabled' = 'true',
    'data-evolution.enabled' = 'true'
);
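To make the directive format __VECTOR_FIELD;{dim}; optional comment concrete, here is a minimal sketch of how such a comment string could be split into its parts. It is an illustration only and not Paimon's actual parser; the function name, the whitespace handling, and the error behavior are assumptions.

```python
from typing import Optional, Tuple

PREFIX = "__VECTOR_FIELD;"


def parse_vector_directive(comment: str) -> Optional[Tuple[int, str]]:
    """Return (dim, remaining comment), or None if the comment has no directive."""
    if not comment.startswith(PREFIX):
        return None
    # Everything up to the next ';' is the dimension; the rest is free text.
    dim_text, _, remainder = comment[len(PREFIX):].partition(";")
    dim = int(dim_text.strip())  # raises ValueError on a non-numeric dimension
    if dim <= 0:
        raise ValueError(f"vector dimension must be positive, got {dim}")
    return dim, remainder.strip()


assert parse_vector_directive("__VECTOR_FIELD;128; product embedding") == (128, "product embedding")
assert parse_vector_directive("__VECTOR_FIELD;768") == (768, "")
assert parse_vector_directive("an ordinary column comment") is None
```

The two directive strings are the ones used in the CREATE TABLE statements above; the third shows that a comment without the directive is left alone.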

Adding a Vector Column

ALTER TABLE vector_table
ADD embed2 ARRAY<FLOAT> COMMENT '__VECTOR_FIELD;768; text embedding';

Write Data

INSERT INTO vector_table VALUES (1, ARRAY[1.0, 2.0, 3.0, ...]);

Read Data

SELECT id, embed FROM vector_table;