Creating Catalogs #

Paimon catalogs currently support two types of metastores:

  • filesystem metastore (default), which stores both metadata and table files in filesystems.
  • hive metastore, which additionally stores metadata in the Hive metastore, so users can directly access the tables from Hive.

See CatalogOptions for detailed options when creating a catalog.

Creating a Catalog with Filesystem Metastore #

The following Flink SQL registers and uses a Paimon catalog named my_catalog. Metadata and table files are stored under hdfs://path/to/warehouse.

CREATE CATALOG my_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 'hdfs://path/to/warehouse'
);

USE CATALOG my_catalog;

You can define any default table options for tables created in the catalog with the prefix table-default., as shown below.
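A minimal sketch, assuming you want every table created in this catalog to default to the Parquet file format (file.format is one Paimon table option; parquet is an assumed choice):

CREATE CATALOG my_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 'hdfs://path/to/warehouse',
    -- assumed default: new tables use Parquet files unless they override it
    'table-default.file.format' = 'parquet'
);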

The following shell command registers a Paimon catalog named paimon. Metadata and table files are stored under hdfs://path/to/warehouse.

spark-sql ... \
    --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
    --conf spark.sql.catalog.paimon.warehouse=hdfs://path/to/warehouse

You can define any default table options for tables created in the catalog with the prefix spark.sql.catalog.paimon.table-default., as shown below.
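A minimal sketch, again assuming a Parquet default via the file.format table option:

spark-sql ... \
    --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
    --conf spark.sql.catalog.paimon.warehouse=hdfs://path/to/warehouse \
    --conf spark.sql.catalog.paimon.table-default.file.format=parquet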

After spark-sql is started, you can switch to the default database of the paimon catalog with the following SQL.

USE paimon.default;

Creating a Catalog with Hive Metastore #

When you use the Paimon Hive catalog, changes to the catalog directly affect the corresponding Hive metastore. Tables created in such a catalog can also be accessed directly from Hive.

To use the Hive catalog, database names, table names, and field names should be lower case.

Paimon's Hive catalog in Flink relies on the Flink Hive connector bundled jar. You should first download the Flink Hive connector bundled jar and add it to the classpath. See here for more info.
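For example, with a standalone Flink distribution you might copy the bundled jar into Flink's lib directory, which is on the classpath by default (a sketch; the jar name and FLINK_HOME are placeholders for your installation):

cp flink-sql-connector-hive-*.jar $FLINK_HOME/lib/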

The following Flink SQL registers and uses a Paimon Hive catalog named my_hive. Metadata and table files are stored under hdfs://path/to/warehouse. In addition, metadata is also stored in Hive metastore.

If your Hive requires security authentication such as Kerberos, LDAP, or Ranger, you can set the hive-conf-dir parameter to the directory containing your hive-site.xml file (see the sketch after the statements below).

CREATE CATALOG my_hive WITH (
    'type' = 'paimon',
    'metastore' = 'hive',
    'uri' = 'thrift://<hive-metastore-host-name>:<port>',
    'warehouse' = 'hdfs://path/to/warehouse'
);

USE CATALOG my_hive;

You can define any default table options for tables created in the catalog with the prefix table-default., as in the combined sketch below.
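Putting both pieces together, a secured Hive catalog with a default table option might look like the following sketch (the hive-conf-dir path is a placeholder, and file.format/parquet are assumed choices):

CREATE CATALOG my_hive WITH (
    'type' = 'paimon',
    'metastore' = 'hive',
    'uri' = 'thrift://<hive-metastore-host-name>:<port>',
    -- assumption: this directory contains your hive-site.xml
    'hive-conf-dir' = '/path/to/hive-conf-dir',
    'warehouse' = 'hdfs://path/to/warehouse',
    -- assumed default applied to every table created in this catalog
    'table-default.file.format' = 'parquet'
);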

Your Spark installation should be able to detect Hive dependencies, or should already contain them. See here for more information.

The following shell command registers a Paimon Hive catalog named paimon. Metadata and table files are stored under hdfs://path/to/warehouse. In addition, metadata is also stored in Hive metastore.

spark-sql ... \
    --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
    --conf spark.sql.catalog.paimon.warehouse=hdfs://path/to/warehouse \
    --conf spark.sql.catalog.paimon.metastore=hive \
    --conf spark.sql.catalog.paimon.uri=thrift://<hive-metastore-host-name>:<port>

You can define any default table options for tables created in the catalog with the prefix spark.sql.catalog.paimon.table-default., as shown below.
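A minimal sketch: append one more flag to the command above (bucket is one Paimon table option; 4 is an assumed value):

    --conf spark.sql.catalog.paimon.table-default.bucket=4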

After spark-sql is started, you can switch to the default database of the paimon catalog with the following SQL.

USE paimon.default;