This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.

DataFrame #

Paimon supports creating tables, inserting data, and querying through the Spark DataFrame API.

Create Table #

You can specify table properties with option or set partition columns with partitionBy if needed.

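import org.apache.spark.sql.DataFrame
import spark.implicits._ // needed for toDF; assumes an active SparkSession named spark

val data: DataFrame = Seq((1, "x1", "p1"), (2, "x2", "p2")).toDF("a", "b", "pt")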

data.write.format("paimon")
  .option("primary-key", "a,pt")
  .option("k1", "v1")
  .partitionBy("pt")
  .saveAsTable("test_tbl") // or .save("/path/to/default.db/test_tbl")

Insert #

Insert Into #

You can achieve INSERT INTO semantics by setting the mode to append, which is the default.

val data: DataFrame = ...

data.write.format("paimon")
  .mode("append")         // by default
  .insertInto("test_tbl") // or .saveAsTable("test_tbl") or .save("/path/to/default.db/test_tbl")

Note: insertInto ignores column names and writes by position. If you need to write by column name, use saveAsTable or save instead.
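
Because insertInto matches columns by position, a DataFrame whose columns are ordered differently from the table can end up writing values into the wrong columns. The following is a minimal sketch of one way to guard against that, assuming the target table can be resolved through spark.table and that the DataFrame contains every table column by name. alignToTable is a hypothetical helper for illustration, not part of Paimon or Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper: reorder the columns of `data` to follow the table's column order.
// Spark raises an AnalysisException if `data` lacks one of the table's columns.
def alignToTable(spark: SparkSession, data: DataFrame, tableName: String): DataFrame = {
  val tableColumns = spark.table(tableName).columns
  data.select(tableColumns.map(c => col(c)): _*)
}

// Usage (assumes an active SparkSession named spark and an existing table test_tbl):
// alignToTable(spark, data, "test_tbl").write.format("paimon").insertInto("test_tbl")
```

Extra columns in the DataFrame are dropped by the select, so this sketch only suits cases where that is acceptable.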

Insert Overwrite #

You can achieve INSERT OVERWRITE semantics by setting the mode to overwrite with insertInto.

Dynamic partition overwrite is supported for partitioned tables. To enable it, set the Spark session configuration spark.sql.sources.partitionOverwriteMode to dynamic.
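
As a minimal sketch, the setting can be applied at runtime on an existing session before running the overwrite. This assumes an active SparkSession named spark.

```scala
// Spark defaults to static mode; switch this session to dynamic partition overwrite.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
```

In dynamic mode, only the partitions present in the written DataFrame are replaced, and other partitions are left untouched. The same key can also be passed with --conf when launching spark-submit or spark-shell.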

val data: DataFrame = ...

data.write.format("paimon")
  .mode("overwrite")
  .insertInto("test_tbl")

Replace Table #

You can achieve REPLACE TABLE semantics by setting the mode to overwrite with saveAsTable or save.

This first drops the existing table and then creates a new one, so you must specify the table’s properties and partition columns again if needed.

val data: DataFrame = ...

data.write.format("paimon")
  .option("primary-key", "a,pt")
  .option("k1", "v1")
  .partitionBy("pt")
  .mode("overwrite")
  .saveAsTable("test_tbl") // or .save("/path/to/default.db/test_tbl")

Query #

spark.read.format("paimon")
  .table("test_tbl") // or .load("/path/to/default.db/test_tbl")
  .show()