SQL Query
This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.

SQL Query #

PyPaimon supports executing SQL queries on Paimon tables, powered by pypaimon-rust and DataFusion.

Installation #

SQL query support requires additional dependencies. Install them with:

pip install pypaimon[sql]

This will install pypaimon-rust and datafusion.

Usage #

Create a SQLContext, register one or more catalogs with their options, and run SQL queries.

Basic Query #

from pypaimon.sql import SQLContext

ctx = SQLContext()
ctx.register_catalog("paimon", {"warehouse": "/path/to/warehouse"})
ctx.set_current_catalog("paimon")
ctx.set_current_database("default")

# Execute SQL and get PyArrow Table
table = ctx.sql("SELECT * FROM my_table")
print(table)

# Convert to Pandas DataFrame
df = table.to_pandas()
print(df)

Table Reference Format #

The default catalog and default database can be configured via set_current_catalog() and set_current_database(), so you can reference tables in two ways:

# Direct table name (uses default database)
ctx.sql("SELECT * FROM my_table")

# Two-part: database.table
ctx.sql("SELECT * FROM mydb.my_table")

Filtering #

table = ctx.sql("""
    SELECT id, name, age 
    FROM users 
    WHERE age > 18 AND city = 'Beijing'
""")

Aggregation #

table = ctx.sql("""
    SELECT city, COUNT(*) AS cnt, AVG(age) AS avg_age
    FROM users
    GROUP BY city
    ORDER BY cnt DESC
""")

Join #

table = ctx.sql("""
    SELECT u.name, o.order_id, o.amount
    FROM users u
    JOIN orders o ON u.id = o.user_id
    WHERE o.amount > 100
""")

Subquery #

table = ctx.sql("""
    SELECT * FROM users
    WHERE id IN (
        SELECT user_id FROM orders
        WHERE amount > 1000
    )
""")

Cross-Database Query #

# Query a table in another database using two-part syntax
table = ctx.sql("""
    SELECT u.name, o.amount
    FROM default.users u
    JOIN analytics.orders o ON u.id = o.user_id
""")

Multi-Catalog Query #

SQLContext supports registering multiple catalogs for cross-catalog queries:

from pypaimon.sql import SQLContext

ctx = SQLContext()
ctx.register_catalog("a", {"warehouse": "/path/to/warehouse_a"})
ctx.register_catalog("b", {
    "metastore": "rest",
    "uri": "http://localhost:8080",
    "warehouse": "warehouse_b",
})
ctx.set_current_catalog("a")
ctx.set_current_database("default")

# Cross-catalog join
table = ctx.sql("""
    SELECT a_users.name, b_orders.amount
    FROM a.default.users AS a_users
    JOIN b.default.orders AS b_orders ON a_users.id = b_orders.user_id
""")

Supported SQL Syntax #

The SQL engine is powered by Apache DataFusion, which supports a rich set of SQL syntax including:

  • SELECT, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT
  • JOIN (INNER, LEFT, RIGHT, FULL, CROSS)
  • Subqueries and CTEs (WITH)
  • Aggregate functions (COUNT, SUM, AVG, MIN, MAX, etc.)
  • Window functions (ROW_NUMBER, RANK, LAG, LEAD, etc.)
  • UNION, INTERSECT, EXCEPT

For the full SQL reference, see the DataFusion SQL documentation.

Edit This Page
Copyright © 2025 The Apache Software Foundation. Apache Paimon, Paimon, and its feather logo are trademarks of The Apache Software Foundation.