FUSE Support #
When using PyPaimon REST Catalog to access remote object storage (such as OSS, S3, or HDFS), data access typically goes through remote storage SDKs. However, in scenarios where remote storage paths are mounted locally via FUSE (Filesystem in Userspace), users can access data directly through local filesystem paths for better performance.
This feature lets PyPaimon read and write through the local filesystem when a FUSE mount is available, bypassing the remote storage SDKs.
Configuration #
| Option | Type | Default | Description |
|---|---|---|---|
| `fuse.enabled` | Boolean | false | Whether to enable FUSE local path mapping |
| `fuse.root` | String | (none) | FUSE mounted local root path, e.g., /mnt/fuse/warehouse |
| `fuse.validation-mode` | String | strict | Validation mode: strict, warn, or none |
Usage #
```python
from pypaimon import CatalogFactory

catalog_options = {
    'metastore': 'rest',
    'uri': 'http://rest-server:8080',
    'warehouse': 'oss://my-catalog/',
    'token.provider': 'xxx',
    # FUSE local path configuration
    'fuse.enabled': 'true',
    'fuse.root': '/mnt/fuse/warehouse',
    'fuse.validation-mode': 'strict'
}
catalog = CatalogFactory.create(catalog_options)
```
Validation Modes #
Validation runs on the first data access to verify that the FUSE mount is set up correctly. The `fuse.validation-mode` option controls what happens when the translated local path does not exist:
| Mode | Behavior | Use Case |
|---|---|---|
| `strict` | Throw exception, block operation | Production, safety first |
| `warn` | Log warning, fall back to default FileIO | Testing, compatibility first |
| `none` | Skip validation, use directly | Trusted environment, performance first |
Note: Configuration errors (e.g., fuse.enabled=true but fuse.root not configured) will throw exceptions directly, regardless of validation mode.
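The mode semantics above can be sketched in plain Python. This is an illustrative sketch only, not PyPaimon's implementation: `choose_file_io`, `FuseConfigError`, and the `'fuse'` / `'default'` return values are hypothetical names standing in for `FuseLocalFileIO` and the default FileIO. Rejecting unknown mode strings is also an assumption of the sketch.

```python
import logging
import os

logger = logging.getLogger("fuse_sketch")


class FuseConfigError(ValueError):
    """Hypothetical error type for an invalid FUSE configuration."""


def choose_file_io(options, local_path):
    """Return 'fuse' or 'default' following the documented mode table.

    Hypothetical helper: the strings stand in for FuseLocalFileIO and
    the default (remote SDK) FileIO.
    """
    if str(options.get("fuse.enabled", "false")).lower() != "true":
        return "default"

    # Configuration errors are raised regardless of validation mode.
    if not options.get("fuse.root"):
        raise FuseConfigError("fuse.enabled=true requires fuse.root")

    mode = options.get("fuse.validation-mode", "strict")
    if mode not in ("strict", "warn", "none"):
        # Assumption of this sketch: unknown modes are rejected.
        raise FuseConfigError(f"unknown fuse.validation-mode: {mode!r}")

    if mode == "none":
        return "fuse"  # skip validation, use the local path directly
    if os.path.exists(local_path):
        return "fuse"  # validation passed
    if mode == "strict":
        raise FileNotFoundError(f"FUSE path does not exist: {local_path}")

    # mode == "warn": log and fall back to the default FileIO
    logger.warning("FUSE path %s not found, using default FileIO", local_path)
    return "default"
```

Note that with `none` the function never touches the filesystem, which is why that mode only suits environments where the mount is already trusted.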
How It Works #
- When `fuse.enabled=true`, PyPaimon attempts to use local file access
- On first data access, validation is triggered (unless mode is `none`)
- Validation fetches the `default` database location and converts it to local path
- If local path exists, subsequent data access uses `FuseLocalFileIO`
- Path translation uses database/table logical names: remote path `oss://<catalog-id>/<db-id>/<table-id>` → local path `<root>/<db-name>/<table-name>`
- If validation fails, behavior depends on `validation-mode`
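The path translation step can be illustrated with a small helper. It is a minimal sketch under stated assumptions: `to_local_path` is a hypothetical function, not part of PyPaimon, and the name checks are an extra safeguard added for the example. The key point it shows is that the local path is built from the logical database and table names, not by rewriting the remote URI, because the remote path contains IDs that do not appear on the FUSE side.

```python
import posixpath


def to_local_path(fuse_root, db_name, table_name=None):
    """Build <root>/<db-name>[/<table-name>] from logical names.

    Hypothetical helper for illustration; PyPaimon's own translation
    logic may differ in detail.
    """
    for name in (db_name, table_name):
        # Reject names that would escape or alter the directory layout.
        if name is not None and ("/" in name or name in ("", ".", "..")):
            raise ValueError(f"invalid logical name: {name!r}")

    # Normalize a trailing slash on the root, keeping "/" itself intact.
    parts = [fuse_root.rstrip("/") or "/", db_name]
    if table_name is not None:
        parts.append(table_name)
    return posixpath.join(*parts)


print(to_local_path("/mnt/fuse/warehouse", "my_db", "my_table"))
# /mnt/fuse/warehouse/my_db/my_table
```

Calling the helper with only a database name yields the kind of path that validation would check, for example `to_local_path(root, 'default')`.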
Example Scenario #
Assume you have:
- Remote storage paths use UUIDs: `oss://clg-paimon-xxx/db-xxx/tbl-xxx`
- FUSE mount: `/mnt/fuse/warehouse` (mounted to `pvfs://demo_catalog`)
- FUSE exposes logical names: `/mnt/fuse/warehouse/my_db/my_table`
```python
from pypaimon import CatalogFactory

catalog = CatalogFactory.create({
    'metastore': 'rest',
    'uri': 'http://rest-server:8080',
    'warehouse': 'oss://my-catalog/',
    'fuse.enabled': 'true',
    'fuse.root': '/mnt/fuse/warehouse',
    'fuse.validation-mode': 'none'
})

# When reading table 'my_db.my_table', PyPaimon will:
# 1. Convert "oss://clg-paimon-xxx/db-xxx/tbl-xxx" to "/mnt/fuse/warehouse/my_db/my_table"
# 2. Use FuseLocalFileIO to read from local path
table = catalog.get_table('my_db.my_table')
reader = table.new_read_builder().new_read()
```
Limitations #
- Only catalog-level FUSE mount is supported (single `fuse.root` configuration)
- Validation only checks if local path exists, not data consistency
- If FUSE mount becomes unavailable after validation, file operations may fail
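Because validation runs only once and checks only that the path exists, a long-running job may want its own cheap pre-flight check before creating a catalog. The sketch below is one possible approach, not a PyPaimon feature: `fuse_root_available` and `with_fuse_if_available` are hypothetical helpers, and only the `fuse.enabled` and `fuse.root` option keys come from this page. It relies on the fact that opening a directory on a stale or missing mount raises `OSError`.

```python
import os


def fuse_root_available(fuse_root):
    """Return True if the FUSE root can currently be opened as a directory."""
    try:
        # Opening the directory touches the mount without listing everything.
        with os.scandir(fuse_root):
            pass
    except OSError:
        # Covers a missing path, a non-directory, and a disconnected mount.
        return False
    return True


def with_fuse_if_available(base_options, fuse_root):
    """Copy catalog options, enabling FUSE only when the root is reachable."""
    options = dict(base_options)
    if fuse_root_available(fuse_root):
        options["fuse.enabled"] = "true"
        options["fuse.root"] = fuse_root
    else:
        options["fuse.enabled"] = "false"
    return options
```

The resulting dictionary can be passed to `CatalogFactory.create` as in the usage example. This check narrows the window but does not remove it: the mount can still disappear between the check and a later file operation.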