This documentation is for an unreleased version of Apache Paimon. We recommend you use the latest stable version.
Functions #
Paimon introduces a Function abstraction that defines functions in a standard format usable by compute engines, addressing:
- Unified Column-Level Filtering and Processing: Facilitates operations at the column level, including tasks such as encryption and decryption of data.
- Parameterized View Capabilities: Supports parameterized operations within views, making data retrieval more dynamic and reusable.
Types of Functions Supported #
Currently, Paimon supports three types of functions:
- File Function: Users can define functions in files (for example, JAR files), providing flexibility and modular support for function definition.
- Lambda Function: Users can define functions as Java lambda expressions, enabling inline, concise, functional-style operations.
- SQL Function: Users can define functions directly in SQL, integrating seamlessly with SQL-based data processing.
File Function Usage in Flink #
Paimon functions can be used in Apache Flink to perform complex data operations. The following Flink SQL statements create, alter, and drop functions.
Create Function #
To create a new function in Flink SQL:
-- Flink SQL
CREATE FUNCTION mydb.parse_str
AS 'com.streaming.flink.udf.StrUdf'
LANGUAGE JAVA
-- The second JAR is optional; further JARs can be appended as comma-separated JAR entries.
USING JAR 'oss://my_bucket/my_location/udf.jar', JAR 'oss://my_bucket/my_location/a.jar';
This statement creates a Java-based user-defined function named parse_str in the mydb database, implemented by the class com.streaming.flink.udf.StrUdf and loaded from the specified JAR files in object storage.
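The AS clause names the class that implements the function, and that class must be packaged in one of the listed JARs. The original does not show com.streaming.flink.udf.StrUdf, so the following is a hypothetical sketch: Flink scalar UDFs expose a public eval method, and the trim-and-lower-case logic here is an assumption chosen only for illustration. A real implementation would declare package com.streaming.flink.udf and extend org.apache.flink.table.functions.ScalarFunction; both are left out so the sketch compiles and runs with the JDK alone.

```java
import java.util.Locale;

// Hypothetical stand-in for com.streaming.flink.udf.StrUdf.
// In a real UDF JAR this class would be declared in package com.streaming.flink.udf
// and would extend org.apache.flink.table.functions.ScalarFunction; both are
// omitted here so the sketch compiles without Flink on the classpath.
class StrUdf {
    // Flink resolves scalar UDF calls to a public eval method.
    // The parsing logic (trim whitespace, lower-case) is an illustrative assumption.
    public String eval(String s) {
        if (s == null) {
            return null; // propagate SQL NULL unchanged
        }
        return s.trim().toLowerCase(Locale.ROOT);
    }
}

class StrUdfDemo {
    public static void main(String[] args) {
        StrUdf udf = new StrUdf();
        System.out.println(udf.eval("  Hello Paimon  ")); // hello paimon
        System.out.println(udf.eval(null));               // null
    }
}
```

After packaging such a class into udf.jar and running the CREATE FUNCTION statement, the function would be invoked once per row, for example SELECT mydb.parse_str(col) FROM t, where col and t are placeholder names.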
Alter Function #
To modify an existing function in Flink SQL:
-- Flink SQL
ALTER FUNCTION mydb.parse_str
AS 'com.streaming.flink.udf.StrUdf2'
LANGUAGE JAVA;
This statement changes the implementation of the parse_str function to the Java class com.streaming.flink.udf.StrUdf2.
Drop Function #
To remove a function from Flink SQL:
-- Flink SQL
DROP FUNCTION mydb.parse_str;
This statement deletes the existing parse_str function from the mydb database, so it can no longer be called.
Lambda Function Usage in Spark #
Create Function #
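The sys.create_function procedure registers area_func in my_db; inputParams and returnParams are JSON arrays describing each parameter's id, name, and type.
-- Spark SQL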
CALL sys.create_function(`function` => 'my_db.area_func',
`inputParams` => '[{"id": 0, "name":"length", "type":"INT"}, {"id": 1, "name":"width", "type":"INT"}]',
`returnParams` => '[{"id": 0, "name":"area", "type":"BIGINT"}]',
`deterministic` => true,
`comment` => 'comment',
`options` => 'k1=v1,k2=v2'
);
Alter Function #
-- Spark SQL
CALL sys.alter_function(`function` => 'my_db.area_func',
`change` => '{"action" : "addDefinition", "name" : "spark", "definition" : {"type" : "lambda", "definition" : "(Integer length, Integer width) -> { return (long) length * width; }", "language": "JAVA" } }'
);
Once the Spark definition has been added, the function can be called from Spark SQL:
-- Spark SQL
SELECT paimon.my_db.area_func(1, 2);
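The definition string added by addDefinition is a Java lambda: its Integer parameters correspond to the two INT inputParams and its long result to the BIGINT returnParam. As a sketch, the same lambda body can be evaluated outside Spark by assigning it to java.util.function.BiFunction; that wrapper is an assumption for illustration and not necessarily the interface Paimon uses internally. The second lambda, a hypothetical variant, shows why the definition casts length to long before multiplying: without the cast, the product is computed in 32-bit int arithmetic and overflows for large inputs.

```java
import java.util.function.BiFunction;

class AreaFuncDemo {
    // Same body as the "definition" string registered for area_func.
    // BiFunction is an illustrative wrapper, not necessarily Paimon's internal interface.
    static final BiFunction<Integer, Integer, Long> AREA =
            (Integer length, Integer width) -> { return (long) length * width; };

    // Hypothetical variant without the early cast: the multiplication happens
    // in int and only the (possibly overflowed) result is widened to long.
    static final BiFunction<Integer, Integer, Long> AREA_NO_CAST =
            (Integer length, Integer width) -> { return (long) (length * width); };

    public static void main(String[] args) {
        System.out.println(AREA.apply(1, 2));                     // 2
        System.out.println(AREA.apply(100_000, 100_000));         // 10000000000
        System.out.println(AREA_NO_CAST.apply(100_000, 100_000)); // 1410065408
    }
}
```

With the cast, area_func returns the exact product for any pair of INT inputs, because the product of two 32-bit integers always fits in a 64-bit BIGINT.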
Drop Function #
-- Spark SQL
CALL sys.drop_function(`function` => 'my_db.area_func');