Snowflake UDFs—Comprehensive Guide to User-Defined Functions (2024)
UDFs, or User-Defined Functions, allow users to define their own custom functions when an operation they're trying to perform isn't natively available as a built-in function. Zooming into Snowflake, UDFs take on a special role. They provide customized solutions, guaranteeing that users are not restricted by Snowflake’s native capabilities. Their true value lies in their adaptability, enabling more flexible and enhanced data operations within Snowflake.
In this article, we will explore what Snowflake UDFs are, the types of Snowflake UDFs, how to create and call UDFs using a variety of programming languages, examples, use cases, performance considerations—and a whole lot more!
What are UDFs (User-Defined Functions)?
User-Defined Function (UDF) in SQL is a custom function that you can create and use within SQL. UDFs allow you to enhance SQL's capabilities by writing your own logic that is run in the same way as a standard function call. For example, you may create a UDF to do complex computations or data transformations that would be time-consuming to accomplish in SQL. The main advantage of UDFs is that once you define them, you can call them anywhere in your SQL statement, just like with regular built-in functions, making your queries more readable and maintainable by abstracting complex logic into reusable modules.
UDFs encapsulate reusable logic written in a variety of programming languages, such as JavaScript, Python, and Java, and return scalar values or result sets that may be used in SQL queries. UDFs allow users to add business logic, implement complex calculations, automate data transformations, enforce data validation, and a whole lot more. Once created, UDFs can be called in SELECT statements, JOIN clauses, WHERE clauses, and other places within a SQL query, just like any native function.
Want to take Chaos Genius for a spin?
It takes less than 5 minutes.
What are the benefits of using UDFs?
The key benefits of using UDFs are:
- UDFs allow for modular programming, breaking down complex tasks into simple, manageable chunks
- UDFs can enhance performance and minimize code size by reusing the same function in multiple scripts
- UDFs might run faster compared to some other database routines
- UDFs make your code simpler and easier to take care of
- UDFs can be used in different parts of SQL statements, making coding more flexible
- UDFs that return tables can be treated as another rowset, facilitating joins with other tables
- UDFs allow parameter passing for customized returns, making it simpler to invoke in SQL statements
Now that we have a general overview of what UDFs are, let's dive into Snowflake UDFs.
What are Snowflake UDFs?
Snowflake User-Defined Functions (UDFs) are custom functions that you create to perform specific tasks within your Snowflake environment. They allow you to encapsulate complex or repetitive logic into a single function, making your code more organized and easier to manage. You can create Snowflake UDFs using SQL, JavaScript, Python, Java, Scala. Once created, they can be used just like any built-in function in Snowflake, helping to extend Snowflake's capabilities beyond its native functions.
What are the Types of Snowflake UDFs?
There are two types of Snowflake UDFs:
1) Scalar Snowflake UDFs
2) Tabular Snowflake UDFs
1) Scalar Snowflake UDFs
Scalar Snowflake UDFs return one output row for each input row. For example, a scalar Snowflake UDF could accept a user's first and last name as input and return a single string concatenating the two names.
Some common use cases for scalar UDFs include data transformations, cleaning, formatting, advanced calculations, data validation, etc. The logic can leverage complex conditional logic, loops, external calls, and more to generate a single return value.
2) Tabular Snowflake UDFs
Also known as Snowflake User-Defined Table Functions (UDTFs), tabular Snowflake UDFs return a tabular value for each input row. For example, a UDTF could be designed to process a dataset of sales transactions, where each input row represents a transaction. The UDTF could be structured to aggregate sales data by region within each partition and return a table summarizing sales totals and averages for each region.
Tabular UDFs are commonly used to encapsulate joins or complex table operations that can be reused across queries. They provide efficiency and abstraction by consolidating complex SQL operations into a reusable function.
Supported Programming Languages in Snowflake UDFs
Snowflake allows UDF logic to be written using a variety of programming languages, providing flexibility to leverage existing skills and code. These are the following supported languages for UDFs in Snowflake:
- Java (Strongly typed, object-oriented language commonly used for backend applications)
- JavaScript (Dynamic scripting language commonly used for front-end applications)
- Python (Interpreted scripting language popular for analytics and data science)
- Scala (General purpose language that combines OOPS and functional concepts)
- SQL (Snowflake variant of SQL with scripting capabilities)
The choice fully depends on your use case, knowledge, and the types of processing needed within UDFs.
How to create Snowflake UDFs?
Snowflake provides the CREATE FUNCTION syntax to define and create new UDFs. The key parts of the statement include:
- <name>: Unique name for the function
- <return_data_type]: Specifying return value data type
- <language>: UDF logic language (SQL, JavaScript, Java, Python, Scala, etc)
- <handler>: Entry point of the UDF, specifying the function to be executed (needed for Java, Scala, and Python)
- <Actual code>: The executable logic of the UDF
Here are the prerequisites for creating Snowflake User-Defined Functions (UDFs):
- An active Snowflake account with the necessary user permissions to create DB objects
- Familiarity with the Snowflake User Interface (Snowsight) to navigate and manage database operations
- Knowledge of supported languages for writing Snowflake UDFs
Here is the basic syntax to create Snowflake UDFs:
CREATE OR REPLACE FUNCTION <name>([ <arg_name> <arg_data_type> ] [ , ... ])
RETURNS <result_data_type>
LANGUAGE <language>
AS
$$
<actual_code>
$$
;
The syntax used for creating Snowflake UDFs is identical to that of Snowflake stored procedures. We've previously published an article on Snowflake Stored Procedures; for a deeper dive into the topic, please refer to that.
Below are examples of creating simple Snowflake UDFs to add two numbers together, each written in a supported Snowflake UDF language:
1) SQL—Snowflake UDFs syntax
CREATE FUNCTION <name>([ <arg_name> <arg_data_type> ] [ , ... ])
RETURNS <result_data_type>
LANGUAGE SQL
AS
$$
<actual_code>
$$
;
2) Java—Snowflake UDFs syntax
CREATE OR REPLACE FUNCTION <name>([ <arg_name> <arg_data_type> ] [ , ... ])
RETURNS <result_data_type>
LANGUAGE JAVA
HANDLER= ’SomeClass.SomeFunction’
AS $$
class SomeClass {
public static int SomeFunction(<arg_name1>, <arg_name2>) {
return arg_name1+ arg_name2 ;
}
}
$$;
3) JavaScript—Snowflake UDFs syntax
CREATE OR REPLACE FUNCTION <name>([ <arg_name> <arg_data_type> ] [ , ... ])
RETURNS <result_data_type>
LANGUAGE JAVASCRIPT
AS $$
function add(<arg1>, <arg2>) {
return arg1 + arg2 ;
}
return add(arg_name1, arg_name2 );
$$;
4) Python—Snowflake UDFs syntax
create or replace function <name>([ <arg_name> <arg_data_type> ] [ , ... ])
RETURNS <result_data_type>
LANGUAGE PYTHON
runtime_version = '<3.8 or 3.9 or 3.10>’
HANDLER= ’SomeFunction’
as
$$
def SomeFunction(num1, num2):
return num1 + num2
$$;
4)Scala —Snowflake UDFs syntax
CREATE OR REPLACE FUNCTION <name>([ <arg_name> <arg_data_type> ] [ , ... ])
RETURNS <result_data_type>
LANGUAGE SCALA
RUNTIME_VERSION = 2.12
HANDLER= 'SomeClass.SomeFunction'
AS $$
class {
def SomeFunction(a: Float, b: Float): Float = {
a + b
}
}
$$;
🚨 Note: When creating Snowflake UDFs using Python or Scala, it's important to specify the runtime version for the UDF. This is to make sure that the UDF executes correctly in the desired language environment.
For Python UDFs: Snowflake currently supports writing UDFs in the following versions of Python: 3.8, 3.9, 3.10.
For Scala UDFs: Snowflake currently supports writing UDFs in the following version of Scala: 2.12.
Make sure to specify the correct runtime version based on the language (Python and Scala) and version you are utilizing for your UDF to avoid any execution issues.
How to Call Snowflake UDFs?
Once created, UDFs can be invoked in SQL queries just like any other Snowflake function.
For scalar Snowflake UDFs, you simply call the UDF by name and pass the required arguments. For example:
SELECT udf_name([ <arg_name> <arg_data_type> ] [ , ... ])
FROM users;
For tabular Snowflake UDFs, you use the TABLE keyword to call the UDTF and reference it as a data source, e.g.:
SELECT *
FROM TABLE( udtf_name (udtf_args) )
This allows UDFs to be seamlessly leveraged anywhere within a SQL query.
How to List All Snowflake UDFs?
Snowflake provides metadata views and commands to list existing UDFs defined in your account:
- SHOW USER FUNCTIONS: List UDFs for current user
- SHOW FUNCTIONS: List UDFs for current database & schema
- SHOW EXTERNAL FUNCTIONS: Lists all the external functions created for your account
- FUNCTIONS view: This Information Schema view displays a row for each Snowflake UDFs or external function defined in the specified db.
For example:
SHOW USER FUNCTIONS;
This returns a list of all UDFs available to the currently logged-in user along with all metadata
Scalar Vs. Tabular—Snowflake UDFs Examples
Let's look at some examples of creating and using UDFs in Snowflake for both scalar and tabular types.
1) Scalar Snowflake UDFs Examples
a) Calling Snowflake UDFs inside a SELECT query using SQL Snowflake UDFs
Let's create a CUSTOMERS table that contains information about customers, including their names and birth dates.
CREATE OR REPLACE TABLE CUSTOMERS(
customer_name VARCHAR,
birth_date DATE
);
-- Inserting some data into the CUSTOMERS table
INSERT INTO CUSTOMERS VALUES
('Johnny', '1985-07-15'),
('Jammie', '1990-08-25'),
('Chaos', '1975-04-22'),
('Genius', '2000-03-10');
Now, let's create a UDF to calculate the age of each customer based on their birth date.
-- Creating a Snowflake UDFs to calculate age based on birth date
CREATE OR REPLACE FUNCTION calculate_age(birth_date DATE)
RETURNS INT
LANGUAGE SQL
AS $$
FLOOR(DATEDIFF('Year', birth_date, CURRENT_DATE()))
$$;
Then, we'll call this Snowflake UDF in a SELECT query to retrieve the names and ages of all customers.
-- Using calculate_age UDF in a SELECT query to retrieve names + ages of all customers
SELECT
customer_name,
calculate_age(birth_date) AS age
FROM
CUSTOMERS;
b) Calculating Average Temperature—Snowflake UDFs with a Query Expression with SELECT Statement using SQL Snowflake UDFs
-- Creating TEMPERATURES table
CREATE OR REPLACE TABLE TEMPERATURES(
recorded_temp FLOAT
);
-- Inserting some dummy data into the TEMPERATURES table
INSERT INTO TEMPERATURES VALUES
(68.5),
(72.3),
(65.4),
(70.1);
-- Creating Snowflake UDFs to calculate the average temperature
CREATE OR REPLACE FUNCTION get_average_temp()
RETURNS FLOAT
LANGUAGE SQL
AS
$$
SELECT AVG(recorded_temp) FROM TEMPERATURES
$$;
-- Calling the UDF using SELECT
SELECT get_average_temp();
As you can see, we first created a TEMPERATURES table and inserted some temperature data. Next, we created a get_average_temp Snowflake UDFs to calculate the avg temperature from the TEMPERATURES table. Finally, we invoked the get_average_temp UDFs using a SELECT statement to retrieve the average temperature.
c) Concatenate first_name and last_name using Javascript Snowflake UDFs
CREATE OR REPLACE FUNCTION ConcatNames(first_name STRING, last_name STRING)
RETURNS STRING
LANGUAGE JAVASCRIPT
AS $$
return first_name.concat(" ", last_name);
$$
;
As you can see, this creates a Snowflake UDF named ConcatNames that accepts first_name and last_name strings as input and returns the concatenated string. The UDF logic is written in JavaScript and contained within the $$ delimiters.
d) Creating a Java Snowflake UDFs to Generate a Greeting
CREATE OR REPLACE FUNCTION greetUser(name VARCHAR)
RETURNS VARCHAR
LANGUAGE JAVA
HANDLER ='Greeter.greet'
AS
$$
public class Greeter {
public static String greet(String name) {
return "Hello " + name + "!";
}
}
$$;
SELECT greetUser('pramit')
As you can see in this example, a Java UDF greetUser is defined, which accepts a VARCHAR name input and returning a VARCHAR greeting. It uses JAVA language and the Greeter.greet handler to concatenate the input name into a greeting string.
e) Concatenating Strings with a Python UDFs
CREATE OR REPLACE FUNCTION concatStrings(str1 STRING, str2 STRING)
RETURNS STRING
language python
runtime_version = '3.8'
handler = 'concat'
as
$$
def concat(str1, str2):
return str1 + " " + str2
$$;
SELECT concatStrings('Chaos', 'Genius');
As you can see in this example, a Python Snowflake UDF concatStrings is defined accepting two STRING and returning the concatenated STRING. It uses PYTHON language and the concat handler to join the input strings.
f) Calculating Factorials Using a Scala Snowflake UDFs
CREATE OR REPLACE FUNCTION factorial(num INT)
RETURNS INT
LANGUAGE SCALA
RUNTIME_VERSION = 2.12
HANDLER ='Factorial.compute'
AS
$$
object Factorial {
def compute(num: Int): Int = {
if (num <= 1)
1
else
num * compute(num - 1)
}
}
$$;
SELECT factorial(10);
As you can see in this example, a Scala UDF factorial is created which accepts an INT input and returning the INT factorial. It uses SCALA and the Factorial.compute handler to calculate the factorial of the input.
Note: When creating Python or Scala Snowflake UDFs, it's important to specify the runtime version.
2) Tabular Snowflake UDFs Examples
When you're setting up Tabular Snowflake UDFs(UDTFs), you'll need to specify TABLE(.....). Inside the parentheses, list the names of the output columns and the types of data they'll hold.
a) Creating a Snowflake UDTFs to return tabular data
CREATE OR REPLACE TABLE employees (
id INT,
name VARCHAR,
dept VARCHAR
);
INSERT INTO employees VALUES
(1, 'johnnyy', 'Software Engineering'),
(2, 'Jimmyy', 'Mathematics'),
(3, 'Chaos Genius', 'Data Engineering');
CREATE OR REPLACE FUNCTION get_employees_by_dept(dept_name VARCHAR)
RETURNS TABLE (id INT, name VARCHAR, dept VARCHAR)
LANGUAGE SQL
AS $$
SELECT id, name, dept
FROM employees
WHERE dept = dept_name
$$;
SELECT * FROM TABLE(get_employees_by_dept('Data Engineering'));
In this example, as you can see, we initially created a table named employees and populated it with some dummy data. Following that, we defined a User-Defined Table Function (UDTF) called get_employees_by_dept, which accepts a VARCHAR department name and returns a TABLE consisting of ID, name, and department columns. These columns are selected from the employee table and filtered by the input department. This UDTF can be invoked in a query using the FROM TABLE() syntax to return tabular rows for a specific department, thereby encapsulating the filtering logic in reusable Snowflake UDTFs.
- JOIN Operation in the Snowflake UDTF—Retrieving Orders with Currencies
-- Creating the orders table
CREATE OR REPLACE TABLE orders (
id NUMBER,
date DATE,
country STRING,
amount NUMBER
);
-- Inserting dummy data into the orders table
INSERT INTO orders VALUES
(1, '2022-01-01', 'US', 100.00),
(2, '2022-02-01', 'US', 200.00),
(3, '2022-03-01', 'UK', 150.00),
(4, '2022-04-01', 'UK', 250.00);
-- Creating currency table
CREATE OR REPLACE TABLE currencies (
country STRING,
currency STRING
);
-- Inserting dummy data
INSERT INTO currencies VALUES
('US', 'USD'),
('UK', 'GBP');
-- SalesWithCurrency Snowflake UDTFs
CREATE OR REPLACE FUNCTION SalesWithCurrency(country_code STRING)
RETURNS TABLE(id NUMBER, date DATE, amount NUMBER, currency STRING)
LANGUAGE sql
AS
$$
SELECT o.id, o.date, o.amount, c.currency
FROM orders o
JOIN currencies c ON o.country = c.country
WHERE o.country = country_code
$$;
-- Using SalesWithCurrency function to fetch data for the US
SELECT * FROM TABLE(SalesWithCurrency('US'));
In this example, as you can see, we initially create the order and 'customer' tables and populate them with some dummy data. After that, we define a User-Defined Table Function (UDTF) named SalesWithCurrency that accepts a STRING representing the country code and returns a TABLE. This table consists of id, date, amount, and currency columns selected from a joined view of the orders and currencies tables, filtered by the input country code. This UDTF can be invoked in a query using the FROM TABLE() syntax to return tabular rows for a specific country. This arrangement encapsulates the join and filtering logic in reusable Snowflake UDTFs, simplifying the retrieval of sales data along with the corresponding currency information based on the country of interest.
Check out this documentation if you want to learn more about creating tabular Snowflake UDFs (UDTFs) in the following languages:
Common Use Cases of Snowflake UDFs
There are many valuable use cases for Snowflake UDFs, they are:
- Encapsulation of Complex Logic: Snowflake UDFs allow for the encapsulation of complex logic into reusable modules, which promotes code reusability and maintainability.
- Data Transformation Operations: Snowflake UDFs enable data transformation operations like converting data types (e.g., string to date), and formatting/sanitizing data (e.g., trimming, lowercasing), which are essential for data cleaning and preparation.
- Advanced String Manipulation: Snowflake UDFs facilitate advanced string manipulation like pattern matching, parsing, and concatenation to handle text data, aiding in text processing and analysis.
- Common Logic Sharing across Queries: Snowflake UDFs allow for the sharing of common logic across multiple queries through reusable functions, promoting consistency and reducing code duplication.
- Code Modularization: Snowflake UDFs assist in modularizing code into logical units of work to improve organization and maintainability, making the codebase easier to manage and debug.
- Optimization of Expensive Operations: Snowflake UDFs can be used to optimize expensive operations and queries by tuning the logic, improving performance and reducing resource consumption.
- Extension of Native SQL Capabilities: Snowflake UDFs extend Snowflake's native SQL capabilities by adding custom programming logic, enhancing the flexibility and functionality of Snowflake.
Advantages of Using Snowflake UDFs
Some of the key advantages of using UDFs in Snowflake include:
- Custom Logic Implementation: Snowflake UDFs allow creating and running custom logic to meet specific needs, useful for complex calculations that standard SQL functions can't handle.
- Code Reusability and Maintenance: Snowflake UDFs make it easy to create reusable functions, simplifying code management in larger projects.
- Data Transformation: Snowflake UDFs help clean, format, and prepare data, ensuring it's ready for analysis.
- Seamless Integration with External Libraries: Snowflake UDFs can work with external libraries and services, expanding Snowflake's native capabilities.
- Data Security: Snowflake UDFs keep data processing within Snowflake, ensuring data doesn't leave the platform, which is safer for handling sensitive information.
Critical Considerations for Snowflake UDFs and UDTFs
While extremely useful, there are critical considerations to be aware of when working with Snowflake UDFs and UDTFs, such as::
- Queries with Snowflake UDFs that access staged files will fail if they also query a view containing any UDFs or UDTFs, even if those functions don't access staged files.
- Snowflake UDTFs can process multiple files in parallel, but UDFs process files serially currently. Group rows in subquery as a workaround.
- If staged files referenced in a query are modified/deleted during query execution, the Snowflake UDFs / UDTF call will fail with an error.
- Snowflake UDFs that query tables should handle transactional consistency carefully.
- Always monitor Snowflake UDFs performance and optimize expensive functions.
- Design idempotent Snowflake UDFs to allow retry on failures.
- Parameters should be sanitized and validated.
- Follow security best practices around data access in UDFs.
What is the Difference between Snowflake Stored Procedures and Snowflake UDFs?
Snowflake also provides stored procedures for reusable SQL logic. Key differences vs UDFs are:
Snowflake UDFs | Snowflake Stored Procedures |
---|---|
Snowflake UDFs primarily calculate and return a value. It's mandatory for them to return a value. | Stored Procedures are used to perform administrative operations by executing SQL statements, and they do not necessarily need to return a value. |
The values returned by Snowflake UDFs can be used directly in SQL statements. | The values returned by Stored Procedures may not be directly usable in SQL statements. |
Snowflake UDFs only support SELECT statements within the function body. | Stored Procedures support both Data Definition Language (DDL) and Data Manipulation Language (DML) queries inside the procedure body, in addition to SELECT statements. |
Snowflake UDFs are invoked within the context of another statement, typically within a SELECT statement. | Stored Procedures are invoked as independent statements using the CALL statement. |
Multiple Snowflake UDFs can be called within a single SQL statement. | A single executable statement can call only one Stored Procedure. |
Can be called within other SQL statements or within other UDFs, but not within stored procedures. | Can call other stored procedures within its body. Can store result sets in temporary or permanent tables for later use. |
Supported Languages: Java, JavaScript, Python, Scala, SQL | Supported Languages: Java, JavaScript, Python, Scala, Snowflake Scripting |
Save up to 30% on your Snowflake spend in a few minutes!
Conclusion
And that's a wrap! Snowflake User-Defined Functions (UDFs) are a powerful feature, enabling users to create custom functions using diverse programming languages such as SQL, Java, Python, JavaScript, and Scala. UDFs facilitate the encapsulation of intricate logic into reusable modules, which can then be effortlessly invoked within SQL queries. These functions are invaluable for tasks like data transformation, implementation of business rules, optimization of resource-intensive operations, and augmentation of Snowflake's inherent functionalities.
In this article, we covered:
- Nature of Snowflake UDFs
- Various types of Snowflake UDFs
- Programming languages supported by Snowflake UDFs
- The procedure for creating Snowflake UDFs
- How to invoke Snowflake UDFs
- Methods to list all available Snowflake UDFs
- Examples and use-cases for both Scalar and Tabular UDFs
- The benefits of using Snowflake UDFs
- Important factors to consider when working with Snowflake UDFs
- Difference between Snowflake Stored Procedures and Snowflake UDFs
FAQs
What are Snowflake UDFs?
Snowflake UDFs (User Defined Functions) are custom functions created by users to perform specific tasks within Snowflake beyond its native capabilities. They allow adding custom logic using languages like SQL, Java, Python, JavaScript, and Scala.
What are the benefits of Snowflake UDFs?
Key benefits include custom logic implementation, code reusability, data transformation, integration with external libraries, improved security, and extending Snowflake's native SQL capabilities.
How do Snowflake UDFs work?
When a user calls a UDF, they pass the UDF name and arguments to Snowflake. Snowflake then calls the associated code to execute the UDFs logic. The method returns the output to Snowflake, which is then passed back to the client.
What are the types of Snowflake UDFs?
There are two types - Scalar UDFs that return a single value per input row, and Tabular UDFs (UDTFs) that return a table for each input row.
Can you write Snowflake UDFs in Python?
Yes, Snowflake allows users to write the handler for a user-defined function in Python. Snowflake currently supports Python versions 3.8, 3.9, and 3.10.
What languages can you use to write Snowflake UDFs?
SQL, Java, JavaScript, Python, and Scala. Choose based on use case and development skills.
When should you use Snowflake UDFs over procedures?
UDFs are best for reusable scalar logic that can be embedded in SQL queries. Procedures suit admin tasks and complex DML/DDL operations.
Can UDFs call other UDFs?
Yes, Scalar UDFs can call other Scalar UDFs. But Tabular UDFs cannot call other UDFs.
How do you monitor the performance of UDFs?
Use Snowflake query history, query profiles, and UDF-specific metrics to analyze UDF execution times, resource usage, etc., and optimize as needed.
What are the limitations of Snowflake UDFs?
Key limitations are the inability to handle DML operations, restrictions around calling UDFs from procedures, and serialized handling of staged files.
Can you call external services from Snowflake UDFs?
Yes, UDFs can call external APIs, libraries, and services to integrate external capabilities into Snowflake.
How do you handle errors and exceptions in UDFs?
Use TRY/CATCH blocks and other error-handling logic to gracefully handle exceptions raised within UDFs.
Are there any security considerations for UDFs?
Follow least privilege principles and sanitize inputs to UDFs. Don't enable UDFs to access sensitive data unless absolutely necessary.
Can UDFs access Snowflake stages & file formats?
Yes, UDFs can query stage files and also leverage Snowflake file formats like CSV, JSON, Avro, etc.
Can you write unit tests for Snowflake UDFs?
Yes, write test cases to validate UDF logic and expected outputs. Useful for ensuring correct functioning.
What are some common use cases for Snowflake UDFs?
Data transformation, business logic implementation, text processing, shared logic across queries, and code modularization are some common Snowflake UDF use cases.