Snowflake offers a variety of functions to perform analysis and manipulation. One of these functions is the Snowflake LISTAGG function, a column aggregation function that concatenates all values within a column to a list, using a defined delimiter to separate the elements. Snowflake LISTAGG function can be useful for creating comma-separated lists, combining values based on conditions, ordering elements in the list—and so much more. All of these capabilities assist in transforming/formatting your data to make it easier to read and understand.
In this article, we will provide an in-depth overview of Snowflake LISTAGG function. On top of that, we will cover everything from its syntax, arguments, data types, practical examples, limitations, best practices—and so much more!!
What Does LISTAGG Do in Snowflake?
Snowflake LISTAGG function in Snowflake concatenates multiple string values from input rows into a single string. The concatenated values are separated by a specified <delimiter>
such as a comma or space.
Basically, Snowflake LISTAGG function takes all the data that may be spread across multiple rows and “aggregates” it into a single string output. This is extremely useful for condensing large data sets into a single value that can be easily exported, processed, or even analyzed.
Some example use cases are:
- Combining values from multiple rows into a single cell in reports
- Aggregating long text inputs
- Creating delimited lists of IDs, names—or other attributes from table rows
So, if you use Snowflake LISTAGG function carefully, you can save significant time in data manipulation and get better insights into aggregated string data.
Save up to 30% on your Snowflake spend in a few minutes!
How does LISTAGG function in Snowflake Work?
As we've covered already, Snowflake LISTAGG function is a column aggregation function that concatenates all values within a column into a list, using a defined delimiter to separate the elements. But, it's important to note that the Snowflake LISTAGG function can be used either as an aggregate function or a window function, depending on the context.
The basic syntax of Snowflake LISTAGG function is as follows:
Aggregate function:
LISTAGG(<expr1>, <delimiter> [, <expr2>]) [[WITHIN GROUP] (ORDER BY <orderby_clause>)]
Window function:
LISTAGG( <expr1> [, <delimiter> ] )
[ WITHIN GROUP ( <orderby_clause> ) ]
OVER ( [ PARTITION BY <expr2> ] )
Let’s break down Snowflake LISTAGG() syntax:
- <expr1>: The column or expression containing the values to concatenate
- <delimiter>: The delimiter used to separate the string values (e.g. comma, space). It can be any string literal or expression that evaluates to a string. If omitted, the default delimiter is a comma (,).
- <expr2>: Optional secondary sort expression for ordering
- <orderby_clause>: Optional statement to define ordering before concatenation. It can be any valid ORDER BY clause, such as ORDER BY <expr2> [ASC|DESC]. If omitted, the order of the values is arbitrary.
When Snowflake LISTAGG() is called, it processes rows in groups defined by the query (similar to SUM() or COUNT()) and returns a single string per group.
Here is one simple example of how to use Snowflake LISTAGG function:
First, let’s create a sample table called STUDENTS and insert some dummy data:
As you can see, this table has student ID, first name, and last name.
Now lets use Snowflake LISTAGG function to concatenate the last names together, separated by commas:
As you can see, this groups all the rows and sorts alphabetically by last name before concatenating.
To break this down:
- LAST_NAME is the column value I want to concatenate
- (,) provides a comma and space <delimiter> between values
- ORDER BY LAST_NAME defines the sort order before concatenating alphabetically.
The final output is a string with all the last names combined together in a defined order. Take a look at the last part of this section to learn more about the output format of the Snowflake LISTAGG function.
Supported Data Types
Snowflake LISTAGG function supports the following data types as input:
1) STRING, VARCHAR, CHAR, TEXT
These are string data types that can be directly concatenated by the function.
Example:
The example of the string data type is exactly the one we mentioned above ☝️.
2) NUMERIC, INTEGER, FLOAT, DOUBLE, DECIMAL
These are numeric data types that can be implicitly converted to strings and concatenated by Snowflake LISTAGG function.
Example:
Here is one simple example:
First, let’s create a sample table called numbers and insert some dummy data:
Now let's use Snowflake LISTAGG function to concatenate integer column together, separated by commas:
Again, let's use Snowflake LISTAGG function to concatenate double column together, separated by commas:
3) DATE, TIME, TIMESTAMP
These are date and time data types that can be implicitly converted to strings and concatenated by Snowflake LISTAGG function. The format of the output string depends on the session parameters and the data type.
Example:
Here is one simple example:
First, let’s create a sample table called dates and insert some dummy data:
Now let's use Snowflake LISTAGG function to concatenate dates column together, separated by commas:
Again, let's use Snowflake LISTAGG function to concatenate time column together, separated by commas:
4) BOOLEAN
This is a logical data type that can be implicitly converted to strings and concatenated by the function. The output string is either TRUE or FALSE.
Example:
Here is one simple example:
First, let’s create a sample table called bools and insert some dummy data:
Now let's use Snowflake LISTAGG function to concatenate boolean column together, separated by commas:
Remember this: the delimiter parameter must be a single-character string. Common delimiters are commas (‘,’), spaces (‘ ‘), hyphens (‘-’), and line breaks ('\n' or ''), but any string of characters is valid and allowed.
Which data types are not supported by the LISTAGG function?
Snowflake LISTAGG function does not support the following data types as input:
- GEOSPATIAL(GEOMETRY, GEOGRAPHY): These are geospatial data types that cannot be concatenated by the Snowflake LISTAGG function (Error will be raised if these data types are used as input).
- BINARY, VARBINARY: These are binary data types that cannot be concatenated by the Snowflake LISTAGG function (Error will be raised if these data types are used as input).
- VARIANT, ARRAY, OBJECT: These are complex data types that cannot be concatenated by the Snowflake LISTAGG function (Error will be raised if these data types are used as input).
Output of Snowflake LISTAGG function
Snowflake LISTAGG function returns a string that includes all of the non-NULL input values, separated by the <delimiter>. The output string has a maximum size of 16,777,216 bytes or 16 MB. If the output string exceeds this limit, an error will be raised. To avoid this error, you can use the DISTINCT to remove duplicate values, or the WITHIN GROUP option to limit the number of values in the list.
How LISTAGG function Handles Null or Empty Values?
Snowflake LISTAGG function handles NULL and empty values as follows:
- If the input is empty, meaning that there are no rows to aggregate, the function returns an empty string ('').
- If all input expressions evaluate to NULL, meaning that there are no non-NULL values to concatenate, the function also returns an empty string ('').
- If some but not all input expressions evaluate to NULL, meaning that there are some non-NULL values to concatenate, the function returns a string that contains all non-NULL values and excludes the NULL values. The delimiter is only inserted between the non-NULL values, and not before or after them.
What Are the Restrictions of LISTAGG When Used as Window Function?
When used as a window function, the LISTAGG function does not support the following features:
- ORDER BY sub-clause in the OVER() clause: Snowflake LISTAGG function cannot use an ORDER BY sub-clause within the OVER() clause to specify the order of the values in the list. The order of the values is determined by the partition and the order of the rows in the result set.
- Window frames: Snowflake LISTAGG function cannot use window frames to define the subset of rows to be aggregated within each partition. The function aggregates all the rows within each partition.
Practical Examples
Now, let's delve into a practical example of the Snowflake LISTAGG function. In this section, we will demonstrate how to use the Snowflake LISTAGG function in various scenarios with different parameters.
To demo the usage of the Snowflake LISTAGG function, first, we will create a table called orders table and insert some sample data into it. We will use this sample table and data throughout this example.
Example 1—Concatenating data with a hyphen "-" and comma ","
In this example, we will use Snowflake LISTAGG function to concatenate the product IDs and the quantities for each customer, using a hyphen (-) to separate the product ID and the quantity, and a comma (,) to separate the pairs.
To do so, we can use the following SQL query:
As you can see, we use the concatenation operator (||) to combine the product ID and the quantity with a hyphen, and then we pass this expression as the first argument to the Snowflake LISTAGG function. We also specify the comma and the space as the delimiter for the Snowflake LISTAGG function. Finally, we then group the rows by the customer name and select the customer name and the products as the output columns.
The output of this query is:
Example 2—Concatenating data with '|’ using the LISTAGG
In this example, we will use the Snowflake LISTAGG function to concatenate the product IDs for each customer, using a vertical bar '|' as the delimiter, and ordering them alphabetically.
SELECT
customer_name,
LISTAGG(product_id, '|') WITHIN GROUP (ORDER BY product_id) AS product_list
FROM orders_table
GROUP BY customer_name;
As you can see, we use the vertical bar '|' as the delimiter in the LISTAGG function. The WITHIN GROUP (ORDER BY product_id) clause make sures that the product IDs are concatenated in alphabetical order for each customer.
The output of this query is:
Example 3—Concatenating data with empty space
Now, in this example, we will use Snowflake LISTAGG function to concatenate the product IDs separated by a space for each customer_id group.
As you can see, this query concatenates product_ids separated by a space for each customer_id group. Unlike previous example, no ORDER BY is included so it preserves the original order of rows. The output is a space-separated list of all products purchased by each unique customer ID.
The output of this query is:
Example 4—Concatenating order details in specific order
Now, in this example, we will use Snowflake LISTAGG function to concatenate the order IDs for each customer, using a comma (,) delimiter. But, we want to order the order IDs by the price in descending order, so that the most expensive order comes first.
To do so, we can use the following SQL query:
As you can see, in this query, we use WITHIN GROUP clause to specify the order of the values in the list. We then use the ORDER BY sub-clause to order the order IDs by the price in descending order. The rest of the query is the same as in the previous examples.
The output of this query is:
Example 5—Concatenating values based on conditions
In this example we will demonstrate conditional concatenation capability of Snowflake LISTAGG function.
As you can see below, this query demonstrates conditional concatenation by checking the purchase quantity and classifying products as either "HIGH" or "LOW" product quantity. These labels are concatenated along with the product_id by customer name, which allows segmentation of products into high/low volume for analysis. The final output string groups each customer's products by purchase frequency.
The output of this query is:
Watch this video for an in-depth hands on guide on how to use the Snowflake LISTAGG function.
Advanced Tips and Techniques
Snowflake LISTAGG function offers robust data aggregation capabilities. If you are planning to maximize its potential, here are some advanced techniques to follow. These tips will lift your concatenation skills to the next level.
1) Use Snowflake DISTINCT
Snowflake DISTINCT allows you to remove duplicate values from the list, so that each value appears only once, which can be useful when you want to create a unique list of values from a column that contains repeated values.
2) Use WITHIN GROUP Option
WITHIN GROUP option allows you to order the values in the list, using a valid ORDER BY clause, which can be extremely useful when you want to sort the values in the list by some criteria, such as alphabetical order, numerical order—or custom order.
3) Combine Snowflake LISTAGG() with Other Aggregate Functions
Snowflake LISTAGG function can be combined with other functions in Snowflake to perform more complex operations and transformations on the data.
For example, you can use the following functions with Snowflake LISTAGG function:
- Snowflake aggregation functions like MAX(), MIN(), SUM() if you nest it inside, like this:
- Snowflake CONCAT or || operator to concatenate the values with other strings or expressions before or after the list.
- SPLIT or SPLIT_PART functions to split the output string into an array or a single element based on the delimiter.
4) Use TABLESAMPLE to test Snowflake LISTAGG()
When dealing with extremely large tables, sampling data with TABLESAMPLE can help test and refine your Snowflake LISTAGG formulas faster. Let’s try this one out in one of our previous examples.
As you can see, this samples just 2 rows to test the concatenation.
Limitations and Best Practices
Now, in this section, we will discuss some of the limitations and best practices of the Snowflake LISTAGG function, and how to overcome or avoid them.
What Is the Limitation of LISTAGG?
Snowflake LISTAGG function has the following limitations that you should be aware of:
1) Max size of the output string
The output string of the LISTAGG function has a maximum size of 16,777,216 bytes or 16 MB. If the output string exceeds this limit, an error will be raised and the query will fail. This can often happen when you have a massive number of values to concatenate, or when the values are extremly long.
2) NULL impact
Snowflake LISTAGG function handles NULL values differently depending on the context. If the input is empty, or all input expressions evaluate to NULL, the output is an empty string. If some but not all input expressions evaluate to NULL, the output contains all non-NULL values and excludes the NULL values. To handle NULL values, you can use the COALESCE functions or the CASE expression to conditionally return a value or NULL.
3) Performance implication
Snowflake LISTAGG function can have a significant impact on the performance of your queries, especially when you use it on large tables or with complex expressions. It requires a lot of memory and CPU resources to process and concatenate the values—and it can cause data skew and spilling issues if the output string is too large or unevenly distributed. To improve the performance, follow the tips below:
- Use DISTINCT
- Use WITHIN GROUP
- Use PARTITION BY clause
- Use LIMIT clause
- Use FILTER clause
- Use OVER clause to use Snowflake LISTAGG function as a window function and avoid grouping the entire table.
- Use TABLESAMPLE
What Are the Best Practices for Using Snowflake LISTAGG?
To use the LISTAGG function effectively and efficiently, you should follow some best practices, such as:
1) Use DISTINCT to remove duplicate values
DISTINCT option allows you to remove duplicate values from the list, so that each value appears only once. This can help you to create a unique list of values from a column that contains repeated values.
2) Use appropriate delimiters
Delimiter is the expression that specifies the separator to be used to separate the values in the list. It can be any string literal or expression that evaluates to a string. You should choose a delimiter that is suitable for your data and your purpose—and that does not conflict with the values in the list.
3) Expressions within the LISTAGG function must be convertible to string
Snowflake LISTAGG function accepts any data type that can be implicitly converted to a string, such as numeric, date, or boolean. But, if you use a data type that cannot be converted to a string, such as geospatial, binary, or variant, an error will be raised and the query will fail. To avoid this, you should always use a data type that can be converted to a string, or use an explicit conversion function, such as TO_VARCHAR, TO_DATE, or TO_BOOLEAN, to convert the values to strings before passing ‘em.
4) Document each and every query
Snowflake LISTAGG function can be used to create complex and dynamic queries that perform various operations and transformations on the data.But remember that, these queries can also be quite difficult to understand and maintain, especially when you use multiple parameters and expressions. To make your queries more readable and understandable, you should document your queries by adding comments, using descriptive names.
Real-World Use Case Scenarios
Snowflake LISTAGG function is powerful and can be used in a variety of real-world settings across industries and fields. Here are some typical use case scenarios where Snowflake LISTAGG can be really useful:
1) Improving Report Readability
Imagine a report of customer orders. Each order could contain several product IDs. LISTAGG allows you to combine all product IDs from an order into a single cell, enhancing readability and conserving space.
2) Combining long text inputs
Some data fields may include lengthy descriptions or comments. Snowflake LISTAGG allows you to combine these lengthy items into a single cell, making data handling and analysis easier.
3) Creating Delimited Lists
If you need to export data for further processing in another application, LISTAGG can be useful. You may easily move data between systems by creating comma-separated lists of IDs, names, or various other properties.
4) Simplifying Data Exploration
Data analysts frequently work with complex datasets with several columns. LISTAGG may aid simplify exploration by condensing relevant information into a single list, allowing for quicker identification of patterns and anomalies.
These are just a few instances of how to use the Snowflake LISTAGG function in real-world scenarios across multiple domains. Its ability to concatenate and aggregate data makes it an effective tool for data processing, analysis, and reporting.
What Is the Difference Between LISTAGG and Snowflake CONCAT?
Snowflake offers a variety of string functions to work with string data, and Snowflake CONCAT is one of them. It helps concatenate two or more string values into a single string output. Here's a table highlighting the key differences between Snowflake CONCAT and Snowflake LISTAGG functions:
Snowflake CONCAT | Snowflake LISTAGG |
Snowflake CONCAT concatenates two or more strings into a single string | Snowflake LISTAGG concatenates all values from a group into a delimited string |
Syntax for Snowflake CONCAT is: CONCAT( <expr1> [ , <exprN> ... ] ) ] | Syntax for Snowflake LISTAGG is: LISTAGG( [ DISTINCT ] <expr1> [, <delimiter> ] ) [ WITHIN GROUP ( <orderby_clause> ) ] |
Snowflake CONCAT is a string function that operates on individual rows | Snowflake LISTAGG is an aggregation function that operates on groups of rows |
Snowflake CONCAT accepts one or more string expressions as input arguments | Snowflake LISTAGG accepts a single column/expression as input, along with an optional delimiter |
If any input argument is NULL, it returns NULL as the output | It excludes NULL values from the concatenated output string |
The output of Snowflake CONCAT is a single concatenated string without any delimiters or ordering | The output of Snowflake LISTAGG is a single delimited string containing all non-NULL values from the group |
Does not provide built-in capabilities for ordering, removing duplicates, or handling delimiters | Supports ordering via the WITHIN GROUP (ORDER BY) clause and provides a DISTINCT option to remove duplicate values |
No explicit limit on the size of the output string | Maximum size of the output string is limited to 16MB |
Snowflake CONCAT is efficient for basic string concatenations and text manipulations | Snowflake LISTAGG is useful for creating comma/delimiter-separated lists, aggregating long texts, and report formatting, but can be resource-intensive for large datasets or complex expressions |
TL;DR: Snowflake CONCAT is ideal for simple string joining, while Snowflake LISTAGG is more powerful for advanced concatenation requirements, especially when dealing with grouped data.
What Is the Difference Between Snowflake ARRAY_AGG and Snowflake LISTAGG?
The ARRAY_AGG function also aggregates row data into a single output, but returns an array rather than string. Here are some differences between Snowflake LISTAGG and ARRAY_AGG:
Snowflake LISTAGG | Snowflake ARRAY_AGG |
Snowflake LISTAGG returns the concatenated input values, separated by the delimiter string. | Snowflake ARRAY_AGG returns the input values, pivoted into an ARRAY. |
Syntax for Snowflake LISTAGG is: LISTAGG( [ DISTINCT ] <expr1> [, <delimiter> ] ) [ WITHIN GROUP ( <orderby_clause> ) ] | Syntax for Snowflake ARRAY_AGG is: ARRAY_AGG( [ DISTINCT ] <expr1> ) [ WITHIN GROUP ( <orderby_clause> ) ] |
Output size limit of Snowflake LISTAGG is 16MB | Output size limit of Snowflake ARRAY_AGG is 16MB |
NULL values are omitted from the output. | NULL values are omitted from the output. |
Delimiter is required, default is an empty string | Delimiter is not required |
Follows the collation of the input and the session. | Collation is not applicable |
Performance is very fast on large data sets | Performance might get slower with more array overhead |
As you can see, Snowflake LISTAGG outputs a simple string perfectly suited for concatenation tasks like reporting. ARRAY_AGG allows post-processing like indexing but adds more overhead.
Want to take Chaos Genius for a spin?
It takes less than 5 minutes.
Conclusion
And that’s a wrap! Snowflake LISTAGG function is extremely powerful if you use it carefully. It is an extremely versatile function that helps for concatenating values within a column to a list, using a defined delimiter to separate the elements. It can help you to transform and format your data in a way that is easier to read and understand—and to solve common problems and scenarios.
In this article, we covered:
- What Does LISTAGG() Do in Snowflake?
- How Snowflake LISTAGG function Works?
- How Snowflake LISTAGG function Handles Null or Empty Values?
- Practical Examples and Use Cases of Snowflake LISTAGG function
- Advanced Tips and Techniques with Snowflake LISTAGG()
- Limitations and Best Practices of Snowflake LISTAGG()
- Difference between Snowflake CONCAT and Snowflake LISTAGG?
- Difference between Snowflake ARRAY_AGG and Snowflake LISTAGG?
—and so much more!
Think of the Snowflake LISTAGG function as a glue stick that sticks the values together into one single string, using a separator/delimiter of your choice.
FAQs
What does Listagg do in Snowflake?
Snowflake LISTAGG is an aggregate function that concatenates multiple string values from input rows into a single delimited string.
How does Snowfklake LISTAGG function work ?
Snowflake LISTAGG processes rows in groups based on the query and returns a single concatenated string per group. The ORDER BY clause sorts the rows before concatenating them.
What data types can be used as input to Snowflake LISTAGG?
LISTAGG supports STRING, VARCHAR, NUMERIC, DATE, TIME, TIMESTAMP and BOOLEAN data types.
What is the syntax of LISTAGG?
The basic syntax of Snowflake LISTAGG function is as follows:
Aggregate function:
LISTAGG(<expr1>, <delimiter> [, <expr2>]) [[WITHIN GROUP] (ORDER BY <orderby_clause>)]
Window function:
LISTAGG( <expr1> [, <delimiter> ] )
[ WITHIN GROUP ( <orderby_clause> ) ]
OVER ( [ PARTITION BY <expr2> ] )
What is the maximum size for Snowflake LISTAGG output?
Output string is limited to 16 MB by default.
How does Snowflake LISTAGG handle NULL values in input data?
NULL values are excluded from the final concatenated output string.
Can Snowflake LISTAGG remove duplicate values?
Yes, the DISTINCT keyword can eliminate duplicates in the concatenated list.
When should ARRAY_AGG be used over Snowflake LISTAGG?
If post-processing of output as an array is needed, ARRAY_AGG may be a better choice.
What are some common use cases for the Snowflake LISTAGG function?
Common uses include combining multiple rows of data into a single cell, aggregating text strings like comments/notes, and creating delimited IDs or names.
Can Snowflake LISTAGG concatenate values from multiple columns?
Yes, multiple columns can be concatenated by using the concatenation operator || between column names.
Can Snowflake LISTAGG output be ordered without affecting the query groups?
No, LISTAGG ordering always applies within the existing groups defined by the query.
Can window functions be used for Snowflake LISTAGG?
Yes, Snowflake LISTAGG can act as a window function using the OVER() clause.
Does Snowflake LISTAGG allow concatenation of geospatial data?
No, geospatial data types cannot be handled by LISTAGG.
Does Snowflake LISTAGG guarantee order without ORDER BY clause?
No, without ORDER BY the concatenate order should not be relied upon.
What is CONCAT in Snowflake?
Snowflake CONCAT joins two or more string values together into a single string output. It merges multiple text arguments sequentially in the order specified.
What is Array_agg in Snowflake?
Snowflake ARRAY_AGG function aggregates row data into a single output, but returns an array rather than string.
What is DISTINCT in LISTAGG Snowflake?
In Snowflake, DISTINCT is a keyword that can be used with the LISTAGG function to remove duplicate values from the concatenated string.