Snowflake Copilot 101: Turning Plain Text Into SQL Magic (2024)
Snowflake Copilot—released in April 2024—is an LLM-powered assistant that simplifies the data analysis process. This AI tool simplifies complex tasks by converting natural language into SQL queries, making data exploration more intuitive and easy. It's not just smart—it's seamlessly integrated into your Snowflake workflow. Whether you're working in Snowflake SQL Worksheets or Snowflake Notebooks, Copilot is right there with you, ready to help at a moment's notice. And in terms of security, rest assured—Snowflake has you covered. Snowflake Copilot is built with robust security and governance features baked right in. It fully respects your existing Role-Based Access Control (RBAC) configuration, guaranteeing that it only provides suggestions based on the datasets you have permission to access.
In this article, we will cover everything you need to know about Snowflake Copilot. We'll explore what it is, how it works, and, most importantly, how you can fully utilize it to supercharge your data analysis workflow. We'll walk through its capabilities, discuss its limitations, and provide you with a step-by-step guide and best practices to get the most out of this powerful tool.
What is Snowflake Copilot?
Alright, let's get down to brass tacks. What exactly is Snowflake Copilot? In the simplest terms, Snowflake Copilot is an LLM-powered assistant designed to simplify and enhance your data analysis workflow within the Snowflake ecosystem, available at your fingertips 24/7. Its main purpose is to make your life a whole lot easier by helping you navigate, understand, and extract insights from your data more efficiently.
But Snowflake Copilot isn't just any AI tool. It's built on a serious tech stack. The engine that powers Snowflake Copilot is Snowflake Cortex, an intelligent, fully managed AI service that runs securely within the Snowflake ecosystem. The models underlying Snowflake Copilot have been fine-tuned by Snowflake to understand the details of data analysis, SQL queries, and the Snowflake platform itself. This isn't a generic chatbot—it's a specialized tool that speaks the language of data fluently.
In terms of security, Snowflake Copilot takes security very seriously. One of the key features that sets it apart is that your enterprise data and metadata always stay securely inside Snowflake. But it doesn't stop there. Snowflake Copilot is also fully compliant with your existing Role-Based Access Control (RBAC) settings, which means that Snowflake Copilot only has access to the data that you have access to. If you don't have permission to view a certain dataset, neither does Copilot.
Check out this video where Vivek explains the development process and behind-the-scenes details of Snowflake Copilot.
Note: As of now, Snowflake Copilot is available in select regions. Specifically, it's accessible to accounts in:
➤ AWS us-east-1
➤ AWS us-west-2
➤ AWS eu-central-1
If your Snowflake account is in one of these regions, you're in luck! If not, don't worry—Snowflake will expand its availability in the future.
Power of Snowflake Copilot—Capabilities and Use Cases
Now that we've covered what Snowflake Copilot is, it's time to dive into what it can actually do. Let's break it down and explore each of its key features and use cases.
1) Data Exploration
First up, let's talk about data exploration. You know that feeling when you're faced with a new dataset and you're not quite sure where to start? Snowflake Copilot is here to be your guide. Now with the help of Snowflake Copilot, you can ask open-ended questions about your dataset, and it will help you understand its structure, content, and patterns.
For example, you can ask Snowflake Copilot, "What kind of information is in this dataset?" or "What are the main tables, and how are they related?" Snowflake Copilot will analyze the schema and provide you with a clear, concise overview.
2) SQL Query Generation
This is where Snowflake Copilot really shines. If you've ever found yourself staring at a blank SQL editor, trying to figure out how to write a complex query, you're going to love this feature. Snowflake Copilot will automatically generate the SQL query by simply describing what you want to do in plain English.
For example, if you want to fetch the details of students of a certain age group, you might say, "Generate an SQL query to retrieve details of students whose ages are greater than 20". Snowflake Copilot will take that request and translate it into a perfectly formed SQL query. Not only that, Snowflake Copilot can handle complex requests too. Need a query with multiple joins, subqueries, and window functions? No problem. Just describe what you need, and Copilot will do its best to generate the appropriate SQL.
3) SQL Query Execution
Generating SQL queries is great, but Snowflake Copilot doesn't stop there. Once it's generated a query for you, you have a couple of options.
First, you can run the query right away with just a simple click. If the query looks good and you're ready to see the results, Snowflake Copilot makes it easy to execute it immediately.
Also, if you want to tweak the query or perhaps use it as a starting point for further analysis, you can add it to your Snowflake Worksheet. This allows you to modify the query, combine it with other queries, or save it for later use.
4) Complex Query Building and Refinement
Sometimes, getting to the data you need is an iterative process. You might start with a basic query and then refine it based on the results. Snowflake Copilot excels at this kind of iterative query building.
You can have a conversation with Snowflake Copilot, asking it to modify and refine your queries step by step.
This conversational approach to query building can be an absolute game-changer, especially for complex analyses that require multiple steps or iterations.
5) Snowflake Learning Resource
Snowflake Copilot isn't just a tool for working with your data—it's also an excellent resource for learning about Snowflake itself. If you're new to Snowflake or trying to understand a specific feature, Snowflake Copilot can help.
You can ask Snowflake Copilot questions about Snowflake concepts, features, or best practices, and it will provide clear, concise explanations. This feature can be particularly useful for teams adopting Snowflake, as it provides an always-available resource for answering questions and clarifying concepts.
6) Query Performance Optimization
Writing a query that gets the right results is one thing, but writing an efficient query that runs quickly is another. This is where Snowflake Copilot's query optimization capabilities come in.
You can ask Snowflake Copilot to analyze your given queries for efficiency and suggest optimizations. It might recommend adding indexes, rewriting joins, or using different Snowflake features to improve performance.
For example, you could ask, "How can I make this query run faster?" and Snowflake Copilot will analyze the query and suggest improvements.
8) Custom Instructions
Snowflake Copilot offers a feature called Custom Instructions. This allows you to personalize how Snowflake Copilot interacts with you and provide it with specific context or preferences.
For example, you might set a custom instruction like "Always use CTE instead of subqueries in your SQL suggestions" or "Always add comments explaining each part of the queries you generate". You could even provide specific business rules or naming conventions that you want Snowflake Copilot to follow.
These custom instructions help tailor Copilot's responses to your exact specific needs and working style, making it an even more powerful and personalized assistant.
7) Feedback Integration
Last but certainly not least, Snowflake Copilot is designed to learn and improve over time, with user feedback being a crucial part of this process. You can provide feedback by detailing your experience through the built-in feedback mechanism or by upvoting or downvoting responses. This feedback helps the development team identify areas for improvement and implement updates that enhance the tool's performance and usability.
As you can see, Snowflake Copilot has a wide range of capabilities. Snowflake Copilot is a powerful LLM-powered assistant that can improve practically every part of your data analysis process, from helping you explore and understand your data to producing and optimizing complex SQL queries and offering as a learning resource for Snowflake itself.
Next, we'll look at how to get started with Snowflake Copilot and how to take advantage of its amazing capabilities. But first, let's take a brief look at the current pricing structure of this tool.
Pricing and Cost Structure of Snowflake Copilot
Snowflake Copilot is free to use for eligible customers until July 31, 2024, allowing users to integrate it into their workflows without additional costs. The long-term pricing structure remains unannounced, but this period offers an opportunity to evaluate Snowflake Copilot's organizational value.
Snowflake is expected to provide advance notice of any pricing changes. Users are free to explore Copilot's capabilities during this cost-free period.
TL:DR;
➤ Free usage: No extra charges for Copilot until July 31, 2024.
➤ Future pricing: Not yet announced by Snowflake.
➤ Pricing factors: May consider usage frequency, query complexity, and data volume.
➤ Expected pricing model: Likely to be flexible and consumption-based, aligning with Snowflake's typical current approach.
Step-by-Step Guide to Getting Started with Snowflake Copilot
Alright, now that we've covered what Snowflake Copilot is and what it can do, let's roll up our sleeves and get started with this powerful tool. Let's walk through the process step-by-step so that even if you're new to Snowflake, you'll be chatting with your LLM-powered assistant in no time.
Prerequisites:
Before we dive in, let's make sure you have everything you need:
1) Access to a Snowflake Account: This might seem obvious, but you'll need an active Snowflake account. If you don't have one yet, you can sign up for a free trial.
2) Availability in Your Region: Remember, Snowflake Copilot is currently available in select regions:
- AWS us-east-1
- AWS us-west-2
- AWS eu-central-1
Make sure your Snowflake account is in one of these regions.
3) Appropriate Permissions: You'll need the necessary permissions to access Snowflake Copilot. If you're not sure, check with your Snowflake admin.
4) A Database and Schema to Work With: Snowflake Copilot needs a context to operate in, so you'll need to have at least one database and schema set up in your Snowflake account.
Got all that? Great! Let's get started.
Step 1—Login to Snowflake/Snowsight
First things first, you need to log into your Snowflake account. Go to your Snowflake login page. Enter your credentials and click "Log In".
Step 2—Create or open a Snowflake Worksheet or Notebook
Snowflake Copilot integrates seamlessly with both Snowflake SQL Worksheets and Snowflake Notebooks. For this guide, let's use a Snowflake SQL Worksheet.
In the Snowsight interface, look for the "Worksheets" option in the left sidebar. Click the "+ Create" and select "SQL Worksheet" to create a new worksheet, or select an existing one if you prefer.
Step 3—Access Snowflake Copilot
Now, let's bring our LLM-powered assistant into the conversation.
Look for the "Ask Copilot" option in the lower-right corner of your worksheet.
Click on it to open the Snowflake Copilot panel on the right side of your screen.
If you don't see the "Ask Copilot" option, double-check that you're in a supported region and that you have the necessary permissions.
Step 4—Select the appropriate Database and Schema
Before interacting with Snowflake Copilot, make sure you're working in the correct context by selecting the appropriate database and schema.
Method 1: Using the UI
At the top of your worksheet, you should see dropdown menus for selecting your database and schema. Choose the database and schema you want to work with.
Method 2: Using SQL Commands
You can also do this using SQL commands if you prefer:
Using Snowflake Sample Dataset
For this article, we will be using the sample dataset TPCH_SF1 available from Snowflake. To access this dataset, select the SNOWFLAKE_SAMPLE_DATA database and use the TPCH_SF1 schema with the following commands:
OR
Using Snowflake Marketplace Data (Optional)
If you prefer to use pre-existing datasets, you can acquire them from the Snowflake Marketplace. Follow these steps to acquire and install the free Dataset from Snowflake Marketplace:
1) Navigate to Snowflake Marketplace.
2) Search for "Free Company Dataset" or any dataset you prefer (simply search for "Dataset").
3) Click the "Get" button next to the dataset.
4) In the "Options" dropdown menu, specify:
- Name of the database where you want to install the dataset
- Roles that should have access to it
5) Click "Get" to initiate the installation.
After installation, select the new database and its schema using either the UI dropdowns or SQL commands as described above.
Step 5—Enter Your Query
Now for the fun part—let's ask Snowflake Copilot something! In the Copilot panel, you'll see a message box at the bottom. Type in your question or request; you can use natural English language.
For example, you can type: "What tables are available in this schema?" Hit Enter or click the send button to submit your query.
Step 6—Review Snowflake Copilot's Response
Snowflake Copilot will process your request and provide a response. Let's look at how to interact with this response.
Read through Snowflake Copilot's answer carefully. If the response includes a SQL query:
- You'll see options to "Run" or "Add" the query.
- "Run" will execute the query immediately in your worksheet.
- "Add" will add the query to your worksheet without running it, allowing you to modify it first if needed.
If you're satisfied with the response, great! If not, you can ask follow-up questions to refine or clarify.
Step 7—Use Custom Instructions (Optional)
Want to personalize how Copilot interacts with you? Let's set up some custom instructions.
First, click on the Copilot menu at the top of the Copilot panel. Select "Custom instructions" from the dropdown menu.
You'll see a toggle to enable custom instructions for new chats and a text box. Enable the toggle and enter your preferences in plain English.
For example, you can write: "Always use CTEs instead of subqueries and include comments explaining each part of the query".
Click "Save" when you're done.
Step 8—Advanced Usage Scenarios
Now that you're up and running, let's explore some more advanced ways to use Snowflake Copilot.
1) Data Exploration
Ask Snowflake Copilot about the structure of your data.
For example: "How many unique customers are there in the 'customer' table?"
Or
Inquire about data distributions: "What is the distribution of customers across different market segments?"
2) Creating and Refining SQL Statements
Start with a simple request: "Which customers have the highest account balance?"
Then refine it: "Among the customers with the highest account balance, which ones belong to the 'BUILDING' market segment?"
3) Obtaining SQL Statement Explanations
If you have an existing SQL query, ask Snowflake Copilot to explain it: "Can you explain what this query does?" followed by pasting in your SQL.
4) Querying about General SQL and Snowflake Concepts
You can make use of Snowflake Copilot as a learning tool.
For example, you can ask "What's the difference between INNER JOIN and LEFT JOIN?"
Or
Ask about Snowflake-specific features: "How do Snowflake's micro-partitions work?"
Note: Snowflake Copilot is designed to be conversational. Don't hesitate to ask follow-up questions or request clarifications. The more you interact with it, the more you'll discover its capabilities.
That’s it! If you take these steps, you should now be well on your way to leveraging the power of Snowflake Copilot in your data analysis workflows. As you continue to use it, you'll likely discover even more ways it can assist you and streamline your work.
Limitations of Snowflake Copilot—What You Need to Know
As amazing as Snowflake Copilot is, it's important to understand that, like any tool, it has its limitations. Being aware of these constraints will help you use Snowflake Copilot more effectively and set realistic expectations. Let's break down the key limitations you should keep in mind:
1) Language support
First up, let's talk about language support:
a) Natural Language
Currently, Snowflake Copilot only supports English for natural language interactions. If you're more comfortable working in another language, you'll need to formulate your questions and instructions in English.
b) Query Language
When it comes to generating and working with queries, Snowflake Copilot supports SQL-only.
2) Data Access and Query Constraints
Next, let's look at some important constraints related to data access and querying:
a) No Direct Access to Table Data
This is a big one. Snowflake Copilot doesn't have direct access to the actual data in your tables. It can see your table structures, column names, and data types, but it can't peek at the values in your tables. This is great for data security, but it means Copilot can't answer questions like "What’s the top rated product" or "What's the best-selling product?" without you running a SQL query.
b) Unsupported Cross-Database or Cross-Schema Queries
As of now, Snowflake Copilot can't generate queries that span multiple databases or schemas. If you need to join tables from different databases or schemas, you'll need to create a view that combines this data first, or write those parts of the query yourself.
3) Performance Considerations
There are a couple of things to keep in mind regarding Snowflake Copilot's performance:
a) Potential Response Delays
Depending on the complexity of your request and system load, Snowflake Copilot might take a second or two to respond. While it's generally quite snappy, don't expect instantaneous responses 100% of the time.
b) SQL Suggestion Inaccuracies
Snowflake Copilot might be impressively accurate most of the time, but it's not flawless. The SQL queries it suggests might sometimes contain syntax errors or reference non-existent tables or columns. So, always review the suggested queries before running them, and be prepared to do some troubleshooting or refinement.
Remember, Snowflake Copilot is an LLM-powered assistant. It's there to help streamline your work, but it doesn't replace the need for your expertise and oversight.
4) Scope Limitations
Snowflake Copilot has some limitations in terms of how much it can "see" at once:
a) Limited Table and Column Consideration
When generating a response, Snowflake Copilot first searches for the most relevant tables and columns for your request. It then ranks these by relevancy and only considers the top 10 tables and top 10 columns from each of those tables. This helps Snowflake Copilot provide focused, relevant responses, but it means it might miss less obvious connections in very complex schemas.
b) Delay in Recognizing New Database Objects
There's a lag of about 3-4 hours before Snowflake Copilot recognizes newly created databases, schemas, and tables. So if you've just set up a new table, don't expect Snowflake Copilot to know about it right away.
5) Custom Instructions Limitations
While the Snowflake Copilot’s custom instructions feature is powerful, it does have some constraints:
a) 2000 Character Limit
You've got 2000 characters to work with for your custom instructions. For most use cases, this is plenty, but if you have very complex or detailed instructions, you might need to prioritize what's most important.
b) Plain English Requirement
Custom instructions need to be in plain English. You can't use code or special formatting here.
c) User-Specific Nature of Instructions
Custom instructions are specific to the user who set them. This is great for personalization, but it means that if multiple people are working on the same project, they might get slightly different results from Snowflake Copilot unless they coordinate their custom instructions.
So, always be very concise and clear when setting custom instructions, and you might need to coordinate with team members if you're all using Snowflake Copilot for shared projects.
Snowflake Copilot Best Practices—Tips for Achieving Optimal Results
Now that we've covered the capabilities and limitations of Snowflake Copilot, let's talk about how to use it most effectively. Following these tips will help you work around some of these limitations and get the most out of this powerful tool. Let's dive right in!
1) Create Curated Views for Improved Performance and Results
One of the most effective ways to enhance your experience with Snowflake Copilot is to create curated views of your data. Here's why and how:
a) Simplify Complex Schemas
If you're working with a complex data model spread across multiple tables, create views that join these tables together. This can help Snowflake Copilot understand the relationships between your data more easily.
b) Predefine Common Metrics
If there are calculations or metrics you use frequently, include these in your views. For example, if you often calculate profit as (revenue - cost), create a view with a profit column that does this calculation.
c) Overcome Cross-Schema Limitations
Remember that Snowflake Copilot cannot generate queries across multiple schemas. You can overcome this limitation by creating views that integrate data from several schemas.
2) Use Descriptive Names for Databases, Schemas, Tables, and Columns
Snowflake Copilot relies heavily on the names of your database objects to understand what data is available and how it's structured. Using clear, descriptive names can significantly improve Snowflake Copilot's ability to generate relevant and accurate responses.
a) Be Specific
Instead of generic names like "table1" or "col1", use names that describe what the data represents. For example, "customer_orders" or "total_revenue".
b) Use Consistent Naming Conventions
Stick to a consistent naming convention across your database. This could be snake_case, camelCase, or whatever works for you.
c) Avoid Abbreviations
While "cust" might be clear to you as an abbreviation for "customer", it might not be as clear to Snowflake Copilot. Use full words wherever it's possible.
3) Make sure to use Appropriate Data Types for Columns
Using the correct Snowflake data types for your columns not only helps with data integrity and query performance but also helps Snowflake Copilot in understanding your data structure better.
a) Use Date/Time Types Appropriately
If a column represents a date or time, use DATE, TIME, or TIMESTAMP types instead of VARCHAR.
b) Use Numeric Types for Numbers
For columns containing numerical data, use appropriate numeric types (INT, FLOAT, DECIMAL) rather than storing them as strings.
c) Consider Using BOOLEAN for Flag Columns
If you have columns that represent yes/no or true/false values, consider using BOOLEAN type instead of INT or VARCHAR.
4) Be Specific and Guided in Your Questions
When interacting with Snowflake Copilot, the more specific and guided your questions are, the better results you'll get.
a) Start Broad, Then Narrow Down
Start with a general question, then ask follow-up questions to refine the results.
Reference Specific Tables or Columns: When asking about specific data, mention the table or column names. Snowflake Copilot recognizes names prefixed with '@', so use this syntax for clarity.
b) Provide Context
If your question relates to a specific business rule or calculation, include that information in your request.
Conclusion
And that's a wrap! Snowflake Copilot is an LLM-powered assistant that revolutionizes data analysis by converting plain natural English language into SQL queries, simplifying complex tasks, and making data exploration intuitive. Seamlessly integrated into your Snowflake workflow—whether in SQL Worksheets or Notebooks—Copilot is always ready to assist. Its robust security and governance features guarantee that it respects your Role-Based Access Control (RBAC) settings, providing suggestions based only on accessible datasets. Now that you have a thorough grasp of Snowflake Copilot's features, restrictions, and best practices, you can use it to its fullest potential to improve your data analysis procedures.
In this article, we have covered:
- What is Snowflake Copilot?
- Power of Snowflake Copilot—Capabilities and Use Cases
- Pricing and Cost Structure of Snowflake Copilot
- Step-by-Step Guide to Getting Started with Snowflake Copilot
- Snowflake Copilot Best Practices—Tips for Achieving Optimal Results
…and so much more!
FAQs
What is Snowflake Copilot?
Snowflake Copilot is an LLM-powered assistant designed to simplify and enhance your data analysis workflow within the Snowflake ecosystem.
What technology powers Snowflake Copilot?
Snowflake Copilot is powered by Snowflake Cortex, a fully managed AI service that runs within the Snowflake ecosystem.
In which regions is Snowflake Copilot currently available?
Snowflake Copilot is available in AWS us-east-1, AWS us-west-2, and AWS eu-central-1 regions.
How does Snowflake Copilot handle security?
Snowflake Copilot respects existing Role-Based Access Control (RBAC) settings and only provides suggestions based on datasets the user has permission to access.
Can Snowflake Copilot generate SQL queries?
Yes, Snowflake Copilot can generate SQL queries based on natural language descriptions of the desired data.
Does Snowflake Copilot support languages other than English?
No, currently Snowflake Copilot only supports English for natural language interactions.
Can Snowflake Copilot execute the SQL queries it generates?
Yes, users can execute generated SQL queries directly from the Copilot interface or add them to their Snowflake worksheets.
Does Snowflake Copilot have access to the actual data in tables?
No, Snowflake Copilot can see table structures, column names, and data types, but not the actual data values.
Can Snowflake Copilot generate queries across multiple databases or schemas?
No, currently Snowflake Copilot cannot generate queries that span multiple databases or schemas.
How long does it take for Snowflake Copilot to recognize newly created database objects?
There's a lag of about 3-4 hours before Snowflake Copilot recognizes newly created databases, schemas, and tables.
What is the character limit for custom instructions in Snowflake Copilot?
The custom instructions feature has a 2000 character limit.
Is Snowflake Copilot currently free to use?
Yes, Snowflake Copilot is free to use for eligible customers until July 31, 2024.
Can Snowflake Copilot optimize existing SQL queries?
Yes, users can ask Copilot to analyze queries for efficiency and suggest optimizations.
How many tables and columns does Snowflake Copilot consider when generating a response?
Snowflake Copilot considers the top 10 most relevant tables and top 10 columns from each of those tables.
Can Snowflake Copilot explain existing SQL queries?
Yes, users can ask Snowflake Copilot to explain what a given SQL query does.
Does Snowflake Copilot support query languages other than SQL?
No, Snowflake Copilot only supports SQL for query generation and analysis.
Can Snowflake Copilot be used as a learning resource for Snowflake concepts?
Yes, users can ask Copilot questions about Snowflake concepts, features, or best practices.
Are custom instructions in Snowflake Copilot user-specific?
Yes, custom instructions are specific to the user who set them and do not apply to other users.
Can Snowflake Copilot create views or modify database structures?
No, Snowflake Copilot cannot create views or modify database structures. It can only generate and execute queries based on existing structures.
Can Snowflake Copilot access my actual data?
No, Snowflake Copilot does not have direct access to the actual data in your tables. It can only see table structures, column names, and data types.
How accurate are Snowflake Copilot's SQL suggestions?
While generally accurate, Snowflake Copilot's SQL suggestions may occasionally contain syntax errors or reference non-existent tables or columns. Users should always review suggested queries before running them.
How can I provide feedback on Snowflake Copilot's performance?
You can provide feedback on Snowflake Copilot's performance through the built-in feedback mechanism in the Snowflake Copilot interface.