HOW TO: Import & Run Databricks Notebook From Another Notebook (2024)
Databricks Notebook is an interactive development environment where you can write and run code collaboratively across languages like Python, R, SQL, and Scala. It's great for exploring, analyzing, and visualizing data, as well as machine learning tasks. But managing a complex project in a single notebook can become cluttered and slow. One of the primary advantages of Databricks Notebooks is the ability to split your code into smaller, modular chunks by importing and running one Databricks Notebook from another. This approach makes your code more reusable, organized, and maintainable; it reduces redundancy, improves readability, and speeds up debugging. Breaking complex workflows into smaller notebooks also makes them easier to maintain, debug, and collaborate on.
In this article, we will cover everything you need to know about how to import and execute one Databricks Notebook from another, focusing on two key techniques: the Databricks %run command and the dbutils.notebook.run() method.
What is Databricks Notebook?
Databricks Notebook is a collaborative workspace where you can write and run code, visualize data, and work with others in real time. You can use multiple programming languages (Python, R, SQL, Scala, and more) within the same environment, which makes it easy to develop data science and machine learning workflows. Plus, Databricks Notebooks simplify the process of creating and sharing data-driven projects with features like real-time coauthoring, built-in visualizations, and automatic versioning.
Here are some notable characteristics of Databricks Notebooks:
➥ Multi-language support — Databricks Notebook allows you to write code in multiple languages like Python, Scala, R, and SQL within the same notebook.
➥ Cell-based execution — Databricks Notebook organizes code into cells. You can run individual cells independently, which allows you to test and debug smaller code sections before executing the full notebook.
➥ Interactive execution — Databricks Notebook supports interactive execution, letting you run code in real time and view outputs instantly.
➥ Easy collaboration — Databricks Notebook allows real-time collaboration, so you and your team can work on the same notebook simultaneously. Comments and version control simplify teamwork, reducing conflicts and streamlining workflows.
➥ Code modularization — Databricks Notebook supports code modularization by allowing you to import and run other Databricks Notebooks.
➥ Built-in visualizations — Databricks Notebook allows you to generate visualizations directly from your data. You can create charts and graphs without writing extensive code, helping you share data insights faster.
➥ Version control — Databricks Notebook automatically tracks changes. You can revert to previous versions if needed, so that you can have better control over your code history.
➥ Magic Commands — Databricks Notebooks support special commands (magic commands) that simplify certain tasks, like running SQL queries directly inside a Python notebook or managing the notebook lifecycle (see the quick example after this list).
➥ Dashboard creation — Databricks Notebook enables you to build dashboards to present/display results.
➥ MLflow Integration — For machine learning tasks, Databricks Notebooks integrate with MLflow for tracking experiments, managing models, and deploying them to production.
➥ Easy Scheduling — Databricks Notebook can be easily scheduled to run at specific times, useful for automating data refresh, reports, or model retraining.
… and so much more!
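For instance, here is a rough sketch of one such magic command in action: in a notebook whose default language is Python, a cell that starts with %sql runs SQL directly (the sales_data table name below is just an illustrative placeholder):
%sql
SELECT name, COUNT(*) AS order_count
FROM sales_data
GROUP BY name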
Check out this article for a more detailed understanding of Databricks Notebooks.
Databricks Notebooks pack a lot of features. They are great for developing, testing, and collaborating on data projects.
Now that you have a solid understanding of Databricks Notebooks and their role in modularizing code, let’s explore some techniques on how to import and run one Databricks Notebook from another notebook.
Step-by-Step Guide to Import & Run One Databricks Notebook From Another Notebook
In this section, we’ll go through three different techniques for importing and running one Databricks Notebook from another.
We’ll start with the simplest and most straightforward method—using the Databricks %run command. If you’re looking for quick code inclusion without any complexities, this is the technique for you. So, let’s dive right in!
Prerequisites
First things first, make sure that the following prerequisites are met:
- You have access to a Databricks workspace.
- Both notebooks are stored in the same directory, or you know their relative paths.
- You have the required permissions to read and execute the notebook you intend to import (the child notebook).
- Your Databricks Notebooks are attached to an active Databricks cluster; code execution requires a running cluster.
- Your environment's Databricks Runtime version supports the %run command and the dbutils.notebook.run() method.
- If the called notebook uses Databricks Widgets for parameter input, they are properly configured to accept parameters when invoked.
Note: Although Databricks provides tools like Jobs for orchestration and workspace files for code modularization, the methods below are great for situations where these options aren't suitable, such as when dealing with dynamic parameters or lacking workspace file access.
🔮 Technique 1—Run Databricks Notebook From Another Notebook Using Databricks %run Command
Databricks %run command is designed to include and execute another notebook's code within the context of the current notebook. This method is particularly useful for code modularization, where you might want to separate utility functions or data preprocessing logic into distinct notebooks. It doesn't support passing parameters or returning values, but it excels in scenarios where you just need to incorporate one notebook's content into another's execution environment.
Step 1—Log in to Databricks
First things first—head over to your Databricks account and log in with your credentials to access your workspace.
Step 2—Navigate to the Workspace
Once logged in, find the Databricks Workspace section in the left sidebar where your notebooks are stored.
Step 3—Locate the Child Notebook You Want to Run
It’s time to locate the notebook you want to include—let’s call this the child notebook. If you have already created it, navigate through your workspace folders, find it, and confirm its location. If you don’t have a child notebook yet, no problem—you can quickly create one for this demo.
Here’s how you can create a new child notebook:
Head over to the Databricks Workspace section, click on the dropdown arrow next to your desired folder or directory, then select Create > Notebook.
Name it, for example, child_notebook_demo, and select a programming language like Python.
Next, add the following sample code to your child notebook:
# Child Notebook: child_notebook_demo
def greet(name):
    return f"Hello there, {name}!"

def add(a, b):
    return a + b

# Print a message to indicate the child notebook has run
print("🔮 Child notebook has run successfully 🔮")
Save the notebook once you’ve added the code.
Now you have a fully functional child notebook to include in your parent notebook. In the next step, we’ll identify this notebook’s path and use it to execute the notebook from the parent.
Step 4—Identify the Child Notebook Path
Right-click on your child notebook in the workspace. Choose “Copy Path” or “Copy URL/File Path” to get the notebook's path. This path is essential for the Databricks %run command to locate and execute the notebook.
For example, the path might look like this:
/Users/<user-name>@gmail.com/Import_Run_Databricks_Notebook_From_Another_Notebook/child_notebook_demo
Step 5—Configure Databricks Compute
Before running notebooks, make sure you have a compute resource (cluster or SQL warehouse) set up or selected. If you don’t have a cluster, go to “Compute” in the left sidebar, click “Create Compute”, and set up one with suitable configurations for your job.
Attach your Databricks Notebook to this cluster or an existing one that's running.
Step 6—Open the Parent Notebook to Run the Command From
Navigate to or create the notebook from which you want to run the child notebook; let's call this parent_notebook_demo.
Step 7—Execute Databricks %run Command
In your parent notebook, add a new cell with the following command, replacing the path with your child notebook's path:
%run /Users/<user-name>@gmail.com/Import_Run_Databricks_Notebook_From_Another_Notebook/child_notebook_demo
Executing this cell will run all the code in the child notebook, making its functions and variables available in the parent Databricks notebook's scope.
Step 8—Import Function from Another Notebook
Test the execution by calling a function or printing a variable from the child notebook. Here is one very simple example:
a = 10
b = 20
sum_result = add(a, b)
print(f"The sum of {a} and {b} is {sum_result}.")
name = "Elon Musk"
greeting_message = greet(name)
print(greeting_message)
Step 9—Run the Databricks Notebook Cell
Execute the cell containing the Databricks %run command along with the subsequent code testing the imported functions.
Remember, when using Databricks %run command:
🔮 All functions, variables, and other definitions from the child notebook are now in the parent's scope.
🔮 There's no direct way to return values or pass parameters to the child notebook; everything is executed in the same context.
🔮 If your child notebook has dependencies or uses specific libraries, ensure they are installed on your cluster or defined within the notebooks themselves.
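For example, one way to handle such dependencies (a minimal sketch, assuming a hypothetical openpyxl dependency) is to install a notebook-scoped library with the %pip magic command in the parent notebook before the %run cell; note that %pip must be the first line of its cell and %run must sit in a cell by itself:
%pip install openpyxl

%run /Users/<user-name>@gmail.com/Import_Run_Databricks_Notebook_From_Another_Notebook/child_notebook_demo
This way, any import statements in the included child code resolve against the parent's environment.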
Now that you’ve mastered the Databricks %run command, let’s take things up a notch. What if you need to pass parameters or retrieve values from the child notebook? That’s where dbutils.notebook.run() comes into play. Let’s explore this next!
🔮 Technique 2—Run Databricks Notebook From Another Notebook Using Databricks dbutils.notebook.run() Method
The dbutils.notebook.run() method in Databricks allows you to run another notebook as a separate job, passing parameters to it and capturing its return value. This technique is especially useful for creating dynamic workflows or when you need to manage data flow between notebooks with specific input and output requirements.
Step 1—Log in to Databricks
Log in to your Databricks workspace using your credentials.
Step 2—Navigate to the Databricks Workspace
Use the sidebar to navigate to the “Workspace” section where all your notebooks are stored.
Step 3—Locate the Child Notebook You Want to Run
Similar to before, find or create the notebook you want to execute from another notebook, which we'll refer to as the "child notebook".
a) Create a New Notebook
- Click on the dropdown next to your chosen folder, select Create > Notebook.
- Name your notebook, for example, child_notebook_param, and choose Python.
b) Set Up Databricks Widgets to Accept Parameters
Use dbutils.widgets to define parameters for your notebook. Databricks widgets allow the parent notebook to pass inputs dynamically. Here’s an example:
# Set up widgets to accept parameters
dbutils.widgets.text("name", "Default Name", "Enter name")
dbutils.widgets.text("age", "30", "Enter age")
# Retrieve widget values
name = dbutils.widgets.get("name")
age = dbutils.widgets.get("age")
# Example processing with parameters
result = f"{name} is {age} years old."
# Return the result as a string using dbutils.notebook.exit
dbutils.notebook.exit(result)
Step 4—Identify the Child Notebook Path
To run the child notebook from your parent notebook, you need its path. For that, right-click on your child notebook in the workspace. Select Copy Path to get the notebook's path, which will look something like:
/Users/<user-name>@gmail.com/notebooks/child_notebook_param
Optional—Configure Parameters for the Databricks Notebook
Prepare any parameters you want to pass to the child notebook as a dictionary in your parent notebook:
# In parent notebook
parameters = {
    "name": "Jeff Bezos",
    "age": "58"
}
Step 5—Configure Databricks Compute
Now make sure you have a compute resource set up. To do so, head over to “Compute” on the sidebar, and if needed, click “Create Compute” to set up a new one or select an existing one. Make sure your cluster is running or set it to start automatically if it's idle.
Step 6—Call the Child Notebook Using dbutils.notebook.run()
The syntax for dbutils.notebook.run() is as follows:
dbutils.notebook.run(notebook_path, timeout_seconds, arguments)
Here:
- notebook_path is the path to the notebook you want to run.
- timeout_seconds is the maximum time to wait for the notebook to run, in seconds. A value of 0 means no timeout.
- arguments is a dictionary of key-value pairs representing the parameters to pass to the target notebook's widgets. Both keys and values must be strings.
Now, in your parent notebook, call the child notebook using dbutils.notebook.run() like this:
# Parent Notebook
# Define parameters to pass
parameters = {
    "name": "Jeff Bezos",
    "age": "58"
}
# Run the child notebook and capture its return value
result = dbutils.notebook.run("/Users/<user-name>@gmail.com/notebooks/child_notebook_param", timeout_seconds=60, arguments=parameters)
# Print the result to confirm execution
print(f"Child notebook returned: {result}")
Here, 60 is the timeout in seconds for the child notebook execution. If the child notebook doesn't finish within this time, an exception will be thrown.
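Here is a minimal sketch of how you might guard against that in the parent notebook; the path and parameters reuse the placeholders from above, and a broad except is used because the exact exception class raised on failure or timeout is managed by Databricks:
# Wrap the call so a child failure or timeout doesn't abort the whole parent run
try:
    result = dbutils.notebook.run(
        "/Users/<user-name>@gmail.com/notebooks/child_notebook_param",
        timeout_seconds=60,
        arguments={"name": "Jeff Bezos", "age": "58"},
    )
    print(f"Child notebook returned: {result}")
except Exception as e:
    # Raised when the child notebook fails or exceeds the 60-second timeout
    print(f"Child notebook failed or timed out: {e}")
    result = None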
Step 7—Parse and Handle Return Values
Since dbutils.notebook.run() returns a string, you might need to parse this if you're expecting structured data:
# In the parent notebook
import json

# Assuming the child notebook returned a JSON-formatted string
try:
    parsed_result = json.loads(result)
    print(f"Parsed result: {parsed_result}")
except json.JSONDecodeError:
    print("The returned value is not JSON formatted.")

# Or if it's just a simple string:
print(f"The result from the child notebook: {result}")
Remember, when using Databricks dbutils.notebook.run() method:
🔮 dbutils.notebook.run() method runs the child notebook in a new job context, meaning variables and functions defined in the child notebook are not directly accessible in the parent.
🔮 Child notebook must have widgets set up to accept the parameters you pass, or those parameters will be ignored.
🔮 Return value from a child notebook is limited to a string, so complex data types like DataFrames need to be serialized (to JSON) before exiting (see the sketch after this list).
🔮 If the child notebook fails, you'll need to handle exceptions in the parent notebook, as dbutils.notebook.run() will throw an exception upon failure.
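As a rough sketch of that serialization pattern (the notebook path and summary values below are purely illustrative), the child notebook can pack a small result into a JSON string before exiting, and the parent parses it back into a dictionary:
# --- In the child notebook: serialize a small result before exiting ---
import json

summary = {"row_count": 1250, "status": "ok"}  # illustrative values
dbutils.notebook.exit(json.dumps(summary))

# --- In the parent notebook: run the child and parse the returned JSON string ---
import json

raw = dbutils.notebook.run("/Users/<user-name>@gmail.com/notebooks/child_notebook_json", 60, {})
summary = json.loads(raw)
print(summary["row_count"], summary["status"])
For anything larger than a small summary, a common pattern is to have the child write the DataFrame to a table or cloud storage and return just its name or path.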
Databricks %run Command vs dbutils.notebook.run() Method—Which One is Better?
Use Databricks %run Command:
➥ Databricks %run command copies and executes the contents of another notebook directly within the context of the current notebook. All functions, variables, and imports from the child notebook become part of the parent notebook's execution environment.
➥ Databricks %run command is great for modularizing code. You can keep helper functions, data preprocessing steps, or common configurations in separate notebooks and %run them where needed. This promotes code reuse and maintainability.
➥ The execution of Databricks %run command is synchronous; the parent notebook waits for the child to finish before continuing.
➥ Databricks %run does not support passing parameters or returning values directly. All data must be shared through the notebook's shared scope.
➥ Since the code from the child notebook runs in the same context, there's a potential for naming conflicts if not managed properly.
Use dbutils.notebook.run() Method:
➥ Databricks dbutils.notebook.run() method runs the child notebook as a separate job. Each notebook maintains its own Spark session, which means variables and functions are not directly accessible in the parent notebook.
➥ Databricks dbutils.notebook.run() method allows you to pass parameters to the child notebook using widgets. This is crucial for dynamic workflows where different parameters might be needed on each run.
➥ Databricks dbutils.notebook.run() method can return a single string from the child notebook back to the parent, which can then be parsed or used as needed. This supports more complex workflow management where you might need to act on the result of a notebook's execution.
➥ You can specify a timeout, giving you control over how long you're willing to wait for the child notebook to execute.
➥ Databricks dbutils.notebook.run() method is typically synchronous, but with additional orchestration (like using Jobs), you can achieve asynchronous behavior or manage dependencies between notebooks.
➥ Each notebook runs in isolation, which reduces the risk of scope pollution but requires explicit data passing.
So, which one is better?
🔮 Databricks %run is simpler for including another notebook's code directly in your current context, but it lacks parameter passing and return values. Databricks dbutils.notebook.run() adds complexity with parameter passing and return values but offers more control over execution, especially in orchestrated workflows.
🔮 If you need isolation or are concerned about scope, Databricks dbutils.notebook.run() method is preferable. If direct code reuse and sharing context are more important, Databricks %run command is the way to go.
🔮 Databricks %run might be faster since there's no job overhead, but for larger notebooks or those with extensive Spark operations, Databricks dbutils.notebook.run() might optimize resource usage by running each as a separate job.
TL;DR: Neither is strictly “better” as both serve different purposes. Your choice should be based on your specific needs for modularity, parameter handling, workflow complexity, and data isolation within your Databricks environment. For simple, straightforward inclusion of code, Databricks %run is better. For building complex data pipelines or workflows with dynamic parameters, Databricks dbutils.notebook.run() provides more flexibility.
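To make the dynamic-parameters point concrete, here is a hedged sketch of a pattern that dbutils.notebook.run() enables but %run cannot: the parent loops over several parameter sets (reusing the placeholder path and widget names from Technique 2) and collects each run's return value:
# Run the same parameterized child notebook once per parameter set
child_path = "/Users/<user-name>@gmail.com/notebooks/child_notebook_param"

parameter_sets = [
    {"name": "Jeff Bezos", "age": "58"},
    {"name": "Elon Musk", "age": "52"},
]

results = []
for params in parameter_sets:
    # Each call runs the child as a separate job and returns the string passed to dbutils.notebook.exit()
    output = dbutils.notebook.run(child_path, 120, params)
    results.append(output)

print(results)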
🔮 Technique 3—Import Databricks Notebook Using Databricks UI
Finally, let's dive into the UI method. Rather than executing one notebook from another, this technique exports an entire Databricks Notebook file and imports it into a Databricks workspace. It is particularly useful for moving notebooks between different workspaces, backing up notebooks, or sharing them in a format that others can easily import.
Note that this technique does not "import" in the sense of code execution; it imports the entire notebook into the workspace.
Step 1—Log into Databricks
First, start by logging into your Databricks workspace with your credentials.
Step 2—Navigate to the Databricks Workspace
From the left sidebar, select Databricks Workspace to access all of your Databricks Notebooks.
Step 3—Locate the Notebook and Export It
Find the Databricks Notebook you wish to export in your workspace, then right-click on the notebook name. From the dropdown menu, select “Download as” and choose the “DBC archive” (Databricks Archive) format. This format includes the notebook's source code and, if not cleared, its cell outputs.
Here's how the process looks:
Right-click on notebook_to_export and then click Download > DBC Archive.
Save the file to your local machine or a shared location.
Step 4—Open the Workspace
Navigate back to your Databricks workspace or to the folder where you want to import the notebook. Then, right-click on the workspace or a folder where you want to place the exported notebook.
Step 5—Import the Child Notebook
Select Import from the context menu. This will open a file dialog where you can navigate to where you saved the .dbc file. Select the .dbc file you just exported (e.g., notebook_to_export.dbc) and click Open.
Databricks Notebook will now be imported into your workspace. The structure of the .dbc file will be recreated in your selected directory, meaning if the .dbc contained a folder structure, it will be mirrored in your workspace.
Step 6—Configure Databricks Compute
Now make sure you have a compute resource set up. To do so, head over to “Compute” on the sidebar, and if needed, click “Create Compute” to set up a new one or select an existing one. Make sure your cluster is running or set it to start automatically if it's idle.
Step 7—Run the Databricks Notebook
Once you have set up the Databricks compute cluster and your Databricks Notebook has been imported, find the newly imported notebook in your workspace and open it by clicking on its name. Attach the Databricks compute resource you just created, and finally, execute it.
To execute it, you can use the "Run All" button at the top of the notebook interface for running all cells. Or, run individual cells by clicking the run button for each cell or pressing Shift + Enter.
Conclusion
And that's a wrap! Importing and running one Databricks Notebook from another not only streamlines your workflow but also improves code modularity, code reuse, and collaborative efficiency. Opt for the Databricks %run command when you need simple code inclusion, leverage dbutils.notebook.run() for more complex, parameter-driven workflows, and use the UI import method for sharing or moving notebooks between different environments.
In this article, we have covered:
- What is Databricks Notebook?
- Step-By-Step Guide to Import & Run One Databricks Notebook From Another Notebook
- 🔮 Technique 1—Run Databricks Notebook From Another Notebook Using Databricks %run Command
- 🔮 Technique 2—Run Databricks Notebook From Another Notebook Using Databricks dbutils.notebook.run() Method
- Databricks %run Command vs dbutils.notebook.run() Method—Which One is Better?
- 🔮 Technique 3—Import Databricks Notebook Using Databricks UI
… and so much more
FAQs
What is the purpose of Databricks Notebook?
Databricks notebooks provide an interactive, collaborative environment for data scientists and engineers to write, execute, and share code in languages like Python, SQL, Scala, and R, focusing on data analytics, machine learning, and ETL processes.
How to run a notebook in Databricks?
To run a notebook in Databricks, open the desired notebook, attach it to a cluster, and click the Run All button to execute all cells sequentially. Or run individual cells by clicking the Run button within each cell or using the keyboard shortcut Shift + Enter.
Is Databricks Notebook the same as Jupyter Notebook?
No, Databricks Notebooks and Jupyter Notebooks are not the same. Databricks Notebook is a part of the Databricks platform, offering seamless integration with Apache Spark, real-time collaboration, and built-in features like version control. Jupyter Notebook, on the other hand, is open source, flexible, and supports multiple environments, but requires separate setup and management.
What is the main function of Databricks Notebook?
The main function is to enable users to create, test, and document code in a collaborative, interactive environment while leveraging Databricks’ distributed computing capabilities for data engineering, machine learning, and analytics.
What is the %run command in Databricks?
Databricks %run command is used to execute another notebook within the context of the current notebook, allowing for code modularization by including and running external notebook scripts.
How do I use the %run command to import a notebook?
Use %run /path/to/notebook in a cell of your current notebook to execute the content of the specified notebook within the current execution context.
Can I pass parameters to the imported notebook using %run?
No, the Databricks %run command does not support passing parameters. For this functionality, use dbutils.notebook.run() instead.
Can I use %run and dbutils.notebook.run() together?
Yes, you can use both methods within the same notebook, but they serve different purposes: the Databricks %run command for direct code inclusion, and dbutils.notebook.run() for running notebooks as separate jobs with parameter passing and return values.
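A minimal sketch of what that can look like (reusing the placeholder paths from earlier in this article): the first cell holds only the %run command, and a later cell calls one of the included helpers before launching a parameterized child as a separate job.
%run /Users/<user-name>@gmail.com/Import_Run_Databricks_Notebook_From_Another_Notebook/child_notebook_demo

# In a later cell: use a helper pulled in by %run, then run a parameterized child
print(greet("Databricks"))

result = dbutils.notebook.run(
    "/Users/<user-name>@gmail.com/notebooks/child_notebook_param",
    timeout_seconds=60,
    arguments={"name": "Ada Lovelace", "age": "36"},
)
print(result)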
Are there any limitations when using %run?
Yes, the Databricks %run command does not support passing parameters or returning values to the calling notebook. It also runs the imported code in the same context, potentially leading to naming conflicts.
How can I import a function from one notebook into another?
Use the %run command to include the notebook containing the function, after which you can directly call the function within your current notebook.
Can I run a notebook from another notebook on a different cluster?
No, notebooks must be run on the same cluster when using the Databricks %run command or dbutils.notebook.run(). For cross-cluster execution, consider using Databricks Jobs for workflow orchestration across clusters.