Databricks Runtime 101—A Comprehensive Overview (2024)

Databricks Runtime (DBR) is the engine that powers your Databricks cluster. It's built around Apache Spark plus a set of supporting tools for data processing, file systems, and related technologies, all packaged together. In this article, we cover everything you need to know about Databricks Runtime: its core components and key features, the different Databricks Runtime versions with their updates and support lifecycles, and how to upgrade to the latest 15.x version.

What is Databricks Runtime?

Databricks Runtime is the set of core software components that run on the compute clusters managed by Databricks. It is built on top of a highly optimized version of Apache Spark, and it adds a number of enhancements, components, and updates that substantially improve the usability, performance, and security of big data analytics.

So, what's inside Databricks Runtime? Here are its key features and main components:

1) Optimized Apache Spark: Apache Spark is the powerhouse behind Databricks Runtime. It's a robust framework for big data processing and analytics. Every Databricks Runtime version comes with its own Apache Spark version, plus some additional improvements and fixes. Take Databricks Runtime 15.3, for example—it's got Apache Spark version 3.5.0, which means you get a bunch of bug fixes, ML features, and performance boosts.

2) DBIO (Databricks I/O) module: To speed things up, Databricks Runtime includes the DBIO (Databricks I/O) module, which optimizes Spark's I/O performance in the cloud so you can process data faster and more efficiently.

3) DBES (Databricks Enterprise Security): For added security, the DBES module provides features like data encryption, fine-grained access control, and auditing, helping you meet enterprise security standards.

4) Delta Lake: Delta Lake is an open source storage layer that runs on top of Apache Spark. This means you get ACID transactions, scalable metadata handling, unified streaming and batch data processing, and more.

5) MLflow Integration: Databricks Runtime includes MLflow for managing the machine learning lifecycle, including experimentation, reproducibility, and deployment.

6) Rapid Releases and Early Access: Databricks provides quicker release cycles compared to open source releases, offering the latest features and bug fixes to its users early.

7) Pre-installed Libraries: Databricks Runtime has pre-installed Java, Scala, Python, and R libraries. They enable a wide range of data science and machine learning tasks.

8) GPU Support: Need to speed up machine learning and deep learning? Databricks Runtime's got you covered with GPU libraries that do just that.

9) Databricks Services Integration: Databricks Runtime works seamlessly with other Databricks services, like notebooks, jobs, and cluster management, making your workflow smoother.

10) Auto-scaling: Automatically scales compute resources up and down based on workload demand, reducing operational complexity and cost.
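To make the Delta Lake bullet above concrete, here is a minimal sketch of an ACID upsert using Delta's MERGE support. The table and column names (`events`, `updates`, `event_id`) are hypothetical examples, and the final `spark.sql(...)` call is shown commented out because it only runs inside a Databricks notebook or a Spark session with Delta Lake enabled.

```python
# Build a Delta Lake MERGE statement for an idempotent upsert.
# Table and column names here are hypothetical examples.

def build_merge_sql(target, source, key):
    """Return a MERGE INTO statement that upserts `source` into `target` on `key`."""
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source} AS s "
        f"ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )

sql = build_merge_sql("events", "updates", "event_id")
print(sql)
# In a Databricks notebook you would execute it with:
# spark.sql(sql)
```

Because Delta Lake records every commit in its transaction log, this upsert is atomic: readers either see the table before the MERGE or after it, never a partial write. That is what the ACID transactions bullet refers to.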

As of the time of writing, the latest version of Databricks Runtime is 15.3—we're skipping 15.4 LTS since it's still in beta. This new version has some useful upgrades and features, including:

  • Updated Apache Spark 3.5.0 core
  • Performance improvements for Delta Lake operations
  • Enhanced support for GPU-accelerated computing
  • New Python and R packages for data science and machine learning
  • Several improvements to metadata processing performance
  • Fixes for the Python streaming data source timeout issue with large trigger intervals
  • Performance improvement for some window functions
  • Improved integration with cloud services

… and so much more!

These continuous updates keep Databricks Runtime at the cutting edge of big data and machine learning.


Databricks Runtime Versions—Features, Updates, Support Lifecycle

Databricks regularly ships new runtime versions to bring you new features, better performance, and tighter security. To get the most from your Databricks setup, you need to know how the different versions work and what their support status is. Below is a list of recent Databricks Runtime versions, including the Apache Spark version each one uses, when it came out, and when its support ends.

1. Databricks Runtime 15.4 LTS (Beta)

  • Apache Spark Version: 3.5.0
  • Release Date: July 23, 2024
  • End of Support (EOS) Date: Not Specified

2. Databricks Runtime 15.3

  • Apache Spark Version: 3.5.0
  • Release Date: June 24, 2024
  • End of Support (EOS) Date: December 24, 2024

3. Databricks Runtime 15.2

  • Apache Spark Version: 3.5.0
  • Release Date: May 22, 2024
  • End of Support (EOS) Date: November 22, 2024

4. Databricks Runtime 15.1

  • Apache Spark Version: 3.5.0
  • Release Date: April 30, 2024
  • End of Support (EOS) Date: October 30, 2024

5. Databricks Runtime Version 14.3 LTS

  • Apache Spark Version: 3.5.0
  • Release Date: February 1, 2024
  • End of Support (EOS) Date: February 1, 2027

6. Databricks Runtime Version 14.2

  • Apache Spark Version: 3.5.0
  • Release Date: November 22, 2023
  • End of Support (EOS) Date: October 1, 2024

7. Databricks Runtime Version 14.1

  • Apache Spark Version: 3.5.0
  • Release Date: October 11, 2023
  • End of Support (EOS) Date: October 1, 2024

8. Databricks Runtime Version 13.3 LTS

  • Apache Spark Version: 3.4.1
  • Release Date: August 22, 2023
  • End of Support (EOS) Date: August 22, 2026

9. Databricks Runtime Version 12.2 LTS

  • Apache Spark Version: 3.3.2
  • Release Date: March 1, 2023
  • End of Support (EOS) Date: March 1, 2026

10. Databricks Runtime Version 11.3 LTS

  • Apache Spark Version: 3.3.0
  • Release Date: October 19, 2022
  • End of Support (EOS) Date: October 19, 2025

11. Databricks Runtime Version 10.4 LTS

  • Apache Spark Version: 3.2.1
  • Release Date: March 18, 2022
  • End of Support (EOS) Date: March 18, 2025

12. Databricks Runtime Version 9.1 LTS

  • Apache Spark Version: 3.1.2
  • Release Date: September 23, 2021
  • End of Support (EOS) Date: September 23, 2024

What is LTS in Databricks Runtime?

LTS stands for Long-Term Support in the context of Databricks Runtime. LTS versions are specially designated releases that receive extended support and maintenance from Databricks. These versions are designed for customers who prioritize stability and predictability in their data infrastructure.

Here are the benefits of Long-Term support versions:

  • Long-term Support: You get support for around three years (see the EOS dates above), which is far longer than the roughly six months a standard/regular release gets.
  • Stable: LTS versions are super stable because they're put through extra testing. That makes them perfect for production environments.
  • Easily Predictable Update/Release Cycle: With Long-Term Support, you know exactly when you'll need to upgrade, so you can plan ahead and avoid constant major changes.
  • Frequent Security Updates: LTS versions get critical bug fixes and security patches during their extended support period.

So, what's the main difference between LTS and regular releases? It comes down to how you plan to use them and how long you'll get support. Regular releases bring new features faster, but they're updated more often and supported for less time. They're perfect for anyone who wants the latest and greatest and doesn't mind upgrading frequently. LTS versions are better for organizations that need stability and support for a longer time, even if it means they won't get the newest features right away.
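Programmatically, the distinction shows up in the runtime names returned by the Databricks Clusters API (`GET /api/2.0/clusters/spark-versions`), where LTS releases carry "LTS" in their display name. Here is a small sketch of filtering for the newest LTS runtime; the sample names mirror the list above, and the parsing logic is an assumption about the display-name format:

```python
# Pick the newest LTS runtime from a list of display names.
# The "<major>.<minor> LTS (...)" name format is an assumption based on
# how the Clusters API labels runtime versions.

def latest_lts(names):
    """Return the newest LTS runtime display name, or None if none is LTS."""
    def version_key(name):
        major, minor = name.split()[0].split(".")[:2]
        return (int(major), int(minor))

    lts = [n for n in names if " LTS" in n]
    return max(lts, key=version_key, default=None)

names = [
    "15.3 (includes Apache Spark 3.5.0, Scala 2.12)",
    "14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)",
    "13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)",
]
print(latest_lts(names))  # → 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)
```

A helper like this is handy in automation that pins jobs to the latest LTS rather than chasing every regular release.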

How to Upgrade Databricks Runtime to the Latest 15.x Version?

Want the latest features, improved performance, and better security? Upgrade your Databricks Runtime to the newest version. Here's how to do it in a few easy steps:

Step 1—Review Release Notes and Compatibility

First and foremost, don't migrate until you've reviewed the release notes and compatibility guides for both Databricks Runtime x.x (current version) and 15.x. Take a close look at the new features, improvements, and any changes in 15.x that might affect your workflow.

See Databricks Runtime release notes versions and compatibility

Step 2—Login to Databricks

Start by opening your web browser and heading to your Databricks workspace URL. Next, log in to your Databricks account with your credentials.

Databricks Login Page

Step 3—Access the Databricks Cluster Page

Once logged in, head over to the Compute page in your Databricks workspace. You'll land on the Databricks Clusters page, where you'll see a few tabs. Hit the All-Purpose compute tab. Now you'll see a list of your existing cluster resources, their status, and details. Look to the top right corner—there's a "Create Compute" button. Click it to start setting up a brand new isolated Spark environment.

Creating a New Databricks Cluster

You can also reach the same place by clicking + New and then Cluster, instead of using the option above. Either way, you'll end up on the New Databricks Cluster page.

Creating a New Databricks Cluster

Step 4—Select or Create a Databricks Cluster

If you're upgrading a cluster, find and select it from the list. If you're creating a new cluster with the latest runtime, click “Create Compute”.

Creating a New Databricks Cluster
See Step-By-Step Guide to Create Databricks Cluster

Step 5—Create a Test/Dev Environment (Not in Prod)

Set up a development/test environment with a cluster running Databricks Runtime 15.x. This way, you can test your workflows, libraries, and dependencies in the new runtime without messing with your production environment.

Step 6—Configure Cluster for Databricks Runtime 15.x

Head to the cluster configuration page and find the “Databricks Runtime Version” dropdown. Pick the 15.x version you want—like 15.4 LTS Beta, 15.3, 15.2, or 15.1.

Picking a Databricks Runtime Version

If you're using specific editions/types like ML, make sure you select the correct one.

Picking a Databricks ML Runtime Version
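If you manage clusters through the REST API rather than the UI, the runtime is set via the `spark_version` field of the Clusters API (`POST /api/2.0/clusters/edit`). Below is a hedged sketch; the cluster ID and node type are placeholders, and the exact `spark_version` key (e.g. `15.3.x-scala2.12`) is something you'd look up with `GET /api/2.0/clusters/spark-versions`. Note the edit endpoint expects the full cluster spec, of which this sketch includes only a few fields.

```python
import json

def build_edit_request(cluster_id, spark_version, node_type_id, num_workers):
    """Build a JSON body for POST /api/2.0/clusters/edit.

    The edit endpoint replaces the cluster configuration rather than
    patching it, so a real request needs the complete spec; this sketch
    shows only the fields relevant to a runtime upgrade.
    """
    return {
        "cluster_id": cluster_id,
        "spark_version": spark_version,   # e.g. "15.3.x-scala2.12"
        "node_type_id": node_type_id,
        "num_workers": num_workers,
    }

body = build_edit_request("1234-567890-abcde123", "15.3.x-scala2.12",
                          "i3.xlarge", 2)
print(json.dumps(body, indent=2))
# Send with any HTTP client, authenticated with a personal access token:
#   POST https://<workspace-url>/api/2.0/clusters/edit
#   Authorization: Bearer <token>
```

Editing a running cluster restarts it, which is the same restart the UI flow in Step 7 performs.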

Step 7—Apply and Restart Cluster

Pick your new runtime version and make any other changes you need. Then click Apply Changes (or Confirm) to save the configuration.

Editing Databricks cluster configurations
Confirming Databricks cluster configurations

If you're updating an existing cluster, you'll need to restart it to make the changes count. Just click "Restart" to do that. For a brand new cluster, click "Create Cluster" to get it started with the new runtime.

Step 8—Verify the Upgrade

After your cluster is up and running, double-check that it's got the right Databricks Runtime version. To do that, head to the configuration page and look for the "Databricks Runtime Version" field. Or, you can run a simple notebook cell and type in the following simple command to print the Databricks Runtime version:

# Prints the runtime version, e.g. "15.4" on a DBR 15.4 cluster
import os
print(os.environ.get("DATABRICKS_RUNTIME_VERSION"))
Verifying Databricks Runtime upgrades

As you can see, this should output the current Databricks Runtime version (in this example, 15.4 Beta).

Step 9—Update Libraries and Dependencies

Make sure the libraries and dependencies in your workflows work with Databricks Runtime 15.x. You might need to update them to the latest versions or find compatible replacements. For instance, if you're moving to 15.3, review the 15.3 release notes for changes to the pre-installed library versions.
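One quick way to compare the new runtime's pre-installed library versions against what your code expects is to query them at runtime. Here is a small sketch using only the standard library; the package names and minimum versions in `requirements` are hypothetical examples, not actual DBR 15.x requirements:

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Hypothetical minimum versions your workflows might require on DBR 15.x.
requirements = {"numpy": "1.23", "pandas": "1.5"}

for pkg, minimum in requirements.items():
    found = installed_version(pkg)
    status = "MISSING" if found is None else found
    print(f"{pkg}: installed={status}, required>={minimum}")
```

Running a cell like this on both the old and the new cluster gives you a quick diff of what actually changed underneath your workflows.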

Step 10—Validate Workflows and Jobs

Test your current workflows and jobs in the dev environment to make sure they work as expected with Databricks Runtime 15.x. Keep an eye out for errors or performance regressions and fix them as needed.

Next, run your ETL pipelines, data processing scripts, and machine learning models. Check the output and performance to see how they're doing. Also, take a look at the log files for any warnings or errors that might have popped up.
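A simple way to sanity-check pipeline output after the upgrade is to compare key metrics (row counts, aggregates) between the old-runtime run and the new-runtime run. Here is a minimal sketch; the metric names and tolerance are illustrative:

```python
def compare_metrics(before, after, rel_tol=0.0):
    """Compare per-metric values from pre- and post-upgrade runs.

    Returns the names of metrics whose relative difference exceeds
    `rel_tol` (0.0 means an exact match is required), or that are
    missing from the post-upgrade run.
    """
    mismatches = []
    for name, old in before.items():
        new = after.get(name)
        if new is None:
            mismatches.append(name)
            continue
        denom = abs(old) if old else 1
        if abs(new - old) / denom > rel_tol:
            mismatches.append(name)
    return mismatches

before = {"row_count": 10_000, "revenue_sum": 52_340.75}
after = {"row_count": 10_000, "revenue_sum": 52_340.75}
print(compare_metrics(before, after))  # → [] (outputs match)
```

An empty list means the checked outputs match; anything else tells you exactly which metric to investigate in the logs.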

Step 11—Migrate to Production

Now that you've tested the migration in development, it's time to update production to Databricks Runtime 15.x. Just follow the same steps you did in development, but this time with your production clusters and workflows. Make sure to back up all your critical data and settings—you don't want to lose anything!

Next, update the runtime version for your production clusters. Finally, roll out the update bit by bit to avoid downtime.

Step 12—Monitor and Optimize

After the migration, closely monitor the performance and stability of your clusters and workflows. Use Databricks observability tools like Chaos Genius to track resource usage, performance, and any potential issues.


Further Reading

If you want more info about Databricks Runtime, the official Databricks Runtime release notes and the Apache Spark documentation are good places to start.

Conclusion

And that's a wrap! Databricks Runtime is the set of software artifacts that run on the clusters of machines managed by Databricks. It includes Apache Spark, plus a number of components and updates that substantially improve the usability, performance, and security of big data analytics. Databricks Runtime comes in many versions, each with its own set of features, so it's crucial to know what each version offers in order to pick the right one for your data project. Once you understand how the runtime versions work and how to upgrade, you can unlock the full power of Databricks for your projects.

In this article, we have covered:

  • What is Databricks Runtime?
  • Databricks Runtime Versions—Features, Updates, Support Lifecycle
  • What is LTS in Databricks Runtime?
  • How to Upgrade Databricks Runtime to the Latest 15.x Version?

… and more!

FAQs

What is Databricks Runtime?

Databricks Runtime is a set of software artifacts that run on Databricks clusters. It includes Apache Spark and additional components that enhance performance, usability, and security for big data analytics.

What is the purpose of LTS versions?

LTS versions are maintained for a longer period, which provides stability and extended support for critical production environments.

What components are included in Databricks Runtime?

Besides Apache Spark, it includes DBIO for improved I/O performance, DBES for enhanced security, and features for reducing operational complexity.

How does Databricks Runtime improve performance?

Through optimizations like the Databricks I/O module (DBIO), which significantly enhances the performance of Spark in cloud environments.

What security features does Databricks Runtime offer?

Databricks Enterprise Security (DBES) provides data encryption, fine-grained access control, and auditing, and supports compliance with standards such as HIPAA and SOC 2.

How often are new Databricks Runtime versions released?

Databricks frequently releases new versions, offering rapid access to the latest features and bug fixes ahead of open source releases.

What versions of Databricks Runtime are available?

Databricks releases multiple versions, including long-term support (LTS) versions for stability.

  • Databricks Runtime 15.4 LTS
  • Databricks Runtime 15.3
  • Databricks Runtime 15.2
  • Databricks Runtime 15.1
  • Databricks Runtime 14.3 LTS
  • Databricks Runtime 14.2
  • Databricks Runtime 14.1
  • Databricks Runtime 13.3 LTS
  • Databricks Runtime 12.2 LTS
  • Databricks Runtime 11.3 LTS
  • Databricks Runtime 10.4 LTS
  • Databricks Runtime 9.1 LTS

Can I upgrade my Databricks Runtime version?

Yes, you can upgrade to newer versions to take advantage of improvements and new features. The process typically involves restarting the cluster with the new runtime version.

What is the role of Apache Spark in Databricks Runtime?

Apache Spark is the core engine for large-scale data processing, and Databricks Runtime builds upon it with additional enhancements.

How does Databricks Runtime handle compatibility?

Each runtime version is compatible with specific versions of Spark, Python, Java, and other libraries, ensuring a stable and tested environment.

What are some common use cases for Databricks Runtime?

Common use cases include data engineering, machine learning, real-time analytics, and big data processing.

How does Databricks Runtime support machine learning?

It includes specialized versions like Databricks Runtime for Machine Learning, which comes with pre-installed ML libraries and tools.

What operating systems does Databricks Runtime support?

It runs on Ubuntu LTS releases within the Databricks-managed environment; the exact Ubuntu version depends on the runtime release (recent runtimes, for example, use Ubuntu 22.04 LTS).

Can I customize the environment in Databricks Runtime?

Yes, users can install additional libraries and packages to tailor the runtime environment to their specific needs.

Does Databricks Runtime include Apache Spark updates?

Yes, each Databricks Runtime version comes with its own Apache Spark version, plus additional improvements and fixes.