Databricks

The Databricks Lakehouse Platform combines the best of data lakes and data warehouses, simplifying the modern data stack and eliminating data silos. Built on open standards and open source, the platform provides a common approach to data management, security, and governance, enabling businesses to operate more efficiently, innovate faster, and achieve the full potential of their analytics and AI initiatives.

Prerequisites

You have a running Databricks SQL warehouse.

Architecture Consideration

In general, there are two main ways to set up automatic data loading to Databricks using Dataddo.

Using intermediate object storage such as Amazon S3 or Azure Blob Storage. We recommend this option when you need to write large volumes of data at low frequency (e.g. more than 1M rows once a day). You will need to
- Configure the flows to these destinations using the Parquet format and
- Configure the Auto Loader in Databricks Delta Lake.
Using Databricks as a direct destination. The Dataddo Databricks writer uses an SQL layer, which means no further configuration on the Databricks side is required. We recommend this option when you need to load relatively low volumes of data at high frequency or to achieve CDC-style data replication.

This page applies when you use Databricks as a direct destination for your Dataddo flow. If you are considering loading the data via Amazon S3 or Azure Blob Storage, please navigate to the corresponding articles.

Authorize Connection to Databricks

In Databricks

Create an SQL Warehouse

Log in to the Databricks workspace.
Click on SQL Warehouses on the sidebar.
Enter a Name for the warehouse and accept the default warehouse settings.
Click on Create.

Configure the Access for the SQL Warehouse

In the Databricks workspace, click on SQL and then SQL Warehouses.
Choose the warehouse and navigate to the Connection Details tab.
Copy the full DSN connection string. You will need to provide it to Dataddo.

In Dataddo

In the Authorizers tab, click on Authorize New Service and select Databricks.
Choose to connect via a DSN connection string.
You will be asked to fill in the following fields:
1. DSN Connection String: The value obtained during the SQL warehouse access configuration step.
2. Catalog: Sets the initial catalog name for the connection. The default value is hive_metastore.
3. Schema: Sets the initial schema name. The default value is default.
Save the authorization details.
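
To illustrate what the Catalog and Schema fields do, here is a minimal sketch, assuming Databricks' three-level `catalog.schema.table` namespace, of how an unqualified table name would resolve against the initial catalog and schema. The function `qualified_table_name` and the names `orders`, `main`, and `sales` are hypothetical and used only for illustration; this is not part of Dataddo or Databricks.

```python
def qualified_table_name(table: str,
                         catalog: str = "hive_metastore",
                         schema: str = "default") -> str:
    """Return the three-level name an unqualified table would resolve to.

    The defaults mirror the default values of the Catalog and Schema
    fields described above. The helper itself is hypothetical.
    """
    return f"{catalog}.{schema}.{table}"


# With the default authorizer values:
print(qualified_table_name("orders"))

# With a custom catalog and schema (made-up names):
print(qualified_table_name("orders", catalog="main", schema="sales"))
```

With the defaults, a table named `orders` would be addressed as `hive_metastore.default.orders`. Changing the Catalog or Schema field changes where unqualified table names land.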

Create a New Databricks Destination

On the Destinations page, click on the Create Destination button and select the destination from the list.
Select your authorizer from the drop-down menu.
Name your destination and click on Save.

Need to authorize another connection?

Click on Add new Account in the drop-down menu during authorizer selection and follow the on-screen prompts. You can also go to the Authorizers tab and click on Add New Service.

Creating a Flow to Databricks

Navigate to Flows and click on Create Flow.
Click on Connect Your Data to add your source(s).
Click on Connect Your Data Destination to add the destination.
Choose the write mode and fill in the other required information.
Check the Data Preview to see if your configuration is correct.
Name your flow and click on Create Flow to finish the setup.

Table Naming Convention

When naming your table, please make sure the table name:

Is all in lowercase
Starts with a lowercase letter or an underscore
Contains only
- Letters
- Numbers
- Underscores
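
The rules above can be checked before you create a flow. The following is a minimal sketch of such a check, assuming that "letters" means ASCII lowercase letters `a` to `z`; the function name `is_valid_table_name` is hypothetical and not part of Dataddo.

```python
import re

# First character: lowercase letter or underscore.
# Remaining characters: lowercase letters, digits, or underscores.
TABLE_NAME_PATTERN = re.compile(r"[a-z_][a-z0-9_]*")


def is_valid_table_name(name: str) -> bool:
    """Return True if the name follows the naming convention above."""
    return TABLE_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["orders_2024", "_staging", "Orders", "2024_orders", "my-table"]:
    print(candidate, is_valid_table_name(candidate))
```

Here `orders_2024` and `_staging` pass, while `Orders` (uppercase), `2024_orders` (starts with a digit), and `my-table` (contains a hyphen) do not.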

Troubleshooting

Failed Databricks Action

ERROR MESSAGE

Action failed: stream transfer: write data from stream: initializing writer instance: connecting to database: pinging database server: databricks: execution error: failed to execute query: context deadline exceeded

Databricks clusters may shut down after a period of inactivity. When restarted, a cluster takes some time to become available again. This error may therefore occur when a Dataddo action runs while the cluster is still restarting, which causes the action to time out and fail.

To avoid this, schedule Dataddo actions for times when the cluster is up.
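
If you cannot guarantee that the cluster is up, one option is to warm it up yourself shortly before the scheduled action. The following is a minimal sketch of that idea and not a Dataddo feature: it retries a caller-supplied `ping` function until it succeeds. The names `wait_until_ready` and `flaky_ping` are hypothetical. In practice, `ping` would run a trivial query such as `SELECT 1` against your SQL warehouse using a client of your choice; here it is simulated so the example runs on its own.

```python
import time
from typing import Callable


def wait_until_ready(ping: Callable[[], None],
                     attempts: int = 5,
                     delay_seconds: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call ping() until it stops raising, waiting between attempts.

    Returns True as soon as one ping succeeds, or False if every
    attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            ping()
            return True
        except Exception:
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts:
                sleep(delay_seconds)
    return False


# Simulated warehouse that times out twice while it is still starting.
calls = {"count": 0}


def flaky_ping() -> None:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("context deadline exceeded")


ready = wait_until_ready(flaky_ping, attempts=5, delay_seconds=0)
print(ready, calls["count"])
```

Running a script like this a few minutes before the scheduled action gives the cluster time to restart, so that the action itself does not hit the timeout.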
