---
title: "Databricks"
slug: "databricks"
description: "Automate data integration with Dataddo's Databricks connector. Reliable and secure data sync. Learn how to quickly and easily connect your data to Databricks."
updated: 2024-06-14T09:28:50Z
published: 2024-06-14T09:28:50Z
---


# Databricks

**The Databricks Lakehouse Platform** combines the best of data lakes and data warehouses, simplifying the modern data stack and eliminating data silos. Built on open standards and open source, the platform provides a common approach to data management, security, and governance, enabling businesses to operate more efficiently, innovate faster, and achieve the full potential of their analytics and AI initiatives.

## Prerequisites

          
          

- You have a [running Databricks SQL warehouse](/docs/databricks#in-databricks).

## Architecture Consideration

There are two main ways to set up automatic data loading to Databricks with Dataddo:

- Using **intermediate object storage** such as [Amazon S3](/docs/s3) or [Azure Blob Storage](/docs/azure-blob-storage). We recommend this option when you need to **write large volumes of data at low frequency** (e.g. more than 1M rows once a day). You will need to:
  - Configure the flows to these destinations using the **Parquet** format.
  - Configure [Auto Loader](https://docs.databricks.com/ingestion/auto-loader/index.html) in Databricks Delta Lake.
- Using **Databricks as a direct destination**. The Dataddo Databricks writer uses an SQL layer, which means no further configuration on the Databricks side is required. We recommend this option when you need to load **relatively low volumes of data at high frequency** or to achieve **CDC-style [data replication](/docs/database-replication)**.
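For the object storage route, the Auto Loader side could look roughly like the following PySpark sketch, meant to run in a Databricks notebook where `spark` is already defined. The function names and the `source_path`, `target_table`, and `checkpoint_path` parameters are hypothetical placeholders, and storing the inferred schema under the checkpoint path is just one possible layout.

```python
def auto_loader_options(schema_location: str) -> dict:
    """Reader options for Auto Loader (the `cloudFiles` source) with Parquet input."""
    return {
        "cloudFiles.format": "parquet",
        # Where Auto Loader keeps the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


def start_auto_loader(spark, source_path: str, target_table: str, checkpoint_path: str):
    """Incrementally load new Parquet files from object storage into a Delta table.

    `spark` is the SparkSession provided by the Databricks notebook.
    All paths and the table name are placeholders for your own values.
    """
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**auto_loader_options(f"{checkpoint_path}/schema"))
        .load(source_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)
        # Process all files that are currently available, then stop.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

With `availableNow=True` the stream behaves like a batch job, which suits a once-a-day load; a continuously running stream would use a different trigger.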

          
          

This page applies when using **Databricks as a direct destination** for your Dataddo flow. If you are considering loading the data via [Amazon S3](/docs/s3) or [Azure Blob Storage](/docs/azure-blob-storage), please refer to the corresponding articles.

## Authorize Connection to Databricks

### In Databricks

#### Create an SQL Warehouse

1. Log in to the Databricks workspace.
2. Click on **SQL Warehouses** in the sidebar.
3. Enter a **Name** for the warehouse and accept the default warehouse settings.
4. Click on **Create**.

#### Configure the Access for the SQL Warehouse

1. In the Databricks workspace, click on **SQL** and then **SQL Warehouses**.
2. Choose the warehouse and navigate to the **Connection Details** tab.
3. Copy the full **DSN connection string**. You will need to provide it to Dataddo.
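Before pasting the connection string into Dataddo, it can help to check that nothing was truncated when copying. The sketch below assumes the string follows the shape used by the Databricks SQL Driver for Go, `token:<token>@<hostname>:<port>/<http-path>`; this is an assumption, so compare it against what your **Connection Details** tab actually shows. The `parse_dsn` helper is hypothetical and not part of any Dataddo or Databricks library.

```python
import re

# Assumed DSN shape: token:<token>@<hostname>:<port>/<http-path>[?param=value...]
DSN_PATTERN = re.compile(
    r"^token:(?P<token>[^@]+)"
    r"@(?P<host>[^:/]+)"
    r":(?P<port>\d+)"
    r"(?P<http_path>/[^?]*)"
    r"(?:\?(?P<params>.*))?$"
)


def parse_dsn(dsn: str) -> dict:
    """Split a DSN into its parts, or raise ValueError if it looks malformed."""
    match = DSN_PATTERN.match(dsn.strip())
    if match is None:
        raise ValueError("DSN does not match token:<token>@<hostname>:<port>/<http-path>")
    parts = match.groupdict()
    parts["port"] = int(parts["port"])
    return parts


# Example with made-up values; never print or log a real token.
parts = parse_dsn("token:dapiEXAMPLE@example.cloud.databricks.com:443/sql/1.0/warehouses/abc123")
print(parts["host"], parts["port"], parts["http_path"])
```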

### In Dataddo

1. In the **Authorizers** tab, click on [**Authorize New Service**](https://app.dataddo.com/service/new) and select **Databricks**.
2. Select that you want to connect via **DSN connection string**.
3. You will be asked to fill in the following fields:
   1. **DSN Connection String**: The value obtained during the [SQL warehouse access configuration](/docs/databricks#in-databricks) step.
   2. **Catalog**: Sets the initial catalog name for the connection. The default value is **hive_metastore**.
   3. **Schema**: Sets the initial schema name. The default value is **default**.
4. **Save** the authorization details.
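To illustrate how the **Catalog** and **Schema** values relate to a table, the sketch below builds the three-level name (`catalog.schema.table`) that Databricks uses to address a table. The `qualified_table_name` helper is hypothetical; its defaults mirror the default values listed above.

```python
def qualified_table_name(
    table: str, catalog: str = "hive_metastore", schema: str = "default"
) -> str:
    """Return the backtick-quoted three-level name catalog.schema.table."""
    parts = (catalog, schema, table)
    # A backtick inside an identifier is escaped by doubling it.
    return ".".join("`" + part.replace("`", "``") + "`" for part in parts)


print(qualified_table_name("orders"))
# `hive_metastore`.`default`.`orders`
```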

## Create a New Databricks Destination

1. On the **Destinations** page, click on the [**Create Destination**](https://app.dataddo.com/destinations) button and select **Databricks** from the list.
2. Select your ***authorizer*** from the drop-down menu.
3. Name your ***destination*** and click on **Save**.

**Need to authorize another connection?**

Click on **Add new Account** in the drop-down menu during ***authorizer*** selection and follow the on-screen prompts. You can also go to the **Authorizers** tab and click on [**Add New Service**](https://app.dataddo.com/service/new).

## Creating a Flow to Databricks

1. Navigate to **Flows** and click on [**Create Flow**](https://app.dataddo.com/flow/new).
2. Click on **Connect Your Data** to add your ***source(s)***.
3. Click on **Connect Your Data Destination** to add the ***destination***.
4. Choose the [write mode](https://docs.dataddo.com/docs/data-storages#write-modes) and fill in the other required information.
5. Check the **Data Preview** to see if your configuration is correct.
6. **Name** your flow and click on **Create Flow** to finish the setup.

### Table Naming Convention

When naming your table, make sure the table name:

- Is all lowercase
- Starts with a lowercase letter or an underscore
- Contains only
  - Letters
  - Numbers
  - Underscores
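The rules above can be expressed as a single regular expression. The following is a minimal sketch for checking names before you create a flow; the `is_valid_table_name` helper is hypothetical and only mirrors the rules listed here, reading "letters" as `a-z` and "numbers" as the digits `0-9`.

```python
import re

# A lowercase letter or underscore first, then lowercase letters, digits, or underscores.
TABLE_NAME_PATTERN = re.compile(r"[a-z_][a-z0-9_]*")


def is_valid_table_name(name: str) -> bool:
    """Return True if the name follows the table naming convention."""
    return TABLE_NAME_PATTERN.fullmatch(name) is not None


for candidate in ("orders_2024", "_staging", "Orders", "2024_orders", "my-table"):
    print(candidate, is_valid_table_name(candidate))
```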

---

## Troubleshooting

### Failed Databricks Action

**ERROR MESSAGE**

```
Action failed: stream transfer: write data from stream: initializing writer instance: connecting to database: pinging database server: databricks: execution error: failed to execute query: context deadline exceeded
```

Databricks clusters may shut down after a period of inactivity and need some time to start again. This error can therefore occur when a **cluster restart** overlaps with a **Dataddo action**: the connection attempt times out and the action fails.

To avoid this, schedule your Dataddo actions **during cluster uptime**.
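If aligning schedules is not practical, one possible workaround is to wake the cluster from your own scheduler shortly before the Dataddo action runs. The sketch below is an assumption about how you might do that and is not a Dataddo feature: `wait_until_ready` and its `ping` argument are hypothetical, and `ping` stands for any callable that raises an error while the cluster is unavailable (for example, one that runs `SELECT 1` through a SQL client of your choice).

```python
import time
from typing import Callable


def wait_until_ready(
    ping: Callable[[], None],
    attempts: int = 5,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `ping` until it succeeds and return the number of attempts used.

    The error from the last attempt is re-raised if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            ping()
            return attempt
        except Exception:
            if attempt == attempts:
                raise
            # Give the cluster time to finish starting before trying again.
            sleep(delay_seconds)
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `time.sleep` is enough.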

## Related Articles

- [Data Backfilling to Storages](https://docs.dataddo.com/docs/data-backfilling-to-storages)
- [Write Modes](https://docs.dataddo.com/docs/data-storages#write-modes)
- [Implementation of Batch Ingestion to Data Lakes](https://docs.dataddo.com/docs/batch-ingestion-to-data-lakes)
- [Network Access Control List (ACL) Configuration](https://docs.dataddo.com/docs/network-acl)
- [SSH Tunnelling](https://docs.dataddo.com/docs/ssh-tunnelling)
- [Data Transformations](https://docs.dataddo.com/docs/data-transformations)
- [Data Quality Firewall](https://docs.dataddo.com/docs/data-quality-firewall)
