Databricks Integration Demo

Goal: Provide guidance on how to use Attimis OneBucket in Databricks and clearly demonstrate the value it delivers. Show why Attimis OneBucket improves Databricks workflows through simplified configuration, improved reliability, and better performance.

How to Use Attimis OneBucket

This guide demonstrates how to configure s3.attimis.cloud as a single S3-compatible endpoint in Databricks using Spark. Once configured, Databricks can access multiple underlying data sources through one logical endpoint. After setup, reading, writing, and deleting data works exactly like native S3, without needing to know where the data physically lives.

Prerequisites

  • A Databricks workspace
  • Bucket access credentials stored in Databricks Secrets
  • A Spark cluster with the required Maven libraries installed

1. Configure the S3-compatible endpoint (one-time setup)

This configuration is set once at the cluster level. After this, all data sources behind Attimis OneBucket are automatically available to every notebook and job on the cluster.
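As an illustrative sketch, the cluster-level setup can be entered in the cluster's Spark config. The secret scope (`attimis`) and key names below are hypothetical placeholders; the `{{secrets/<scope>/<key>}}` syntax is the standard Databricks mechanism for referencing secrets from Spark configuration, and path-style access is a common requirement for S3-compatible endpoints.

```
spark.hadoop.fs.s3a.endpoint https://s3.attimis.cloud
spark.hadoop.fs.s3a.access.key {{secrets/attimis/access-key}}
spark.hadoop.fs.s3a.secret.key {{secrets/attimis/secret-key}}
spark.hadoop.fs.s3a.path.style.access true
```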

Why this matters:

Databricks supports only one S3 endpoint per Spark cluster at the infrastructure level. Without Attimis OneBucket, users are forced to configure additional endpoints inside notebooks, and those configurations must be re-run every time a Spark session restarts. Attimis eliminates this entirely by providing one globally configured endpoint.

2. Reading Data from OneBucket

Once configured, reading data is identical to reading from any S3-compatible storage. The difference is that multiple datasets from different storage systems are accessed through the same endpoint.
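For example, a read through the unified endpoint is a standard `s3a://` read. This is a sketch: the bucket and prefix names are hypothetical, and `spark` is the session object Databricks provides in every notebook.

```python
# Both datasets may live in different backing stores, but both are
# addressed through the single s3.attimis.cloud endpoint.
yellow_df = spark.read.parquet("s3a://nyc-taxi/yellow/")
green_df = spark.read.parquet("s3a://nyc-taxi/green/")
```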

3. Transforming and Joining Data in OneBucket

Data accessed through Attimis OneBucket can be transformed and joined like any other Spark dataset, without requiring separate endpoints or credentials.
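A minimal self-contained sketch, assuming hypothetical paths and column names shared by both datasets:

```python
from pyspark.sql import functions as F

# Both reads go through the same unified endpoint configured at the
# cluster level; no per-source endpoints or credentials are needed.
yellow_df = spark.read.parquet("s3a://nyc-taxi/yellow/")
green_df = spark.read.parquet("s3a://nyc-taxi/green/")

# Union the shared columns and compute a simple aggregate.
combined = yellow_df.select("passenger_count", "trip_distance").unionByName(
    green_df.select("passenger_count", "trip_distance")
)
summary = combined.agg(F.avg("passenger_count").alias("avg_passengers"))
```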

4. Writing Results Back to OneBucket

Writes are automatically distributed to all backing storage systems managed by Attimis OneBucket.
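Given a DataFrame `summary_df` produced earlier in the notebook (the variable name and output path here are hypothetical), the write is a standard `s3a://` write:

```python
# A single write against the unified endpoint; Attimis OneBucket
# handles distribution to the backing storage systems.
summary_df.write.mode("overwrite").parquet("s3a://analytics/trip-summary/")
```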

5. Deleting Data from OneBucket

Deletes are applied consistently across all data sources behind the unified endpoint.
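A delete is likewise issued once against the unified endpoint. This sketch uses `dbutils`, the utility object Databricks provides in notebooks; the path is hypothetical.

```python
# Recursively remove the prefix through the single endpoint.
dbutils.fs.rm("s3a://analytics/trip-summary/", recurse=True)
```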

Why Attimis OneBucket

Databricks works extremely well with cloud-native object storage. However, enterprises rarely operate in a single-cloud world. Many organizations must keep data on-prem or spread across multiple providers due to security, cost, or architectural constraints. As a result, teams must rely on legacy Spark configurations and manually managed credentials to access their data. This approach is operationally inefficient and prone to issues.

Attimis OneBucket modernizes the legacy approach by providing an S3-compatible unified data access layer in Databricks, allowing organizations to access their data from anywhere.

Unified Data Access

Attimis OneBucket acts as a unified data access layer. By consolidating multiple storage sources behind a single S3-compatible endpoint, s3.attimis.cloud, Attimis OneBucket dramatically simplifies connectivity and configuration within Databricks.

Without Attimis OneBucket, connecting Spark to multiple storage providers requires:

  • Separate S3A endpoint configurations for each provider
  • Multiple Databricks secret scopes to manage distinct access and secret keys
  • Spark configuration updates whenever a new data source is added

This approach increases operational overhead and the risk of misconfiguration.

With Attimis OneBucket, we provide:

  • A single endpoint (s3.attimis.cloud)
  • A single configuration
  • A unified namespace

Once the unified endpoint is configured, new data sources can be added behind the scenes within Attimis OneBucket, without requiring any configuration changes to Databricks clusters or notebooks.

Bucket-Specific Configuration Overhead Scenario

Without Attimis OneBucket, to avoid resetting Spark configurations on every read, write, or delete operation, users must configure endpoints at the bucket level. This binds each endpoint to a single bucket. As a result, each bucket requires its own S3 endpoint configuration, each configuration must be explicitly defined in Spark, and DataFrames become dependent on specific buckets. If a pipeline needs to interact with ten different buckets, users must create and manage ten separate configuration entries, each with its own endpoint and credentials.

With Attimis OneBucket, this problem completely disappears. Any bucket can be accessed through the same unified endpoint, with no additional Spark configuration required.
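To make the contrast concrete: without OneBucket, the usual workaround is Hadoop S3A's per-bucket configuration, repeated for every bucket. The bucket names and provider endpoints in this sketch are hypothetical.

```python
# Without Attimis OneBucket: one per-bucket S3A override per bucket,
# each binding that bucket to a specific provider endpoint.
spark.conf.set("fs.s3a.bucket.yellow-trips.endpoint", "https://s3.wasabisys.com")
spark.conf.set("fs.s3a.bucket.green-trips.endpoint", "https://s3.us-east-1.amazonaws.com")
# ...and so on for every additional bucket and credential pair.

# With Attimis OneBucket: no per-bucket settings; every bucket
# resolves through the single cluster-level endpoint.
df = spark.read.parquet("s3a://yellow-trips/2023/")
```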

Simplicity in Multi-Source Queries

Attimis presents all storage locations through a single S3-compatible endpoint.

This allows:

  • Spark configurations to be set once at a global level
  • Queries to be applied across all data sources
  • Reduced risk of credential mismanagement

Silent Failure Scenario

Without Attimis OneBucket, if a user unintentionally configures the wrong endpoint, Spark does not immediately error. The job may hang for several minutes attempting to connect, and there is no automatic fallback to another provider. This results in pipelines stalling with no clear indication of failure.

With Attimis OneBucket, the endpoint never changes. Backend routing is handled automatically, and DataFrames are created successfully in seconds.

File Not Found Scenario

Consider a scenario in which the yellow taxi dataset exists in AWS but not in Wasabi, and the Spark session is configured to use the Wasabi S3 endpoint. The DataFrame becomes explicitly dependent on the Wasabi endpoint, so when Spark attempts to read the yellow dataset path, it throws an error even though the data exists in AWS. Because Spark is bound to a single endpoint, it cannot transparently access the AWS copy of the dataset. The failure is caused not by missing data, but by the endpoint dependency introduced by per-bucket configuration.

Improved Performance Across Multiple Endpoints

By using a single S3-compatible endpoint, Spark avoids initializing and managing multiple S3A connections. This results in more consistent query execution and improved overall performance compared to managing multiple endpoints.

Query Example 1

Using the green and yellow public NYC taxi trip data to calculate the average number of passengers per trip.

With Attimis OneBucket: 11.91 second runtime

Without Attimis OneBucket: 37.62 second runtime

Query Example 2

Using the green and yellow public NYC taxi trip data to calculate the average trip distance for each taxi.

With Attimis OneBucket: 11.79 second runtime

Without Attimis OneBucket: 41.45 second runtime

Simplifying Migrations and Accelerating Time-To-Value

Data migrations are slow and disruptive. In a traditional Databricks deployment, data must be fully migrated before it can be accessed. Entire teams are often dedicated solely to planning, executing, and validating migrations, and must wait months before they can begin to derive value. In real-world enterprise environments, large-scale data migrations are said to take as long as 18 months.

Attimis OneBucket supports opportunistic migrations, allowing data to move over time without disrupting workflows.

  • Data is accessed where it currently lives
  • Migrations occur in the background
  • There is no need to refactor pipelines as data moves
  • Data is accessible in real time through the same paths and the same endpoint throughout the migration lifecycle

Ultimately, time-to-value is reduced from months to minutes.

Security and Governance

Attimis OneBucket centralizes security and governance by removing the need to expose or manage separate credentials for each data source.

Key benefits include:

  • One credential used by Databricks
  • Secure backend connection management by Attimis
  • Consistent access controls across all data sources
  • Reduced credential sprawl

This is especially valuable for enterprises with on-prem or hybrid environments, where the legacy approach requires:

  • Manual Spark configuration per source
  • Multiple secret scopes
  • Broad credential exposure

Attimis OneBucket reduces security risks while simplifying compliance and governance.