Write Data to Unattached Lakehouses with Fabric Notebooks
- Prathy Kamasani

- Oct 14, 2024
- 2 min read
Updated: Nov 14
It's one of those blog posts I write for my own reference as I often find myself needing this code. I hope it helps someone out there too 🙂
Working with Multiple Lakehouses
Regardless of which architecture we follow, data integration and transformation always involve a step where data moves from one location to another, and we end up working with multiple tables, schemas, and even lakehouses. The same applies to Fabric Notebooks: I often find myself in scenarios where I don't want to attach a Lakehouse to my notebook, but I still want to read or write data from various lakehouses.
I recently blogged about a way to achieve this as part of Documenting your workspaces. In that post, I described how to write data to a Lakehouse that wasn't attached to the notebook. I used MsSparkUtils (since renamed to NotebookUtils) to mount the Lakehouse and then write data to it as Delta tables.
Mounting Unattached Lakehouses
To access an unattached Lakehouse, you'll use mssparkutils.lakehouse.get() and mssparkutils.fs.mount(). Here's how:
# Define Lakehouse name and description
LH_Name = "LH_Fabric_Documentation"
LH_desc = "Lakehouse for Fabric Documentation"
# Mount the Lakehouse for direct file system access
lakehouse = mssparkutils.lakehouse.get(LH_Name)
mssparkutils.fs.mount(
    lakehouse.get("properties").get("abfsPath"),
    f"/{LH_Name}"
)
# Retrieve and store local and ABFS paths of the mounted Lakehouse
local_path = mssparkutils.fs.getMountPath(f"/{LH_Name}")
lh_abfs_path = lakehouse.get("properties").get("abfsPath")
print(f'Local path: {local_path}')
print(f'ABFS path: {lh_abfs_path}')

Once mounted, you can write data to the Lakehouse using the ABFS path:
Table_Name = "XXXX"
spark_df_Fabric_all_items_type.write.format("delta").option("mergeSchema", "true").mode("append").save(f"{lh_abfs_path}/Tables/{Table_Name}")
print(Table_Name, "created at :", f"{lh_abfs_path}/Tables/{Table_Name}")
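As mentioned above, MsSparkUtils has been renamed to NotebookUtils, and the same mounting pattern should carry over to the newer alias. Here's a minimal sketch, assuming notebookutils exposes the same lakehouse.get(), fs.mount() and fs.getMountPath() calls:

# The same mount using the renamed NotebookUtils alias (method names assumed to mirror mssparkutils)
lakehouse = notebookutils.lakehouse.get(LH_Name)
notebookutils.fs.mount(
    lakehouse.get("properties").get("abfsPath"),
    f"/{LH_Name}"
)
local_path = notebookutils.fs.getMountPath(f"/{LH_Name}")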
Writing Directly with ABFS Paths
In scenarios where your Lakehouse is in a different workspace, or when you prefer not to mount it, you can write directly to OneLake using ABFS paths.
This approach is inspired by Aitor Murguzur's excellent blog post on writing to OneLake from Databricks.
workspace_id = "<workspace_id>"
lakehouse_id = "<lakehouse_id>"
table_name = "XXXX"
# Write data to Lakehouse
df.write.format("delta").mode("overwrite").save(
    f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Tables/{table_name}"
)
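If you'd rather not track down GUIDs, OneLake paths can also be built from workspace and item names, with the item type appended to the item name. A minimal sketch; the workspace and lakehouse names below are placeholders:

workspace_name = "FabricDocs"
lakehouse_name = "LH_Fabric_Documentation"
table_name = "XXXX"
# Name-based OneLake path; note the ".Lakehouse" suffix on the item name
df.write.format("delta").mode("overwrite").save(
    f"abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/{lakehouse_name}.Lakehouse/Tables/{table_name}"
)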
Using Lakehouse Schemas
Lakehouses now support Schemas, which let you organise your tables into logical groupings. To write data to a table within a specific schema, just include the schema name in your table path using dot notation.
Example with Schemas
# Sample DataFrame creation
data = [("John", 28), ("Jane", 34), ("Mike", 45)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)
# Display the first 5 rows
df.show(5)
# Define workspace and lakehouse IDs
workspace_id = "xxxxxxx"
lakehouse_id = "xxxxxxx"
# Define the table name with schema.tableName format
table_name = "test.schemaTable"
# Write the DataFrame to the Lakehouse as Delta tables
df.write.format("delta").mode("overwrite").save(
    f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Tables/{table_name}"
)
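To sanity-check the write, you can read the same path back with Spark. A quick sketch, reusing the workspace_id, lakehouse_id and table_name defined above:

# Read the schema-qualified Delta table back from OneLake to verify it exists
df_check = spark.read.format("delta").load(
    f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Tables/{table_name}"
)
df_check.show()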
Quick Summary
Working with unattached Lakehouses in Fabric Notebooks is straightforward once you know the approach:
- Mounting approach: Use mssparkutils.lakehouse.get() and mssparkutils.fs.mount() for easier path management (works best within the same workspace)
- Direct ABFS path: Write directly using abfss:// paths for cross-workspace scenarios or when you prefer not to mount (a small path-building helper is sketched after this list)
- Schema support: Organise tables into schemas using dot notation (schema.tableName) in table names
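If you write these paths often, a tiny helper keeps the string in one place. This is a hypothetical convenience function, not part of any Fabric API:

def onelake_table_path(workspace_id: str, lakehouse_id: str, table_name: str) -> str:
    """Build the OneLake ABFS path for a Lakehouse table (hypothetical helper)."""
    return (
        f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse_id}/Tables/{table_name}"
    )

# Example: the schema example above becomes
df.write.format("delta").mode("overwrite").save(
    onelake_table_path(workspace_id, lakehouse_id, "test.schemaTable")
)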
These patterns work across various Fabric scenarios, from Notebooks to pipelines, making data integration flexible and powerful.
Keep Smiling, Prathy