
Document #PowerBI Workspaces with #MicrosoftFabric #Notebooks

Updated: Nov 14

If you're a consultant like me, you know how hard it can be to get access to the Power BI Admin API or a Service Principal. Sometimes you just need to see all the workspaces you have permission for and what’s inside them. Well, I found that with MS Fabric, we can use notebooks and achieve it in a few steps:

  1. Get Tenant ID

  2. Connect to Tenant for Power BI

  3. Call Power BI REST APIs, get data and save it in the Lakehouse

  4. Report using Power BI

If you're new to Microsoft Fabric and Notebooks, check out this series on MS Docs – How to use notebooks – Microsoft Fabric | Microsoft Learn.

Get Tenant ID

There might be simpler ways to get the Tenant ID using Python, but the code below works to get the Tenant ID of the current notebook.

conf_list = spark.sparkContext.getConf().getAll()

# Filter the configuration parameters to get the driver log URL
filtered_list = filter(lambda x: x[0] == 'spark.tracking.driverLogUrl', conf_list)
value = next(filtered_list, None)

# The tenant ID is the URL segment that follows 'sparkui'
url = value[1]
parts = url.split('/')
index = parts.index('sparkui')
TenantID = parts[index + 1]
print(TenantID)

Connect to Tenant for Power BI

First, I need an access token, so I use the authority URL to authenticate with my own login. This gives me the workspaces I can access, but you can also use an Azure app registration and Service Principal authentication.

# Import the necessary libraries
%pip install adal
from adal import AuthenticationContext
import requests

# Set the required parameters
TENANT_ID = TenantID
CLIENT_ID = '1aea3f97-edc6-4453-a59b-b88b0b803711' # this is the Power BI Client Integrations client ID
RESOURCE = 'https://analysis.windows.net/powerbi/api'

# Set the authority URL and create an authentication context
authority_url = f'https://login.microsoftonline.com/{TENANT_ID}'
context = AuthenticationContext(authority_url)

# Acquire a user code and device code to authenticate the user
code_response = context.acquire_user_code(RESOURCE, CLIENT_ID)
print(code_response['message'])
token_response = context.acquire_token_with_device_code(RESOURCE, code_response, CLIENT_ID)
access_token = token_response['accessToken']

The code above uses the Power BI Client Integration Client ID to connect to the service. It prompts me to enter a code to authenticate myself when I run it. This is fine for occasional or manual scripts but not very convenient otherwise.

Screenshot 1 of 3: connect to tenant

Another option is to create an app and a service principal in Azure. This lets you authenticate and authorise your app to access Azure resources. To learn how to do this, check out this link – Power BI REST APIs for embedded analytics and automation – Power BI REST API | Microsoft Learn

%pip install azure-identity
from azure.identity import ClientSecretCredential
import json
import requests
import pandas as pd

tenant_id = TenantID
client_id = 'Your ClientID'
client_secret = 'Your Client Secret' # better not to hardcode the client secret here; see the Key Vault sketch below for one way to keep it safely in Azure Key Vault
APIURL = 'https://analysis.windows.net/powerbi/api/.default' # scope for the Power BI REST API

auth = ClientSecretCredential(
    authority='https://login.microsoftonline.com/',
    tenant_id=tenant_id,
    client_id=client_id,
    client_secret=client_secret
)
access_token = auth.get_token(APIURL).token
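
As a side note on that client secret: rather than hardcoding it, one option in a Fabric notebook is to read it from Azure Key Vault at runtime. The snippet below is only a minimal sketch, assuming you have a Key Vault that your signed-in identity can read from; the vault URL and secret name are placeholders, and it uses mssparkutils, which is available in Fabric notebooks.

# Minimal sketch: read the client secret from Azure Key Vault instead of hardcoding it
# The vault URL and secret name are placeholders - replace them with your own values
from notebookutils import mssparkutils

key_vault_url = 'https://your-key-vault.vault.azure.net/' # placeholder vault URL
secret_name = 'powerbi-client-secret' # placeholder secret name

# getSecret uses the identity signed in to the notebook to read the secret
client_secret = mssparkutils.credentials.getSecret(key_vault_url, secret_name)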

Call Power BI REST APIs and save the data in the Lakehouse

In this example, I’m only fetching the workspace, dataset and report details, but you can call these REST APIs from any language that supports HTTP requests.

# Set the base URL and headers for the API call
Baseurl = 'https://api.powerbi.com/v1.0/'
headers = {'Authorization': f'Bearer {access_token}'}

# Get a list of all workspaces in the organization
response = requests.get(Baseurl + 'myorg/groups', headers=headers)

if response.status_code == 200:
    workspaces = response.json()['value']

    # Keep the list of all workspaces and their IDs in a DataFrame
    workspaces_df = pd.DataFrame(workspaces)
    display(workspaces_df)
else:
    print(f"Error: {response.status_code} - {response.text}")

# Replace the null values with empty strings
workspaces_df = workspaces_df.fillna('')

# Replace spaces in columns with _
workspaces_df.columns = workspaces_df.columns.str.replace(' ', '_')

# Create a spark dataframe from the workspaces_df
workspaces_spark_df = spark.createDataFrame(workspaces_df)
spark.sql("DROP TABLE IF EXISTS Workspaces")
workspaces_spark_df.write.format("delta").mode("overwrite").saveAsTable("Workspaces")

# Get datasets
Datasets = []

# Get a list of all datasets in the organization
for workspace in workspaces:
    response = requests.get(Baseurl + f"myorg/groups/{workspace['id']}/datasets", headers=headers)
    
    if response.status_code == 200:
        datasets = response.json()['value']
        
        # Get list of all datasets and their IDs
        for dataset in datasets:
            Datasets.append(dataset)
            # Add workspace name and ID to the dataset
            dataset['Workspace Name'] = workspace['name']
            dataset['Workspace ID'] = workspace['id']
    else:
        print(f"Error: {response.status_code} - {response.text}")

# Create a dataframe from the Datasets list
Datasets_df = pd.DataFrame(Datasets)

# Select key columns
Datasets_df = Datasets_df[['name', 'id', 'Workspace Name', 'Workspace ID', 'configuredBy', 'webUrl', 'createdDate']]

# Rename name and id to Dataset Name and Dataset ID
Datasets_df = Datasets_df.rename(columns={'name': 'Dataset Name', 'id': 'Dataset ID'})

# Replace the null values with empty string
Datasets_df = Datasets_df.fillna('')

# Replace spaces in columns with _
Datasets_df.columns = Datasets_df.columns.str.replace(' ', '_')

# Create a spark dataframe from the Datasets_df
datasets_spark_df = spark.createDataFrame(Datasets_df)
spark.sql("DROP TABLE IF EXISTS Datasets")
datasets_spark_df.write.format("delta").mode("overwrite").saveAsTable("Datasets")

# Get reports
Reports = []

# Get a list of all reports in the organization
for workspace in workspaces:
    response = requests.get(Baseurl + f"myorg/groups/{workspace['id']}/reports", headers=headers)
    
    if response.status_code == 200:
        reports = response.json()['value']
        
        # Get list of all reports and their IDs
        for report in reports:
            Reports.append(report)
            # Add workspace name and ID to the report
            report['Workspace Name'] = workspace['name']
            report['Workspace ID'] = workspace['id']
    else:
        print(f"Error: {response.status_code} - {response.text}")

# Create a dataframe from the Reports list
Reports_df = pd.DataFrame(Reports)

# Select key columns
Reports_df = Reports_df[['name', 'id', 'Workspace Name', 'Workspace ID', 'webUrl', 'datasetId', 'datasetWorkspaceId']]

# Rename name and id to Report Name and Report ID
Reports_df = Reports_df.rename(columns={'name': 'Report Name', 'id': 'Report ID'})

# Replace the null values with empty string
Reports_df = Reports_df.fillna('')

# Replace spaces in columns with _
Reports_df.columns = Reports_df.columns.str.replace(' ', '_')

# Create a spark dataframe from the Reports_df
reports_spark_df = spark.createDataFrame(Reports_df)
spark.sql("DROP TABLE IF EXISTS Reports")
reports_spark_df.write.format("delta").mode("overwrite").saveAsTable("Reports")

In the code snippets above, I started by defining the base URL so I could append the REST API endpoints to it later. Then, I used the response status to handle any possible API issues. After that, I stored the response in a pandas DataFrame and converted it to a Spark DataFrame before saving it in the Lakehouse as a Delta table. I also removed spaces from the column names because Spark tables don’t allow them. This is the same pattern that I followed for all the other APIs.
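
Since this pattern repeats for every API, it could be wrapped in a small helper function. Below is a minimal sketch assuming the Baseurl, headers and Spark session defined earlier; the function name save_api_to_lakehouse and its parameters are my own, not part of any Power BI library.

# Minimal sketch of the repeated pattern: call an endpoint, tidy the columns and save to the Lakehouse
# save_api_to_lakehouse and rename_columns are hypothetical names introduced for this example
def save_api_to_lakehouse(endpoint, table_name, rename_columns=None):
    response = requests.get(Baseurl + endpoint, headers=headers)
    if response.status_code != 200:
        print(f"Error: {response.status_code} - {response.text}")
        return None

    df = pd.DataFrame(response.json()['value'])
    if rename_columns:
        df = df.rename(columns=rename_columns)

    # Spark tables don't allow spaces in column names, so replace them with underscores
    df = df.fillna('')
    df.columns = df.columns.str.replace(' ', '_')

    spark.sql(f"DROP TABLE IF EXISTS {table_name}")
    spark.createDataFrame(df).write.format("delta").mode("overwrite").saveAsTable(table_name)
    return df

# Example usage with the workspaces endpoint
save_api_to_lakehouse('myorg/groups', 'Workspaces')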

Create Report

When I create a Lakehouse in Microsoft Fabric, it automatically creates a couple of default items: a default dataset and a SQL endpoint.

Screenshot 2 of 3: create report
Screenshot 3 of 3: create report
When I click on the ellipsis next to either of the two default items, Dataset (Default) or SQL endpoint, I get various ways to analyse the data.

You might be overwhelmed by all the different visualisation options available. Each one serves a different purpose once you know the differences between them. But for my use case, I want a simple list of all workspaces and reports, who created them, who configured them, and so on. So:

I selected the Dataset(Default) option and clicked Create Paginated Report.

I dragged and dropped the columns I wanted to see in the table, such as workspace name, report name, owner, created date, etc.

That’s it! I now have a simple and handy list of all my workspaces in Microsoft Fabric. I can export, share or distribute this report to a wider audience.

This is a simple use case, but it has a lot of potential. Instead of using the default dataset, I could have created some relationships. I could even have used TOM (Tabular Object Model) to create a dataset, relationships, measures, etc. I didn’t have enough time to explore that, and I wasn’t sure how to do it in Python. Maybe someone can try it and let me know how it goes.

Out of curiosity, I actually started all of this with the embedded API, exploring how it could read data from different sources in Power BI. I wanted to document Workspaces, Datasets and Reports with all their details using the Embedded API, but I ran into a problem: “Embedding a DirectLake dataset isn't supported with V1 embed token“. Maybe this will change, and then we can do amazing things: document various artifacts in a workspace, create reports and publish them in a Power BI App, an intelligent way to document. For now, I used the REST operations.

Another option is to use the Admin API and Scanner API, start a trace and get the dataset metadata, document measures, etc. I just wanted to show that these notebooks open the door to many amazing things. 🙂
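
For anyone curious about that route, here is a rough sketch of how the Scanner API could be called from the same notebook. It assumes the access token belongs to a Power BI administrator (the Scanner API sits under the admin endpoints, and the detailed-metadata options need to be enabled in the tenant settings) and reuses the headers and workspaces list from earlier; it is only an outline of the idea, not a full solution.

import time

# Rough sketch: trigger a metadata scan for a few workspaces using the Scanner API (admin permissions required)
admin_url = 'https://api.powerbi.com/v1.0/myorg/admin/'
workspace_ids = [w['id'] for w in workspaces][:10] # scan the first few workspaces as an example

scan = requests.post(
    admin_url + 'workspaces/getInfo?datasetSchema=true&datasetExpressions=true',
    headers=headers,
    json={'workspaces': workspace_ids}
)
scan_id = scan.json()['id']

# Poll until the scan finishes, then fetch the result with the dataset metadata
while requests.get(admin_url + f'workspaces/scanStatus/{scan_id}', headers=headers).json()['status'] != 'Succeeded':
    time.sleep(5)

scan_result = requests.get(admin_url + f'workspaces/scanResult/{scan_id}', headers=headers).json()
print(scan_result.keys())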

Until next time,

Prathy 🙂
