I Built a Tool to Size Fabric SKUs
Every Fabric project I've worked on, every single one, has had questions around SKU sizing, capacity, capacity metrics, monitoring, utilisation. The list goes on. I've been working with Power BI since its early days and I've been in the BI world for many years before that. In all that time, I don't think I've come across a single product or project where it's easy to say "this is the right size." There's never really a right or wrong answer. I sometimes wonder if that's where the famous "it depends" comes from.
To be able to answer those questions properly for my clients, I decided to teach myself how Fabric capacity actually works. Not just the docs, but the mechanics: how CUs are evaluated, how smoothing works, what triggers throttling. I'm building a course on it at the moment. Still figuring out whether to make it free or paid. I was leaning towards free, but we'll see.
While I was putting the course together, I had one of those moments: what if I had a single notebook I could take to any client and just run it to get the sizing of their Fabric capacity? That's what led to the Fabric SKU Advisor. I've run it on a few capacities now and it works pretty well. I'm not sure it works perfectly everywhere though, and I'd genuinely love to see how it handles yours.
The short version
If you don't have time for the full breakdown, here's the gist:
The tool comes in two forms: a Fabric notebook that connects to your Capacity Metrics semantic model via Semantic Link, and a Python CLI script for workshops or quick what-if scenarios using CSV data.
The core algorithm uses what I call the 80/80 approach: take the 80th percentile of your daily CU consumption (weekday only, spike-filtered), and find the SKU where that sits at 80% utilisation. The remaining 20% of busy days are handled by Fabric's smoothing. The 20% headroom absorbs intra-day spikes.
On top of that, it does trend analysis (are you growing or shrinking?), reserved vs PAYG pricing comparison, workspace-level breakdowns, health scoring, and generates an HTML report you can share. In multi-capacity mode it analyses your whole estate and gives you a comparison table.
Everything below goes into the detail of how and why.
Before we get into the algorithm: a few capacity fundamentals
To follow the sizing logic, you need to understand how Fabric evaluates consumption. I'll keep this brief, but these concepts come up throughout.
Fabric doesn't charge per query or per refresh. You buy a fixed SKU (F2, F4, F8... up to F2048). With pay-as-you-go, you can pause the capacity and stop paying, but while it's running, you're paying for fixed compute whether you use 10% or 100% of it. This is the mental model shift that trips people up: the goal isn't to minimise usage. It's to get the most value from the capacity you've already paid for.
Everything in Fabric, from semantic model refreshes to notebook runs to someone opening a dashboard, consumes Capacity Units (CUs). Each SKU gives you a fixed number of CUs per second: F2 gets 2, F64 gets 64, and so on.
Here's where it gets less obvious. Fabric doesn't evaluate your consumption continuously. It batches everything into 30-second evaluation windows. Your budget for each window is: SKU number x 30. So an F64 gives you 1,920 CUs per window.
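If you want to sanity-check that rule, it's one line of arithmetic:

```python
# CU budget per 30-second evaluation window = SKU number × 30
for sku_number in (2, 8, 64, 256):
    print(f"F{sku_number}: {sku_number * 30:,} CUs per window")
# F2: 60 | F8: 240 | F64: 1,920 | F256: 7,680
```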
When you exceed that budget, Fabric smooths the overage forward into future windows, creating what I think of as "debt." Background operations can smooth across a full 24 hours (that's 2,880 windows). Interactive work smooths over shorter periods. It's quite forgiving, up to a point.
But sustained overload accumulates. The debt builds. And then you start hitting throttling: first delays on interactive queries, then rejections, and eventually background jobs get shut down entirely. I like the electricity grid analogy here. A brief spike and the grid absorbs it. Keep all the heaters on all day and the fuse blows.

This is why sizing matters. Too small, and you accumulate debt faster than it clears. Too large, and you're paying for headroom you'll never touch.
I learned most of this from the official docs, Matthew Farrow's excellent Fabric Billing Series, and a lot of trial and error running the Capacity Metrics app on real capacities. I'll link everything at the end.
The algorithm
I spent a while thinking about how to make the recommendation sensible. Not just "pick the SKU where you fit," but something that accounts for the messy reality of workloads that spike on Mondays, dip on weekends, and occasionally go wild on month-end.
Converting daily CUs to window-level load
There are 2,880 thirty-second windows in a day. To understand what your consumption looks like at the level Fabric actually evaluates, you divide:
```python
cus_per_window = daily_cus / 2880
```

This gives you the average load per window if consumption were spread evenly. It's a simplification, but it mirrors what Fabric's smoothing mechanism does for background operations.
Checking utilisation against every SKU
For each of the 11 F-SKUs, the tool calculates what percentage of that SKU's budget your workload would consume:
```python
def calculate_utilisation(daily_cus: float, sku: dict) -> float:
    cus_per_window = daily_cus / 2880
    return cus_per_window / sku["budget_30s"]
```

So if your average day is 800,000 CUs, that's about 278 CUs per window, which is 14.5% of an F64's budget. Comfortable. But if your peak day hits 5,000,000 CUs, that's 1,736 per window, or 90.4%. That's getting tight.
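Here are those worked examples as a throwaway check (the F64 dict is just the × 30 budget rule from earlier):

```python
SKU_F64 = {"budget_30s": 64 * 30}  # 1,920 CUs per 30-second window

print(calculate_utilisation(800_000, SKU_F64))    # ≈ 0.145 → 14.5%, comfortable
print(calculate_utilisation(5_000_000, SKU_F64))  # ≈ 0.904 → 90.4%, getting tight
```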
Why I chose the 80th percentile
This was probably the most important design decision.
I could have sized for the average day, but then you'd be under-provisioned whenever things get busy. I could have sized for the peak day, but then you'd be overpaying most of the time. The 80th percentile felt right: the CU level that 80% of your days fall below.
```python
p80_daily_cus = sorted(daily_totals)[int(len(daily_totals) * 0.8)]
```

For the remaining 20% (your busiest days), Fabric's smoothing mechanism absorbs the overage. That's exactly what smoothing is designed to do. You just don't want it absorbing overage every single day, because that's how carryforward debt accumulates.
Weekday vs weekend: not all days are equal
This is something I added after running the tool on a few real capacities. The P80 across all days was being pulled down by weekends, where most capacities are barely ticking over. That meant the recommendation was slightly undersized for the weekdays that actually matter.
Now the tool splits consumption into weekday and weekend buckets. The P80 that drives the recommendation comes from weekdays only:
```python
weekday_data = daily_summary[daily_summary['IsWeekday']]
p80_weekday = weekday_data['ActualCUs_sum'].quantile(0.80)
```

The weekend numbers still show up in the report for context, but they don't influence the sizing. This one change made the recommendations noticeably better for every capacity I tested.
Filtering out pause/resume spikes
Another lesson from real data. When you pause and resume a Fabric capacity, there's often a settlement spike: a catch-up burst where accumulated background work gets processed. These spikes can be massive, sometimes 2-3x the median day, and they were throwing off the P80.
The tool detects suspected spikes by flagging any day where peak CUs exceed twice the median:
```python
def detect_pause_spikes(daily_summary, threshold_factor=2.0):
    median_peak = daily_summary['ActualCUs_max'].median()
    return daily_summary['ActualCUs_max'] > (median_peak * threshold_factor)
```

Flagged days get excluded from the P80 calculation but still show up in the charts as red triangles, so you can see them and decide for yourself whether they're genuine outliers or recurring patterns.
Targeting 80% utilisation
I also don't recommend an SKU where the 80th percentile load fills 100% of the budget. The target is 80%, leaving 20% headroom for intra-day spikes:
```python
required_budget = cus_per_window / 0.80
```

I call this the "80/80 approach": 80th percentile consumption (weekday, spike-filtered) at 80% target utilisation. Below 50% utilisation, you're likely oversized. Above 85%, it's getting tight. Above 95%, you're rolling the dice.

Classifying every SKU
Each SKU gets a plain-language status based on what utilisation would look like:
| Status | Utilisation | What it means |
| --- | --- | --- |
| THROTTLING RISK | Peak > 100% | You will hit throttling on busy days |
| TOO SMALL | Avg > 95% | Consistently maxed out |
| TIGHT | Avg 85-95% | Limited headroom for growth |
| GOOD FIT | Avg 60-85% | The target zone |
| COMFORTABLE | Avg 40-60% | Plenty of room |
| OVERSIZED | Avg < 40% | Paying for unused capacity |
The first SKU that lands in "GOOD FIT" or "COMFORTABLE" becomes the recommendation.
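Put together, the classification and first-fit selection look roughly like this. It's a simplified sketch: the thresholds come from the table above, but the structure is the shape of the logic, not the tool's exact code:

```python
F_SKUS = [{"name": f"F{n}", "budget_30s": n * 30}
          for n in (2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048)]

def classify(avg_util: float, peak_util: float) -> str:
    if peak_util > 1.00: return "THROTTLING RISK"
    if avg_util > 0.95:  return "TOO SMALL"
    if avg_util > 0.85:  return "TIGHT"
    if avg_util >= 0.60: return "GOOD FIT"
    if avg_util >= 0.40: return "COMFORTABLE"
    return "OVERSIZED"

def recommend(p80_daily_cus: float, peak_daily_cus: float) -> dict:
    # Walk the SKUs smallest-first; the first one in the target zone wins
    for sku in F_SKUS:
        avg_util = (p80_daily_cus / 2880) / sku["budget_30s"]
        peak_util = (peak_daily_cus / 2880) / sku["budget_30s"]
        if classify(avg_util, peak_util) in ("GOOD FIT", "COMFORTABLE"):
            return sku
    return F_SKUS[-1]  # nothing fits comfortably: largest SKU, with caveats
```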

Trend analysis
Sizing for today is one thing, but I wanted to flag where things are heading. The tool runs a linear regression across daily CU consumption and reports a weekly growth rate:
```python
def calculate_trend(daily_summary, column='ActualCUs_sum'):
    y = list(daily_summary[column]); n = len(y); x = range(n)
    sum_x, sum_y, sum_xy = sum(x), sum(y), sum(i * v for i, v in zip(x, y))
    denom = n * sum(i * i for i in x) - sum_x ** 2
    slope = (n * sum_xy - sum_x * sum_y) / denom          # CUs per day
    weekly_growth_pct = (slope / (sum_y / n)) * 100 * 7   # as % of the average day
    return weekly_growth_pct
```

It classifies the trend as GROWING (>2%/week), DECLINING (<-2%/week), or STABLE. If consumption is growing steadily, the report calls it out. It's a simple projection, not a forecast, but it's enough to start a conversation about whether to right-size now or plan for a scale-up.
Reserved vs pay-as-you-go pricing
Every SKU recommendation includes both PAYG and reserved instance pricing. Reserved commitments (one-year) save roughly 41% on list price, and the tool shows the break-even point: if your average utilisation is above about 60%, reserved is almost certainly cheaper.
I added this because the sizing conversation always leads to the pricing conversation, and I was tired of alt-tabbing to the Azure calculator. Now it's right there in the report.
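The maths behind that rule of thumb is simple enough to sketch. One simplification to flag: this toy version frames break-even purely in terms of how many hours the capacity actually runs, since PAYG bills only while running and a reservation bills around the clock (the £5,200 figure is the F64 price from the sample output later in this post):

```python
RESERVED_DISCOUNT = 0.41  # approximate one-year reservation saving vs PAYG list price

def cheaper_option(payg_monthly: float, uptime_fraction: float) -> str:
    payg_cost = payg_monthly * uptime_fraction               # pay only while running
    reserved_cost = payg_monthly * (1 - RESERVED_DISCOUNT)   # billed 24/7, discounted
    return "reserved" if reserved_cost < payg_cost else "pay-as-you-go"

# Break-even lands at uptime ≈ 1 - 0.41 = 59%, which is where the ~60% rule comes from
print(cheaper_option(5_200, 0.50))  # pay-as-you-go
print(cheaper_option(5_200, 0.75))  # reserved
```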
The CLI script
The CLI script is the simpler version. It takes a CSV of daily CU data, runs the same algorithm, and prints the results in your terminal. I built it for training workshops where participants don't always have access to a live capacity.
Sample data and output
It ships with fifteen days of consumption across five Fabric items:
```
Date,Item Name,Item Type,CUs
2024-01-15,Sales Semantic Model,Semantic Model,320000
2024-01-15,Finance ETL Notebook,Notebook,280000
2024-01-15,Customer 360 Dataflow,Dataflow Gen2,120000
2024-01-15,Executive Dashboard,Report Views,45000
2024-01-15,Inventory Pipeline,Data Pipeline,25000
```

Run `python fabric_sku_advisor.py` and you get a consumption summary, then a full SKU comparison:
```
SKU    Avg%  Peak%  Status        Monthly
─────────────────────────────────────────────────────
F2     ████████████████████   THROTTLING    £130
F4     ████████████████████   THROTTLING    £260
...
>> F32 ██████████░░░░░░░░░░   GOOD FIT      £2,600 <<
F64    █████░░░░░░░░░░░░░░░   COMFORTABLE   £5,200
```

The item breakdown is where the real conversations happen. When one semantic model is eating 35% of your capacity, the question stops being "do we need a bigger SKU?" and becomes "can we optimise that refresh?" Much cheaper answer, usually.
The Fabric notebook
For actual client work I use the notebook version. It connects directly to the Capacity Metrics semantic model, so there's no exporting CSVs or copying data around.
Configuration
The notebook runs in two modes: single capacity (deep dive) or multi capacity (analyse everything). You set this at the top:
```python
WORKSPACE_ID = "your-workspace-guid"
DATASET_ID = "your-dataset-guid"
ANALYSIS_MODE = "single"     # "single" or "multi"
CAPACITY_ID = "your-cap-id"  # required for single mode
WEEKDAY_WEEKEND_SPLIT = True
TREND_ANALYSIS = True
RESERVED_VS_PAYG = True
SPIKE_FILTERING = True
```

In single mode you get the full treatment: charts, text summary, HTML report. In multi mode it auto-discovers all active capacities and loops through them, one HTML report per capacity plus a styled comparison table at the end.
It uses sempy.fabric (Semantic Link) to execute DAX queries. No REST APIs, no service principals, no token management.
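The call pattern is minimal. A sketch (the DAX here is a placeholder, not the notebook's actual query):

```python
import sempy.fabric as fabric

# Run a DAX query against the Capacity Metrics semantic model from inside Fabric
df = fabric.evaluate_dax(
    dataset=DATASET_ID,        # from the configuration cell above
    workspace=WORKSPACE_ID,
    dax_string='EVALUATE ROW("Test", 1)',  # placeholder query
)
print(df.head())
```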
Dealing with version changes
Microsoft keeps updating the Capacity Metrics semantic model, and measure names change between versions. What was xBackground % in v37 became Background billable CU % in v40, and v47 introduced MPARAMETER for capacity filtering.
I borrowed the version detection approach from FUAM (Fabric Unified Admin Monitoring), which is an open-source solution accelerator from Microsoft's Fabric CAT team. The idea is a cascading detector: try the newest schema first, catch the error if it fails, and work your way back.
```python
# Try v47+ first (latest schema)
test_query_v47 = """
DEFINE MPARAMETER 'DefaultCapacityID' = "00000000-..."
EVALUATE SUMMARIZECOLUMNS("Test", [Background billable CU %])
"""
# Falls back to v40-v44, then v37
```

Not elegant, but reliable. You don't need to know or care which version you're running.
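The cascade itself is just a try/except chain, newest schema first. Simplified, with the `TEST_QUERY_*` constants standing in for the version-specific probe queries (one of which you saw above):

```python
def detect_metrics_version(run_dax) -> str:
    probes = [
        ("v47+", TEST_QUERY_V47),     # MPARAMETER capacity filtering
        ("v40-v44", TEST_QUERY_V40),  # "Background billable CU %" measures
        ("v37", TEST_QUERY_V37),      # legacy "xBackground %" measures
    ]
    for version, query in probes:
        try:
            run_dax(query)  # if the schema doesn't match, the query throws
            return version
        except Exception:
            continue
    raise RuntimeError("No known Capacity Metrics schema matched")
```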
Workspace and item-level breakdown
The notebook pulls item-level data including workspace name, item name, billing type, and overloaded minutes. This powers the workspace breakdown chart (CU consumption by workspace) and the top workloads chart (individual items consuming the most CUs).
If the billing type column is available, the breakdown splits billable vs non-billable CUs per workspace. Useful when you're trying to reconcile what's actually costing money versus system overhead.
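With the item-level rows in a DataFrame, both charts are a groupby away. The column names here are stand-ins; the real ones depend on the Capacity Metrics version:

```python
# CU consumption by workspace, split billable vs non-billable (assumed column names)
workspace_breakdown = (
    items_df.groupby(["WorkspaceName", "BillingType"])["TotalCUs"]
            .sum()
            .unstack(fill_value=0)  # one column per billing type
)

# Top 10 individual items by CU consumption
top_workloads = items_df.groupby("ItemName")["TotalCUs"].sum().nlargest(10)
```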
Health score
Beyond the SKU recommendation, the notebook calculates a health score out of 100. I wanted a single number that captured "how healthy is this capacity?" because not everyone wants to interpret six charts.
It combines three factors:
- Utilisation (40%): Sweet spot is 50-70%. Penalties kick in above 85%.
- Throttling (40%): Any throttling drops the score. This gets heavy weighting because throttling is the thing users actually feel.
- Carryforward (20%): Persistent carryforward means you're fundamentally undersized. Lower weight because some carryforward is normal.
The ratings: EXCELLENT (90+), GOOD (75+), FAIR (50+), POOR (25+), CRITICAL (below 25). I'm still tweaking the thresholds.
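Here's a reconstruction of how those weights might combine. The exact penalty curves in the notebook differ, so treat this as the shape rather than the formula:

```python
def health_score(avg_util_pct: float, throttled_pct: float, carryforward_pct: float) -> float:
    """Roughly: utilisation 40 pts, throttling 40 pts, carryforward 20 pts."""
    # Utilisation: full marks in the 50-70% sweet spot
    if 50 <= avg_util_pct <= 70:
        util = 40.0
    elif avg_util_pct > 85:
        util = max(0.0, 40 - (avg_util_pct - 70) * 1.5)      # steep once you're tight
    else:
        util = max(0.0, 40 - abs(avg_util_pct - 60) * 0.8)   # gentle outside the spot
    # Throttling: heavily penalised, because users feel it directly
    throttling = max(0.0, 40 - throttled_pct * 8)
    # Carryforward: a little is normal, persistent debt is not
    carry = max(0.0, 20 - max(0.0, carryforward_pct - 5) * 2)
    return round(util + throttling + carry, 1)

print(health_score(62, 0.0, 2.0))   # 100.0 → EXCELLENT
print(health_score(92, 1.5, 12.0))  # ≈ 41 → POOR
```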
The charts
The notebook generates interactive Plotly visualisations. The ones I find most valuable:
- The daily utilisation chart shows consumption day by day, weekday and weekend bars in different colours, suspected spikes marked with red triangles, and a trend line overlay.
- The weekday vs weekend comparison makes the case for weekday-only sizing instantly visual.
- The throttling timeline tracks interactive delay, interactive rejection, and background rejection over time. If those lines are trending upward, that's your early warning.
- The SKU comparison chart shows all 11 SKUs as horizontal bars, colour-coded by status, with an arrow pointing to the recommendation. This is the one I share in meetings.
- The workspace breakdown and top workloads charts show where the CUs are actually going.
The HTML report
Everything gets packaged into a self-contained HTML file you can download, open in any browser, or send to someone. I designed it mostly because I love a good HTML report with buttons and tooltips, but it's also useful for sharing with people who don't have Fabric access.
It has KPI cards, interactive charts, a reserved vs PAYG cost table, help modals on every section (I got tired of explaining "carryforward percentage" in meetings), and disclaimers throughout because this is advisory tooling, not a guarantee.
The subtitle on the report is "Your capacity. Your costs. Your next move." That's the conversation I want it to start.
Multi-capacity mode
When you run in multi mode, each capacity gets its own HTML report, plus you get a styled comparison table at the end. It shows every capacity side by side: current SKU, recommended action, health score with colour-coded badges, average utilisation, trend direction, and both PAYG and reserved costs.
This is the view I pull up when someone manages multiple capacities. Instead of going through each one individually, they can see the estate at a glance.
Saving to Lakehouse
If you want to keep a history of analyses, the notebook can save to a Lakehouse:
```python
SAVE_TO_LAKEHOUSE = True
LAKEHOUSE_NAME = "CapacityAdvisorLakehouse"
```

Each run appends to a run_history.csv, so you build up a longitudinal view of how your capacities are evolving. It tries the attached Lakehouse first, then attempts to find or create one by name, and if all else fails, saves locally. I spent too long on that fallback logic, but it means the notebook never throws an error at the finish line, which matters when you're running it in front of a client.
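The fallback chain is worth sketching, because it's what keeps the final cell from failing. Simplified here: the find-or-create-by-name step is omitted, and /lakehouse/default/Files is the mount point for the notebook's attached Lakehouse:

```python
import os
import pandas as pd

def save_run_history(run_df: pd.DataFrame) -> str:
    # Attached Lakehouse first, local notebook storage as the last resort
    for path in ["/lakehouse/default/Files/run_history.csv", "run_history.csv"]:
        try:
            run_df.to_csv(path, mode="a", header=not os.path.exists(path), index=False)
            return path
        except OSError:
            continue  # no Lakehouse attached (or not writable): fall through
    raise RuntimeError("Nowhere to save run history")
```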
The F64 question
Fabric has a hard licensing boundary at F64. Below it, everyone viewing content needs a Power BI Pro licence. At F64 and above, free Microsoft 365 users can view without Pro.
When you set NEEDS_FREE_VIEWERS = True, the tool won't recommend anything below F64, even if the utilisation numbers say an F32 would be fine. I don't automate that decision because it depends on things the tool can't see: your licensing strategy, how many viewers you have, whether that number is growing.
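In code, that constraint is just a floor on the candidate list before the first-fit loop runs (reusing the F_SKUS sketch from earlier):

```python
NEEDS_FREE_VIEWERS = True  # set by you, not inferred

candidates = [s for s in F_SKUS
              if not NEEDS_FREE_VIEWERS or s["budget_30s"] >= 64 * 30]
# With the flag on, F2-F32 drop out and F64 is the smallest SKU considered
```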
A note on Spark autoscale
If you're running heavy Spark workloads, autoscale billing for Spark is now generally available. Spark jobs can burst beyond your capacity's CU allocation and get billed per-CU for the overage. The tool flags this with a disclaimer because your actual costs could be higher than the SKU's list price. Worth knowing.
If you want to try the advisor against your own capacity, or if you just want to tell me it broke, I'd love to hear from you. A lot of what I've learned building this is going into a capacity course I'm putting together, so more on that soon.
Until next time, Prathy 👩🏾💻
References and further reading
A lot of what I know about Fabric capacity came from these resources. If you want to go deeper, this is where I'd start.
Microsoft docs:
- Fabric throttling and smoothing: the deep dive on carryforward and the three throttling stages
- Capacity Metrics app: the semantic model the notebook connects to
- Fabric licences overview: the F64 boundary, Pro vs Free viewer access
- Fabric pricing: current SKU pricing and reserved discounts
- Capacity reservations: how reserved instances work

Community tools:
- FUAM (Fabric Unified Admin Monitoring): open-source from Microsoft's Fabric CAT team; the version detection logic in the notebook is adapted from here
- FCA (Fabric Cost Analysis): another Fabric CAT accelerator focused on FinOps
- Fabric Chargeback app: Microsoft's tool for allocating capacity costs across teams

Blog posts that helped me:
- Matthew Farrow's Fabric Billing Series: practical write-ups on pause/restart implications and scaling gotchas
- Autoscale Billing for Spark in Fabric: worth understanding if you're running heavy Spark workloads
- Automating Fabric Capacity Scaling with Azure Logic Apps (BI Insight): good walkthrough for schedule-based scaling


