The Jungle of Metrics Layers and its Invisible Elephant

Discussion about the metrics layer (a.k.a. metrics store, a.k.a. headless BI, a.k.a. semantic layer) has been one of the hottest topics in the data analytics space over the past few years. Fundamentally, a metrics layer is a (missing) analytics stack component that should sit between a data warehouse/lake/lakehouse and all data consumers. That’s the layer where you would define standard metrics once, ensuring consistency of definitions whether the metrics are accessed using BI tools, queried from Jupyter notebooks or retrieved in other ways.

Benn Stancil’s post on the metrics layer is a must-read (image from his post)

Historically, such an explicit layer did not exist, typically leading to metrics definitions hidden in the consumer-facing layers of the tech stack: BI tool formula languages (e.g. Tableau’s LoD expressions) or cells of manually maintained Excel spreadsheets. However, as analytics practices mature, aspects such as governance, consistency and reusability of metrics are increasingly important, and several players are popping up trying to solve that problem in different ways. To name just a few:

  • DBT deserves a special mention as a player that completely transformed the “T” of the ETL process and is now trying to fit metrics definitions into its framework. However, unlike any other player, DBT’s current approach is to be the layer where metrics are only defined, not queried; converting metric definitions into queries and computing the metrics is delegated to the tools that sit on top of it. You can see how that plays out in an announcement from Cube, which just released DBT metrics support.
  • “Thin” metrics layer players, which let you define the objects/relations and the associated metrics in the semantic layer (typically via YAML or other configuration-like files). Once queried, they compile and run SQL queries against the underlying data storage layer (see the toy sketch after this list). Examples include Malloy/LookML, Metriql, Metlo, MetricFlow, Cube, and AtScale.
  • “Fat” metrics layer players that not only let you define the metrics but also have an engine/database to compute them (e.g. Spark-based Airbnb’s Minerva). As the underlying architectures become more involved, they are more likely to introduce their own, not-so-SQL-like query languages (e.g. Rel in RelationalAI), extend into the visualization layer (e.g. GoodData) or focus on tackling governance features (e.g. Kyligence).
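
To make the “thin” approach concrete, here is a toy Python sketch (emphatically not any vendor’s actual implementation; all names and columns are made up) of the core mechanism: a declarative metric definition plus a query request gets compiled into SQL that runs on the warehouse where the data lives.

# Toy sketch of a "thin" metrics layer: compile a metric definition into SQL
metric_definitions = {
    "revenue": {"table": "orders", "expression": "SUM(amount)"},
    "active_users": {"table": "users", "expression": "COUNT(*)", "filter": "active = 1"},
}

def compile_metric_query(metric: str, group_by: str) -> str:
    definition = metric_definitions[metric]
    where = f"WHERE {definition['filter']} " if "filter" in definition else ""
    return (
        f"SELECT {group_by}, {definition['expression']} AS {metric} "
        f"FROM {definition['table']} {where}GROUP BY {group_by}"
    )

print(compile_metric_query("revenue", group_by="country"))
# SELECT country, SUM(amount) AS revenue FROM orders GROUP BY country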

I have wanted to understand the players and their approaches in this highly dynamic space for a while now. But before diving into the jungle, I want to talk about elephants.

The Invisible Elephant

Here’s the thing: in none of the discussions about what a metrics layer should be, nor in the coverage of key players, have I ever seen a reference to a product whose functionality fulfils most of the requirements typically listed for a metrics layer! A product that also happens to be a market leader in the BI space. I am referring to Microsoft Power BI. Before you dismiss this as a joke, did you know that Power BI:

  • Has a REST API for making queries.
  • Exposes an endpoint for making queries and managing metadata (read *and* write support!) based on a format you have likely never heard of, but one that happens to be ~20 years old, is recognized among established industry players and can be consumed via python and Tableau.
  • Allows you to define complex metrics with a full-blown functional query language (DAX).
  • Enables users to supplement governed datasets with their own data (e.g. a budget-to-actual analysis that relies on that one budget file that is still maintained in Excel on someone’s desktop while the actuals come from a data warehouse).

If the above sounds familiar, this post will only be helpful as an overview of other metrics layer players. But so far, I have only seen Microsoft/Power BI mentioned in passing in such discussions. At most, someone brings up MDX as a 90s attempt to solve the metrics layer problem. Most of the time, the sentiment is similar to the one expressed in one of the first posts discussing metrics layers:

Tableau, PowerBI, and Datastudio are practically universal. However, none of these tools allow you to use the metrics you define outside of their UIs, via integrations or APIs.

https://basecase.vc/blog/headless-bi (Jan 2021)

Base Case Capital wrote the above post a while ago, and Microsoft released some of the relevant Power BI features only later. But the fact remains that the statement is simply no longer valid.

Now, I am not here to say that the metrics layer problem is solved and everyone should go home. I simply find it confusing that discussions never mention such a player. It likely stems from the vast gap between modern data stack players and the more old-fashioned enterprise analytics world where Power BI is still a newcomer. Also, given that Tableau’s strength is the aesthetics of visualizations and not data modelling, I am not surprised that people tend to lump all BI players together.

But given the size and importance of Microsoft as a market player, I think it’s essential to be aware of its flagship product capabilities, even if one is building a metrics layer for smaller companies that do not focus on enterprise-only requirements. So, at a minimum, it is interesting to look at Power BI from a metrics layer lens and consider if it could become a viable metrics layer itself (because today, it probably still isn’t).

Among traditional players, Power BI seems to be gaining market share, too

What would a perfect metrics layer be?

Before diving into Power BI, it’s worth setting some goalposts to compare against and understand how different players approach the challenges. After skimming through the docs, watching videos and trying to decipher the marketing messages of 10 players in this space, I identified several “feature space” dimensions worth highlighting.

I completed my research in June/July 2022, and things move quickly. For example, by the time I published this post in August 2022, Metlo had pivoted to be an “API security platform” ¯\_(ツ)_/¯. I did not update the post to reflect this.

Own database or just a pass-through?

Arguably, the most crucial difference that influences a lot of other features is whether the metrics layer stores data.

Thin vs fat metrics layers in a nutshell

Cube, Metriql, Metlo, MetricFlow, AtScale and Malloy do not store data. Instead, they act as proxy layers, converting the incoming requests to queries that they then pass on to the databases where the data is stored. Typical arguments for it include:

  • The data warehouse remains the single source of truth
  • Moving large amounts of data to another DB engine is expensive
  • If you’re running a cloud DW (e.g. Snowflake), why would you pay for another product if you can direct all queries to Snowflake and manage everything there?

However, even these players admit that in some cases querying raw data in data warehouses is suboptimal. Thus, some support pre-aggregation: frequently used metrics and dimensions are precomputed and stored in the metrics layer’s own database (e.g. Cube and AtScale; the latter in particular seems to have a pretty advanced feature set to do it efficiently) or materialized in the DW (MetricFlow and Metriql). Such caching layers, of course, introduce potential sync issues and can result in not-so-much-a-single-source-of-truth.

Other players build a metrics layer around a database engine. For example, RelationalAI has a graph database engine (and, frankly, RelationalAI is not a pure metrics layer product – it just can be used as one to some degree). Kyligence is built on top of Apache Kylin (OLAP engine). GoodData uses its own “Extensible Analytics Engine (XAE)”. A database engine allows these players to have more extensive feature sets related to metric definitions and query support.

How do these players respond to the advantages of the “thin-layer” approach?

  • They argue that the “metrics-with-DB-engine” approach results in a lower total cost of ownership (as analytical pay-per-use queries can add up)
  • They argue that you can’t run away from analytical modelling, and you’ll create views in your main DW to support the metrics layer anyway.
  • They tend to position themselves in the enterprise space where features such as row-/column-level access controls are essential (and much easier to implement when you have a database as part of your product).

Semantic Models and Metric Types

Metrics are more complicated than simple sums or distinct counts. The most frequent requirements are to support metrics that are non-additive (e.g. % margin or churn rate), reusable (define revenue once, use it as a reference in other metrics), or require a dimension reference (e.g. year-on-year growth or % of total).

Once defined, metrics are typically evaluated with respect to a dimension (“revenue by country”) and not on a standalone basis. As a result, a metrics layer also needs to know how to relate data across dimensions, i.e. store information about its semantics (what I will refer to as the “semantic model”).

Across the players, approaches vary quite a bit (partly influenced by choice to have the database engine or not):

  • All players except DBT and Metlo allow you to define a semantic model. Most of the time, it’s the usual one-to-one, many-to-one and many-to-many relationship types. Notable exceptions include RelationalAI, where everything is a relation, and Metriql, which supports specific join-like features for use cases such as retention calculation.
  • All products support composable metrics, and most allow defining filtering conditions in their calculations (e.g. active_users: count(users) where active = 1)
  • Metrics typically can be at least SQL aggregates (avg, max, min, sum, distinct_count). A few players (Metlo, MetricFlow) additionally support “arbitrary SQL expressions”, e.g. sum(CASE WHEN user_is_active THEN 1 ELSE 0 END) or sum(price * quantity).

#Example semantic model and query definition - Malloy

source: users is table('malloy-data.ecomm.users') {
  primary_key: id
}

source: order_items is table('malloy-data.ecomm.order_items') {
  primary_key: id
  join_one: users with user_id
  measure:
    total_sales is sale_price.sum()
    order_count is count(distinct order_id)

  query: sales_by_state_2020 is {
    where: created_at ? @2020
    group_by: users.state
    aggregate: total_sales
  }

  query: orders_by_status is {
    group_by: status
    aggregate: order_count
  }

  query: sales_by_month_2020 is {
    group_by: order_month is created_at.month
    aggregate: total_sales
  }
}

query: order_items -> sales_by_state_2020

Going further, you find features such as rolling window support (e.g. Cube) and cumulative metrics (e.g. MetricFlow), but so far, we are still in the land of relatively basic metrics. The players with DB engines are the ones that shine here:

  • Kyligence is MDX-based. And while MDX is not eye candy, you can do quite a bit with it. For example, you can define a year-on-year sales growth metric that will evaluate correctly against a set of years.
  • GoodData’s XAE engine seems to be built on similar concepts, as one can define metrics with respect to “contextual dimensions”, which allows the creation of metrics such as “sales as a % of region total”.
  • RelationalAI allows you to define completely arbitrary code-based metrics. Hard to beat that!
  • AtScale (a “thin” layer player) is an interesting exception to the trend. While I could not find any documentation, the videos on AtScale’s YouTube channel indicate that the metric/measure types they support are pretty complex.

How do I query all of that?

Once the semantic layers and metrics are defined, how do we query them? Most players provide at least a REST API and SQL-like/JDBC endpoints. Cube, Metlo, AtScale, RelationalAI, and GoodData have JS and/or Python/Julia SDKs, and some players support CLI or GraphQL access.

Given the importance of metrics layer integration with downstream tools (e.g. BI applications), I find it most interesting to look into how SQL endpoints work. Custom applications can leverage SDKs or REST API, but the whole point is to integrate with existing tools.

  • To start with, Malloy and RelationalAI have their own query languages and, as a result, do not integrate with downstream tools.
  • Cube supports Postgres-compliant SQL that “has been tested with Apache Superset, Tableau Cloud, Tableau Desktop with JDBC driver and PowerBI”, with additional connectors in the works. The BI tools see all measures and dimensions as table columns (see the sketch after this list). Metlo uses a similar approach (Presto JDBC driver, no information on BI tool compatibility).
  • Metriql provides a Trino-based JDBC driver with similar functionality as Cube. On top, it has implementations of several BI-tool connectors that ensure that BI tools recognize metrics properly (not as columns).
  • MetricFlow/Transform lists many BI tool integrations, but not all of them are created equal. For example, while Mode and Hex seem to support JDBC-based querying, Tableau support is based on Tableau’s Web Data Connector feature, which is a way to do a one-off extract of data from a remote API. They also have custom-made Excel and Google Sheets plugins that are unique among all the players I researched. Unfortunately, unlike Google Sheets, the Excel plugin is not formula-based, so I’d argue it’s more about making extracts than querying the data.
  • Kyligence supports ODBC-based SQL connections and exposes its data sources for XMLA and MDX-based querying. They have a Tableau connector, too.
  • AtScale does not advertise a SQL endpoint prominently (it may exist, though) and instead features tool-specific support (Live connections for Tableau, DAX for Power BI, MDX for Excel, and LookML for Looker)
  • GoodData is the only player with a SQL-like language but doesn’t prominently feature downstream integrations – which makes sense given they are the only player with a visual layer, too. It appears that JDBC-based querying can be done using PostgreSQL foreign data tables, but I am not clear if that would work with most BI tools.
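
As an illustration of what the SQL-endpoint route looks like from a consumer’s perspective, here is a minimal Python sketch that connects to a Postgres-wire-compatible endpoint such as Cube’s SQL API with psycopg2. The host, port, credentials and the orders/total_sales/status names are all hypothetical, and the exact SQL semantics (e.g. whether measures appear as plain columns or need to be wrapped in an aggregate) vary by product.

# Minimal sketch: querying a metrics layer's Postgres-compatible SQL endpoint.
# Connection details and the orders/total_sales/status names are hypothetical.
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=15432,         # assumed port for the SQL endpoint
    user="metrics_user", password="***",  # hypothetical credentials
    dbname="metrics",
)

with conn, conn.cursor() as cur:
    # Measures and dimensions are exposed as if they were table columns.
    cur.execute("SELECT status, total_sales FROM orders GROUP BY status")
    for status, total_sales in cur.fetchall():
        print(status, total_sales)

conn.close()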

Why such a disparity? I suspect some of it comes down to the fact that some products use SQL-like dialects that aren’t valid SQL. Below are a few examples from MetricFlow and GoodData. I can see why this would not work with Tableau out of the box!

#MetricFlow MQL language
SELECT AVG(booking_value) AS avg_booking_value
FROM MQL_QUERY(
<booking_value BY ds WHERE ds >= '2020-01-01'>
)
GROUP BY 1;

#GoodData MAQL language
SELECT {metric/amount_sold} WHERE {label/Color}="red"

The other aspect is that BI tools are complex and make assumptions about the data. Most BI tools use SQL as the primary query language, but ensuring that the user experience in the BI tool is right requires additional connectors and custom work. On top of that, you may have cases where a non-SQL query language (e.g. DAX for Power BI or LookML for Looker) works even better. Just providing a JDBC/ODBC endpoint may not cut it. That’s why pretty much every metrics layer solution lists its BI tool support one by one, each with a different approach. It’s not exactly “plug-and-play”.

Other features

Most of the products are built on code-based configuration files (i.e. semantic model / metric definitions are typically YAML files) and thus integrate easily into broader software development workflows. Some players (AtScale and Metlo) provide UIs for modelling, but it seems only GoodData opted for a UI-only model.

Finally, some of the products address broader data governance and discoverability needs. For example, Cube, AtScale, and Kyligence support row- and column-based access controls. In addition, a few of the tools have concepts relevant to self-service analytics (e.g. user workspaces). Some also include metric catalogues that help users find the metrics of interest.

Metric stores vs feature stores

One aspect that I did not cover at all in this post is the difference between metrics and feature layers/stores. Some players in the metrics layer space specifically focus on being feature stores, too.

Feature stores, in my view, are quite different because they serve ML models, not BI tools or analysts. Yes, they share some requirements (consistent definitions, a single source of truth). Still, feature stores have stricter latency requirements, and features are typically much more granular and operational (you likely won’t have user-specific metrics, but you will need user-level features for ML work). There may also be different needs related to “freezing” features at a given point in time. Therefore, I think it is best to leave out feature stores from this discussion.

Understanding Power BI

Now that we have established what metrics layers look like in the wild, let’s look at Power BI. Power BI is a very complex product with many features, and I do not intend to write a comprehensive review. Instead, I’ll focus on what matters.

First, it’s essential to understand the inner “architecture” of Power BI. On a high level, there are two components: Power BI Desktop (the authoring tool) and Power BI Service (the cloud offering). Power BI Service enables collaboration, provides APIs for querying Power BI datasets and has some more bells and whistles, but it won’t be the focus of this overview.

Instead, let’s dive into Power BI Desktop, which is an amalgamation of 3 distinct components (that had a previous life as individual Excel plugins):

Power BI Desktop “architecture”
  • Power Query – used to transform the data. Think of it as the “DBT of Power BI”. It uses a transformation language called M and is not vital to the metrics layer discussion. By the way, Power Query is still available as an Excel plugin today, and people who work with spreadsheets see it as a godsend.
  • Analysis Services engine – the underlying calculation engine. It contains two subcomponents (more on that later) and allows storing/querying what Microsoft refers to as a Tabular model (the semantic model and metric definitions). A previous incarnation of it is built into Excel and is called Power Pivot (yes, Excel has its own columnar analytics DB engine that makes it easy to work with 1M+ data points).
  • The visual layer – what most people associate with Power BI. It issues queries to the underlying engine and renders charts/other visuals accordingly. Not very interesting in this context either.

Now, this is not too different from other BI tools. For example, Tableau has Prep for data transformation. It also has a similar architecture where the visual layer queries its internal storage engine (Hyper), and one can query Hyper using external tools (e.g. SQL via a python library).

So what makes Power BI interesting from a metrics layer perspective? I’d argue it is:

  1. Feature-richness of DAX, the metric definition and query language
  2. External exposure of the semantic layer definition (Tabular model)
  3. Support for running DAX queries against documented endpoints
  4. Dual-role as a “thin” and a “thick” metrics layer enabled by two data connectivity modes and associated features

Let’s go through each of them in more detail.

Feature richness of DAX

DAX is an extensive query language that Power BI uses to define and evaluate metrics (“measures”). Compared to the players covered above, DAX is more powerful than most (except for RelationalAI). Analysts can create non-additive, “percentage of total” and time-based metrics that build on each other, and use DAX in much more complex scenarios.

# Basic DAX Example to define metrics and run them against a dimension 
# (retrieves sales amounts and gross profit by country)
DEFINE
    MEASURE Sales[Sales Amount] = SUMX ( Sales, Sales[Quantity] * Sales[Net Price] )
    MEASURE Sales[Total Cost] = SUMX ( Sales, Sales[Quantity] * Sales[Unit Cost] )
    MEASURE Sales[Margin] = [Sales Amount] - [Total Cost]

EVALUATE
SUMMARIZECOLUMNS (Customer[CountryRegion],
    "Sales Amount", [Sales Amount],
    "Margin", [Margin]
)

For example, assume that the team responsible for product portfolio rationalization wants two metrics: “profitability of the top 70% of the product portfolio” and “profitability of the bottom 30% of the product portfolio”. Here are hypothetical requirements for the calculation:

  • First, estimate the sales share of each product, and sort them in descending order by total sales. Next, calculate a running cumulative sales share and attribute all products that fall into the top 70% of cumulative sales into the first bucket; the rest goes into the second.
  • Then, calculate total sales and total gross profit (sales less variable costs) for the two buckets. Finally, obtain the gross margin (%) by dividing the total gross profit by the total sales for each bucket. The two %-margin numbers are our two metrics.
  • The metrics need to support slicing, where the reference product portfolio should change depending on the active filters (i.e. if a filter selects a specific geography, then the metric calculations should be based only on products sold in that geography). As a result, you cannot compute the buckets in advance, as each product may or may not be in the top 70%, depending on the filter context.

Not straightforward, right? With DAX, you can do that.
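
To make the calculation itself concrete, here is a rough pandas sketch of the static version of the logic, with made-up product data. What pandas (or SQL) cannot give you for free is exactly the last requirement: re-deriving the buckets dynamically under arbitrary filter contexts, which is where DAX comes in.

# Static version of the top-70%/bottom-30% margin calculation (made-up data).
import pandas as pd

df = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "sales": [500.0, 300.0, 150.0, 50.0],
    "variable_cost": [350.0, 240.0, 90.0, 45.0],
})

# Sort by sales, compute cumulative sales share, assign buckets.
by_product = df.sort_values("sales", ascending=False).copy()
cumulative_share = by_product["sales"].cumsum() / by_product["sales"].sum()
by_product["bucket"] = ["top 70%" if share <= 0.7 else "bottom 30%" for share in cumulative_share]

# Total sales, gross profit and gross margin (%) per bucket.
summary = by_product.groupby("bucket")[["sales", "variable_cost"]].sum()
summary["gross_margin_pct"] = (summary["sales"] - summary["variable_cost"]) / summary["sales"]
print(summary)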

DAX is a functional query language that looks very different from SQL; many people do not like it, and some seriously hate it. It’s not perfect, but, in my opinion, many complaints come down to “I’m trying to do something complicated and it gets really hard”. Furthermore, I can attest first-hand that DAX becomes much simpler once you master its key concepts (row context, filter context, iterators, expanded tables), which are unique and have no equivalents in either SQL or mainstream programming languages.

Besides, the query languages offered by most metrics layer players are nowhere near letting you tackle similar analysis questions. One could argue that metrics layers are not supposed to handle such questions; metrics are supposed to be clear and simple, after all. But if they were, why is there a growing need for metrics catalogues?

Code-based access to the semantic model definition

90% of Power BI content creators use Power BI Desktop to author data models (including metric definitions) and reports. All of that is saved together in a .pbix file, and the workflow is UI-driven. Any self-respecting software developer would rightfully scoff at such a workflow. No code-based changes? Pf. No version control? Pffff. However, one should not forget that most Power BI content creators are not professional software engineers. The UI works for them.

The other 10% of content creators know that 3rd party tools can help. Power BI exposes its Tabular model via the XMLA endpoint (with both read and write support), which enables manipulating it as code. The de facto tool for working with Tabular models is Tabular Editor.

So if you want to track your data model and metric definitions in source control, you can, for example, export a .bim file, which is just a JSON-based Tabular model definition. Here’s an example of how it looks.

{
  "name": "SampleFile",
  "compatibilityLevel": 1550,
  "model": {
    "culture": "en-US",
    "dataAccessOptions": {
      "legacyRedirects": true,
      "returnErrorValuesAsNull": true
    },
    "defaultPowerBIDataSourceVersion": "powerBI_V3",
    "sourceQueryCulture": "en-US",
    "tables": [
      {
        "name": "SampleTable",
        "lineageTag": "d9537d66-1467-41ae-a6da-3bd5fbe1aabc",
        "columns": [
          {
            "name": "Column1",
            "dataType": "string",
            "sourceColumn": "Column1",
            "lineageTag": "c0a52839-b825-4c12-a9fa-dd0f643150f7",
            "summarizeBy": "none",
            "annotations": [
              {
                "name": "SummarizationSetBy",
                "value": "Automatic"
              }
            ]
          }
        ],
        "partitions": [
          {
            "name": "SampleTable-8ec5b73c-24e8-412b-a782-fc4738bad7d2",
            "mode": "import",
            "source": {
              "type": "m",
              "expression": [
      //deleted for brevity
]
            }
          }
        ],
        "measures": [
          {
            "name": "SampleMeasure",
            "expression": "COUNTROWS(SampleTable)",
            "formatString": "0",
            "lineageTag": "f54359d9-5306-43fc-8cdc-6ede90b89547"          
          }
        ],
        "annotations": [
            //deleted for brevity
        ]
      }
    ],
    "cultures": [
      {
        "name": "en-US",
        "linguisticMetadata": {                //deleted for brewity
        }
      }
    ],
    "annotations": [
      //deleted for brevity
    ]
  }
}

The Tabular model format is documented, and, theoretically, you should be able to create a semantic model definition from scratch without even touching Power BI. Tabular models even include features that are not directly accessible from Power BI Desktop (and can only be enabled using 3rd party tools such as Tabular Editor). The indispensable SQLBI team (who do a 10x better job of explaining how Power BI works than Microsoft!) has written a good overview of the various resources (and a bit of the history) on the topic.
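
As a small, hedged illustration of what treating the model as code can look like, here is a Python sketch that appends a measure to the exported .bim file above. The measure name and DAX expression are made up, and in a real workflow you would deploy the modified model via Tabular Editor or the XMLA endpoint.

# Hypothetical sketch: add a measure to an exported .bim (Tabular model) file.
import json

with open("SampleFile.bim", encoding="utf-8") as f:
    model = json.load(f)

# Append a made-up measure to the first table of the model.
new_measure = {
    "name": "Sample Row Share",
    "expression": "DIVIDE ( COUNTROWS ( SampleTable ), CALCULATE ( COUNTROWS ( SampleTable ), ALL ( SampleTable ) ) )",
    "formatString": "0.0%",
}
model["model"]["tables"][0].setdefault("measures", []).append(new_measure)

with open("SampleFile.bim", "w", encoding="utf-8") as f:
    json.dump(model, f, indent=2)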

However, it is not all roses. As I understand it, write support via XMLA is limited to certain licensing types and setups. For example, you cannot transfer changes made in Tabular Editor directly back to Power BI Desktop; that is only possible if you push the model to the Power BI Service (which requires a premium subscription).

Query endpoints

Once the metrics and semantic model are defined, can you query Power BI externally? It turns out the answer is yes. Just like the visual layer sends DAX queries internally, one can do the same via the XMLA endpoint (which is how Tabular Editor and other 3rd party tools such as DAX Studio enable DAX-based querying) or via a REST API endpoint. Not only that, those queries also support defining metrics and tables at query time.
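
For illustration, here is a minimal Python sketch of querying a published dataset through the REST API’s executeQueries endpoint. The dataset ID is a placeholder, the access token is assumed to have been acquired via Azure AD (e.g. with the msal library), and the DAX query reuses the measures from the earlier example.

# Minimal sketch: run a DAX query against a published Power BI dataset
# via the REST API. Dataset ID and token below are placeholders.
import requests

DATASET_ID = "00000000-0000-0000-0000-000000000000"   # hypothetical dataset ID
ACCESS_TOKEN = "<Azure AD bearer token>"              # assumed to be acquired elsewhere

url = f"https://api.powerbi.com/v1.0/myorg/datasets/{DATASET_ID}/executeQueries"
body = {
    "queries": [{
        "query": (
            "EVALUATE "
            "SUMMARIZECOLUMNS ( Customer[CountryRegion], "
            "\"Sales Amount\", [Sales Amount], \"Margin\", [Margin] )"
        )
    }],
    "serializerSettings": {"includeNulls": True},
}

response = requests.post(url, json=body, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
response.raise_for_status()

# One result set per query; each row comes back as a plain dict.
for row in response.json()["results"][0]["tables"][0]["rows"]:
    print(row)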

The experience is not perfect. While Python clients are popping up for the REST API, working in python with the more powerful XMLA endpoint is not straightforward, and there are some unexpected Microsoft/Windows dependencies here and there.

Obviously, this works best if Power BI is the visualization tool of choice in your organization. Then, the REST/XMLA APIs act as additional access points for data scientists and custom application developers who may need to query the data. Can you use Power BI as a metrics layer with another visualization tool of your choice? At least with Tableau, people have done it using the XMLA endpoint.

Data Connectivity Modes and Composite Models

As is not unusual for a BI tool, Power BI has two data connectivity modes: DirectQuery, where it fetches the data in real time from data sources (e.g. Snowflake), and Import, where the data is stored in Power BI’s own columnar Vertipaq engine.

Power BI Desktop “architecture”

How does it work, exactly? First, each DAX query is sent to the Analysis Services engine (ASE), which generates a query plan to be executed by one or both of its components: the formula engine and the storage engine. In the case of DirectQuery, the DAX query is converted to SQL, and the DirectQuery storage engine issues it to the underlying source (“storage engine” is a bit of a misnomer here; it is just a connector to the underlying database). On the other hand, if all the data is imported, it is retrieved by querying the Vertipaq storage engine where the data is actually stored. In both cases, the data obtained from the storage engine may be post-processed by the formula engine (to accommodate complex DAX expressions that the retrieval queries cannot compute directly). As usual, SQLBI has a great article that goes into more detail.

This architecture, by the way, explains why the usual Power BI implementation advice is to use Import mode whenever possible. It results in faster queries, as complex DAX queries can be converted directly into Vertipaq retrieval queries instead of requiring multiple SQL queries against an external database or more processing by the formula engine (which is slow).

From a metrics layer point of view, Power BI can thus act both as a “thin” and a “thick” layer. I should note that it is easy to find complaints that Power BI issues non-optimized SQL queries, and relying heavily on DirectQuery mode may not be the best experience.

The most exciting aspect of this architecture is the ability to combine data from different sources, which Power BI calls Composite Models. You can define semantic models with tables sourced from a file on your desktop, a DirectQuery to a DW, and a DirectQuery to a centrally maintained semantic model (set up by the central data engineering team using Import mode). Metric definitions can then span the entire semantic model, combining different sources on the fly. Composite Models address self-service BI situations where you want to centrally manage core semantic models but also allow end users to extend them with their own datasets or metric definitions.

So what’s the catch?

In a nutshell, Power BI:

  • Allows defining semantic models and metrics that are more flexible and powerful than those of most other players
  • Has API endpoints for querying the data and modifying the semantic model definitions
  • Supports self-service BI use cases
  • Has a bunch of enterprise features (that I did not cover – e.g. row/column access controls, incremental refreshes, datamarts, etc.)

Why isn’t it actually used as a metrics layer in practice? I’d speculate that there are a few reasons:

  • Awareness. The overlap between people who have built and maintained complex Power BI setups in an enterprise world and those who run “modern data stacks” in smaller organizations and need a metrics layer to solve their needs is not that substantial. With Tableau being a go-to tool for visualizations (and Tableau, to my knowledge, does not have feature parity in this space), awareness of Power BI capabilities is simply limited.
  • Skillset. With Power BI becoming as ubiquitous as Microsoft Office, there are many BI developers out there who can bake many dashboards in a day but have little appreciation for data modelling, engineering and governance best practices. Most of them have never heard of a metrics layer, to begin with. As a result, most Power BI implementations are not designed with metrics use cases in mind. Instead, they often become an unmaintainable clusterf*ck, leading to horror stories that convince others to stay away from such tooling and look for more “modern” solutions.
  • Cost. Power BI’s pricing and licensing scheme is… complicated. I suspect it is not very friendly for small and medium businesses that do not yet need the enterprise features but could already benefit from a metrics layer (a segment into which most startups fall).
  • Performance. While Microsoft will tell you that Power BI can support enormous datasets (and it has support for incremental refreshes), it is not hard to find complaints otherwise. The fact that companies such as AtScale exist and boast demos where they speed up queries by acting as intermediaries between Power BI and cloud DW implies some truth to those complaints. Some of them may be due to poor architectural choices, but it doesn’t change the fact that it is easy to form an impression that Power BI does not scale (even if it may not be true).
  • Preference for thin(ness). The idea behind the modern data stack is modular, interoperable components. Most entrants to the space are thin-layer players that delegate the computational workloads to the (cloud) databases/warehouses where the data is stored. With Snowflake and other similar players rising to prominence, it seems risky to introduce yet another data persistence layer. As a result, Power BI and its Import mode are not en vogue.
  • DAX. DAX is not exactly beloved by developers. With most metrics layers using SQL-like query languages, it is an additional barrier that may deter adoption.

What’s next?

I am very curious to see how this space evolves. Is the future really “thin-layered”? Will the current economic conditions and focus on cost savings give more spotlight to time-tested (but modern) OLAP approaches? Will any of the BI behemoths start focusing more on metrics layer / semantic model features? What about cloud DW providers? Metrics, after all, are tightly linked to the underlying data models (if you want to go beyond the most straightforward metrics, that is), and it seems a natural layer for a database engine to extend to. With its Vertipaq engine and DAX, Power BI is an example of the benefits such coupling can bring. And, in all that evolution, will we realize that we need a new paradigm for query languages beyond SQL to define and compute metrics? Is there a chance they will borrow some concepts from DAX?

In a perfect world, I would love to see Microsoft open-sourcing the minimum components needed to run an Analysis Services instance that supports a Tabular model and DAX query language. It would surely shake up the market and spur quite a bit of innovation. SQL, after all, was invented in the early 1970s but only standardized in 1986. So why could we not see something similar happening with Tabular models and DAX?

Is it likely that Microsoft would do that? A few years ago, it would have been entirely unthinkable; the compute engine was at the product’s core. However, today’s enterprise Power BI Service/Analysis Services features (ranging from PowerPoint/Teams integration to row/column security) are valuable enough that cannibalization effects would be much more limited, at least among large enterprises. It would still be risky, though, and I would not bet on Microsoft doing anything like it.

Having such a product out there, however, would make DAX and the Tabular model first-class citizens in the software engineering world. And, with time, it could pave the way to standardization. I think it is fair to believe that data modelling deserves something better than a bunch of YAML files and pseudo-SQL. Perhaps DAX/Tabular are not the answer on their own, but they could serve as inspiration for things to come.

