Fabric Features not found in Azure Synapse or Power BI – Direct Lake

With the general availability release of Fabric in November 2023, I am dedicating several posts to the features that exist only in Fabric and nowhere else. The first feature is Direct Lake. Direct Lake was created to address problems with Power BI Direct Query, and anyone who has used Direct Query knows what I am talking about. If you have implemented it, I am guessing you have run into one or all of these problems: managing the constant hits to the source database, which increase as you add users; user complaints about slow visuals; or the need to put apply buttons on all of your visuals to help with speed. Direct Query is a great idea. Who wants to import a bunch of data into Power BI? Directly connecting to the database sounds like a better idea, until you learn that the query goes from Power BI to the database and back for each user, one at a time, which means Power BI must send more queries the more people are accessing reports. Users want to access data quickly, have it scale well, and have access to the latest data.

Direct Lake Architecture

Direct Lake was designed to address the performance issues encountered by people who used Direct Query. With Direct Lake you still connect to the existing data source live, but unlike Direct Query, you are connecting not to a database but to the Data Lake in OneLake. The diagram below from Microsoft shows the architecture of Direct Query, Import, and of course Direct Lake.

Power BI Data Architecture for Semantic Data Modeling

Direct Lake Performance

Direct Lake uses the same underlying technology that Fabric does: an implementation of Delta Lake on top of parquet-formatted files. Delta Lake is an open-source framework that works with Spark, and it allows parquet files to act more like databases, as it provides ACID transactions and DML operations which let you use common SQL statements to modify and access the data. Microsoft has taken the underlying technology it implemented in data warehousing, Polaris, to improve the way querying works and make accessing data in OneLake faster.
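
To make that concrete, here is a minimal PySpark sketch of what Delta Lake adds on top of parquet. The table and column names are purely illustrative, and it assumes a Spark session with Delta Lake enabled, as you get by default in a Fabric or Synapse notebook.

```python
from pyspark.sql import SparkSession

# Assumes an environment with Delta Lake enabled, such as a Fabric or
# Synapse notebook; the table and column names are illustrative.
spark = SparkSession.builder.appName("delta-dml-sketch").getOrCreate()

# Saving a DataFrame as a Delta table writes parquet files plus a
# _delta_log transaction log, which is what provides the ACID guarantees.
df = spark.createDataFrame([(1, "widget"), (2, "gadget")], ["id", "name"])
df.write.format("delta").mode("overwrite").saveAsTable("products")

# Common SQL DML statements now work directly against files in the lake.
spark.sql("UPDATE products SET name = 'gizmo' WHERE id = 2")
spark.sql("DELETE FROM products WHERE id = 1")
spark.sql("SELECT * FROM products").show()
```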

Implementing Direct Lake

You need to have a Premium SKU, either P or F, to implement Direct Lake. Notice that Premium Per User and the A SKUs, commonly used for embedding, do not have the ability to use Direct Lake. You also need to go into the Admin Portal to ensure that the XMLA Endpoint is set to Read Write and that the integration setting Allow XMLA Endpoints and Analyze in Excel is set to on. You can then provision a Fabric premium workspace lakehouse as your data source in Power BI when you use it as the semantic model for a report you create in the Power BI Service. Now you may look at this and think, well, this is all well and good, but I don't update the data lake as fast as I do my database, so how am I going to make this work?

Mirroring

Mirroring provides the ability to create a read-only copy of your database in OneLake, which you can then use as a source for your Power BI reports. This was announced November 15 and should be releasing soon. The way this will work is that you make a connection to the data source within Fabric, and a read-only copy of the database is created in OneLake. There is no ETL required. Just make the connection and the data is available. I look forward to writing more about this feature when it is released, as it will make it possible to have real-time data access in Power BI.

Current Limitations of Direct Lake

As this is a new feature, some capabilities you may be used to from other storage modes are not there yet, so before you decide to use Direct Lake you might want to understand the current limitations. The first is that there is no way to create a Direct Lake connection from within Power BI Desktop; you must create the reports that use it in the service. You cannot mix the lakehouse with existing models, as the semantic model must be sourced from a single lakehouse or data warehouse. You cannot use views created in the lakehouse as tables in your Power BI semantic model. Direct Lake cannot be used in embedded. You also cannot use calculated columns or tables. And if you implement row-level security (RLS), the model is switched to Direct Query. I imagine many of these limitations will be addressed over time, and when they are, I will be writing an update to this post.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Four Different ways to incorporate Azure Synapse and Power BI

More and more I am working with Azure Synapse Analytics as the location of data for Power BI, and there are four different ways that you can incorporate Power BI.

Incorporating Power BI inside of Synapse

The first is to connect Power BI to Azure Synapse to explore and visualize data. You can examine the datasets that you have loaded in your data lake with Power BI to help with the analysis of the data, either for a data science solution or to determine how you are going to transform the data. For more information on how to do this, check out my previous blog.

Traditional Data Warehousing Sourced in Power BI

Once you have curated the data within Azure Synapse and it is ready for reporting, Power BI can do that too. If you have large data and a dedicated SQL pool in Azure Synapse where your data warehouse is located, you can use Power BI to visualize it. Given that this solution is meant for data greater than a terabyte, you may also want to create some aggregated views in Azure Synapse over the larger data sets.
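
As a sketch of that last point, the hypothetical view below pre-aggregates a large fact table inside the dedicated SQL pool so that Power BI visuals read a few thousand summary rows instead of scanning a terabyte of detail. Every name here, from the server to the columns, is made up for illustration.

```python
import pyodbc

# Hypothetical dedicated SQL pool endpoint and credentials; substitute
# your own server, database, login, and table names.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace.sql.azuresynapse.net;"
    "Database=SalesDW;Uid=loader;Pwd=<password>;Encrypt=yes;"
)

# Pre-aggregate the fact table so Power BI reads summary rows, not detail.
conn.execute("""
    CREATE VIEW dbo.vSalesByRegionMonth AS
    SELECT RegionKey,
           MonthKey,
           SUM(SalesAmount) AS TotalSales,
           COUNT_BIG(*)     AS SaleCount
    FROM dbo.FactSales
    GROUP BY RegionKey, MonthKey;
""")
conn.commit()
```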

Data lakehouse connection to Power BI

More and more I see companies wondering if they need a database at all. Some really don't, because their data sets are not that large, and what they want is a data source for Power BI. There are two different ways of accomplishing this. The first is to configure Power BI to connect to a Data Lake Gen2 account and use data curated in Azure Synapse; companies are using Azure Synapse to create data lakes with data designed for reporting. See how you can connect Power BI to an Azure Data Lake Gen2 account which contains this data.

Serverless Database data source in Power BI

For my clients, what I recommend is to create a data lakehouse by curating a finite data set for Power BI reporting as a series of files in a curated folder. I will then create a serverless database in Azure Synapse over those files and connect Power BI to it. What is great about this is that if they need to run some kind of ad hoc query, they can do that as well, providing a lower-cost solution than was previously available.
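
Here is a minimal sketch of that pattern, with a hypothetical workspace, login, and storage account. A serverless database holds no data itself, just metadata such as views over the curated parquet files; it also assumes the endpoint is allowed to authenticate to the storage account.

```python
import pyodbc

driver = "Driver={ODBC Driver 18 for SQL Server};Encrypt=yes;"
# Hypothetical serverless SQL endpoint for the Synapse workspace.
server = "Server=myworkspace-ondemand.sql.azuresynapse.net;"
login = "Uid=admin;Pwd=<password>;"

# CREATE DATABASE cannot run inside a transaction, hence autocommit.
conn = pyodbc.connect(driver + server + "Database=master;" + login,
                      autocommit=True)
conn.execute("CREATE DATABASE Curated")
conn.close()

# Reconnect to the new database and define a view over the curated
# folder; the view stores no data, it reads the parquet files in place.
conn = pyodbc.connect(driver + server + "Database=Curated;" + login,
                      autocommit=True)
conn.execute("""
    CREATE VIEW dbo.vSales AS
    SELECT *
    FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/curated/sales/*.parquet',
        FORMAT = 'PARQUET'
    ) AS sales;
""")
```

Power BI can then connect to the Curated database like any other SQL source, while ad hoc users query the same views.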

If you are interested in hearing me talk about this topic, you might be interested in the recorded talk I gave to the DFW Power BI User group.

Regards,

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Azure Synapse and Power BI

There are a number of different use cases for Azure Synapse, which no doubt you have already read about. Of course you can use Azure Synapse to build a scalable data warehouse or use it to create machine learning solutions with Spark, but have you thought of why you might want to use it with Power BI? There are two ways you can integrate Azure Synapse with Power BI: connecting to a Power BI workspace from within Azure Synapse, or using Azure Synapse to provide the source of all of the data used in the Power BI data set. We will explore both methods in this blog.

Linking to Power BI from within Azure Synapse

From within Synapse you have the ability to access a Power BI workspace so that you can use Power BI without leaving Synapse. Your Power BI tenant can be in a different data center than the Azure Synapse workspace, but they both must belong to the same tenant. You can use Power BI to look at any data you wish, as the data you use can be from any location. When this blog was written, it was only possible to connect to one Power BI workspace from within Azure Synapse. In order to run Power BI as shown here, first I needed to create a linked service from within Synapse: select Power BI from the list of options, then select the workspace from the list available for your tenant and create the linked service. After the linked service is created, you can go to Develop in Azure Synapse and see the Power BI option listed on the left side of the screen.

Connecting Power BI to an Azure Data Lake Gen 2

As much of the functionality of Analysis Services is being migrated to Power BI, many people are choosing to use Power BI instead of migrating to Analysis Services, especially if they have Power BI Premium. Another option you have is to gather the data that you are going to load into Power BI into an Azure Data Lake Gen 2 account and create a "data lakehouse" as the container where you gather and clean up the data you are going to use in Power BI. There are a few things that you will want to keep in mind if you want to configure Azure Synapse to do this. The first is that your Power BI tenant and your Azure Data Lake Gen 2 account must be in the same location; right now this is an issue, as Azure Synapse is not available in all data centers. To configure Power BI to use Azure, you need to go to the Admin section as shown above and select the subscription, resource group, and storage account. You will want to make sure that the proper permissions are granted in Azure RBAC for the user groups who need to access the data, as they need Storage Blob Owner permissions. The Power BI service also needs the reader and data access roles granted as well. You can use Azure Synapse to gather the data and then visualize it with Power BI. I see a lot of use cases for this model, as I suspect more people are going to move away from creating a database for Power BI and instead move the datastore to Azure.

Regards,

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Exploring Azure Synapse

In the past few months, I have been examining Azure Synapse and what it can do. When it was first released in November of 2019, the first functionality that was rolled out was an update of Azure SQL DW. For this reason, many people think that Synapse is just an improved version of a cloud data warehouse. Microsoft did improve SQL DW when it moved it to Synapse. The biggest architectural design change is the separation of compute from storage, a theme with many cloud projects, which allows the compute power to be increased when need dictates and scaled down when computing needs change. Within Synapse, resources are allocated as pools: you can define a SQL pool to run your data warehouse and later change the compute to a different resource level. You will still need to partition your DW, as large data sets require partitioning to perform well. Subsequently Microsoft released Azure Synapse Studio to be a container for a larger environment of tools and notebooks to interact with them.
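
As an example of that flexibility, changing the compute assigned to a dedicated SQL pool is a single statement. This is a hedged sketch with a hypothetical workspace and pool name; the scale operation itself runs asynchronously.

```python
import pyodbc

# Hypothetical workspace endpoint; scale commands run against master.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace.sql.azuresynapse.net;"
    "Database=master;Uid=admin;Pwd=<password>;Encrypt=yes;",
    autocommit=True,
)

# Raise the compute assigned to the SalesDW pool for a heavy load window;
# the statement returns immediately while the scale completes in the
# background, and a later call can scale it back down to reduce cost.
conn.execute("ALTER DATABASE SalesDW MODIFY (SERVICE_OBJECTIVE = 'DW400c')")
```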

Non-Data Warehouse Elements of Azure Synapse

To me, the more interesting parts of Azure Synapse have nothing to do with data warehouses. Azure Synapse also contains the ability to query files stored in Azure Data Lake Gen 2 as if they were SQL tables. This is a great way to analyze large data without first cleaning it up and putting it into a relational environment. Within Synapse you can formulate a query using syntax for selecting parts of files, providing the ability to look at many files as if they were one. You can also create processes which bring data into your Synapse environment using orchestration, a process that people who are familiar with Azure Data Factory will find very familiar. Synapse also contains the ability to analyze data in Cosmos DB without doing ETL or moving the data at all, using a scalable architecture which will not impact the transactions being processed simultaneously on the same Cosmos DB.
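
Here is a minimal sketch of such a query, run from Python against the serverless SQL endpoint. The workspace, credentials, and storage path are all hypothetical; the wildcard is what lets one query treat every file in the folder as a single table.

```python
import pyodbc

# Hypothetical serverless endpoint and storage path; the wildcard lets
# one query read every parquet file in the folder as a single table.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace-ondemand.sql.azuresynapse.net;"
    "Database=master;Uid=admin;Pwd=<password>;Encrypt=yes;"
)

rows = conn.execute("""
    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/raw/clicks/2020/*.parquet',
        FORMAT = 'PARQUET'
    ) AS clicks;
""").fetchall()

for row in rows:
    print(row)
```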

Azure Synapse and Spark

Check out the promotional video here

By far the most interesting component of Azure Synapse is the Spark connection. Microsoft has added the ability to create Spark pools in Azure Synapse. To be honest, I was somewhat surprised that this functionality is included here first and not in Azure Machine Learning, where to use Spark you need to access clusters created in Databricks. Spark provides the ability to dynamically scale resources when running processes. This is very handy when writing machine learning code, which can really use the performance improvements Spark brings. Because this is Microsoft's Spark, you can also write your code to access it in .NET if you like, in addition to the more common Spark languages: Scala, R, and Python. You can also incorporate the AutoML API created for Azure Machine Learning in R and Python so that you can use the power of Azure to select your algorithm and hyperparameters instead of spending time doing it yourself.
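
The sketch below shows the sort of PySpark you would run in a notebook attached to a Synapse Spark pool: read raw files straight from the data lake and fit a simple model. The storage path and column names are assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

# Assumes a notebook attached to a Synapse Spark pool; the storage path
# and column names are illustrative.
spark = SparkSession.builder.appName("synapse-spark-sketch").getOrCreate()

# Read raw parquet files straight from the data lake into a DataFrame.
df = spark.read.parquet("abfss://raw@mydatalake.dfs.core.windows.net/sales/")

# Fit a simple regression; the pool can add executors while this runs.
assembler = VectorAssembler(inputCols=["units", "discount"],
                            outputCol="features")
train = assembler.transform(df).select("features", "revenue")
model = LinearRegression(labelCol="revenue").fit(train)
print(model.coefficients)
```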

Getting up to Speed with Synapse

There is a lot to learn when it comes to Synapse, as it combines a lot of different components into one environment. As more and more data is being migrated to the cloud, it is uniquely designed to handle big data components containing raw data and managed data lakes, as well as more traditional data warehouse needs. It can also be the location where all of the data is processed, secured, cleaned, and analyzed using machine learning. There is a lot to cover, and since it is new, there are not many places yet where you can learn more about it. If you are interested in a deep dive on Azure Synapse and how to use it in a Modern Data Warehouse, sign up for my precon at PASS Summit 2020 where I will cover the topic in depth.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur