With the general availability release of Fabric in November 2023, I am dedicating several posts to the features that are only in Fabric and not available anywhere else. The first feature is Direct Lake. Direct Lake was created to address problems with Power BI Direct Query. Anyone who has used Direct Query knows what I am talking about. If you have implemented Direct Query, I am guessing you have run into one or all of these problems: managing the constant hits to the source database, which increase as you add users; user complaints about slow visuals; or the need to put Apply buttons on all of your visuals to help with speed. Direct Query is a great idea. Who wants to import a bunch of data into Power BI? Directly connecting to the database sounds like a better idea, until you learn that the data goes from Power BI to the database and back for every query, which means that Power BI sends more queries the more people access your reports. Users want to be able to access data quickly, have it scale well, and have access to the latest data.
Direct Lake Architecture
Direct Lake was designed to address the performance issues encountered by people who used Direct Query. With Direct Lake you still connect to your data live, but unlike Direct Query, you are connecting not to a database but to the data lake in OneLake. The diagram below from Microsoft shows the architecture of Direct Query, Import, and of course Direct Lake.
Direct Lake Performance
Direct Lake uses the same underlying storage technology as the rest of Fabric: an implementation of Delta Lake on top of Parquet-formatted files. Delta Lake is an open-source storage framework, commonly used with Spark, which allows Parquet files to act more like database tables, as it adds ACID transactions and DML operations so you can use common SQL statements to modify and access the data. Microsoft has also taken Polaris, the engine it implemented for data warehousing, and used it to improve the way querying works and make accessing data in OneLake faster.
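To make that Delta Lake behavior concrete, here is a minimal sketch in PySpark, the kind of code you could run in a Fabric notebook. The table name sales and the sample rows are hypothetical; the point is simply that a Parquet-backed Delta table supports SQL DML inside ACID transactions.

```python
# Minimal PySpark sketch of Delta Lake behavior (hypothetical "sales" table).
from pyspark.sql import SparkSession

# In a Fabric notebook a Delta-enabled session named "spark" already exists;
# getOrCreate() simply reuses it.
spark = SparkSession.builder.getOrCreate()

# Write a small DataFrame as a Delta table: Parquet data files plus a transaction log.
df = spark.createDataFrame(
    [(1, "Widget", 19.99), (2, "Gadget", 24.99)],
    ["ProductID", "ProductName", "Price"],
)
df.write.format("delta").mode("overwrite").saveAsTable("sales")

# DML runs directly against the Parquet-backed table as an ACID transaction.
spark.sql("UPDATE sales SET Price = Price * 1.1 WHERE ProductName = 'Widget'")
spark.sql("SELECT * FROM sales").show()
```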
Implementing Direct Lake
You need to have a Premium P SKU or a Fabric F SKU to be able to implement Direct Lake. Notice that Premium Per User and the A SKUs commonly used for embedding do not have the ability to use Direct Lake. You also need to access the Admin Portal to ensure that the XMLA Endpoint is set to Read Write and that the integration setting Allow XMLA Endpoints and Analyze in Excel is set to on. You can then provision a lakehouse in a Fabric Premium workspace as your data source in Power BI, using it as the semantic model for a report you create in the Power BI Service. Now you may look at this and think, well this is all well and good, but I don't update the data lake as fast as I do my database, so how am I going to make this work?
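For context on what the semantic model actually reads, here is a hedged sketch of loading data into a lakehouse table from a Fabric notebook. It assumes the notebook is attached to the target lakehouse; the file path and the orders table name are made up for illustration.

```python
# Hedged sketch: land data as a Delta table in the lakehouse Tables area so a
# Direct Lake semantic model built on this lakehouse can read it directly.
# Assumes the notebook is attached to the target lakehouse; the relative path
# "Files/raw/orders.csv" and the table name "orders" are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Fabric notebooks already provide "spark"

orders = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("Files/raw/orders.csv")
)

# Tables written this way appear under the lakehouse Tables area as Delta/Parquet.
orders.write.format("delta").mode("overwrite").saveAsTable("orders")
```

Once the table lands in the Tables area, the Direct Lake semantic model reads those Delta files directly, with no import refresh in between.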
Mirroring
Mirroring provides the ability to create a read-only copy of your database in OneLake, which you can then use as a source for your Power BI reports. This was announced on November 15 and should be releasing soon. The way this will work is that you make a connection to the data source within Fabric, and a read-only copy of the database is created in OneLake. There is no ETL required. Just make the connection and the data is available. I look forward to writing more about this feature when it is released, as it will make it possible to have real-time data access in Power BI.
Current Limitations of Direct Lake
As this is a new feature, there are some capabilities that are not there yet. Before you decide to use it, you might want to understand some of the current limitations. The first is that there is no way to create a Direct Lake connection from within Power BI Desktop; you must create the reports that use it in the service. You cannot mix the lakehouse with existing models, as the semantic model must be sourced from a single lakehouse or data warehouse. You cannot use views created in the lakehouse as tables in your Power BI semantic model. Direct Lake cannot be used in embedded scenarios. You also cannot use calculated columns or calculated tables. If you implement row level security (RLS), the model is switched to Direct Query. I imagine many of these limitations will be addressed over time, and when they are, I will be writing an update to this post.
Yours Always
Ginger Grant
Data aficionado et SQL Raconteur