Recently I have been working with Analysis Services Tabular as a data source for Power BI, and I wanted to share some best practices for developing a model. Following these guidelines will improve the performance and scalability of your project. There is a lot of material to cover, so I have broken the information up into this series.
Analysis Services Internal Dictionary Encoding
Understanding the internal components drives a number of design decisions. The rules for Tabular are different because the underlying technology is different. Analysis Services is a columnar database, which means it is most efficient at summarizing columns of data. This approach is implemented by the Vertipaq engine, which compresses the data in memory. Each column is dictionary encoded: every distinct value in the column gets one entry in a dictionary, and internally each row stores the integer ID of its dictionary entry rather than the value itself. Because everything is ultimately represented as integers, joining two date columns in a model is just as efficient as joining two integer columns. The most compressible column is one containing a single value, since the dictionary needs only one entry. A column with 4 distinct values has 4 dictionary entries regardless of row order, but order still matters for the run-length compression applied on top of the dictionary: if the rows are sorted so that like values sit together, long runs collapse into a handful of entries, whereas a sequence such as 1,2,3,4,1,2,3,4 contains no repeated runs and effectively requires an entry for each row. When possible, sort the data so that like values are together, and avoid loading columns of unique values, as that data cannot be compressed.
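To make the distinction between the dictionary and the run-length compression concrete, here is a minimal Python sketch. It is an illustration of the general technique only, not the actual Vertipaq storage format: the dictionary size depends on the number of distinct values, while the run-length-encoded size depends on how the rows are ordered.

```python
# Minimal sketch of columnar dictionary encoding plus run-length encoding (RLE).
# Illustrative only -- not the real Vertipaq format.

def dictionary_encode(column):
    """Map each distinct value to an integer ID; return (dictionary, id_stream)."""
    dictionary = {}
    ids = []
    for value in column:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        ids.append(dictionary[value])
    return dictionary, ids

def run_length_encode(ids):
    """Collapse consecutive repeats into (id, run_length) pairs."""
    runs = []
    for i in ids:
        if runs and runs[-1][0] == i:
            runs[-1] = (i, runs[-1][1] + 1)
        else:
            runs.append((i, 1))
    return runs

unsorted_col = [1, 2, 3, 4, 1, 2, 3, 4]   # no like values adjacent
sorted_col = sorted(unsorted_col)          # 1,1,2,2,3,3,4,4 -- like values together

for name, col in [("unsorted", unsorted_col), ("sorted", sorted_col)]:
    dictionary, ids = dictionary_encode(col)
    runs = run_length_encode(ids)
    print(f"{name}: {len(dictionary)} dictionary entries, {len(runs)} RLE runs")
# The dictionary has 4 entries either way; sorting shrinks the RLE stream
# from 8 runs (one per row) down to 4.
```

Running this shows why sort order matters even though the dictionary itself is unchanged: the unsorted column needs one run per row, while the sorted column needs only one run per distinct value.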
Microsoft recommends that when getting data from a relational database, the data retrieved should be encapsulated in views. This creates a record, in the source database, of exactly what Power BI uses, which helps ensure the underlying data structures are not modified in a way that adversely impacts the model. The views can be schema-bound (which is required if they are to be materialized as indexed views), preventing the underlying structure from changing without a corresponding change to the view. This gives the database administrator visibility into the structures Analysis Services depends on. Views also insulate Analysis Services from the existing data model, since a view does not have to match the underlying data types. Every view should include an explicit column list, so that each column is a deliberate inclusion, and the data should be sorted as it is loaded.
Extra columns should be eliminated from both the view and the model, as they increase the size of the model for no benefit. A view should contain only the columns that are actually used: in DAX, as reporting fields, or to join tables together. Do not bring in every field from the source table; it is more efficient to exclude unneeded columns than to load them and hide them. If a column is not needed for DAX or for joining tables, leave it out of the model. The views also need to run as efficiently as possible on the source system, so be sure to tune them as well.
The data type selected affects the physical storage used, not the compression of the model in memory; in-memory compression is driven by cardinality, so reduce the cardinality of your data whenever possible so it can be sorted and compressed effectively. When storing decimal numbers, unless you need many significant digits, use the Currency (fixed decimal) data type, as it takes less space in physical storage than Decimal.
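The space saving comes from Currency being a fixed decimal type: the value is held as a plain integer scaled to four decimal places rather than as a floating-point number. The Python sketch below illustrates that idea with a hypothetical scale factor of 10,000 (four decimal places); it is a conceptual model of fixed-decimal storage, not the engine's actual representation.

```python
from decimal import Decimal

# Sketch of a fixed-decimal ("Currency"-style) type: the value is stored as a
# plain integer scaled by 10,000, i.e. four decimal places of precision.
# Illustrative assumption, not the engine's actual internal layout.

SCALE = 10_000  # four decimal places

def to_currency(value):
    """Encode a value as a scaled integer, rounding to 4 decimal places."""
    return int(round(Decimal(str(value)) * SCALE))

def from_currency(stored):
    """Decode the scaled integer back to a Decimal."""
    return Decimal(stored) / SCALE

stored = to_currency(19.99)
print(stored)                 # 199900 -- a simple integer, cheap to store
print(from_currency(stored))  # 19.99
```

Because every value reduces to an integer with a fixed number of decimal places, the storage per value is small and predictable, which is why fixed decimal is the better choice when you do not need many significant digits.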
Following these rules will help your model compress better under dictionary encoding, which means it will load faster. In the next part of this series I will discuss other things you can do to improve the usability and structure of the data model. Let me know if you find this helpful by adding a comment or subscribing to this blog.
Data aficionado et SQL Raconteur