Upgrading a data model from Power BI to Tabular got a lot harder

If you were looking to upgrade a Power BI model to Analysis Services Tabular, hopefully you did it before March 1, 2019, the day Microsoft removed the upgrade tool from Azure. I needed to upgrade after that date and was sorely disappointed. I also blew my schedule, as the upgrade went from a 20 minute process to one that took over 2 hours. I am writing this post to save you some time, as I learned a few tricks along the way that I wish I had known about before upgrading.

Preparing your Power BI model to Upgrade to Tabular

Unless you are upgrading to Analysis Services on SQL Server 2019, chances are you are going to have to review your DAX code and make some modifications, as DAX in the other versions of SQL Server is not the same as DAX in Power BI. I was upgrading to Analysis Services on SQL Server 2016, and there were some commands I had to manually edit out of the JSON file. If your model uses any of the newer DAX functions, take them out of your Power BI model before you start; otherwise you will have to manually edit the JSON file to remove them when they are flagged as errors. Make sure your Power BI model does not include functions such as SELECTEDVALUE or GENERATESERIES, and remove all of the automatically generated date hierarchies. After your Power BI Desktop file is clean, leave it open, as you are going to need it running for the next step.
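For example, SELECTEDVALUE is not available at the 1200 compatibility level, but it can be rewritten with older functions before you upgrade. The measure below is only a sketch using a hypothetical 'Product'[Category] column; the second version returns the same result without SELECTEDVALUE.

    // Uses SELECTEDVALUE, which SQL Server 2016 (compatibility level 1200) does not recognize
    Selected Category := SELECTEDVALUE ( 'Product'[Category], "All Categories" )

    // Equivalent pattern using older functions that work at compatibility level 1200
    Selected Category := IF (
        HASONEVALUE ( 'Product'[Category] ),
        VALUES ( 'Product'[Category] ),
        "All Categories"
    )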

Connect to your Power BI Model in Analysis Services

Power BI Desktop runs its own instance of Analysis Services Tabular, and once you figure out where it is running, you can connect to it. Open Task Manager and look at Power BI Desktop on the Processes tab, clicking on the arrow at the right-hand side so that you can see all of the processes running. You can see from the picture here that there are 19 processes running with one Power BI file loaded. Right-click on the instance of Microsoft SQL Server Analysis Services running within Power BI Desktop and select Go to details. This brings you to the Details tab in Task Manager with the Analysis Services process highlighted, showing the PID you need in order to connect to it. You will notice my PID is 3992. Using this number, I went to the command prompt and typed

netstat -anop TCP | findstr 3992

This command returns the address I need to access the Power BI model from Analysis Services: 127.0.0.1:51328

That’s it. Now I can look at my Power BI model in SQL Server Analysis Services from within Management Studio. It looks like this:

TL;DR This model is not usable as-is, and you will have to make some tweaks to get it to work. The first step is to fix the connections, then edit some JSON. Right-click on the connection and select Properties. Click on the ellipsis button next to the connection string, and you will get this error message.

Once you click OK, you can edit the connection to something that will work.

All of my Power BI tables were connecting to views in SQL Server, which is a best practice, as this allows you to select only the fields you need and gives you the ability to sort your data as it is read in, which can improve the compression. I changed the Provider to SQL Server and fixed the connection. Once the connections are changed, you will need to create an XMLA script and edit it. Right-click on the database with the GUID name and select Script -> Script Database As -> Create To -> File. Give the file a name, as you will be using it after connecting to your Analysis Services instance.

Editing the XMLA File

Open up the XMLA file you created from the Power BI instance of Analysis Services. The first thing you will need to fix is the compatibility level. Power BI set the value to 1465. 1400 is the level for Azure AS and SQL Server 2017, and SQL Server 2016 uses 1200, so I edited the script to change the value to 1200. I also changed the database name to "SampleAS" from the GUID it had been assigned. I then executed the script and got my first error.

Executing the query …

The JSON DDL request failed with the following error: Unrecognized JSON property: variations. Check path ‘model.tables[0].columns[11].variations’, line 362, position 30..

Run complete

I pressed Ctrl+G and entered the line number to fix the first error, which was on this line:

                "formatString": "dddd\\, mmmm d\\, yyyy",

I deleted that line and ran it again. This time I got this error.

Executing the query …

The JSON DDL request failed with the following error: Unrecognized JSON property: variations. Check path ‘model.tables[0].columns[11].variations’, line 361, position 30..

Run complete

 

Line 361 starts with the word variations. The section is contained within square brackets [].

                "variations": [
                  {
                    "name": "Variation",
                    "relationship": "dc6b309e-0967-4eea-8e3f-7a308661dca5",
                    "defaultHierarchy": {
                      "table": "LocalDateTable_c6cfa169-57d3-4da3-bcbc-21b677dde835",
                      "hierarchy": "Date Hierarchy"
                    },
                    "isDefault": true
                  }
                ],

Date hierarchies are not supported the same way in Analysis Services, so I needed to edit the XMLA to remove this section. I had to repeat this step for every auto-generated date hierarchy in the model, which is why I recommend removing them in Power BI first, as it will save you a lot of time. Another line which caused an error was this one:

                  "query": "SELECT * FROM [Rates]",

My view is not called Rates; it is called vw_PaidRates. The table in the Power BI model is called Rates, but there is no table or view called Rates in the database, so I needed to change Rates to the name of the view, vw_PaidRates. I repeated this for every table in the XMLA file. I estimate that the editing process took over an hour to eliminate all of the code that would not work. After that I was able to create a project in Visual Studio using the Import from Server (Tabular) option.

Upgrade Process from Power BI to Tabular

The process was tedious, and it made me long for the 15 minute Azure AS conversion. The method I outlined here is not supported by Microsoft, so it is up to you to make it work. I have been able to successfully convert a model; it just takes a lot of time. Fixing the Power BI model up front helps, and it also helps if you have little custom formatting in your model, but there are some things, like the query fixes, that you just have to do on your own. I understand the conversion process is much easier if you are using SQL Server 2019, as that version of Analysis Services is mostly compatible with the version created by Power BI. Unfortunately I don’t have any clients using 2019 yet. I hold out hope it will be easier next time, but I will allow a lot more time to make the conversion happen.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Analysis Services Tabular Best Practices – Part 3

This is the last installment in my series on improving Analysis Services Tabular. If you want to start at the beginning, please click here. I started writing this series because I wanted one place to point clients to for improving the performance and scalability of Tabular models, as the information is otherwise scattered across a lot of places. I hope you find it helpful. If you have any questions or comments about this post, please leave a comment or ask me a question on Twitter. With that, here are more things you can do to improve your Analysis Services Tabular implementations.

Optimize your DAX Code

While it is not easy to performance tune DAX, you can do it by evaluating the DAX query plan and VertiPaq queries, for example with SQLBI’s VertiPaq Analyzer. You can also use functions which perform better, for example COUNTROWS instead of DISTINCTCOUNT, or ADDCOLUMNS instead of SUMMARIZE. Whenever possible, use a filter argument inside CALCULATE instead of the FILTER function, as CALCULATE applies the filter to the context directly and is more efficient. The iterator functions SUMX, COUNTX and so on should be used sparingly, as the row-by-row evaluation they perform is less efficient; use them only when SUM or COUNT will not work. When checking whether a value is missing, use ISEMPTY instead of ISBLANK if possible, as ISEMPTY only checks for the presence of a row, which is faster than the evaluation performed by ISBLANK.
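As a rough sketch of what these swaps look like, the measures below use hypothetical Customer and Region tables and a [Total Sales] measure; the first measure in each pair is generally the faster choice.

    // Count the dimension's rows instead of distinct-counting its primary key (same result when the key is unique)
    Customer Count := COUNTROWS ( Customer )
    Customer Count Slower := DISTINCTCOUNT ( Customer[CustomerKey] )

    // Use a filter argument inside CALCULATE instead of FILTER over the whole table
    West Sales := CALCULATE ( [Total Sales], Region[RegionName] = "West" )
    West Sales Slower := CALCULATE ( [Total Sales], FILTER ( Region, Region[RegionName] = "West" ) )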

Sort by Settings

There are times when the sort order needs to be something other than alphabetic. A common place where this is done is for months, as people want to see January appear first rather than April. April will appear first by default, as the month names are sorted alphabetically. To change this, set the Month Name column’s Sort by Column property to the month number field so the months always appear with January first. This process should be repeated for any column where an alphabetic sort is not what people expect to see, such as Day of Week.
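If the date table does not already have a month number column to sort on, a small calculated column can supply it. This is only a sketch assuming a 'Date'[Date] column; a column delivered by the source view works just as well.

    // Numeric sort key; set the Month Name column's Sort by Column property to this column
    Month Number = MONTH ( 'Date'[Date] )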

Formatting Fields

All values should be formatted in a way that will look good on reports. Most of the time this formatting has been included, but there are some instances where the general format is used instead. Most users want numbers to include a thousands separator. In Analysis Services Tabular, this is accomplished by using a custom format with the format string #,##0. This will add the thousands separator where needed and will never show decimal places.
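The format string is set in the column or measure properties rather than in DAX, but the FORMAT function shows what #,##0 produces; the measure below is purely an illustration.

    // "#,##0" adds the thousands separator and rounds away the decimals: this returns "1,234,568"
    Format Example := FORMAT ( 1234567.89, "#,##0" )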

Hiding values

All values not used in a report should be hidden. This makes things less confusing for people writing reports, as the only values that appear are ones they will actually use. Many times you need to bring in key values to create relationships between tables, but these are not values which should be reported on, as they have no business meaning.

Business Friendly Column and Table Names

When creating Power BI reports, the column names in the model are the names used in the reports. As a best practice, the names in the tabular model should be business friendly. All of the column names visible in the model should be readily understood by anyone reading the report. Names should include spaces where natural and be capitalized the way names normally appear. For example, WK_NUM should appear as something like "Week Number", which is understandable to the user and looks appropriate on reports. Using views can also provide the ability to change names to user readable names outside of the model, which is also considered a best practice, as the model then does not need to be changed to use the friendly names.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Analysis Services Tabular Best Practices – Part 2

I was recently mentoring a client, trying to help them with some tips for using Tabular. I couldn’t find all of the information I was looking for in one place, so I decided to write it all down so that next time I could just go to my blog. There are a lot of things to consider, so I broke them up into three parts. To make sure you get all of them, please subscribe to my blog. And now, here are some more tips for Tabular.

Marking all Date tables as Date

Tabular models include a number of time intelligence functions which make it quick and easy to calculate things such as rolling 12 months, year-to-date, and more. For this functionality to work, the table needs to be marked as a date table. For that to be possible, the table needs one contiguous date column, meaning there should be no gaps in the dates. If there is a default missing-date value many years away from the contiguous range, either that value needs to be removed or all of the dates in between need to be added so that there are no gaps. You can choose to join on the date value or on another key.
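If the source system cannot supply a clean contiguous range, a calculated date table is one way to guarantee it. This is only a sketch; the boundary dates are made up and should cover the dates in your data.

    // CALENDAR generates one row per day with no gaps, so the table can be marked as a date table
    Date = CALENDAR ( DATE ( 2015, 1, 1 ), DATE ( 2019, 12, 31 ) )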

Modify Timestamps to Split Date and Time

When a field contains both a date and a time and both are needed, the values should be separated into a date field and a time field. Having the date and time in two fields assists dictionary encoding, as each field can be sorted so that identical values sit together, decreasing the number of dictionary entries. To further improve compression, only include seconds in the time field if absolutely necessary, as decreasing the cardinality increases compression.
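The split is best done in the source view, but if all you have is a single datetime column, the calculated columns below sketch the idea using a hypothetical Sales[OrderDateTime] column; hide or remove the original combined column afterwards so it does not undo the compression gain.

    // A datetime is stored as a number, so INT strips the time portion (format this column as a date)
    Order Date = INT ( Sales[OrderDateTime] )

    // The remainder is the time of day (format this column as a time)
    Order Time = Sales[OrderDateTime] - INT ( Sales[OrderDateTime] )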

Modeling Role Playing Dimensions and Active Relationships

All relationships between tables should be active relationships. If you have any dashed lines connecting parts of your model, these are inactive relationships and should be removed unless you are writing DAX code to use them; DAX code must be written to use an inactive relationship, and users often find that confusing. As a best practice for situations where there are multiple dates in the fact table, such as Order Date, Ship Date and Due Date, create three different date tables. The additional date tables can be created as calculated tables, with each date column joined to its own date table. Step-by-step instructions for creating a calculated table are included in the Reference section. Having multiple date tables is a best practice design for tabular models, as it is common for there to be multiple dates in one table.
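To show why inactive relationships push work into the DAX, here is a sketch of the kind of measure you would have to write if you kept a single Date table with an inactive relationship on a hypothetical Sales[ShipDate] column; with separate role-playing date tables, no such measure is needed.

    // USERELATIONSHIP activates the inactive Sales[ShipDate] relationship for this calculation only
    Sales by Ship Date := CALCULATE ( [Total Sales], USERELATIONSHIP ( Sales[ShipDate], 'Date'[Date] ) )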

Partitioning Tabular Models

Partitioning tabular models does not improve query performance; in fact, multiple partitions can actually make performance worse. Partitioning should be implemented to assist in loading the data. If you do not need partitions to meet your load window, there is no reason to create them.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Analysis Services Tabular Best Practices – Part 1

Recently I have been working on Analysis Services Tabular as a data source to Power BI and wanted to share some of the best practices for developing a model. Following these guidelines will improve the performance and scale of your project. There is a lot of material to cover, so I have broken up the information into this series.

Analysis Services Internal Dictionary Encoding

Understanding the internal components drives a number of design decisions. The rules for Tabular are different because the underlying technology is different. Analysis Services Tabular is a columnar database, which means it is most efficient at summarizing columns of data. This approach is implemented in the VertiPaq engine, which compresses the data in memory. The data is stored in dictionaries and reorganized to store the information optimally. Internally, all of the data is translated into integer values. This means that two date fields joined in a model are as efficient as two integer values, since all values are internally represented as numbers. Dictionary entries are created for each column. The most efficient compression occurs when a column contains a single value, as only one entry is needed. If a column contained four different numbers and the data were sorted on that column, only four entries would need to be stored. If instead the numbers alternated, such as 1,2,3,4,1,2,3,4, that compression would not be possible and there would be an entry for each row. When possible, data should be sorted so that like values are together. Whenever possible, do not load unique values, as that data cannot be compressed.

Create Tables with Underlying Queries Using Views

Microsoft recommends that when getting data from a relational database, the data retrieved should be encapsulated in views. This creates a record in the source database of the information used by the model, which can help ensure the data structures are not modified in a way that adversely impacts the model. The views can be materialized, which can prevent the structure from changing without the view also being changed, and gives the database administrator visibility into the structure Analysis Services needs. Using views also insulates Analysis Services from being dependent on the existing data model, as the view does not have to match the existing data types. All views should include an explicit list of columns, so that columns are included deliberately, and should sort the data.

Extra columns should be eliminated from the view and the model, as they increase the size for no reason. The views should only contain columns which are actually used in DAX, in reports, or to join tables together. Do not bring in every field from the source table; it is more efficient to bring in only the columns needed than to hide columns that are not used. If a column is not needed in DAX, in reports, or for joining tables, do not include it in the model. The views also need to run as efficiently as possible on the source system, so be sure to tune them as well.

Data Type Selection

The data type selected impacts the physical storage used, not the compression of the model in memory. It is important, whenever possible, to reduce the cardinality of the data so that it can be sorted and compressed effectively. When storing decimal numbers, unless you need many significant digits, store the data as Currency (fixed decimal), as it takes less space in physical storage than Decimal.

Following these rules will help your model compress better through dictionary encoding, and better compression means it will load faster. In the next part of this series I will discuss other things you can do to improve the usability and structure of the data model. Let me know if you find this helpful by adding a comment or subscribing to this blog.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

 

Calculated Tables and Role-Playing Dimensions

Working with role-playing dimensions, which you encounter when you have multiple dates in a table that you want to relate back to a single date table, has always been problematic in SQL Server Analysis Services Tabular. Tabular models only allow one active relationship to a single column at a time. The picture on the left shows how tabular models represent a role-playing dimension, and the model on the right is the recommended way to model the relationships in Analysis Services Tabular, as users can then filter the data on a number of different date tables.

[Images: Tabular role-playing dimension modeling compared with the recommended multiple date table design]

The big downside to this is that one has to import the date table into the model multiple times, meaning the same data is imported again and again. At least that was the case until SQL Server 2016 was released. This week's T-SQL Tuesday topic, Fixing Old Problems with Shiny New Toys, is a really good reason to describe a better way of handling this problem.

 

Calculated Tables: The Solution for Role-Playing Dimensions

SQL Server 2016 provides a new way to solve the role-playing dimension problem: the calculated table. Instead of importing the date table again, you create a formula that references it. First switch to the data view of the model, then select Table -> New Calculated Table. The screen will change to a new table tab and the cursor will be placed in the formula bar.

In my model I have one table called Date, and I am going to add a calculated table called Order Date. The DAX couldn’t be simpler: just reference the table named 'Date', as shown in the picture below. Rename the table to something more meaningful, like Order Date, and that is it. The modeling required is the same, but the same data no longer has to be imported to create all of the date tables needed, as there is only one copy of the date table, referenced multiple times. If you are using Power BI, this same concept can be used for handling role-playing dimensions as well.
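For reference, the entire formula for the calculated table is just a reference to the existing table, which is then renamed to Order Date.

    // The whole DAX for the role-playing copy: a reference to the existing Date table
    Order Date = 'Date'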

SQL Server 2016 had a lot of great new features, and in addition to the flashy ones like R there are a lot of great enhancements to the Tabular model that are worth investigating as well.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Moving from Power BI to Analysis Services Tabular

Update By: Ginger Grant
Date: August 2, 2015

Power BI is a great tool for creating data mashups and great visualizations. There are a lot of posts on how and why to use Power BI for these tasks, including mine. After working with a number of clients on how to implement Power BI in their data environments, I thought it might be helpful to talk about how Power BI works internally, to provide guidance on how best to implement it. One of Power BI’s big strengths is the ability to bring data in from a lot of different sources and put it together into a single data model. Power Pivot in Excel and the Relationships view in Power BI Desktop both allow you to create data models. Power BI on the web does not, as of this writing, allow you to create data models from multiple sources. Power BI can store a large amount of data using the VertiPaq engine, which compresses the data and loads it in memory. How much data? Well, that depends on how much memory you have.

Having 8 GB of Memory on a 32 bit Operating System is Worthless

Recently I was working with a client who wanted to load five years' worth of operating information into Power BI. He knew that Power BI needs a lot of RAM, so he requisitioned 8 GB of RAM for his laptop so he would be able to load this much data. However, his IT department had a policy of only installing 32 bit operating systems on its computers. When 64 bit operating systems first became available, oftentimes the computer BIOS or the motherboard wouldn’t support them, and until very recently, outside of developers, most people didn’t have a business reason for a 64 bit operating system to do their job.

Power BI will use all the memory you have available. What is available? Here’s some math to explain how much memory that is. A 32 bit Windows operating system can address 2^32 bytes = 4 GB of RAM, and you don’t even get all of that, as the system uses some of it. The same math applies if a 32 bit version of Office 2013 is installed on a 64 bit version of the operating system. The client couldn’t load five years of operational data into Excel because he didn’t have enough addressable memory; with a 32 bit operating system he had at most 4 GB of RAM he could use, period.

Reasons for Data Modeling in Excel Power Pivot

Power BI is a great tool, but it is not the only tool. If you want to load up years of data into your model, at some point, even if you have a 64 bit OS and 32 GB of RAM, you will run out of memory. Then what do you do? If you have developed your data model in Excel, no problem: you can upgrade it to Analysis Services Tabular. Visual Studio has a tool that makes migrating to Analysis Services Tabular easy to do. I find a Power Pivot model to be one of the best ways to get started on creating a Tabular model, as you can test out your data modeling concepts very quickly. If you have developed your data model in Power BI Desktop, as of this writing there is no way to migrate it directly to Analysis Services Tabular. If you are familiar with Power BI, you may be pleasantly surprised at how similar Analysis Services Tabular is. Along with improved performance accessing the data, Analysis Services Tabular has a number of security features which are not available in Power BI. With Power BI, you can create a corporate solution from a desktop application, which is one of the neatest things about it.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur