Fallibility of Machine Learning: Why Lenovo will not sell me a laptop

Machine Learning is being adopted by more and more companies to assist in the sales process.  Like all technology, Machine Learning is not correct all of the time.  In fact, models with 75% accuracy are commonly accepted as good models and put into production.  This means that 25% of the time the model is wrong, and the algorithm will incorrectly flag 25% of the business. What happens to transactions where an algorithm has incorrectly determined the transaction is not viable? Legitimate business is turned away.  This happened to me when I tried to do business with Lenovo.

Lenovo’s Ordering System: No Laptop For You!

Buying a laptop from Lenovo reminded me of the episode of Seinfeld where Elaine was trying to buy soup.  For some unknown reason, when I placed an order on their website and gave them my money, Lenovo sent me a Cancellation Notice, the email equivalent of "No Soup for You!"  About 15 minutes after placing the order, I received the cancellation notice.  I called customer service.  They looked at the order, advised me the system had incorrectly cancelled it, and told me to place the order again as they had resolved the problem.  I created a new order, and just like the last time, I received the No Laptop for You cancellation email.

I called back. This time I was told that the system thinks I am a fraud. Now I had no laptop and I had been insulted. I asked if the system could be overridden because I was not a fraud.  Customer service verified my method of payment and told me they were going to assign a case number, as that would ensure the transaction would go through, and that they would get credit for the order since they were placing it.  Apparently, customer service has some kind of financial incentive for placing sales. That did not work either, as once again I received the No Laptop for You cancellation email.  Not only did I not get a laptop, the person I spoke to also lost out, as he was not going to get credit for the sale.

I called back again. This time they told me they had no idea what was wrong with the system, but it had flagged me as a fraud, and a case number had not been assigned last time as it was supposed to be, which was the reason that order was canceled, again.  They placed the order once more, and once again I received the No Laptop for You cancellation email. Every attempt at buying a laptop had failed. I had struck out with customer service, having received their advice three times, and every time I got a No Laptop for You cancellation email. At this point I tried getting the situation resolved via social media.
Publicly, Lenovo said they wanted to help. They sent me one direct message letting me know they would fix the system, and that was the last I ever heard from them.  Their silence sent one final message: No Laptop For You!

Relying on Machine Learning Can Cost Businesses Sales and Leave Them Wondering about Toilet Seats

I tried to give Lenovo nearly $2000 and they refused to take my money.  How many other transactions are they turning away?  Over 500? That does not seem like a terribly high number.  500 transactions at $2000 apiece would mean Lenovo’s sales are needlessly down $1 million because they implemented a system which turns away sales and actively prevents them, despite the best intentions of their employees to close a sale.  Blindly relying on a computer program to determine with 100% accuracy whether or not a transaction is viable is not just a bad idea, it is a bad business decision which can cost millions of dollars in sales.  While you may not have been rejected when buying a product, most people I know have seen lists of recommended products on websites which do not reflect things they want to purchase.  A friend of mine who was remodeling a bathroom bought a toilet seat on Amazon. When he logged in again, he continued to see a myriad of toilet seat recommendations for the next six months, to the exclusion of other products he might actually want to buy.  Apparently, the machine learning algorithm determined that because he bought one toilet seat, he was a Toilet Seat Connoisseur who wanted to decorate his house with a variety of rare and unusual toilet seats.

Combining Machine Learning with People

I create machine learning solutions for clients and provide training sessions to help people learn how to write machine learning models. I understand the process and the steps used to create a machine learning experiment: first you gather and clean the data, then you train a set of algorithms against that data, and then you create a model.  The problem “Should I cancel this sale?” has two possible answers, yes or no, meaning it is a binary classification used for anomaly detection. Never have I created a model which was 100% accurate, as that is not possible. I tell clients that it is not possible and help them implement solutions to handle the conditions when the model is wrong. Machine Learning needs to work in concert with people who have the ability to resolve problems flagged by the system, as there is a place for people in all automated systems.

Most normal people would have probably given up after their order was canceled twice, but I persisted, as I was amazed that such a big company as Lenovo could continue to be so wrong, and I wanted to prove I was not a fraud.  Continued failure to successfully place an order convinced me that I did not want to do business with Lenovo. If a company does not want to resolve an issue where they will receive money, how likely are they to resolve a situation which costs them money, such as a warranty claim? Based on my experience, I have no confidence that one could get Lenovo customer service to solve a problem, as they do not have the ability to, even when they are financially incentivized to do so. Machine Learning and AI may decrease the number of people needed, but when things go wrong, people are needed to fix them. When a machine learning model is wrong, and this will happen, the policy should be to permit your customer service people to create successful sales.  If instead your customer service insults and ignores customers when machine learning models go wrong, sales will go down as customers go to competitors.

I researched laptops as I was interested in having a lightweight, powerful laptop which I could haul through various airports to use at clients and conferences, like Live 360 SQL Server, where I will be speaking on December 3. Fortunately, there are other companies who have determined they do not need to create some kind of machine learning score to sell a laptop; they just sell laptops to people who go to their website and give them money, with no problems.  Using the same address and credit card information which Lenovo flagged as fraudulent, I bought my new HP laptop, which I will be happy to demonstrate the next time you see me at a conference or class.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Analysis Services Tabular Best Practices – Part 3

This is my last installment in my series on improving Analysis Services Tabular. If you want to start at the beginning, please click here. I started writing this because I wanted one place to point clients to improve the performance and scalability of Tabular Models, and there are a lot of places you need to go to find all of this information.  I hope you find it helpful.  If you have any questions or comments regarding this post, please leave a comment or ask me a question on twitter.  With that, here are more things that you need to do to improve your Analysis Services Tabular implementations.

Optimize your DAX Code

While it is not easy to performance tune DAX, you can do it by evaluating the DAX query plan and the VertiPaq queries, and by using SQLBI’s VertiPaq Analyzer. You can also look to use functions which perform better, for example COUNTROWS instead of DISTINCTCOUNT, or ADDCOLUMNS instead of SUMMARIZE. Whenever possible, use a filter argument in the CALCULATE function instead of the FILTER function, as CALCULATE applies the filter to the evaluation context, which is more efficient. All of the iterator functions, SUMX, COUNTX, etc., should be used sparingly, as the row-by-row evaluation they perform is less efficient; use them only when SUM or COUNT will not work.  When evaluating whether a value is missing, if possible use ISEMPTY instead of ISBLANK, as ISEMPTY looks only for the presence of a row, which is faster than the evaluation performed by ISBLANK.
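To make the substitutions concrete, here is a sketch of the faster and slower patterns side by side. The Sales, Orders, and Product table and column names are hypothetical, used only for illustration:

```dax
-- Counting rows of a related table is cheaper than a distinct count over a column:
Order Count := COUNTROWS ( Orders )                       -- prefer this
Order Count Alt := DISTINCTCOUNT ( Sales[OrderKey] )      -- slower on large columns

-- FILTER iterates the Product table row by row:
Red Sales Slow :=
CALCULATE ( SUM ( Sales[Amount] ), FILTER ( Product, Product[Color] = "Red" ) )

-- A Boolean filter argument lets CALCULATE apply the filter in context:
Red Sales Fast :=
CALCULATE ( SUM ( Sales[Amount] ), Product[Color] = "Red" )
```

Both versions of each measure return the same result; the difference shows up in the query plan and in how much work the storage engine can do on its own.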

Sort by Settings

There are times when the sort needs to be specified as something other than alphabetic.  A common place where this is done is for months, as people want to see January appearing first rather than April.  April would otherwise appear first, as the names are sorted alphabetically.  To change this, set the Month Name column’s Sort by Column property to the Month Number field so the months will always appear with January first.  This process should be repeated for any column where the alphabetic sort is not what people want to see, such as Day of Week.

Formatting Fields

All values should be formatted in a way that will look good on reports. Most of the time this formatting has been included, but there are some instances where the General format is used instead. Most users want numbers which include a thousands separator.  In Analysis Services Tabular, this is accomplished by using a custom format with the format string #,##0.  This will add the thousands separator when needed and will never show a decimal.
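You can preview what a format string produces by using the DAX FORMAT function in DAX Studio or SSMS; the literal value here is only to illustrate the behavior of #,##0:

```dax
EVALUATE
ROW ( "Formatted", FORMAT ( 1234567.89, "#,##0" ) )
-- returns "1,234,568": thousands separators added, decimals rounded away
```

In the model itself you would set the Format String property on the measure or column rather than call FORMAT, which returns text and prevents further aggregation.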

Hiding values

All values not used in a report should be hidden.  This makes things less confusing for people who are writing reports, as the values that appear are ones which they will actually use. To create relationships, you often need to bring in key values, but keys are not values which should be reported upon, as they have no business meaning.

Business Friendly Column and Table Names

When creating Power BI reports, the column names in the table are the names used in the reports.  As a best practice, the names in the tabular model should be business friendly.  All of the column names visible in the model should be names which can be readily understood by anyone reading the report.  Names should include spaces where natural and be capitalized the way names normally appear.  For example, WK_NUM should appear as something like “Week Number”, which is understandable to the user and looks appropriate on reports. Using views can also provide the ability to change the names to user-readable names outside of the model, which is also considered a best practice, as then the model does not need to be changed to supply user-friendly names.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Analysis Services Tabular Best Practices – Part 2

I was mentoring a client recently, trying to help them with some tips for using tabular. I couldn’t find all of the information I was looking for in one place, so I decided to write it all down so that next time I could just go to my blog for the information.  There are a lot of things to consider, so I broke them up into three parts.  To make sure you get all of them, please subscribe to my blog.  And now, here are some more tips for tabular.

Marking all Date tables as Date

Tabular models include a number of time intelligence functions which provide the ability to quickly and easily calculate things such as rolling 12 months, year-to-date calculations, and more. In order for this functionality to work, the table needs to be marked as a date table.  For this feature to be available, the date column needs to be contiguous, meaning there should be no gaps in the dates. If there is a default missing-date value many years from the start of the contiguous date values, that row needs to be removed, or all of the dates in between need to be added, ensuring there are no gaps. You can choose to join on the date value or on another key.
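If the source system cannot supply a gap-free date table, one can be built directly in DAX as a calculated table. The Sales table and OrderDate column below are assumed names, for illustration:

```dax
-- A contiguous date table covering a fixed range
Date =
CALENDAR ( DATE ( 2015, 1, 1 ), DATE ( 2018, 12, 31 ) )

-- Or derive the range from the fact table instead:
-- Date = CALENDAR ( MIN ( Sales[OrderDate] ), MAX ( Sales[OrderDate] ) )
-- CALENDARAUTO () will cover whole calendar years automatically
```

Because CALENDAR generates one row per day with no gaps, a table built this way always satisfies the contiguity requirement for marking it as a date table.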

Modify Timestamps to Split Date and Time

When there is a field where both the date and the time are needed, the values should be separated so that there is a date field and a time field.   Having the datetime in two fields assists in the dictionary encoding, as the date and time fields can be separately sorted so that like values are together, decreasing the number of dictionary entries.  To further improve compression, only include the seconds if absolutely necessary, as decreasing the cardinality will increase compression.
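One way to do the split is with calculated columns, though doing it upstream in the source query is preferable. The Sales table and OrderTimestamp column are assumed names; INT() strips the time portion from a datetime serial value:

```dax
Order Date = INT ( Sales[OrderTimestamp] )
Order Time = Sales[OrderTimestamp] - INT ( Sales[OrderTimestamp] )

-- Truncate to the minute to reduce cardinality further:
Order Time Minute =
TIME ( HOUR ( Sales[OrderTimestamp] ), MINUTE ( Sales[OrderTimestamp] ), 0 )
```

A full datetime with seconds can have millions of distinct values; a date column tops out at one value per day and a minute-level time column at 1,440, which is why the split compresses so much better.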

Active Relationships and Modeling Role Playing Dimensions

All relationships between tables should be active relationships.  If you have any dashed lines connecting portions of your model, these are inactive relationships and should be removed unless you are writing DAX code to use them; DAX code must be written to use inactive relationships, and users often find them confusing. The best practice for situations where there are multiple dates in the fact table, such as Order Date, Ship Date and Due Date, is to have three different date tables. The additional date tables can be created as calculated tables, with the other date columns joined to them.  Step-by-step instructions for creating a calculated table are included in the Reference section.  Having multiple date tables is a best-practice design for tabular models, as it is common for there to be multiple dates in one table.
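The role-playing copies can be created as one-line calculated tables that clone the physical date table. The table and column names here are assumed, for illustration:

```dax
-- Each role-playing dimension is a calculated-table copy of the date table
Ship Date = 'Date'
Due Date = 'Date'

-- Each copy then gets its own active relationship to the fact table,
-- e.g. Sales[ShipDateKey] -> 'Ship Date'[DateKey]
--      Sales[DueDateKey]  -> 'Due Date'[DateKey]
```

Because every relationship is active, report authors can slice by any of the dates without USERELATIONSHIP, at the cost of a little extra memory for the duplicated tables.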

Partitioning Tabular Models

Partitioning tabular models does not improve query performance. In fact, multiple partitions can actually make performance worse.  Partitioning should be implemented to assist in loading the data.  If you do not need a partition to meet a load timeframe, there is no reason to create one.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Analysis Services Tabular Best Practices – Part 1

Recently I have been working on Analysis Services Tabular as a data source to Power BI and wanted to share some of the best practices for developing a model. Following these guidelines will improve the performance and scale of your project. There is a lot of material to cover, so I have broken up the information into this series.

Analysis Services Internal Dictionary Encoding

Understanding the internal components drives a number of design elements. The rules for Tabular are different because the underlying technology is different.  Analysis Services is a columnar database, which means it is most efficient at summarizing columns of data.  This methodology is incorporated in the VertiPaq engine, which compresses the data in memory. The data is stored within a dictionary and is modified to store information optimally. Internally, all of the data stored is translated into an integer value.  This means that two date fields joined in a model will be as efficient as two integer values, as all values are internally represented as numbers.  Dictionary entries are created for each column.  The most efficient compression would be a column containing one value, as there would only need to be one entry.  If there were 4 different numbers in the column and the values were sorted on that column, there would be 4 dictionary entries and long runs of identical values, which compress well.  If instead the numbers repeated in sequence, such as 1,2,3,4,1,2,3,4, run-length compression would not be possible and there would be an entry for each row.  When possible, data should be sorted so that like numbers are together.   Whenever possible, do not load unique values, as that data cannot be compressed.

Create Tables from Underlying Queries Using Views

Microsoft recommends that when getting data from a relational database, the data retrieved should be encapsulated in views.  This creates a record, in the source database, of the information used by Power BI, which can help ensure that the data structures are not modified in a way that would adversely impact the model.  The views can be materialized, which can prevent the structure from changing without changing the view.  This provides visibility to the database administrator of the structure needed for Analysis Services.  Using views can also insulate Analysis Services from being dependent on the existing data model, as the view does not have to match existing data types.  All views should include an explicit list of columns, for a deliberate inclusion of requested columns, and should be sorted.

Extra columns should be eliminated from the view and the model, as they increase the size for no reason. The views should only contain columns which are actually used in the DAX, in reporting fields, or to join tables together.  Do not bring in every field in the source table; it is more efficient to bring in only the columns needed than to hide the columns not used.  If the columns are not needed in DAX or for joining tables, do not include them in the model.  The views also need to run as efficiently as possible on the source system, so be sure to tune them as well.

Data Type Selection

The data type selected impacts the physical storage used, not the compression of the model in memory.  It is important, whenever possible, to reduce the cardinality of the data in order to be able to sort the data effectively.  When storing decimal numbers, unless you need many significant digits, store the data as Currency, as it takes less space in physical storage than Decimal.

Following these rules will help your model be better suited to dictionary encoding and will improve the compression of your data, meaning it will load faster.  In the next part of this series I will discuss other things you can do to improve the usability and structure of the data model. Let me know if you find this helpful by adding a comment or subscribing to this blog.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

 

Using Power BI without a Power BI Account or Active Directory Entries

Believe it or not, there is a way to use Power BI without each user signing up for an account.  With an embedded application, you can use one Power BI ID and have all users access reports through that account. Now before you judge this concept as a horrible security idea, keep reading, as you may change your mind. In this embedded application, all users log in using a unique id and password, and the data seen in Power BI is restricted based upon that id. How do you accomplish this seemingly incongruent task, and why would you ever do such a thing? Well, it turns out there is a logical reason for wanting to implement a Power BI application this way.

Creating a way to Securely Access Power BI Data for Customers

There are many companies which would like to provide Power BI reports allowing customers to interactively work with their data, but they don’t want to create Power BI accounts for customers, as that can be a lot of work from an administrative standpoint.  For the same reason, these customers are not added to the corporate network, which means they are not added to Active Directory.  For example, if Desert Isle SQL contracts with Acme Corporation to create a custom conference display, Acme might want to show me a report of when the components were purchased, when they were modified, when the order is in process, and when the order is completed.  How do I show a Power BI report containing only my information? From an application design perspective, data from all of the customers should be stored in the same place, and Desert Isle SQL should only see their orders when logging in to Acme’s site.  Here is the workflow that I want to implement.

 

Passing Login information to Power BI

When creating an embedded application, connecting requires a connection string.  It is possible to pass additional information in the connection string by configuring the gateway to use an effective identity and then passing the role information you want to use.  There are two configuration steps you need to complete to make this work.  The Power BI gateway needs to be configured to use CustomData through the Map User Names screen, and SSAS needs to be configured to use roles, as the role will restrict the data that users can access. The CustomData value can contain a comma-delimited list of values, which can include the data needed to apply the role.  In the DAX for the role, the CUSTOMDATA() function performs as if it were a field you can evaluate, so the role’s filter can grant permissions by comparing DimTerritory[TerritoryName] against the territory carried in the CustomData value.  This will restrict the data a customer can see based on the territory they have been assigned.  The id can then be used to implement row-level security in Power BI with either the embedded data model or with Analysis Services Tabular. By using this method, you have the ability to restrict the data for each user and use one Power BI account, all at the same time.
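A sketch of what the row filter inside the SSAS role could look like; DimTerritory and the CustomData layout ("user,territory") are assumptions for illustration:

```dax
-- Simplest case: CustomData carries just the territory name
= DimTerritory[TerritoryName] = CUSTOMDATA ()

-- If CustomData carries a delimited list such as "user,territory",
-- extract the territory portion before comparing:
= DimTerritory[TerritoryName] =
    MID ( CUSTOMDATA (), FIND ( ",", CUSTOMDATA () ) + 1, 100 )
```

The filter is evaluated per connection, so each customer connecting through the shared embedded identity sees only the rows matching the territory passed in their connection string.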

 

Costs for Implementing Multi-User Power BI Systems

Unfortunately, this solution means that you are going to be purchasing the embedded version of Power BI, as this functionality is not covered by a Pro license. Embedded applications require that you purchase an embedded capacity or have a Premium account.  The pricing for embedded has changed quite a bit since it was introduced in 2017.  Pricing is all about capacity, unless you use a Premium account.

Power BI can be implemented in a number of different ways, and this implementation is one that you may see more of in the future.  There are a lot of different ways you can implement Power BI, and it is hard to keep up with all of the changes. If you are interested in learning more about advanced Power BI topics, join me in person in Boston for a full day of Advanced Power BI training on Friday, September 21.  I look forward to meeting you there or anywhere else we might meet up.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

 

Many to Many and Composite Modeling Power BI

The July 2018 release of Power BI includes a lot of neat new features, like wallpaper and, of course, getting rid of the big formatting frame on Power BI visuals, but Composite Modeling really stands out. Have you ever created a direct connection to a tabular model and then wanted to add that one Excel spreadsheet for another sales group, or needed to add some economic web data? I have. If I was lucky, I was able to get it added to the tabular model.  Otherwise I had to give up on the idea. Adding more tables to a model which is using DirectQuery is now possible with Composite Models.

Turning on Composite Modeling

Since this feature was just released, it is still in preview.  In order to use it, go to the Power BI File tab, then Options and Settings -> Options -> Preview features, as shown in the picture.  You will have to restart Power BI after this option is changed. This option not only turns on Composite Modeling, it also enables Many-to-Many data modeling.

Many to Many Data Modeling

In Power BI, if you have tried to create a join relationship between two columns and neither of them has unique values, you will get an error.  If you need to join two tables and they have duplicate key values, more than likely you are familiar with the workaround, which is to create an intermediate table with a column of unique values to create the relationship.  If you hide the table, you can almost pretend that it really works. When you have Composite Models enabled in the Preview features, you can create a direct relationship between tables which do not have unique keys! The many-to-many relationship will be created for you in Power BI.
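For reference, the old workaround can be built as a calculated table of unique keys; the Sales, Targets, and ProductKey names below are assumed, for illustration:

```dax
-- A hidden bridge table holding every key that appears on either side
Bridge Product =
DISTINCT (
    UNION ( DISTINCT ( Sales[ProductKey] ), DISTINCT ( Targets[ProductKey] ) )
)
-- Relate Sales and Targets to 'Bridge Product' (one-to-many from the bridge),
-- then hide the bridge table so report authors never see it.
```

With the new feature enabled, the direct many-to-many relationship replaces all of this machinery.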

Caveats with the Composite Modeling Feature

Like me, after playing around with both Many-to-Many and Composite Modeling, you are probably ready to deploy these features into production. That is a problem. As Composite Modeling is a preview feature, it cannot be loaded to the Power BI service, for now.  It will work as a desktop prototype, but you cannot distribute it, yet.

There is another big factor regarding Many-to-Many which you may have noticed when trying to get this feature to work: it is not possible to use Many-to-Many with SQL Server Analysis Services tabular models, yet. It is my understanding that Microsoft plans to make this available for Analysis Services data sources, but for now, it is not possible to model them.

If you are planning on using either of these two features, for now it is possible to create and use them in a desktop prototype.  I am looking forward to the day, which I hope will be later this year, when these features can be distributed to the Power BI service.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

 

Gaining Better DAX Skills

If you work with Excel Pivot Tables, Power BI, or Analysis Services Tabular, more than likely you are working to improve your DAX skills. I know I am. While I do a lot of work with DAX, I am always looking to get better at writing it and better at explaining how to write it for clients or class attendees.  As I find things which I think others may find helpful, I write about them here to help spread the knowledge, especially to those people who have a background in SQL.

DAX History

Microsoft has been working on the DAX language for about 10 years, releasing it as part of the Excel 2010 Power Pivot add-in, back when Microsoft was incorporating Power BI functionality into Excel. Microsoft wrote DAX to be like Excel so that people who are familiar with Excel would have an easier time writing it. While DAX is easier than MDX, it is different enough from Excel that it can get quite complex. One of the founders of the DAX language is Jeffrey Wang, who is still actively working on enhancing the language further.  He works not only with developers but also with people in the technical community who work extensively with DAX to get their input. I also found out recently that he took time out to do a technical review of a book on the DAX language by Phil Seamark.

Leveraging SQL Knowledge to Learn DAX

If, like me, your skills lie not in Excel but in SQL, reading a book which builds upon SQL knowledge to explain DAX is really helpful.  Beginning DAX with Power BI provides examples of how to write something in SQL and then shows how the same functionality would be written in DAX, sort of like a SQL-to-DAX phrase book.  Showing how to get the answer in both SQL and DAX is a great technique for providing a better understanding of how to write DAX. I worked with a client recently who validated all of the DAX in their tabular model by writing SQL to ensure they were getting the correct answers. I think the validation process would have been improved with this book, which takes DAX out of the black magic category and explains the way it works.

Performance Tuning DAX

Once you start writing a lot of DAX, you will at some point write a query which performs badly.  What do you do to avoid poorly performing DAX? Phil provides some “old-school debugging” techniques to understand what is being generated and where it is going wrong. He also shows how you can use DAX Studio and SSMS to figure out how and why your DAX is working the way it is. You will definitely want to reference Chapter 8 to learn more about this important topic.

Other Methods for Learning DAX and Improving Power BI skills

If you are looking for a hands-on environment to better understand how DAX works and to improve your Power BI skills, I have two opportunities for you in the US. I will be giving Advanced Power BI classes in Columbus, Ohio on July 27 and in Boston, Massachusetts on September 20.  We will be covering advanced DAX development and tuning, applied visualization theory to improve the interactivity of your Power BI reports, and hands-on labs implementing the new drill-through features and other enhancements in Power BI, among other class topics.  I hope to meet you there or at one of the other conferences I will be attending this year.  Check out my Engagement page and hopefully find a place where we can meet up, as I would love to meet you.

Yours Always,

Ginger Grant

Data aficionado et SQL Raconteur

 

 

Data Science with Python

For those of you who might have missed it, the website KDnuggets released their latest internet survey on data science tools, and Python came out ahead, again. Python has continued to gain as a tool that people are using for data science.  The article accompanying the graphic is very interesting, as it brings up two data-related points. The first is that the survey only had “over 2300 votes” and “…one vendor – RapidMiner – had a very active campaign to vote in KDnuggets poll”.  This points to the fallacy of completely relying on an insufficiently sized data set, as it is possible to skew the results, which is true both for surveys and for data science projects.  If you look at the remaining results, one thing also strikes me as interesting. Anaconda is a Python distribution and scikit-learn is a Python library, while TensorFlow can be used from either R or Python.  This does tend to strengthen the argument for more use of R or Python over RapidMiner.  The survey also made me want to check out RapidMiner.

Thoughts on RapidMiner for Machine Learning

While I have not had enough time to fully analyze RapidMiner, I thought I would give my initial analysis here and do a more detailed review in another post.  RapidMiner scored well in the KDnuggets poll, and it also ranked highly in the 2018 Gartner Magic Quadrant for Data Science Platforms.  RapidMiner is trying to be a tool not only for data scientists but for business analysts as well.  The UI is pretty intuitive, which is good because the help is not what it should be. I was also less than impressed with its data visualization capabilities, as R and Python both provide much better visuals. Of course, I used the free version of the software, which works but is limiting.  It looks like a lot of the new features are going to be available only in the paid version, which decreases my desire to really learn this tool.

Machine Learning Tools

Recently I have done a number of talks on Python in SQL Server, literally all around the world, including in Brisbane, Australia, tomorrow and Saturday, June 2, as well as in Christchurch, New Zealand. As R was written in New Zealand, I thought that it would be the last place where people would be looking to use Python for data science, but several of the attendees of my precon on Machine Learning for SQL Server told me that where they worked, Python was being used to solve data science problems. Now of course this is an anecdotal sample, as we are not talking about a statistically significant sample set, but that doesn’t keep it from being interesting.   The demand for Python training continues to increase as Microsoft incorporates Machine Learning Services into SQL Server, and I am writing a Machine Learning Services blog series with SQL Server Central.  The first two posts have been released. Let me know what you think of them.

Upcoming Events

I am looking forward to talking about Machine Learning with SQL Server in Brisbane, both at an intense day-long session and at a one-hour session on Implementing Python in SQL Server 2017 at SQL Saturday #713 – Brisbane, Australia. I look forward to seeing you there. For those who can’t make it, well, hopefully our paths will cross at a future event.

 

Yours Always,

Ginger Grant

Data aficionado et SQL Raconteur

Creating Power BI Date Tables by Marking as Date Table

In a previous post, I wrote about the fact that Power BI did not have the capability to mark a table as a date table, but that has changed! Now if you want a date table that uses numeric keys to join to other tables, you can have one and still get the time-intelligence functionality to work.  There is no need for some kind of workaround to get a date table; you can simply mark the table as one.

Marking a Date Table in Power BI

Naturally there is more than one way to mark a table as a date table in Power BI.  The first way is to click on the ellipsis (you know, the three dots) next to the date table name in the Power BI Fields list, which I highlighted in yellow. The menu contains an option to Mark as date table.  If you select, say, the locations table, Power BI is smart enough to know that it is not a date table and the option will not appear. The first time you select the option, it will show the window shown above and prompt you to select the column in the table which contains the actual date values.  Once you select the OK button, the menu item will have a check mark next to the Mark as date table option, and if you select it again you will see this window. Generally speaking there is no reason to unmark a date table, but if you want to, you can.

The other place in Power BI where you can mark a date table is the Modeling tab of the report view, which has a Mark as Date Table option. Note this option will be disabled unless you have clicked on a table first.  The icon is only available in the Modeling tab of the report view; if you are in the grid view or the relationship view, you won't see it.

Time Intelligence Functions: The reason for marking a Date Table

For those of you who may be wondering why you would want to mark a table as a date table at all, there is a very good reason.  Marking a table as a date table allows you to use all of the cool functions which Microsoft added to DAX to do date calculations for you.  While it is possible to avoid time-based DAX functions entirely, unless you are a rather expert DAX developer, like these guys, you will probably want to use the nice functions Microsoft created for handy things like TOTALYTD or SAMEPERIODLASTYEAR.  Now, just because you have added a date table does not mean that you can use any date column in those calculations; you need to reference the table that you have marked as a date table, not a date field in some other table.  That means that  CALCULATE(SUM(ResellerSales_USD[SalesAmount_USD]), SAMEPERIODLASTYEAR(DateTime[DateKey]))  will return the correct value, and CALCULATE(SUM(ResellerSales_USD[SalesAmount_USD]), SAMEPERIODLASTYEAR(ResellerSales_USD[DateKey]))  will not.  You need to reference the table marked as a date table every single time you use a time-intelligence function.   For an example of a Power BI Desktop DAX calculation not working when you are not using a marked date table, check out this post.
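As an aside for those coming at this from a general-programming background, what SAMEPERIODLASTYEAR is doing conceptually can be sketched in plain Python. This is only an analogy, not Power BI's implementation, and the table values and function names below are invented for illustration:

```python
from datetime import date

# Toy stand-in for a fact table: (order date, sales amount).
sales = [
    (date(2017, 1, 15), 100.0),
    (date(2017, 3, 10), 250.0),
    (date(2018, 1, 20), 120.0),
    (date(2018, 3, 5), 300.0),
]

def total_for_year(rows, year):
    """Sum the amounts whose date falls in the given calendar year."""
    return sum(amount for d, amount in rows if d.year == year)

def same_period_last_year(rows, year):
    """Shift the evaluation period back one year, as SAMEPERIODLASTYEAR does."""
    return total_for_year(rows, year - 1)

print(total_for_year(sales, 2018))         # 420.0
print(same_period_last_year(sales, 2018))  # 350.0
```

The DAX function performs this kind of shifting over the contiguous calendar supplied by the marked date table, which is part of why the calculation has to reference that table rather than a date column in a fact table, where the dates may have gaps.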

Power BI Update Requests

Power BI updates every month, and you may miss some of the things that the team released last month if you do not check out this site regularly.  I know that the Power BI team takes ideas for upcoming releases from https://ideas.powerbi.com, and there is one idea that I really wish would get some more votes so that it can be added soon.  Like marking date tables, it started out in SSAS Tabular: having recently worked on a project with a ton of measures, the next feature I hope Power BI inherits from SSAS Tabular is the ability to create folders for measures.   If you would like that too, vote for it here.

Yours Always,

Ginger Grant

Data aficionado et SQL Raconteur

Limitations in Time Series Data Analysis and the growth of Advanced Data Analytics

As someone who regularly analyzes data, I have done my share of time series analysis to determine trends over time, and I am struck by the fallibility of this sort of analysis.  For those who are unfamiliar with this type of analysis, time series analysis tries to identify patterns in the noise of data in order to predict future trends, through the use of algorithms like ARIMA. As a kid I remember hearing, in a fast announcer voice, the following text: “Past performance is no indication of future results.”  As a matter of fact, the SEC requires mutual funds to give all of their investors this statement.  Yet I get asked to do it anyway.  While I enjoy working with data and using advanced analysis techniques, including R and Python, I think it is important to realize the limitations of this sort of analysis.  In Machine Learning, an experiment is considered good if you are 85% right.  That is not acceptable if you are talking about a self-driving car, as running people over 15% of the time is generally not considered acceptable. There are times when the data is simply not going to provide an answer about the future.  When looking for answers in data, that is something people need to keep in mind.  While you can find some answers in data, other answers will require prognostication or plain old guessing.
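To make that caveat concrete, here is a toy sketch in pure Python, with invented numbers: an ordinary least-squares trend line fits the history perfectly, yet it can say nothing about a shift the data has never seen.

```python
# Monthly sales for half a year: a perfectly steady upward trend.
history = [10, 12, 14, 16, 18, 20]

def fit_line(ys):
    """Ordinary least-squares fit of y = a + b*x for x = 0, 1, ..., n-1."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope /= sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return intercept, slope

a, b = fit_line(history)
forecast = a + b * len(history)  # the model projects the trend onward: 22.0
actual = 11                      # but suppose the market shifted this month
print(forecast, actual)
```

The model is not wrong about the past; it is simply blind to a regime change, which is exactly the situation the SEC's disclaimer is warning investors about.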

Impact on Technology realized in 2026

Data analysis is all about pattern matching, and while I don't find it to be infallible, looking at a wide set of data has led me to plan accordingly.  While I am no Faith Popcorn, my analysis of what I see in the marketplace has led me to make some changes in my own life, as I believe change is coming to the industry.  Adam Machanic's T-SQL Tuesday prompt for 2016 has provided the impetus to publish these theories.  What I see in the marketplace is that the tools used to support databases are improving.  I see the ability of software to provide relevant hints and to automate the tuning of database queries and performance continuing to improve, meaning there will be less of a need to employ people to perform these tasks.  With databases being pushed more and more to the cloud as managed services, I see less and less need to employ many people in DBA roles. Where I see the industry moving is toward more people being employed to analyze the data and determine meaning from it.  I see very little data analysis being performed in R in 2026, and most analysis being performed in Python.  This means that if you are looking ahead, and are employed in areas where people are being supplemented by tools, the time is now to learn skills in areas where there is growth. If you have been thinking about learning data science, Python, and advanced analytics tools, now is the time to start, so that you will be prepared for the future.

 

Yours Always,

Ginger Grant

Data aficionado et SQL Raconteur