Recently I’ve started working with Azure Machine Learning and looking at what I consider the most challenging part: picking the right analysis. For those people who haven’t ventured into Azure Machine Learning, it looks a lot like a data flow in SSIS. Once the flow is laid out, you need to train or, more to the point, evaluate which model works best. The answer to that question takes a while. What kind of data do you have? Are you looking to find errors? Determine whether data classified in a certain way can predict a result? Perform a regression analysis of data over time? Group data together to identify trends?
Is your Model better than a Monkey throwing Darts?
While you can analyze your variables and rank them to determine the chance that the variables indicate a result, there is another method that is also used to determine an outcome: the coin toss. This lowly method of analysis is right half the time. If you have more than two outcomes, or to speak the language of Machine Learning, the outcome is not binary, there is another method used to determine the accuracy of predictions: monkeys. I have read about the various skills of monkeys in both literature and financial analysis. Think about it for a minute and you may remember reading or hearing about monkeys typing on a keyboard who have been able to write Shakespeare, or a blog post. This is known as the Infinite monkey theorem. Another thing monkeys have been known to do is throw darts. Various financial publications have been comparing the success of mutual funds to monkeys throwing darts at stocks since the last century. The goal is of course to create a model that has better success than a monkey or a quarter. The question is how?
Probability of Picking the Right Model
ROC [Receiver Operating Characteristic] Curves are used to ensure the machine learning model generated is better than a monkey throwing darts. The coin toss is the floor, and your goal is a perfect game of golf. Chances are your ROC curve will be somewhere between the two. In looking at the ROC curve generated here, you can see three lines: a light grey, a red and a blue one.
The diagonal line represents a coin toss. If one of your scored datasets reached a true positive rate of 1 with no false positives, meaning that the model was right every time, you would have played a perfect game of golf. Chances are you will have two lines like I do here, and one, in this example the blue line, has a higher true positive rate than the red line at the same false positive rate, so the results generated by that model are more accurate.
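To make the comparison with the coin toss concrete, here is a minimal Python sketch of how an ROC curve and the area under it (AUC) can be computed by hand. Everything in it is an assumption for illustration: the labels and scores are made up, the names `roc_points`, `auc`, `blue_scores` and `red_scores` are hypothetical, and this is not how Azure ML produces its chart. The diagonal coin-toss line has an area of 0.5 and a perfect model has an area of 1.0, so a model worth keeping should land between the two.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    labels: list of 0/1 actual outcomes (must contain both classes)
    scores: list of predicted scores, higher means "more likely positive"
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Rank rows from the highest score to the lowest, then lower the
    # decision threshold one distinct score at a time.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Rows that share a score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up data: eight rows, four actual positives and four negatives.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
blue_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.55, 0.2]  # hypothetical model A
red_scores = [0.6, 0.3, 0.7, 0.8, 0.4, 0.5, 0.45, 0.2]   # hypothetical model B

COIN_TOSS_AUC = 0.5  # area under the diagonal line

for name, scores in [("blue", blue_scores), ("red", red_scores)]:
    area = auc(roc_points(labels, scores))
    verdict = "beats" if area > COIN_TOSS_AUC else "does not beat"
    print(f"{name}: AUC = {area:.3f}, {verdict} the coin toss")
```

With this toy data both models beat the diagonal, and the one with the larger area plays the role of the blue line in the chart. The area is only a summary, so it is still worth looking at the curves themselves.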
More ML More of the Time
I find myself spending more time with Azure ML, meaning that I will be devoting a lot more future blog posts to this topic. I am also speaking on Azure ML both as part of a Pre-Convention Event on the Modern Data Warehouse and at SQL Saturday in Phoenix. If you happen to be in Phoenix, I would love to meet you. SQL Saturdays are great learning events and I am happy that I was selected to participate in this one.
Data aficionado et SQL Raconteur
The gap between Power BI and Excel keeps on getting wider. As there is conflicting information about the Excel/Power BI break-up in various places on the internet, I wanted to clarify some of the common discussion questions. One place where you can get definitive answers is in the new Licensing Information for Power BI, which I linked in case you missed it. Unfortunately for those of us who have been paying the higher fees for Power BI, the price reduction to ten dollars isn’t immediate. Although I am disappointed, the non-immediate fee reduction makes sense, since the new Excel-free version of Power BI is still in preview. No one outside the US is able to even try it yet. The new pricing will be available when the new product is available. This also gives people a chance to migrate their existing reports to the new version of Power BI. Another way of saying this is, going forward you don’t need Excel or the Four Powers (Power Query, Power Map, Power Pivot and Power View) anymore. The only thing you’ll need is the New Power BI.
No SharePoint needed for Power BI
Another thing that the licensing document makes clear is that when the new Power BI is released, the Office 365 version of SharePoint will not be required. To be even clearer, SharePoint will not be needed to use Power BI. There are several places online where I have read conflicting information regarding the need to have SharePoint. Let me clarify by quoting from Microsoft’s Licensing Information for Power BI page: “Power BI service will become a standalone service and will no longer require SharePoint Online”. Since the current version of Power BI is using SharePoint, if this is the only reason you have Office 365 SharePoint, you can get rid of SharePoint, which will be an additional cost saving. How much will it cost? Talk to Microsoft Support as the details must be worked out with them.
Why Did Microsoft Change Power BI to not use Excel?
While at SQL Saturday in Albuquerque, which was even better than my high expectations, I had a chance to talk to someone from Microsoft, and of course Power BI came up. I asked why he thought Microsoft moved Power BI away from Excel. While not divulging anything that is covered under an NDA, he mentioned that there were a lot of people who would like to use the features of Power BI, but they didn’t have the right version of Excel within their organizations. Microsoft removed this barrier to adoption by moving to a non-Excel version. Excel also had a lot of features that weren’t needed for data visualization, and support for some of the current features was sort of confusing. I agreed with him. For example, there are three different ways of creating a data connection, which is definitely confusing.
Scheduled Updates in Excel
For those people who like the Power BI Add-ins to Excel and want to stay with them, there is one big issue: scheduled updates. Only with Power BI deployed to the Office 365 Cloud SharePoint can you get scheduled updates from all the places from which you might be retrieving data. For all those people who, for a variety of reasons, didn’t want to go with Power BI deployed that way, there is now a solution. If you are looking to update Excel, Power Update is what you need. Kudos to Rob Collie for providing this solution, as I have heard from a number of people who either had SharePoint and didn’t want to go to the cloud for Power Query updates, or didn’t want to have to deploy SharePoint at all. Keeping up with Excel and Power BI is now a wider world than just Microsoft. The one thing you can count on is things will always be changing.
On February 7, I was fortunate enough to be selected to speak at SQL Saturday in Albuquerque, New Mexico on Top 10 SSIS Tuning Tricks. Having worked with SSIS for a number of years, I’ve needed to research what the best methods were to ensure my SSIS ETL was running optimally. I’ve compiled the most valuable items, with examples of course, into this presentation. I’m assuming that everyone attending already has been using SSIS for a while, so I will skip straight into more in-depth ways of tuning SSIS. One of the questions that I know I have heard most often is “When should I do X in SQL or SSIS?” If you are able to attend this session, you will have the answer to that question.
I really enjoy the opportunity to speak on data related topics and to meet people who may have come upon my blog in the past. Having spoken at this event last year, I know what a good job Keith, Chris and Meredith and friends do organizing this event. I want to take the time to say thank you for all of your hard work as I really appreciate it. These events are a great place to learn and keep up with a lot of the changes going on in the industry. I anticipate there will be many lively discussions both before and after the event. That reminds me. If you get a chance, there are two great precons scheduled on Friday, February 6th: PowerShell Basics with Mike Fal and Query Tuning, Troubleshooting and Execution Plans with Jason Kassay. Having been fortunate enough to meet both of them, I know that they are both extremely knowledgeable in their respective topics, and if you are in Albuquerque I encourage you to sign up for either of them as I am sure both will be excellent.
I hope that you will be able to attend as I know I will enjoy seeing you there.