What Does Analytics Mean?

A lot of words get used in technology, and after a little while no one bothers to mention what the word means. That’s too bad when the definition of a word gets changed, but that’s not the case with analytics. It turns out analytics is not a new word; it was coined in the 16th century to describe trigonometry, which makes me even more surprised that WordPress’ spell checker always puts a red line under it as a misspelled or unknown word. I had someone tell me recently that they really weren’t sure what it was supposed to mean.

Wikipedia says “Analytics is the discovery and communication of meaningful patterns in data”. That’s what we as data professionals are doing when we provide data in a manner which answers questions, whether through KPIs, machine learning algorithms, or visualizations. It’s not enough to be the keepers of the data library; data should also be used to provide meaning. Here’s another reason to work on analytics: the dollars the trade press predicts will be spent on business analytics by 2018.

Steps to Providing Analytics

When describing the process for providing analytics, I am sure many people will recognize parts of the process, as they are engaged in them now. The first step is to understand the data. Understanding the data means not only knowing the structure of the data, which is obviously necessary to select it, but also knowing how the business uses the data. Which fields contain the data they actually use? The second step is preparing the data, including determining what data to include. Do you have all of the data you need to do the analysis? If the answer to that question is no, the analytic process will stop. You may also have to exclude some data if it is incomplete or of dubious quality.
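As a minimal sketch of what that kind of completeness check might look like, assuming a pandas DataFrame and hypothetical file and column names, you could flag fields that are too incomplete to trust:

```python
import pandas as pd

# Hypothetical sales extract; the file and column names are illustrative only.
df = pd.read_csv("sales_extract.csv")

# Share of missing values per column, worst offenders first.
missing = df.isna().mean().sort_values(ascending=False)
print(missing)

# Exclude columns of dubious quality; here, anything over 20% missing.
too_sparse = missing[missing > 0.20].index
prepared = df.drop(columns=too_sparse)

# Drop rows still missing fields the analysis requires.
prepared = prepared.dropna(subset=["customer_id", "order_date"])
```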

Once one has the needed data, it’s time to start the third step, data modeling. Modeling is where you categorize the data and make various decisions based on it. For example, if a person is wearing a blue shirt and tan pants, is looking at the laptops, and happens to be in Best Buy, you have found an employee. Evaluating your model is the fourth step. Generally speaking, the evaluation will include items where you already know the outcome. For example, if you are trying to predict when your website volume will increase, you want to look at the historical events that made that happen. Marketing people do this to determine whether their ad campaigns were successful.
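To make that example concrete, a model at its simplest is just a rule applied to the data. Here is a toy sketch of the Best Buy rule as code; every field name and value is made up for illustration:

```python
def is_employee(shirt: str, pants: str, store: str, near: str) -> bool:
    """Toy model: classify a person in a store as an employee."""
    return (shirt == "blue"
            and pants == "tan"
            and store == "Best Buy"
            and near == "laptops")

print(is_employee("blue", "tan", "Best Buy", "laptops"))  # True
```

A real model learns rules like this from historical data rather than having them hand-written, which is why evaluating against known outcomes matters.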

The Dynamic Analytical Process

After the model is created and successfully tested and evaluated, it’s time to deploy it and monitor the outcomes. One thing to remember about complex analytical models is that they will probably change. One example of this is an analytical model many people are familiar with: the FICO score. FICO scores were created to predict credit risk. They have been tweaked quite a lot since the latest real estate crash showed that a high FICO score, indicating someone paid their credit cards on time, was a lousy predictor of whether or not that same person would default on a mortgage. Netflix changes the movies it recommends when new movies come out. Things change all the time, so working on analytics means the work is never “done”. All the better for those of us who enjoy data analytics.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Azure ML, SSIS and the Modern Data Warehouse

Recently I was afforded the opportunity to speak at several different events, all of which I thoroughly enjoyed. I was able to speak on Azure Machine Learning first at the Arizona SQL Server Users Group meeting. I really appreciate all who attended, as we had quite a crowd. Since the meeting is held practically on Arizona State University’s Tempe Campus, it was great to see a number of students attending, most likely due to Ram’s continued marketing efforts on meetup.com. After talking to him about it, I was impressed at his success at improving attendance by promoting the event on Meetup, and I wonder if many SQL Server User Groups have experienced the same benefits. If you have, please let me know. Thanks, Joe, for taking a picture of the event too.

Modern Data Warehousing Precon

The second event where I had the opportunity to talk about technology was the precon at SQL Saturday in Huntington Beach, where I spoke about Modern Data Warehousing. It was a real honor to be selected for this event, and I really enjoyed interacting with all of the attendees. Special thanks to Alan Faulkner for his assistance. We discussed the changing data environment, including cloud-based storage, analytics, Hadoop, handling ever-increasing amounts of data from different sources, and the increasing demands of users, and we reviewed technology solutions that demonstrate ways to resolve these issues in attendees’ environments.

Talking and More Importantly Listening

The following day was SQL Saturday #389 in Huntington Beach. Thanks to Andrew, Laurie, Thomas and the rest of the volunteers for making this a great event, as I know a little bit about the work that goes into planning and pulling off an event like this. My sessions, Azure ML: Predicting the Future with Machine Learning and Top 10 SSIS Tuning Tricks, were both selected, and I had a great turnout at both. To follow up on a question I received during my SSIS session: Balanced Data Distributor was first released as a new SSIS transform for SQL Server 2008 and 2008 R2, so you can use it in versions prior to SQL Server 2012. I’ve posted more information about it here. I also got a chance to meet a real live data scientist, the first time that has happened.

Not only did I get a chance to speak but also a chance to listen. I really enjoyed the sessions from Steve Hughes on Building a Modern Data Warehouse and Analytics Solution in Azure, from Kevin Kline, and from Julie Koesmarno on Interactive & Actionable Data Visualisation With Power View. As always, it’s wonderful to get a chance to visit in person with the people whose technical expertise I read. In addition to listening to technical jokes which people outside of the SQL community would not find humorous, it’s great to discuss technology with other practitioners. Thanks to Mr. Smith for asking me a question I didn’t know the answer to, which I now feel compelled to go find. I’ll be investigating the scalability of Azure ML and R so that I will have an answer for him the next time I see him. I really enjoy the challenge of not only investigating and applying new technology but also figuring out how to explain what I’ve learned. I look forward to the opportunity to present again, and when I do I’ll be sure to update this site so that hopefully I get a chance to meet the people who read this.
Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Musing about Microsoft’s Acquisition of Datazen and Power BI

Microsoft just announced that they have bought Datazen, a mobile data visualization product. While I have no idea what Microsoft is actually going to do with the Datazen product, I couldn’t resist the chance to speculate about it. In earlier posts, I’ve talked about the transition from what Power BI was before Power BI Designer was released to what Power BI is now. Since then I have been working on creating new Power BI dashboards. The process left me, shall we say, underwhelmed. The tools in Excel allow for much greater flexibility and options than the new Power BI. Now, to be fair, the new Power BI was released December 18th, 2014, so it’s not possible for it to contain all of the rich features and functionality that the Excel tools do. That’s all well and good, but what it won’t do led to some frustration. If the new Power BI was the way that Microsoft was going to climb to the top of the Gartner BI visualization charts, I didn’t think it was going to do the trick.

Anyone Still Using Lotus 123?

The one thing that I kept thinking about when looking at the new Power BI is that there has to be a part of the plan I’m not getting. I didn’t see how this product would have the features and functionality needed by the time the reviews came around again next February. In looking back in time, I couldn’t help thinking of a time when Microsoft was battling it out in another space: spreadsheets. When Excel first came out, the big leader in the space was Lotus 123, which has since disappeared. (If you are running it where you work, please post a comment to let me know, because I think Lotus 123 is gone.) The reason for Microsoft’s dominance in spreadsheets was that Excel got a lot better at providing spreadsheets the way people wanted to use them.

Datazen, Hopefully Not the Next ProClarity

Microsoft’s purchase of Datazen looks to be a way to leverage a product with some really cool features to enhance the capabilities of Power BI. Datazen is a mobile application, but it has some good-looking visualizations which hopefully could be incorporated into Power BI. There’s only one thing that may be a reason for pause. In 2006, Microsoft made another acquisition: they bought a company called ProClarity. ProClarity had some really neat features, some of which were included in Performance Point, but for the most part the application was killed. I hope that history is not a guide in the purchase of Datazen, because Datazen has some great visualizations which could really help the new Power BI, and it would be good if Microsoft could figure out how to merge those features into the new Power BI to improve their position in the data visualization marketplace. I look forward to seeing how the Datazen features are merged into Microsoft’s data visualization components.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Motivation

Speaking for myself, sometimes I have a hard time getting motivated. I know that I need to get a bunch of work done, and I find myself mesmerized by the internet as pet pictures or the news or twitter momentarily provide really compelling reasons not to work on the list of things that I have to get done. Eventually, I pull my head out and start getting things accomplished. I seek out articles which have motivational tips too. One of the best tips I read went something like this: the inhabitants of Planet Kardashian will exist whether or not you are aware of their foibles. (They have planets now? Star Trek foretold a reality show?) What I took from the tip is that what’s going on other places will continue to go on whether you know about it or not, so you can find out about it after your work is done. Some days that works well; other days, it’s more of a goal. I write to-do lists, place sticky notes around where I can’t help but see them, and engage in most of the other tricks I’ve read to motivate myself. Sometimes it is not enough to push myself; I need an outside force.

External Incentive

People can provide a big external incentive. For an example of this, check out how hard people sometimes try to impress people they will never see again at stoplights. I know that I have been guilty of similar behavior, just not at stoplights. Being a part of an online community helps in finding motivation, as there are other people trying to do the same thing that you are. Motivation can come from anywhere, from a blog or even twitter. I found motivation in both places. After reading Ed Leighton-Dick’s post, I found an external motivator. His blog also showed me how powerful a post can be. Thanks to twitter, a lot of people saw his post, and a number of people in the SQL Server community have posted links and written their own blogs in support of his efforts. As I am sure Psy can attest, one can never know how much people are going to respond to what you put out on the internet, so kudos to Ed for being the Psy of the SQL Server community. A number of people are now finding themselves motivated to bring their thoughts out of their heads and onto the keyboard. Sharing knowledge will help us all get smarter and better at our jobs. If you happen to be on twitter and see an interesting blog post with the hashtag #SQLNewBlogger, thank Ed, as he helped make it happen.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

What is a Modern Data Warehouse?

As I was honored enough to be selected to give a PreCon on the Internals of the Modern Data Warehouse, I thought that I would take the time to explain why I felt drawn to the topic. There are a lot of places that haven’t given much thought to the changes in technology which have happened over the last few years. The major feature upgrades to SQL Server in 2012 and 2014 mean that they can use columnstore indexes, which make things faster, and perhaps get better High Availability. While those things are certainly valuable improvements, there is a lot more that you can do to derive value from your data, and companies want more than just a well-organized, running data warehouse.

Data is a Valuable Asset

In 2010, Borders Group Inc. was allowed by the Federal Trade Commission to sell its customer information to Barnes and Noble as part of the bankruptcy sale of its assets. In 2015, RadioShack is doing the same thing. Businesses understand that data is valuable, and they are interested in using it to drive decision making. Amazon, Netflix and Target are well known for their use of customer information to drive sales, but they are far from the only ones doing this. This is one of the bigger trends identified recently in the business press. The heads of companies are now looking for their data teams to do more with their data so that they too can have the dream information systems they are reading about.

Total Destruction of the Existing DW is Not Required

While a lot of the time it might be nice to level everything and start over, that is not always an option. The major reason for this is that the data warehouse environment already in place has a lot of value. You want to add to the value already there, not destroy what you have. Also, it would take a long time to recreate the environment, and no one is patient enough to wait for that. Alternatively, you could expand into areas of new technology as your data grows. Perhaps this means you archive some of your data from your database to a Hadoop cluster instead of backing up the data in some far-off location. This would allow you to use Sqoop to bring the data back when you need it, providing ready access to the data. Perhaps you want to provide the users more self-service BI capabilities, moving the data analysis into the hands of the people who are most familiar with the data? You could add the capabilities of Power View in Excel, Power BI Designer, or Tableau to your environment.

Incorporating Social Media Information

The business world no longer operates only on a batch cycle. More and more companies want to know what is being said about them so they can respond appropriately. With tools like Azure Event Hubs, Data Factory, Stream Analytics, and Machine Learning, this isn’t as hard to do as it might sound. We’ll review these products so that attendees will understand how these tools can provide greater insight not only into their own data, but also into the data building up about them outside of the company firewall.

For More Information

I really hope you can join me in Huntington Beach on April 10 for a full day of exploring these concepts. I always look forward to events like the precon and, of course, SQL Saturday #389 – Huntington Beach, which is the following day.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Complex Data Analysis and Azure Machine Learning Presentation Wrap Up

Thank you to all of the people who signed up for my webinar on Data Analysis with Azure Machine Learning [ML]. I hope after watching it that you find reasons to agree that the most important thing you need to know to get started in Machine Learning is not math, but good knowledge of the data you want to analyze. There’s no reason not to investigate, as Azure Machine Learning is free. In order to take more time with the questions after the presentation than the webinar format allowed, I am posting my answers here, where I am able to answer them in greater detail.

How would one choose a subset of data to “train” the model? For example, would I choose a random 1000 rows from my data set?

It is important to select a subset of data which is representative of the data you wish to evaluate. Sometimes a random 1000 rows will do that, and other times you will need to use other criteria, like selecting transactions throughout a given date range, to get a more representative sample. It all comes down to knowing your data well enough to know that the data used for training and testing is similar to what you will ultimately be analyzing.
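As a quick sketch of the two approaches, assuming a pandas DataFrame with a hypothetical order_date column, here is how a random sample and a date-range sample might each be drawn:

```python
import pandas as pd

# Hypothetical transaction data; file and column names are illustrative.
df = pd.read_csv("transactions.csv", parse_dates=["order_date"])

# Option 1: a simple random sample of 1000 rows.
random_sample = df.sample(n=1000, random_state=42)

# Option 2: every transaction in a date range, which may capture
# seasonality that a purely random draw would dilute.
in_range = (df["order_date"] >= "2014-01-01") & (df["order_date"] <= "2014-12-31")
date_sample = df.loc[in_range]
```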

Do you have to rerun or does it save results?

While you are creating an experiment, you have to re-run the data for each run, as the experiment does not save results between runs.

Does Azure ML use the same logic as data mining?

In a word, no. If you look at the algorithms used for data mining you will see they overlap with some of the models available in Azure ML. Azure ML provides a richer set of models, plus a greater ability to either call models created by others or write custom models.

How much does Azure ML cost?

There is no cost for Azure ML. You can sign up and use it for free.  Click here for more information on Azure ML.

If I am using Data Factory, can I use Azure ML ?

Data Factory added the ability to call Azure ML in December, providing another place to incorporate Azure ML analytics. When an Azure ML experiment is complete, it is published as a web service so that it can be called by any program which chooses to call it. Using Azure ML experiments directly within Data Factory decreases the need to write custom code, while allowing the logic to be incorporated into routine data collection processes.

http://azure.microsoft.com/blog/2014/12/16/azure-data-factory-updates-integration-with-azure-machine-learning-2/
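To illustrate what “called by any program” means, here is a minimal sketch of calling a published Azure ML request/response web service from Python. The endpoint URL, API key, and column names are all hypothetical placeholders; the real values come from the web service’s dashboard:

```python
import json
import urllib.request

# Hypothetical values; copy the real ones from your web service dashboard.
URL = ("https://ussouthcentral.services.azureml.net/workspaces/<workspace-id>"
       "/services/<service-id>/execute?api-version=2.0&details=true")
API_KEY = "<your-api-key>"

# The input schema must match the experiment's web service input.
payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["age", "income"],
            "Values": [["34", "52000"]],
        }
    },
    "GlobalParameters": {},
}

request = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer " + API_KEY,
    },
)

with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))
```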

If you have more questions about Azure ML, or would like to see me present on the topic live and you live in Southern California, I hope you can attend SQL Saturday #389 – Huntington Beach, where I will be presenting on Azure ML and my top 10 SSIS tuning tricks. I hope to see you there.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Math and Machine Learning

I had an interesting conversation at SQL Saturday Phoenix, an event that I am happy I was able to attend, regarding knowing math and getting started in Machine Learning. The person I spoke with, who had majored in math in college, was sure that you had to know a lot of math to do Machine Learning. While I know that having really good math skills is always helpful when creating statistical models based on probability, which are a big part of Machine Learning, I do not believe that you need to know a lot of math to use Azure Machine Learning [ML].

Azure Machine Learning and Throwing Spaghetti Against the Wall

For those of you who cook, you may have heard of an old-school way of testing whether spaghetti is done: you throw the spaghetti against the wall, and if it sticks, the pasta is done. If it falls right off, keep the spaghetti in the pot for a while longer. Testing machine learning models is similar, but instead of throwing the computer against the wall, you keep testing using the large number of models available in Azure ML. Once you have determined the classification of your data, there are a number of different models for that classification which you can try without knowing the statistical formulas behind each one. I have listed all of the models from Azure ML here so that you can take a look at the large number of models available. By taking a representative sample of your data and testing all of the related models, determining which one will provide the best result is not terribly difficult. The reason it is not very hard is that you do not have to understand the underlying math to run a model. Instead, you need to learn how to read a ROC curve, which I covered in my last blog post. While you can pick the appropriate model by having a deep understanding of the formula behind each one, you can achieve similar results by running all of the models and selecting the best one based on the data.
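Azure ML lets you wire up several models side by side in its visual designer; the same spaghetti-test idea can be sketched in Python with scikit-learn, fitting several candidate classifiers on the same training data and keeping whichever scores best. The data and the model list here are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for your representative sample.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
}

# Throw each model at the wall and see which one sticks.
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    print(f"{name}: AUC = {roc_auc_score(y_test, scores):.3f}")
```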

Advanced Statistical Analysis and Azure ML

While Azure ML contains a lot of good tools to get started if you do not have a data scientist background, which recruiters lament not enough people do, why would you use Azure ML if you have already coded a bunch of R modules to analyze your data? Because Azure ML can call those modules as well, and it provides a framework to raise their visibility and share them with people within your organization, or with the world if you prefer.

How to Pick the Right Model

I am going to demonstrate how to pick the right model in an upcoming webinar, as this is probably easier to explain in that fashion than in a blog post. If you want to see how to determine which model to use without knowing a lot of math, I hope you take the time to attend. Azure ML offers the ability to integrate analysis into your data environment without having to be a data scientist, while providing advanced features to accommodate those really good at math, which I will be talking about in an upcoming precon for SQL Saturday in Huntington Beach. If you happen to be in Southern California on April 10th, I hope you will be able to attend that event.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Getting Started with Machine Learning – Result Analysis

Recently I’ve started working with Azure Machine Learning, looking at what I consider the most challenging part: picking the right analysis. For those people who haven’t ventured into Azure Machine Learning, it looks a lot like a data flow in SSIS. After that, you need to train, or more to the point evaluate, which model works best. Answering that question takes a while. What kind of data do you have? Are you looking to find errors? Determine whether data classified in a certain way can predict a result? Perform a regression analysis of data over time? Group data together to identify trends?

Is your Model better than a Monkey throwing Darts?

While you can analyze your variables and rank them to determine the chance that the variables indicate a result, there is another method that is also used to determine an outcome: the coin toss. This lowly method of analysis is right half the time. If you have more than two outcomes, or to speak the language of Machine Learning, if the outcome is not binary, there is another benchmark used to judge the accuracy of predictions: monkeys. I have read about the various skills of monkeys in both literature and financial analysis. Think about it for a minute and you may remember reading or hearing about monkeys typing on keyboards who, given enough time, would eventually write Shakespeare, or a blog post. This is known as the Infinite Monkey Theorem. Another thing monkeys have been known to do is throw darts. Various financial publications have been measuring the success of mutual funds against monkeys throwing darts at stocks since the last century. The goal, of course, is to create a model that has better success than a monkey or a quarter. The question is how?

Probability of Picking the Right Model

ROC [Receiver Operating Characteristic] curves are used to ensure the machine learning model generated is better than a monkey throwing darts. Your goal is a perfect game of golf; chances are your ROC curve will land somewhere between the coin toss and perfection. In looking at the ROC curve generated here, you can see three lines: a light grey one, a red one, and a blue one.

[Figure: ROC curve showing a light grey diagonal reference line and red and blue model curves]

The diagonal line represents a coin toss. If one of your scored datasets achieved a true positive rate of 1, meaning it got a true positive every time, you would have played a perfect game of golf. Chances are you will have two lines like I do here, and one, in this example the blue line, has a higher true positive rate than the red line, so the results generated by that model are more accurate.
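For a concrete sketch of how this comparison works outside of Azure ML, scikit-learn can compute the same kind of curves; the model whose curve pulls furthest above the grey diagonal (an AUC closer to 1, versus 0.5 for a coin toss) is the more accurate one. The two models below are stand-ins for the red and blue lines:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for name, model in [("red model", LogisticRegression(max_iter=1000)),
                    ("blue model", RandomForestClassifier(random_state=1))]:
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, scores)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {roc_auc_score(y_test, scores):.2f})")

# The grey diagonal: a coin toss, AUC = 0.5.
plt.plot([0, 1], [0, 1], color="grey", linestyle="--", label="coin toss")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```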

More ML More of the Time

I find myself spending more time with Azure ML, which means I will be devoting a lot more future blog posts to this topic. I am also speaking on Azure ML, both as part of a pre-convention event on the Modern Data Warehouse and at SQL Saturday in Phoenix. If you happen to be in Phoenix, I would love to meet you. SQL Saturdays are great learning events, and I am happy that I was selected to participate in this one.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Licensing of Power BI moves away from Office 365

The gap between Power BI and Excel keeps getting wider. As there is conflicting information about the Excel/Power BI break-up in various places on the internet, I wanted to clarify some of the common discussion questions. One place where you can get definitive answers is the new Licensing Information for Power BI, which I linked in case you missed it. Unfortunately for those of us who have been paying the higher fees for Power BI, the price reduction to ten dollars isn’t immediate. Although I am disappointed, the non-immediate fee reduction makes sense, since the new Excel-free version of Power BI is still a preview edition. No one outside the US is able to even try it yet. The new pricing will be available when the new product is available. This also gives people a chance to migrate their existing reports to the new version of Power BI. Another way of saying this is: going forward, you don’t need Excel or the Four Powers – Power Query, Power Map, Power Pivot and Power View – anymore. The only thing you’ll need is the new Power BI.

No SharePoint needed for Power BI

Another thing that the licensing document makes clear is that when the new Power BI is released, the Office 365 version of SharePoint will not be required. To be even clearer: SharePoint will not be needed to use Power BI. There are several places online where I have read conflicting information regarding the need to have SharePoint, so let me clarify by quoting from Microsoft’s Licensing Information for Power BI page: “Power BI service will become a standalone service and will no longer require SharePoint Online”. Since the current version of Power BI uses SharePoint, if this is the only reason you have Office 365 SharePoint, you can get rid of SharePoint, which will be an additional cost savings. How much will that save? Talk to Microsoft Support, as the details must be worked out with them.

Why Did Microsoft Change Power BI to Not Use Excel?

While at SQL Saturday in Albuquerque, which was even better than my high expectations, I had a chance to talk to someone from Microsoft, and of course Power BI came up. I asked why he thought Microsoft moved Power BI away from Excel. While not divulging anything covered under an NDA, he mentioned that there were a lot of people who would like to use the features of Power BI but didn’t have the right version of Excel within their organizations. Microsoft removed this barrier to adoption by moving to a non-Excel version. Excel also had a lot of features that weren’t needed for data visualization, and support for some of the current features was sort of confusing. I agreed with him. For example, there are three different ways of creating a data connection, which is definitely confusing.

Scheduled Updates in Excel

For those people who like the Power BI add-ins for Excel and want to stay with them, there is one big issue: scheduled updates. Only with Power BI deployed to the Office 365 cloud SharePoint can you get scheduled updates from all the places from which you might be retrieving data. For all those people who, for a variety of reasons, didn’t want to deploy Power BI that way, there is now a solution. If you are looking to update Excel, Power Update is what you need. Kudos to Rob Collie for providing this solution, as I have heard from a number of people that they had on-premises SharePoint and didn’t want to go to the cloud for Power Query updates, or didn’t want to have to deploy SharePoint at all. Keeping up with Excel and Power BI is now a wider world than just Microsoft. The one thing you can count on is that things will always be changing.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Tips on SSIS at SQL Saturday Albuquerque

On February 7, I was fortunate enough to be selected to speak at SQL Saturday in Albuquerque, New Mexico on my Top 10 SSIS Tuning Tricks. Having worked with SSIS for a number of years, I’ve needed to research the best methods to employ to ensure my SSIS ETL was running optimally. I’ve compiled the most valuable items, with examples of course, into this presentation. I’m assuming that everyone attending has already been using SSIS for a while, so I will skip straight into more in-depth ways of tuning SSIS. One of the questions that I know I have heard most often is “When should I do X in SQL or SSIS?” If you are able to attend this session, you will have the answer to that question.

I really enjoy the opportunity to speak on data-related topics and to meet people who may have come upon my blog in the past. Having spoken at this event last year, I know what a good job Keith, Chris, Meredith and friends do organizing this event, and I want to take the time to say thank you for all of your hard work, as I really appreciate it. These events are a great place to learn and keep up with a lot of the changes going on in the industry. I anticipate there will be many lively discussions both before and after the event. That reminds me: if you get a chance, there are two great precons scheduled on Friday, February 6th: Powershell Basics with Mike Fal and Query Tuning, Troubleshooting and Execution Plans with Jason Kassay. Having been fortunate enough to meet both of them, I know that they are both extremely knowledgeable in their respective topics, and if you are in Albuquerque I encourage you to sign up for either of them, as I am sure both will be excellent.

I hope that you will be able to attend as I know I will enjoy seeing you there.

Yours Always
Ginger Grant
Data aficionado et SQL Raconteur