Articles for the Month of November 2015

Using Tumbling Windows to Select Data from Azure Stream Analytics – Part 2

Tumbling Windows are another way of selecting data from an Azure Stream to drive an Azure ML Experiment. Once again, my examples here are based on the concrete company Eohs, which I referenced in a previous post when talking about Sliding Windows. Eohs is streaming data via Azure Stream Analytics [ASA], and we need to evaluate a portion of that data for an Azure Machine Learning experiment. The experiments don't need all of the data; only a portion is required. Some of the data will be reported on in real time, and other portions will be used for analysis over a longer window. The necessary data will be extracted via an Azure Stream Analytics query using windowing. In this post, we will be talking about Tumbling Windows.

Eohs: Streaming Sensor Data

Eohs has installed a tracking system which sends GPS positioning and sensor data back in near real time to the dispatching company. The dispatchers are able to monitor on their screens the location of each truck, its speed and heading, and sensor information delivered every 20 seconds, which allows them to know if the truck is loading concrete, pouring concrete, or adding water, along with seatbelt information and whether the passenger door is open. Eohs is interested in using this sensor data to figure out when they will need to perform maintenance on their concrete mixing drums. The drums need maintenance based on the drum speed, the concrete pouring sensor, and the amount of water added when in use.

Using Azure ML to Determine when to Perform Maintenance

Since Eohs is streaming their data with ASA, we monitor the sensor information for the water and the drum speed over time to see if maintenance is required on the concrete drum. The Azure ML experiment will look at the combination of water, drum speed, and time of day to determine if maintenance is required. We will need to evaluate the sensors every 15 minutes.

Tumbling Windows in Azure Stream Analytics

We want to look at the performance of the sensors in 15 minute increments, so to do this we are going to use a tumbling window. Tumbling windows are designed to read data in fixed, non-overlapping increments, so our query is going to read them every 15 minutes. Using the Stream Analytics Query Language, this query will provide the data.

SELECT VehicleID, AVG(DrumSpeedSensor), AVG(PouringSensor), AVG(WaterSensor),
       System.Timestamp AS EvalTime
FROM VehicleTrackingSystem TIMESTAMP BY EntryTime
GROUP BY VehicleID, TumblingWindow(minute, 15)

This query will return the data every 15 minutes. The EvalTime will be a single time value marking the end of each window. TIMESTAMP BY EntryTime will ensure that the data is evaluated based upon when the data was created instead of the time the data reached the Azure server, as sometimes data packets are received out of order. Having our data split into multiple streams like this will allow multiple experiments to be performed on our Azure Data Stream.
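
To make that routing concrete, here is a minimal sketch of how the windowed results could be sent to a dedicated Stream Analytics output that feeds the experiment. The output alias [maintenance-eval] and the column aliases are my assumptions; the output itself would need to be defined on the ASA job.

SELECT VehicleID, AVG(DrumSpeedSensor) AS AvgDrumSpeed, AVG(PouringSensor) AS AvgPouring,
       AVG(WaterSensor) AS AvgWater, System.Timestamp AS EvalTime
INTO [maintenance-eval]
FROM VehicleTrackingSystem TIMESTAMP BY EntryTime
GROUP BY VehicleID, TumblingWindow(minute, 15)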

For Part 3 of this series we will talk about Hopping Windows and how and when to use that technique on our data. If you are interested in knowing when my next post is available, please subscribe and you'll receive an email when it is published.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Incorporating Azure Stream Analytics with Azure ML – Part 1

Using the Azure Stream Analytics Query Language to Drive an ML Experiment

In the past I have talked about some of the components of Azure Machine Learning, but I thought it might make more sense to talk about creating a solution rather than the individual components. As that will take a while, this post begins a multi-part series that brings in some real world examples to make the concepts around streaming data and Azure Machine Learning [ML] less abstract, starting with the data, adding several ML experiments, then talking about ways to implement the solution. The series is focused on the streaming data of a sample company, the concrete company Eohs.

Streaming Data in Azure

Eohs has installed a vehicle tracking system which sends GPS positioning and sensor data back in near real time to the dispatching company. The dispatchers are able to monitor on their screens the location of each truck, its speed and heading, and sensor information delivered every 20 seconds, which allows them to know if the truck is loading concrete, pouring concrete, or adding water, along with seatbelt information and whether the passenger door is open. Eohs has some policies for their drivers, the violation of which can involve termination. Drivers are not permitted to stop the truck anywhere other than the assigned delivery location, which cuts down on fraud and helps reduce insurance costs. This data is streamed via Azure Stream Analytics [ASA].

Cortana Analytics Implementation of Azure ML

Since Eohs is streaming their data with ASA, we want to implement an Azure ML experiment to notify dispatch in real time of any violation of their policies. As I discussed in a previous blog, since Cortana Analytics includes Azure ML and Stream Analytics, using these components together is considered a Cortana Analytics implementation. We have created a Machine Learning experiment which will look at the GPS position of the delivery location and determine if a driver is stopped for an extraordinary length of time at a delivery location, or is stopped at a non-delivery location. The dispatchers are immediately notified, so they can call the driver to figure out what is happening with the truck. So what data needs to be sent to the Azure ML experiment for analysis?

Sliding Windows in Azure Stream Analytics

The Azure ML experiment needs to evaluate all of the vehicle data which shows that the truck is stopped for a while, generally speaking greater than 90 seconds. After all, some traffic lights take 90 seconds to get through, so eliminating the short stops helps decrease the amount of data to be evaluated. ASA uses a SQL-like query language which makes it easy to split the data so only the data the experiment needs will be sent. We want to evaluate a window of time where the data returned is only the data showing the vehicle stopped for 91 seconds. Finding the 91-second stops is a job for a sliding window. Here's the query you would need to do this.

SELECT VehicleID, AVG(GPSLat), AVG(GPSLong), MIN(Speed), MAX(PourSensor), MAX(WaterSensor),
       DATEADD(second, -91, System.Timestamp) AS StartEvalTime,
       System.Timestamp AS EndEvalTime
FROM VehicleTrackingSystem TIMESTAMP BY SensorTime
GROUP BY VehicleID, SlidingWindow(second, 91)
HAVING MIN(Speed) < 1

EndEvalTime is the time the event was calculated by the system. Since I wanted both the start and end evaluation times, the start time was calculated using the DATEADD function. If some of the data elements arrive out of order, TIMESTAMP BY will ensure that the events are evaluated in the order they happened instead of the order in which the data was received.
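
Since the policy also covers stops away from assigned delivery locations, a natural extension is to join the stream against reference data holding those locations. The sketch below assumes a reference input named DeliveryLocations with LatCell and LongCell columns, and buckets the GPS coordinates with FLOOR so a position near a site falls into the same cell; the names and the bucketing scheme are my assumptions, not part of Eohs' actual system.

SELECT s.VehicleID, MIN(s.Speed) AS MinSpeed, System.Timestamp AS EndEvalTime
FROM VehicleTrackingSystem s TIMESTAMP BY SensorTime
LEFT JOIN DeliveryLocations d
    ON FLOOR(s.GPSLat * 1000) = d.LatCell
   AND FLOOR(s.GPSLong * 1000) = d.LongCell
WHERE d.LatCell IS NULL
GROUP BY s.VehicleID, SlidingWindow(second, 91)
HAVING MIN(s.Speed) < 1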

Other Windowing in Azure Stream Analytics

ASA also supports two other windowing functions, Tumbling and Hopping. In my next post I will be discussing how and when to use a Tumbling Window. If you are interested in reading the posts as they occur, please subscribe to desertislesql.com to be notified when the next post is available.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur


Update on SharePoint 2016 and Excel Services

As I wrote in September of this year, SharePoint 2016 will no longer contain the Power Pivot Gallery. That functionality would be included in Office Online Server, which wasn't out yet. Later that month Office Online Server was released, and a lot of the questions regarding how Excel reporting was going to be made available were answered.

Office Online Server: The Location for Excel

For those of you unfamiliar with Office Online Server, it isn't a new product. It was released as Office Web Apps Server in 2012 and was designed to support creating and editing Office documents without having to load the applications onto desktops. The server provides the capability for Office documents to be run via a browser, decreasing the need to update desktops. Most people continued to load Office onto workstations anyway. Microsoft continued to pound away at the application, which is meant to be the place to go to view Office documents. Fast forward to 2016, and Microsoft decided that Office Online Server would be the place to view all Office documents, meaning they were removing Excel Services from SharePoint, as they already had a product that would do that.

Viewing Excel Reports via SharePoint 2016

Another preview of Office Online Server was released last week, meaning a lot of the details of how Excel Services will work are now known, and you can download Office Online Server yourself to try it out. Office Online Server is a completely separate application from SharePoint, and as a matter of fact it should not share a server with a SharePoint installation. SharePoint 2016 is designed to work with Office Online Server, so it will support viewing Excel documents. You will need to establish a trust relationship between the two servers to authenticate them via certificates, which is a lot easier than setting up Kerberos. Of course you can also set up SharePoint to view SQL Server Reporting Services reports, so you can store all of your reports in one place.

What to do to Prepare for an Upgrade to SharePoint 2016

If your organization is looking to upgrade to SharePoint 2016 and you want to view Excel reports, go requisition another server for installing Office Online Server, as you will need it to support viewing Excel or any other Office document. SharePoint 2016 has been streamlined to manage files and their content, and is pushing other features to other products. One of the reasons for this is that SharePoint has been modified to support content whether it is on site or in the cloud. As part of Microsoft's new realization that some people aren't going to the cloud, some of the SharePoint Online functionality has been merged into the new release, and better support for on-premises SharePoint is planned for the future.

So go ahead and upgrade to the latest SharePoint. You will still be able to access everything in Excel, just through Office Online Server rather than SharePoint.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur


Data Access and Self-Service Business Intelligence

There is more to implementing Self-Service Business Intelligence than getting new software like Power BI; mindsets and practices also need to change. The data teams in many companies formed their policies based on history with previous technologies. One of those policies that is fraught with contention is letting users have access to the data in order to do their own analysis. The reasons for this are based on a story like this one. Like many a data professional, I worked at a company where we gave a team of users access to the database in order for them to do analysis. It was a replicated database, as we didn't want to impact production. As these analysts' primary skill was marketing, not SQL, they wrote a query that took all the resources, so no one had the ability to do anything else with the database, and we were required to intervene and kill the query to make the database useful again. After that, we changed their access to only views created for them, to prevent this from happening again. Variations of this story exist all over.

Data Access has changed and so has the need for a 64-bit OS

Self-Service BI is supposed to be a way for analysts to answer ad-hoc questions from management about the business. While data professionals certainly could and do answer these questions, at some point a focus line is drawn. If the primary focus is determining the best way to write a query or implement an appropriate indexing scheme, this person has a technical focus and not a business focus. People with a business focus should probably be the ones who use data to drive decision making. While technical people can write reports very efficiently, keeping up with what the business people want can be extremely difficult given the continual requests for answers from the data, as the number of reports required in various formats can be overwhelming. Just as the old argument that "You don't need a 64-bit OS" has become obsolete, so have the reasons for not giving business users access to the data. Now is the time to give them access. If you only have a 32-bit operating system, you don't have the memory needed to do much data analysis. Data analysts need a 64-bit OS and access to the data.

What kind of Access should Analyst Have?

Most analysts use Excel, which has become the de facto tool of choice for data analysis. One doesn't need a working knowledge of the SQL language to analyze data, and the scenario referenced above still happens. Instead, data should be provided in a manner which is easy to consume in a Pivot Table, allowing users to select, sort and filter the data at will. Analysis Services cubes, whether they be tabular or multidimensional, provide this capability. Using a cube in an Excel spreadsheet has very little chance of ever crashing a server, so go ahead and grant access. Give analysts the tools and the access they need in a collaborative environment; in that kind of environment, true data-based decision making can really happen.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur


What did I just Evaluate in Azure Machine Learning?

Recently I read a Twitter message where the person said they had just run an Azure Machine Learning [ML] experiment and it worked successfully. He wondered what the results meant. I thought it might be helpful to explain how to interpret some of them. There are a number of different types of algorithms you can use in Azure ML, and each has a different way of evaluating the result of the experiment. Azure ML has three major algorithm categories: classification, clustering and regression. Since I believe classification and clustering are more often discussed when reading about ML, I thought I'd talk about regression first.

When should one use Regression Models?

While it is possible to make a reasonable guess at what the other algorithm categories mean, regression algorithms are not so intuitive. Regression algorithms are used to predict a specific output number based upon a series of variables. In order to run this kind of experiment, you will need a key to uniquely identify each record, and you should be looking to return a number as the result.

Reading the Results

If you visualize the output of an ML experiment, the visualization looks like the picture on the left. The number you want to pay attention to is the coefficient of determination. This value tells you how good your predictions are: the closer this number is to 1, the better your variables are at predicting the results. In my experiment the value is 0.642447, which means the variables I am evaluating explain roughly 64% of the variation in the value being predicted, useful but far from a perfect fit. If you read about when to use regression models, the common example is predicting a home value, given all the information people use for creating real estate comparisons, such as number of rooms, closeness to a busy street, lot size, etc.
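
For those who want the math behind that number, here is the textbook formula for the coefficient of determination, where y_i are the observed values, \hat{y}_i the model's predictions, and \bar{y} the mean of the observed values; this is the standard definition, not anything specific to Azure ML's implementation:

R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

A value of 1 means the predictions match the observations exactly, while a value near 0 means the model does no better than always guessing the mean.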

No Result is a Result

Sometimes there is value in finding nothing. Recently I did some ML analysis for a company which provided a set of data to determine how it was influencing the outcome. After running a number of different algorithms, weights, and other changes in the experiment, I determined that their data didn't impact the final outcome in any significant way. Prior to presenting the findings, I thought that they would not have a favorable opinion of the analysis I did, as they were looking for an answer I didn't find. What did happen is they were surprised and happy that I was able to show them that their assumptions about the data were not true, so they could focus their efforts on different areas. Keep that in mind next time you find nothing, because no result is an answer too.


Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Updates to SQL Server Reporting Services include Power BI Compatibility

Where is Power View in Excel 2016?

Last week Microsoft announced their reporting roadmap at PASS Summit. They have been talking about pieces of it for a while, starting with the release of Excel 2016 on September 22, 2015. In the latest version of Excel, you have to add Power View and Pivot Tables to the toolbar to be able to use them, as these features are now considered part of reporting outside of Excel. Right click on the blank part of the toolbar and select Customize the Ribbon. On the bottom right side of the screen there is a button called New Group. You will need to add a group, and then add icons to the group. I called my group OldExcel, as these items are all considered deprecated features by Microsoft. Once the group has been created, change the dropdown box on the left to All Commands, and then you can add Pivot Tables, Power View, Pivot Charts, or any other features from the list on the left. The reason you have to add these items is that they are no longer part of the reporting roadmap. According to the roadmap, if you want to do interactive reporting you can use Power BI or Datazen, which, if you read my earlier blog, will be part of SSRS. One thing to note in the reporting roadmap: Excel is not listed, which must mean it is not meant to be part of the reporting of the future. This may also be why the Power Pivot Gallery is not going to be part of SharePoint 2016.

SQL Server Reporting Services Compatibility with Power BI

Part of the upgrade to SQL Server Reporting Services [SSRS] allows for interoperability with Power BI. Starting with SQL Server 2016 CTP 3, reports created in SSRS can be integrated with reports in Power BI for a single location for reports. Reports created in SSRS can be pinned to any existing Power BI dashboard via an icon within Report Manager. For Power BI to integrate directly with SSRS, a Power BI account needs to be registered in the SSRS Reporting Services Configuration Manager.

Power BI and SSRS Integration Button

Once the account exists, you can pin your SSRS report to a Power BI dashboard the same way you previously printed it. If you look at SSRS, it looks a lot like Power BI, since it now uses the same black color scheme and includes the new chart types found in Power BI, like Tree Maps and Sunburst charts. For those of us who have written a lot of reports in the past, one of the really nice features is the ability to configure the parameters any way you want them. With the updates to SSRS, it appears that Microsoft's slide showing support for on-premises applications is more than just a slide; it is a commitment reflected in the many updates to their on-premises reporting solution, SSRS.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur


The Future of Datazen – SSRS

There were a number of presentations at PASS Summit where the next versions of software were discussed. Among the more interesting were the conversations about the future of Datazen. For those of you who are unfamiliar with the product, Microsoft bought Datazen on April 14, 2015, as their on-premises mobile solution. For more information on Datazen, please check out my previous posts on it here. In July of this year, when I talked to someone from Microsoft about how Datazen fit into the product offering, he told me that it was going to be the non-cloud application to provide visualizations to mobile phones. Now with the latest series of product announcements, I fully understand what he meant. Datazen is going to be merged into SQL Server Reporting Services [SSRS], and the Datazen server is going to be combined into the SSRS server. SSRS is the new Datazen.

SSRS Updates Include Adding Datazen

For anyone who has looked at the product release notes of the last three versions of SQL Server, you won't find much about SSRS. The pixel-perfect reporting system was left alone. I was not surprised to hear from one of the Microsoft employees at their booth at PASS Summit that they were told not even to mention SSRS as a reporting solution a few years ago. Everything was all about Power BI. Power BI is a neat visualization tool, but it doesn't fit the needs of all users, primarily because it is a cloud-based application, and let's face it: some people are just not going to go to the cloud.

On Premise Solutions Are No Longer Forgotten

Anyone who has perused Microsoft's product offerings in the past few years saw lots of things about the cloud.

Microsoft shows support for cloud and on premise software

Applications like Power BI and Azure ML are only available there. It seemed like Microsoft was abandoning people who wanted to maintain their own servers. They are looking to change this impression. I have included the graphic that Microsoft used in a number of their presentations. They are trying to show love for the box, which represents on-premises software.

Datazen SSRS Integration

Datazen is going to keep the client features that people like, including designing with the grid view and picking the form factor. The server which manages Datazen is no longer going to be separate, but will be included in the SSRS server in SQL Server 2016. The idea is that Datazen will inherit some of the features people like about SSRS, such as the subscription model. The details regarding exactly how all of this will work were not made entirely clear, but I imagine more details of the Datazen/SSRS integration will be available in the very near future. One thing I wondered was whether the name Datazen is going to go away entirely and the application will be called SSRS Mobile. I heard differing answers to this question, so I am not sure it has been determined yet. What I was assured of is that the features that made Datazen a very good mobile solution are not going away. Microsoft hopes that the ability to publish from SSRS to the phone may be a big reason people decide to upgrade to SQL Server 2016 next year when this functionality is released.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur


Lessons Learned About Speaking

As an attendee at PASS Summit, I had the opportunity to learn about a wide variety of topics, including public speaking. I'll be devoting other blogs to the great technical things I learned, but I thought I would start by talking about the sessions in general. I saw a number of presentations, some of which went well, others of which were beset by technical difficulties. By far the best talk I saw was the keynote with Rimma Nehme and David DeWitt of Microsoft. The presentation was well rehearsed without sounding canned, and the slides were absolutely amazing. You can check out the slides here as they are publicly available. I am going to remember what made this talk work, and hope to incorporate what I saw when I speak next. If you are interested in where that will be, check out my Engagement page, as maybe we can meet sometime.

Speaking Techniques

I saw a number of different speaking techniques employed at Speaker Idol. People were really creative. Todd Kleinhans navigated through a game interface. Wes Springbob did an homage to Hitchhiker's Guide to the Galaxy. By the way, if you haven't read the series, I think you should, as they are great books. I was surprised that not all of the judges had read them, but even those who hadn't thought he gave a great talk. I demonstrated that I had never used a microphone before, which was not a positive. Bill Wolf worked to engage the audience throughout his talk. Ed Watson videotaped his demo. Recording your demo is something I have often heard you should do in case it crashes, but this was the first time I have seen anyone actually do it. William Durkin brought great stage presence, which I noticed was a common theme among all of the talks I liked. Effective presenters know their topic so well that the talk appears effortless and fun, without seeming like a memorized script being run through. Also, remember to give the presentation a clear point to follow, so the audience still remembers what the talk is about midway through. Everyone who did this, I thought, did a great job.

Speaker Idol Results

The finalists for Speaker Idol were William Durkin b | t, Theresa Iserman t, and David Maxwell b | t. My name was not among them, due to my issues with the microphone, which put me off my game. Also, despite my goal of not adding useless words, I threw many an "um" and "so" into my talk. In my round, William did the best job, so it was logical that he went forward. I talked to David and Theresa about their respective talks, and I know they put a lot of work and practice into making them really good. David was the winner, so I look forward to seeing him at PASS Summit 2016 giving the talk of his choice. As for me, I hope to follow the pattern of fellow Speaker Idol 2014 non-winner Reeves Smith b | t, who spoke at PASS Summit for the first time this year the old-fashioned way: by picking a good topic and writing a good abstract for it.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur