Event Hub Troubleshooting

When creating an Azure Event Hub, chances are there will be no errors. This is not always a good thing, as it may mean that errors exist but are not being surfaced. Maybe the event hub is sending data, but the data cannot be read by a Stream Analytics job. Maybe the event hub really is working, but nothing appears in the dashboard. If any of these problems sound familiar, this post should help.

Testing the Event Hub

If you don’t have a source of data, like a Raspberry Pi or a sensor sending data, you can use this guide to create a C# program which sends data to your event hub. Chances are, though, this code is going to have to be modified even more than the instructions indicate, because the data it sends is not in JSON. While it is not a requirement that data sent to the event hub be in JSON, JSON is one of the formats Stream Analytics accepts if you want to read the data downstream. If you are using the code provided and you want to insert a record into a database field input01, the message needs to be changed to the following to add the double quotes and braces that JSON requires.
var message = "{\"input01\":\"" + Guid.NewGuid().ToString() + "\"}";
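
For context, a complete sender along the lines of that guide might look like the sketch below. This is a minimal sketch, not the guide's exact code: it assumes the older Microsoft.ServiceBus.Messaging SDK the guide uses, and the connection string and event hub name are placeholders you would replace with your own.

using System;
using System.Text;
using System.Threading;
using Microsoft.ServiceBus.Messaging;

class Program
{
    static void Main()
    {
        // Placeholders - supply your own namespace connection string and event hub name.
        var connectionString = "Endpoint=sb://...";
        var eventHubName = "myeventhub";

        var client = EventHubClient.CreateFromConnectionString(connectionString, eventHubName);

        while (true)
        {
            // Wrap the value in braces and double quotes so Stream Analytics can parse it as JSON.
            var message = "{\"input01\":\"" + Guid.NewGuid().ToString() + "\"}";
            client.Send(new EventData(Encoding.UTF8.GetBytes(message)));
            Console.WriteLine("Sent: " + message);
            Thread.Sleep(200);
        }
    }
}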

Validating That the Event Hub Receives Messages

To ensure that the event hub is actually receiving data, validation can only be done in the old Azure portal. The Service Bus icon is two down from the HDInsight elephant. Double-clicking on the service bus namespace will bring up a list of event hubs, and double-clicking on an event hub will show the screen below. This screen shot was taken at roughly 7:10. How many messages are there at 7:00? None.

[Screen shot: event hub dashboard before the messages appear]

This screen shot was taken at 7:17. Notice anything different about the message count at 7:00?

[Screen shot: event hub dashboard after the messages appear]

Oh look, there are 144 messages which came in between 7:00 and 7:05. This means that everything really was working; I just needed to wait to see them appear. The wait time tends to vary from 10 to 20 minutes, so perhaps nothing is wrong with your event hub either. If that is you, you are lucky and can stop reading here.

Stream Analytics with Event Hubs

If you are using an event hub to pass data to a Stream Analytics job, step one is to make sure the Stream Analytics job is started. Created does not mean started: the job should say Running, as shown in the clip below.

The input for this stream is set to an event hub which has a Standard tier. The Basic tier, which is of course cheaper, has only the one default consumer group. With a Standard tier, multiple consumer groups can be created and, more importantly, named. When setting up the inputs, there is a blank for the name of the consumer group. If you have a Basic tier event hub, this will be empty, and if it is empty, the event hub won’t pass data to the Stream Analytics job. Perhaps there is a way to get a Basic event hub to work with a Stream Analytics job, but I couldn’t make it happen. When I created an event hub with the Standard tier, created a named consumer group, and added that name to the input of the Stream Analytics job, it worked.
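
As an aside, if you would rather create the named consumer group in code than in the portal, something like this sketch should work. It assumes the older Microsoft.ServiceBus SDK; the connection string, event hub name, and group name are placeholders of my own, not values from the post.

using Microsoft.ServiceBus;

class CreateConsumerGroup
{
    static void Main()
    {
        // Placeholder - supply your own service bus namespace connection string.
        var manager = NamespaceManager.CreateFromConnectionString("Endpoint=sb://...");

        // Creates the named group on the event hub if it does not already exist;
        // this name is what goes in the consumer group blank of the Stream Analytics input.
        manager.CreateConsumerGroupIfNotExists("myeventhub", "streamanalyticsgroup");
    }
}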

If you have found these troubleshooting tips helpful, please subscribe to my blog, as I will be passing along more tips in my next post, which will detail the steps for getting data from the event hub to an Azure database.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Azure Stream Analytics Hopping – Part 3

When incorporating streaming data from Azure Stream Analytics, it is important to select only the data needed to accomplish the task at hand. The sample company Eohs is streaming a lot of sensor data and has a variety of questions which the data will answer. Eohs is a concrete delivery company which streams data back from its vehicles to the central dispatch system using Microsoft’s Stream Analytics. There is a much more detailed description of what Eohs is streaming and of their data needs in the first post in this series. After reviewing when Tumbling Windows and Sliding Windows are used, in this post we are going to discuss another option for streaming data, Hopping Windows.

When to Use Hopping Windows

Eohs wants to use Hopping Windows to determine the previous action when water is added to the concrete mix. There is a flow meter sensor in the water tank which detects when the driver flips the switch to add more water. There are a number of different reasons for adding water; one is that the pouring is complete and the driver is washing out the remaining concrete. Another could be that the driver is stuck in traffic and the water is added to keep the concrete from setting up within the mixer. Depending on the type of concrete in the mixer, if too much water is added, the concrete will no longer have the required strength and can’t be used to create a load-bearing structure. It is very important that structural concrete be created according to specification, as concrete mixed incorrectly will crumble over time, something commonly seen in Detroit. If too much water is added, the vehicle may be routed to a different location so the concrete can be used for a non-load-bearing purpose, like creating sidewalks.

Overlapping Hops

[Diagram: overlapping hopping window slices]

By design, all hops contain an overlapping previous time slice. The picture provides a good visualization of how the data slices are created. Eohs wants to look at the events which happened 5 minutes prior so that the adding-water event can be appropriately categorized. The following streaming query can provide that data:

SELECT System.Timestamp AS OutTime, VehicleID, COUNT(*)
FROM Input TIMESTAMP BY WaterStartPour
GROUP BY VehicleID, HoppingWindow(minute, 10, 5)

This query will create 10-minute slices of time, with a new slice starting every 5 minutes. Each slice therefore overlaps the previous slice by 5 minutes, so every event is evaluated in two windows. By slicing the data this way, the context around adding water can be evaluated to determine what kind of water-add event took place. Eohs can then use this data to determine if the concrete can be delivered to the original location or if it needs to be rerouted. This later processing will be accomplished via machine learning, which I will talk about in a later post.
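
To make the overlap concrete, here is a minimal C# sketch of my own (not from the post) of the window arithmetic HoppingWindow(minute, 10, 5) implies; the event time is a made-up example.

using System;

class HoppingDemo
{
    static void Main()
    {
        // HoppingWindow(minute, 10, 5): windows are 10 minutes long and a new
        // window boundary occurs every 5 minutes, so consecutive windows overlap.
        long hop = TimeSpan.FromMinutes(5).Ticks;
        long size = TimeSpan.FromMinutes(10).Ticks;
        var eventTime = new DateTime(2016, 1, 1, 7, 3, 0); // a water event at 7:03

        // First window boundary at or after the event, aligned to the hop size.
        long end = ((eventTime.Ticks + hop - 1) / hop) * hop;

        // List every window whose 10-minute span covers the event.
        for (; end - size < eventTime.Ticks; end += hop)
            Console.WriteLine("{0:t} - {1:t}", new DateTime(end - size), new DateTime(end));
        // Prints the 6:55 - 7:05 and 7:00 - 7:10 windows: the same event lands in two slices.
    }
}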

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Using Tumbling Windows to Select Data from Azure Stream Analytics – Part 2

[Diagram: tumbling windows]
Tumbling Windows are another way of selecting data from an Azure stream to drive an Azure ML experiment. Once again, my examples here are going to be based on the concrete company Eohs, which I referenced in a previous post when talking about Sliding Windows. Eohs is streaming data via Azure Stream Analytics [ASA], and we need to evaluate a portion of that data for an Azure Machine Learning experiment; the experiments don’t need all of the data. Some of the data will be reported on in real time, and other portions will be used for analysis over a longer window. The necessary data will be extracted via an Azure Stream Analytics query using windowing. In this post, we will be talking about Tumbling Windows.

Eohs: Streaming Sensor Data

Eohs has installed a tracking system which sends GPS positioning and sensor data back in near real time to the dispatching company. The dispatchers are able to monitor on their screens the location of each truck, its speed and heading, and sensor information delivered every 20 seconds which lets them know if the truck is loading concrete, pouring concrete, or adding water, along with seatbelt information and whether the passenger door is open. Eohs is interested in using the sensor data to figure out when they will need to perform maintenance on their concrete mixing drums. The drums need to have maintenance performed on them based on the drum speed, the concrete pouring sensor, and the amount of water added when in use.

Using Azure ML to Determine when to Perform Maintenance

[Diagram: tumbling window]

Since Eohs is streaming their data with ASA, we monitor the sensor information for the water and the drum speed over time to see if maintenance is required on the concrete drum. The Azure ML experiment will look at the combination of water, drum speed, and time of day to determine if maintenance is required. We will need to evaluate the sensors every 15 minutes.

Tumbling Windows in Azure Stream Analytics

We want to look at the performance of the sensors in 15-minute increments, so to do this we are going to use a Tumbling Window. Tumbling windows are designed to read data in fixed, non-overlapping increments, so our query is going to read them every 15 minutes. Using the Stream Analytics Query Language, this query will provide the data:

SELECT VehicleID, AVG(DrumSpeedSensor), AVG(PouringSensor), AVG(WaterSensor), System.Timestamp AS EvalTime
FROM VehicleTrackingSystem TIMESTAMP BY EntryTime
GROUP BY VehicleID, TumblingWindow(minute, 15)

This query will return the data every 15 minutes. The EvalTime will be the single time value marking the end of each window. TIMESTAMP BY EntryTime will ensure that the data is evaluated based upon when the data was created instead of the time the data reached the Azure server, as sometimes data packets may be received out of order. Having our data split into multiple streams like this will allow for multiple experiments to be performed on our Azure data stream.
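
As a quick illustration of those fixed increments, here is a small C# sketch of my own (not from the post) of the bucketing TumblingWindow(minute, 15) implies; the event time is a made-up example.

using System;

class TumblingDemo
{
    static void Main()
    {
        // TumblingWindow(minute, 15): time is cut into fixed, non-overlapping
        // 15-minute buckets, and every event belongs to exactly one bucket.
        long size = TimeSpan.FromMinutes(15).Ticks;
        var eventTime = new DateTime(2016, 1, 1, 7, 3, 0); // a sensor reading at 7:03

        var start = new DateTime((eventTime.Ticks / size) * size);
        Console.WriteLine("{0:t} - {1:t}", start, start.AddTicks(size));
        // Prints the 7:00 - 7:15 window, the single window that aggregates this reading.
    }
}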

For Part 3 of this series we will talk about Hopping Windows and how and when to use that technique on our data. If you are interested in knowing when my next post will be available, please subscribe and you’ll receive an email when it is published.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Incorporating Azure Stream Analytics with Azure ML – Part 1

Using the Azure Stream Analytics Query Language to Drive an ML Experiment

In the past I have talked about some of the components of Azure Machine Learning, but I thought it might make more sense to talk about creating a solution rather than the individual components. As that will take a while, this post begins a multi-part series bringing in some real-world examples to make the concepts around streaming data and Azure Machine Learning [ML] less abstract, starting with the data, adding several ML experiments, and then talking about ways to implement the solution. The series is focused on the streaming data of a sample concrete company, Eohs.

Streaming Data in Azure

Eohs has installed a vehicle tracking system which sends GPS positioning and sensor data back in near real time to the dispatching company. The dispatchers are able to monitor on their screens the location of each truck, its speed and heading, and sensor information delivered every 20 seconds which lets them know if the truck is loading concrete, pouring concrete, or adding water, along with seat belt information and whether the passenger door is open. Eohs has some policies for their drivers which can involve termination if they are violated. Drivers are not permitted to stop the truck anywhere other than the assigned delivery location, which cuts down on fraud and helps reduce insurance costs. This data is streamed via Azure Stream Analytics [ASA].

Cortana Analytics Implementation of Azure ML

Since Eohs is streaming their data with ASA, we want to implement an Azure ML experiment to notify dispatch in real time of any violation of their policies. As I discussed in a previous blog, since Cortana Analytics includes Azure ML and Stream Analytics, using these components together is considered a Cortana Analytics implementation. We have created a Machine Learning experiment which will look at the GPS position of the delivery location and determine if a driver is stopped for an extraordinary length of time at a delivery location, or is stopped in a non-delivery location. The dispatchers are immediately notified so they can call the driver to figure out what is happening to the truck. What kind of data needs to be sent to the Azure ML experiment for analysis?

Sliding Windows in Azure Stream Analytics

[Diagram: sliding windows]

The Azure ML experiment needs to evaluate all of the vehicle data which shows that the truck is stopped for a while, generally speaking greater than 90 seconds. After all, some traffic lights take 90 seconds to get through, so eliminating the short stops helps decrease the amount of data to be evaluated. ASA uses a SQL-like query language which makes it easy to split the data so only the data the experiment needs will be sent. We want to evaluate a window of time which returns data only where the vehicle shows it is stopped for 91 seconds. Finding the 91-second stops calls for a sliding window. Here’s the query you would need to do this:

SELECT VehicleID, AVG(GPSLat), AVG(GPSLong), MIN(Speed), MAX(PourSensor), MAX(WaterSensor),
DATEADD(second, -91, System.Timestamp) AS StartEvalTime,
System.Timestamp AS EndEvalTime
FROM VehicleTrackingSystem TIMESTAMP BY SensorTime
GROUP BY VehicleID, SlidingWindow(second, 91)
HAVING MIN(Speed) < 1

EndEvalTime is the time the event was calculated by the system. Since I wanted both the start and end evaluation times, the start was calculated using the DATEADD function. If any of the data elements arrive out of order, the TIMESTAMP BY clause ensures that events are evaluated in the order they happened instead of the order in which the data was received.
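
For intuition, here is a rough C# approximation of my own (not from the post) of what the sliding window is doing: at each arriving reading, look back over the 91 seconds ending at that reading and apply the same MIN(Speed) < 1 test. The Reading type and the sample values are hypothetical stand-ins for the vehicle telemetry.

using System;
using System.Collections.Generic;
using System.Linq;

class SlidingDemo
{
    // Hypothetical telemetry record; the field names mirror the query above.
    class Reading { public DateTime SensorTime; public double Speed; }

    static void Main()
    {
        // Made-up sample: a truck slowing to a stop.
        var stream = new List<Reading>
        {
            new Reading { SensorTime = DateTime.Parse("07:00:00"), Speed = 35 },
            new Reading { SensorTime = DateTime.Parse("07:00:20"), Speed = 0.5 },
            new Reading { SensorTime = DateTime.Parse("07:02:00"), Speed = 0 }
        };

        var window = new Queue<Reading>();
        foreach (var r in stream.OrderBy(x => x.SensorTime)) // like TIMESTAMP BY SensorTime
        {
            window.Enqueue(r);
            // Drop readings that have slid out of the 91 seconds ending now.
            while (window.Peek().SensorTime < r.SensorTime.AddSeconds(-91))
                window.Dequeue();

            if (window.Min(x => x.Speed) < 1) // like HAVING MIN(Speed) < 1
                Console.WriteLine("{0:T} - {1:T}: candidate stop for the ML experiment",
                    r.SensorTime.AddSeconds(-91), r.SensorTime);
        }
    }
}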

Other Windowing in Azure Stream Analytics

[Diagram: window types]

ASA also supports two other windowing functions, Tumbling and Hopping. In my next post I will be discussing how and when to use a Tumbling Window. If you are interested in reading the posts as they occur, please subscribe to desertislesql.com to be notified when the next post is available.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur