Tumbling Windows are another way of selecting data from an Azure Stream to drive an Azure ML Experiment. Once again, my examples here are going to be based on the concrete company Eohs, which I referenced in a previous post when talking about Streaming Windows. Eohs is streaming data via Azure Stream Analytics [ASA], and we need to evaluate a portion of that data for an Azure Machine Learning experiment. The experiments don’t need all of the data. Some of it will be reported on in real time, and other portions will be analyzed over a longer window. The necessary data will be extracted via an Azure Stream Analytics Query using Windowing. In this post, we will be talking about Tumbling Windows.
Eohs: Streaming Sensor Data
Eohs has installed a tracking system that sends GPS positioning and sensor data back to the dispatching company in near real time. On their screens, the dispatchers can monitor the location, speed, and heading of each truck, along with sensor information delivered every 20 seconds. The sensors let them know whether the truck is loading concrete, pouring concrete, or adding water, and they also report seatbelt information and whether the passenger door is open. Eohs is interested in using the sensor data to figure out when they will need to perform maintenance on their concrete mixing drums. Drum maintenance depends on the drum speed, the concrete pouring sensor, and the amount of water added when the drum is in use.
Using Azure ML to Determine when to Perform Maintenance
Since Eohs is streaming their data with ASA, we monitor the sensor information for the water and the drum speed over time to see if maintenance is required on the concrete drum. The Azure ML experiment will look at the combination of the water, drum speed and time of day to determine if maintenance is required. We will need to evaluate the sensors every 15 minutes.
Tumbling Windows in Azure Stream Analytics
We want to look at the performance of the sensors in 15 minute increments, so to do this we are going to use a tumbling window. Tumbling windows are designed to read data in fixed, non-overlapping increments, so our query is going to read the sensors every 15 minutes. Using the Stream Analytics Query Language, this query will provide the data.
SELECT VehicleID, AVG(DrumSpeedSensor), AVG(PouringSensor), AVG(WaterSensor), System.Timestamp AS EvalTime
FROM VehicleTrackingSystem TIMESTAMP BY EntryTime
GROUP BY VehicleID, TumblingWindow(minute, 15)
This query will return the data every 15 minutes, with one row of averages per vehicle. The EvalTime will be the timestamp of the end of the 15 minute window, so every row produced for the same window carries the same value. TIMESTAMP BY EntryTime will ensure that the data is evaluated based upon when the data was created instead of the time that the data reached the Azure server, as sometimes data packets may be received out of order. Having our data split into multiple streams like this will allow for multiple experiments to be performed on our Azure Data Stream.
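If it helps to see what the tumbling window is doing to the data, here is a rough Python sketch that mimics the grouping outside of Azure. It is not how ASA is implemented, and the sample readings and the function names window_end and tumbling_average are made up for illustration. I am assuming each window includes its end time and excludes its start time, and I am only averaging the drum speed to keep it short.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=15)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_end(entry_time):
    """Return the end of the 15 minute tumbling window containing entry_time.

    Assumption: a window covers (start, end], so a reading that lands
    exactly on a boundary belongs to the window that is closing.
    """
    elapsed = entry_time - EPOCH
    windows = -(-elapsed // WINDOW)  # ceiling division on timedeltas
    return EPOCH + windows * WINDOW


def tumbling_average(events):
    """Average drum speed per (vehicle, window end).

    events is an iterable of (vehicle_id, entry_time, drum_speed) tuples.
    Grouping uses entry_time, the time the reading was created, so the
    order in which the readings arrive does not change the result.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for vehicle_id, entry_time, drum_speed in events:
        key = (vehicle_id, window_end(entry_time))
        totals[key][0] += drum_speed
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


if __name__ == "__main__":
    utc = timezone.utc
    # Hypothetical readings; the 09:15:20 reading arrives before the 09:14:40 one.
    sample = [
        ("T1", datetime(2016, 1, 1, 9, 0, 20, tzinfo=utc), 10.0),
        ("T1", datetime(2016, 1, 1, 9, 15, 20, tzinfo=utc), 20.0),
        ("T1", datetime(2016, 1, 1, 9, 14, 40, tzinfo=utc), 14.0),
        ("T2", datetime(2016, 1, 1, 9, 5, 0, tzinfo=utc), 8.0),
    ]
    for (vehicle_id, eval_time), avg in sorted(tumbling_average(sample).items()):
        print(vehicle_id, eval_time.isoformat(), avg)
```

Every reading falls into exactly one window, which is what makes a tumbling window different from the overlapping kinds. Here the two T1 readings created before 09:15 are averaged together under the 09:15 window end, and the late-created reading lands in the 09:30 window even though it arrived earlier in the list.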
For Part 3 of this series we will talk about Hopping Windows and how and when to use that technique on our data. If you are interested in knowing when my next post will be available, please subscribe and you’ll receive an email when the next post is available.
Yours Always
Ginger Grant
Data aficionado et SQL Raconteur