When incorporating streaming data from Azure Stream Analytics, it is important to select the data in a way that serves the task at hand. The sample company Eosh streams a lot of sensor data and has a variety of questions the data will answer. Eosh is a concrete delivery company which streams data from its vehicles back to the central dispatch system using Microsoft’s Stream Analytics. The first post in this series has a much more detailed description of what Eosh is streaming and of their data needs. After reviewing when Tumbling Windows and Sliding Windows are used, in this post we are going to discuss another option for streaming data, Hopping Windows.
When to Use Hopping Windows
Eosh wants to use Hopping Windows to determine what happened just before water is added to the concrete mix. A flow meter sensor in the water tank detects when the driver flips the switch to add more water. There are a number of reasons for adding water. One is that the pour is complete and the driver is washing out the remaining concrete. Another could be that the driver is stuck in traffic and water is added to keep the concrete from setting up in the mixer. Depending on the type of concrete in the mixer, if too much water is added, the concrete will no longer have the required strength and can’t be used to create a load-bearing structure. It is very important that concrete used in structures be mixed according to specification, as concrete mixed incorrectly will crumble over time, something commonly seen in Detroit. If too much water is added, the vehicle may be routed to a different location so the concrete can be used for a non-load-bearing purpose, like creating sidewalks.
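Categorizing a water-add event comes down to looking at what the same truck reported just before it. The Python sketch below is a hypothetical, rule-based illustration only: the `Event` type, the event names (`WaterAdd`, `PourComplete`, `Stopped`) and the labeling rule are assumptions made for this example, and the 5-minute lookback mirrors the period Eosh looks at in this post. It is not Eosh's actual processing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Lookback period before a water-add event (5 minutes, as in this post).
LOOKBACK = timedelta(minutes=5)


@dataclass(frozen=True)
class Event:
    vehicle_id: str
    ts: datetime
    kind: str  # hypothetical names: "WaterAdd", "PourComplete", "Stopped"


def context_for(water_add: Event, events: list[Event]) -> list[Event]:
    """Events from the same vehicle in the LOOKBACK period before water_add."""
    start = water_add.ts - LOOKBACK
    return [
        e for e in events
        if e.vehicle_id == water_add.vehicle_id and start <= e.ts < water_add.ts
    ]


def categorize(water_add: Event, events: list[Event]) -> str:
    """Hypothetical rule: label the water-add by what preceded it."""
    kinds = {e.kind for e in context_for(water_add, events)}
    if "PourComplete" in kinds:
        return "washout"             # pour finished, driver is rinsing the drum
    if "Stopped" in kinds:
        return "holding in traffic"  # water added to keep the load from setting
    return "unknown"


# Usage: two trucks, each adding water for a different reason.
t0 = datetime(2024, 5, 1, 9, 0)
log = [
    Event("truck-7", t0, "PourComplete"),
    Event("truck-7", t0 + timedelta(minutes=3), "WaterAdd"),
    Event("truck-9", t0 + timedelta(minutes=1), "Stopped"),
    Event("truck-9", t0 + timedelta(minutes=4), "WaterAdd"),
]
for e in log:
    if e.kind == "WaterAdd":
        print(e.vehicle_id, categorize(e, log))
# truck-7 washout
# truck-9 holding in traffic
```

A fixed rule like this is only a stand-in to make "the previous action" concrete; it would not capture mix type or how much water was added, which is what determines whether a load must be rerouted.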
Overlapping Hops
Unlike Tumbling Windows, Hopping Windows can overlap: each window begins one hop after the previous one, so when the hop is shorter than the window, part of every window overlaps the one before it. The picture provides a good visualization of how the data slices are created. Eosh wants to look at the events which happened in the 5 minutes before water is added so that the water-add event can be appropriately categorized. The following Stream Analytics query can provide that data:
SELECT
    System.Timestamp AS OutTime,
    VehicleID,
    COUNT(*) AS EventCount
FROM Input TIMESTAMP BY WaterStartPour
GROUP BY VehicleID, HoppingWindow(minute, 10, 5)
This query creates 10-minute windows, and a new window starts every 5 minutes, so each window shares its first 5 minutes with the last 5 minutes of the previous window. OutTime, taken from System.Timestamp, is the end time of each window. Because of the overlap, every water-add event falls into at least one window that also contains the full 5 minutes before it, so the context around adding water can be evaluated to determine what kind of water-add event took place. Eosh can then use this data to determine whether the concrete can be delivered to the original location or needs to be rerouted. This later processing will be accomplished via machine learning, which I will talk about in a later post.
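To see which windows an event lands in, the windowing in the query can be simulated locally. This Python sketch is not Stream Analytics code, and `window_ends` and `hopping_counts` are hypothetical helpers. It assumes windows are aligned to multiples of the hop counted from a fixed epoch, and that each window includes its end time but not its start time, modeled on the default boundary behavior described in the Stream Analytics windowing documentation. Treat both as assumptions to verify against the docs.

```python
from collections import Counter
from datetime import datetime, timedelta

# Parameters matching HoppingWindow(minute, 10, 5).
WINDOW = timedelta(minutes=10)  # window size
HOP = timedelta(minutes=5)      # distance between consecutive window starts

# Assumption: windows are aligned to multiples of HOP from this epoch.
EPOCH = datetime(2000, 1, 1)


def window_ends(ts: datetime) -> list[datetime]:
    """End times of every window (end - WINDOW, end] that contains ts."""
    # Ceiling division gives the first window end at or after ts.
    first_end = EPOCH + (-((EPOCH - ts) // HOP)) * HOP
    ends = []
    end = first_end
    while end - WINDOW < ts:  # ts must be strictly after the window start
        ends.append(end)
        end += HOP
    return ends


def hopping_counts(events):
    """Simulate COUNT(*) grouped by VehicleID and hopping window.

    Returns {(window_end, vehicle_id): count}; only windows that
    actually received events appear in the result.
    """
    counts = Counter()
    for vehicle_id, ts in events:
        for end in window_ends(ts):
            counts[(end, vehicle_id)] += 1
    return dict(sorted(counts.items()))


# Usage: three events from two trucks.
t0 = datetime(2024, 5, 1, 9, 0)
events = [
    ("truck-7", t0 + timedelta(minutes=2)),
    ("truck-7", t0 + timedelta(minutes=7)),
    ("truck-9", t0 + timedelta(minutes=12)),
]
for (end, vehicle), n in hopping_counts(events).items():
    print(end.strftime("%H:%M"), vehicle, n)
# 09:05 truck-7 1
# 09:10 truck-7 2
# 09:15 truck-7 1
# 09:15 truck-9 1
# 09:20 truck-9 1
```

Because the hop is half the window size, each event is counted in exactly two windows, which is why both of truck-7's events show up in the window ending at 09:10.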
Yours Always
Ginger Grant
Data aficionado et SQL Raconteur