When starting with Azure Machine Learning, it is sort of hard to wrap one’s brain around what kind of insight that Machine Learning can provide. When doing data analysis, often times we are looking for patterns. Does the volume of data really go up at the end of the month or is just the additional processes that make it seem that way? Does anyone really know if sales really pick up in August or is that just legerdemain from the sales department? Linear Regression can help determine that.
Relationships between Different Items
There are two types of indicators for linear correlation, positive and negative as shown on the following charts. The Y axis represents Grades, and the x axis is changed to show positive and negative correlation of the amount of X on grades. When X is the amount of study hours, there is a positive correlation and the line goes up. When X is changed to watching cat videos, there is a negative correlation. If you can’t draw a line around the points there is no correlation. If I were to create a graph where X indicated the quantity of the bags of Cheese Doodles consumed on grades, it would not be possible to draw a straight line, where the data points cluster around it. Since this is Line-ar regression, if that line doesn’t exist there is no correlation. Knowing there is no correlation is also useful.
Calculating Linear Regression
While the variable relationship is really easy to see without Math, there is an underlying formula that describes Linear Regression, and lest all of the math majors get upset I thought I would include the formula
Yi = a0 + b0Xi + ei
Y – is the value of the Y axis, which in our example is grades
a – Is the point where the line intersects Y, or more clearly stated, where the line is. Now ideally your data should intersect at those points but since the line is sort of a guide, this won’t exactly match.
b – Contains the slope of the line
X – Is the value of the X axis, which depending on the example you are looking at is
E – This contains the error
Machine Learning with Linear Regression
In the blog examples, there are only two values, grades and something else. Machine learning can take all of your input variables and determine which values, if any impact the result. Hopefully this information provides you with a good use case for machine learning. In case you were unaware, Azure ML is availablefor free. All you need to do is sign up for an account at https://studio.azureml.net . There are a few size limitations as far as how much data you can load, but you can load enough to determine if machine learning will work in your environment.
Data aficionado et SQL Raconteur