Recently I read a Twitter message from someone who had just run an Azure Machine Learning [ML] experiment. It had run successfully, but he wondered what the results meant. I thought it might be helpful to explain how to interpret some of the results. There are a number of different types of algorithms you can use in Azure ML, and each has a different way of evaluating the result of the experiment. Azure ML has three major algorithm categories: classification, clustering and regression. Since classification and clustering seem to get more of the discussion in what I read about ML, I thought I’d start with regression.
When should one use Regression Models?
While it is possible to make a reasonable guess at what the other algorithm categories mean, regression algorithms are not so intuitive. Regression algorithms are used to predict a specific numeric value based upon a series of variables. To run this kind of experiment, you will need a key that uniquely identifies each record, and you need to be looking for a number as the result.
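To make the idea of predicting a number from variables concrete, here is a minimal sketch in plain Python. It is not what Azure ML runs internally; it is ordinary least squares with a single variable, and the house sizes, prices, and function names are made up for illustration.

```python
# Minimal sketch: fit a straight line (one input variable) by least squares.
# The data below is hypothetical, not taken from any real experiment.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance of x and y divided by variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Predict a numeric value for a new input."""
    return slope * x + intercept

# Hypothetical training data: house size in square feet and sale price.
sizes = [1000, 1500, 2000, 2500]
prices = [200000, 250000, 300000, 350000]

slope, intercept = fit_line(sizes, prices)
estimate = predict(slope, intercept, 1800)
print(estimate)
```

The regression modules in Azure ML work with many variables at once, but the shape of the problem is the same: known rows go in, and a number comes out for a row you have not seen.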
Reading the Results
If you visualize the output of an ML experiment, the visualization looks like the picture on the left. The number you want to pay attention to is the coefficient of determination. This value tells you how good your predictions are. The closer this number is to 1, the better your variables are at predicting the result. The value in my experiment is 0.642447, which rounds to 1 rather than 0. Put another way, the variables explain roughly 64 percent of the variation in the value being predicted, so they can mostly be used to determine it. If you read about when to use regression models, the common example is predicting a home’s value, which works well if you have all the information people use for creating real estate comparisons, such as number of rooms, closeness to a busy street, lot size, etc.
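If it helps to see where a coefficient of determination comes from, the sketch below computes it using the standard textbook definition: one minus the ratio of the squared prediction errors to the squared distance of the actual values from their mean. I am assuming Azure ML reports the same quantity; the function name and sample numbers are hypothetical.

```python
# Minimal sketch of the coefficient of determination (R squared),
# using the standard definition: 1 - SS_residual / SS_total.
# The sample values are made up for illustration.

def r_squared(actual, predicted):
    """Return R squared for paired lists of actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    # How far the predictions miss the actual values.
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # How far the actual values sit from their own mean.
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total

actual = [3, 5, 7, 9]
predicted = [2.5, 5.5, 6.5, 9.5]

score = r_squared(actual, predicted)
print(score)
```

A model that predicts every value exactly scores 1, and a model that just predicts the average every time scores 0, which is why a number closer to 1 is better.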
No Result is a Result
Sometimes there is value in finding nothing. Recently I did some ML analysis for a company that provided a set of data and wanted to determine how it was influencing an outcome. After running a number of different algorithms, weights, and other changes in the experiment, I determined that their data didn’t impact the final outcome in any significant way. Before presenting the findings, I expected them to have an unfavorable opinion of my analysis, since they were looking for an answer I didn’t find. Instead, they were surprised and happy that I was able to show them their assumptions about the data were not true, so they could focus their efforts on different areas. Keep that in mind next time you find nothing, because no result is an answer too.
Data aficionado et SQL Raconteur