Thank you to everyone who signed up for my webinar on Data Analysis with Azure Machine Learning [ML]. I hope that after watching it you find reasons to agree that the most important thing you need to know to get started in Machine Learning is not math, but good knowledge of the data you want to analyze. There’s no reason not to investigate, as Azure Machine Learning is free. To spend more time on the questions than the webinar format allowed, I am posting my answers here, where I can answer them in greater detail.
How would one choose a subset of data to “train” the model? For example, would I choose a random 1000 rows from my data set?
It is important to select a subset of data which is representative of the data you wish to evaluate. Sometimes a random 1000 rows will do that, and other times you will need to use other criteria, such as transactions throughout a given date range, to get a more representative sample. It all comes down to knowing your data well enough to know that the data used for testing is similar to what you will ultimately be using for analysis.
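As a rough illustration of the two approaches, here is a minimal Python sketch that runs outside of Azure ML. The row layout, the function names random_sample and date_range_sample, and the generated transactions are all hypothetical and only there to show the difference between a random subset and a date-range subset.

```python
import random
from datetime import date

def random_sample(rows, k, seed=0):
    """Pick up to k rows uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the same subset comes back each run
    return rng.sample(rows, min(k, len(rows)))

def date_range_sample(rows, start, end):
    """Keep every row whose transaction date falls within [start, end]."""
    return [r for r in rows if start <= r["date"] <= end]

# Hypothetical data: one transaction per day for all of 2014
first_day = date(2014, 1, 1).toordinal()
rows = [{"id": i, "date": date.fromordinal(first_day + i)} for i in range(365)]

# Option 1: a random subset of 100 rows
train_random = random_sample(rows, 100)

# Option 2: every transaction from the fourth quarter
train_q4 = date_range_sample(rows, date(2014, 10, 1), date(2014, 12, 31))
```

Which option is the better representative sample depends on the data. If the transactions are seasonal, for example, a single quarter may not look like the rest of the year.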
Do you have to rerun or does it save results?
You have to re-run. Azure ML does not save the results of an experiment, so each time you run it the data is processed again.
Does Azure ML use the same logic as data mining?
In a word, no. The algorithms used for data mining do overlap with some of the models available in Azure ML, but Azure ML provides a richer set of models, plus a greater ability to either call models created by others or write custom models.
How much does Azure ML cost?
There is no cost for Azure ML. You can sign up and use it for free. Click here for more information on Azure ML.
If I am using Data Factory, can I use Azure ML?
Data Factory added the ability to call Azure ML in December 2014, providing another place to incorporate Azure ML analytics. When an Azure ML experiment is complete, it is published as a web service so that any program can call it. Using Azure ML experiments directly from within Data Factory decreases the need to write custom code, while allowing the logic to be incorporated into routine data collection processes.
http://azure.microsoft.com/blog/2014/12/16/azure-data-factory-updates-integration-with-azure-machine-learning-2/
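To give a feel for what calling a published experiment from your own program might look like, here is a minimal Python sketch that builds the HTTP request without sending it. The URL, the API key, the column names and the build_scoring_request helper are all placeholders I made up, and the JSON layout (an "Inputs" object with "ColumnNames" and "Values") is an assumption about the request format, so check the API help page of your own web service for the exact shape it expects.

```python
import json
import urllib.request

def build_scoring_request(url, api_key, column_names, rows):
    """Build (but do not send) a POST request for a published experiment.

    The payload layout is an assumption; your web service's API help
    page is the authority on the format it actually expects.
    """
    payload = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,  # the key issued for the web service
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Placeholder values only; substitute the details of your own web service
request = build_scoring_request(
    "https://example.invalid/score",
    "YOUR_API_KEY",
    ["amount", "region"],
    [["19.99", "west"]],
)

# Sending it would look like this (left commented out, since the URL is fake):
# response = urllib.request.urlopen(request).read()
```

When Data Factory calls the experiment for you, this is the kind of custom code you no longer have to write and maintain.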
If you have more questions about Azure ML, or if you live in Southern California and would like to see me present on the topic live, I hope you can attend SQL Saturday #389 – Huntington Beach, where I will be presenting on Azure ML and Top ten SSIS tips. I hope to see you there.
Yours Always
Ginger Grant
Data aficionado et SQL Raconteur