Fallibility of Machine Learning: Why Lenovo will not sell me a laptop

Machine Learning is being adopted by more and more companies to assist in the sales process. Like all technology, Machine Learning is not correct all of the time. In fact, models with 75% accuracy are commonly accepted as good models and implemented for production. This means 25% of the time the model is wrong, meaning that the algorithm will incorrectly flag 25% of the business. What happens to transactions where an algorithm has incorrectly determined the transaction is not viable? Legitimate business is turned away. This happened to me when I tried to do business with Lenovo. If you are someone who is genuinely interested in how technology has adapted, then you may want to check out this quick guide on some tech tips that might be able to help you with any issues that crop up, unfortunately for me, Lenovo was my issue and I needed their help.

Lenovo’s Ordering System: No Laptop For You!

Buying a laptop from Lenovo reminded me of an episode of Seinfeld when Elaine was trying to buy soup. For some unknown reason, when I placed an order on their website and gave them my money, Lenovo gave me a Cancellation Notice, the email equivalent of “No Soup for you!” After placing an order, about 15 minutes later, I received a cancellation notice. I called customer service. They looked at the order and advised me the system incorrectly cancelled the order. I was told to place the order again as they had resolved the problem. I created a new order, and just like the last time, I received the No Laptop for You cancellation email. I called back. This time I was told that the system thinks I am a fraud. Now I have no laptop and I have been insulted. I asked if the system could be overridden because I was not a fraud. Customer service verified my method of payment and told me that were going to assign a case number to it as that would ensure the transaction would go through, and they would get credit for the order as they were going to place it. Apparently, customer service has some kind of financial incentive for placing sales. That did not work either as, once I again I received the No Laptop for You cancellation email. Not only did I not get a laptop, the person I spoke to also lost out as he was not going to get a credit for the sale. I called back again and this time they told me that they had no idea what was wrong with the system but it had flagged me as a fraud and a case number did not get assigned last time as it was supposed to, which was the reason that that order was canceled, again. They placed the order again and once again I received the No Laptop for You cancellation email. Every attempt at buying a laptop had failed. I had struck out with customer service as had received advice 3 times and every time I got a No Laptop for You cancellation email. At this point I tried getting the situation resolved via social media. Publicly Lenovo said they wanted to help, and sent me one direct message letting me know they would fix the system, and that was the last I ever heard from them. By not sending me another email, the message they sent me instead was No Laptop For You!

Relying on Machine Learning can Cost businesses Sales and leave them wondering about Toilet Seats

I tried to give Lenovo nearly $2000 and they refused to take my money. How many other transactions are they ignoring? Over 500? That does not seem like a terribly high number. Jokes apart, many companies tend to limit the sales of a particular new product. This can be a part of market research where companies tend to use tools like Conjoint Analysis (look for – What is Conjoint Analysis?) or similar tools where they can find out the actual requirements of the consumer as well as a viable price point for it. However, this might not be the case as I don’t remember the customer service person mentioning it. Also, 500 transactions for $2000 a piece is huge. That would mean Lenovo’s sales are needlessly down 1 million dollars because they implemented a system that turns away deals and actively prevents sales despite the best intentions of their employees to close a sale. As a result, Lenovo’s customers may have a negative impression of the company, which could lead to a decrease in laptop and computer sales. In this case, the company’s original and powerful brand image can be revived by implementing a marketing campaign, removing negative web content (with the help of an online reputation management firm), and executing effective branding strategies. Moreover, Lenevo may need to consider the primary cause of this problem — the over-reliance on machine learning.

Blindly relying on the accuracy of a computer program to determine with 100% accuracy whether or not a transaction is viable or not is not just a bad idea but is a bad business decision that can cost millions of dollars in sales. The company can instead train its employees to generate more sales by enrolling them in a sales training program. In the case of an already existing training program, it would make sense to find ways to improve retail sales training program to prepare employees for increased sales. With effective training and the use of machine learning, it is possible that any company like Lenevo is likely in line to get more revenue. Along with that, it seems crucial to showcase such products on the website that are available in the inventory. A story could solidify this statement. A friend of mine who was remodeling a bathroom bought a toilet seat on Amazon. When he logged in again, he continued to see a myriad selection of toilet seat product recommendations for the next six months at the exclusion of other products he might actually want to buy. Apparently, the machine learning algorithm determined that because he bought one toilet seat, he was a Toilet Seat Connoisseur and wanted to decorate his house with a variety of rare and unusual of toilet seats for the next six months.

Combining Machine Learning with People

I create machine learning solutions for clients and provide training sessions to help people learn how to write machine learning models. I understand the process and the steps which are used to create a machine learning experiment. First you gather and clean the data, then train it using a set of algorithms against a set of data, and then you create a model. The problem “Should I cancel this sale” is has two possible answers, yes or no, meaning it is a binary classification for anomaly detection. Never have I created a model which was 100% accurate as that is not possible. I tell clients that is not possible and help them implement solutions to handle conditions when the model is wrong. Machine Learning needs to work in concert with people who have the ability to resolve problems which are flagged by the system, as there is a place for people in all automated systems.

Most normal people would have probably given up after their order was canceled twice, but I persisted as I was amazed that such a big company like Lenovo could continue to be so wrong, and I wanted to prove I was not a fraud. Continued failure to successfully place an order convinced me that I did not want to do business with Lenovo. If a company does not want to resolve an issue where they will receive money, how likely are they to want to resolve a situation which costs them money, such as a warranty claim? Based on my experience, I have no confidence that one could get Lenovo customer service to solve a problem as they do not have the ability, even when they are financially incentivized to do so. Machine Learning and AI may decrease the number of people needed, but when things go wrong people are needed to fix them. When a machine learning model is wrong, and this will happen, the policy should be to permit your customer service people to create successful sales. If instead, your customer service insults and ignores customers when machine learning models go wrong, sales will go down as customers will be going to competitors.

I researched laptops as I was interested in having a lightweight powerful laptop which I could haul through various airports to use at clients and conferences, like Live 360 SQL Server where I will be speaking on December 3. Fortunately there are other companies who have determined they do not need to create some kind of machine learning score to sell a laptop, they just sell laptops to people who go to their website and give them money with no problems. Using the same address and credit card information which Lenovo flagged as fraudulent, I bought my new HP laptop, which I will be happy to demonstrate next time you see me at a conference or class.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Preparing for Exam 70-774 – Perform Cloud Data Science with Azure Machine Learning

There are a number of reasons why you might want to take a Microsoft cert exam. Maybe you want to focus your studies on a tangible thing, or you think it will help further your career, or you work for a Microsoft Partner and they required a certain number of people to pass the exam to maintain their current partner status.  I am not going to get into the long argument regarding whether or not a cert will help you in your career, or not, I can tell you why you might want to take the 70-774 exam. Machine Learning, or Data Science if you prefer, is an important analytic skill to have to analyze data.  I believe that it will only become more useful overtime. Azure Machine Learning is a good tool for learning the analysis process.  Once you have the concepts down, then should you need to use other tools to perform analysis it is just a matter of learning a new tool.  I talk to a number of people who are trying to learn new things, and the study them in their spare time.  It’s very easy to spend time vaguely studying something, but you may find that having a target set of items to study will focus your time, and as a bonus you get a neat badge and some measure of proof that you were spending time on the computer learning new things and not just watching cat videos.

Exam 70-774 Preparation Tips


While you could always buy the book for the exam (shameless plug as I was one of the authors), the book will not be enough and you will still need to write some code, and do some additional studying. This exam one of two needed for the MCSA in Data Science and you an take the exams in any order. The best place to start is by first looking at the 70-774 exam reference page from Microsoft.  There are four different sections in the exam, and I have created some links for each section which will help you prepare for the exam. In studying for exams in the past, the best way I have found to prepare is to look at everything on the outline and make sure that I know it.

Prepare Data for Analysis in Azure Machine Learning and Export from Azure Machine Learning

Normalizing Data
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/normalize-data

TanH
https://reference.wolfram.com/language/ref/Tanh.html

ZScore
http://stattrek.com/statistics/dictionary.aspx?definition=z-score
http://howto.commetrics.com/methodology/statistics/normalization/

Min Max
https://www.quora.com/What-is-the-meaning-of-min-max-normalization

PCA
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/principal-component-analysis
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/principal-component-analysis
https://stackoverflow.com/questions/9590114/importance-of-pca-or-svd-in-machine-learning

SVD
http://andrew.gibiansky.com/blog/mathematics/cool-linear-algebra-singular-value-decomposition/

Canonical-correlation analysis (CCA)
https://en.wikipedia.org/wiki/Canonical_correlation

Singular Value Decomposition (SVD)
http://andrew.gibiansky.com/blog/mathematics/cool-linear-algebra-singular-value-decomposition/

Develop Machine Learning Models

Team Data Science
https://docs.microsoft.com/fi-fi/azure/machine-learning/team-data-science-process/python-data-access

K-Means
https://www.datascience.com/blog/k-means-clustering

Confusion Matrix
http://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/
https://en.wikipedia.org/wiki/Confusion_matrix
https://en.wikipedia.org/wiki/F1_score

Ordinal Regression
https://en.wikipedia.org/wiki/Ordinal_regression

Poisson regression
https://en.wikipedia.org/wiki/Poisson_regression

Mean Absolute Error and Root Mean Squared Error
http://www.eumetrain.org/data/4/451/english/msg/ver_cont_var/uos3/uos3_ko1.htm

Cross Validation
https://towardsdatascience.com/cross-validation-in-machine-learning-72924a69872f

Operationalize and Manage Azure Machine Learning Services

Connect to a published Machine Learning web service
https://docs.microsoft.com/en-us/azure/machine-learning/studio/publish-a-machine-learning-web-service
https://docs.microsoft.com/en-us/azure/machine-learning/studio/consume-web-service-with-web-app-template
https://docs.microsoft.com/en-us/azure/machine-learning/studio/manage-new-webservice

Use Other Services for Machine Learning

Microsoft Cognitive Toolkit
https://www.microsoft.com/en-us/cognitive-toolkit/

BrainScript
https://docs.microsoft.com/en-us/cognitive-toolkit/brainscript-basic-concepts

Streamline development by using existing resources
https://docs.microsoft.com/en-us/azure/machine-learning/studio/gallery-how-to-use-contribute-publish
Perform database analytics by using SQL Server R Services on Azure
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/provision-vm
https://docs.microsoft.com/en-us/machine-learning-server/install/r-server-vm-data-science
https://journal.r-project.org/archive/2009-2/RJournal_2009-2_Williams.pdf
http://blog.revolutionanalytics.com/2017/07/xgboost-support-added-to-rattle.html
https://github.com/JohnLangford/vowpal_wabbit/wiki

I hope you have found this test preparation material helpful.  If you passed the exam, let me know by sending me a comment.

Yours Always,

Ginger Grant

Data aficionado et SQL Raconteur

Azure Machine Learning Workbench

Microsoft released Azure Machine Learning Workbench at the Ignite conference on September 25, 2017 as a public preview.  This tool is a new tool which they are adding to their Azure ecosystem, which includes the machine learning tool they introduced three years ago, Azure Machine Learning Studio. Microsoft has said they plan on keeping both products. When asked about the two products, they said that the earlier tool, Azure Machine Learning Studio, is targeted to developers who wanted to add machine learning to their current applications, as it is an easy to use tool that doesn’t require a person to be a trained data scientist.  Azure Machine Learning Workbench is targeted to data scientists who want to bring in other libraries, like TensorFlow for Python, and delve deep into the data.

Microsoft Moves into Machine Learning Management

Microsoft is looking for Azure Machine Learning Workbench for more than a tool to use for Machine Learning analysis. It is part of a system to manage and monitor the deployment of machine learning solutions with Azure Machine Learning Model Management. The management aspects are part of the application installation.  To install the Azure Machine Learning Workbench, the application download is available only by creating an account in Microsoft’s Azure environment, where a Machine Learning Model Management resource will be created as part of the install. Within this resource, you will be directed to create a virtual environment in Azure where you will be deploying and managing Machine Learning models.

This migration into management of machine learning components is part of a pattern first seen on the on-premises version of data science functionality.  First Microsoft helped companies manage the deployment of R code with SQL Server 2016 which includes the ability to move R code into SQL Server.  Providing this capability decreased the time it took to implement a data science solution by providing a means for the code can be deployed easily without the need for the R code to be re-written or included in another application. SQL Server 2017 expanded on this idea by allowing Python code to be deployed into SQL Server as well.  With the cloud service Model Management, Microsoft is hoping to centralize the implementation so that all Machine Learning services created can be managed in one place.

Hybrid Cloud, Desktop, and Python

While you must have an Azure account to use the Machine Learning Workbench, the application is designed to run on a locally on either a Mac or Windows computer.  There is a developer edition of the tool so that one can learn the tool and not incur a bill, which is the case with the previous product, Azure Machine Learning.  The download of Machine Learning Workbench must be accessed within an Azure account and is installed to your local computer.  When running the application from your computer, the application will prompt to log into your Azure account to load Azure Machine Learning Workbench.

The application is designed to use and create Python code.  Azure Machine Learning Workbench does not contain any accommodation to incorporate machine learning components written in R, just Python.  If you have created machine learning components using R, they can be incorporated into the Azure Machine Learning Model Management if you create webservices which encapsulate the R code. The R code does not interface into Workbench, but can be made to be a part of the managed projectes in Azure. While it is possible to create a webservice for R with the earlier product with Azure Machine Learning, there is no direct way to include R with Azure Machine Learning Workbench.  There are a number of sample templates to get started using Python templates including the ubiquitous Iris dataset, Linear regression and several others.   Once the project is created, you can use your favorite IDE, it creates python code which can be read anywhere.

Staying within Machine Learning Workbench application allows you access to arguably one of the neatest parts of the Machine Learning Workbench, the data parser. This tool which was originally code-named project Pendleton and designed to be an intuitive way to modify the contents of data even better than the previous leader in parsing data, Power BI’s Power Query.

You can select the option “Derive column by example” or “Split Column by Example” and then start typing in a new column.  For example, if you want to separate a column which contains the date and the time, if you right click on that date column and select “Split Column by Example” then type the date in the new column provided, the application will immediately determine that you want two columns and crate them. The date column and a time column be created for you after typing in one date.  After the sample columns have been created, you can approve the change or reject it if does not work how you want to.

Like Power Query, each change made to the data is included in the window called Steps on the right side of the application window. When you are done modifying the data, right click on the Data Preparations source icon, which in my example is called UFO Clean, to and the UI changes made to the data are used to create Python code to perform the changes. The generated Python code can be used to the source data programmatically.

The next step in the process is to write the python code needed to evaluate the data and create a model which would in my case determine where and when you are most likely to see the next UFO based on the dataset I have included in my project.  Unlike it’s counterpart Azure Machine Learning, you will need to know how to write the necessary code needed to create a machine learning analysis in Python for Azure Machine Learning Workbench. One could write the Python code to create a machine learning analysis in any Python editor.  If you chose to use Azure Machine Learning, the Python library scikit-learn is installed as part of the application.  Other libraries which you may want to use, such as the common library matplot, you will need to load within Azure Machine Learning Workbench.

Web Service: How Azure Machine Learning Workbench Solutions are Deployed

To deploy a package, you will need to export the completed model serialized Python object, with the Python Module, Pickle. This will create a file with the suffix of pkl, which is the file that you will be deploying. Azure Machine Learning Workbench expects that you will be deploying via Docker containers or creating an Azure cluster.  You will need to register the Docker container in the Machine Learning Container for it to be deployed.

Yours Always,

Ginger Grant

Data aficionado et SQL Raconteur

SQL Server 2017 Machine Learning Services Part 3 – Internals

After you have installed SQL Server 2017 with Machine Learning Services, you may notice a couple of interesting things.  One is that by default you will have 20 new users created.  These user ids are  by default named MSSQLSQLServer01, MSSQLSQLServer02, MSSQLSQLServer03… MSSQLSQLServer20, but if you have a named instance, like I have called SQLServer2017, the users are named with the named instance.  There is a subdirectory created for each User ID with is by default located in  \Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\ExtensibilityDataExternal .  You do not want to remove these User IDs or rename them.  You may be wondering Why do you have all of these User IDs to use Machine Learning Services and what are they for? Keep on reading to find the answer

SQL Server Launchpad and User IDs

When calling external processes, internally SQL Server uses User IDs to call the Launchpad service, which is installed as part of Machine Learning Services and must be running for SQL Server to be able to execute code written in R or Python.  The number of users is set by default.  To change the number of users, open  up SQL Server Configuration Manager by typing SQLServerManager14.msc at the run prompt. For some unknowable reason Microsoft decided to hide this application which was previously available by looking at the installed programs on the server.  Now for some reason they think everyone should memorize this obscure command. Once you have the SQL Server Configuration Manager open, right click on the SQL Server Launchpad service and select the properties which will show the window, as shown below.  You will notice I am running an instance called SQLServer2017 which is listed in parenthesis in the window name.

SQL Server 2017 Launchpad Configuration

Clicking on the Advanced Tab shows an entry for External Users Count, which is shown highlighted. This value is set by default to 20 users.  This means that 20 different threads can concurrently call an R or Python process.  If you reduce this number to 0, no R or Python code can be run, and the SQL Server Launchpad service will not run.  The minimum number of users you can have and have the launchpad service still run is two, but changing the users to that low number is not recommended as those processes are needed to run Machine Learning Services to rn.  If you have more than 20 concurrent R or Python processes running, SQL Server will wait until one of these threads is no longer in use and once one is free, will use it to call another process. While the process is running you may see some GUIs or other non-decipherable data appear in the folders for a user.  The garbage cleanup runs soon after to delete anything that is in the folder, as they will eventually all be empty. What does the Launchpad Service do and what does that have to do with Machine Learning Services and SQL Server? Well, the short answer is the launchpad.exe is used to call R and Python.

SQL Server Internal Machine Learning Components

To run R or Python code in SQL Server, you will need to execute an external script, which I talked about in the first post of this series.  The following diagram illustrates what happens when that call is made and what executables are called.  When a request to run R or Python code is received by the sqlservr.exe, using a named pipe, SQL Server calls the Launchpad.exe. Every time a stored procedure or call to run R or Python is requested an Rlauncher or Python process is run.  Windows job objects to process the are also created if none exist, but if there are unused windows job objects initiated by a previous call and not presently in use they will be utilized.

The job objects containers will execute the code using the rterm.exe or Python.exe. The rxlink.dll processes messages to the BxlServer to process any SQL/R functions written in the R code, send monitoring information to the SQLPAL, create XEvents.  The Python35.dll will run the python code.  If the Python code is using the revoscalepy library it will call the SQLPAL to create XEvents to use it. Otherwise it will call the BxlServer and call the sqlsatellite.dll to send and retrieve data from SQL Server.  The data is sent back to SQL Server from the sqlsatellite.dll back to SQL Server.  The named pipe used to call launchpad.exe is created internally and is not part of any other named pipe process.  The launchpad.exe uses the User IDs to call R or Python external processes. The R and Python code is executed outside of SQLPAL and the processed data is returned by sqlsatellite.dll to SQL Server.

Hopefully this post answered the questions you had about what SQL Server is doing when you run Machine Learning Services. If you have any additional questions, please let me know by asking me on twitter @desertislesql or leaving me a comment on this post.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

 

 

SQL Server 2017 Machine Learning Services Part 2 – Memory Allocation

SQL Server 2017 fundamentally changed the underlying structure of SQL Server for reasons that had nothing to do with Machine Learning Services.  Understanding this new architecture will help you configure SQLServer to optimally run R and Python. When Microsoft set out to get SQL Server to work on Linux, the goal was to provide the nearly 30 years of development effort to a new operating system without having to re-write all of the code used to make SQL Server run on the Linux operating system. For SQL Server 2005, Microsoft created a SQLOS, which created an abstraction layer between the hardware and SQL Server.  This abstraction layer allowed SQL Server to take advantages of hardware changes by expanding the capability of SQL Server to take advantage of hardware changes even when the operating system had not implemented all of the code needed to fully implemented it. From a practical perspective, this mean when you configured SQL Server internally to use 100% of all available memory, this didn’t mean all of the memory on the server, it mean 100% of the memory allocated to SQL Server.

For SQL Server 2017, Microsoft created the SQL Server Platform Abstraction Layer [SQLPAL].  Like SQLOS before it, SQLPAL abstracts the calls to the operating system. It implemented the ability to be operating system independent by separating SQL Server Code from the operating system by creating abstraction layer between SQL Server and the Operating system which includes the management of memory, processing thread and IO. This layer of abstraction provides the ability to create one version of SQL Server code which can then be run both platforms, Linux or Windows operating systems.  SQL PAL manages all memory and threads used by SQL Server.

Machine Learning Resources and SQL Server Memory Allocation

Enabling Machine Learning Services on SQL Server which I discussed in a previous blog post, requires you to enable external scripts.  Machine Learning Services are run as external processes to SQLPAL. This means that when you are running Python or R code you are running it outside of the managed processes of SQL Server and SQLPAL.  This design means that the resources used to run Machine Learning Services will run outside of the resources allocated for SQL Server.  If you are planning on using Machine Learning Services you will want to review the server memory options which you may have set for SQL Server.  If you have set the max server memory For example, if your server has 16 GB of RAM memory, and you have allocated  8 GB to SQL Server and you estimate that the operating system will use an additional 4 GB, that means that machine learning services will have 4 GB remaining which it can use.

By design, Machine Learning Services will not starve out all of the memory for SQL Server because it doesn’t use it.  This means DBAs to not have to worry about SQL Server processes not running because some R program is using all the memory as it does not use the memory SQL Server has allocated.  You do have to worry about the amount of memory allocated to Machine Learning Services as by default, using our previous example where there was 4 GB which Machine Learning Services can use, it will only use 20% of the available memory or  819 KB of memory.  That  is not a lot of memory.  Most likely if you are doing a lot of Machine Learning Services work you will want to use more memory which means you will want to change the default memory allocation for external services.

SQL Server Resource Allocation

SQL Server manages all resources using the application layer, SQLOS. SQLOS is the interface between SQL Server and all of the underlying hardware resources, including of course memory.  Using the Resource Governor within SQL Server it is possible to allocate the resources used by specific processes to ensure that no single process will for example use all the memory, starving out other processes running on the machine. Configuring and using Resource Pools provides more important functions such as production applications to be allocated the majority of the SQL Server resources used by the SQLOS. This will ensure for example that an ad-hoc reporting query will not adversely impact the primary application.

Machine Learning Services Resource Allocation within SQL Server

The allocations for the Resource Governor for all SQLPAL functions can be found by running

SELECT * FROM sys.resource_governor_resource_pools WHERE name = 'default'

By default, the max cpu, memory and cpu cap are all set to 100 percent. To look at the resource allocation for Machine Learning Services, you will need to  look at the the external resource pools.

SELECT * FROM sys.resource_governor_external_resource_pools WHERE name = 'default'

By default, the maximum memory that Machine Learning Services can use, outside of the memory that has been allocated to SQL Server, is 20% of the remaining memory. If the processes running require more memory, the allocated percentage amounts for memory and external pool resources may need to be adjusted. The following settings will decrease the overall memory settings for SQLOS and increase the memory allocated to external processes from 20% to 50%

ALTER EXTERNAL RESOURCE POOL "default" WITH (max_memory_percent = 40);
ALTER RESOURCE GOVERNOR reconfigure;
GO

Using our previous example of 4 GB of memory available after the memory allocation to SQL Server and the OS, the memory available for Machine Learning Services would go from .819 GB to 2 GB.  Setting resources for the external resource pool will in no way impact the resources SQL Server uses.  If you run the previous queries listed above you will see the changes made to the external pool while the standard resource governor pool is not changed.

Determining How Much Memory is needed for Machine Learning Services with SQL Server

How do you know how much memory SQL Server needs for Machine Learning Services? Well since I am a consultant I feel compelled to say, it depends.  Given the relative newness of the Machine Learning Tools, there are not any really good guidelines as the memory which you are using greatly depends on the complexity and quantity of the R or Python code you are running as well as how much data these processes are running against.  It also depends what language you are using.  R is more memory intensive than R and unless you are using the Rx functions which are a part of the Machine Learning Services service, will not swap items in and out of memory. The best way to determine how much memory you are using is to monitor its use over time, and the best way to do that is to create a process for monitoring the external resources.

Best Practice Method for Monitoring Machine Learning Services Resources

Creating resource pools for machine learning to monitor use over time is considered a best practice method for ongoing monitoring of resources. The following code will create an external resource pool for processes running Machine Learning Services and classifying the resources run to use it. If you are familiar with setting up resource pools in SQL Server, this process is the same, it just needs to be applied to external resources as well to use the external resources. To monitor the Machine Learning Services, the first step is to create an external resource pool called ML_Resources instead of just using the default. I am going to allocate all of the external resources to it.

CREATE EXTERNAL RESOURCE POOL ML_Resources WITH (max_memory_percent = 100);

The next step in the process is to create a workload group.  The workload group, named MLworkloadGroup  in the code, is used as a container to hold processes which have been classified as ML processes.

CREATE WORKLOAD GROUP MLworkloadGroup WITH (importance = medium) USING "default", EXTERNAL "ML_resources";

The next step is to create a function for classifying processes running as R or Python so that they can be monitored in the workload group.

USE master
GO
CREATE FUNCTION is_ML_app()
RETURNS sysname
WITH schemabinding
AS
BEGIN
IF program_name() in ('Microsoft R Host', 'RStudio', ‘Python’, ‘Pythonw’) RETURN 'MLworkloadGroup';
RETURN 'default'
END;
GO

Once the function has been created, then the Resource Governor is directed to use the function so that all of the Python and R code are monitored in the external resource pool and turns on the Resource Governor with the reconfigure command.

ALTER RESOURCE GOVERNOR WITH (classifier_function = dbo.is_ML_app);
ALTER RESOURCE GOVERNOR   reconfigure;
GO

Going forward, all processes running R or Python will be classified and use all available memory.  After these steps are completed, you can obtain performance information from the DMVs sys.dm_resource_governor_resource_pool and  sys.dm_resource_governor_workload_groups by creating a query like this

USE master
GO
SELECT a.session_id, a.login_name,  b.name
FROM sys.dm_exec_sessions AS a
JOIN sys.dm_resource_governor_workload_groups AS b
ON a.group_id = b.group_id

 

Using the Windows Performance Monitor, you will now be able to take a look at the resources being used for Machine Learning Services and can then determine how much memory is needed based upon actual usage on the server.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur