Articles for the Month of December 2017

SQL Server 2017 Machine Learning Services Part 3 – Internals

After you have installed SQL Server 2017 with Machine Learning Services, you may notice a couple of interesting things. One is that by default you will have 20 new users created. These user IDs are by default named MSSQLSERVER01, MSSQLSERVER02, MSSQLSERVER03… MSSQLSERVER20, but if you have a named instance, like the one I have called SQLServer2017, the users are named after the instance. There is a subdirectory created for each user ID, which is by default located in \Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\ExtensibilityDataExternal . You do not want to remove these user IDs or rename them. You may be wondering why you have all of these user IDs to use Machine Learning Services and what they are for. Keep on reading to find the answer.

SQL Server Launchpad and User IDs

When calling external processes, SQL Server internally uses these user IDs to call the Launchpad service, which is installed as part of Machine Learning Services and must be running for SQL Server to be able to execute code written in R or Python. The number of users is set by default. To change the number of users, open SQL Server Configuration Manager by typing SQLServerManager14.msc at the run prompt. For some unknowable reason Microsoft decided to hide this application, which was previously available by looking at the installed programs on the server. Now for some reason they think everyone should memorize this obscure command. Once you have SQL Server Configuration Manager open, right-click on the SQL Server Launchpad service and select Properties, which will show the window below. You will notice I am running an instance called SQLServer2017, which is listed in parentheses in the window name.

SQL Server 2017 Launchpad Configuration

Clicking on the Advanced tab shows an entry for External Users Count, which is shown highlighted. This value is set by default to 20 users. This means that 20 different threads can concurrently call an R or Python process. If you reduce this number to 0, no R or Python code can be run, and the SQL Server Launchpad service will not run. The minimum number of users you can have and still have the Launchpad service run is two, but lowering the count that far is not recommended, as those users are needed for Machine Learning Services to run. If you have more than 20 concurrent R or Python processes running, SQL Server will wait until one of these threads is no longer in use and, once one is free, will use it to call another process. While a process is running you may see some GUIDs or other non-decipherable data appear in the folder for a user. The garbage cleanup runs soon after to delete anything that is in the folders, so they will eventually all be empty. What does the Launchpad service do and what does that have to do with Machine Learning Services and SQL Server? Well, the short answer is that launchpad.exe is used to call R and Python.
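To make the waiting behavior concrete, here is a small Python sketch of it. This is only a toy model and not how Launchpad is implemented: the function name schedule_requests, the idea that every request arrives at the same moment, and the 5 second run time are all assumptions I made up for the illustration.

```python
import heapq

def schedule_requests(durations, pool_size=20):
    """Return the start time of each request when at most pool_size run at once.

    Toy model: all requests arrive at time 0 and are served in order.
    """
    if pool_size < 1:
        raise ValueError("at least one worker account is needed to run anything")
    # Each heap entry is the time at which one worker account becomes free.
    free_at = [0] * pool_size
    heapq.heapify(free_at)
    starts = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest account to become free
        starts.append(start)
        heapq.heappush(free_at, start + duration)
    return starts

# 21 requests of 5 seconds each against the default 20 accounts:
starts = schedule_requests([5] * 21)
print(starts[:20] == [0] * 20, starts[20])  # prints: True 5
```

In this model the first 20 requests start right away and the 21st has to wait until one of the accounts is free, which is the behavior described above.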

SQL Server Internal Machine Learning Components

To run R or Python code in SQL Server, you will need to execute an external script, which I talked about in the first post of this series. The following diagram illustrates what happens when that call is made and what executables are called. When sqlservr.exe receives a request to run R or Python code, SQL Server calls Launchpad.exe using a named pipe. Every time a stored procedure or other call to run R or Python is requested, an Rlauncher or Python process is run. Windows job objects to process the request are also created if none exist, but if there are unused Windows job objects that were created by a previous call and are not presently in use, they will be reused.

The job object containers will execute the code using rterm.exe or Python.exe. The rxlink.dll processes messages to the BxlServer to process any SQL/R functions written in the R code, send monitoring information to the SQLPAL, and create XEvents. The Python35.dll will run the Python code. If the Python code is using the revoscalepy library, it will call the SQLPAL to create XEvents. Otherwise it will call the BxlServer and the sqlsatellite.dll to send and retrieve data from SQL Server. The data is sent back to SQL Server by the sqlsatellite.dll. The named pipe used to call launchpad.exe is created internally and is not part of any other named pipe process. The launchpad.exe uses the user IDs to call R or Python external processes. The R and Python code is executed outside of SQLPAL, and the processed data is returned by sqlsatellite.dll to SQL Server.

Hopefully this post answered the questions you had about what SQL Server is doing when you run Machine Learning Services. If you have any additional questions, please let me know by asking me on Twitter @desertislesql or leaving me a comment on this post.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

 

 

SQL Server 2017 Machine Learning Services Part 2 – Memory Allocation

SQL Server 2017 fundamentally changed the underlying structure of SQL Server for reasons that had nothing to do with Machine Learning Services. Understanding this new architecture will help you configure SQL Server to optimally run R and Python. When Microsoft set out to get SQL Server to work on Linux, the goal was to bring nearly 30 years of development effort to a new operating system without having to rewrite all of the code that makes SQL Server run. For SQL Server 2005, Microsoft created SQLOS, an abstraction layer between the hardware and SQL Server. This abstraction layer allowed SQL Server to take advantage of hardware changes even when the operating system had not implemented all of the code needed to fully support them. From a practical perspective, this meant that when you configured SQL Server internally to use 100% of all available memory, this didn’t mean all of the memory on the server; it meant 100% of the memory allocated to SQL Server.

For SQL Server 2017, Microsoft created the SQL Server Platform Abstraction Layer [SQLPAL]. Like SQLOS before it, SQLPAL abstracts the calls to the operating system. It makes SQL Server operating system independent by placing an abstraction layer between the SQL Server code and the operating system, and this layer includes the management of memory, processing threads and IO. This layer of abstraction makes it possible to create one version of the SQL Server code which can then be run on both platforms, Linux and Windows. SQLPAL manages all memory and threads used by SQL Server.

Machine Learning Resources and SQL Server Memory Allocation

Enabling Machine Learning Services on SQL Server, which I discussed in a previous blog post, requires you to enable external scripts. Machine Learning Services are run as external processes to SQLPAL. This means that when you are running Python or R code, you are running it outside of the managed processes of SQL Server and SQLPAL. This design means that the resources used to run Machine Learning Services will come from outside of the resources allocated for SQL Server. If you are planning on using Machine Learning Services, you will want to review the server memory options which you may have set for SQL Server. If you have set the max server memory, Machine Learning Services has to work with the memory that is left over. For example, if your server has 16 GB of RAM, you have allocated 8 GB to SQL Server, and you estimate that the operating system will use an additional 4 GB, that means that Machine Learning Services will have 4 GB remaining which it can use.

By design, Machine Learning Services will not starve SQL Server of memory because it doesn’t use SQL Server's memory. This means DBAs do not have to worry about SQL Server processes not running because some R program is using all the memory, as it does not use the memory SQL Server has allocated. You do have to worry about the amount of memory allocated to Machine Learning Services, as by default, using our previous example where there was 4 GB which Machine Learning Services can use, it will only use 20% of the available memory, or about 819 MB. That is not a lot of memory. Most likely, if you are doing a lot of Machine Learning Services work, you will want to use more memory, which means you will want to change the default memory allocation for external services.
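The arithmetic behind that example can be sketched in a few lines of Python. The function name external_memory_gb and its parameters are just names I picked for the illustration; the only inputs are the numbers from the example and the default 20% external pool limit.

```python
def external_memory_gb(total_gb, sql_server_gb, os_gb, max_memory_percent=20):
    """Memory (in GB) the external pool may use out of what is left on the server."""
    remaining_gb = total_gb - sql_server_gb - os_gb
    if remaining_gb < 0:
        raise ValueError("SQL Server and the OS are allocated more than the server has")
    return remaining_gb * max_memory_percent / 100

# 16 GB server, 8 GB for SQL Server, an estimated 4 GB for the operating system
default_gb = external_memory_gb(16, 8, 4)
print(f"{default_gb:.1f} GB = {default_gb * 1024:.0f} MB")  # prints: 0.8 GB = 819 MB
```

Passing a different max_memory_percent shows what a change to the external pool setting would give you on the same server.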

SQL Server Resource Allocation

SQL Server manages all resources using the application layer, SQLOS. SQLOS is the interface between SQL Server and all of the underlying hardware resources, including of course memory. Using the Resource Governor within SQL Server, it is possible to limit the resources used by specific processes to ensure that no single process will, for example, use all the memory and starve out other processes running on the machine. Configuring and using resource pools allows more important functions, such as production applications, to be allocated the majority of the SQL Server resources managed by the SQLOS. This will ensure, for example, that an ad-hoc reporting query will not adversely impact the primary application.

Machine Learning Services Resource Allocation within SQL Server

The allocations for the Resource Governor for all SQLPAL functions can be found by running

SELECT * FROM sys.resource_governor_resource_pools WHERE name = 'default'

By default, the max CPU percent, max memory percent and CPU cap percent are all set to 100 percent. To look at the resource allocation for Machine Learning Services, you will need to look at the external resource pools.

SELECT * FROM sys.resource_governor_external_resource_pools WHERE name = 'default'

By default, the maximum memory that Machine Learning Services can use, outside of the memory that has been allocated to SQL Server, is 20% of the remaining memory. If the processes running require more memory, the allocated percentage amounts for memory and external pool resources may need to be adjusted. The following settings will increase the memory allocated to external processes from 20% to 50%.

ALTER EXTERNAL RESOURCE POOL "default" WITH (max_memory_percent = 50);
ALTER RESOURCE GOVERNOR reconfigure;
GO

Using our previous example of 4 GB of memory available after the memory allocation to SQL Server and the OS, the memory available for Machine Learning Services would go from about 0.8 GB to 2 GB. Setting resources for the external resource pool will in no way impact the resources SQL Server uses. If you run the two queries listed above, you will see the changes made to the external pool while the standard Resource Governor pool is not changed.

Determining How Much Memory is needed for Machine Learning Services with SQL Server

How do you know how much memory SQL Server needs for Machine Learning Services? Well, since I am a consultant, I feel compelled to say, it depends. Given the relative newness of the Machine Learning tools, there are not any really good guidelines, as the memory which you are using greatly depends on the complexity and quantity of the R or Python code you are running as well as how much data these processes are running against. It also depends on what language you are using. R is more memory intensive than Python and, unless you are using the Rx functions which are a part of Machine Learning Services, will not swap items in and out of memory. The best way to determine how much memory you are using is to monitor its use over time, and the best way to do that is to create a process for monitoring the external resources.

Best Practice Method for Monitoring Machine Learning Services Resources

Creating resource pools for machine learning is considered a best practice for monitoring resource use over time. The following code will create an external resource pool for processes running Machine Learning Services and classify those processes so that they use it. If you are familiar with setting up resource pools in SQL Server, this process is the same; it just needs to be applied to external resources as well. To monitor Machine Learning Services, the first step is to create an external resource pool called ML_Resources instead of just using the default. I am going to allocate all of the external resources to it.

CREATE EXTERNAL RESOURCE POOL ML_Resources WITH (max_memory_percent = 100);

The next step in the process is to create a workload group.  The workload group, named MLworkloadGroup  in the code, is used as a container to hold processes which have been classified as ML processes.

CREATE WORKLOAD GROUP MLworkloadGroup WITH (importance = medium) USING "default", EXTERNAL "ML_Resources";

The next step is to create a function for classifying processes running as R or Python so that they can be monitored in the workload group.

USE master
GO
CREATE FUNCTION is_ML_app()
RETURNS sysname
WITH schemabinding
AS
BEGIN
-- Route sessions from known R and Python client programs to the ML workload group
IF program_name() IN ('Microsoft R Host', 'RStudio', 'Python', 'Pythonw') RETURN 'MLworkloadGroup';
RETURN 'default';
END;
GO

Once the function has been created, the Resource Governor is directed to use it so that all of the Python and R code is monitored in the external resource pool. The reconfigure command then turns on the Resource Governor with the new settings.

ALTER RESOURCE GOVERNOR WITH (classifier_function = dbo.is_ML_app);
ALTER RESOURCE GOVERNOR reconfigure;
GO

Going forward, all processes running R or Python will be classified into this workload group and can use all of the memory available to external processes. After these steps are completed, you can obtain performance information from the DMVs sys.dm_resource_governor_resource_pools and sys.dm_resource_governor_workload_groups by creating a query like this:

USE master
GO
SELECT a.session_id, a.login_name, b.name
FROM sys.dm_exec_sessions AS a
JOIN sys.dm_resource_governor_workload_groups AS b
ON a.group_id = b.group_id;

 

Using the Windows Performance Monitor, you will now be able to take a look at the resources being used for Machine Learning Services and can then determine how much memory is needed based upon actual usage on the server.
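Once you have some usage numbers, turning them into a pool setting is simple arithmetic. The Python sketch below is one way to do it and is not an official sizing rule: the function name suggested_max_memory_percent, the 25% headroom, and the sample readings are all made up for the illustration.

```python
import math

def suggested_max_memory_percent(peak_used_mb, external_available_mb, headroom=1.25):
    """Percent of the external memory needed to cover the observed peak plus headroom."""
    if external_available_mb <= 0:
        raise ValueError("there must be some memory available to external processes")
    needed = peak_used_mb * headroom / external_available_mb * 100
    # Round up and keep the result in the 1 to 100 range used for the pool setting.
    return min(100, max(1, math.ceil(needed)))

# Made-up memory readings (in MB) collected over time for R and Python processes
samples_mb = [350, 900, 1200, 640]
print(suggested_max_memory_percent(max(samples_mb), 4 * 1024))  # prints: 37
```

With 4 GB available to external processes, a 1,200 MB peak plus headroom works out to 37 percent in this sketch, which you could then compare against the max_memory_percent on your external resource pool.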

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur