Articles

SQL Server 2017 Machine Learning Services – Installation

With the release of SQL Server 2017, you now have the capability to incorporate both R and Python into SQL Server. As there  is a lot of material on the topic, this is the first post in the series which covers installation.  In future topics we will be covering the internal components, monitoring a R and Python code to determine performance impact on SQL Server, creating and maintaining R code, creating and maintaining Python code, and other related topics.

Installing Machine Learning Services

R was first introduced in the SQL Server 2016 and it was called R Services.  For SQL Server 2017, this same service was renamed to Machine Learning Services, and expanded to include Python. In the install there are three options, installing the Machine Learning Services, then selecting R and/or Python as you see in the attached picture.

Why you want to select Machine Learning Services(In-Database)

There are two installation options:  In-Database or Standalone.  If you are evaluating Machine Learning Services and you have no knowledge of what the load may be, start by selecting the Machine Learning Service In-Database.  There are several reasons why by default you want to select the In-Database option. One of the problems that Microsoft was looking to solve by incorporating advanced data analytics was to improve performance of the native code by greatly reducing data latency.  If you are analyzing a lot of data which is stored within SQL Server, the performance will be improved if the data does not need to be moved around on a network. Also, the licensing costs of installing R Server standalone also need to be evaluated with a Microsoft representative as well. An evaluation of the resource load on the network, as well as analysis of the code running on SQL Server should be performed prior to the decision to install the Machine Learning Server Standalone.

Internet Access Requirements for installing R and Python

The Machine Learning Service is an optional part of the SQL Server Install.  Because R and Python are both open source applications, Microsoft cannot include the R or Python executables within the install of SQL Server.  The executables must be downloaded from their respective locations on the internet, and the installation process is a little different if there is no internet access.  Each language has two installs, one for the executable and one for the server. If you do not access to the internet on the server where you are installing SQL Server 2017, you can download the files needed for the install.

Here are the links for SQL Server 2017 CU2.

Microsoft R Open

Microsoft R Server

Microsoft Python Open

Microsoft Python Server

If you are installing a different version, use the links provided on the Offline Installation screen. These links each will download a .cab file.  You will need to copy the cab files to a location where the server can access them and provide the path in the Offline Installation window.

 

What is Installed with Machine Learning Services

Machine Learning is an external process and communicates with launchpad.exe to access either R or Python.  For a quick check to see if the Machine Learning Services were installed, look for the SQL Server Launchpad service in the list of running services. It will also create by default 20 different external users which are used to call R or Python. There will be a subdirectory created for each user, with the name of the SQL Server instance name, which is by default MSSQLSERVER, followed by a number 00-20.  These subdirectories have nothing in them, as they are used temporarily when R or Python needs them and the information in them is eventually deleted by SQL Server. The default location is C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\ExtensibilityData.

The R tools are located in the C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\R_SERVICES directory, and include RLaucher.dll which will load R.  If you want to run R on the server, which can be handy when updating R libraries, run the RGUI.exe, which will load up an interface where you can run R code.  In SQL Server 2017 CU2, Microsoft R Open 3.3.3 is installed along with R Server 9.2.0.  Microsoft R Open is a version that is 100% Open Source and completely compatible with the standard Open Source version of R, which is commonly referred to as Comprehensive R Archive Network [CRAN] R.  Microsoft rewrote some of the underlying functions so that they would be multi-threaded, which R is not, and incorporate the Intel Math Kernel libraries to improve the performance.   R Server is the version of R which contains the proprietary functions which Microsoft created for SQL Server which include the ability to load code in and out of memory, which will be discussed in a future post.

The Python tools are located in the C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES directory, and include Pythonlauncher.dll which will load Python 3.5. The installation includes the Anaconda distribution for Python, which includes not only the data science components of Anaconda, but also SciKitLearn and Pandas.  Microsoft has also included machine learning algorithms which were created for Python in the microsoftml and revoscalepy libraries.  There is a lot of interesting content will be discussing more about these libraries in a later post.

Configuring SQL Server to Run R or Python

Once the Machine Learning components are installed, there are some configuration steps which must be completed to permit R or Python to run on SQL Server.  If this is a new server, make sure to install SQL Server Management Studio, since it is not included in the SQL Server Install. From within an SSMS query window, the following script needs to be run to enable R

SP_CONFIGURE 'external scripts enabled', 1
GO
RECONFIGURE
GO

After this step completes successfully, a restart of SQL Server Services is required.  When stopping the service, you will be notified that SQL Server Launchpad also will need to be restarted.  I have noticed that for  some reason, Launchpad does not always restart when SQL Server is restarted, so you might want to check to make sure that it is running, as you cannot run R or Python unless the SQL Server Launchpad service is running.  After the restart, to check to see if R is working properly, run the following code from within an SSMS query window.

EXEC sp_execute_external_script @language =N'R',
@script=N'OutputDataSet <-InputDataSet',
@input_data_1 =N'SELECT 1 as CheckToSeeIfRIsWorking'
WITH RESULT SETS (([CheckToSeeIfRIsWorking] int not null));
GO

When run successfully, this script will return a 1. SQL Server is now ready to run R.  To check to see if Python can be run successfully, run this script.

EXEC sp_execute_external_script  @language =N'Python',
@script=N'OutputDataSet = InputDataSet',
@input_data_1 = N'SELECT 1 AS CheckToSeeIfPythonIsWorking'
WITH RESULT SETS ((CheckToSeeIfPythonIsWorking int not null));
GO

In my next post I will cover the SQL Server internal components which are run when R or Python code is run.  Please subscribe to my blog to be notified when the next installment will be available. If you have any questions, comments or ideas for future post topics, please leave me comments as I would really enjoy any feedback.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

 

Applying Data Science to SQL Server

Data has been getting a lot of attention in the business world for a while now.  First there was big data, which was another way to store data so that later the data could be analyzed.  Recently the talk has been all about analyzing the data with new tools such as R and Python.  The reality is that people who have been working with databases doing work in business intelligence have been analyzing data for a while.  Learning a different toolset for analyzing data is not such a big leap, but an expansion of what they know.  As the field is rapidly expanding now, and demand is huge, now is a great time to learn the tools.

Traditional Data Science Development

Data scientist have created analysis solutions with data for a number of years.  The data is analyzed, cleaned, processed with various algorithms, and results are created.  When the process is complete, code has been created to provide meaning from a portion of the data and is ready to be migrated to production. Traditionally there has been a big gap between creating a solution and implementing the solution to be run against data on a regular basis.  Data Scientists traditionally are not part of the IT organization, they are actuaries or analysts, not the people who have anything to do with system processing. Recently I did some work for a company and after the data scientists were done creating a solution, they turned over all of their code to the Java team.  Six weeks later the code was released into production. This solution made no one happy.  Management thought it took too long.  The data scientist didn’t believe that the code that they created was what was implemented into production, and the java developers were tired of people blaming them for wrong code which required a long time to implement.

SQL Server Implementation of Data Science

Since SQL Server 2016 incorporates R and SQL Server 2017 has added the ability to include Python code into SQL Server, data science solutions can be incorporated as part of a scheduled process with SQL Server.  There is now a dev ops solution for incorporating R and Python into SQL Server.  One way of learning about the technology is through blogs and other online training which can help you get up to speed.  Many times though there is no substitute for hands on learning.  If you are attending PASS Summit 2017, and want to learn not only about data science, but how to incorporate it into SQL Server, I hope you can sign up for my all day training session on Applied Data Science for the SQL Server Professional.  I hope to see you there.

I have recently created a You Tube channel where I plan on sharing more data related content where I have included my first video about this conference.

If you are at PASS Summit, please introduce yourself as I would love to meet people who read my blog personally.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Getting Started with Integrated Python and SQL Server 2017

As part of the effort Microsoft is making at incorporating analytics, Python is being added into SQL Server 2017.   This means SQL Server will support the two primary languages of Data Science within SQL Server, R and Python.  As I have previously reviewed using R in SQL Server, I wanted to also review using Python with SQL Server. Since Python is near the top of the most popular programming language charts, many people are interested in learning more about it.  As many data professionals are unfamiliar with Python, I wanted to introduce the topic not just here, but in my upcoming webinar for 24 Hours of PASS on Implementing Advanced Analytics with SQL Server 2017 and Python.

Installing Python in SQL Server

SQL Server 2017 Install Window

SQL Server 2017 Install Window

The process for using Python in SQL Server is very similar to the previous process of installing R.  Microsoft renamed R Services to Machine Learning Services, and now allows both R and Python to be installed, as shown in the screen.  Microsoft’s version of Python uses Anaconda, which is an open source analytics platform created by Continuum. This is where Python differs from other open source languages, as Continuum is providing the version of Python as it contains data science components which are not included in the standard distribution of Python. Continuum also sells an enterprise version of Anaconda, with of course more features than come with the free version. It is important to remember the python environment as you will need select the same distribution when running Python code outside of SQL Server.

Configuration Changes for Python

The last thing needed to run Python is to configure and restart the SQL Server Services. In a new query type the following command

sp_configure 'external scripts enabled', 1
GO
Reconfigure
GO

After restarting the SQL Server Service, SQL Server will now run Python code, or if you installed SQL Server with both R and Python as I did, both languages can be used.

Python Development Environments

SQL Server Management Studio is designed for writing TSQL code, not Python.  The process for implementing Python code in SQL Server would be first to create and test the code in Python, then once the code is working, deploy the code in SQL Server.  There are a number of different User Interfaces that you might want to consider when writing Python.  Python comes with IDLE, but as it rather a feature bereft application, chances are that if one is coding Python, they want to use some other user interface.  Some of the more common ones are JetBrain’s PyCharm , Atom Python Tools or the UI Windows developers use the most, Visual Studio with Python language support.  Selecting and setting up the environments is a surprisingly complex process.  Python is a very flexible language and is widely used beyond the realm of data science to do things like create web applications.  For this reason, the environments selected matter as they create different ecosystems.

Incorporating Python to solve Data Science Solutions

24HoursofPASS2017-PreconPreviewIn my upcoming session for 24 hours of PASS, I will review the pros and cons of several development environments, and let you know which one I selected and the steps needed to make it work.  We will also take a look at implementing some Python code in SQL Server so that we can perform advanced analytical analysis with Python.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur