Articles

Applying Data Science to SQL Server

Data has been getting a lot of attention in the business world for a while now. First there was big data, which was another way to store data so that it could be analyzed later. Recently the talk has been all about analyzing the data with newer tools such as R and Python. The reality is that people who have been working with databases and doing business intelligence work have been analyzing data for a while. Learning a different toolset for analyzing data is not such a big leap, but an expansion of what they already know.

As the field is rapidly expanding and demand for these skills is huge, now is a great time to learn the tools.

Traditional Data Science Development

Data scientists have been creating analysis solutions with data for a number of years. The data is analyzed, cleaned, and processed with various algorithms, and results are created. When the process is complete, code has been created that provides meaning from a portion of the data and is ready to be migrated to production. Traditionally there has been a big gap between creating a solution and implementing the solution to be run against data on a regular basis. Data scientists traditionally are not part of the IT organization; they are actuaries or analysts, not the people who have anything to do with system processing. Recently I did some work for a company, and after the data scientists were done creating a solution, they turned over all of their code to the Java team. Six weeks later the code was released into production. This solution made no one happy. Management thought it took too long. The data scientists didn't believe that the code they created was what was implemented into production, and the Java developers were tired of people blaming them for wrong code which required a long time to implement.

SQL Server Implementation of Data Science

Since SQL Server 2016 incorporates R, and SQL Server 2017 adds the ability to include Python code as well, data science solutions can now run as part of a scheduled process within SQL Server. There is finally a DevOps path for incorporating R and Python into SQL Server. One way of learning about the technology is through blogs and other online training, which can help you get up to speed. Many times, though, there is no substitute for hands-on learning. If you are attending PASS Summit 2017, and want to learn not only about data science, but how to incorporate it into SQL Server, I hope you can sign up for my all-day training session on Applied Data Science for the SQL Server Professional. I hope to see you there.

I have recently created a YouTube channel where I plan on sharing more data-related content, and I have included my first video there about this conference.

If you are at PASS Summit, please introduce yourself, as I would love to meet the people who read my blog in person.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

What to do when SQL Server 2016 R Installation Does Not Work

I was recently helping a customer with their R installation. After R was installed successfully, I went to SQL Server Management Studio and ran the script needed to enable external scripts:

EXEC sp_configure  'external scripts enabled', 1
RECONFIGURE WITH OVERRIDE
GO
RECONFIGURE;
GO
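Before restarting anything, it is worth confirming that the setting was accepted; a quick check against the sys.configurations catalog view does it:

-- 'value' is the configured value; 'value_in_use' shows what is currently active
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'external scripts enabled';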

After this script ran successfully, I closed SSMS and restarted the SQL Server service, noting when I did so that SQL Server Launchpad, which is installed as part of R Server, was also going to be restarted. Everything looked good until I tried to run my first R command. I went for the super generic script:

EXEC sp_execute_external_script @language=N'R'
,@script=N'OutputDataSet <-InputDataSet'
,@input_data_1=N'SELECT 1 as CheckToSeeIfRIsWorking'
WITH RESULT SETS
(
(
[CheckToSeeIfRIsWorking] INT NOT NULL
)
);

Much to my surprise, after this I received an error:

Msg 39019, Level 16, State 1, Line 1
An external script error occurred:
Unable to launch the runtime. ErrorCode 0x80070490: 1168(Element not found.).
Msg 11536, Level 16, State 1, Line 1
EXECUTE statement failed because its WITH RESULT SETS clause specified 1 result set(s), but the statement only sent 0 result set(s) at run time.

I looked in the log files and didn't find any errors. I checked the configuration manager to ensure that some user IDs were configured there. Nothing seemed to make any difference. Looking online, the only error I saw which might possibly be close was a different error message about 8.3 naming and the working directory.

Resolving Error Code 0x80070490 Unable to launch the runtime for R in SQL Server

When I saw this error, I first checked the various SQL log files and didn't find any errors. I checked the configuration manager to ensure that some user IDs were configured; the default 20 users were there as expected. Nothing seemed to make any difference. Looking online, the only error I saw which might possibly be close was a different error message about 8.3 naming and the working directory. I ran into a similar problem with R and Power BI. R natively doesn't understand directory names longer than eight characters; when a name is longer than that, which is always the case with the default SQL Server directories, internally the directory is read as a shortened 8.3 name containing a tilde (~). This particular server also had an installation of SQL Server 2014 on it, and looking at the short names it was impossible to tell which instance was which, as the directory read like this: C:\PROGRA~1\MICROS~2\MSSQL1~1.MSS\MSSQL\EXTENS~1

I created a new directory with a short name, D:\Rwork, and changed the working directory to the new subdirectory Rwork by updating the rlauncher.config file and restarting the SQL Server Launchpad service. I was really hopeful that this would work, but when I ran the external script listed above, it still did not work. What had I done wrong? I didn't delete the previous working directory, which was in the default location C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\ExtensibilityData. When SQL Server creates this subdirectory, it also includes subdirectories for all of the users. The new directory I created didn't have those subdirectories. I copied the subdirectories from the old working directory to the new one and tried again. This time, success!

I hope you find this post helpful, as I know I could have used it when I was trying to figure this out. If it saved you time too, drop me a line and let me know.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

SQL Server R Services and the 20 User IDs

What is the Reason Installing R Creates 20 User IDs?

If you have installed SQL Server R Services, you may have noticed that it creates twenty user IDs as part of the installation process. To many people the automatic creation of a number of SQL Server user IDs is alarming, and they want to know what these IDs are for and who will be using them. The answer to who could be using those IDs is just about anyone running R, and they are going to be using the resources of the server to do it. If you are a DBA and want to figure out how to stop this, keep reading, as I promise to tell you after I provide some context about SQL Server and R internals.

R Server and Launchpad

When R Server is installed as part of SQL Server, one way you can check to see if it is installed is to look to see if the Launchpad service is running. When R code is running, it does not run within the SQL Server OS; it is by definition an external process, and the Launchpad exe serves as a conduit between SQL Server and the space where R is running. If you want to know more about R and SQL Server internals, this article I wrote for SQL Mag will provide a lot more details. Microsoft designed the Launchpad service so that other languages might someday also run as R does on SQL Server. It also supports a feature of R Server which I wrote about, context switching. Context switching provides the ability for users to utilize server memory instead of the memory on their computers for running R, and access is granted through the use of one of the twenty IDs created when R is installed.

Launchpad Settings – Where the External Users are Referenced

There are many reasons why a DBA might not want to allow clients to access server memory, as that will tax the server. Turning it off is relatively simple. Go to the SQL Server Management Console and select SQL Server Launchpad for the instance of SQL Server running R Server.

In the picture of the screen, the instance of SQL Server I have running R Services is SS2016. Right-click on the service and select Properties, then click on the Advanced tab. When looking at the number of external users allowed by default, the number might look familiar. The reason there are twenty user IDs created for R Server is that Launchpad by default allocates twenty external users to connect from SQL Server to run R. If you don't want to allow external users to run on a server, you will need to prevent the users from connecting by not enabling them to run R. To run R, users need to have db_rrerole permissions; if they do not have that, they cannot run R. On a production server, it is probably best that this permission not be granted to non-system users.
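As a rough sketch of what that looks like in T-SQL (the db_rrerole role name comes from the paragraph above; the user name is a made-up example, so substitute your own):

-- List who currently belongs to db_rrerole in this database
SELECT r.name AS role_name, m.name AS member_name
FROM sys.database_role_members AS drm
JOIN sys.database_principals AS r ON drm.role_principal_id = r.principal_id
JOIN sys.database_principals AS m ON drm.member_principal_id = m.principal_id
WHERE r.name = 'db_rrerole';

-- Remove a hypothetical user from the role so they can no longer run R remotely
ALTER ROLE db_rrerole DROP MEMBER [SomeAnalystUser];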

Since the external users created are used by SQL Server when running R, it is not possible to set the number of external users to 0; if you do, the Launchpad service will not start, and no R code can be executed anywhere. If the number of external users is modified, Configuration Manager provides a prompt window, as a restart is required. With the number set to 0, when the Launchpad service tries to start it will generate Error 1053: The Service Did Not Start in a Timely Fashion. The number of users has to be at least 1 for the service to be able to communicate with the external R components. If you add or reduce the number of external users, IDs will be either created or deleted to match the number listed.

Let me know if you found this information regarding SQL Server R Services helpful by commenting or messaging me on Twitter. If you are interested in finding out more regarding the internals of SQL Server and R, you might be interested in reading this article about the topic. I would also like to thank Bob Ward (b | t) of Microsoft for helping me better understand the SQL Server R internals, and for patiently answering my questions on the topic.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

PolyBase – Another Method for Creating a Stretch Database in SQL Server 2016

PolyBase, which was released with SQL Server 2016, provides another method to access live data either locally or in the cloud, very similar to the SQL Server Stretch Database feature. PolyBase can also provide more cost-effective availability for cold data, streamline on-premises data maintenance, and keep data secure even during migration. PolyBase differs from Stretch Database in a few ways: the SQL must be different, the speed is noticeably slower, and it is a lot less expensive. The cost is significantly less because storing data in an Azure blob store starts at 1 cent a month, while Stretch Database starts at $2.50 an hour. In this post, I will show how to take data which was archived due to its age (it was created in 2012) and store it in an Azure Blob Storage file which will be available via PolyBase when I need it.

Implementing a PolyBase Stretch Database

PolyBase is an optional feature of SQL Server 2016, and the Instance Feature PolyBase Query Service for External Data needs to be selected as part of the installation process. Two services are installed with the feature, SQL Server PolyBase Engine and SQL Server PolyBase Data Movement. Both of these services must be running, and TCP/IP must be enabled, for PolyBase to work. Either check for those services or run the query SELECT SERVERPROPERTY('IsPolybaseInstalled') AS IsPolybaseInstalled; which will return 1 when PolyBase has been installed. The next step is to tell SQL Server what the external source is by configuring which Hadoop connectivity option will be used.


EXEC sp_configure 'hadoop connectivity', 7;
GO
RECONFIGURE;
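The new value can be confirmed with a quick query (the change typically also requires restarting the PolyBase services before it takes effect):

-- Confirm the Hadoop connectivity setting after RECONFIGURE
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'hadoop connectivity';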

To use Azure Blob Storage, there are three different options, 1, 4 and 7, which include the ability to access blob storage. These options also allow various other Hadoop engines to be accessed as well. I have selected 7, which also allows me to use Hortonworks on Linux. For more information on the connectivity options, check out this link. A database master key is required to make the connection as well. The next step is to let SQL Server know where exactly the blob storage data is stored, which creates an entry in the External Data Sources. PolyBase needs to know how the data is formatted, which will be stored in an External File Format. By definition, Hadoop data has no schema, so a schema is going to have to be created and stored in an external table. The location for each of these items is stored within SQL Server Management Studio as shown here.

This code will create a master key using a password I made up:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'P@ssW0rdPolybase'

PolyBase will need the Azure Blob Storage key to be able to make a connection:

CREATE DATABASE SCOPED CREDENTIAL AzureStorageCred
WITH IDENTITY = 'user', Secret = 'lEGL66LiK2KE2U0WEb435PH15BwFOInrHqQ1AJigjoRVfEOrOge+TLbBNu861cqbC+a26io92o5sw/b+OTD/C4w=='; --Note: Replace with your Blob Storage key


CREATE EXTERNAL DATA SOURCE AzureBlobStorage with (
TYPE = HADOOP,
LOCATION ='wasbs://gingeriottest@gingertestiotblobstore.blob.core.windows.net', --This is the location of the folder inside blob storage containing my data
CREDENTIAL = AzureStorageCred
);

The data is stored in blob storage as a CSV, so I will need to create that file format:

CREATE EXTERNAL FILE FORMAT CSV WITH (
FORMAT_TYPE = DELIMITEDTEXT
,FORMAT_OPTIONS (FIELD_TERMINATOR =',', DATE_FORMAT='MM/dd/yyyy') )
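At this point, the data source and file format that have been created can be double-checked through the catalog views; a quick sanity check I find handy:

-- Verify the external data source and file format exist before creating the table
SELECT name, location FROM sys.external_data_sources;
SELECT name, format_type FROM sys.external_file_formats;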

Lastly, a table definition must be created for the file so that it can be accessed as a table

CREATE External TABLE OntimePerformance2012sampleall(
[DimAirlineKey] int,
[DimOriginAirportKey] int,
[DimArrivalAirportKey] int,
[DimCancellationReasonKey] int,
[DimDelayLengthKey] int,
[DimDepartureBlockKey] int,
[DimArrivalBlockKey] int,
[DimDistanceGroupKey] int,
[FlightDateKey] int,
[FlightNumber] varchar(50),
[ScheduleDepartureTime] int,
[ActualDepartureTime] int,
[DepartureDelayInMinutes] int,
[TaxiOutTime] int,
[TaxiInTime] int,
[ScheduleArrivalTime] int,
[ActualArrivalTime] int,
[ArrivalDelayInMinutes] int,
[ScheduleElapsedTime] int,
[ActualElapsedTime] int ,
[DistanceInMiles] int,
[CarrierDelayInMinutes] int,
[WeatherDelayInMinutes] int,
[NASDelayInMinutes] int,
[SecurityDelayInMinutes] int
)

WITH
(   LOCATION = '/OntimePerformance2012.csv',
DATA_SOURCE = AzureBlobStorage,
FILE_FORMAT = CSV
)

 

All of the external tables created can be found in the DMV sys.external_tables or by looking at sys.tables where the new column is_external is set to 1.  Once the setup of PolyBase is complete, I can query the external and internal tables all at the same time, like I do here.

 

SELECT count(*)
, r.cancellationReason
, d.CalendarYear
FROM dbo.CancellationReason r
JOIN dbo.OntimePerformance2012sampleall c
ON c.dimcancellationReasonKey = r.CancellationReasonKey
JOIN dbo.DimDate d
ON c.flightDateKey = d.DateKey
GROUP BY r.cancellationReason, d.CalendarYear
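To see which tables in a database are the external ones, the catalog views mentioned above can be queried directly; a minimal sketch:

-- External tables appear in their own catalog view, and sys.tables flags them as well
SELECT name, location, data_source_id FROM sys.external_tables;
SELECT name, is_external FROM sys.tables WHERE is_external = 1;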

PolyBase Table Considerations

It is not possible to edit the data through PolyBase, as the data is read only. Also, since the data is being pointed to externally, it cannot be indexed, and querying it is not terribly quick. All of the data will be brought down and then filtered, unless PolyBase is configured to improve performance, which I will discuss in a future blog post.

PolyBase and Azure Blob Storage provide the ability to access data stored offsite in a secure location in a cost-effective manner. If you saw my presentation on this topic at PASS Summit, or have questions about how to make it work in your environment, please contact me on Twitter or write a comment on my blog.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Monitoring Reports for R on SQL Server 2016

Whenever I talk about R, in the question and answer period I get a lot of questions from DBAs on how to monitor the impact of running R code on SQL Server. Fortunately, Microsoft recently created a series of custom reports for SQL Server which provide a lot of useful information on how R is running on SQL Server. The custom reports are available on GitHub here in a folder called SSMS-Custom-Reports. There are 8 different reports: R Services – Active Sessions, R Services – Configuration, R Services – Configure Instance, R Services – Execution Statistics, R Services – Extended Events, R Services – Packages, R Services – Resource Usage, and R Services – Script. If it has been a while since you have added a custom report to SQL Server, this link should help you get the reports installed.

R Services Reports

The titles for the reports provide a good indication of what they do. Active Sessions will show how many users currently have their compute context set to run on the server. Configuration shows the current R Configuration, which is very useful information for checking on the R components on the server.

(Screenshot: R Services – Configuration report)

The report R Services – Configure Instance is for configuring R for the first time. If you have R set up already, chances are you won't need this report, as it is really not a report but a script you need to run to configure R. R Services – Execution Statistics is a really neat way of monitoring the ongoing use of R. It keeps track of every time R is run on the server. If you run an R job and then refresh the report, the execution count and time will go up.
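The same kind of execution information is exposed through the external script DMVs, so if you prefer a query to a report, something along these lines will show it (a sketch; both DMVs ship with SQL Server 2016):

-- Cumulative counters for external script (R) executions on this instance
SELECT language, counter_name, counter_value
FROM sys.dm_external_script_execution_stats;

-- External script requests that are currently running
SELECT * FROM sys.dm_external_script_requests;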

(Screenshot: R Services – Execution Statistics report)

R Services – Extended Events is also not a report but a list of all the extended events that are available for R Services. This is a handy bit of information, which can be a great reference tool for extended events monitoring. R Services – Packages lists the packages which are currently installed on SQL Server. When people write R, a lot of different packages are used within the script. Prior to running a script, check the information on this report to ensure the libraries used are installed on SQL Server. If a library is missing, the code will not work. R Services – Resource Usage is a great way to see how R has been configured to run on the server. Notice I have created an External Pool for R. This is a configuration recommended by Microsoft to better monitor your R Services.

(Screenshot: R Services – Resource Usage report)
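If you would rather check the installed packages with a query instead of the Packages report, a short external script does the same thing; a minimal sketch (the result set column names here are my own):

EXEC sp_execute_external_script @language = N'R'
,@script = N'OutputDataSet <- data.frame(installed.packages()[, c("Package", "Version")], stringsAsFactors = FALSE)'
WITH RESULT SETS
(
(
[PackageName] NVARCHAR(255) NOT NULL,
[PackageVersion] NVARCHAR(100) NOT NULL
)
);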

The last report, R Services – TSQL Script, is a bit of a mystery. The screen contains this line of text: "This is the T-SQL script for generating the data in report: No Report". I don't know what the intent of the report was. The Readme.md on GitHub provides this documentation about the reports.
SSMS Custom Reports

Custom reports for SQL Server R Services built using SQL Server Report Builder. The reports can be added to SQL Server Management Studio to view various configuration and runtime information about R scripts execution in SQL Server.

That doesn’t provide any clues to me as to what the intent of the report is, so you may want to skip installing it altogether. If anyone does figure out how this report can be used, please drop me a line as I am curious. I was really excited when I found these reports and I think that they provide a lot of valuable insight for monitoring R on SQL Server 2016.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Memory Usage in SQL Server for R

While R is an open source language, there are a number of different versions of R, and each handles memory a little differently. Knowing which version is being used is important, especially when the code is going to be migrated to a server. As part of a SQL Server implementation, there are three different versions of R which come into play. The first is standard open source R, commonly known as CRAN R. This is the standard open source version of R, which runs code in memory and is single threaded. The next version, which will be installed as part of a SQL Server installation, is Microsoft R Open. This version of R was written to take advantage of the Intel Math Kernel Libraries [MKL]. Using those libraries speeds up many statistical calculations which use matrix operations. It also adds multi-threading capability to R, as the rewrite provides the ability to use all available cores and processors and process in parallel. More information on how it works and how much faster Microsoft R Open is compared to standard R is available here. Once Microsoft R Open is installed, RStudio should automatically start using it. To check which version of R is in use within RStudio, go to Tools->Global Options and look at the R version.

(Screenshot: RStudio R version setting)

RStudio here is using Microsoft R Open. To take a look at the version in Visual Studio which has R Tools installed, go to R Tools and Options. Microsoft R Open is open source and can be used by anyone to speed up the execution of their R code. The version of R which is implemented within SQL Server is not this version of R. The R Server version is proprietary, and Microsoft is not giving it away for free; to run it on a server you need to purchase either SQL Server 2016 or R Server standalone. That version is R Server. There is full compatibility across all of the versions of R: code written in either CRAN R or Microsoft R Open will work in R Server. To write code for R Server, the R Client needs to be installed. The Visual Studio screen shows this version of R installed on a PC which is also running SQL Server 2016 Developer Edition.

(Screenshot: Visual Studio R Tools R version setting)
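On the SQL Server side, a quick way to see exactly which R version the server itself is running is to ask R through sp_execute_external_script; a minimal sketch:

EXEC sp_execute_external_script @language = N'R'
,@script = N'OutputDataSet <- data.frame(RVersion = R.version.string, stringsAsFactors = FALSE)'
WITH RESULT SETS
(
(
[RVersion] NVARCHAR(200) NOT NULL
)
);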

R and Memory Consumption

One of R's strengths and weaknesses is the fact that R runs in memory. This is a good thing because it means R is very fast. It is not such a good thing when you need more memory than your machine has. When reading through the product information surrounding Revolution Analytics' version of R [Revo R], which Microsoft bought, one of the features is the ability of this version of R to use not only memory but also disk, by applying chunking logic. The code is broken into smaller pieces, executed, then put back together. Using Revo R, it's possible to do this by using the specifically designed functions, which all start with rx and which implement this functionality, known as ScaleR.

This does not mean that all R code running on SQL Server will be using these functions. Chances are most of it won't, especially if the R code is being migrated from the existing data science team. Prior to being bought by Microsoft, Revolution Analytics charged money for their product, so if your organization didn't have a license, chances are no one was writing rx functions. For people who don't know anything about R and have been asked to create packages to run R, there is an easy way to tell if the code has been written to use chunking: if there are no words in the code that start with rx, like rxSummary, rxCube or rxHistogram to name a few, the code will not use chunking.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

 

Creating New SSRS and SSIS Projects for SQL Server 2016

Now that SQL Server 2016 has been released, it is time to start creating new SSIS and SSRS projects for it. Since SQL Server 2014, SSIS development has been done in Visual Studio. The latest version, Visual Studio 2015, has a free Community edition, which can be found here. If you have it installed and try to create a new Reporting Services or Integration Services project, you will notice that there are no templates listed which will allow you to create one of these projects.

Making SSIS and SSRS Projects for SQL Server 2016

To create SSIS projects in Visual Studio, you will need to click on this link to download the SQL Server Data Tools [SSDT] in the language of your choice. Visual Studio must not be running during the install. After about 5 minutes, when the install completes, you will have a new application installed, SQL Server Data Tools 2015. You will still have the Visual Studio 2015 application as well, providing two methods for creating new packages, which means that you can click on the SSDT icon instead of opening up Visual Studio. SSDT also contains the templates for database projects, so you can now start using Visual Studio for those as well.

Creating SSIS and SSRS Packages for Different Versions of SQL Server

In this version of SQL Server Data Tools, Microsoft has finally addressed the common problem of needing to maintain multiple versions of SSIS packages for the different server versions. No longer do you need three different applications to maintain code for SQL Server 2012, 2014 and now 2016. All of these versions are supported with SSDT for Visual Studio 2015. SSDT will detect which version the code was last saved in so that you don't have to worry about accidentally migrating code. You also have the ability to create an SSIS package for 2012, 2014 or 2016. To select the version you want, right-click on the project and select Properties. Under Configuration Properties->General, as shown in the picture, the TargetServerVersion, which defaults to SQL Server 2016, has a dropdown box making it possible to create a new package in Visual Studio 2015 for whatever version you need to support. Supporting the ability to target different versions is a great new feature and one which I am really happy is included in SSDT for Visual Studio 2015.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Resolving Errors Running R code on SQL Server

SQL Server 2016 contains the ability not only to run R code from within SQL Server Management Studio, but also to use an R client to run code which executes on SQL Server, using SQL Server's memory instead of the client's. To make this work, the following must be loaded on your PC.

  • Open source R tools
  • Microsoft R Open
  • R Client
  • R Studio or Visual Studio 2015 (Pick one, I’m using Visual Studio)

 

For those people who have read most of the documentation out there about setting up R on your PC, you will notice this is a longer list. There is a difference between just running R and running R on SQL Server. Why? Because R Server is not open source R but an enhanced version of R containing features which are not found in the open source version, including the ability to run R code on SQL Server from within the R UI, which is R Studio or Visual Studio 2015.

SQL Server needs R Client 8.0.3

I was working with SQL Server 2016 in two different environments, so I got two different errors. Running SQL Server 2016 Enterprise Edition on a server, I got the error [Microsoft][ODBC Driver Manager] Function sequence error. On my laptop, I received this error.

(Screenshot: error in the R Interactive window)

If you look at the code from the interactive window, you will notice that the error occurred when trying to run rxSummary. In both cases I didn't get the error when I changed the compute context to SQL Server from local, but only when I tried to run a function which runs on the server. In both cases the R tools were installed prior to installing SQL Server 2016. The open source R tools install to C:\Program Files\R\R-3.3.0 (your version number may be higher). Microsoft R Open installs to C:\Program Files\Microsoft\MRO\R-3.2.5. To use the RevoScaleR libraries included in R Server, the version of Microsoft R required is Microsoft RRE, which is installed here: C:\Program Files\Microsoft\MRO-for-RRE\8.0. Unfortunately, SQL Server 2016 shipped with version 8.0.3, not 8.0.0. If you are getting data and using a local compute context, you will have no problems. However, when you want to change your compute context to run on SQL Server, you will get an error.

While I received a different error on the server than on my laptop, the reason for both messages was the same: neither computer was running version 8.0.3 of the R client tools. On the server I was able to fix the error without downloading a thing. After installing a stand-alone version of R Server from the SQL Server Installation Center, the error went away and I got results when trying to run rxSummary. Unfortunately, it was not possible for me to run R Server on my laptop, as R Server is disabled from within the Installation Center. I believe that is because I have SQL Server 2016 Developer Edition on a laptop, not on a server. I needed to do something else to make it work.

Problems with Installing R Client Tools

On June 6th, Microsoft released R Client Tools. This will install version 8.0.3 on the client so it will be compatible with SQL Server. Here's the link. This is where it got tricky. In order to get the tools, you need to have an ID for Visual Studio. No problem, I have two Visual Studio accounts, a work one and a non-work one. I was already logged in on my work computer, so I just clicked the link, and got this screen.

(Screenshot: Visual Studio download page showing no downloads available)

No downloads for me?! What does that mean? Well, it means it is broken. I could not get the client tools, so I could not resolve my problem. I wondered if this issue was unique to me, so I asked someone else I work with who has a Visual Studio account to click on the link and try to install it from his Visual Studio account. That didn't work either. I emailed Microsoft, and I got an answer on a Saturday morning, which frankly shocked me. They told me that the link was working for them. At that point I read the screen more carefully: "To continue Please join Visual Studio Dev Essentials…". That sounded like it could be a permissions issue on my account. Fortunately, I have two accounts, a work one and a personal one. I logged out of my work account and logged into my personal account. This is a picture of what the same page looked like while logged into the other account.

(Screenshot: the R Client tools download page working under the personal account)

I have contacted Microsoft about this error, and they are looking into it. What I thought was interesting is that this update, instead of being freely available, is account dependent. If you don't have an account, or, as in my case, the account isn't working correctly, the ability to use R on SQL Server from a client is unavailable. While I understand that SQL Server 2016 is a brand new release, it is supposed to be ready to use, and it isn't unless you have the R Client Tools, which you may or may not be able to download depending upon your Visual Studio account.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

T-SQL Tuesday #79 – Creating R Code to run on SQL Server 2016


As SQL Server 2016 was recently released, many people have not yet used R with SQL Server. I thought that T-SQL Tuesday would be a great way to introduce this topic. This post contains everything you need to run your first R program from the UI, get data from SQL Server, and run the R code on SQL Server from the UI. If you are running open source R, this code will not work. If you are using Microsoft R Open, this code will not work. Only if you are running the version of R which Microsoft released with SQL Server 2016 will this code work.

The Two Versions of Microsoft R

Microsoft has not one version of R but two, and these two different versions exist because they have two different purposes in mind. Microsoft R Open is open source, fully R compatible, and faster than open source R because a number of the algorithms were rewritten to use multi-threaded math libraries. If you want to run R code on SQL Server, this is not the version you want to use. You want to use the non-open-source version designed to run on R Server, which is included with SQL Server 2016: Microsoft RRE Open. This version will run R code not only in memory but can also swap to disk, can access SQL Server data without needing to create a file, and can run code on the server from the client. The version of RRE Open which is included in SQL Server 2016 is 8.0.3.

Running R on SQL Server

As a handy mnemonic device, all the RRE functions start with Rx, like prescription drugs. None of these features will work in R, unless you are using the Microsoft RRE Open version. For more information on how to set up Visual Studio 2015 to use the correct libraries, please read my previous post for instructions.

SQL Server R Code Walk-through

This code was created on a PC with SQL Server 2016 Developer Edition installed with the R tools, and the Community Edition of Visual Studio 2015. On my SQL Server instance, I have created a database called TestR and loaded the sample file AirlineDemoSmall.csv, included with R Server, as a table with the same name. If you have SQL Server 2016 installed, the directory for the sample files can be found here:

C:\Program Files\Microsoft SQL Server\130\R_SERVER\library\RevoScaleR\SampleData

The table dbo.AirlineDemoSmall has 600,000 rows. Prior to running this code, create a table on SQL Server to hold the data and load it from the csv file. The code below will connect to SQL Server, and, using some rx commands, load the data from SQL Server, run on the R Server, and draw a histogram.
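If you want to follow along, a rough sketch of the table I used is below; the column names come from the sample csv, but the data types are my own guesses, so adjust them as needed. The csv itself can be loaded with BULK INSERT or the SSMS import wizard.

CREATE TABLE dbo.AirlineDemoSmall
(
ArrDelay INT NULL, -- arrival delay in minutes
CRSDepTime FLOAT NULL, -- scheduled departure time as a decimal hour
[DayOfWeek] VARCHAR(10) NULL -- day name, e.g. Monday
);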

library(RevoScaleR)


sqlConnString <- "Driver=SQL Server;Server=MyLaptop\\SQLSERVER2016;Database=TestR;Uid=ReadData;Pwd=readd@t@"
sqlsampleTable <- "AirlineDemoSmall"
# Set ComputeContext.
sqlShareDir <- paste("C:\\Ginger\\AllShare\\", Sys.getenv("USERNAME"), sep = "")
sqlWait <- TRUE
sqlConsoleOutput <- FALSE
serverside <- RxInSqlServer(connectionString = sqlConnString, shareDir = sqlShareDir,
wait = sqlWait, consoleOutput = sqlConsoleOutput)


rxSetComputeContext(serverside)

sqlPlaneDS <- RxSqlServerData(connectionString = sqlConnString, verbose = 1, table = sqlsampleTable)
rxGetInfo(data = sqlPlaneDS, getVarInfo = TRUE, numRows = 3)
rxHistogram( ~ CRSDepTime, data = sqlPlaneDS)

Detailed Description of the R Code

To better understand each line of code, I have provided a description for each line, along with some tips to resolve some possible errors.

library("RevoScaleR")

If you get an error running this line, chances are the R compiler doesn't know where to find the library and it may need to be installed. If so, run this command in the interactive window:

install.packages('RevoScaleR')

If this command gives you an error, R can't find where the library is. Resolve this issue by adding the path: run this command in the interactive window. Notice the slashes go the opposite way from the way File Explorer shows them.

.libPaths(c(.libPaths(),"C:/Program Files/Microsoft SQL Server/130/R_SERVER/library"))

After setting the path, run the previous command to resolve the package, and then run the first line again, as this should resolve any previous errors.

sqlConnString <- "Driver=SQL Server; Server=MyLaptop\\SQLSERVER2016;Database=TestR;Uid=ReadData;Pwd=readd@t@"

This line sets the value of the connection string. I am running SQL Server 2016 on my laptop, in an instance called SQLServer2016. Notice I had to put two backslashes, as the backslash in the instance name has to be escaped. I have hard-coded a user ID and password in plain text. Outside of a test, I would use Windows authentication, which does require an ODBC connection, so that I would not have to put the user ID and password in the code in plain text.

sqlsampleTable <- "AirlineDemoSmall"

This line of code sets a variable to the name of the table created in SQL Server with the data from the csv file.

sqlShareDir <- paste("C:\\Ginger\\AllShare\\", Sys.getenv("USERNAME"), sep = "")

R needs a temporary directory to serialize the R objects when the connection is created, which I am creating here.

sqlWait <- TRUE

Setting the state to wait means that I am creating a blocking transaction which will prevent the later code from being run until this statement is complete. This is a good setting for testing, and also for when you have other commands which cannot be run until you have the data, such as rxHistogram, which requires the dataset to be present.

sqlConsoleOutput <- FALSE

Setting the console output to false decreases the amount of informational messages I get in the immediate window. Since the messages aren’t really that helpful as they show things like how many records were read at the time, I generally set it to false.

serverside <- RxInSqlServer(connectionString = sqlConnString, shareDir = sqlShareDir,
wait = sqlWait, consoleOutput = sqlConsoleOutput)

This line uses the Revo R function RxInSqlServer (remember unlike SQL case is important) to create a connection to SQL Server, using the variables we created earlier to a variable called serverside.

rxSetComputeContext(serverside)

Setting the compute context dictates where my code is going to run. If the compute context is set to local, the code runs on my local PC. Since I set it to the variable containing my SQL Server connection, all of my R code will use the available memory on the SQL Server machine, not mine. Yes, this does mean that I can starve out the resources on the server, a topic I will address at a later time. Since I am running everything on my laptop it doesn't matter, but it could.

sqlPlaneDS <- RxSqlServerData(connectionString = sqlConnString, verbose = 1,
table = sqlsampleTable )

This line gets the data from SQL Server, using the connection string, and specifies what data to get. I could have used a query to get data as well, but in this case I grabbed everything from the table.

rxGetInfo(data = sqlPlaneDS, getVarInfo = TRUE, numRows = 3)

To validate that some data was retrieved, rxGetInfo shows the information retrieved from three rows. Why three rows? Because numRows = 3

rxHistogram( ~ CRSDepTime, data = sqlPlaneDS)

One of the big strengths of R is the ability to create data visualizations, so I felt compelled to include the command which creates a histogram. The ~ (tilde) is in front of the column name CRSDepTime from the table AirlineDemoSmall, and the data comes from the variable sqlPlaneDS where all of the data was loaded.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Using Visual Studio to develop R for SQL Server 2016

As Microsoft released SQL Server 2016 on June 1, a lot of people are starting to investigate how to write R which will run in SQL Server rather than on their local machine. People who have a background in R will automatically migrate to R Studio, the open source UI that has been around for years, but there may be a reason to switch. Visual Studio 2015 Community is a free application which can also be used to write R code, and it is definitely worth investigating.

Which R tool should I use: R Studio or Visual Studio?

For those people who haven't made the decision as to which tool to use, let me offer two compelling reasons to pick Visual Studio [VS] instead of R Studio: IntelliSense and improved debugging tools. R Studio does not have IntelliSense, and it is not possible to debug your code by stepping through it in the manner many VS developers are already quite familiar with. You will need to configure VS to use R Tools, which is detailed below.

Configuring Visual Studio to Run R

Only Visual Studio 2015 can be configured to use R, and you must be using a 64-bit operating system to load R Tools. If you have a different version of VS, download it here. The next step is to download VS R Tools, and lastly download Microsoft R Open. There are two versions of Microsoft R Open: one for R Server 2016, which is the one you want if you plan to integrate R with SQL Server 2016, and the standard version of Microsoft R Open, which does not include any of the R Server features. If you like, you can use either version of Microsoft R Open in R Studio as well. The standard version is only available for 64-bit platforms, but does include versions for Windows and various flavors of Linux, including Red Hat, SUSE, and Ubuntu. The R Open for Microsoft R Server 2016 can be found here. ***Update*** On June 6, 2016, Microsoft released a new tool called R Client. Installing the version of R found in the client, 8.0.3, is required to match the version of R released with SQL Server 2016. It is required to log into Visual Studio to be able to access this R Client link.

After the tools have been installed, they appear in VS under R Tools, as shown on my screen below. The VS environment looks no different, with the exception of the new menu item for R Tools. This really isn't an IDE set up for writing R yet. Time to fix that.

Visual Studio R Tools

Click on R Tools->Data Science Settings and the screen goes from the standard VS screen shown above to an R-configured VS environment tailored to writing R code, as it has the specific panes used when writing R, such as R Interactive and R Plot. If you want to move these screens around, or close the start page, feel free to organize the windows in VS in the same manner as one does when using VS for other development tasks and languages.

If you have multiple R versions loaded, or you just want to see how it works, go to R Tools->Options and look at the R engine entry. This could be pointing to C:\Program Files\R\R-3.3.0 for the open source version of R, or C:\Program Files\Microsoft\MRO\R-3.2.4 for Microsoft R Open. For R with SQL Server 2016, after installing the R Client, the R engine needs to point to C:\Program Files\Microsoft SQL Server\130\R_SERVER, assuming you have the Developer Edition of SQL Server 2016 loaded on your PC. If you change this entry, you will need to restart VS.

(Screenshot: R Tools Options showing the R engine path for SQL Server 2016)

After you click OK, it might be a good idea to check the IntelliSense settings for R. That can be done by going to R Tools->Editor Options->Advanced.

Running R in SQL Server 2016

Now that I am using Microsoft's version of R, I can use the libraries which allow me to run code on the server, which is what this R code does. My server is called MyServer\SQLServer2016. Notice that I need to put two backslashes in my code to be able to connect to the SQLServer2016 instance. To connect, I can use either a SQL login or integrated Windows authentication. For this example I am using a SQL Server ID to access the data, and yes, I do need to put the password in readable text within my code if I use that option. For Windows authentication, an ODBC account would be needed to connect. The user also needs rights granted in SQL Server in order to run R code from within SQL Server. The command rxSetComputeContext(runonServer) changes the location the code will be run from my local machine to SQL Server 2016.

library(RevoScaleR)

# Define the SQL connection string
connStr <- "Driver=SQL Server;Server=MYSERVER\\SQLSERVER2016;Database=Review;Uid=ReadData;Pwd=P@$$word"

# Set ComputeContext.
sqlShareDir <- paste("C:\\AllShare\\", Sys.getenv("USERNAME"), sep = "")
sqlWait <- TRUE
sqlConsoleOutput <- FALSE
runonServer <-  RxInSqlServer(connectionString = connStr, shareDir = sqlShareDir,
                    wait = sqlWait, consoleOutput = sqlConsoleOutput)
rxSetComputeContext(runonServer)

As this post hardly scratches the surface of running R code on SQL Server, I intend to cover more in greater detail in a later post. Please subscribe to my blog to be notified when I post more information on the specific coding techniques unique to running R in SQL Server 2016.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur