Articles

SQL Server R Services and the 20 User IDs

What is the Reason why installing R creates 20 User IDs?

If you have installed SQL, you may have noticed that it creates twenty user ids as part of the installation process. To many people the automatic creation of a number for SQL Server User IDs is alarming, and they want to know what these IDs are for and who will be using them. The answer for who could be using those IDs, is just about anyone running R and they are going to be using the resources of the server to do it. If you are a DBA and want to figure out how to stop this, keep reading as I promise to tell you, after I provide some context about SQL Server and R internals.

R Server and Launchpad

SQLServerManagementConsoleWhen R Server is installed as part of SQL Server, one way you can check to see if it is installed is to look to see if the Launchpad service is running. When R code is running it does not run within SQL Server OS. It is by definition an external process and the Launchpad exe serves as a conduit between SQL Server and the space where R is running. If you want to know more about R and SQL Server Internals, this article I wrote for SQL Mag will provide a lot more details. Microsoft designed the Launchpad service so that other languages might someday also run as R does on SQL Server. It also supports a feature of R Server which I wrote about, context switching. Context Switching provides the ability for users to utilize Server memory instead of the memory on their computers for running R, and access is granted through the use of one of the twenty ids created when R is installed.

Launchpad Settings – Where the External Users are Referenced

launchpadThere are many reasons why a DBA might want to not allow clients to access server memory as that will tax the server. Turning it off is relatively simple. Go to the SQL Server Management Console and select SQL Server Launchpad for the instance of SQL Server running R Server.

In the picture of the screen, the instance of SQL Server I have running R Services is in SS2016. Right click on the server and select Properties, then click on the Advanced tab. When looking at the number of external users allowed by default, the number might look familiar. The reason there are twenty User IDs created for R Server is because Launchpad allocates by default external twenty users to connect from SQL Server to run R. If you don’t want to allow external users to run on a server, you will need to prevent the users from connecting by not enabling them to run R. To run R, users need to have db_rrerole permissions. If they do not have that, they cannot run R. On the production server, it is probably best that this permission not be granted to non-system users.

Since the External Users created are used by SQL Server when running R, it is not possible to set the number of external users to 0 as the Launchpad Service will not run, and no R Code can be executed anywhere. If the number of external users is modified, Configuration Manager provides a prompt window as a restart is required. If the number of External Users is set to 0, the Launchpad Service will not start. When the Launchpad Service tries to start, it will generate Error 1053: The Service Did Not Start in a Timely Fashion. The number of users has to be at least 1 for the service to be able to communicate with the external R components. If you add or reduce the number of External Users, the IDs will be either created or deleted to match the number listed.

Let me know if you found this information regarding SQL Server R Service information by commenting or messaging me on twitter. If you are interested in finding out more regarding the internals of SQL Server and R, you might be interested in reading this Article about the topic. I would also like to thank Bob Ward b | t of the Microsoft for helping me better understand the SQL Server R internals, and for patiently answering my questions on the topic.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

PolyBase – Another Method for Creating a Stretch Database in SQL Server 2016

PolyBase, which was released with SQL Server 2016, provides another method to access live data either locally or in the cloud, very similar to the SQL Server Stretch database feature. Polybase can also provide the ability to provide a more cost-effective availability for cold data, streamlines on-premises data maintenance, and keeps data secure even during migration. Polybase differs from Stretch database in a few ways, as the SQL must be different, the speed is noticeably slower, and it is a lot less expensive. The cost is significantly less because storing data in a Azure blob store starts at 1 cent a month and Stretch database starts at $2.50 an hour. In this post,I will show how to take data which was archived due to the age of the data, which was created in 2012 and store it in an Azure Blob Storage file which will be available via Polybase when I needed.

Implementing a PolyBase Stretch Database

PolyBase is an optional feature of SQL Server 2016, and the Instance Feature PolyBase Query Service for External Data needs to be selected as part of the installation process. Two services are ssmsexternalinstalled with the feature, SQL Server PolyBase Engine and SQL Server PolyBase Data Movement. Both of these services must be running and TCP/IP must be enabled for PolyBase to work. Either check for those services or run the query SELECT SERVERPROPERTY (‘IsPolybaseInstalled’) AS IsPolybaseInstalled; which will return a 1 when PolyBase has been installed. The next step is to tell SQL Server what the external source is by configuring which Hadoop Connectivity will be used.


EXEC sp_configure ‘hadoop connectivity’, 7;
GO
RECONFIGURE;

To use Azure Blob Storage, there are 3 different options, 1,4 and 7, which include the ability to access blob storage. These options also allow for various other Hadoop engines to be accessed as well. I have selected 7, which also allows me to use Hortonworks on Linux. For more information on the connectivity options, check out this link. A secure database ke is required to make the connection a well. The next step is to let SQL Server know where exactly the blob storage data is stored, which creates an entry in the External Data Sources. PolyBase needs to know how the data is formatted, which will be stored in an External File Format. By definition, Hadoop data has no schema, so a schema is going to have to be created and stored in an external table. The location for each of these items is stored within SQL Server Management studio as shown here.

This code will create a secure key using a password I made up

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'P@ssW0rdPolybase'

A PolyBase will need the Azure Blob Storage key to be able to make a connection.

CREATE DATABASE SCOPED CREDENTIAL AzureStorageCred
WITH IDENTITY = 'user', Secret = 'lEGL66LiK2KE2U0WEb435PH15BwFOInrHqQ1AJigjoRVfEOrOge+TLbBNu861cqbC+a26io92o5sw/b+OTD/C4w=='; --Note: Replace with your Blob Storage key


CREATE EXTERNAL DATA SOURCE AzureBlobStorage with (
TYPE = HADOOP,
LOCATION ='wasbs://gingeriottest@gingertestiotblobstore.blob.core.windows.net', --This is the location of the folder inside blob storage containing my data
CREDENTIAL = AzureStorageCred
);

The data is stored in blob storage as a CSV, so I will need to create that file Format

CREATE EXTERNAL FILE FORMAT CSV WITH (
FORMAT_TYPE = DELIMITEDTEXT
,FORMAT_OPTIONS (FIELD_TERMINATOR =',', DATE_FORMAT='MM/dd/yyyy') )

Lastly, a table definition must be created for the file so that it can be accessed as a table

CREATE External TABLE OntimePerformance2012sampleall(
[DimAirlineKey] int,
[DimOriginAirportKey] int,
[DimArrivalAirportKey] int,
[DimCancellationReasonKey] int,
[DimDelayLengthKey] int,
[DimDepartureBlockKey] int,
[DimArrivalBlockKey] int,
[DimDistanceGroupKey] int,
[FlightDateKey] int,
[FlightNumber] varchar(50),
[ScheduleDepartureTime] int,
[ActualDepartureTime] int,
[DepartureDelayInMinutes] int,
[TaxiOutTime] int,
[TaxiInTime] int,
[ScheduleArrivalTime] int,
[ActualArrivalTime] int,
[ArrivalDelayInMinutes] int,
[ScheduleElapsedTime] int,
[ActualElapsedTime] int ,
[DistanceInMiles] int,
[CarrierDelayInMinutes] int,
[WeatherDelayInMinutes] int,
[NASDelayInMinutes] int,
[SecurityDelayInMinutes] int
)

WITH
(   LOCATION = '/OntimePerformance2012.csv',
DATA_SOURCE = AzureBlobStorage,
FILE_FORMAT = CSV
)

 

All of the external tables created can be found in the DMV sys.external_tables or by looking at sys.tables where the new column is_external is set to 1.  Once the setup of PolyBase is complete, I can query the external and internal tables all at the same time, like I do here.

 

SELECT count(*)
, r.cancellationReason
, d.CalendarYear
FROM dbo.CancellationReason r
JOIN dbo.OntimePerformance2012sampleall c
ON c.dimcancellationReasonKey = r.CancellationReasonKey
JOIN dbo.DimDate d
ON c.flightDateKey = d.DateKey
GROUP BY r.cancellationReason, d.CalendarYear

PolyBase Table Considerations

It is not possible to edit the data in PolyBase, as the data is stored read only. Also since the data is being pointed to externally, it cannot be indexed and querying it is not terribly quick. All of the data will be brought down and then filtered, unless PolyBase is configured to improve performance, which I will discuss in a future blog post.

PolyBase and Azure Blob Storage provides the ability to access data stored offsite in a secure location in a cost effective manner. If you saw my presentation on this topic at PASS Summit or have questions how to make it work in your environment, please contact me on twitter or write a comment on my blog.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Monitoring Reports for R on SQL Server 2016

Whenever I talk about R, in the question and answer period I get a lot of questions from DBAs on how to monitor the impact of running R code on SQL Server. Fortunately Microsoft recently created a series of custom reports for SQL Server which provide a lot of useful information on how R is running on SQL Server. The custom reports are available on github here in a folder called SSMS-Custom-Reports. There are 8 different reports: R Services – Active Sessions, R Services – Configuration, R Services – Configure Instance, R Services – Execution Statistics, R Services – Extended Events, R Services – Packages, R Services – Resource Usage, R Services – Script. If it has been a while since you have added a custom report to SQL Server, this link should help you get the reports installed.

R Services Reports

The titles for the reports provide a good indication of what they do. Active Sessions will show how many users currently have their compute context set to run on the server. Configuration shows the current R Configuration, which is very useful information for checking on the R components on the server.

rconfigrpt

The report R Services – Configure Instance is for configuring R for the first time. If you have R setup already, chances are you won’t need this report, as it is really not a report but a script you need to run to configure R. R Services – Execution Statistics is a really neat way of monitoring the ongoing use of R. It keeps track of every time R is run on the server. If you run an R job then refresh the report, the execution count and time will go up.

sqlserverrrpt

R Services – Extended Events is also not a report but a list of all the extended events that are available for R Services. This is a handy bit of information, which can be a great reference tool for extended events monitoring. R Services – Packages lists the packages which are currently installed on SQL Server. When people write R, many lot of different packages are used within the script. Prior to running a package, check the information on this report to ensure the libraries used are installed on SQL Server. If the library is missing the code will not work. R Services – Resource Usage is a great way to see how R has been configured to run on the server. Notice I have created an External Pool for R. This is a configuration recommended by Microsoft to better monitor your R Services.

resourceusagerpt

The last report R Services – TSQL Script is a bit of a mystery. The screen contains this line of text. This is the T-SQL script for generating the data in report: No Report . I don’t know what was the intent of the report. The Readme.md in github provides this documentation about the reports.
SSMS Custom Reports

Custom reports for SQL Server R Services built using SQL Server Report Builder. The reports can be added to SQL Server Management Studio to view various configuration and runtime information about R scripts execution in SQL Server.

That doesn’t provide any clues to me as to what the intent of the report is, so you may want to skip installing it altogether. If anyone does figure out how this report can be used, please drop me a line as I am curious. I was really excited when I found these reports and I think that they provide a lot of valuable insight for monitoring R on SQL Server 2016.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Memory Usage in SQL Server for R

While R is an open source language, there are a number of different versions of R and each handles memory a little differently. Knowing which version is being used is important, especially when the code is going to be migrated to a server. As part of a SQL Server implementation, there are three different versions of R which come into play. The first is standard open source R, commonly known as CRAN R. This is the standard open source version of R which runs code in memory and is single threaded. The next version which will be installed as part of a SQL Server Installation is Microsoft R Open. This version of R was written to take advantage of the Intel Math Kernel Libraries [MLK]. Using the libraries speeds up many statistical calculations which use matrix operations. It also adds multi-threading capability to R as the rewrite provides the ability to use all available cores and processors and process in parallel. More information on how it works and how much faster Microsoft R Open is compared to standard R is available here. To use Microsoft R Open, once it is installed, in Rstudio should automatically start using it. To check out what version of R that is in use, within R Studio, go to Tools->Global Options and look at the R version.

rstudioversion

R studio here is using Microsoft R Open. To take a look at the version in Visual Studio which has R Tools installed, go to Rtools and Options. Microsoft R Open is open source and can be used by anyone to speed up the execution of their R code. The version of R which is implemented within SQL Server is not this version of R. The R server version is proprietary, and Microsoft is not giving it way for free, as to run it on a server you need to purchase either SQL Server 2016 or R Standalone. That version is R Server. There is full compatibility on all of the versions of R. Code written in either CRAN R or Microsoft R Open will work in R Server. To write code for R Server, the R Client needs to be installed. The Visual Studio Screen shows this version of R installed on a PC which also is running SQL Server 2016 Developer edition.

vsr

R and Memory Consumption

One of R’s strengths and weaknesses is the fact that R runs in memory. This is good thing because it means R is very fast. It is not such a good thing when you need more memory than your machine has. When reading through the product information surrounding Revolution Analytics’ version of R [Revo R] which Microsoft bought, one of the features is the ability of this version of R to be able to use not only memory but disk, by rxapplying chunking logic. The code is broken into smaller pieces, executed then put back together. Using Revo R, it’s possible to do this by using the specifically designed functions which all start with rx which implement this functionality, known as ScaleR. This does not mean that all R code running on SQL Server will be using these functions. Chances are most of it won’t, especially if the R code is being migrated from the existing data science team. Prior to being bought by Microsoft, Revolution Analytics charged money for their product. If your organization didn’t have a license, chances are no one was writing rx Functions. For people who don’t know anything about R, and have been asked to create packages to run R, there is an easy way to tell if the code has been written to use the chunking. If there are no words in the code that start with rx, like rxSummary, rxCube or rxHistogram to name a few, the code will not use chunking.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

 

Creating New SSRS and SSIS Projects for SQL Server 2016

SSDTNow that SQL Server 2016 has been released, it is time to start creating new SSIS and SSRS projects for it. Since SQL Server 2014, SSIS has migrated to Visual Studio. The latest version, Visual Studio 2015, has a free Community edition, and can be found here. If you have it installed and try to create a new Reporting or Integration Services Project, you will notice that there are no templates listed which will allow you to create one of these projects.

Making SSIS and SSRS Projects for SQL Server 2016

To create SSIS projects in Visual Studio, you will need to click on this link to download the SQL Server Data Tools [SSDT] in the language of your choice. Visual Studio must not be running during the install. After about 5 minutes, when the install completes you will have a new application installed, SQL Server Data Tools 2015. You will still have the Visual Studio 2015 application as well, providing two methods for creating new packages. which means that you can click on this icon instead of opening up Visual studio. SSDT also contains the templates for database projects, so you can now start using Visual Studio.

Creating SSIS and SSRS Packages for Different Versions of SQL Server

Visual Studio SQL Server versioningIn this version of SQL Server Data Tools, Microsoft has finally addressed the common problem of needing to maintain multiple versions of SSIS packages for the different server versions. No longer do you need three different applications to maintain code for SQL Server 2012, 2014 and now 2016. All of these versions are supported with SSDT for Visual Studio 2015. SQL Server will detect which version the code was last saved in so that you don’t have to worry about accidently migrating code. You also have the ability to create an SSIS package in 2012, 2014 or 2016. To select the version you want, right click on the project and select Properties. Under Configuration Properties->General as shown in the picture, the TargetServerVersion, which defaults to SQL Server 2016, has a dropdown box making it possible to create a new package in Visual Studio 2015 for whatever version you need to support. Supporting the ability to write for different versions, is a great new feature and one which I am really happy is included in SSDT for Visual Studio 2015.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Resolving Errors Running R code on SQL Server

computer_With_ErrorSQL Server 2016 contains the ability to not only to run R code from within SQL Server Management Studio, but to also use an R client to run code which executes on SQL Server, using SQL Server’s memory instead of the client. To make this work the following must be loaded on your PC.

  • Open source R tools
  • Microsoft R Open
  • R Client
  • R Studio or Visual Studio 2015 (Pick one, I’m using Visual Studio)

 

For those people who have read most of the documentation out there to set up R on your PC, you will notice this is a longer list. There is a difference between just running R and running R on SQL Server. Why? Because R Server is not Open Source R but an enhanced version of R containing features which are not found in the open source version, including the ability to run R code on the SQL Server from within the R UI, which is R Studio or Visual Studio 2015.

SQL Server needs R Client 8.0.3

I was working on SQL Server 2016 on two different environments so I got two different errors. Running SQL Server 2016 Enterprise Edition on a Server I got the error [Microsoft][ODBC Driver Manager] Function sequence error. On my laptop, I received this error.

RInteractiveError

If you look at the code from the interactive window, you will notice that the error occurred with trying to run rxSummary. In both cases I didn’t get the error when I changed the compute context to SQL Server from local, but when I tried to run a function which runs on the server. In both cases the R tools where installed prior to installing SQL Server 2016. The Open Source R tools install to C:\Program Files\R\R-3.3.0 (your version number may be higher). The Microsoft R Open installs to C:\Program Files\Microsoft\MRO\R-3.2.5. To use the libraries needed for the RevoScaleR libraries included in R Server, the version of Microsoft R required is Microsoft RRE, which is installed here C:\Program Files\Microsoft\MRO-for-RRE\8.0. Unfortunately, SQL Server 2016 shipped with version 8.0.3 not 8.0.0. If you are getting data and using a local compute context, you will have no problems. However, when you want to change your compute context to run on SQL Server, you will get an error.

While I received a different error on the server than my laptop, the reason for both messages was the same. Neither computer was running version 8.0.0.3 of the R client tools. On the server I was able to fix the error without downloading a thing. After installing a stand-alone version of R Server from the SQL Server Installation Center, the error went away and I got results when trying to run rxSummary. Unfortunately, it was not possible for me to run R Server on my laptop, as R Server is disabled from within the Installation Center. I believe that is because I have SQL Server 2016 developer edition on a laptop, not on a server. I needed to do something else to make it work.

Problems with Installing R Client Tools

On June 6th, Microsoft released R Client Tools. This will install version 8.0.3 on the client so it will be compatible with SQL Server. Here’s the link. This is where it got tricky. In order, to get the tools, you need to have an id for Visual Studio. No problem, I have two Visual Studio Accounts, a work one and a non-work one. I was already logged in to my work computer, so I just clicked the link, and got this screen.

ScreenPrintNoTools

No downloads for me?! What does that mean. Well, it means it is broken. I could not get the client tools, so I could not resolve my problem. I wondered if this issue was unique to me so I asked someone else that I work with who has a Visual Studio account to click on the link and try to install it from his Visual Studio account. That didn’t work either. I emailed Microsoft, and I got an answer on a Saturday morning, which frankly shocked me. They told me that the link was working for them. At that point I read the screen more carefully. “To continue Please join Visual Studio Dev Essentials…”. That sounded like it could be a permissions issue on my account. Fortunately, I have two accounts, a work one and a personal one. I logged out of my work account and logged into my personal account. This is the picture of what the same paged looked like while logged into the other account.

WorkingRClientTooks

I have contacted Microsoft about this error, and they are looking into it. What I thought was interesting is that this update is instead of being freely available, it is account dependent. If you don’t have an account or as in my case, the account isn’t working correctly, the ability to use R on SQL Server is unavailable. While I understand that SQL Server 2016 is a brand new release, it is supposed to be ready to use. Unless you have R Client Tools, which may or may not be able to download depending upon your Visual Studio account.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

T-SQL Tuesday #79 – Creating R Code to run on SQL Server 2016

R

TSQL2SDAY-300x300

As SQL Server 2016 was recently released, many people have not yet had used R with SQL Server. I thought that T-SQL Tuesday would be a great way to introduce this topic.  This post contains everything you need to run your first R program from the UI, get data from SQL Server and run the R code on SQL Server from the UI. If you are running open source R, this code will not work. If you are using Microsoft R Open, this code will not work. Only if you are running the version of R which Microsoft released with SQL Server 2016 will this code work.

The Two Versions of Microsoft R

Microsoft has not one version of R, they have two but two. These two different versions are needed because they have two different purposes in mind. Microsoft R Open, is open source and fully R compatible and is faster than open source R because they rewrote a number of the algorithms to include multi-threaded math libraries. If you want to run R code on SQL Server, this is the not the version you want to use. You want to use the non-open source version designed to run on R Server, which is included with SQL Server 2016, Microsoft RRE Open. This version will run R code not only in memory but swap to disk, to create code which can access SQL Server data without needing to create a file, and can run code on the server from the client. The version of RRE Open which is included in SQL Server 2016 is 8.0.3.

Running R on SQL Server

As a handy mnemonic device, all the RRE functions start with Rx, like prescription drugs. None of these features will work in R, unless you are using the Microsoft RRE Open version. For more information on how to set up Visual Studio 2015 to use the correct libraries, please read my previous post for instructions.

SQL Server R Code Walk-through

This code was created on a PC with SQL Server 2016 Developer Edition installed with the R tools, and the Community Edition of Visual Studio 2015. On my SQL Server instance, I have created a database called TestR and loaded the sample file AirlineDemoSmall.csv included with R server as a table with the same name. If you have SQL Server 2016 installed, the real directory for the sample files can be found here

C:\Program Files\Microsoft SQL Server\130\R_SERVER\library\RevoScaleR\SampleData

The table dbo.AirlineDemoSmall has 600,000 rows. Prior to running this code, create a table on SQL Server to hold the data. The code will load the table data and using some Rx commands, load the data from SQL Server, run the code on the R Server, and draw a histogram.

library(RevoScaleR)


sqlConnString <- "Driver=SQL Server;Server=MyLaptop\\SQLSERVER2016;Database=TestR;Uid=ReadData;Pwd=readd@t@"
sqlsampleTable <- "AirlineDemoSmall"
# Set ComputeContext.
sqlShareDir <- paste("C:\\Ginger\\AllShare\\", Sys.getenv("USERNAME"), sep = "")
sqlWait <- TRUE
sqlConsoleOutput <- FALSE
serverside <- RxInSqlServer(connectionString = sqlConnString, shareDir = sqlShareDir,
wait = sqlWait, consoleOutput = sqlConsoleOutput)


rxSetComputeContext(serverside)

sqlPlaneDS <- RxSqlServerData(connectionString = sqlConnString, verbose = 1, table = sqlsampleTable)
rxGetInfo(data = sqlPlaneDS, getVarInfo = TRUE, numRows = 3)
rxHistogram( ~ CRSDepTime, data = sqlPlaneDS)

Detailed Description of the R Code

To better understand each line of code, I provided the description for each line, along with some tips to resolve some possible erors.

library("RevoScaleR")

If you get an error running this line, chances are the R compiler doesn’t know where to find the library. Maybe you need to install it. If so run this command in the interactive window

install.packages('RevoScaleR')

If this command gives you an error, R can’t find where the library is. Resolve this issue by adding the path Run this command in the immediate window. Notice the slashes go the opposite way file explorer puts them

.libPaths(c(.libPaths(),"C:/Program Files/Microsoft SQL Server/130/R_SERVER/library"))

After setting the path, run the previous command to resolve the package, and then run the first line again, as this should resolve any previous errors.

sqlConnString <- "Driver=SQL Server; Server=MyLaptop\\SQLSERVER2016;Database=TestR;Uid=ReadData;Pwd=readd@t@"

This line sets the value of the connection string. I am running SQL Server 2016 on my laptop, in an instance called SQLServer2016. Notice I had to put two slashes going the wrong way to set my connection. I have hard coded a user id and password in plain text. For test, I would use a window authentication, which does require an ODBC connection so that I would not have to put the user id and password in code in plain text.

sqlsampleTable <- "AirlineDemoSmall"

This line of code sets a variable to the name of the table created in SQL Server with the data from the csv file.

sqlShareDir <- paste("C:\\Ginger\\AllShare\\", Sys.getenv("USERNAME"), sep = "")

R needs a temporary directory to serialize the R objects when the connection is created, which I am creating here.

sqlWait <- TRUE

Setting the state to wait means that I am creating a blocking transaction which will prevent the later code from being run until this statement is complete. This is a good setting for testing and if you other commands which cannot be run until you have data, such as rxHistogram which requires the dataset to wait.

sqlConsoleOutput <- FALSE

Setting the console output to false decreases the amount of informational messages I get in the immediate window. Since the messages aren’t really that helpful as they show things like how many records were read at the time, I generally set it to false.

serverside <- RxInSqlServer(connectionString = sqlConnString, shareDir = sqlShareDir,
wait = sqlWait, consoleOutput = sqlConsoleOutput)

This line uses the Revo R function RxInSqlServer (remember unlike SQL case is important) to create a connection to SQL Server, using the variables we created earlier to a variable called serverside.

rxSetComputeContext(serverside)

Setting the compute context dictates where my code is going to run. If the compute context is set to local, I am going to run on my local PC. Since I set it to the variable I set connecting my SQL Server connection, this means all of my R code will be using the available memory on the SQL Server PC, not mine. Yes,this does mean that I can starve out the resources on the server, a topic I will address at a later time. Since I am running everything on my laptop it doesn’t matter, but it could.

sqlPlaneDS <- RxSqlServerData(connectionString = sqlConnString, verbose = 1,
table = sqlsampleTable )

This line gets the data from SQL Server, using the connection string, and specifies what data to get. I could have used a query to get data as well, but in this case I grabbed everything from the table.

rxGetInfo(data = sqlPlaneDS, getVarInfo = TRUE, numRows = 3)

To validate that some data was retrieved, rxGetInfo shows the information retrieved from three rows. Why three rows? Because numRows = 3

rxHistogram( ~ CRSDepTime, data = sqlPlaneDS)

One of the big strengths of R is the ability to create data visualizations, so I felt compelled to include the command which creates a Histogram. HistogramThe ~ (tilde) is in front of the column name CRSDepTime from the table AirlineDemoSmall, and the data comes from the variable sqlPlaneDS where all of the data was loaded.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Using Visual Studio to develop R for SQL Server 2016

As Microsoft released SQL Server 2016 on June 1, a lot of people are starting to investigate how to write R which will run in SQL Server rather than using their local machine. People who have a background in R will automatically migrate to R Studio, the open source UI that has been around for years, but there may be a reason to switch. Visual Studio 2015 Community is also an open source application which can be used to write R code, which is definitely worth investigating.

Which R tool should I use: R Studio or Visual Studio?

For those people who haven’t made the decision as far as which tool to use, let me offer two compelling reasons to pick Visual Studio [VS] instead of R Studio: Intellisense and Improved Debugging Tools. R studio does not have intellisense and it is not possible to debug your code by stepping through it in the manner that many developers of VS are already quite familiar. You will need to configure VS to use R tools, which are detailed below.

Configuring Visual Studio to Run R

Only Visual Studio 2015 can be configured to use R and you must be using a 64 bit operating system to load R tools. If you have a different version of VS, download it here. The next step is to download VS R Tools and lastly download Microsoft R Open. There are two versions of Microsoft R open, one for R Server 2016, which is the one you want if you plan to integrate R with SQL Server 2016, and the standard version of Microsoft R Open, which does not include any of the R Server features. If you like, you can use either version Microsoft R Open in R Studio as well. The standard version is only available for 64 bit platforms, but does include versions for Windows and various flavors of Linux, including Red Hat, SUSE, and Ubuntu. The R open for Microsoft R Server 2016 can be found here.***Update***On June 6, 2016, Microsoft released a new tool called R client. Installing the version of R found in the client 8.0.3 is required to match the version of R released with SQL Server 2016. It is required to log into Visual Studio to be able to access this R client link.

After the tools have been installed, they appear in VS under R Tools, as shown on my screen below. The VS environment looks no different, with the exception of the new menu item for R Tools. This really isn’t an IDE set up for writing R, yet. Time to fix that.

Visual Studio R Tools

Click on RTools->Data Science Settings and the screen goes from the standard VS screen shown above to anR configured VS environment tailored to writing  R code as it has the specific panes used when writing R, such as R interactive and R Plot.  If you want to move these screens around, or close the start page,  feel free to organize the windows in VS in the same manner as one does  when using VS for other development tasks and languages.

If you have multiple R versions loaded, or you just want to see how it works, go to RTools->Options and look at the R engine entry. This code be pointing to C:\Program Files\R\R-3.3.0 for the open source version of R, C:\Program Files\Microsoft\MRO\r-3.2.4 for the Microsoft Open R. For R with SQL Server 2016, after installing the R Client, the R engine needs to point to C:\Program Files\Microsoft SQL Server\130\R_SERVER, assuming you have the developer edition of SQL Server 2016 loaded on your PC. If you change this entry, you will need to restart VS.

 LocationforRToolsinSQLServer2016

After you click ok, it might be a good idea to check the intellisense settings for R. that can be done by going to Go to R Tools-> Editor Options-> Advanced.

Running R in SQL Server 2016

Now that I am using Microsoft’s Version of R, I can use the libraries which allow me to run on the server, which this R code allows me to do. My server name is called MyServer\SQLServer 2016. Notice that I need to put two slashes in my code to be able to connect to the server to be able to get to the SQLServer2016 instance.  To connect can use either a SQL login, or integrated Windows authentication. For this example I am using a SQL Server ID to access the data, and yes I do need to put the password in readable text within my code if I use that option. For Windows authentication, and ODBC account would be needed to connect. The user also needs SQL Server rights granted in order to run R code from within SQL Server. The command rxSetComputeContext(runonServer) changes the location the code will be run from my local machine to SQL Server 2016

library(RevoScaleR)

# Define the SQL connection string
connStr <- "Driver=SQL Server;Server=MYSERVER\\SQLSERVER2016;Database=Review;Uid=ReadData;Pwd=P@$$word"

# Set ComputeContext.
sqlShareDir <- paste("C:\\AllShare\\", Sys.getenv("USERNAME"), sep = "")
sqlWait <- TRUE
sqlConsoleOutput <- FALSE
runonServer <-  RxInSqlServer(connectionString = connStr, shareDir = sqlShareDir,
                    wait = sqlWait, consoleOutput = sqlConsoleOutput)
rxSetComputeContext(runonServer)

As this post hardly scratches the surface of running R code on SQL Server, I intend to cover more in greater detail in a later post. Please subscribe to my blog to be notified when my later post with more information on the specific coding techniques unique to running R in SQL Server 2016.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Using R in SQL Server 2016

As SQL Server 2016 will be released soon and to follow up on the talk I gave at SQL Saturday Atlanta, this post will guide you through the steps needed to make R work. Like many features of SQL Server 2016, R is optionally installed.  To install R make sure that when you install R, the Option for R Services (In-Database) is checked, as shown below.

SQLServer2016_InstallR

Microsoft is incorporating the version of R sold by Revolution Analytics, which they called R Server. This version, while fully compatible with Open Source R, has some additional features which allow the R code to be run not only in memory, but use of a chunking technology to swap the data to disk so that the R code will not run out of memory. The commands to use this additional functionality all start with rx and are part of the proprietary ScaleR feature set. To use the R Server as a separate stand along product, instead of selecting the R Server in database option, select the R Server Standalone shared features. A R server could be useful if you want to perform large scale data analysis on a Hadoop Cluster, or other Non-SQL database like Teradata.

SQL Server Steps to Enable R

SQL Server 2016 is installed, especially if it was installed by others, you may be wondering if the R service really is installed. Take a look at the services running on the machine with SQL Server. If the SQL Server Launchpad service is running, as shown below, the R services are installed and running.  ServicesRunningR

The last thing needed to run R is to configure and restart the SQL Server Services. In a new query type the following command
sp_configure 'external scripts enabled', 1
GO
Reconfigure
GO

After restarting the SQL Server Service, SQL Server will now run R code. The following text can be run on any SQL Server 2016 instance to see if R has been configured correctly

EXEC sp_execute_external_script @language =N'R',
@script=N'OutputDataSet <-InputDataSet',
@input_data_1 =N'SELECT 1 as CheckToSeeIfRIsWorking'
WITH result sets (([CheckToSeeIfRIsWorking] int not null));
GO

The code is executed as an external script, specifying that the language used should be R. @script contains the R code, which is a simple command to take the mean of the data coming from the InputDataSet. @Input_Data_1 contains the location of the data to be processed. Of course the R code could of course be more complicated, but this code example is generic enough to test for everyone to ensure R is set up properly on SQL Server 2016.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

SQL Server 2016 Polybase Setup

After reading my last post you should be all ready to install SQL Server 2016 to be able to use Hadoop.  I went through all of these steps in my webinar, but I thought it might also be a good idea to include them here so you don’t have to watch the video.

Polybase Install Feature

When installing SQL Server in the Feature Selection List shown below,  PolyBase Query Service for External Data must be selected. SQL Server 2016 Polybase

To check to see if Polybase has been successfully installed, go to Control Panel->Administrative Tools->Services.  There are two services added for polybase, SQL Server PolyBase Data Movement and SQL Server Polybase Engine, as well as  a bunch of other new ones for SQL Server 2016. The polybase services and SQL Server will need to be restarted in a later step.  When starting SQL Server Management Studio for SQL Server 2016, it is hard to spot the differences between it and previous versions of SQL Server as even the icon looks exactly the same. Once you create a database for SQL Server, you will notice a slight difference under the tables folder. When polybase is installed there are two new folders, External Tables and External Resources. Data which is accessed via polybase will be stored under the External Files folder and the External Resources will contain the references to the external resources, the HDFS cluster where the data is stored, as well as the file formats of the underlying data.

Configuring Polybase on SQL Server

SQL Server needs to be configured to use polybase by using the sp_configure command

EXEC sp_configure ‚'hadoop connectivity', 5;
GO
RECONFIGURE; 

The number 5 indicates the kind of Hadoop connectivity desired.  If you were using HDInsight’s Azure Blob storage or Hortonworks on Windows, the number listed would be 4.  The number 5 indicates connectivity to Hortonworks Data Platform on Linux.  Go ahead and exit out of SQL Server as you will be restarting it soon.

Configuration File Modification

Open up your favorite text editor to modify the Hadoop.config file.  The default location is

C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Binn\Polybase\Settings\Hadoop.config

Keep in mind, if you installed SQL Server with a named instance the path will include the name of that instance.  The config file contains a default password for Hadoop.  The password, which is initially set to pdw_user is probably a holdover from pre-2016 days when polybase was only available on Microsoft’s Big Data Appliance, the Analytics Platform System [APS] which was previously called Parallel Data Warehouse [PDW].  Hortonworks’ default password is hue, so you will want to modify the file so that the HadoopUserName has the right password, hue. I circled it below in a clip of the Hortonworks.config file.

Once the changes to the Hadoop.config file are saved, to get polybase to work, the last thing which must be done is to restart the two new polybase services and SQL Server 2016.

Setting up the External Data Source and External Data Files

At this point, we can now tell SQL Server where the Hadoop files are loaded. Open SQL Server Management Studio [SSMS] and create a new query to create the new an external data source, which is going to appear in the new folder, External Data Source.

CREATE EXTERNAL DATA SOURCE HDP2 WITH
(
TYPE = HADOOP,
LOCATION = 'hdfs://sandbox.hortonworks.com:8020'
)

After running this command and refreshing, the source HDP2 will appear in the folder External Data Source ->Data Sources

SQL Server needs to know what the underlying data file formats of the data stored in Hadoop. Assuming the data is stored in a tab delimited format, this command will tell SQL Server how to read the tab delimited data and the format of the dates.  This will allow polybase to read a sample file which comes with Hortonworks, the file SAMPLE_07

CREATE EXTERNAL FILE FORMAT TSV
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (
FIELD_TERMINATOR = '\t',
DATE_FORMAT = 'MM/dd/yyyy'
)
)

SQL Server needs a schema to read the data, as it doesn’t know where the fields are without it. The following code creates a table which will appear under the External Tables folder in SSMS as well as load the data by telling it where the data lives and to use the file format which we just added.

CREATE EXTERNAL TABLE Sample_07
(
code nvarchar(255),
description nvarchar(255),
total_emp int,
salary nvarchar(255)
)
WITH
(
LOCATION = '/apps/hive/warehouse/sample_07',
DATA_SOURCE = HDP2,
FILE_FORMAT = TSV,
REJECT_TYPE = value,
REJECT_VALUE=0
)

After this step is complete, you can use the new table to join to data on SQL Server from inside a HDFS cluster.

I hope this helps you get started with using polybase with SQL Server 2016.  Let me know what you think by posting a comment.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur