As SQL Server 2016 was recently released, many people have not yet had used R with SQL Server. I thought that T-SQL Tuesday would be a great way to introduce this topic. This post contains everything you need to run your first R program from the UI, get data from SQL Server and run the R code on SQL Server from the UI. If you are running open source R, this code will not work. If you are using Microsoft R Open, this code will not work. Only if you are running the version of R which Microsoft released with SQL Server 2016 will this code work.
The Two Versions of Microsoft R
Microsoft has not one version of R, they have two but two. These two different versions are needed because they have two different purposes in mind. Microsoft R Open, is open source and fully R compatible and is faster than open source R because they rewrote a number of the algorithms to include multi-threaded math libraries. If you want to run R code on SQL Server, this is the not the version you want to use. You want to use the non-open source version designed to run on R Server, which is included with SQL Server 2016, Microsoft RRE Open. This version will run R code not only in memory but swap to disk, to create code which can access SQL Server data without needing to create a file, and can run code on the server from the client. The version of RRE Open which is included in SQL Server 2016 is 8.0.3.
Running R on SQL Server
As a handy mnemonic device, all the RRE functions start with Rx, like prescription drugs. None of these features will work in R, unless you are using the Microsoft RRE Open version. For more information on how to set up Visual Studio 2015 to use the correct libraries, please read my previous post for instructions.
SQL Server R Code Walk-through
This code was created on a PC with SQL Server 2016 Developer Edition installed with the R tools, and the Community Edition of Visual Studio 2015. On my SQL Server instance, I have created a database called TestR and loaded the sample file AirlineDemoSmall.csv included with R server as a table with the same name. If you have SQL Server 2016 installed, the real directory for the sample files can be found here
C:\Program Files\Microsoft SQL Server\130\R_SERVER\library\RevoScaleR\SampleData
The table dbo.AirlineDemoSmall has 600,000 rows. Prior to running this code, create a table on SQL Server to hold the data. The code will load the table data and using some Rx commands, load the data from SQL Server, run the code on the R Server, and draw a histogram.
library(RevoScaleR)
sqlConnString <- "Driver=SQL Server;Server=MyLaptop\\SQLSERVER2016;Database=TestR;Uid=ReadData;Pwd=readd@t@"
sqlsampleTable <- "AirlineDemoSmall"
# Set ComputeContext.
sqlShareDir <- paste("C:\\Ginger\\AllShare\\", Sys.getenv("USERNAME"), sep = "")
sqlWait <- TRUE
sqlConsoleOutput <- FALSE
serverside <- RxInSqlServer(connectionString = sqlConnString, shareDir = sqlShareDir,
wait = sqlWait, consoleOutput = sqlConsoleOutput)
rxSetComputeContext(serverside)
sqlPlaneDS <- RxSqlServerData(connectionString = sqlConnString, verbose = 1, table = sqlsampleTable)
rxGetInfo(data = sqlPlaneDS, getVarInfo = TRUE, numRows = 3)
rxHistogram( ~ CRSDepTime, data = sqlPlaneDS)
Detailed Description of the R Code
To better understand each line of code, I provided the description for each line, along with some tips to resolve some possible erors.
library("RevoScaleR")
If you get an error running this line, chances are the R compiler doesn’t know where to find the library. Maybe you need to install it. If so run this command in the interactive window
install.packages('RevoScaleR')
If this command gives you an error, R can’t find where the library is. Resolve this issue by adding the path Run this command in the immediate window. Notice the slashes go the opposite way file explorer puts them
.libPaths(c(.libPaths(),"C:/Program Files/Microsoft SQL Server/130/R_SERVER/library"))
After setting the path, run the previous command to resolve the package, and then run the first line again, as this should resolve any previous errors.
sqlConnString <- "Driver=SQL Server; Server=MyLaptop\\SQLSERVER2016;Database=TestR;Uid=ReadData;Pwd=readd@t@"
This line sets the value of the connection string. I am running SQL Server 2016 on my laptop, in an instance called SQLServer2016. Notice I had to put two slashes going the wrong way to set my connection. I have hard coded a user id and password in plain text. For test, I would use a window authentication, which does require an ODBC connection so that I would not have to put the user id and password in code in plain text.
sqlsampleTable <- "AirlineDemoSmall"
This line of code sets a variable to the name of the table created in SQL Server with the data from the csv file.
sqlShareDir <- paste("C:\\Ginger\\AllShare\\", Sys.getenv("USERNAME"), sep = "")
R needs a temporary directory to serialize the R objects when the connection is created, which I am creating here.
sqlWait <- TRUE
Setting the state to wait means that I am creating a blocking transaction which will prevent the later code from being run until this statement is complete. This is a good setting for testing and if you other commands which cannot be run until you have data, such as rxHistogram which requires the dataset to wait.
sqlConsoleOutput <- FALSE
Setting the console output to false decreases the amount of informational messages I get in the immediate window. Since the messages aren’t really that helpful as they show things like how many records were read at the time, I generally set it to false.
serverside <- RxInSqlServer(connectionString = sqlConnString, shareDir = sqlShareDir,
wait = sqlWait, consoleOutput = sqlConsoleOutput)
This line uses the Revo R function RxInSqlServer (remember unlike SQL case is important) to create a connection to SQL Server, using the variables we created earlier to a variable called serverside.
rxSetComputeContext(serverside)
Setting the compute context dictates where my code is going to run. If the compute context is set to local, I am going to run on my local PC. Since I set it to the variable I set connecting my SQL Server connection, this means all of my R code will be using the available memory on the SQL Server PC, not mine. Yes,this does mean that I can starve out the resources on the server, a topic I will address at a later time. Since I am running everything on my laptop it doesn’t matter, but it could.
sqlPlaneDS <- RxSqlServerData(connectionString = sqlConnString, verbose = 1,
table = sqlsampleTable )
This line gets the data from SQL Server, using the connection string, and specifies what data to get. I could have used a query to get data as well, but in this case I grabbed everything from the table.
rxGetInfo(data = sqlPlaneDS, getVarInfo = TRUE, numRows = 3)
To validate that some data was retrieved, rxGetInfo shows the information retrieved from three rows. Why three rows? Because numRows = 3
rxHistogram( ~ CRSDepTime, data = sqlPlaneDS)
One of the big strengths of R is the ability to create data visualizations, so I felt compelled to include the command which creates a Histogram. The ~ (tilde) is in front of the column name CRSDepTime from the table AirlineDemoSmall, and the data comes from the variable sqlPlaneDS where all of the data was loaded.
Yours Always
Ginger Grant
Data aficionado et SQL Raconteur