Applying Data Science to SQL Server

Data has been getting a lot of attention in the business world for a while now. First there was big data, which promised a new way to store data so that it could be analyzed later. Recently the talk has been all about analyzing data with newer tools such as R and Python. The reality is that people who work with databases in business intelligence have been analyzing data for a long time. Learning a different toolset for analyzing data is not such a big leap, but an expansion of what they already know.

As the field is rapidly expanding and demand for these skills is huge, now is a great time to learn the tools.

Traditional Data Science Development

Data scientists have created analytic solutions with data for a number of years. The data is analyzed, cleaned, and processed with various algorithms, and results are produced. When the process is complete, code exists which provides meaning from a portion of the data and is ready to be migrated to production. Traditionally there has been a big gap between creating a solution and implementing the solution to run against data on a regular basis. Data scientists are traditionally not part of the IT organization; they are actuaries or analysts, not the people who have anything to do with system processing. Recently I did some work for a company where, after the data scientists were done creating a solution, they turned all of their code over to the Java team. Six weeks later the code was released into production. This solution made no one happy. Management thought it took too long. The data scientists didn't believe that the code they created was what was implemented in production, and the Java developers were tired of being blamed for incorrect code which took a long time to implement.

SQL Server Implementation of Data Science

SQL Server 2016 incorporates R, and SQL Server 2017 adds the ability to run Python code inside SQL Server, so data science solutions can now be incorporated as part of a scheduled process within SQL Server. There is now a DevOps path for incorporating R and Python into SQL Server. One way of learning about the technology is through blogs and other online training, which can help you get up to speed. Many times, though, there is no substitute for hands-on learning. If you are attending PASS Summit 2017 and want to learn not only about data science but how to incorporate it into SQL Server, I hope you can sign up for my all-day training session on Applied Data Science for the SQL Server Professional. I hope to see you there.
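If you want to verify such an installation yourself, a quick smoke test helps. Here is a minimal sketch, assuming a SQL Server 2017 instance with Machine Learning Services (Python) installed and 'external scripts enabled' turned on; it simply passes one row through Python and back:

EXEC sp_execute_external_script @language = N'Python'
, @script = N'OutputDataSet = InputDataSet'
, @input_data_1 = N'SELECT 1 AS CheckToSeeIfPythonIsWorking'
WITH RESULT SETS
(
(
[CheckToSeeIfPythonIsWorking] INT NOT NULL
)
);

If a single row comes back, the external runtime is wired up correctly.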

I have recently created a YouTube channel where I plan on sharing more data-related content, and I have posted my first video there, which is about this conference.

If you are at PASS Summit, please introduce yourself, as I would love to meet the people who read my blog in person.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

What to do when SQL Server 2016 R Installation Does Not Work

I was recently helping a customer with their R installation. After R was installed successfully, I went to SQL Server Management Studio and ran the script needed to enable external scripts:

-- Enable external scripts so R code can run from T-SQL
EXEC sp_configure 'external scripts enabled', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
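As a side note, you can confirm the setting without rerunning sp_configure; sys.configurations shows both the configured value and the value currently in use:

SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'external scripts enabled';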

After this script ran successfully, I closed SSMS and restarted the SQL Server service, noting as I did so that SQL Server Launchpad, which is installed as part of R Server, was also going to be restarted. Everything looked good until I tried to run my first R command. I went for the super generic script:

EXEC sp_execute_external_script @language = N'R'
, @script = N'OutputDataSet <- InputDataSet'
, @input_data_1 = N'SELECT 1 AS CheckToSeeIfRIsWorking'
WITH RESULT SETS
(
(
[CheckToSeeIfRIsWorking] INT NOT NULL
)
);

Much to my surprise, after this I received an error:

Msg 39019, Level 16, State 1, Line 1
An external script error occurred:
Unable to launch the runtime. ErrorCode 0x80070490: 1168(Element not found.).
Msg 11536, Level 16, State 1, Line 1
EXECUTE statement failed because its WITH RESULT SETS clause specified 1 result set(s), but the statement only sent 0 result set(s) at run time.


Resolving Error Code 0x80070490: Unable to Launch the Runtime for R in SQL Server

When I saw this error, I first checked the various SQL log files and didn't find any errors. I checked Configuration Manager to ensure that I had user IDs configured; the default 20 users were there as expected. Nothing seemed to make any difference. Looking online, the only error that seemed possibly close was a different error message about 8.3 naming and the working directory.

I had run into a similar problem with R and Power BI. R natively doesn't handle Windows directory names longer than eight characters. When a directory name is longer, which is always the case with the default SQL Server directories, internally the path is read in 8.3 short-name form: the first six characters, a tilde (~), and a number. This particular server also had an installation of SQL Server 2014 on it, and looking at the shortened path it was impossible to tell which instance was which, as the directory read like this: C:\PROGRA~1\MICROS~2\MSSQL1~1.MSS\MSSQL\EXTENS~1

I created a new directory with a short name, D:\Rwork, and changed the working directory to the new subdirectory Rwork by updating the rlauncher.config file and restarting the SQL Server Launchpad service. I was really hopeful that this would work, but when I ran the external script listed above, it still did not work. What had I done wrong? Fortunately, I had not deleted the previous working directory, which was in the default location C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\ExtensibilityData. When SQL Server creates this directory, it also includes subdirectories for all of the users. The new directory I created didn't have those subdirectories. I copied the subdirectories from the old working directory to the new one and tried again. This time, success!
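For reference, the change itself is small. Here is a sketch of the relevant line in rlauncher.config (which on my machine was in the Binn directory of the instance; your path and drive letter will differ):

WORKING_DIRECTORY=D:\Rwork

After saving the file, restart the SQL Server Launchpad service, and remember to copy the per-user subdirectories from the old ExtensibilityData directory into the new working directory before testing.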

I hope you find this post helpful, as I know I could have used it when I was trying to figure this out. If it saved you time too, drop me a line and let me know.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

SQL Server R Services and the 20 User IDs

Why Does Installing R Create 20 User IDs?

If you have installed R with SQL Server 2016, you may have noticed that the installation creates twenty user IDs. To many people the automatic creation of a number of SQL Server user IDs is alarming, and they want to know what these IDs are for and who will be using them. The answer for who could be using those IDs is just about anyone running R, and they are going to be using the resources of the server to do it. If you are a DBA and want to figure out how to stop this, keep reading, as I promise to tell you after I provide some context about SQL Server and R internals.

R Server and Launchpad

When R Server is installed as part of SQL Server, one way you can check to see if it is installed is to look to see if the Launchpad service is running. When R code is running, it does not run within the SQL Server process; it is by definition an external process, and the Launchpad executable serves as a conduit between SQL Server and the space where R is running. If you want to know more about R and SQL Server internals, this article I wrote for SQL Mag will provide a lot more detail. Microsoft designed the Launchpad service so that other languages might someday run on SQL Server the way R does. It also supports a feature of R Server which I wrote about, context switching. Context switching provides the ability for users to utilize server memory instead of the memory on their own computers for running R, and access is granted through the use of one of the twenty IDs created when R is installed.

Launchpad Settings – Where the External Users are Referenced

There are many reasons why a DBA might not want to allow clients to access server memory, as that will tax the server. Turning it off is relatively simple. Go to SQL Server Configuration Manager and select SQL Server Launchpad for the instance of SQL Server running R Server.

In the picture of the screen, the instance of SQL Server I have running R Services is SS2016. Right-click on the service and select Properties, then click on the Advanced tab. When looking at the number of external users allowed by default, the number might look familiar. The reason there are twenty user IDs created for R Server is that, by default, Launchpad allocates twenty external users to connect from SQL Server to run R. If you don't want to allow external users to run R on a server, you will need to prevent the users from connecting by not enabling them to run R. To run R, users need db_rrerole permissions. If they do not have that role, they cannot run R. On a production server, it is probably best that this permission not be granted to non-system users.
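Managing who holds db_rrerole works like any other database role. A minimal sketch, where [AnalystLogin] is a hypothetical user in your database:

-- Allow a user to run R by adding them to db_rrerole ([AnalystLogin] is a placeholder)
ALTER ROLE db_rrerole ADD MEMBER [AnalystLogin];
-- Take the ability to run R away
ALTER ROLE db_rrerole DROP MEMBER [AnalystLogin];
-- See who is currently in the role
SELECT dp.name
FROM sys.database_role_members rm
JOIN sys.database_principals r ON rm.role_principal_id = r.principal_id
JOIN sys.database_principals dp ON rm.member_principal_id = dp.principal_id
WHERE r.name = 'db_rrerole';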

Since the external users created are used by SQL Server when running R, it is not possible to set the number of external users to 0, as the Launchpad service will not run and no R code can be executed anywhere. If the number of external users is modified, Configuration Manager provides a prompt window, as a restart is required. If the number of external users is set to 0, the Launchpad service will not start, and when it tries to start it will generate Error 1053: The Service Did Not Start in a Timely Fashion. The number of users has to be at least 1 for the service to be able to communicate with the external R components. If you add or reduce the number of external users, IDs will be created or deleted to match the number listed.

Let me know if you found this SQL Server R Services information helpful by commenting or messaging me on Twitter. If you are interested in finding out more regarding the internals of SQL Server and R, you might be interested in reading this article about the topic. I would also like to thank Bob Ward b | t of Microsoft for helping me better understand the SQL Server R internals, and for patiently answering my questions on the topic.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Calculated Tables and Role-Playing Dimensions

Working with role-playing dimensions, which arise when you have, say, multiple dates in a table and you want to relate them all back to a single date table, has always been problematic in SQL Server Analysis Services Tabular. Tabular models only allow one active relationship to a single column at a time. The picture on the left shows how tabular models represent a role-playing dimension, and the model on the right is the recommended method for modeling the relationships in Analysis Services Tabular, as then users can filter the data on a number of different date tables.

Tabular role-playing dimension modeling

 

Role-playing dimension

The big downside to this is that one has to import the date table into the model multiple times, meaning the same data is imported again and again. At least that was the case until SQL Server 2016 was released. This week's T-SQL Tuesday topic, Fixing Old Problems with Shiny New Toys, is a really good reason to describe a better way of handling this problem.

 

Calculated Tables: The Solution for Role-Playing Dimensions

SQL Server 2016 provides a new method of solving the role-playing dimension problem: using a calculated table. Instead of importing the date table again, create a formula that references a copy of it. First switch to the data view of the model, then select Table->New Calculated Table. The screen will change to the new table screen and the cursor will be placed in the formula bar.

In my model I have one table called Date. I am going to add a calculated table called Order Date. The DAX couldn't be simpler: the entire formula is just ='Date', referencing the table named 'Date' as shown in the picture below. Rename the table to something more meaningful, like Order Date, and that is it. The modeling required is the same, but now the model size does not increase to accommodate all of the date tables needed, as there is only one copy of the date table, referenced multiple times. If you are using Power BI, this same concept can be used for handling role-playing dimensions as well.

SQL Server 2016 has a lot of great new features, and in addition to the flashy ones like R, there are a lot of great enhancements to the Tabular model that are worth investigating as well.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Context Switching in R Server

R code tends to be very memory intensive, as R processes primarily in memory. If you want R to perform well, you want as much memory as you can get your hands on to run your code, especially with larger datasets. This is a problem, as many individual laptops have pitifully low memory capacity, and unless you have a computer loaded with as much memory as it can hold, analyzing large datasets may run you out of memory. If a new computer is not in the budget, why not develop on the server? You may be thinking that there is no way the administrator of the box is going to provide you the means to use the server's memory. Well, if you have SQL Server 2016 with R Server installed, chances are you can use the memory capacity of the server by connecting your R process to run on it from your computer, without the need to install anything on the server.

Microsoft's R Server contains some specialized functions which are not part of the standard CRAN R installation. One of the ScaleR functions, RxInSqlServer, will allow code to be processed on the server from the client. To make this work, you must have R Server and R Client installed. If you are doing a test on a local machine, you will need both R Client and R Server installed on that computer.

How to Use Server Memory, Not Local Memory, for Running R

If you are developing R in your IDE of choice, either RStudio or Visual Studio with R Tools, here is the code you need to make that work, which includes code running on the server:

#The RevoScaleR library is installed with Microsoft R Client and R Server; it is not a CRAN package.
#Context switching is included in that library.
library(RevoScaleR)
#Create a connection to your SQL Server 2016 instance. Note the double backslashes, which I needed to identify the instance name
#
sqlConnString <- "Driver=SQL Server;Server=DevSQLServer\\SS2016;Database=TestR;Uid=ReadDataID;Pwd=readd@t@!!!"
#Set the variable containing RxInSqlServer. Note that the R Server-specific functions all start with Rx
#
serverside <- RxInSqlServer(connectionString = sqlConnString)
#Set the compute context to SQL Server. After this the code will run using server memory, not local memory
#
rxSetComputeContext(serverside)
#Check to see what the compute context is. Note this is for informational purposes; you do not need to do this to make anything work.
#
rxGetComputeContext()
#If you want to change the compute context back to your computer, run this command
#rxSetComputeContext("local")
#Until the context is switched back, I am now running on the server, not locally.
#Here I am going to take a look at a table in my TestR database called AirlineDemoSmall
#
sqlsampleTable <- "AirlineDemoSmall"
#
sqlPlaneDS <- RxSqlServerData(connectionString = sqlConnString, verbose = 1, table = sqlsampleTable)
#To take a look at the content of the data, I am going to look at 10 rows in the AirlineDemoSmall table
#
rxGetInfo(data = sqlPlaneDS, getVarInfo = TRUE, numRows = 10)
#To visually investigate the data, this command will plot a histogram displaying the frequencies of values in one of the columns, CRSDepTime
#
rxHistogram(~CRSDepTime, data = sqlPlaneDS)

Here’s the output I get back in the R interactive Window.

Data Source: SQLSERVER
Number of variables: 3
Variable information:
Var 1: ArrDelay, Type: character
Var 2: CRSDepTime, Type: numeric
Var 3: DayOfWeek, Type: character
Data (10 rows starting with row 1):
ArrDelay CRSDepTime DayOfWeek
1       -14 16.283333   Monday
2       -1   6.166667   Monday
3       -2   7.000000   Monday
4         0 10.266666   Monday
5         0 13.483334   Monday
6       -10 16.833334   Monday
7       -10 19.949999   Monday
8       350 14.650001   Monday
9       292   9.416667   Monday
10       M   6.000000   Monday

RxHistogram output

Let me know if you found this post helpful by posting a comment. Thanks also to Mario, who asked me about context switching and gave me the idea to answer his questions on this site. If you are interested in seeing more information about SQL Server and R, please subscribe, as I tend to answer more of the questions I receive here.


Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Questions and Answers on Machine Learning with R

Recently, I did a webinar on Machine Learning and R. I received a number of questions during the presentation. Due to time constraints, I was unable to answer all of them, so I have provided the questions and answers here.

Question: Can I Use R in SQL Server to plot non-linear regression curves? We use IC50 and others in Michaelis-Menten kinetics for bio-chemical work.

R running on SQL Server provides the functionality of standard CRAN R packages with the additional capability to run the ScaleR functions provided by SQL Server's implementation of R. Any other functionality performed in R can therefore also be performed on SQL Server. Like all R code, you will need to install the required R libraries in the appropriate subdirectory. The directory is something like C:\Program Files\Microsoft SQL Server\MSSQL13.<instancename>\R_SERVICES\library

Question: Can you execute/invoke an external R script from T-SQL?

Yes. With SQL Server 2016 it is possible to run an R script from within T-SQL by using the sp_execute_external_script stored procedure. This link should help you with more detailed information on how to execute R scripts within T-SQL.

Question: Is it possible to run R processes on different boxes other than the SQL Server itself, for scalability reasons?

You have the option of installing R Server on another server. Just keep in mind that you do have to account for the additional overhead of moving all the data over the network, which should weigh into your decision to move processing to a different server.

Question: Can we join data generated from an R script to SQL data directly or does it have to be inserted into a table first?

Data generated from an R script interacts with the data the same way other data in a stored procedure does. It is not necessary to write the data to a table for the data to be processed.
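As a sketch of what that looks like, you can capture the output of sp_execute_external_script with INSERT ... EXEC into a temp table and join it like any other data; the table and column names here are hypothetical:

-- Capture the R output in a temp table
CREATE TABLE #RScores (CustomerID INT, Score FLOAT);

INSERT INTO #RScores (CustomerID, Score)
EXEC sp_execute_external_script @language = N'R'
, @script = N'OutputDataSet <- InputDataSet' -- stand-in for real scoring logic
, @input_data_1 = N'SELECT CustomerID, 0.5 AS Score FROM dbo.Customers';

-- Join the generated data directly to SQL data
SELECT c.CustomerName, s.Score
FROM dbo.Customers c
JOIN #RScores s ON s.CustomerID = c.CustomerID;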

Question: When would you use R Server versus R on SQL Server?

R on SQL Server is an implementation of R Server. When running R on SQL Server, R Server is what is actually running.

Question: What is the maximum file size I can load on R server?

R Server uses ScaleR, which provides the ability to use not only memory but also disk storage, providing nearly limitless capacity for file size processing.

Question: Is knitr integrated or can it be integrated into R for SQL Server?

While you can install the knitr library on SQL Server to implement the functionality provided, there is no direct integration path for incorporating the wide functionality knitr provides within SQL Server. For more information about knitr, please see this link.

Question: Does MS R Open have all the functionalities of CRAN R? Does it support all packages?

MS R Open is fully CRAN R compliant. All code which runs in CRAN R can run in MS R Open, and all packages are supported. Since MS R Open was rewritten to use the Intel Math Kernel Libraries, it provides technical advantages over CRAN R. MS R Open is partially multi-threaded, instead of single-threaded like CRAN R, and is up to 38% faster.

Question: Do you have a link for the Microsoft Machine learning cheat sheet?

Sure. The link can be found here.

Question: Why Use Microsoft R Open over other Analytics Tools such as Open R, Python, Matlab?

Microsoft R Open is fully CRAN R compliant, and also multi-threaded and faster, providing a clear benefit over CRAN R. Python is a great tool for data analytics, but unlike R it is not designed solely for statistical analysis; it has a much wider functionality scope. R is focused solely on statistical data analysis. Matlab is a great tool, but given its complicated user interface and high licensing cost, many users may gravitate towards R, as it is free and there is a lot of good support for learning R available online as well.

Question: Can R be used with older versions of SQL Server or only 2016?

While it is possible to use R with other versions of SQL Server using tools such as RODBC, the R Server and full integration of R in SQL Server is only available in SQL Server 2016.

Question: Do I need all R server and R client and Microsoft R Open to be installed to run R from SQL Server?

To run R on SQL Server, R Server needs to be installed. For more information on how to install R on SQL Server, please see this link. To connect to the server and use the ScaleR functionality, the client machine will need to have R Client installed.

Question: What are the benefits of Microsoft Standalone R Server?

R Server standalone is available to connect to data stored in HDFS and Teradata in addition to SQL Server. To incorporate data from other data sources and provide the ability to process more data than will fit in memory, it might make a lot of sense to deploy a standalone R Server.

Question: Are there any memory limitations in R when handling big data?

R is designed to run in memory, meaning if you have more data than memory you may run out of memory. When running with the ScaleR functions, both memory and disk are used, providing a nearly limitless ability to process data.

Question: Can one call R from Python?

While it may be possible to create Python code which calls R, I am unaware of the syntax. This functionality would not be supported in the context of SQL Server.

Question: Can you use R charts in SSRS?

There are two different ways to incorporate R charts in SSRS. Since a stored procedure can produce an R visualization, it is possible to use SSRS to call a stored procedure which creates a graphic to be rendered in SSRS. Also, since Microsoft announced the incorporation of Power BI with SSRS, and Power BI provides the ability to create R graphics, incorporating Power BI which renders R would be another way to use charts in SSRS.
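As a sketch of the stored procedure approach, one commonly shown pattern is to have the R script write a plot to a temporary file and return it as varbinary(max), which an SSRS image item can then render. This reuses the AirlineDemoSmall table from earlier in this post and is illustrative rather than production code:

EXEC sp_execute_external_script @language = N'R'
, @script = N'
imageFile <- tempfile(fileext = ".png")
png(filename = imageFile, width = 600, height = 400)
hist(InputDataSet$CRSDepTime, main = "Departure Times")
dev.off()
OutputDataSet <- data.frame(plot = readBin(imageFile, what = "raw", n = file.info(imageFile)$size))'
, @input_data_1 = N'SELECT CRSDepTime FROM dbo.AirlineDemoSmall'
WITH RESULT SETS ((plot VARBINARY(MAX)));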

Question: Any classes like “R for the nervous DBA” that you can recommend? I have a statistician who is running R on my SQL server.

I do not know of any R classes specifically designed for SQL Server DBAs, but I am working on a post for additional configuration information which you might find helpful. Please check back in a few weeks for updates.

Question: What is a Pirate’s favorite language?

RRRRRR Mad Props to Phillip for sending me a pirate joke. I love pirate jokes. I feel compelled to tell a few when I am presenting R and I have been told all of mine are really bad.

If you have any additional questions about SQL Server and R or any pirate jokes, please comment on this post or send me a message via twitter @desertislesql.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

PolyBase – Another Method for Creating a Stretch Database in SQL Server 2016

PolyBase, which was released with SQL Server 2016, provides another method to access live data either locally or in the cloud, very similar to the SQL Server Stretch Database feature. PolyBase can also provide more cost-effective availability for cold data, streamline on-premises data maintenance, and keep data secure even during migration. PolyBase differs from Stretch Database in a few ways: the SQL is different, the speed is noticeably slower, and it is a lot less expensive. The cost is significantly less because storing data in an Azure blob store starts at around a penny per gigabyte per month, while Stretch Database starts at $2.50 an hour. In this post, I will show how to take data which was archived due to its age (it was created in 2012) and store it in an Azure Blob Storage file which will be available via PolyBase when needed.

Implementing a PolyBase Stretch Database

PolyBase is an optional feature of SQL Server 2016, and the instance feature PolyBase Query Service for External Data needs to be selected as part of the installation process. Two services are installed with the feature: SQL Server PolyBase Engine and SQL Server PolyBase Data Movement. Both of these services must be running, and TCP/IP must be enabled, for PolyBase to work. Either check for those services or run the query SELECT SERVERPROPERTY('IsPolybaseInstalled') AS IsPolybaseInstalled; which will return 1 when PolyBase has been installed. The next step is to tell SQL Server what the external source is by configuring which Hadoop connectivity option will be used.


EXEC sp_configure 'hadoop connectivity', 7;
GO
RECONFIGURE;

To use Azure Blob Storage, there are three different options, 1, 4 and 7, which include the ability to access blob storage. These options also allow various other Hadoop engines to be accessed as well. I have selected 7, which also allows me to use Hortonworks on Linux. For more information on the connectivity options, check out this link. A secure database key is required to make the connection as well. The next step is to let SQL Server know where exactly the blob storage data is stored, which creates an entry in the external data sources. PolyBase needs to know how the data is formatted, which will be stored in an external file format. By definition, Hadoop data has no schema, so a schema is going to have to be created and stored in an external table. The location for each of these items is stored within SQL Server Management Studio as shown here.

This code will create a master key using a password I made up:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'P@ssW0rdPolybase'

PolyBase will need the Azure Blob Storage key to be able to make the connection:

CREATE DATABASE SCOPED CREDENTIAL AzureStorageCred
WITH IDENTITY = 'user', Secret = 'lEGL66LiK2KE2U0WEb435PH15BwFOInrHqQ1AJigjoRVfEOrOge+TLbBNu861cqbC+a26io92o5sw/b+OTD/C4w=='; --Note: Replace with your Blob Storage key


CREATE EXTERNAL DATA SOURCE AzureBlobStorage with (
TYPE = HADOOP,
LOCATION ='wasbs://gingeriottest@gingertestiotblobstore.blob.core.windows.net', --This is the location of the folder inside blob storage containing my data
CREDENTIAL = AzureStorageCred
);

The data is stored in blob storage as a CSV, so I will need to create that file format:

CREATE EXTERNAL FILE FORMAT CSV WITH (
FORMAT_TYPE = DELIMITEDTEXT
,FORMAT_OPTIONS (FIELD_TERMINATOR =',', DATE_FORMAT='MM/dd/yyyy') )

Lastly, a table definition must be created for the file so that it can be accessed as a table

CREATE External TABLE OntimePerformance2012sampleall(
[DimAirlineKey] int,
[DimOriginAirportKey] int,
[DimArrivalAirportKey] int,
[DimCancellationReasonKey] int,
[DimDelayLengthKey] int,
[DimDepartureBlockKey] int,
[DimArrivalBlockKey] int,
[DimDistanceGroupKey] int,
[FlightDateKey] int,
[FlightNumber] varchar(50),
[ScheduleDepartureTime] int,
[ActualDepartureTime] int,
[DepartureDelayInMinutes] int,
[TaxiOutTime] int,
[TaxiInTime] int,
[ScheduleArrivalTime] int,
[ActualArrivalTime] int,
[ArrivalDelayInMinutes] int,
[ScheduleElapsedTime] int,
[ActualElapsedTime] int ,
[DistanceInMiles] int,
[CarrierDelayInMinutes] int,
[WeatherDelayInMinutes] int,
[NASDelayInMinutes] int,
[SecurityDelayInMinutes] int
)

WITH
(   LOCATION = '/OntimePerformance2012.csv',
DATA_SOURCE = AzureBlobStorage,
FILE_FORMAT = CSV
)

 

All of the external tables created can be found in the DMV sys.external_tables or by looking at sys.tables where the new column is_external is set to 1.  Once the setup of PolyBase is complete, I can query the external and internal tables all at the same time, like I do here.

 

SELECT count(*) AS CancellationCount
, r.cancellationReason
, d.CalendarYear
FROM dbo.CancellationReason r
JOIN dbo.OntimePerformance2012sampleall c
ON c.dimcancellationReasonKey = r.CancellationReasonKey
JOIN dbo.DimDate d
ON c.flightDateKey = d.DateKey
GROUP BY r.cancellationReason, d.CalendarYear
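
As an aside, a quick way to list the external tables using the new column mentioned above:

SELECT name, is_external
FROM sys.tables
WHERE is_external = 1;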

PolyBase Table Considerations

It is not possible to edit the data through PolyBase, as the data is read-only. Also, since the data is referenced externally, it cannot be indexed, and querying it is not terribly quick. All of the data will be brought down and then filtered, unless PolyBase is configured to improve performance, which I will discuss in a future blog post.

PolyBase and Azure Blob Storage provide the ability to access data stored offsite in a secure location in a cost-effective manner. If you saw my presentation on this topic at PASS Summit or have questions about how to make it work in your environment, please contact me on Twitter or write a comment on my blog.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Monitoring Reports for R on SQL Server 2016

Whenever I talk about R, in the question and answer period I get a lot of questions from DBAs on how to monitor the impact of running R code on SQL Server. Fortunately, Microsoft recently created a series of custom reports for SQL Server which provide a lot of useful information on how R is running on SQL Server. The custom reports are available on GitHub here, in a folder called SSMS-Custom-Reports. There are 8 different reports: R Services – Active Sessions, R Services – Configuration, R Services – Configure Instance, R Services – Execution Statistics, R Services – Extended Events, R Services – Packages, R Services – Resource Usage, and R Services – Script. If it has been a while since you have added a custom report to SQL Server, this link should help you get the reports installed.

R Services Reports

The titles for the reports provide a good indication of what they do. Active Sessions will show how many users currently have their compute context set to run on the server. Configuration shows the current R Configuration, which is very useful information for checking on the R components on the server.

R Services – Configuration report

The report R Services – Configure Instance is for configuring R for the first time. If you have R set up already, chances are you won't need this report, as it is really not a report but a script you need to run to configure R. R Services – Execution Statistics is a really neat way of monitoring ongoing use of R. It keeps track of every time R is run on the server. If you run an R job and then refresh the report, the execution count and time will go up.

R Services – Execution Statistics report

R Services – Extended Events is also not a report but a list of all the extended events that are available for R Services. This is a handy bit of information which can be a great reference for extended events monitoring. R Services – Packages lists the packages which are currently installed on SQL Server. When people write R, many different packages are used within a script. Prior to running a script, check the information on this report to ensure the libraries used are installed on SQL Server; if a library is missing, the code will not work. R Services – Resource Usage is a great way to see how R has been configured to run on the server. Notice I have created an External Pool for R. This is a configuration recommended by Microsoft to better monitor your R Services.

R Services – Resource Usage report

The last report, R Services – TSQL Script, is a bit of a mystery. The screen contains this line of text: "This is the T-SQL script for generating the data in report: No Report". I don't know what the intent of the report was. The Readme.md on GitHub provides this documentation about the reports.
SSMS Custom Reports

Custom reports for SQL Server R Services built using SQL Server Report Builder. The reports can be added to SQL Server Management Studio to view various configuration and runtime information about R scripts execution in SQL Server.

That doesn't provide any clues to me as to what the intent of the report is, so you may want to skip installing it altogether. If anyone does figure out how this report can be used, please drop me a line, as I am curious. I was really excited when I found these reports, and I think they provide a lot of valuable insight for monitoring R on SQL Server 2016.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Memory Usage in SQL Server for R

While R is an open source language, there are a number of different versions of R, and each handles memory a little differently. Knowing which version is being used is important, especially when the code is going to be migrated to a server. As part of a SQL Server implementation, there are three different versions of R which come into play. The first is standard open source R, commonly known as CRAN R. This is the standard open source version of R, which runs code in memory and is single-threaded. The next version, which is installed as part of a SQL Server installation, is Microsoft R Open. This version of R was written to take advantage of the Intel Math Kernel Libraries [MKL]. Using the libraries speeds up many statistical calculations which use matrix operations. The rewrite also adds multi-threading capability to R, providing the ability to use all available cores and processors and process in parallel. More information on how it works and how much faster Microsoft R Open is compared to standard R is available here. Once Microsoft R Open is installed, RStudio should automatically start using it. To check which version of R is in use within RStudio, go to Tools->Global Options and look at the R version.

RStudio version setting

RStudio here is using Microsoft R Open. To take a look at the version in Visual Studio with R Tools installed, go to R Tools->Options. Microsoft R Open is open source and can be used by anyone to speed up the execution of their R code. The version of R which is implemented within SQL Server is not this version of R. The server version, R Server, is proprietary, and Microsoft is not giving it away for free: to run it on a server you need to purchase either SQL Server 2016 or R Server standalone. There is full compatibility across all of the versions of R; code written in either CRAN R or Microsoft R Open will work in R Server. To write code for R Server, R Client needs to be installed. The Visual Studio screen shows this version of R installed on a PC which is also running SQL Server 2016 Developer Edition.

Visual Studio R Tools version setting

R and Memory Consumption

One of R's strengths and weaknesses is the fact that R runs in memory. This is a good thing because it means R is very fast. It is not such a good thing when you need more memory than your machine has. When reading through the product information surrounding Revolution Analytics' version of R [Revo R], which Microsoft bought, one of the features is the ability of this version of R to use not only memory but also disk, by applying chunking logic. The code is broken into smaller pieces, executed, then put back together. Using Revo R, it's possible to do this by using the specially designed functions which implement this functionality, known as ScaleR, all of which start with rx. This does not mean that all R code running on SQL Server will be using these functions. Chances are most of it won't, especially if the R code is being migrated from an existing data science team. Prior to being bought by Microsoft, Revolution Analytics charged money for their product. If your organization didn't have a license, chances are no one was writing rx functions. For people who don't know anything about R and have been asked to create packages to run R, there is an easy way to tell if the code has been written to use chunking: if there are no words in the code that start with rx, like rxSummary, rxCube or rxHistogram to name a few, the code will not use chunking.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

 

Creating New SSRS and SSIS Projects for SQL Server 2016

Now that SQL Server 2016 has been released, it is time to start creating new SSIS and SSRS projects for it. Since SQL Server 2014, SSIS development has been done in Visual Studio. The latest version, Visual Studio 2015, has a free Community edition, which can be found here. If you have it installed and try to create a new Reporting Services or Integration Services project, you will notice that there are no templates listed which will allow you to create one of these projects.

Making SSIS and SSRS Projects for SQL Server 2016

To create SSIS projects in Visual Studio, you will need to click on this link to download SQL Server Data Tools [SSDT] in the language of your choice. Visual Studio must not be running during the install. After about 5 minutes, when the install completes, you will have a new application installed, SQL Server Data Tools 2015. You will still have the Visual Studio 2015 application as well, providing two ways to get to the templates, which means you can click on the SSDT icon instead of opening up Visual Studio. SSDT also contains the templates for database projects, so you can now start using Visual Studio for those as well.

Creating SSIS and SSRS Packages for Different Versions of SQL Server

In this version of SQL Server Data Tools, Microsoft has finally addressed the common problem of needing to maintain multiple versions of SSIS packages for different server versions. No longer do you need three different applications to maintain code for SQL Server 2012, 2014 and now 2016. All of these versions are supported with SSDT for Visual Studio 2015. SQL Server will detect which version the code was last saved in so that you don't have to worry about accidentally migrating code. You also have the ability to create an SSIS package targeting 2012, 2014 or 2016. To select the version you want, right-click on the project and select Properties. Under Configuration Properties->General, as shown in the picture, the TargetServerVersion, which defaults to SQL Server 2016, has a dropdown box making it possible to create a new package in Visual Studio 2015 for whatever version you need to support. Supporting the ability to write for different versions is a great new feature, and one which I am really happy is included in SSDT for Visual Studio 2015.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur