Articles for the Month of February 2017

SQL Server R Services and the 20 User IDs

What is the Reason why installing R creates 20 User IDs?

If you have installed SQL, you may have noticed that it creates twenty user ids as part of the installation process. To many people the automatic creation of a number for SQL Server User IDs is alarming, and they want to know what these IDs are for and who will be using them. The answer for who could be using those IDs, is just about anyone running R and they are going to be using the resources of the server to do it. If you are a DBA and want to figure out how to stop this, keep reading as I promise to tell you, after I provide some context about SQL Server and R internals.

R Server and Launchpad

SQLServerManagementConsoleWhen R Server is installed as part of SQL Server, one way you can check to see if it is installed is to look to see if the Launchpad service is running. When R code is running it does not run within SQL Server OS. It is by definition an external process and the Launchpad exe serves as a conduit between SQL Server and the space where R is running. If you want to know more about R and SQL Server Internals, this article I wrote for SQL Mag will provide a lot more details. Microsoft designed the Launchpad service so that other languages might someday also run as R does on SQL Server. It also supports a feature of R Server which I wrote about, context switching. Context Switching provides the ability for users to utilize Server memory instead of the memory on their computers for running R, and access is granted through the use of one of the twenty ids created when R is installed.

Launchpad Settings – Where the External Users are Referenced

launchpadThere are many reasons why a DBA might want to not allow clients to access server memory as that will tax the server. Turning it off is relatively simple. Go to the SQL Server Management Console and select SQL Server Launchpad for the instance of SQL Server running R Server.

In the picture of the screen, the instance of SQL Server I have running R Services is in SS2016. Right click on the server and select Properties, then click on the Advanced tab. When looking at the number of external users allowed by default, the number might look familiar. The reason there are twenty User IDs created for R Server is because Launchpad allocates by default external twenty users to connect from SQL Server to run R. If you don’t want to allow external users to run on a server, you will need to prevent the users from connecting by not enabling them to run R. To run R, users need to have db_rrerole permissions. If they do not have that, they cannot run R. On the production server, it is probably best that this permission not be granted to non-system users.

Since the External Users created are used by SQL Server when running R, it is not possible to set the number of external users to 0 as the Launchpad Service will not run, and no R Code can be executed anywhere. If the number of external users is modified, Configuration Manager provides a prompt window as a restart is required. If the number of External Users is set to 0, the Launchpad Service will not start. When the Launchpad Service tries to start, it will generate Error 1053: The Service Did Not Start in a Timely Fashion. The number of users has to be at least 1 for the service to be able to communicate with the external R components. If you add or reduce the number of External Users, the IDs will be either created or deleted to match the number listed.

Let me know if you found this information regarding SQL Server R Service information by commenting or messaging me on twitter. If you are interested in finding out more regarding the internals of SQL Server and R, you might be interested in reading this Article about the topic. I would also like to thank Bob Ward b | t of the Microsoft for helping me better understand the SQL Server R internals, and for patiently answering my questions on the topic.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Calculated Tables and Role-Playing Dimensions

http://michaeljswart.com/2016/06/t-sql-tuesday-079-its-2016/comment-page-1/#comment-186750Working with role playing dimensions, which are found when you have say multiple dates in a table and you want to relate them back to a single date table, have always been problematic in SQL Server Analysis Services Tabular. Tabular models only allow one active relationship to a single column at a time. The picture on the left shows how tabular models represent a role playing dimension, and the model on the right is the recommended method for how to model the relationships in Analysis Services Tabular as then users can filter the data on a number of different date tables.

TabularRolePlaying dimension Modeling

 

RolePlayingDimension

The big downside to this is one has to import the date table into the model multiple times, meaning the same data is imported again and again. At least that was the case until SQL Server 2016 was released. This weeks TSQL topic Fixing Old Problems with Shiny New Toys is really good reason to describe a better way of handling this problem.

 

Calculated Columns: The solution for Role Playing Dimensions

SQL Server 2016 provides a new method of solving the role playing dimension problem, using a calculated column. Instead of copying in the source from the date table, instead create a formula to get a copy. First switch to the data view, of the model. Then select Table->New Calculated Table. ThSSASScreenCalcTablee screen will change to the new table screen and the cursor will be pointed to the formula.

In my model I have one table called date. I am going to add a calculated table called order date. The DAX is couldn’t be simpler. Just select the table named ‘Date’ which is shown in the picture below. Rename the table to something more meaningful, like Order Date and that is it. The modeling required is the same, but now the model size does not increase to accommodate all of the date tables needed, as there is only one copy of the date table referenced multiple times. If you are using Power BI this same concept can be used for handling role playing dimensions as well.

SQL Server 2016 had a lot of great new features, and in addition to the flashy ones like R there are a lot of great enhancements to the Tabular model that are worth investigating as well.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Context Switching in R Server

R code tends to be very memory intensive as R processes primarily in memory. If you want R to perform well, you want as much memory as you can get your hands on to run your code, especially with larger datasets. This is a problem as many individual laptops have pitifully low memory capacity, and unless you have a computer with say as much memory as you can put in this one, if you are analyzing large datasets you may run out of memory. If a new computer is not in the budget, why not develop on the server? You may be thinking that there is no way the administrator of the box is going to provide you the means to be able to use the server memory. Well, if you have a SQL Server 2016 with R Server installed, chances are you can use the memory capacity of the server by connecting your R process to run on it from your computer, without the need to install anything on the server.

Microsoft’s R Server contains some specialized functions which are not part of the standard CRAN R installation. One of the ScaleR functions, RxInSqlServer will allow code to be processed on the server from the client. To make this work, you must have R Server and R Client installed. If you are doing a test on a local machine, you will need both R Client and R Server installed on that computer.

How to use the Server Memory not Local memory for Running R

If you are developing R in your IDE of choice, either R Studio or Visual Studio with R Tools, here is the code you need to make that work, which includes code running on the server

#First you will need to install ('RevoScaleR') if not there already as context switching is included in that library
if (!require("RevoScaleR")) {
install.packages("RevoScaleR")
}
#Load the library
library(RevoScaleR)
#Create a connection to your SQL Server 2016 server instance. Note the double slashes which I needed to identify the instance name
#
sqlConnString <- "Driver=SQL Server;Server=DevSQLServer\\SS2016;Database=TestR;Uid=ReadDataID;Pwd=readd@t@!!!"
#Set the variable containing RxInSQLServer. Note All specific R Server libraries start with Rx
#
serverside #Set the Compute context to SQL server. After this the code will run using Server Memory, not local memory
#
rxSetComputeContext(serverside)
#Check to see what the compute context is. Not this is for informational purposes. You do not need to do this to make anything work.
#
rxGetComputeContext()
#If you want to change the compute context back to your computer run this command
#rxSetComputeContext("local")
#Until the context is switched back, I am now running on the server, not locally.
#Here I am going to take a look at a table in my TestR database called AirlineDemoSmall
#
sqlsampleTable <- "AirlineDemoSmall"
#
sqlPlaneDS<- RxSqlServerData(connectionString = sqlConnString, verbose = 1,table = sqlsampleTable )
#To take a look at the content of the data, I am going to take a look at 30 rows in table in my TestR database called AirlineDemoSmall
#
rxGetInfo(data = sqlPlaneDS, getVarInfo = TRUE, numRows = 10)
#To visually investigate the data, this command will plot a histogram displaying the frequencies of values in #one of the columns, CRSDepTime
#
rxHistogram(~CRSDepTime, data = sqlPlaneDS)

Here’s the output I get back in the R interactive Window.

Data Source: SQLSERVER
Number of variables: 3
Variable information:
Var 1: ArrDelay, Type: character
Var 2: CRSDepTime, Type: numeric
Var 3: DayOfWeek, Type: character
Data (10 rows starting with row 1):
ArrDelay CRSDepTime DayOfWeek
1       -14 16.283333   Monday
2       -1   6.166667   Monday
3       -2   7.000000   Monday
4         0 10.266666   Monday
5         0 13.483334   Monday
6       -10 16.833334   Monday
7       -10 19.949999   Monday
8       350 14.650001   Monday
9       292   9.416667   Monday
10       M   6.000000   Monday

RxHistogram

RxHistogram

Let me know if you found this post helpful, by posting a comment. Thanks also to Mario, who asked me about context switching which gave me the idea to answer his questions on this site. If you are interested in seeing more information about SQL Server and R, please subscribe as I tend to answer more of the questions I receive here.

 

 

 

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Using Data Analysis to Pick Super Bowl Winners

I know that there is no way to compete with the major sports networks in the compilation of statistics about the two teams footballplaying in the super bowl. Instead I am going to focus on one feature, self-interest. Like many people, I have money in the stock market and I want my investments to make money next year. For this reason, I am an unqualified supporter of the Atlanta Falcons in the 2017 super bowl. The single data point I am using for my analysis is the fact that the falcons are an NFC Team, and when the NFC wins the stock market goes up. Go Falcons!

Correlation without Causation

Correlation does not imply causation is a common term in statistics and data analysis. It means that just because two variables move in relation to one another one does not mean that there is a cause and effect relationship between the two, even though it may seem like it. Just because when I washed my car it rained does not mean that I can control the precipitation patterns in the desert based on my propensity to visit the car wash. You may be thinking that having an NFC team win the super bowl and the stock market is an example of correlation without causation. After all the NFL does not control the world wide financial markets. If you look at the data though, 80% of the time the markets go up when the NFC wins. That is 50 years of data that supports that the winner does impact the market. Why might that be? Perhaps it follows Quantum Mechanics.

Observer Effect of Quantum Mechanics

When studying physics, specifically quantum mechanics researchers noticed that the observation changed the results. This is QuantumMechanicssomething commonly looked at when creating forecasts. Are the forecasts correct because the models are correct or because people believe them enough to make it happen. The superbowl winner impact on the stockmarket is well known. Perhaps it is for this reason that it becomes a self-fulfilling prophesy. This is the entire belief of many self-help ideas. If you believe it will happen, work to make it happen, it will happen. For whatever the reason, one cannot ignore 50 years of data.

Perhaps Patriots fans may think that I am pulling a lot of esoteric facts out of the air because I want the Patriots to lose. In all seriousness though, it is all about the data, and the observable effects of data knowledge. If you are watching the game and your team did not make the playoffs, and you are wondering who to root for because you do not care about the winner, perhaps this post helped you to decide.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur