Writing R code in RTVS and Data Analytics with SQL Server

In preparation for my session at SQLBits on Data Analytics with SQL Server, I reviewed all of my instructions for configuring a computer to run R Client for SQL Server. The instructions have changed with the release of R Tools 1.0 for Visual Studio 2015 [RTVS]. Unfortunately, there are no R Tools for Visual Studio 2017 yet; the RTVS documentation states that a version for VS 2017 will be released “soon”. RTVS 1.0 makes it easier than ever to set up R for SQL Server, as it contains all of the links needed for R Client, but it also invalidates most of the existing RTVS documentation on how to change the version of R.

Configuring an R Environment to use R on SQL Server

In addition to having an SQL Server 2016 instance with R Server installed, you need the following components on a client machine:

The Comprehensive R Archive Network

RStudio (optional)

Visual Studio 2015 R Tools

This list differs from the one I have provided previously: because RTVS contains an installation of R Client, there is no need to download R Client separately. If you are using R Server, you do not need to download Microsoft R Open either. Once RTVS is installed, the R Tools menu has an Install R Client option, and selecting it handles the installation. Unfortunately, the menu option does not change once R Client is installed, so it always looks like you still need to install it. To find out whether R Client has been installed, look in the Workspaces window.

Selecting the Right Version of R within Visual Studio

Prior to RTVS 1.0, the version of R used was selected in R Tools->Options. This has moved to the Workspaces window, which, if you have the default RTVS layout, is the second tab in the bottom right corner of the screen. This window shows the versions of R that are installed. In order to use R Client functions, you need to select Microsoft R Client in the Workspaces tab. The selected version has a green check next to it, as shown in the picture. To change the selection, click the blue arrow near the gear; you will be prompted with a message asking whether you are sure that you want to switch.

Changes to R Client make RevoScaleR Libraries Not Work

The latest version of the R client tools changes more than where to find the version of R being used. The new client tools remove the need to install the RevoScaleR library. With R Client 3.3.2, the library is no longer compatible, and you will get an error if you try to install it. The library is no longer needed because its functionality is included in R Client, so no additional libraries are needed for rx commands like rxSetComputeContext("local"). If, when trying to use R Client, you see the error “You are running version 9.0.1 of Microsoft R client on your computer, which is incompatible with the Microsoft R server version 8.0.3”, then you need to update SQL Server to the latest version, which is SP1 CU2, which you can get here. If you haven’t installed SP1 for SQL Server, you will need to do that first.
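Because RevoScaleR now ships inside R Client, a quick way to confirm you are on the right build is to check for the package rather than try to install it. The sketch below is my own illustration, not something from the RTVS documentation. The helper name check_r_client() is hypothetical. It uses only base R, plus rxSetComputeContext() when RevoScaleR is present.

```r
# Hypothetical helper: report the active R build and whether RevoScaleR is bundled with it.
# With Microsoft R Client selected in the Workspaces window, RevoScaleR should already be
# available, so nothing needs to be installed.
check_r_client <- function() {
  has_scaler <- requireNamespace("RevoScaleR", quietly = TRUE)
  list(
    r_version = R.version.string,
    revoscaler_available = has_scaler,
    revoscaler_version = if (has_scaler) {
      as.character(utils::packageVersion("RevoScaleR"))
    } else {
      NA_character_
    }
  )
}

info <- check_r_client()
str(info)

if (info$revoscaler_available) {
  # The rx functions come with R Client itself; no install.packages() call is involved
  RevoScaleR::rxSetComputeContext("local")
} else {
  message("RevoScaleR not found: select Microsoft R Client in the Workspaces window ",
          "instead of installing the package")
}
```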

Because of the changes in R Client, a lot of documentation is no longer accurate. If you are looking for information on R Client, check the date of what you are reading to make sure it is current. Things change a lot, which also provides a continual supply of topics for my blog. I am looking forward to meeting more people who read it here at SQLBits 2017.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Security Updates to Power BI

Office 365 Admin Screen for granting Power BI Admin rights

In the past month, Microsoft has made a number of security changes to Power BI. The first one is not really a feature update but a PowerShell replacement: you no longer need to use PowerShell to become a Power BI Admin. Any Office 365 Admin can grant Power BI Admin permissions via this screen in the Admin Center. The Power BI Admin role was first created in October, but the screen was not complete until it was fixed in February.

Power BI Security Changed from Tenant Only

People who have been granted Power BI administrator rights will also notice a change to the Admin screen. The March 2017 update to Power BI makes a major change to the security model in Power BI. Previously, all of the security settings were set at the tenant level, meaning that each privilege was granted either to all users or to none. If I wanted to allow one group within the organization to publish reports to the web without allowing everyone to do so, there was no way to accomplish that. All that has changed. It is now possible to include or exclude groups of users from having rights in Power BI. Users can be placed into security groups in Azure Active Directory, either through the Office 365 Admin Center or via the Azure AD Admin Center. Once created, the security groups can be used in Power BI. Security groups are not the same thing as the groups created in Power BI when a new work group is created.

Using Security Groups in Power BI Admin

Power BI Admin Portal

The new Power BI Admin Portal looks different. It now lists which rights can be given to different groups of users. Share content to external users, Export data, Export reports as PowerPoint presentations, Printing dashboards and reports, Content pack publishing, and Use Analyze in Excel with on-premises datasets can now be assigned to security groups, so these rights do not have to be the same throughout the entire tenant.

Unfortunately, some of the permissions are still tenant-based. For example, Publish to web, which is one permission I would definitely like to turn on for only some users, is still available only as a tenant-level option. Even so, these security changes are a welcome improvement, as they give administrators more options for granting rights in Power BI.

SQL Server R Services and the 20 User IDs

Why does installing R create 20 user IDs?

If you have installed SQL Server R Services, you may have noticed that the installation creates twenty user IDs. To many people, the automatic creation of this many user IDs is alarming, and they want to know what these IDs are for and who will be using them. The answer to who could be using those IDs is just about anyone running R, and they are going to be using the resources of the server to do it. If you are a DBA and want to figure out how to stop this, keep reading. I promise to tell you, after I provide some context about SQL Server and R internals.

R Server and Launchpad

When R Server is installed as part of SQL Server, one way to check that it is installed is to see whether the Launchpad service is running. R code does not run within the SQL Server OS. It is by definition an external process, and the Launchpad executable serves as a conduit between SQL Server and the space where R is running. If you want to know more about R and SQL Server internals, this article I wrote for SQL Mag provides a lot more detail. Microsoft designed the Launchpad service so that other languages might someday also run on SQL Server as R does. It also supports a feature of R Server I have written about, context switching. Context switching lets users run R using server memory instead of the memory on their own computers, and access is granted through one of the twenty IDs created when R is installed.

Launchpad Settings – Where the External Users are Referenced

There are many reasons why a DBA might not want to allow clients to use server memory, as it will tax the server. Controlling it is relatively simple. Open SQL Server Configuration Manager and select SQL Server Launchpad for the instance of SQL Server running R Server.

In the picture of the screen, the instance of SQL Server I have running R Services is named SS2016. Right-click SQL Server Launchpad for that instance, select Properties, then click the Advanced tab. The default number of external users may look familiar: twenty user IDs are created for R Server because, by default, Launchpad allocates twenty external users to connect from SQL Server to run R. If you don't want users running R on a server, you need to prevent it by not granting them permission to run R. To run R, users need to be members of the db_rrerole database role. If they are not, they cannot run R. On a production server, it is probably best not to grant this permission to non-system users.

Since SQL Server uses the external users when running R, you cannot set the number of external users to 0 to block client access. With 0 external users, the Launchpad service will not run, and no R code can be executed anywhere. Changing the number of external users requires a restart, so Configuration Manager displays a prompt. If the number is set to 0, the Launchpad service fails to start and generates Error 1053: “The service did not respond to the start or control request in a timely fashion.” The number of users has to be at least 1 for the service to be able to communicate with the external R components. If you increase or reduce the number of external users, the IDs are created or deleted to match the number listed.

Let me know if you found this information on SQL Server R Services helpful by commenting or messaging me on Twitter. If you are interested in finding out more about the internals of SQL Server and R, you might be interested in reading this article about the topic. I would also like to thank Bob Ward of Microsoft for helping me better understand the SQL Server R internals and for patiently answering my questions on the topic.

Calculated Tables and Role-Playing Dimensions

Working with role-playing dimensions has always been problematic in SQL Server Analysis Services Tabular. Role-playing dimensions occur when you have, say, multiple dates in a table that you want to relate back to a single date table. Tabular models allow only one active relationship between two tables at a time. The picture on the left shows how a tabular model represents a role-playing dimension. The model on the right is the recommended way to model the relationships in Analysis Services Tabular, as it lets users filter the data on a number of different date tables.

Tabular role-playing dimension modeling

The big downside to this is that the date table has to be imported into the model multiple times, meaning the same data is imported again and again. At least that was the case until SQL Server 2016 was released. This week’s T-SQL Tuesday topic, Fixing Old Problems with Shiny New Toys (http://michaeljswart.com/2016/06/t-sql-tuesday-079-its-2016/comment-page-1/#comment-186750), is a really good reason to describe a better way of handling this problem.

Calculated Tables: The Solution for Role-Playing Dimensions

SQL Server 2016 provides a new method of solving the role-playing dimension problem: a calculated table. Instead of importing the date table again, create a formula that makes a copy of it. First switch to the Data view of the model, then select Table->New Calculated Table. The screen will change to the new table screen, and the cursor will be placed in the formula bar.

In my model I have one table called Date, and I am going to add a calculated table called Order Date. The DAX couldn’t be simpler: the formula just references the table named ‘Date’, as shown in the picture below. Rename the new table to something more meaningful, like Order Date, and that is it. The modeling required is the same as before. The difference is that the date data now only has to be imported once, as every role-playing date table is calculated from that single imported table. If you are using Power BI, the same concept can be used for handling role-playing dimensions as well.
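The calculated table itself is DAX, but the underlying idea of one date table playing several roles can be sketched outside Tabular too. Below is a rough R analogue using made-up data. A single date table (date_dim) is loaded once, and order-date and ship-date versions are derived from it by renaming its columns before each join. The table and column names are hypothetical. This illustrates the modeling pattern only, not how Analysis Services stores calculated tables.

```r
# Hypothetical date dimension, loaded once
date_dim <- data.frame(Date = seq(as.Date("2017-03-28"), as.Date("2017-04-05"), by = "day"))
date_dim$Quarter <- paste0("Q", (as.integer(format(date_dim$Date, "%m")) - 1) %/% 3 + 1)

# Hypothetical fact table with two date keys that both relate to the date dimension
orders <- data.frame(
  OrderID   = 1:3,
  OrderDate = as.Date(c("2017-03-29", "2017-03-31", "2017-04-01")),
  ShipDate  = as.Date(c("2017-04-02", "2017-04-03", "2017-04-05"))
)

# Derive one role per date key from the single date table by prefixing its column names,
# similar in spirit to a calculated table that references 'Date'
make_role <- function(dim, prefix) {
  setNames(dim, paste0(prefix, names(dim)))
}
order_date <- make_role(date_dim, "Order")  # columns OrderDate, OrderQuarter
ship_date  <- make_role(date_dim, "Ship")   # columns ShipDate, ShipQuarter

# Each role joins on its own key, so the two dates can be filtered independently
fact <- merge(orders, order_date, by = "OrderDate")
fact <- merge(fact, ship_date, by = "ShipDate")
print(fact[order(fact$OrderID), c("OrderID", "OrderQuarter", "ShipQuarter")])
```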

SQL Server 2016 had a lot of great new features. In addition to flashy ones like R, there are a lot of great enhancements to the Tabular model that are worth investigating as well.

Context Switching in R Server

R code tends to be very memory intensive, as R processes data primarily in memory. If you want R to perform well, you want as much memory as you can get your hands on, especially with larger datasets. This is a problem, as many laptops have pitifully low memory capacity. Unless your computer has a lot of memory, you may run out of it when analyzing large datasets. If a new computer is not in the budget, why not develop on the server? You may be thinking that there is no way the administrator of the box is going to let you use the server memory. Well, if you have SQL Server 2016 with R Server installed, chances are you can use the memory capacity of the server. You connect your R process to it from your computer, without needing to install anything on the server.

Microsoft’s R Server contains some specialized functions which are not part of the standard CRAN R installation. One of the RevoScaleR functions, RxInSqlServer, allows code to be processed on the server from the client. To make this work, you must have R Server and R Client installed. If you are testing on a local machine, you will need both R Client and R Server installed on that computer.

How to use the Server Memory not Local memory for Running R

If you are developing R in your IDE of choice, either RStudio or Visual Studio with R Tools, here is the code you need to make that work, including code that runs on the server:

#RevoScaleR contains the context switching functions. It ships with R Client and R Server and is not on CRAN, so install.packages() cannot install it
if (!requireNamespace("RevoScaleR", quietly = TRUE)) {
  stop("RevoScaleR not found: run this code using Microsoft R Client or R Server")
}
#Load the library
library(RevoScaleR)
#Create a connection to your SQL Server 2016 server instance. Note the double backslash: R needs \\ to represent the single backslash before the instance name
#
sqlConnString <- "Driver=SQL Server;Server=DevSQLServer\\SS2016;Database=TestR;Uid=ReadDataID;Pwd=readd@t@!!!"
#Create the compute context object with RxInSqlServer. Note that RevoScaleR function names start with rx or Rx
#
serverside <- RxInSqlServer(connectionString = sqlConnString)
#Set the compute context to SQL Server. After this the code will run using server memory, not local memory
#
rxSetComputeContext(serverside)
#Check to see what the compute context is. Note this is for informational purposes only; you do not need it to make anything work.
#
rxGetComputeContext()
#If you want to change the compute context back to your computer run this command
#rxSetComputeContext("local")
#Until the context is switched back, I am now running on the server, not locally.
#Here I am going to take a look at a table in my TestR database called AirlineDemoSmall
#
sqlsampleTable <- "AirlineDemoSmall"
#
sqlPlaneDS<- RxSqlServerData(connectionString = sqlConnString, verbose = 1,table = sqlsampleTable )
#To look at the content of the data, show the variable information and the first 10 rows of the AirlineDemoSmall table in my TestR database
#
rxGetInfo(data = sqlPlaneDS, getVarInfo = TRUE, numRows = 10)
#To visually investigate the data, this command plots a histogram displaying the frequencies of values in one of the columns, CRSDepTime
#
rxHistogram(~CRSDepTime, data = sqlPlaneDS)

Here’s the output I get back in the R Interactive window:

Data Source: SQLSERVER
Number of variables: 3
Variable information:
Var 1: ArrDelay, Type: character
Var 2: CRSDepTime, Type: numeric
Var 3: DayOfWeek, Type: character
Data (10 rows starting with row 1):
   ArrDelay CRSDepTime DayOfWeek
1       -14  16.283333    Monday
2        -1   6.166667    Monday
3        -2   7.000000    Monday
4         0  10.266666    Monday
5         0  13.483334    Monday
6       -10  16.833334    Monday
7       -10  19.949999    Monday
8       350  14.650001    Monday
9       292   9.416667    Monday
10        M   6.000000    Monday

RxHistogram

Let me know if you found this post helpful by posting a comment. Thanks also to Mario, who asked me about context switching, which gave me the idea to answer his questions on this site. If you are interested in seeing more information about SQL Server and R, please subscribe, as I tend to answer more of the questions I receive here.

Using Data Analysis to Pick Super Bowl Winners

I know that there is no way to compete with the major sports networks in the compilation of statistics about the two teams playing in the Super Bowl. Instead, I am going to focus on one feature: self-interest. Like many people, I have money in the stock market, and I want my investments to make money next year. For this reason, I am an unqualified supporter of the Atlanta Falcons in the 2017 Super Bowl. The single data point I am using for my analysis is the fact that the Falcons are an NFC team, and when the NFC wins, the stock market goes up. Go Falcons!

Correlation without Causation

“Correlation does not imply causation” is a common phrase in statistics and data analysis. It means that just because two variables move in relation to one another does not mean that there is a cause-and-effect relationship between the two, even though it may seem like it. Just because it rained after I washed my car does not mean that I can control the precipitation patterns in the desert based on my propensity to visit the car wash. You may be thinking that an NFC team winning the Super Bowl and the stock market going up is an example of correlation without causation. After all, the NFL does not control the worldwide financial markets. If you look at the data, though, 80% of the time the markets go up when the NFC wins. That is 50 years of data supporting the idea that the winner does impact the market. Why might that be? Perhaps it follows quantum mechanics.

Observer Effect of Quantum Mechanics

When studying physics, specifically quantum mechanics, researchers noticed that the act of observation changed the results. This is something commonly looked at when creating forecasts. Are the forecasts correct because the models are correct, or because people believe them enough to make them happen? The Super Bowl winner’s impact on the stock market is well known. Perhaps for this reason it becomes a self-fulfilling prophecy. This is the entire premise of many self-help ideas: if you believe it will happen and work to make it happen, it will happen. Whatever the reason, one cannot ignore 50 years of data.

Patriots fans may think that I am pulling a lot of esoteric facts out of the air because I want the Patriots to lose. In all seriousness, though, it is all about the data, and the observable effects of knowing the data. Perhaps you are watching the game, your team did not make the playoffs, and you are wondering who to root for because you do not care about the winner. If so, perhaps this post helped you decide.

Appending Data – It is OK to be different

One of the more powerful features of Power BI Desktop is the query feature, which was called Power Query back when Power BI was part of Excel. Using the query feature, if the data you want to use is bad, has unneeded columns, or is formatted differently than desired, all of that can be readily fixed. The best thing about the query feature is that it uses the M language and records each step. Mess up a step? No problem: just delete it and keep on going.

Appending in Power BI

Recently I worked on a Power BI project where I needed to merge data provided in spreadsheets. The spreadsheets came from different vendors, and while they contained mostly the same data, the columns were not in the same order. I wanted all of the data to reside in one table, which in Query means that I wanted to append the data. The files I was merging were very wide, and I did not notice until after I was done that some of the columns were in a different order. Power BI is smart enough to figure out the order on its own. I didn’t need to change the order of the columns at all, as long as they had the same column names. Here’s an example using three different files.

File 1

Notice each of these files is a little different

File 2

File 3

I want to append these files together so that all of the columns containing like information end up in the same column. To do this, the columns do not need to be reordered: as long as the column names are the same, the contents will merge. I do need to modify File 3 to have the same column names, so I will rename Date to Expected Duration in Minutes and Location to Plant. Since I know that File 3 came from Slingback Central, I also want to add a Maintenance Provider column to File 3. Otherwise, I will get a null value in the Maintenance Provider column. I do not need to place the column in any specific location, as long as the name is the same. Renaming columns is pretty easy: right-click the column, select Rename, and type in the correct column name. To add a new column, in Query select the Add Column tab and click the Custom Column option. As you can see in the window pictured below, the text Slingback Central has double quotes around it. If you forget them, you will get a syntax error.

Putting it All Together

Now that all of my queries have the same column names, I am ready to append them together. To do that, I select one of the queries and, from the Home tab, click the Append Queries icon on the far right side. Since I want to combine three files, I select the option for three or more tables. I then select all of them so that they appear on the right in the Tables to append section of the screen.
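Power Query records these steps in M, but the same rename, add-a-column, and append-by-name sequence can be sketched in R for anyone following along with the R posts on this blog. This is a rough analogue with made-up values, not what Power BI generates. File 3's columns are renamed, a constant Maintenance Provider column is added, and rbind() stacks the three data frames. For data frames, rbind() matches columns by name rather than by position, so the column order does not need to line up.

```r
# Hypothetical stand-ins for the three vendor files (values are made up)
file1 <- data.frame(Plant = "North", `Maintenance Provider` = "Acme",
                    `Expected Duration in Minutes` = 30,
                    check.names = FALSE, stringsAsFactors = FALSE)
file2 <- data.frame(`Expected Duration in Minutes` = 45, Plant = "South",
                    `Maintenance Provider` = "Bolt",
                    check.names = FALSE, stringsAsFactors = FALSE)
file3 <- data.frame(Date = 60, Location = "East", stringsAsFactors = FALSE)

# Rename File 3's columns to match the other files
names(file3)[names(file3) == "Date"] <- "Expected Duration in Minutes"
names(file3)[names(file3) == "Location"] <- "Plant"

# Add the column File 3 is missing, like the Custom Column step with "Slingback Central"
file3[["Maintenance Provider"]] <- "Slingback Central"

# rbind() on data frames matches columns by name, so the differing column order is fine
combined <- rbind(file1, file2, file3)
print(combined)
```

One difference worth knowing: if a column name is missing from one of the data frames, rbind() stops with an error. Power BI instead fills the gap with null, which is why the Maintenance Provider column is added first in both cases.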

After appending, Power BI merges all of the like columns together, regardless of the column order in the original files.

Not having to reorder columns is a great feature. It saved me a lot of time, and I hope this post can do the same for someone else.

Power BI and R Links

I presented Power BI and R today. For those who were not able to attend, the recording will be available on my engagement page when it is ready, as all of my recorded presentations are listed there.

Links for R

There were a lot of links used, both for R and for R and Power BI integration. All of the items needed are included here, along with a brief description of each.

The Comprehensive R Archive Network [CRAN] is where one can download open source R and R packages, and it contains lots of information about R.

Microsoft R Open, a fully CRAN-compatible version of R built with the Intel Math Kernel Library (MKL) for improved performance, can be downloaded here.

Microsoft R Client is used with SQL Server 2016. R Server is included in SQL Server 2016, and R Client is needed to connect to R Server even if both are on the same computer.

RStudio is the most popular IDE for developing R.

Visual Studio R Tools are required for Visual Studio 2015 to become an R development environment.

The CRAN package list from Microsoft not only contains the list of all of the packages, but also provides the ability to go back in time to look at previous versions of the packages.

Power BI Custom R Visuals are found in a tab separate from the other Power BI Visuals on this page. There are six visualizations available.

The CRAN R library for the forecasting model must be downloaded to use the Power BI forecasting visual.

I hope that you found this one-stop location for everything you need to use R helpful. For more information on how to use these tools, please take a look at all of my R posts, where I describe how to use them.

Data Hierarchies and Drill Through in Power BI

Hierarchies provide a method of organizing data to recognize that one value encompasses all of the values beneath it. One very common hierarchy is a date hierarchy, which shows data summarized by year, then by the quarters in each year, then by the three months in each quarter. At the lowest level, it shows the dates in each month. Other hierarchies may also exist in data, such as sales regions. A sales region could include countries, which include states or provinces, which include cities, which include actual addresses. Because this is how data is categorized, visualizations need to reflect this organization by containing hierarchies.

Creating a Data Hierarchy in Power BI

Finding where to create hierarchies is the hardest part of creating them in Power BI, especially if you have created hierarchies in Excel Power Pivot, as they are not in the same place. Hierarchies are not in the Relationships view; instead, they are found in the Report view. Clicking the ellipsis next to any field in a table displays a menu, and the second item on the menu is New hierarchy. Hierarchies can also be created by dragging a field on top of another field. Once the hierarchy has been created, to add another field to it, drag the field on top of the value with the hierarchy icon. If the field is not added in the location you want, click the ellipsis next to the field name and move the field up or down as you wish.

Drill Through Reporting in Power BI

There are two ways to do drill through in Power BI: either by adding fields to the group section of a visualization or by adding a hierarchy as a field. If I create a date hierarchy and add it to the axis of a bar chart visualization, the top right and left corners of the visual will have arrows in them. To drill down or back up, select the down or up arrow, then click a bar.

For example, if the down arrow in the left corner is selected, clicking the bar for 2013 will show data for the quarters of 2013. Repeated clicking will provide data down to the month, then the days in that month. The double arrow buttons show the data at the next level slightly differently. Clicking the left double arrow drills down to the next level for all of the data at once. It provides a bar visualization of 4 quarters with data from all years combined, then 12 months of the combined years, then the days. The double arrow with the line connecting the arrows works only from the highest level of the visualization. When it is selected initially, it shows the quarters listed for every year, then the months and years, then the day, month and year. I hope you found this post helpful in explaining some of the features natively included in Power BI. If you are interested in learning how to expand the visualization capabilities of Power BI by including R visuals, please attend my upcoming webinar.
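The three drill modes described above are easiest to tell apart as three different aggregations:

* drilling down on one bar
* going to the next level
* expanding all down one level

Here is a rough sketch in R over made-up daily data, with one unit per day for 2013 and 2014. It mirrors the results the buttons produce but is not how Power BI computes them internally.

```r
# Made-up daily data: one unit per day across 2013 and 2014
sales <- data.frame(date = seq(as.Date("2013-01-01"), as.Date("2014-12-31"), by = "day"))
sales$amount  <- 1
sales$year    <- as.integer(format(sales$date, "%Y"))
sales$quarter <- paste0("Q", (as.integer(format(sales$date, "%m")) - 1) %/% 3 + 1)

# Drill down on the 2013 bar: the quarters of that one year only
drill_2013 <- aggregate(amount ~ quarter, data = subset(sales, year == 2013), FUN = sum)

# Go to the next level (double arrow): four quarters, all years combined
next_level <- aggregate(amount ~ quarter, data = sales, FUN = sum)

# Expand all down one level (double arrow with the connecting line):
# each quarter listed for every year
expanded <- aggregate(amount ~ year + quarter, data = sales, FUN = sum)

print(drill_2013)
print(next_level)
print(expanded)
```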

DIY – Using Custom Power BI R-Powered Visuals

In a recent post I introduced using R custom visuals in Power BI, and this post details how to use the correlation visual in a Power BI report. The first step in the process is to download R, if it is not on the computer already. If SQL Server 2016 with R integration is installed, there is no need to download R, as it was installed already. If the computer does not have R installed, click here. Once R is installed, go to the Power BI Custom Visuals page and select the R tab to pick one of the six R visualizations. I picked the correlation plot. If the machine does not already contain the packages used in the visualization, a screen will prompt you to install them. This may take a little while. When it is complete, a window will appear showing that the packages were successfully installed. Now the custom visual can be used in Power BI.
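Power BI prompts for any missing packages, but you can also check ahead of time from an R console. This is a small sketch with a hypothetical helper, missing_packages(). The package name "corrplot" is my assumption about what the correlation visual uses, so check the visual's own description for its actual package list.

```r
# Hypothetical helper: return the packages in `pkgs` that cannot be loaded,
# which usually means they are not installed in this R library
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "corrplot" is an assumption about what the correlation plot visual needs
to_install <- missing_packages(c("corrplot"))
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them from CRAN
} else {
  message("All required packages are installed")
}
```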

Using the Correlation Visualization in Power BI

Power BI will show the new visual, which you can place on the report. It helps to understand a little about R to make sense of the error messages you may receive. For example, if you are using the forecasting tool and have selected the year value instead of the date, you may receive an error about an invalid time series. The underlying code expects to receive a date value, and a year is not a date, so you have to reference a date field in order to make it work.

Differences between R Visualizations and other Power BI Visualizations

Interacting with R visuals works differently than with other report visualizations: you cannot click on elements within an R visualization to filter other items on the page. Other visuals on the page will, however, filter the data contained within the R visual. For example, let’s say my report contains a total field, a slicer which contains years, and a correlation plot which contains products. If the slicer is changed to select a year, the total field and the data within the R visual will change to reflect that. If, on the other hand, I click on the R visual to select one of the product categories, neither the total field nor the R visual will change in any way.

One interesting thing to note is that if you have created an R visual in Power BI and are working without an internet connection, the report will throw an error when you try to open it. This happened both with a report I created and with a sample report, so it appears that an internet connection is required to open reports containing R visuals.
