Articles

Writing R code in RTVS and Data Analytics with SQL Server

sqlbitslogoIn preparation for my session at SQLBits on Data Analytics with SQL Server, I reviewed all of my instructions for configuring computer to run R Client for SQL Server.  The instructions have changed with the release of R tools 1.0 for Visual Studio 2015 [RTVS].  Unfortunately, there are no R Tools for Visual Studio 2017. In the documentation for RTVS, they state that a version for VS 2107 will be released “soon”.  This new version makes it easier than ever to set up R for SQL Server as it contains all of the links needed for R Client, and invalidates most of the documentation for RTVS for changing the version.

Configuring a R Environment to use R on SQL Server

In addition to having an SQL Server 2016 instance with R Server installed, the following components are needed on a client machine

The Comprehensive R Archive Network

RStudio (optional)

Visual Studio 2015 R Tools

This list is a change from the previous list I have provided as RTVS contains an installation of R Client, there is no need to download that as well. You do not need to download Microsoft R Open if you are using R Server either.  Once RTVS is installed, there is a menu option on the R Tools window. Selecting Install R Client from the menu will handle the information. Unfortunately, there is no change to the menu option once R Client is installed, it always looks like you should install it.  To find out if R Client has been installed, look in the Workspaces window.

Selecting the Right Version of R within Visual Studio

RTVSWorkspacesPrior to RTVS 1.0, the version of R running was selected in the R Tools->Options. This has moved to the Workspaces window, which if you have the default version of RTVS, is the second tab in the bottom right corner of the screen.  This window will show the version of R that are installed.  In order to use R Client functions, you will need to select Microsoft R Client, as shown in the Workspaces tab.  The version selected will have a green check next to it as shown in the picture. To change the selection, click on the blue arrow near the gear, where you will be prompted with a message asking you if you are sure that you want to switch.

Changes to R Client make RevoScaleR Libraries Not Work

The latest version of the R client tools changes more than where to find the version of R running. The new client tools remove the need to install the RevoScaleR library.  With R Client 3.3.2, the library is no longer compatible and you will get an error if you try to install it.  The libraries are no longer needed as the functionality is included in R Client.  This means no additional libraries are need to for the rx commands like rxSetComputeContext(“local”). The functionality is included in R Client. If when trying to use R Client the error You are running version 9.0.1 of Microsoft R client on your computer, which is incompatible with the Microsoft R server version 8.0.3 appears, then you need to update SQL Server to the latest version, which is SP1 CU2, which you can get here.  If you haven’t installed SP1 for SQL Server, you will need to do that first.

Due to the changes in the R Client, a lot of documentation is no longer accurate, which is why if you are looking for information on R Client, make sure to check the date of the information to ensure what you are looking at is pretty current as things change a lot, which provides continual information for my blog.  I am looking forward to meeting more people who read it here at SQLBits 2017.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

SQL Server R Services and the 20 User IDs

What is the Reason why installing R creates 20 User IDs?

If you have installed SQL, you may have noticed that it creates twenty user ids as part of the installation process. To many people the automatic creation of a number for SQL Server User IDs is alarming, and they want to know what these IDs are for and who will be using them. The answer for who could be using those IDs, is just about anyone running R and they are going to be using the resources of the server to do it. If you are a DBA and want to figure out how to stop this, keep reading as I promise to tell you, after I provide some context about SQL Server and R internals.

R Server and Launchpad

SQLServerManagementConsoleWhen R Server is installed as part of SQL Server, one way you can check to see if it is installed is to look to see if the Launchpad service is running. When R code is running it does not run within SQL Server OS. It is by definition an external process and the Launchpad exe serves as a conduit between SQL Server and the space where R is running. If you want to know more about R and SQL Server Internals, this article I wrote for SQL Mag will provide a lot more details. Microsoft designed the Launchpad service so that other languages might someday also run as R does on SQL Server. It also supports a feature of R Server which I wrote about, context switching. Context Switching provides the ability for users to utilize Server memory instead of the memory on their computers for running R, and access is granted through the use of one of the twenty ids created when R is installed.

Launchpad Settings – Where the External Users are Referenced

launchpadThere are many reasons why a DBA might want to not allow clients to access server memory as that will tax the server. Turning it off is relatively simple. Go to the SQL Server Management Console and select SQL Server Launchpad for the instance of SQL Server running R Server.

In the picture of the screen, the instance of SQL Server I have running R Services is in SS2016. Right click on the server and select Properties, then click on the Advanced tab. When looking at the number of external users allowed by default, the number might look familiar. The reason there are twenty User IDs created for R Server is because Launchpad allocates by default external twenty users to connect from SQL Server to run R. If you don’t want to allow external users to run on a server, you will need to prevent the users from connecting by not enabling them to run R. To run R, users need to have db_rrerole permissions. If they do not have that, they cannot run R. On the production server, it is probably best that this permission not be granted to non-system users.

Since the External Users created are used by SQL Server when running R, it is not possible to set the number of external users to 0 as the Launchpad Service will not run, and no R Code can be executed anywhere. If the number of external users is modified, Configuration Manager provides a prompt window as a restart is required. If the number of External Users is set to 0, the Launchpad Service will not start. When the Launchpad Service tries to start, it will generate Error 1053: The Service Did Not Start in a Timely Fashion. The number of users has to be at least 1 for the service to be able to communicate with the external R components. If you add or reduce the number of External Users, the IDs will be either created or deleted to match the number listed.

Let me know if you found this information regarding SQL Server R Service information by commenting or messaging me on twitter. If you are interested in finding out more regarding the internals of SQL Server and R, you might be interested in reading this Article about the topic. I would also like to thank Bob Ward b | t of the Microsoft for helping me better understand the SQL Server R internals, and for patiently answering my questions on the topic.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Context Switching in R Server

R code tends to be very memory intensive as R processes primarily in memory. If you want R to perform well, you want as much memory as you can get your hands on to run your code, especially with larger datasets. This is a problem as many individual laptops have pitifully low memory capacity, and unless you have a computer with say as much memory as you can put in this one, if you are analyzing large datasets you may run out of memory. If a new computer is not in the budget, why not develop on the server? You may be thinking that there is no way the administrator of the box is going to provide you the means to be able to use the server memory. Well, if you have a SQL Server 2016 with R Server installed, chances are you can use the memory capacity of the server by connecting your R process to run on it from your computer, without the need to install anything on the server.

Microsoft’s R Server contains some specialized functions which are not part of the standard CRAN R installation. One of the ScaleR functions, RxInSqlServer will allow code to be processed on the server from the client. To make this work, you must have R Server and R Client installed. If you are doing a test on a local machine, you will need both R Client and R Server installed on that computer.

How to use the Server Memory not Local memory for Running R

If you are developing R in your IDE of choice, either R Studio or Visual Studio with R Tools, here is the code you need to make that work, which includes code running on the server

#First you will need to install ('RevoScaleR') if not there already as context switching is included in that library
if (!require("RevoScaleR")) {
install.packages("RevoScaleR")
}
#Load the library
library(RevoScaleR)
#Create a connection to your SQL Server 2016 server instance. Note the double slashes which I needed to identify the instance name
#
sqlConnString <- "Driver=SQL Server;Server=DevSQLServer\\SS2016;Database=TestR;Uid=ReadDataID;Pwd=readd@t@!!!"
#Set the variable containing RxInSQLServer. Note All specific R Server libraries start with Rx
#
serverside #Set the Compute context to SQL server. After this the code will run using Server Memory, not local memory
#
rxSetComputeContext(serverside)
#Check to see what the compute context is. Not this is for informational purposes. You do not need to do this to make anything work.
#
rxGetComputeContext()
#If you want to change the compute context back to your computer run this command
#rxSetComputeContext("local")
#Until the context is switched back, I am now running on the server, not locally.
#Here I am going to take a look at a table in my TestR database called AirlineDemoSmall
#
sqlsampleTable <- "AirlineDemoSmall"
#
sqlPlaneDS<- RxSqlServerData(connectionString = sqlConnString, verbose = 1,table = sqlsampleTable )
#To take a look at the content of the data, I am going to take a look at 30 rows in table in my TestR database called AirlineDemoSmall
#
rxGetInfo(data = sqlPlaneDS, getVarInfo = TRUE, numRows = 10)
#To visually investigate the data, this command will plot a histogram displaying the frequencies of values in #one of the columns, CRSDepTime
#
rxHistogram(~CRSDepTime, data = sqlPlaneDS)

Here’s the output I get back in the R interactive Window.

Data Source: SQLSERVER
Number of variables: 3
Variable information:
Var 1: ArrDelay, Type: character
Var 2: CRSDepTime, Type: numeric
Var 3: DayOfWeek, Type: character
Data (10 rows starting with row 1):
ArrDelay CRSDepTime DayOfWeek
1       -14 16.283333   Monday
2       -1   6.166667   Monday
3       -2   7.000000   Monday
4         0 10.266666   Monday
5         0 13.483334   Monday
6       -10 16.833334   Monday
7       -10 19.949999   Monday
8       350 14.650001   Monday
9       292   9.416667   Monday
10       M   6.000000   Monday

RxHistogram

RxHistogram

Let me know if you found this post helpful, by posting a comment. Thanks also to Mario, who asked me about context switching which gave me the idea to answer his questions on this site. If you are interested in seeing more information about SQL Server and R, please subscribe as I tend to answer more of the questions I receive here.

 

 

 

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Power BI and R Links

I presented Power BI and R today, and for those who were not able to attend, when available my engagement page will have the recording as all of my recorded presentations are always listed

Links for Ropensourcerlogo

There were a lot of links used both for R and for R and Power BI integration. All items needed are included here as well as a brief description.

Comprehensive Resource Archive Network [CRAN] is where one can download Open Source R, packages and contains lots of information about R.

Microsoft R Open which is a fully CRAN compatible version created using the Intel MKL for improved performance can be downloaded here.

Microsoft R Client used with SQL Server 2016. R Server is included in SQL Server 2016. The R Client is needed to connect to R Server even if they are on the same computer.

R Studio is the most popular IDE for developing R.

Visual Studio R Tools are required for Visual Studio 2015 to become and R development environment

CRAN Package List from Microsoft not only contains the list of all of the packages, but the ability to go back in time to look at previous versions of the packages

Power BI Custom R Visuals are found in a tab separate from the other Power BI Visuals on this page. There are six visualizations available.

CRAN R Library for Forecasting model is needed for Power BI Visualizations and the download is required for this visual.

I hope that you found this one stop location for everything you need to use R helpful. For more information on how to use these tools, please take a look at all of my R posts where I describe how to use them.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

DIY – Using Custom Power BI R-Powered Visuals

In a recent post I introduced using R custom Visuals in Power BI, and this post details how to use the correlation visual in a Power BI Report. The first step in the process is to download R, if it is not on the computer already. if SQL Server 2016 with R integration is installed, there is no need to download R, as it was installed already. If the computer does not have R installed, click here. Once R is installed, go to the Power BI Custom Visualization page and select the R tab to pick one of the six R visualizations. I picked the correlation plot. To use the R visualization, if the machine does not already contain the packages used in the visualization, as screen will prompt you to install some packages also. This may take a little while to do and when it is complete, a window will appear showing the packages were successfully installed. Now the custom Visual can be used in Power BI.

Using the Correlation Visualization in Power BI

Power BI will show the new visual which you can place on the report. It is important to understand a little about R to better understand the error messages received. For example if you are using the fcoorelationplotorecasting tool and have selected the year value instead of the date, you may receive an error about an invalid time series. The underlying code is expecting to receive a date value, and a year is not a date, meaning you have to reference a date field in order to make it work.

Differences between R Visualizations and other Power BI Visualizations

Interacting with R visuals works differently than with other report visualizations as you cannot click on elements within the visualization and filter other items on the page. Other visuals on the page will filter the data contained within the R visual. For example, let’s say my report contains a total field, a slicer which contains years and a correlation plot which contains products. If the slicker is changed to select a year, total field and the data within the R visual will change to reflect that. If on the other hand, I choose to click on the R visual to select one of the product categories, the total field will not change and the R visual will not change. The R visual’s appearance will not change in any way.

nointernetrmessage

One interesting thing to note is if you have created an R visual in Power BI and are working without an internet connection, the report will throw an error when trying to open the report. This occurred when it was a report I created or a sample report, so it appears with these reports an internet connection is required.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Using Power BI to Display R Visuals

 Microsoft has in the past few months increased the support for R Visuals in Power BI. In the July release, Power BI Service started displaying interactive visuals and in October, six R visuals were made

rvisualsavailable in Power BI which do not require any knowledge of R. Six visuals is a far step away from the wide number of visuals available in R. According to the Comprehensive R [CRAN] as of today there are 9557 packages available for R, and that does not count all of the packages on GitHub.

R Packages Supported in the Power BI Service

According to a September 8th post on the Power BI Blog, there are currently 352 R packages supported in the Power BI Service. For those of you wondering if your favorite R library is included (yes dplyr, ggplot2 and cluster are there) I have included the list of packages here.

abc abc.data acepack actuar ade4 AdMit andrews
aod apcluster ape aplpack approximator arm arules
arulesViz assertthat BACCO BaM BAS base64enc bayesm
bayesQR bayesSurv BayesTree BayHaz bbemkr BCBCSF BCE
bclust BenfordTests bisoreg bit bit64 bitops BMA
Bmix BMS0 bnlearn, boa boot bootstrap bqtl
BradleyTerry2 brew brglm, bspec bspmma BVS C50
calibrator car caret, catnet caTools cclust class
clue cluster coda coin CORElearn corpcor corrplot
crayon cslogistic cubature cvTools data.table date dbscan
deal deepnet deldir dendextend DEoptimR deSolve DiagrammeR
dichromat digest dlm DMwR dplyr DPpackage dse
dtw e1071 earth EbayesThresh ebdbNet ellipse emulator
ensembleBMA entropy EvalEst evaluate evdbayes exactLoglinTest expm
extremevalues FactoMineR factorQR faoutlier fBasics fields filehash
fitdistrplus flashClust FME fmsb foreach forecast Formula
fracdiff fTrading futile.logger gam gamlr gclus gdata
gee genetics geoRglm geosphere ggdendro ggmcmc ggplot2
ggthemes glmmBUGS glmnet gmodels gmp gnm GPArotation
gridBase gridExtra growcurves grpreg gss gsubfn gtable
gtools haplo.stats hash hbsae hdrcde heavy HH
HI Hmisc HSAUR ifultools intervals ipred irlba
irr iterators jpeg kernlab KernSmooth KFKSDS kinship2
kknn klaR knitr labeling lars lattice latticeExtra
lava lavaan lazyeval leaps LearnBayes LiblineaR LICORS
limSolve lme4 lmm lmtest locfit locpol LogicReg
lsa lubridate magic magrittr mapdata mapproj maps
maptree MASS MasterBayes Matrix matrixcalc MatrixModels maxent
maxLik mboost mda memoise mi mice microbenchmark
mime misc3d miscTools mitools mixtools mlbench mnormt
MNP modeltools mombf monomvn MSBVAR msm multcomp
munsell mvtnorm NbClust ncvreg nlme nloptr NLP
NMF nnet nnls nortest numbers numDeriv OutlierDC
pacbpred party partykit PAWL pbivnorm pcaPP pdc
PerformanceAnalytics plotmo plotrix pls plyr png polynom
predmixcor PresenceAbsence profdpm proto proxy pryr pscl
psych qap qdapRegex quadprog quantreg qvcalc R.oo
.utils ramps RandomFieldsUtils RColorBrewer Rcpp RcppArmadillo RcppEigen
registry reliampo relimp reshape reshape2 RGraphics rjson
RJSONIO Rmpfr rms robustbase ROCR rpart.plot rrcov
rscproxy RSGHB RTextTools rworldmap SampleSizeMeans SampleSizeProportions sbgcop
scales scatterplot3d sciplot segmented seriation setRNG sfsmisc
SIS SixSigma skmeans slam smoothSurv sna snow
SnowballC snowFT sp spacetime spam SparseM spatial
spBayes spikeslab splancs spls spTimer sqldf sROC
stabledist stabs stepPlr stringdist stringr stsm stsm.class
survival tau tcltk2 tensorA textcat textir tfplot
TH.data tidyr timeDate timeSeries tm topicmodels tree
tseries tsfa tsoutliers TSP vcd VennDiagram VGAM
VIM whisker wordcloud xgboost XML xts yaml
zipfR zoo

 

 

I hope that some day that this list becomes much longer, but it is a good start. If your company has lots R visuals and you wish to migrate them to Power BI, chances are some of the libraries you are using are not here. If you are interested in having your library added to the list of 352, go to the Ideas page of Power BI and request that your library be added, as Microsoft I know looks at this page to determine what to release in the future. Someone has requested that igraph be added, and since it hasn’t received a lot of votes yet (hint) it is probably low on the priority list.

Release Plans for Additional R Library Support in Power BI

In reviewing the Power BI documentation online, it is not known what Microsoft’s plans are for future R packages. Does Microsoft plan on adding a lot more in the future? Will new libraries be added as part of upcoming releases? All questions where the web presently does not provide answers.

The current package support in Power BI does provide the R developer with a method to more securely and widely distribute R visualizations. In my upcoming posts I will review how to use the six R visualizations which do not require R and also how to use R with Power BI.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Questions and Answers on Machine Learning with R

opensourcerlogoRecently, I did a webinar on Machine Learning and R. I received a number of questions during the presentation. Due to time constraints, I was unable to answer all of them, so I have provided the Question and Answers here.

Question: Can I Use R in SQL Server to plot non-linear regression curves? We use IC50 and others in Michaelis-Menten kinetics for bio-chemical work.

R running on SQL Server provides the functionality of standard CRAN R packages with the additional capability to run the SCALER functions provided by SQL Server’s implementation of R. Any other functionality performed in R can therefore also be performed on SQL Server. Like all R code, you will need to install the required R libraries in the appropriate subdirectory. The directory is something like C:\Program Files\Microsoft SQL Server\MSSQL13.<instancename>\R_SERVICES\library

Question: Can you execute/invoke an external R script from T-SQL?

Yes. With SQL Server 2016 it is possible to run an R script from within T-SQL by using the sp_execute_external_script command. This link should help you with more detailed information on how to execute R scripts within T-SQL

Question: Is it possible to run R processes in diffrent boxes other than SQL Server itself for scalability reasons?

You have the option of installing the R Server on another server. Just keep in mind that you do have to account for the additional overhead of moving all the data over the network, which needs to weigh in on your decision to move processing to a different server.

Question: Can we join data generated from an R script to SQL data directly or does it have to be inserted into a table first?

Data generated from an R script interacts with the data the same way other data in a stored procedure does. It is not necessary to write the data to a table for the data to be processed.

Question: When would you use R Server versus R on SQL Server?

R on SQL server is an implementation of R server. When running R on SQL Server, R Server is running.

Question: What is the maximum file size I can load on R server?

R Server uses SCALR which provides the ability to not only use memory but also use disk storage, providing nearly limitless capability for file size processing.

Question: Is knitr integrated or can it be integrated into R for SQL Server?

While you can install the knitr library on SQL Server to implement the functionality provided, there is no direct integration path for incorporating the wide functionality knitr provides within SQL Server. For more information about knitr, please see this link.

Question: Does MS R Open have all the functionalities of CRAN R? Does it support all packages?

MS R Open is fully CRAN R compliant. All code which runs in CRAN R can run in MS R Open and all packages are supported. Since MS R Open was rewritten to use the Intel Math Kernel Libraries, it provides technical advantages to CRAN R. MS R Open is partially Multi-threaded, instead of single threaded like CRAN R, and is up to 38% faster

Question: Do you have a link for the Microsoft Machine learning cheat sheet?

Sure. The link can be found here.

Question: Why Use Microsoft R Open over other Analytics Tools such as Open R, Python, Matlab?

Microsoft R Open is fully CRAN R compliant, and also multi-threaded and faster, providing a clear benefit over CRAN R. Python is a great tool for data analytics, but unlike R, it is not designed solely for statistically analysis but has a wider functionality scope. R is focused solely on providing statistical data analysis. Matlab is a great tool, but given its complicated user interface and high licensing cost, many users may gravitate towards R, as it is free and there is a lot of good support for learning R available online as well.

Question: Can R be used with older versions of SQL Server or only 2016?

While it is possible to use R with other versions of SQL Server using tools such as RODBC, the R Server and full integration of R in SQL Server is only available in SQL Server 2016.

Question: Do I need all R server and R client and Microsoft R Open to be installed to run R from SQL Server?

To run R on SQL Server, R Server needs to be installed. For more information on how to install R on SQL Server please see this link. To connect to the server and use the SCALR functionality, the client machine will need to have R Client installed.

Question: What are the benefits of Microsoft Standalone R Server?

R Standalone Server is available to connect to data stored in HDFS and Teradata in addition to SQL Server. To incorporate data from other data sources and provide the ability to process more data than one has memory, it might make a lot of sense to deploy a standalone R Server.

Question: Are there any memory limitations in R when handling big data?

R is designed to run in memory, meaning if you have more data than memory you may run out of memory. When running using the SCALR functions, memory and disk are used to provide nearly limitless ability to process data.

Question: Can one call R from Python?

While it may be possible to create Python code which calls R, I am unaware of the syntax. This functionality would not be supported in the context of SQL Server.

Question: Can you use R charts in SSRS?

There are two different ways to incorporate R charts in SQL Server. Since a stored procedure can provide and R visualization, it is possible to use SSRS to call a stored procedure which would create a graphic to be rendered in SSRS. Also since Microsoft announced the incorporation of Power BI with SSRS, and Power BI provides the ability to create R graphics, incorporating Power BI which renders R would be another way to use charges in SSRS

Question: Any classes like “R for the nervous DBA” that you can recommend? I have a statistician who is running R on my SQL server.

I do not know of any R classes specifically designed for SQL Server DBAs, but I am working on a post for additional configuration information which you might find helpful. Please check back in a few weeks for updates.

Question: What is a Pirate’s favorite language?

RRRRRR Mad Props to Phillip for sending me a pirate joke. I love pirate jokes. I feel compelled to tell a few when I am presenting R and I have been told all of mine are really bad.

If you have any additional questions about SQL Server and R or any pirate jokes, please comment on this post or send me a message via twitter @desertislesql.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Monitoring Reports for R on SQL Server 2016

Whenever I talk about R, in the question and answer period I get a lot of questions from DBAs on how to monitor the impact of running R code on SQL Server. Fortunately Microsoft recently created a series of custom reports for SQL Server which provide a lot of useful information on how R is running on SQL Server. The custom reports are available on github here in a folder called SSMS-Custom-Reports. There are 8 different reports: R Services – Active Sessions, R Services – Configuration, R Services – Configure Instance, R Services – Execution Statistics, R Services – Extended Events, R Services – Packages, R Services – Resource Usage, R Services – Script. If it has been a while since you have added a custom report to SQL Server, this link should help you get the reports installed.

R Services Reports

The titles for the reports provide a good indication of what they do. Active Sessions will show how many users currently have their compute context set to run on the server. Configuration shows the current R Configuration, which is very useful information for checking on the R components on the server.

rconfigrpt

The report R Services – Configure Instance is for configuring R for the first time. If you have R setup already, chances are you won’t need this report, as it is really not a report but a script you need to run to configure R. R Services – Execution Statistics is a really neat way of monitoring the ongoing use of R. It keeps track of every time R is run on the server. If you run an R job then refresh the report, the execution count and time will go up.

sqlserverrrpt

R Services – Extended Events is also not a report but a list of all the extended events that are available for R Services. This is a handy bit of information, which can be a great reference tool for extended events monitoring. R Services – Packages lists the packages which are currently installed on SQL Server. When people write R, many lot of different packages are used within the script. Prior to running a package, check the information on this report to ensure the libraries used are installed on SQL Server. If the library is missing the code will not work. R Services – Resource Usage is a great way to see how R has been configured to run on the server. Notice I have created an External Pool for R. This is a configuration recommended by Microsoft to better monitor your R Services.

resourceusagerpt

The last report R Services – TSQL Script is a bit of a mystery. The screen contains this line of text. This is the T-SQL script for generating the data in report: No Report . I don’t know what was the intent of the report. The Readme.md in github provides this documentation about the reports.
SSMS Custom Reports

Custom reports for SQL Server R Services built using SQL Server Report Builder. The reports can be added to SQL Server Management Studio to view various configuration and runtime information about R scripts execution in SQL Server.

That doesn’t provide any clues to me as to what the intent of the report is, so you may want to skip installing it altogether. If anyone does figure out how this report can be used, please drop me a line as I am curious. I was really excited when I found these reports and I think that they provide a lot of valuable insight for monitoring R on SQL Server 2016.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Machine Learning with R

While there are a number of different Applications designed to implement Machine Learning, such as Azure Machine Learning, Matlab and Octave, a specific package to perform Machine Learning is not required.  The algorithms used to generate machine learning experiments, can be applied in other languages, such as R.

Machine Learning Algorithms

machinelearningLearning is often described as a method of applying rules to situations. “Don’t put your finger on the stove.  The stove is hot and will burn you”.  A child can extrapolate this to irons, fire and other hot things after being told about stoves.  Computers process learning a little differently, by applying rules or algorithms to data to determine a result.  A great example of this was the Kaggle competition to determine from looking at a picture, which picture was a cat, and which picture was a dog. The computer reviewed a number of different pictures where there was a label on the picture, indicating that it was a cat or a dog and applied those rules where the pictures were not labeled.  The winning algorithm was right 98.914% on identifying dogs and cats.  Sorting pictures into groups is a classification function, one of the common functions used in Machine Learning. Other popular functions include anomaly detection, regression and clustering.  Once experiments are created, there are a number of different methods used to determine their effectiveness, such as the Receiver Operating Characteristic [ROC] graphs or a Confusion Matrix.

Algorithm Determination

Often times determining which algorithm to use can take a while.  Here is a pretty good flowchart for determining which algorithm should be used given some examples of what the desired outcomes and data contain. The diagram lists the algorithms, which are implemented in Azure ML.  The same algorithms can be implemented in R.  In R there are libraries to help with nearly every task.  Here’s a list of libraries and their accompanying links which can be used in Machine Learning.  This list is no means comprehensive as there are libraries and functions other than the ones listed here, but if you are trying to write a Machine Learning Experiment in R, and are looking at the flowchart, these R functions and Libraries will provide the tools to do the types of Machine Learning Analysis listed.

Drawing ROC CurvesROCR

Anomaly Detection

Regression

There is a really good list of all of the R regression functions here

Clustering

Binary Classification

Multi-Class Classification

 

Applied Machine Learning

Hopefully this list of R libraries will help you apply machine learning to data within R. To see how R can be used in Machine Learning, please join me on my upcoming webinar on Machine Learning with R and SQL Server 2016  where I will show how an R program can be created and applied to a production environment.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Memory Usage in SQL Server for R

While R is an open source language, there are a number of different versions of R and each handles memory a little differently. Knowing which version is being used is important, especially when the code is going to be migrated to a server. As part of a SQL Server implementation, there are three different versions of R which come into play. The first is standard open source R, commonly known as CRAN R. This is the standard open source version of R which runs code in memory and is single threaded. The next version which will be installed as part of a SQL Server Installation is Microsoft R Open. This version of R was written to take advantage of the Intel Math Kernel Libraries [MLK]. Using the libraries speeds up many statistical calculations which use matrix operations. It also adds multi-threading capability to R as the rewrite provides the ability to use all available cores and processors and process in parallel. More information on how it works and how much faster Microsoft R Open is compared to standard R is available here. To use Microsoft R Open, once it is installed, in Rstudio should automatically start using it. To check out what version of R that is in use, within R Studio, go to Tools->Global Options and look at the R version.

rstudioversion

R studio here is using Microsoft R Open. To take a look at the version in Visual Studio which has R Tools installed, go to Rtools and Options. Microsoft R Open is open source and can be used by anyone to speed up the execution of their R code. The version of R which is implemented within SQL Server is not this version of R. The R server version is proprietary, and Microsoft is not giving it way for free, as to run it on a server you need to purchase either SQL Server 2016 or R Standalone. That version is R Server. There is full compatibility on all of the versions of R. Code written in either CRAN R or Microsoft R Open will work in R Server. To write code for R Server, the R Client needs to be installed. The Visual Studio Screen shows this version of R installed on a PC which also is running SQL Server 2016 Developer edition.

vsr

R and Memory Consumption

One of R’s strengths and weaknesses is the fact that R runs in memory. This is good thing because it means R is very fast. It is not such a good thing when you need more memory than your machine has. When reading through the product information surrounding Revolution Analytics’ version of R [Revo R] which Microsoft bought, one of the features is the ability of this version of R to be able to use not only memory but disk, by rxapplying chunking logic. The code is broken into smaller pieces, executed then put back together. Using Revo R, it’s possible to do this by using the specifically designed functions which all start with rx which implement this functionality, known as ScaleR. This does not mean that all R code running on SQL Server will be using these functions. Chances are most of it won’t, especially if the R code is being migrated from the existing data science team. Prior to being bought by Microsoft, Revolution Analytics charged money for their product. If your organization didn’t have a license, chances are no one was writing rx Functions. For people who don’t know anything about R, and have been asked to create packages to run R, there is an easy way to tell if the code has been written to use the chunking. If there are no words in the code that start with rx, like rxSummary, rxCube or rxHistogram to name a few, the code will not use chunking.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur