While R is an open source language, there are a number of different versions of R and each handles memory a little differently. Knowing which version is being used is important, especially when the code is going to be migrated to a server. As part of a SQL Server implementation, there are three different versions of R which come into play. The first is standard open source R, commonly known as CRAN R. This is the standard open source version of R which runs code in memory and is single threaded. The next version which will be installed as part of a SQL Server Installation is Microsoft R Open. This version of R was written to take advantage of the Intel Math Kernel Libraries [MLK]. Using the libraries speeds up many statistical calculations which use matrix operations. It also adds multi-threading capability to R as the rewrite provides the ability to use all available cores and processors and process in parallel. More information on how it works and how much faster Microsoft R Open is compared to standard R is available here. To use Microsoft R Open, once it is installed, in Rstudio should automatically start using it. To check out what version of R that is in use, within R Studio, go to Tools->Global Options and look at the R version.
R studio here is using Microsoft R Open. To take a look at the version in Visual Studio which has R Tools installed, go to Rtools and Options. Microsoft R Open is open source and can be used by anyone to speed up the execution of their R code. The version of R which is implemented within SQL Server is not this version of R. The R server version is proprietary, and Microsoft is not giving it way for free, as to run it on a server you need to purchase either SQL Server 2016 or R Standalone. That version is R Server. There is full compatibility on all of the versions of R. Code written in either CRAN R or Microsoft R Open will work in R Server. To write code for R Server, the R Client needs to be installed. The Visual Studio Screen shows this version of R installed on a PC which also is running SQL Server 2016 Developer edition.
R and Memory Consumption
One of R’s strengths and weaknesses is the fact that R runs in memory. This is good thing because it means R is very fast. It is not such a good thing when you need more memory than your machine has. When reading through the product information surrounding Revolution Analytics’ version of R [Revo R] which Microsoft bought, one of the features is the ability of this version of R to be able to use not only memory but disk, by applying chunking logic. The code is broken into smaller pieces, executed then put back together. Using Revo R, it’s possible to do this by using the specifically designed functions which all start with rx which implement this functionality, known as ScaleR. This does not mean that all R code running on SQL Server will be using these functions. Chances are most of it won’t, especially if the R code is being migrated from the existing data science team. Prior to being bought by Microsoft, Revolution Analytics charged money for their product. If your organization didn’t have a license, chances are no one was writing rx Functions. For people who don’t know anything about R, and have been asked to create packages to run R, there is an easy way to tell if the code has been written to use the chunking. If there are no words in the code that start with rx, like rxSummary, rxCube or rxHistogram to name a few, the code will not use chunking.
Yours Always
Ginger Grant
Data aficionado et SQL Raconteur
Pingback: SQL Server R Services Memory Usage – Curated SQL