About two years ago, because I kept reading that R was the language for analyzing data, I signed up for an online class in R. I took the class for four weeks, and then I got busy and couldn’t finish it. Maybe they would have gotten around to the good parts if I had stayed longer, but in those four weeks I had no idea why you would want to use R. The class had various projects that loaded data into memory and analyzed it, which I did, thinking all the while, “I can do this in Excel, so why is this cool?” I didn’t see anything that showed me why R was the tool for analyzing data. A little while later, I heard someone talking about visualizing data in R. Now I saw why people were so excited about it. This video from Revolution Analytics, which Microsoft bought last year, shows some of the cool visualizations you can do with R. Since Microsoft is including R support in SQL Server 2016, now might be a good time to start learning it.
Starting R from a SQL Perspective
Since R support in SQL Server 2016 will be available in the CTP3 preview, now might be a good time to start learning it. When I wrote this post, CTP3 was not available, which is why all of my samples are created in the free, open source tool RStudio (go ahead, click on the link and download it, then come back). The application generally uses four quadrants. The top left contains the code editor; below that is the console, where you see the results of the code you run. Like SSMS, you can run just what you highlight, which may or may not be everything in the code editor. RStudio sort of reminds me of PowerShell, as R has good help and doesn’t always tell you whether your command contains an error. The top right contains the workspace, including data which has been loaded, and the bottom right contains tabs for showing your graphics or the help files. R includes data sets to play with to get started, so you don’t have to go fishing for data on the internet; just type data() to see them.
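To make the data() tip concrete, here is a minimal sketch of poking at one of the bundled data sets. I picked mtcars purely for illustration; any name listed by data() should work the same way.

```r
# Typing data() on its own opens the full list of bundled data sets.
# Capturing the result instead lets us peek at it in the console.
available <- data()
head(available$results[, c("Item", "Title")])

# mtcars is one of the bundled data sets (chosen here only as an example)
data(mtcars)
head(mtcars)          # first six rows
str(mtcars)           # column names and types
summary(mtcars$mpg)   # quick numeric summary of one column
```

If you come from SQL, head() is roughly your SELECT TOP and str() is roughly what you would look up in the table definition.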
Visualizations in R
If this had been the first example I saw with R, I would have better understood why people are paying more for R developers than C# developers, as it doesn’t take much to get started charting your data. You do have to load the libraries for the functions you want to use. I chose a selection of BMI data from gapminder for my sample: a simple text file which contains Country, Year and BMI. First I am going to reference the libraries beeswarm and ggplot2, then load the data and assign it to a variable. After that, I will call the beeswarm function and have it plot the data, then provide a legend to see which country is which. Once the swarm is plotted, I am going to call the boxplot function to overlay a box plot on the same chart. Not much code and a lot more satisfying than Hello World.
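# Install the packages once, then load them for this session
install.packages("beeswarm")
install.packages("ggplot2")
library(beeswarm)
library(ggplot2)

setwd("D:/files/code/r/WorkingDir")
# stringsAsFactors = TRUE keeps Country a factor, which levels() and pwcol rely on
BMIFemale <- read.csv("d:/Files/code/r/BMIfemaleCountriesSelected.csv",
                      stringsAsFactors = TRUE)
# View(BMIFemale)

# One point per row, grouped by year and colored by country
beeswarm(BMI ~ Year, data = BMIFemale,
         pch = 20, ylab = 'BMI', xlab = 'Year',
         pwcol = as.integer(Country))
legend('topleft', legend = levels(BMIFemale$Country), title = 'Country',
       pch = 20, col = seq_along(levels(BMIFemale$Country)))
# Overlay a translucent box plot on the swarm
boxplot(BMI ~ Year, data = BMIFemale, add = TRUE, col = "#0000ff18")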
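If you don’t have the gapminder file handy, the same points-plus-box-plot idea can be sketched with invented data and nothing but base R graphics. To be clear about the assumptions: the country names and BMI values below are made up for illustration, and jittered points() stand in for beeswarm(), so the layout will not match the chart above exactly.

```r
# Made-up data for illustration only: these are NOT real gapminder values
set.seed(42)
countries <- c("Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta")  # hypothetical names
years <- c(1980, 1990, 2000)
bmi <- expand.grid(Country = countries, Year = years)  # Country becomes a factor
bmi$BMI <- round(rnorm(nrow(bmi), mean = 25, sd = 2), 1)

# Draw the translucent box plot first; boxes sit at x = 1, 2, 3
boxplot(BMI ~ Year, data = bmi, col = "#0000ff18",
        xlab = "Year", ylab = "BMI",
        outline = FALSE, ylim = range(bmi$BMI))

# Add one jittered point per row, colored by country
x <- jitter(as.integer(factor(bmi$Year)), amount = 0.15)
points(x, bmi$BMI, pch = 20, col = as.integer(bmi$Country))

legend("topleft", legend = levels(bmi$Country), title = "Country",
       pch = 20, col = seq_along(levels(bmi$Country)))
```

Swapping in your own data frame should only require the same three columns: Country, Year and BMI.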
Why Should I Use R?
I hope this post provided a quick way of getting started with R. I will be devoting more space on this blog to other reasons why you might want to use R, especially when it comes to Machine Learning, and to explaining R in relation to what you can do with SQL or other tools like Excel, so you can better understand its place in the data environment. If you are interested in reading my upcoming posts on this topic, feel free to subscribe to my blog to get the updates.
Data aficionado et SQL Raconteur