Articles for the Month of May 2016

Using R in SQL Server 2016

As SQL Server 2016 will be released soon and to follow up on the talk I gave at SQL Saturday Atlanta, this post will guide you through the steps needed to make R work. Like many features of SQL Server 2016, R is optionally installed.  To install R make sure that when you install R, the Option for R Services (In-Database) is checked, as shown below.

SQLServer2016_InstallR

Microsoft is incorporating the version of R sold by Revolution Analytics, which they called R Server. This version, while fully compatible with Open Source R, has some additional features which allow the R code to be run not only in memory, but use of a chunking technology to swap the data to disk so that the R code will not run out of memory. The commands to use this additional functionality all start with rx and are part of the proprietary ScaleR feature set. To use the R Server as a separate stand along product, instead of selecting the R Server in database option, select the R Server Standalone shared features. A R server could be useful if you want to perform large scale data analysis on a Hadoop Cluster, or other Non-SQL database like Teradata.

SQL Server Steps to Enable R

SQL Server 2016 is installed, especially if it was installed by others, you may be wondering if the R service really is installed. Take a look at the services running on the machine with SQL Server. If the SQL Server Launchpad service is running, as shown below, the R services are installed and running.  ServicesRunningR

The last thing needed to run R is to configure and restart the SQL Server Services. In a new query type the following command
sp_configure 'external scripts enabled', 1
GO
Reconfigure
GO

After restarting the SQL Server Service, SQL Server will now run R code. The following text can be run on any SQL Server 2016 instance to see if R has been configured correctly

EXEC sp_execute_external_script @language =N'R',
@script=N'OutputDataSet <-InputDataSet',
@input_data_1 =N'SELECT 1 as CheckToSeeIfRIsWorking'
WITH result sets (([CheckToSeeIfRIsWorking] int not null));
GO

The code is executed as an external script, specifying that the language used should be R. @script contains the R code, which is a simple command to take the mean of the data coming from the InputDataSet. @Input_Data_1 contains the location of the data to be processed. Of course the R code could of course be more complicated, but this code example is generic enough to test for everyone to ensure R is set up properly on SQL Server 2016.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Time to Use Custom Visualizations in Power BI

After my previous blog post, and subsequent complaints to several people on the Power BI Product Team, I am really happy to be writing a blog encouraging everyone to use Custom Visuals. Power BI no longer has any horrible error messages for custom visuals which prevented me from using Custom Visuals in the Past. Look at the fish with no error messages!CustomVisualizations

Warnings For Custom Visualizations Go Away

Now since Power BI Custom Visualizations are not provided by Microsoft, they feel compelled to give you a warning message letting users know this. Here is the message box you get in Power BI Desktop when PowerBIWarningusing a custom visualization. Notice that I clicked on the check box next to the text Don’t show this dialog again. As Words mean things, checking this box means the warning message never appears again. When you import the visualization into Power BI, no warning messages. Now I can use and propose custom visualizations to clients because they really are neat, and now they contain no warnings. Thanks so much to the Power BI Product team for fixing this major issue.

SQLSaturdayAtlanta-521This visualization update makes me somewhat sad that I am not talking about Power BI at SQL Saturday Atlanta, but I am really excited to be talking about R and connecting with many people in the SQL Server Community. If you are in Atlanta, I hope to see you tomorrow.

 

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

3 Best Practice Steps for Implementing a Power BI Project

Power BI is a powerful tool with a rapidly evolving set of features. While it’s great to have more features, it is hard to keep up with the best ways to use them all, and some of the information out there regarding best practices have changed with the new releases. If you are planning on implementing a Power BI project where you work, following these steps will  minimize implementation and development issues while saving time. The steps are listed in order needed for creating a new Power BI Project, but if you have a Power BI implementation already, incorporating these steps will provide a more scalable, easy to maintain system over time.

Step 1 – Create a Group Workspace

1BlockOne person should never own a report. One person cannot be responsible for the reports 100% of the time as they may want to go on vacation or do something other than modify a report right now. The reports should belong to a group of people or accounts within the organization which will give the reports the appropriate status of organizational assets, not personal items. If the group owns the workspace, then the progress on the work any report can be monitored and if one person gets busy another person has the ability to help out.  When creating a workspace a group One Drive is also created. The pbix files used by the group should be stored in the group One Drive, which will allow others to edit the report over time. I have seen it take up to 30 minutes for the  workspace to be created, which means this step needs to be done in advance of when it is needed.

Using workspaces with Power BI does mean that you will need to have an Office 365 Exchange license for each Power BI user, because they create groups which are technically an Exchange feature, not a Power BI feature. This does not mean that you need to do anything with your current Exchange servers, you just need the license so that you will have the ability to use the group One Drive. If you do not have an exchange license, the group drive will not be accessible to all users. Updates will then consist of replacing the reports and datasets, which can adversely impact the dashboards relying upon them.

Step 2 – Select and Implement a Gateway

2BlockReports are based on data which needs to be updated to reflect what is current. Power BI offers two methods of updating the data the Personal Gateway and the Enterprise Gateway. If you select an Enterprise Gateway, you may not need to create a data model, as you may rely upon one already created in your organization, which is why this step must be completed at the beginning of the project.

If the data required for Power BI requires a data model is going to be created, the type of security used to access the data needs to be determined.  While both the Enterprise Gateway and the Personal Gateway can be used to handle data mashups, meaning for example you can combine an Excel file with some data from various SQL Server databases, the difference is in the security model. Enterprise Gateways need to be setup by someone who understands the underlying database security model as and passes the users credentials to the source, inheriting the existing individuals data security. A Personal Gateway will use the security of the person who creates it. A Personal Gateway is generally created based on a data model from a report.

Step 3 – Create a Content Pack

3BlockContent Packs can be used to ensure that the data used is coming from the gateway setup in the previous step, and any other reporting standards which you wish to implement are also included. Using the Power BI desktop, create a simple working report using the data refreshed in the gateway created in the previous step. This report may not be anything that you want to release to users, but a report which you can use to test the gateway created earlier to ensure that all of the data connection and refresh issues are resolved prior to spending a lot of time working on a series of complex reports. Load this file to the Power BI Service and use it to create a content pack. By providing a working report model for everyone in the workspace can use as a guide for how to start a report, many issues regarding how to get access to data and multiple sources to the same data can be eliminated. If you have multiple gateways, each should have a content pack. Demonstrate how to use the content pack to the people in the workgroup, and then everyone can create reports where the data will be refreshed.

Follow these Steps, Eliminate Problems

These best practice steps can eliminate many problems which arise when people start working on Power BI visualizations. Going back to my previous post, following these steps addresses the important components of Security and Business Continuity first, before any reports are even created. Addressing these components first can prevent many issues which tend to happen when different models are employed.

Yours Always,

Ginger Grant

Data aficionado et SQL Raconteur

The Five Key Components for a Successful Data Analysis Project

As a consultant, I have been a part of some very successful implementations of Data Analysis Projects. I have also been brought in to fix implementations which didn’t go so well. Being on a number of different projects, certain common components emerge. When these things are addressed early in the project, things go well. These items really cannot be put off to the end or ignored completely. These items are not related to the software, used in data analysis, as no matter what the tool selected, the solution will be incomplete if the following five areas are not addressed.

5 Components Every Project Should Include

Security

Business Continuity

Reliability

Distribution

Management Direction

Each of these items are important components of a successful Data Analytics Management practice. Problems with any of them can move a project from successful to a failed project.

Security

Security is an obvious consideration which needs to be addressed up front. Data is a very valuable commodity and

Data is the New Gold

Data is the New Gold

only people with appropriate access should be allowed to see it. What steps are going to be employed to ensure that happens? How much administration is going to be required to implement it? These questions need to be answered up front.

Business Continuity

Business Continuity means the solution cannot be on the shoulders of one person, as that can be a risky situation. One person needs a break to go on vacation or not work, and needs a backup who is skilled and able to understand the system and run it alone. This can be a really big problem, especially for smaller organizations, who have only relied on one person. I have been brought in to assist companies who until very recently thought that they had a very successful Data Analytics Platform. The problem with it was that there was only one person who had the skill to maintain it, and that person quit.

Business Continuity can be a specific problem for Power BI users, as often times one user owns a report. Reports for an organization should never be owned by one person. All companies using Power BI should have the reports in a series of group workspaces, not belonging to any single person. Otherwise, if the person writing the report quits and their account is deleted, the reports are not then deleted as well.

Reliability

Reliability is critical, because who cares about a data analysis project if no one believes the data is correct? In addition to accuracy, the system used needs to be reliable, containing data updates on a scheduled basis. How and when is the data going to be updated? How is that schedule communicated? The answers to these questions need to be addressed at the beginning of the project.

I remember working for one client who had over a 100 million dollar loss in a month on a visualization we created. I asked if the data was correct as that was a huge one month loss. I was assured that the data was not correct, but no one knew how to resolve the data issue. The reporting tool, whatever it happens to be, is not the place where data is fixed, it should reflect the contents source data. Where this rule is not followed, the reports are ignored as the data is suspect as no one knows why it should be believed as doesn’t match the source system. How is the source system data going to be fixed? This is often times a management issue as people need to be appropriately incentivize to fix things.

Management Direction

All data analysis needs Management Direction to set priorities. As there are only so many hours in a day, the important items need to be identified so that they can be addressed. What is important? Everything cannot be a number one priority as that means nothing is. In many data analytics projects, someone wants a dashboard. Great idea. What numbers show whether or not the company is currently successful? In most companies where I am helping them create data analysis project, the answer to what are the Key Performance Indicators [KPIs] is; no one has made up their mind yet. Management needs to provide the direction for the KPIs.

Distribution

How are people going to get their hands on whatever it is that you just created? What are people looking for? Reports in their email? Visualizations on their phones? What do people want? Only if you ask the question do you know if you are providing the data in a way people want to consume them. In order to pick the most appropriate tool or design visualizations people are actually going to use, these questions need to be asked up front. Recently I worked for a client who had selected Tableau as there reporting solution, but they were getting rid of it. Why? The users wanted to do adhoc analysis in Excel, so they were using Tableau not to visualize their data or do ad-hoc analysis, but to select data for Excel Pivot Tables. A lot of money and time would have been saved if the question of how the users wanted to use the data was asked up front.

Hopefully all of your data analysis project include these components. In today’s environment, data is the new gold. This valuable commodity needs a system which is Reliable, Secure, important to Management, which can be distributed to continually provide value to the organization.

Yours Always,

Ginger Grant

Data aficionado et SQL Raconteur