Mar 13 2017

Security Updates to Power BI

Author: Ginger Grant • Discussion: 1 Comment

Office 365 Admin Screen for granting Power BI Admin rights

In the past month, Microsoft has made a number of security changes to Power BI. The first one is not really a feature update but a replacement for PowerShell: you no longer need PowerShell to become a Power BI Admin. Any Office 365 Admin can grant Power BI Admin permissions via this screen in the Admin Center. The Power BI Admin role was first created in October, but the screen for assigning it was not complete until it was fixed in February.

Power BI Security Changed from Tenant Only

People who have been granted Power BI administrator rights will also notice a modification to the Admin screen. The March 2017 update to Power BI makes a major change to the security model. Previously all the security settings were set at the tenant level, meaning that each privilege was either granted to every user or to no one. If I wanted to allow one group within the organization to publish reports to the web, but did not want to allow everyone to do so, there was no way to accomplish it. All that has changed. It is now possible to include or exclude groups of users from having rights in Power BI. Users can be classified into security groups in Azure Active Directory, either through the Office 365 Admin Center or via the Azure AD Admin Center. Once created, the security groups can be used in Power BI. Security groups are not the same thing as the groups created in Power BI when a new work group is created.

Using Security Groups in Power BI Admin

Power BI Admin Portal

The new Power BI Admin screen looks different. It now lists which rights can be assigned to different groups of users. Share content to external users, Export Data, Export reports as PowerPoint presentations, Printing dashboards and reports, Content pack publishing, and Use Analyze in Excel with on-premises datasets can now be assigned to security groups, so the rights do not have to be the same throughout the entire tenant.

Unfortunately, some of the permissions are still tenant-based. For example, the setting Publish to web, which is one permission I would definitely like to turn on only for some users, is still only available as a tenant-level option. Even so, these security changes are a welcome improvement to the product, as they give administrators more options for granting rights in Power BI.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Oct 06 2016

Analyzing JSON in U-SQL

Author: Ginger Grant • Discussion: 1 Comment

U-SQL has built-in extractors for parsing text, comma-delimited, or tab-delimited files. Once again, parsing JSON is problematic, because there is no built-in extractor for it. The solution built into U-SQL is to extend it with C# code, either your own or someone else’s. Since I wanted to parse JSON, fortunately there are libraries available on GitHub containing what is required to do it. Download the GitHub package and open the Microsoft.Analytics.Samples project in Visual Studio. When I did this the first time, there was a problem loading the Newtonsoft.Json reference, so I right-clicked on the references and downloaded the missing parts again. Build the solution and look in the directory …Examples\DataFormats\Microsoft.Analytics.Samples.Formats\bin\Debug\ . There will be two DLLs, Microsoft.Analytics.Samples.Formats.dll and Newtonsoft.Json.dll. These DLLs need to be registered in Data Lake Analytics, and also locally if you choose to run your U-SQL locally. Since at some point the goal is to run from within Data Lake Analytics, you will need to copy both of these DLLs to the data lake. I created a folder for the DLLs called Assemblies and ran this command:

USE DATABASE [master];
CREATE ASSEMBLY [Newtonsoft.Json] FROM @"/Assemblies/Newtonsoft.Json.dll";
CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM @"/Assemblies/Microsoft.Analytics.Samples.Formats.dll";

Notice that I told U-SQL where to find the DLLs: in the Assemblies folder. This step only needs to be completed once per data lake. After this job runs successfully, the DLLs which allow the JSON to be parsed can be referenced.
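If the DLLs are rebuilt later, the registration job has to be run again, and CREATE ASSEMBLY reports an error when an assembly with that name is already registered. Here is a minimal sketch of a re-runnable version, assuming the same Assemblies folder as above. The DROP ASSEMBLY IF EXISTS statements are my addition for illustration and were not part of the job I originally ran.

```usql
USE DATABASE [master];

// Remove any earlier registrations so the job can be re-run after a rebuild.
// The assembly that depends on Newtonsoft.Json is dropped first.
DROP ASSEMBLY IF EXISTS [Microsoft.Analytics.Samples.Formats];
DROP ASSEMBLY IF EXISTS [Newtonsoft.Json];

// Register both DLLs again from the Assemblies folder in the data lake.
CREATE ASSEMBLY [Newtonsoft.Json]
FROM @"/Assemblies/Newtonsoft.Json.dll";

CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats]
FROM @"/Assemblies/Microsoft.Analytics.Samples.Formats.dll";
```

Dropping and recreating keeps the registered assemblies in step with whatever was last copied to the folder, at the cost of briefly having no registration while the job runs.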

Here is my sample JSON, which I have copied to Samples/Data/TestNew.json in the Data Lake:
{ "appInstanceId": "357ced1e-cf05-459c-9317-794bq24f61c2", "firmwareVersion": "1.0.2.4", "serialNumber": "254542-694967", "Side": "0", "Latitude": "33.8848744", "Longitude": "-128.403276", "GeneratedDate": "2016-10-04T21:18:19Z" }

Now that I have added the JSON to the Data Lake and the assemblies have been registered, I can write some U-SQL to parse the JSON. First I need to reference the libraries, then create a schema, as there is no schema for a Data Lake. After those steps are completed, it’s possible to write SQL to query a JSON file. There is no UI for looking at the results, so the results will be written to a file. I am going to output the data to a csv file called JSONoutput.csv. Here’s the code to do that.

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @infile string = "/Samples/Data/TestNew.json";

@logSchema =
    EXTRACT name string,
            appInstanceId string,
            firmwareVersion string,
            serialNumber string,
            Side string,
            Latitude float,
            Longitude float
    FROM @infile
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

@testthis =
    SELECT appInstanceId,
           COUNT(*) AS LocationCount
    FROM @logSchema
    GROUP BY appInstanceId;

OUTPUT @testthis
TO "/Samples/Data/JSONoutput.csv"
USING Outputters.Csv();
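Once the schema is in place, the same extracted rowset can be queried in other ways. The sketch below is an illustration and not something I ran for this post: it filters on the Side field and writes a header row. The output file name JSONfiltered.csv is a hypothetical name chosen for the example. Note that U-SQL expressions are C#, so the comparison uses == and a double-quoted string.

```usql
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @infile string = "/Samples/Data/TestNew.json";

// Same schema-on-read step as before, limited to the fields used here.
@logSchema =
    EXTRACT appInstanceId string,
            serialNumber string,
            Side string,
            Latitude float,
            Longitude float
    FROM @infile
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

// The WHERE clause is a C# expression: == rather than =, and a string literal in double quotes.
@sideZero =
    SELECT appInstanceId,
           serialNumber,
           Latitude,
           Longitude
    FROM @logSchema
    WHERE Side == "0";

// JSONfiltered.csv is a hypothetical file name for this sketch.
OUTPUT @sideZero
TO "/Samples/Data/JSONfiltered.csv"
USING Outputters.Csv(outputHeader: true);
```

With the sample record shown earlier, Side is "0", so that record would pass the filter.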

I am running the U-SQL job from Visual Studio. There isn’t much data to parse, and you can see in the summary window that it took 21 seconds to prepare and 33 seconds to run.

When I go to the web and look at the Data Lake Analytics page, I can also see that the job completed. I have noticed that the result appears at pretty much the same time on the web and in Visual Studio.

Clicking on the bar in the graph that represents today allows me to select the job which ran, showing the same screen as appears in Visual Studio.

Thanks to Erik Zwiefel and Mark Vaillancourt, both of Microsoft, for helping me figure out the process to use JSON in Data Lake Analytics, as I didn’t understand the steps which are required to parse JSON. I hope this blog makes it possible for you to figure out how to make it work.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Oct 05 2016

Using Visual Studio for U-SQL Data Analytics Jobs

Author: Ginger Grant • Discussion: 1 Comment

The pricing for U-SQL is based upon the number of Analytic Units used and the number of completed jobs. To decrease the amount of money being spent, it would be most efficient if only working jobs ran in Azure, not the 27 runs it took to debug them. Fortunately, all of the debugging can be performed locally, so only working jobs need to be run in Azure. Another thing that you may notice if you are exclusively using the Azure Portal for running Data Lake Analytics jobs is that there is no way to actually save a job. Once the job is completed, you can review the job and then click on the View Script button. Don’t rely on the button though, because for reasons unknown the View Script button is sometimes not enabled, meaning that it is not possible to see what ran.

Data Lake Analytics Setup for Visual Studio

There are a few steps required before any code is run. If the Data Lake Analytics Tools are not installed within Visual Studio, download and install them. When the tools are installed, the menu item Data Lake appears in Visual Studio. The second step is to give your PC the same file structure as your data lake. The default location where the Data Lake tools will look for your data structure is C:\Users\<<insertyourname>>\AppData\Local\USQLDataRoot . This means that if you have folders and subfolders created in your data lake, your PC needs to have the same structure under that folder, including the data.

Running Data Lake Jobs Locally

If you take a look at the screen picture of Visual Studio with the Data Lake tools installed, you will notice a series of buttons at the top of the screen. The middle button is currently set to (Local). The drop-down box at the top of the screen allows you to run the job either on your Azure Data Lake Analytics instance or locally. If it is running locally, no charges will be incurred on Azure. In Visual Studio you can also, of course, save the U-SQL script as a file.

When the context is switched to the Data Lake Analytics instance in Azure, you will probably want to check out the Submit button. There is only one option, Advanced. In this window, you can change the job name. By default it is set to the name of the script being run, but if you are running the same script over and over, you may wish to change this name so that the different instances can be identified. Parallelism can also be set to the value that is actually being used in the job. Take a look at the job view, which is the tab to the left of Script. This screen shows the processes in use when the job is run, so set the value accordingly. You will be charged for the Parallelism value that is set, not the amount actually used, so setting a lower value can decrease the cost of running a job.

The tab on the far left shows the job with the same view shown in the Azure Portal for a Data Lake Analytics job. That screen is shown below.

Working in Visual Studio also means less switching between screens than in the Azure Portal, which is another reason to develop here. Now that I have this environment set up, I plan on writing all of my Data Lake Analytics jobs here, as I find the development environment works better for me. Let me know what you think of it by commenting below. If you are interested in finding out more about running Data Lake Analytics jobs, especially if you are trying to parse JSON, please subscribe to my blog, as that topic will be in an upcoming post.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Aug 29 2016

U-SQL and Azure Data Lake Analytics

Author: Ginger Grant • Discussion: 1 Comment

There are a number of different SQL flavors (HQL, PL/SQL, MySQL, U-SQL, T-SQL), all of which are derivatives of ANSI SQL, which I suppose is, in today’s parlance, A-SQL. Many people have not heard of U-SQL, which Microsoft introduced on September 28, 2015. Since the announcement was in the Visual Studio Blog, a number of data people may have missed it. U-SQL is meant to combine the ease of SQL with the functionality of C# to create a language which can process any kind of data, like videos or text, by making it possible to customize the code and scale it without limit. This is very useful if, for example, all of the data is stored in an Azure Data Lake.

Using U-SQL in Azure Data Lake Analytics

In my previous series on Stream Analytics, I wrote some SQL-like queries. Those queries didn’t look much different than ANSI SQL, which is sort of the point of porting the functionality to a different yet familiar language. U-SQL has the same goal, and the service which relies on it is Azure Data Lake Analytics. Azure Data Lake Store is compatible with HDFS and can also be used from HDInsight, but you don’t need to write Hive to query the data, as U-SQL will do it. Like Hive, U-SQL can be used to create a schema on top of some data, and then query it.

For example, to write a query on this tab-delimited file stored in a Data Lake, I would need to create the data definition for the data, and then I could easily write a statement to query it.

@searchlog =
    EXTRACT SaleDate string,
            SaleLocation string,
            Lemon int,
            Orange int,
            Temperature int,
            Leaflets int,
            Price string
    FROM "Samples/Data/Popsicle.tsv"
    USING Extractors.Tsv();

@testthis =
    SELECT SaleLocation,
           COUNT(*) AS LocationCount
    FROM @searchlog
    GROUP BY SaleLocation;

OUTPUT @testthis
TO "Samples/Data/Output/SaleLocCount.csv"
USING Outputters.Csv();

In this U-SQL code, I am creating a structure for the data, querying some fields, and writing the output to another file. Make sure that you don’t forget the semicolons, as leaving them out will cause errors. Also, if any of your fields are blank, you will have to code for that as well. From within Data Lake Analytics, the U-SQL is run as a job, creating a new file. Note the time that it took to finish the job.
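One way to code for blank fields is to declare the numeric columns as nullable types and supply a default when querying. This is a minimal sketch, assuming that some of the numeric columns in Popsicle.tsv may be empty. The output file name SaleLocTotals.csv and the column aliases LemonSold and OrangeSold are hypothetical names chosen for the example.

```usql
// int? is the C# nullable int, so an empty field is read as null
// instead of causing the extraction to fail.
@searchlog =
    EXTRACT SaleDate string,
            SaleLocation string,
            Lemon int?,
            Orange int?,
            Temperature int?,
            Leaflets int?,
            Price string
    FROM "Samples/Data/Popsicle.tsv"
    USING Extractors.Tsv();

// The C# null-coalescing operator ?? replaces a null with 0 before summing.
@sales =
    SELECT SaleLocation,
           SUM(Lemon ?? 0) AS LemonSold,
           SUM(Orange ?? 0) AS OrangeSold
    FROM @searchlog
    GROUP BY SaleLocation;

// SaleLocTotals.csv is a hypothetical file name for this sketch.
OUTPUT @sales
TO "Samples/Data/Output/SaleLocTotals.csv"
USING Outputters.Csv();
```

Treating a blank as 0 is a choice that suits totals. If blanks should be excluded rather than counted as zero, a WHERE clause testing for null would be the alternative.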

The reason data is stored in a Data Lake is to provide a single storage location for the data, which will be used in analytics. U-SQL provides a powerful tool for getting the data out.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Desert Isle SQL

Just a three hour tour…
