
Using Visual Studio for U-SQL Data Analytics Jobs

Pricing for U-SQL is based on the number of Analytics Units (AUs) used and the number of completed jobs. To keep costs down, only finished, working jobs should run in Azure, not the 27 runs it took to debug them. Fortunately, all of the debugging can be done locally, so only working jobs need to be submitted. Another thing you may notice if you use only the Azure Portal to run Data Lake Analytics jobs is that there is no way to save a job. Once a job has completed, you can review it and click the View Script button. Don't rely on that button, though: for reasons unknown, it is sometimes disabled, which makes it impossible to see the script that ran.

Data Lake Analytics Setup for Visual Studio

A few setup steps are required before any code is run. First, if the Data Lake Analytics Tools are not installed in Visual Studio, download and install them. Once the tools are installed, a Data Lake menu appears in Visual Studio. The second step is to mirror the file structure of your data lake on your PC. By default, the Data Lake tools look for local data in C:\Users\<your user name>\AppData\Local\USQLDataRoot. This means that if you have created folders and subfolders in your data lake, your PC needs the same structure under that folder, including the data files.
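Creating the mirrored folders by hand gets tedious when the lake has many subfolders. The Python sketch below is one way to script it. It assumes you supply your own list of lake folders and, optionally, local copies of sample files; `mirror_lake_layout`, `lake_to_local`, and `default_usql_data_root` are hypothetical helpers written for this example, not part of the Data Lake tools. The default root is resolved from the Windows `LOCALAPPDATA` environment variable, which normally points at C:\Users\<your user name>\AppData\Local.

```python
import os
import shutil
from pathlib import Path, PurePosixPath


def default_usql_data_root():
    """Default local data root for the Data Lake tools on Windows.

    %LOCALAPPDATA% normally resolves to C:\\Users\\<name>\\AppData\\Local.
    Falling back to ~/AppData/Local when it is unset is an assumption.
    """
    local_app_data = os.environ.get("LOCALAPPDATA")
    base = Path(local_app_data) if local_app_data else Path.home() / "AppData" / "Local"
    return base / "USQLDataRoot"


def lake_to_local(lake_path, root):
    """Map a data lake path such as "/Samples/Data" onto a path under root."""
    # Drop the leading "/" so the lake path becomes relative to the local root.
    parts = PurePosixPath(lake_path.lstrip("/")).parts
    return root.joinpath(*parts)


def mirror_lake_layout(lake_folders, root, sample_files=None):
    """Recreate lake folders under root and copy in local sample files.

    lake_folders: iterable of lake folder paths (your own layout).
    sample_files: optional dict of {lake file path: local source file}.
    Returns the local folder and file paths that were prepared.
    """
    prepared = []
    for folder in lake_folders:
        local_dir = lake_to_local(folder, root)
        local_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        prepared.append(local_dir)
    for lake_file, source in (sample_files or {}).items():
        target = lake_to_local(lake_file, root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)  # copy the data, not just the folder
        prepared.append(target)
    return prepared
```

For example, `mirror_lake_layout(["/Samples/Data", "/output"], default_usql_data_root())` would create USQLDataRoot\Samples\Data and USQLDataRoot\output on your PC. The folder names here are placeholders; use the paths your scripts actually read from and write to.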

Running Data Lake Jobs Locally

If you look at the screenshot of Visual Studio with the Data Lake tools installed, you will notice a series of buttons at the top of the screen. The middle one is currently set to (Local). This drop-down box lets you run the job either against your Azure Data Lake Analytics instance or locally. Running locally incurs no charges on Azure. And because you are working in Visual Studio, you can of course save the U-SQL script as a named file.

[Screenshot: Visual Studio with the Data Lake tools installed]

When the context is switched from (Local) to your Azure Data Lake Analytics instance, you will probably want to look at the Submit button. It has only one option, Advanced. In the Advanced window, you can change the job name. By default it is set to the name of the script being run, but if you run the same script over and over, you may want to change it so that the different runs can be told apart. You can also set Parallelism to the value the job actually uses. To find that value, look at the Job View, the tab to the left of Script. This screen shows the processes in use while the job runs; set Parallelism accordingly. You are charged for the Parallelism value you set, not the amount actually used, so setting a lower value can decrease the cost of running a job.
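To see why the Parallelism setting matters, here is a rough cost sketch in Python. It assumes the charge is proportional to Parallelism (AUs) times running time, billed on the value you set. The rate of 2.0 per AU-hour is a hypothetical placeholder, not a quoted Azure price, and `estimate_job_cost` is an illustrative helper. The `runs` parameter shows how repeated debugging runs in Azure multiply the bill.

```python
def estimate_job_cost(parallelism, runtime_minutes, rate_per_au_hour, runs=1):
    """Rough estimate of what one or more runs of a job cost.

    Assumes the charge is parallelism (AUs) x hours x rate, billed on the
    Parallelism value that was set, not on the AUs the job actually used.
    rate_per_au_hour is a hypothetical placeholder; check current Azure pricing.
    """
    if parallelism < 1 or runs < 1:
        raise ValueError("parallelism and runs must be at least 1")
    hours = runtime_minutes / 60
    return round(runs * parallelism * hours * rate_per_au_hour, 2)


# Same 30-minute job at a hypothetical rate of 2.0 per AU-hour.
print(estimate_job_cost(20, 30, 2.0))          # 20.0 (Parallelism left at 20)
print(estimate_job_cost(5, 30, 2.0))           # 5.0 (Parallelism lowered to 5)
print(estimate_job_cost(5, 30, 2.0, runs=27))  # 135.0 (27 debugging runs in Azure)
```

Under this model, lowering Parallelism from 20 to 5 cuts the cost by a factor of four, but only if the job really uses about 5 AUs, so its running time does not change. If the job view shows the job could use more AUs, a lower setting may lengthen the run instead.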

The tab on the far left shows the job in the same view used for Data Lake Analytics jobs in the Azure Portal. That screen is shown below.

[Screenshot: the U-SQL job view in Visual Studio]

Working in Visual Studio also involves less switching between screens than the Azure Portal, which is another reason to develop here. Now that I have this environment set up, I plan to write all of my Data Lake Analytics jobs here, as I find this development environment works better for me. Let me know what you think of it by commenting below. If you are interested in learning more about running Data Lake Analytics jobs, especially if you are trying to parse JSON, please subscribe to my blog, as that topic will be covered in an upcoming post.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur