Monitoring Reports for R on SQL Server 2016

Whenever I talk about R, in the question and answer period I get a lot of questions from DBAs on how to monitor the impact of running R code on SQL Server. Fortunately Microsoft recently created a series of custom reports for SQL Server which provide a lot of useful information on how R is running on SQL Server. The custom reports are available on github here in a folder called SSMS-Custom-Reports. There are 8 different reports: R Services – Active Sessions, R Services – Configuration, R Services – Configure Instance, R Services – Execution Statistics, R Services – Extended Events, R Services – Packages, R Services – Resource Usage, R Services – Script. If it has been a while since you have added a custom report to SQL Server, this link should help you get the reports installed.

R Services Reports

The titles for the reports provide a good indication of what they do. Active Sessions will show how many users currently have their compute context set to run on the server. Configuration shows the current R Configuration, which is very useful information for checking on the R components on the server.


The report R Services – Configure Instance is for configuring R for the first time. If you have R setup already, chances are you won’t need this report, as it is really not a report but a script you need to run to configure R. R Services – Execution Statistics is a really neat way of monitoring the ongoing use of R. It keeps track of every time R is run on the server. If you run an R job then refresh the report, the execution count and time will go up.


R Services – Extended Events is also not a report but a list of all the extended events that are available for R Services. This is a handy bit of information, which can be a great reference tool for extended events monitoring. R Services – Packages lists the packages which are currently installed on SQL Server. When people write R, many lot of different packages are used within the script. Prior to running a package, check the information on this report to ensure the libraries used are installed on SQL Server. If the library is missing the code will not work. R Services – Resource Usage is a great way to see how R has been configured to run on the server. Notice I have created an External Pool for R. This is a configuration recommended by Microsoft to better monitor your R Services.


The last report R Services – TSQL Script is a bit of a mystery. The screen contains this line of text. This is the T-SQL script for generating the data in report: No Report . I don’t know what was the intent of the report. The in github provides this documentation about the reports.
SSMS Custom Reports

Custom reports for SQL Server R Services built using SQL Server Report Builder. The reports can be added to SQL Server Management Studio to view various configuration and runtime information about R scripts execution in SQL Server.

That doesn’t provide any clues to me as to what the intent of the report is, so you may want to skip installing it altogether. If anyone does figure out how this report can be used, please drop me a line as I am curious. I was really excited when I found these reports and I think that they provide a lot of valuable insight for monitoring R on SQL Server 2016.


Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Machine Learning with R

While there are a number of different Applications designed to implement Machine Learning, such as Azure Machine Learning, Matlab and Octave, a specific package to perform Machine Learning is not required.  The algorithms used to generate machine learning experiments, can be applied in other languages, such as R.

Machine Learning Algorithms

machinelearningLearning is often described as a method of applying rules to situations. “Don’t put your finger on the stove.  The stove is hot and will burn you”.  A child can extrapolate this to irons, fire and other hot things after being told about stoves.  Computers process learning a little differently, by applying rules or algorithms to data to determine a result.  A great example of this was the Kaggle competition to determine from looking at a picture, which picture was a cat, and which picture was a dog. The computer reviewed a number of different pictures where there was a label on the picture, indicating that it was a cat or a dog and applied those rules where the pictures were not labeled.  The winning algorithm was right 98.914% on identifying dogs and cats.  Sorting pictures into groups is a classification function, one of the common functions used in Machine Learning. Other popular functions include anomaly detection, regression and clustering.  Once experiments are created, there are a number of different methods used to determine their effectiveness, such as the Receiver Operating Characteristic [ROC] graphs or a Confusion Matrix.

Algorithm Determination

Often times determining which algorithm to use can take a while.  Here is a pretty good flowchart for determining which algorithm should be used given some examples of what the desired outcomes and data contain. The diagram lists the algorithms, which are implemented in Azure ML.  The same algorithms can be implemented in R.  In R there are libraries to help with nearly every task.  Here’s a list of libraries and their accompanying links which can be used in Machine Learning.  This list is no means comprehensive as there are libraries and functions other than the ones listed here, but if you are trying to write a Machine Learning Experiment in R, and are looking at the flowchart, these R functions and Libraries will provide the tools to do the types of Machine Learning Analysis listed.

Drawing ROC CurvesROCR

Anomaly Detection


There is a really good list of all of the R regression functions here


Binary Classification

Multi-Class Classification


Applied Machine Learning

Hopefully this list of R libraries will help you apply machine learning to data within R. To see how R can be used in Machine Learning, please join me on my upcoming webinar on Machine Learning with R and SQL Server 2016  where I will show how an R program can be created and applied to a production environment.


Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Memory Usage in SQL Server for R

While R is an open source language, there are a number of different versions of R and each handles memory a little differently. Knowing which version is being used is important, especially when the code is going to be migrated to a server. As part of a SQL Server implementation, there are three different versions of R which come into play. The first is standard open source R, commonly known as CRAN R. This is the standard open source version of R which runs code in memory and is single threaded. The next version which will be installed as part of a SQL Server Installation is Microsoft R Open. This version of R was written to take advantage of the Intel Math Kernel Libraries [MLK]. Using the libraries speeds up many statistical calculations which use matrix operations. It also adds multi-threading capability to R as the rewrite provides the ability to use all available cores and processors and process in parallel. More information on how it works and how much faster Microsoft R Open is compared to standard R is available here. To use Microsoft R Open, once it is installed, in Rstudio should automatically start using it. To check out what version of R that is in use, within R Studio, go to Tools->Global Options and look at the R version.


R studio here is using Microsoft R Open. To take a look at the version in Visual Studio which has R Tools installed, go to Rtools and Options. Microsoft R Open is open source and can be used by anyone to speed up the execution of their R code. The version of R which is implemented within SQL Server is not this version of R. The R server version is proprietary, and Microsoft is not giving it way for free, as to run it on a server you need to purchase either SQL Server 2016 or R Standalone. That version is R Server. There is full compatibility on all of the versions of R. Code written in either CRAN R or Microsoft R Open will work in R Server. To write code for R Server, the R Client needs to be installed. The Visual Studio Screen shows this version of R installed on a PC which also is running SQL Server 2016 Developer edition.


R and Memory Consumption

One of R’s strengths and weaknesses is the fact that R runs in memory. This is good thing because it means R is very fast. It is not such a good thing when you need more memory than your machine has. When reading through the product information surrounding Revolution Analytics’ version of R [Revo R] which Microsoft bought, one of the features is the ability of this version of R to be able to use not only memory but disk, by rxapplying chunking logic. The code is broken into smaller pieces, executed then put back together. Using Revo R, it’s possible to do this by using the specifically designed functions which all start with rx which implement this functionality, known as ScaleR. This does not mean that all R code running on SQL Server will be using these functions. Chances are most of it won’t, especially if the R code is being migrated from the existing data science team. Prior to being bought by Microsoft, Revolution Analytics charged money for their product. If your organization didn’t have a license, chances are no one was writing rx Functions. For people who don’t know anything about R, and have been asked to create packages to run R, there is an easy way to tell if the code has been written to use the chunking. If there are no words in the code that start with rx, like rxSummary, rxCube or rxHistogram to name a few, the code will not use chunking.


Yours Always

Ginger Grant

Data aficionado et SQL Raconteur



Power BI Auditing

Auditing is one of the recent features in Power BI Preview which can be a great tool for monitoring what is going on in Power BI. There were a number of people who asked for this feature on the Power BI Ideas site, which shows that Microsoft is responding to what is posted there. When I read about auditing, I thought that it was a Power BI feature. While that is true, the scope of auditing is much bigger than Power BI. Auditing is designed for Office 365. Yes, you can turn it on for Power BI, as well as for SharePoint and a number of other Office applications. Since auditing is really an Office 365 tool, one has to be an Office 365 admin to be able to be able to access the tool, as it is turned on from within the Office 365 tenant. To be able to audit anything, you need to wait 24 hours until after it has been turned on to receive any data. It doesn’t go back in time, which is unfortunate as usually you figure out that you want an audit after something happens, and then it is too late. Best to just turn it on now. If you have free users and Pro Users, you will notice that you only get auditing for the pro users.

Accessing the Power BI Audit Log

Clicking on the gear in the upper right corner of the Power BI Service will take you to the Admin portal. If you select audit logs, the screen has a button on it which opens Office 365 Admin Center. Clicking on the search box will show all of the applications which have audit logs. Scroll down to Power BI activities.


As you can see by looking at the available Power BI options, there are a number of options to choose from. If you select the top item PowerBI activities, then everything gets selected. After doing that click outside of the menu for the menu to go away. Select a date and time range of your choosing, select specific users if you wish, then click on the Search button. Depending on how big your date range is, this may take some time to load. Once you see the results, you have the ability to filter as well.


The screen starts responding as soon as you start typing. I got as far as Del, and the screen changed to show only the items Deleted.It’s a pretty neat feature and a simple way to monitor what is going on in Power BI.


I am going to be talking about more methods to administer Power BI at IT/Dev Connections. If you are attending, please come by and say hello. I would love the chance to meet more people who read my blog. For those of you who are not in Vegas this week, I will be posting more information on Power BI and R to provide supplemental material for my sessions. Check back later in the week or please subscribe to my blog for the highlights.


Yours Always

Ginger Grant

Data aficionado et SQL Raconteur


Analyzing JSON in USQL

In USQL there are built-in extractors for parsing text, comma delimited or tab delimined files. Once again, parsing JSON becomes problematic. There is a solution built into USQL, write some C# code to extend it or use someone else’s C# code to extend USQL. Since I wanted to parse JSON, fortunately there are libraries available on github containing the information required to do it. Download the github package and open up the Microsoft.Analytics.Samples project in Visual Studio. When I did this the first time, there was a problem loading the Newtonsoft.Json reference, so I right clicked on the references and downloaded the missing parts again. Build the solution and check out the code in the directory …Examples\DataFormats\Microsoft.Analytics.Samples.Formats\bin\Debug\ . There will be two DLLs, Microsoft.Analytics.Samples.Formats.dll and Newtonsoft.Json.dll. These dlls then need to be registered in Data Lake Analytics and locally if you chose to run your USQL locally. As at some point the goal is to run from within Data Lake analytics, you will need to copy both of these dlls to the data lake. I created a folder for the dlls called Assemblies, and ran this command

USE DATABASE [master];
CREATE ASSEMBLY [Newtonsoft.Json] FROM @"/Assemblies/Newtonsoft.Json.dll";
CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM @"Assemblies/Microsoft.Analytics.Samples.Formats.dll";

Notice I told the USQL where to find the dlls, in the Assemblies folder. This step only needs to be completed once per data lake. After this job successfully runs, then the dlls which allow the JSON to be parsed, can be referenced.

Here is my sample JSON, which I have copied to the folder Samples/Data/TestNew.Json, in the Data Lake

"appInstanceId": "357ced1e-cf05-459c-9317-794bq24f61c2",
"firmwareVersion": "",
"serialNumber": "254542-694967",
"Side": "0",
"Latitude": "33.8848744",
"Longitude": "-128.403276",
"GeneratedDate": "2016-10-04T21:18:19Z"

Now that I have added the JSON to the Data Lake and the assemblies have been added, I can write some USQL to Parse the JSON. First I will need to reference the libraries, then create a schema, as there is no schema for a Data Lake. After those steps are completed, it’s possible to write SQL to query a JSON file. There is no UI to look at the results, so the results will be writing to a file. I am going to output the data to a csv file called JSONOutput.csv. Here’s the code to do that.

REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @infile string="/Samples/Data/TestNew.json";

@logSchema =
EXTRACT name string
, appInstanceId string
, firmwareVersion string
, serialNumber string
, Side string
, Latitude float
, Longitude float
FROM @infile
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

@testthis = SELECT appInstanceId
, COUNT(*) AS LocationCount
FROM @logSchema
GROUP BY appInstanceId;

OUTPUT @testthis
TO "/Samples/Data/JSONoutput.csv"
USING Outputters.Csv();


Using Visual Studio, I am running the USQL Job. There isn’t much data to parse, and you can see in the summary widows that it took 21 seconds to prepare, and 33 seconds to run.

When go to the web and look at the Data Lake Analytics page, I can also see that the job completed. I have noticed that this appears pretty close to the same time on the web and on visual studio.


Clicking on the bar graph represented by today will allow me to select the job which ran, showing the same screen as appears in Visual Studio.

Thanks to Erik Zwiefel and Mark Vaillancourt b | t both of Microsoft for helping me figure out the process to use JSON in Data Lake Analytics, as I didn’t understand the steps which are required to parse JSON. I hope this blog makes it possible for you to figure out how to make it work.



Yours Always

Ginger Grant

Data aficionado et SQL Raconteur



Using Visual Studio for USQL Data Analytics Jobs

The pricing for USQL is based upon how many Analytic Units and Completed Jobs. To decrease the amount of money being spent, it would be most efficient if only completed jobs ran on SQL, not the 27 times the job was run to debug it. Fortunately, all of the debugging can be performed locally and only working jobs need to be run. Another thing that you may notice if you are exclusively using the Azure Portal for running Data Lake Analytic jobs is there is no way to actually save a job. Once the job is completed, you can review the job then click on the View Script button. Don’t rely on the button though, because for reasons unknown, sometimes the View Script Button is not enabled, meaning that it is not possible to see what ran.

Data Lake Analytics Setup for Visual Studio

There are a few steps required before any code is run. If the Data Lake Analytics Tools are not installed within Visual Studio, download them here and install them. When the tools are installed, the menu item Data Lake appears in Visual studio. The second step is to model your PC with the same file structure as your data lake. The default location which the Data Lake tools will look for your data structure is C:\Users\<<insertyourname>>\AppData\Local\USQLDataRoot . What this means is if you have folders and subfolders created in your data lake, your PC needs to have the same structure, including the data.

Running Data Lake Jobs Locally

If you take a look at the screen picture of Visual Studio with the data lake installed, you will notice a series of buttons at the top of the screen. The middle button currently is set to (Local). The drop down box at the top of the top of the screen will allow you to set the job to either your Azure Data Lake Analytics instance or locally. If it is running locally there will be no charges incurred on Azure. Also in Visual Studio, of course you can save the name of the USQL file.



When the context is switched to the Data Lake Analytics instance from Azure, you will probably want to check out the Summit button. There is only one option, Advanced. In this window, you can change the Job name. It is default set to the name of the script being run, but if you are running the same script over and over, you may wish to change this name so that the different instances can be identified. Parallelism can also be set to the value that is actually being used in the job. Take a look at the job view, which is the tab to the left of Script. This screen shows the processes in use when the job is run and set the value accordingly. You will be charged for the Parallelism value that is set, not the amount actually used. Setting to a lower value can decrease the cost of running a job.

The tab on the far left, shows the job with the same view shown in the Azure Portal Stream Analytics job. That screen is shown below.

usqlscreenRunning on Visual Studio also has the benefit of less changing between screens than the Azure Portal, which is another reason to develop here. Now that I have this environment set up, I plan on writing all of my Data Lake Analytics jobs here, as I find the development environment works better for me. Let me know what you think of it by commenting below. If you are interested in finding out more about running Data Lake Analytics Jobs, especially if you are trying to parse JSON, please subscribe to my blog as that topic will be in an upcoming post.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Event Hub Troubleshooting

When creating an Azure Event Hub, chances are there will be no errors. This is not always a good thing, as it may mean that the errors exist but do not appear. Maybe the event hub is sending data, but the data cannot be read by a stream analytics job. Maybe the event hub really is working, but nothing appears in the dashboard. If any of these problems sound familiar, this post should help.

Testing the Event Hub

If you don’t have a source of data, like a raspberry pi or a sensor sending data, you can use this guide to create a C# program to send data to your event hub. Chances are though, this code is going to have to be modified even more than the instructions indicate, because the data sent is not in JSON. While it is not a requirement that data sent to the event hub be in JSON, if you want to read it with stream analytics that is one of the acceptable formats needed. If you are using the code provided and you want to insert a record into a database field input01, the message needs to be changed to the following to add the double-quotes and brackets required by JSON.
var message = "{'input01':\"" + Guid.NewGuid().ToString()   + "\"}";

Validating the Event Hubs Receive Messages

To ensure that the event hub is actually receiving data, validation can only be done in the old Azure portal. The service bus icon is two down from the HDInsight elephant. Double-clicking on the service bus namespace will bring up the a list of event hubs. Double-clicking on that will show this screen. This screen picture was taken roughly at 7:10. How many messages are there at 7:00? None.


This screen print was taken at 7:17. Notice anything different about the message count at 7:00?


Oh look, there are 144 Messages which came in between 7:00 and 7:05. This means that everything really was working, I just needed to wait to see them appear. The wait time tends to vary from 10-20 minutes. Perhaps nothing is wrong. Lucky if this is you as you can stop reading

Stream Analytics with Event Hubs

If you are using an event hub to pass data to a stream analytics job, step one, make sure the stream analytics job is started. Created does not mean started as it should say Running as shown in the clip below.

The input for this stream is set to an event hub which has a standard subscription. The basic subscription, which is of course cheaper, has one default consumer group. With a standard subscription multiple consumer groups can be created and more importantly named. When setting up the inputs there is a blank for the name of the consumer group. If you have a basic subscription this will be empty. If it is empty, then the event hub won’t pass data to the stream analytics job. Perhaps there is a way to get a basic event hub to work with a stream analytics job, but I couldn’t make it happen. When I created an event hub with a standard subscription and created a consumer group and added that name to the input of a streaming analytics job, it worked.

If you have found these troubleshooting tips helpful, please subscribe to my blog, as I will be passing along more tips in my next post which will detail the steps of how to get data from the event hub to a Azure Database.


Yours Always

Ginger Grant

Data aficionado et SQL Raconteur



Group Workspaces and File Sharing in Power BI

When working on Power BI Reports, much of the time the task is spread among multiple people. If this is the case, you probably will also want to provide limited access to the PBIX files used in your Power BI reports. It would also be nice to have the ability to check out a file to insure that one person doesn’t over-write the changes of someone else. It would be nice to track changes that have been made to the files too. Based on this description, it may sound like the next topic will be implementing source control software, like TFS or Subversion. Instead, all you need to do is use the Group Workspaces functionality with Power BI.

Report Development Teams needs Power BI Group Workspaces

Power BI is licensed by the individual. This means Power BI reports created within a My Workspace belong to that person. This can be a problem if say that person goes on vacation or quits. If the reports are all contained within one person’s account and that individual’s account is deleted when they separate from the company, what do you think happen to all of the reports? Answer they are gone too. Maybe a call to Microsoft Power BI Support will be able to help if that occurs, or maybe not. It’s would be better to create the ability within the Power BI environment where the company owns the company reports. Even if the Power BI reports are all created by one person, I would create a workgroup. After all that one person might want to go on vacation sometime, and for disaster recovery reasons someone else needs to be able to access the reports Of course by that I mean, create a Workspace within Power BI, and add other people to it. When you create a workspace within Power BI, not only do you get a location where multiple people can visualize the same reports, a Group One Drive is also created where PBIX files can be shared within the group.

Source Control with Power BI Reports

PowerBIGroupWorkSpaceMany of the functionality people associate with source control programs live inside the group one drive which is created for Power BI. Looking at the picture of the group screen, which was created when a Power BI Workspace was created, you will see that this group contains 7 members and four files. The members of this groups are the only ones who have access to the files. The file AcmeThree.pbix is selected, and clicking on the ellipse(…) brings up a menu for the file. Notice one of them is Check Out. If I check out a file, the icon next to the name changes, providing a visual queue to all who wish to edit the file that it is being working on. The menu option for me would change to Check In, providing the ability to check the file in to the directory, allowing others to check out the file and work on it. Notice Version History also exists. This feature allows previous versions of the file to be loaded, which means that changes made to a file can be rolled back.

Using One Drive with Workgroups provides a convenient way for groups of users to work on reports, ensuring that the reports are stored in a secure location and can be access by multiple people within the organization. If you are looking for way for a team of people to collaboratively work on reports, create a Group Workspace in Power BI then use the One Group functionality to provide a secure method for report development.


Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

U-SQL and Azure Data Lake Analytics


There are a number of different SQL Flavors–HQL, PL/SQL, MySQL, U-SQL, T-SQL — all of which are a derivative of Ansi-SQL, which is I suppose in today’s parlance, A-SQL. Many people have not heard of U-SQL, which Microsoft introduced on September 28, 2015. Since the announcement was in the Visual Studio Blog, a number of data people may have missed it. U-SQL is meant to combine the ease SQL with the functionality of C# to create a language which can process any kind of data, like videos or text, by creating the ability to customize the code and infinitely scale. This is very useful if for example all of the data is stored in an Azure Data Lake.

Using U-SQL in Azure Data Lake Analytics

In my previous series on Stream Analytics, I wrote some U-SQL. That U-SQL didn’t look much different than Ansi-SQL, which is sort of the point of porting the functionality to a different yet familiar language. Another application which heavily uses U-SQL is Azure Data Lake. Data Lake stores its data in HDInsight, but you don’t need to write hive to query the data, as U-SQL will do it. Like Hive, U-SQL can be used to create a schema on top of some data, and then query it.

For example, to write a query on this csv file stored in a Data Lake, I would need to create the data definition for the data, then I could easily write a statement to query it.


@searchlog =
EXTRACT SaleDate string,
SaleLocation string,
Lemon   int,
Orange     int,
Temperature     int,
Leaflets int,
Price                      string
FROM "Samples/Data/Popsicle.tsv"
USING Extractors.Tsv();

@testthis = SELECT SaleLocation
, COUNT(*) AS LocationCount
FROM @searchlog
GROUP BY SaleLocation;

OUTPUT @testthis
TO "Samples/Data/Output/SaleLocCount.csv"
USING Outputters.Csv();

In this U-SQL code, I am creating a structure for the data, querying some fields, and writing the output to another file. Make sure that you don’t forget the semi-colons as that will cause errors. Also if any of your fields are blank you will have to code for that as well. From with Data Lake Analytics, the U-SQL is run as a job, creating a new file. Note the time that it took to finish the job.



The reason data is stored in a Data Lake is to provide a single storage location for the data, which will be used in analytics. U-SQL provides a powerful tool for getting the data out.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur


Data Factory – Troubleshooting Errors

IBM PunchCard computer reader

Punch card and punch card reading machine

If you have been reading my Blog Series on Data Factory, you will notice that I didn’t talk about what to do when you have errors, until now. Data Factory is different from most other programming languages which you may be familiar with, such as C#, Java, R, SSIS or VB. In those languages one can step through the code and look at the variables or values while the code is running. perform any of the standard set of troubleshooting techniques which are used in other languages to help resolve errors. Data Factory doesn’t provide a way to really determine what is going on internally when the code is running. Debugging Data Factory harkens back to day when people lined up punch cards and waited to get the output at the other side.

Data Factory Error Codes

Unfortunately, while developing Data Factory I became very familiar with errors. All of the errors show up at the end and provide very little insight as to what in the process failed. Here’s an example.

Database operation failed on server ‘’ with SQL Error Number ‘40197’. Error message from database execution : The service has encountered an error processing your request. Please try again. Error code 4815. A severe error occurred on the current command. The results, if any, should be discarded.

I didn’t get any results to discard, so that helpful hint was not applicable, ever. I was able to resolve this error a few times by resolving different reasons why the error might have occurred by looking through the code and guessing at what line could possibly have caused the error. Generally speaking these errors were found after the first few things I tried did not work. Metaphorically, I had to wait for my punch cards to be read through the machine to see if I had correctly guessed what might be wrong. I have heard this process described as “black box”, but I think a more accurate description would be a punch card computer, as black box is too cool of a name for a process this heinous. In one instance, this error was received when the data in the field was longer than the field definition. That took a while to find since I had over 25 fields to review. Another time I got this same error when I had a typo in a field name, making the message appear completely arbitrary.

Data Factory Error 1000

Batch Execution failed. The response from the Machine Learning service at endpoint…(excluded specific job reference codes here)… {“Exception”:{“ErrorId”:”LibraryException”,”ErrorCode”:”1000″,”ExceptionType”:”ModuleException”,”Message”:”Error 1000: TLC library exception: Exception of type ‘Microsoft.Numerics.AFxLibraryException’ was thrown

If you search on the internets, you will notice that Error 1000 comes up a lot. The reason for that is it is a catch-all error number. There are 999 actual error messages coded and anything that isn’t one of those errors is error 1000. This problem could be anything. In my case it was because my web service wanted me to strongly type the data coming in, instead of having the data default to text, which worked outside of the web service. I did not resolve my problem after reading the error message and coming up with a logical guess based on my code. I had to create an Official support ticket to answer this question as the answer eluded me.

Data Factory Troubleshooting Error Strategies

Regrettably, there is no really good way of resolving errors. It’s not possible to look at anything in process and see how the data is being processed. Instead errors pop up when the execution fails. There are a few things you can do though. If you are calling an Azure Machine Learning Web service, I recommend modifying your input data so that if it looks like a number but it is read in as text, to use the meta data editor to type the data. Make sure you test the batch execution prior to loading it in Data Factory. For other pipelines, I employed a binary sort to determine what field might have the error. I commented out half the fields, and half the fields until I could determine what field had an error. Eventually I figured out the field which must be short. I also decreased the size of data using the code to 3 rows so I wouldn’t have to wait so long for it to fail.

Data Factory Series

This post is the end of my five part series on Azure Data Factory. I hope you have found it both interesting and useful when trying to learn Data Factory. If you have found this series interesting, please subscribe to my blog to be notified of the latest topics. Given that I plan on doing a lot of speaking in the very near future on topics such as R, SQL Server 2016 and Power BI, those topics are going to be on my blog in the very near future.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur