If you have been reading my Blog Series on Data Factory, you will notice that I didn’t talk about what to do when you have errors, until now. Data Factory is different from most other programming environments you may be familiar with, such as C#, Java, R, SSIS, or VB. In those environments one can step through the code, look at variables or values while the code is running, and perform any of the standard troubleshooting techniques used to help resolve errors. Data Factory doesn’t provide a way to really determine what is going on internally while the code is running. Debugging Data Factory harkens back to the days when people lined up punch cards and waited to get the output on the other side.
Data Factory Error Codes
Unfortunately, while developing in Data Factory I became very familiar with errors. All of the errors show up at the end of a run and provide very little insight into what part of the process failed. Here’s an example.
Database operation failed on server ‘Sink:DBName01.database.windows.net’ with SQL Error Number ‘40197’. Error message from database execution : The service has encountered an error processing your request. Please try again. Error code 4815. A severe error occurred on the current command. The results, if any, should be discarded.
I didn’t get any results to discard, so that helpful hint was never applicable. I was able to resolve this error a few times by looking through the code, guessing which line might have caused it, and testing different possible causes. Generally speaking, the real cause was found only after the first few things I tried did not work. Metaphorically, I had to wait for my punch cards to be read through the machine to see if I had correctly guessed what might be wrong. I have heard this process described as a “black box,” but I think a more accurate description would be a punch card computer, as black box is too cool a name for a process this heinous. In one instance, this error occurred because the data in a field was longer than the field definition. That took a while to find, since I had over 25 fields to review. Another time I got this same error when I had a typo in a field name, which made the message appear completely arbitrary.
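In hindsight, a quicker way to hunt down the too-long field would have been to compare the sink’s column definitions against the longest values actually present in the source. Here is a minimal T-SQL sketch of that check; the table and column names (dbo.SinkTable, dbo.SourceTable, CustomerName, AddressLine1) are hypothetical stand-ins, not my actual schema.

```sql
-- List the defined lengths of the sink's columns (hypothetical table name).
-- Note: max_length is in bytes; divide by 2 for nvarchar columns.
SELECT c.name       AS ColumnName,
       c.max_length AS DefinedLength
FROM sys.columns c
WHERE c.object_id = OBJECT_ID('dbo.SinkTable');

-- Compare against the longest values actually in the source
-- (hypothetical column names). Any source maximum that exceeds
-- the sink's defined length is a truncation candidate.
SELECT MAX(LEN(CustomerName)) AS MaxCustomerName,
       MAX(LEN(AddressLine1)) AS MaxAddressLine1
FROM dbo.SourceTable;
```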
Data Factory Error 1000
Batch Execution failed. The response from the Machine Learning service at endpoint…(excluded specific job reference codes here)… {“Exception”:{“ErrorId”:”LibraryException”,”ErrorCode”:”1000″,”ExceptionType”:”ModuleException”,”Message”:”Error 1000: TLC library exception: Exception of type ‘Microsoft.Numerics.AFxLibraryException’ was thrown
If you search on the internets, you will notice that Error 1000 comes up a lot. The reason is that it is a catch-all error number: there are 999 actual error messages coded, and anything that isn’t one of those errors is error 1000. The problem could be anything. In my case it was because my web service wanted me to strongly type the data coming in, instead of having the data default to text, which had worked outside of the web service. Reading the error message and coming up with a logical guess based on my code did not resolve the problem; the answer eluded me until I created an official support ticket.
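For what it’s worth, when the web service input comes from a SQL query, one way to avoid the text-default problem is to cast each column explicitly rather than letting it arrive as text. This is only a hypothetical sketch with invented column names and types; in my case the actual fix was typing the data inside the experiment itself.

```sql
-- Hypothetical sketch: cast each input column explicitly so the
-- web service receives strongly typed data instead of text defaults.
SELECT CAST(CustomerID  AS INT)            AS CustomerID,
       CAST(OrderAmount AS DECIMAL(10, 2)) AS OrderAmount,
       CAST(OrderDate   AS DATE)           AS OrderDate
FROM dbo.SourceTable;
```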
Data Factory Troubleshooting Error Strategies
Regrettably, there is no really good way of resolving errors. It’s not possible to look at anything in process and see how the data is being handled; instead, errors pop up when the execution fails. There are a few things you can do, though. If you are calling an Azure Machine Learning web service and your input looks like a number but is read in as text, I recommend using the Metadata Editor to strongly type the data. Make sure you test the batch execution prior to calling it from Data Factory. For other pipelines, I employed a binary search to determine which field might have the error: I commented out half the fields, then half of the remaining fields, and so on until I could determine which field had the error. Eventually I figured out which field was too short, as the sketch below shows. I also decreased the input data to 3 rows so I wouldn’t have to wait so long for each run to fail.
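To make the binary search concrete, here is roughly what each test query looked like; again, the table and column names are hypothetical. Commenting out half the columns per run, combined with TOP 3, makes each failed attempt cheap.

```sql
-- One iteration of the binary search: run with half the columns
-- commented out. If the run fails, the bad field is in the half
-- still selected; otherwise it is in the commented half.
-- TOP 3 limits the run to 3 rows so failures come back quickly.
SELECT TOP 3
       CustomerID,
       CustomerName
    -- , AddressLine1   -- second half commented out for this run
    -- , City
    -- , PostalCode
FROM dbo.SourceTable;
```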
Data Factory Series
This post is the end of my five-part series on Azure Data Factory. I hope you have found it both interesting and useful when trying to learn Data Factory. If you have found this series interesting, please subscribe to my blog to be notified of the latest topics. Given that I plan on doing a lot of speaking in the near future on topics such as R, SQL Server 2016, and Power BI, those topics will be appearing on my blog soon.
Yours Always
Ginger Grant
Data aficionado et SQL Raconteur