For those unfamiliar with Azure Data Factory, it is the service in Azure which moves data and calls processes to use the data. In a lot of ways this makes Data Factory a lot like SSIS. In addition to moving data around, Data Factory also contains the kind of monitoring tools you would find for a SQL Server Agent job in SSMS. If you add those things together you get a pretty good picture of what Data Factory does.
Setting up Data Factory in Azure
Data Factory is a resource which lives in a Resource Group in Azure. To add it to the list of available resources, click on the plus button in the Azure portal and search for Data Factory. Select the icon and this window will appear. Click on the Create button at the bottom of the screen to enter the setup information.
First pick a name for the service, then select the appropriate subscription from the dropdown box. Like everything else in Azure, the service exists within a resource group, which acts as a container. Either pick an existing resource group to use, or create a new one. Generally speaking I will create a new one, so that if for any reason later on I want to destroy what I have done, I can just delete the resource group.
Pick the location based on two factors. First, Azure Data Factory is not available everywhere, so you are limited to the locations where it is available. If you pick one where it isn’t available, you will get an error message letting you know why you cannot create the resource. Second, whenever possible within Azure, pick the same location where your data lives. There are charges within Azure if you move data across locations and no charge if you stay in the same location. You may want to go look at where the data which will be used in Data Factory lives before deciding where to put it.
I always check the Pin to Dashboard option so that I can find the resource later, but it is not required and can be done later. Click on the Create button to create a Data Factory resource. If you have selected Pin to Dashboard you will see a little window which says Deploying Data Factory. This little window goes away once the deployment has completed, and you will have an entry in the list of resources for Data Factory.
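The location decision can be written down as a small rule. The sketch below is only an illustration of that reasoning, not anything the Azure portal runs: the function name `choose_location` and the region names are made up for the example, and the list of locations where Data Factory is offered is a placeholder you would need to check against the portal.

```python
def choose_location(data_location, factory_locations):
    """Pick a location for Data Factory.

    Prefer the location where the data already lives, because moving
    data across locations can be charged. Fall back to the first
    location on the list only when the data's location is not offered.
    Returns (location, same_as_data).
    """
    if not factory_locations:
        raise ValueError("Data Factory is not available in any listed location")
    if data_location in factory_locations:
        return data_location, True
    # Arbitrary fallback for the sketch: take the first offered location.
    return factory_locations[0], False


# Placeholder values, assumed for illustration only.
offered = ["westus", "eastus", "northeurope"]

print(choose_location("eastus", offered))
print(choose_location("southeastasia", offered))
```

The second call shows the case to watch for: the data lives somewhere Data Factory is not offered, so the data will cross locations no matter which one you pick.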
Data Factory Tiles
Clicking on the Data Factory resource icon will bring up the following window. Initially the only tile you will care about is the one on the far left, Author and Deploy. This is where you will create the Data Factory components. The authoring is all done in JSON. It is not the most intuitive language in the world and is very picky about brackets. There is some IntelliSense, and when you click on a bracket, it will highlight the bracket which it lines up with, which is somewhat helpful. In my experience, JSON can still be frustrating. I don’t think my experience is unique, because Microsoft has created two tiles which write the JSON components for you, Copy Data and Sample Pipelines.
The Copy Data function has a wizard which walks through gathering the data required to write the JSON. I have had mixed success in getting it to work. Sometimes it did, and sometimes I got an error. It is in preview, so I imagine the performance will improve over time. Sample Pipelines has a few samples of tasks you might want to complete, which can be deployed in your environment.
Diagram view is very important, as it opens a window on the right hand side of the screen showing you a picture of what has been created in JSON. Monitor and Manage brings up an entirely new tab, which is welcomed by those of us who are tired of sliding to the right to see new windows. This screen shows a diagram and what has run. Metrics and Operations opens a window showing what has run.
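Because the brackets are where I tend to get stuck, it can help to check a JSON snippet outside the portal before pasting it in. The sketch below uses Python's standard `json` module to report where parsing fails. The helper name `check_json` and the two sample strings are made up for the example and are not real Data Factory definitions. It only checks that the text is valid JSON, not that Data Factory will accept it.

```python
import json


def check_json(text):
    """Return None if text parses as JSON, else where the first error is."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno, colno and msg are attributes of JSONDecodeError.
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Hypothetical snippets: the second is missing its final closing bracket.
good = '{"name": "ExamplePipeline", "properties": {"activities": []}}'
bad = '{"name": "ExamplePipeline", "properties": {"activities": []}'

print(check_json(good))
print(check_json(bad))
```

The first call prints `None`, and the second prints the line and column where the parser gave up, which is usually close to the bracket that is missing.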
I hope you found this post handy for creating an Azure Data Factory resource. In a later post I will discuss what you can do with Azure Data Factory and provide more detail around the Contents section of this screen and discuss Datasets, Pipelines and Linked Services. Please subscribe to my blog to be notified of my latest posts.
Yours Always
Ginger Grant
Data aficionado et SQL Raconteur