Sometimes the basics aren’t enough. Car manufactures know this, that’s why they have a base model and one with all the options. Let’s face it, most of the time you want all the options. Well, with SQL Server you can get the options and you don’t have to spend hours debating the cost of clear coat. But I digress. If you want all the options, you can have them, right after you install SQL Server, and I am not talking about the cool codeplex stuff, which I will have to bring up another time. The genuine Microsoft list of add-ons are available for you at no additional cost. When Microsoft releases SQL Server, just because you installed it, doesn’t mean you have everything. For versions 2008R2, 2012, 2012 SP1 as well as 2014 all have Feature Packs. Looking at the SQL Server 2014 Feature Pack List , there are a number of things which are about as useful as heated seats in the desert, like the DB2 or SAP drivers, which is probably why they are not automatically included as part of the release. When looking at the list, there is a very good chance many SSIS developers may be interested in some of them. For example, wouldn’t you want the option for the more powerful and fuel efficient motor? I know I would. There is an item in the feature pack which provides that feature, the Balanced Data Distributor.
Threading the Memory since 2011
Balanced Data Distributor (or BDD for short) was first released as a new SSIS transform for 2008 and 2008 R2. It was designed to take advanced of multi-threading hardware capabilities by spreading the data load so that the data can be broken into chunks and processed at the same time, rather than serially starting at the beginning and loading a stream until it ends. This means you can go much faster on a single tank of gas, which is awesome. As you might imagine, processing more data at one time, decreases the time needed to process it. This is really usefully if you are using blocking transforms like a row-by-row script component or a Sort or Aggregate transform, which require SSIS to look at every single row before processing. Another situation where you might find it useful is if you have to write to a really slow output.
DIY of BDD – Divide and Conquer
What the BDD does is divide the copies into multiple threads. How many is determined by the developer because while you will not need to configure it, you will need to add code to tell it how many times to divide the task. Let me show you what I mean through a series of SSDT screen shots.
This screen shows that I have a Data Flow task where I am reading in 121,317 rows then doing an Aggregate and a Sort, both of which are blocking transforms. This is just meant to be an example as with 121,317 records, you probably won’t see that much of a performance improvement, but you get the idea.
In this screen shot you can see that I have added the BDD task, which I’ve highlighted in the SSIS toolbox so you will see where it shows up when the component is installed. You will also see that I copied the code so that the same tasks from the aggregate on appear twice. What happens when this version is run?
Check out the outputs underneath the BDD component. The number of records was split, but it isn’t an even split. There are 62,337 records on one side and 58,980 records on the other side. The record counts in the output is determined by the component as the optimal number based on the available threads. You configure nothing, just drag it onto the screen. Pretty cool, isn’t it?
This example shows how easy it is to speed up SSIS processing without a huge amount of effort, allowing you to drive laps of code with speeds you may not of thought possible with the help of a free download. Have fun and let me know what kind of performance gains you see.
Yours Always
Ginger Grant
Data aficionado et SQL Raconteur