Articles for the Month of November 2014

DQS – DIY Guide to Getting Started with Data Quality Services

SQLServerDBListPower BI and SQL Server 2012 (and beyond) both have components Microsoft developed to shift the focus of Business Intelligence onto business users. Since there are more people who use Excel than have eaten at McDonald’s, you may know about Power BI and the data features in Excel, but have you heard of Data Quality Services [DQS]? If not, well now you have. To get started, if you don’t have Microsoft’s developer version of SQL Server 2012 or later, or access to an enterprise or BI server somewhere, you will need $59.95 to get a copy of the Developer Edition SQL Server 2014. Interestingly Microsoft won’t let you buy the Developer Edition SQL Server 2012, which I found out when tried earlier in the year. Unfortunately, although Visual Studio is now free, you still have to pay for the Developer Edition of SQL Server.

After you have installed SQL Server and selected that you wanted Data Quality Services, one would assume you had installed it. While that sounds like sterling logic, it is not correct. Here’s how to tell if you have DQS. Open up SSMS on your computer and look at the list of databases like I did here. Do you see any databases here which start with DQS? No. That is because it hasn’t been installed yet. It sure looks like it is if you look at the sql installer, which I have included below. I added the red boxes to highlight the fact that I really did select Data Quality Services when I installed.

If you don’t have the Data Quality Services and Data Quality Client installed in SQL Server like they are here, you will need to do that first, but this is only the first step. Once the install screen looks like the one pictured above, you need to go to the Data Quality Services folder in SQL Server and select the SQL Server Data Quality Server Installer. After this package is run, which takes a while, you will finally get a screen that lets you know the installation is finally completed.

DQSInstallSuccessfulScreenAfter DQS Server has installed, you will see that 3 databases have been added: DQS_Main, DQS_Projects and DQS_Staging_Data. Once these three databases are installed, you can then start using the DQS Client.

DQSDBList

The DQS client does not need to be installed on a server. Since I highly doubt most places will want their business users to be directly accessing their Server, most of the time it will not be installed on the server.

Once your environment is set up, it’s time to start using it. For more information on how to use DQS, please listen to my presentation on the PASS BI virtual chapter on November26. If you can’t make it, generally speaking it will be available on PASS BI’s You Tube Channel after about a week. I sincerely hope you can make it. Let me know what you think of my presentation by posting feeback to my blog.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

 

Balanced Data Distributor and Other Features in the Pack

BBDRunFullSometimes the basics aren’t enough.  Car manufactures know this, that’s why they have a base model and one with all the options.  Let’s face it, most of the time you want all the options.  Well, with SQL Server you can get the options and you don’t have to spend hours debating the cost of clear coat. But I digress. If you want all the options, you can have them, right after you install SQL Server, and I am not talking about the cool codeplex stuff, which I will have to bring up another time. The genuine Microsoft list of add-ons are available for you at no additional cost. When Microsoft releases SQL Server, just because you installed it, doesn’t mean you have everything.  For versions 2008R2, 2012, 2012 SP1 as well as 2014 all have Feature Packs. Looking at the SQL Server 2014 Feature Pack List , there are a number of things which are about as useful as heated seats in the desert, like the DB2 or SAP drivers, which is probably why they are not automatically included as part of the release. When looking at the list, there is a very good chance many SSIS developers may be interested in some of them.  For example, wouldn’t you want the option for the more powerful and fuel efficient motor?  I know I would. There is an item in the feature pack which provides that feature, the Balanced Data Distributor.

Threading the Memory since 2011

Balanced Data Distributor (or BDD for short) was first released as a new SSIS transform for 2008 and 2008 R2. It was designed to take advanced of multi-threading hardware capabilities by spreading the data load so that the data can be broken into chunks and processed at the same time, rather than serially starting at the beginning and loading a stream until it ends.  This means you can go much faster on a single tank of gas, which is awesome. As you might imagine, processing more data at one time, decreases the time needed to process it.  This is really usefully if you are using blocking transforms like a row-by-row script component or a Sort or Aggregate transform, which require SSIS to look at every single row before processing.  Another situation where you might find it useful is if you have to write to a really slow output.

DIY of BDD –  Divide and Conquer

What the BDD does is divide the copies into multiple threads. How many is determined by the developer because while you will not need to configure it, you will need to add code to tell it how many times to divide the task. Let me show you what I mean through a series of SSDT screen shots.

BBDfull

This screen shows that I have a Data Flow task where I am reading in 121,317 rows then doing an Aggregate and a Sort, both of which are blocking transforms. This is just meant to be an example as with 121,317 records, you probably won’t see that much of a performance improvement, but you get the idea.

In this screen shot you can see that I have added the BDD task, which I’ve highlighted in the SSIS toolbox so you will see where it shows up when the component is installed. You will also see that I copied the code so that the same tasks from the aggregate on appear twice. What happens when this version is run?

BBDRun2

Check out the outputs underneath the BDD component. The number of records was split, but it isn’t an even split.  There are 62,337 records on one side and 58,980 records on the other side.  The record counts in the output is determined by the component as the optimal number based on the available threads.  You configure nothing, just drag it onto the screen. Pretty cool, isn’t it?

This example shows how easy it is to speed up SSIS processing without a huge amount of effort, allowing you to drive laps of code with speeds you may not of thought possible with the help of a free download. Have fun and let me know what kind of performance gains you see.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur