Articles

Presentation Follow up to Data Quality Issues? There is a Service for that

PASSBIThank you to all of you who were able to attend my latest presentation. I received a number of great questions during the presentation and wanted to make sure that I was able to provide answers. I also wanted to say thank you for the people who provided feedback in the question and answer section as that was very thoughtful and most appreciated.

Questions and Answers

I assume DQS is included with the existing SQL Server license? Is DQS included with enterprise edition? Can you provide some idea of what the licensing costs are? What are the licensing cost for Master Data Services?

Because I don’t work for Microsoft, I hesitate to answer any but the most basic questions regarding licensing, but I can say DQS and MDS are included in the licensing costs for SQL Server. Neither are available in the standard edition either. For more information about what is included, see here http://technet.microsoft.com/en-us/library/cc645993#Other_Components or contact Microsoft directly.

Do you know of any testing of scalability that would provide some insight into the scalability of DQS?

The best place I can point you to for more information on DQS scalability is here http://www.microsoft.com/en-us/download/details.aspx?id=29075

It seems that DQS appears similar to Fuzzy Lookups and Fuzzy Grouping. Is that pretty much what DQS is using?

While the logic employed in the SSIS components Fuzzy Lookup and Fuzzy Grouping are similar, but not functionally equivalent. DQS uses logic much closer to that of Fuzzy Grouping, as both apply a rules to a set of data and set similarity thresholds to determine success. Where they differ is in DQS’ use and development of the knowledge base to continually improve the accuracy.

How do you see DQS and MDS working together? What gets done where? How does DQS integrate with MDS? Do you use MDS to fix the data?

One of Microsoft’s definition of Master Data Management is “we define Master Data Management (MDM) as the technology, tools, and processes required to create and maintain consistent and accurate lists of master data.” MDS contains a framework for identifying and implementing workflows, security, business rules and auditing to the business data. DQS is part of that process that can be used by the MDS product to accomplish these tasks. To best illustrate how they work together, I’d recommend downloading their DQS and MDS tutorial. At a high level MDS enforces rules for cleaning the data and DQS can be implemented as a part of that process. For example before adding data to MDS, you can use DQS to clean the data prior to evaluation, but it’s use isn’t required.

In your SSIS demo, what additional steps would you recommend to have the data steward process the invalid data so that it can be reprocessed under SSIS?

Ideally, when data fails a quality process, the data should be placed in a review table so that it can be reviewed by the data steward. Having a person resolve the issues and add them to the knowledgebase will continually improve the knowledgebase over time.

When building a knowledgebase within DQS, can you include multiple columns for evaluation within one table?

DQS is not limited to evaluating one column at a time. You can use combinations of columns within a table to determine what the correct values are.

 

If you missed my session, a recording is available here , so please feel free to review the video when you have a chance.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

DQS – DIY Guide to Getting Started with Data Quality Services

SQLServerDBListPower BI and SQL Server 2012 (and beyond) both have components Microsoft developed to shift the focus of Business Intelligence onto business users. Since there are more people who use Excel than have eaten at McDonald’s, you may know about Power BI and the data features in Excel, but have you heard of Data Quality Services [DQS]? If not, well now you have. To get started, if you don’t have Microsoft’s developer version of SQL Server 2012 or later, or access to an enterprise or BI server somewhere, you will need $59.95 to get a copy of the Developer Edition SQL Server 2014. Interestingly Microsoft won’t let you buy the Developer Edition SQL Server 2012, which I found out when tried earlier in the year. Unfortunately, although Visual Studio is now free, you still have to pay for the Developer Edition of SQL Server.

After you have installed SQL Server and selected that you wanted Data Quality Services, one would assume you had installed it. While that sounds like sterling logic, it is not correct. Here’s how to tell if you have DQS. Open up SSMS on your computer and look at the list of databases like I did here. Do you see any databases here which start with DQS? No. That is because it hasn’t been installed yet. It sure looks like it is if you look at the sql installer, which I have included below. I added the red boxes to highlight the fact that I really did select Data Quality Services when I installed.

If you don’t have the Data Quality Services and Data Quality Client installed in SQL Server like they are here, you will need to do that first, but this is only the first step. Once the install screen looks like the one pictured above, you need to go to the Data Quality Services folder in SQL Server and select the SQL Server Data Quality Server Installer. After this package is run, which takes a while, you will finally get a screen that lets you know the installation is finally completed.

DQSInstallSuccessfulScreenAfter DQS Server has installed, you will see that 3 databases have been added: DQS_Main, DQS_Projects and DQS_Staging_Data. Once these three databases are installed, you can then start using the DQS Client.

DQSDBList

The DQS client does not need to be installed on a server. Since I highly doubt most places will want their business users to be directly accessing their Server, most of the time it will not be installed on the server.

Once your environment is set up, it’s time to start using it. For more information on how to use DQS, please listen to my presentation on the PASS BI virtual chapter on November26. If you can’t make it, generally speaking it will be available on PASS BI’s You Tube Channel after about a week. I sincerely hope you can make it. Let me know what you think of my presentation by posting feeback to my blog.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

 

Data Quality, there is an App for that

On Wednesday, January 8, I will be giving a presentation of an interesting and potentially handy new feature of SQL Server 2012, Data Quality Services [DQS].  I discovered this interesting tool when I was studying for my 70-463 exam, which I did pretty well on and will be discussing in a post coming soon to a screen near you.  This tool is included as part of the standard release of SQL Server 2012.  It’s an application that given it’s usefulness will have a wide implementation in the future.

I will demonstrate how to use this tool in conjunction with third party reference data from the Azure Marketplace to validate some sample address information.  In order to assimilate ever increasing volumes of data, new applications and tools have been created to process it, but the underlying problem still remains.  The data needs to be complete and accurate.  While sometimes it is possible to improve your methods of gathering the data by say tweaking the input phone number field so that you only get valid number in a format you expect, such modifications are not always possible.  There are times where the data providers may benefit by sending more data, not more accurate data, as they are paid to deliver quantity and it is up to the recipient of the data to determine the quality. 

Microsoft has created two tools which they designed to improve the quality of collected data.  The first tool, which was released as part of SQL Server 2008 is Master Data Management.  I have yet to see or hear of anyone who is using this application, and outside of testing I have not used it either.  Master Data Management was designed to assist organizations by providing a framework which transfers ownership of data quality to subject matter experts who know the correct values.

In SQL Server 2012, Data Quality Services was added as a way to fix the data.  As part of the ongoing push to put Business Intelligence into the hands of users and away from IT, DQS has a very user friendly interface designed to develop a set of rules which are used to clean data.  Once the rules have been validated, you can take this knowledge and put it into an SSIS job to automate cleansing of larger quantities of data.  

As this is the first version of DQS, there are some things about it which are a little awkward.  An example of this is the two-step process required to install DQS.  When you install SQL Server 2012, and select DQS as an installation component it appears to be installed, but it isn’t.  After the SQL Server 2012 application is installed, then you have to run DQSInstaller.exe.  One of the other limitations of DQS is it is a bit on the slow side.  Microsoft doesn’t recommend it for millions of records, just a few thousand.  Even with these limitations, DQS can still provide a great deal of useful functionality which can be an important part of ensuring your production data is complete and correct.

Yours Always,

Ginger Grant

Data aficionado et SQL Raconteur