Articles for the Month of September 2014

From the clouds, data – DIY Guide to using SSIS with Azure

My apologies to Jack Johnson for lifting his song title for this blog, but I couldn’t help it. I thought that it might be useful to discuss the How-Tos of data migration to the cloud, which reminded me of all the cloud songs I know. I always thought the cloud metaphor was sort of an odd name for a remote data center, but that is probably why I am not in the business of naming things. But I digress. Back to the topic at hand. Moving data from SSIS to databases hosted on Microsoft’s Azure cloud requires some different steps than just moving data around on premise. Since things on the cloud change quite quickly, I thought I would demonstrate currently what this looks like using SSDT in Visual Studio 2012.

Breaking through the Wall

Before getting started with SSDT, there are some configuration steps one needs to complete on Azure first. The Windows Azure SQL database firewall must first be set up to allow connections from your IP, or your data will never get to the cloud. There is an option on the database screen to “Connect to your database”. If the firewall is not opened on the IP and port in use, you will get this message.

AzureDBScreenConfigFirewall-message

Selecting the Yes is needed to be able to connect to the Azure database via SSIS.

Also one might want to validate that the right drivers are loaded on SSDT as well. If you are running via Visual Studio 2012 or 2013, no worries as the drivers are already there, but for earlier versions new drivers may be required. If one is planning on loading data to an Azure SQL Server database, the ODBC or ADO.Net are the connections needed for Azure. The old data connection standby, Ole-DB is going to be left in the toolbox like last year’s dress, as it won’t work for the cloud. Much like fashion, everything old is new again so ODBC is once again the “It” connection. You can use ADO.Net too, but I won’t be here.

The next step in the process is getting the connection information needed to connect to the Azure database. Microsoft made this step quite easy. Look on the Azure Database screen where I’ve pasted a pink arrow. Click there.

 AzureDBScreenFull-EnabledCropped

 

This makes it so easy as a screen pops up with all the connection information you need. You just need to copy the ODBC section, and remember what your password is as you will need to enter it.

AzureODBCConnectionInfo

While we are still in Azure, I thought it would be a good idea to display where the SSIS package we will be creating will be putting the data. Here’s the table structure where the data will be placed.

AzureDBScreenTable

 

Here’s the query screen showing that right now the table is empty.

AzureDBScreenTable-NoData

SSIS Package Transferring Data to Azure

After you have all of the information you need from Azure, it is a relatively simple thing to create an SSIS package, with an OLEDB connection for my on premise database and an ODBC data connection to Azure using the information copied from the Azure database connection screen to transfer data to my Azure Database.

AzureDBSSISRunning

Going back to Azure, you can see 19,972 rows were added.

AzureDBScreenTable-Data

One word of caution, as you see here in the progress log, adding data can be a very slow process.

AzureDBSSISProgress

I highlighted the Elapsed time in red so that it would be easy to see that a simple file transfer took over two minutes.

Location, Location

One thing which is important to consider is where you are going to be moving your data. I demonstrated what I think may be the more common scenario, where the data is not on the cloud, and you want to put it to the cloud. Microsoft refers this as Hybrid Data Movement. Of course this may not be the case. If you are running SQL Server on a Virtual Machine in the cloud it may make a lot more sense to run SSIS on that virtual machine. If that is the case, for optimal performance, locate the SSIS in a VM in the same data center as the database because otherwise, due to the bandwidth and network latency, it will be slower. When transmitting data around the cloud, whether it be from on premises to the cloud or from one server to another on the cloud, you might want to consider compressing the data prior to sending if at all possible to decrease the size of the data being transmitted. It may be faster to extract the data you want on premises and transmit a compressed file to be applied on the cloud server. This can get to be more complicated as it requires setting up an Secure FTP server to transmit the files, which then have to be applied. Natively SSIS doesn’t have a compression tool, but there are third party products, such as Task Factory, which will allow you to not only compress the output but send it to your VM via Secured FTP from within the SSIS package.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

SSIS Tuning – What Size is my row, your DIY Buffer Sizing Guide

When looking to improve the performance of an SSIS package, one of the common recommendations listed in many places is to modify your buffers so that you can load as much data into a single buffer size as possible. According to Microsoft, for optimal performance, SSIS should be configured for maximum memory utilization by having buffers with as many rows as possible without exceeding the internal 100 MB limit. Ok now that you know what Microsoft says about the topic, how does one put it in practice? I thought it might be nice to post the How-To as it is a little complicated.

Buffer Tuning Parameters

There are a couple of default settings that are used for loading data into memory and they all need to line up correctly to keep data from being sent from memory to disk. In order to set the size appropriately, it is best to apply the Price Is Right Axiom.

DefaultMaxBufferSize – the default is 10 MB. The maximum size is 100MB, which SSIS stores as MaxBufferSize. This value can never be any bigger than 100MB, so size it appropriately for it’s environment.

DefaultMaxBufferRows – The default is 10,000 rows. This setting needs to be set the same way that you win with the Price is Right . To win, the default max buffer rows * (times) the row size needs to be as close to DefaultMaxBufferSize as possible without going over.

Size Matters

To figure out what to put in the DefaultMaxBufferRows, one needs to know how big the rows are. Remember to win the value must be as close to the size of DefaultMaxBufferSize without going over. If the buffer is sized too large, the package will be slower as rows will be cached to disk, so make sure you don’t miss by even a little. How big is the row size? There are a number of ways of figuring this out. You can look at each field in the query SSIS is loading, and based on the data type, add up the value of all the fields. Fortunately that isn’t the only way to figure out the row size. If your data source is SQL Server, you are in luck, as the system tables can help to determine what the size is. Here is a sample the query, assuming your table name is Address, which you can run on the AdventureWorks database.

Exec sp_SpaceUsed 'Person.Address'

The Results are

name         rows    reserved       data          index_size    unused
Address    19614   5960 KB      2784 KB     2688 KB      488 KB

To figure out what the size of your buffer should be for this entire table is to take the number of (data *1024)/ Rows as 100MB is the max size you can set. To calculate the row size, use the formula values  2784 / 19614 * 1024 = 145.346, or 146 bytes per row.  If you set DefaultMaxBufferRows to 100MB, which is the maximum and what I recommend in most cases, it is 104857600 bytes is the Buffer Size.  Buffer Size/ Row size = DefaultMaxBufferRows. 104857600 / 146 = 718202.73 so set the DefaultMaxBufferRows to 728203  If  you are using the columns, you can get the same information by looking at the  syscolumns.  By using the column length, it is relatively easy to figure out what the appropriate size of your buffer should be, by adding up the column lengths. One word of caution. I do not wish to imply that because the information is available on a per table basis one should pick Table or View in the SSIS source. Au contraire. Always access the data by using a Select statement as it performs better.

Validate

To ensure that you have improved the buffer size performance, check it. After you are done with the settings, Enable logging on the data flow task, and select the BufferSizeTuning event to see how many rows are contained in each buffer.

Please feel free to drop me a line if you find this helpful.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Power Map – No longer only BI

Microsoft did something interesting with the licensing of Power Map, they changed their mind. Power Map was original released as part of Power BI a preview tool. Shortly after Power BI went live on February 10, Microsoft made this announcement. “On May 30, 2014 if you have the Power Map Preview installed, it will no longer work in any non-Office 365 subscription version of Excel.” You won’t see that text on their site, but I had it on a slide for a previous presentation I did on Power BI.  Time to update the slide as things have changed since SQL Saturday Detroit, and I have updated my presentation for Power BI SQL Server Saturday Denver.

Do you have a License for That?

The Preview for PowerMap was originally available for Excel 2013 and Excel 2013 for Office 365. Microsoft’s message that came out after PowerBI was officially released is translatable to unless you have Office 265 and pay for a Power BI license, no Power Map for you. Well, that folded like a bad poker hand. If you look now at the Power Map website it says this (I’ll quote it here to make sure I have the text if it changes again)

“If you have any subscription for Microsoft Office 365, you have access to Power Map for Excel as part of the self-service business intelligence tools.” Yea!  But what if you have Excel 2013 on prem? You can use the preview forever. What does that mean? No new features but Power Map won’t stop working. Here’s what the website says exactly “Although feature and performance enhancements for Power Map will continue to be released for the Office 365 subscription plans, there are currently no future plans to update the downloadable Power Map Preview.” Now you don’t have to have purchased the separate license for Power BI to get Power Maps. You can be all Powerful with Office 365 or Office 2013. (Note to Microsoft:Does everything have to have Power in Excel? I am starting to channel the Wizard of OZ)

New Power Map Features

On September 9th, 2014, Microsoft just released some new features for Power Map which they do about once a month. One of the cool new features is the ability to add your own maps for things like the inside of buildings. If you want that to work, you might need to check your configuration. I’ve included a copy of my Com Add-In Screen. If you can’t remember how to pull this up, check my previous post https://www.desertislesql.com/wordpress1/?p=178

PowerMapAddIn

You will notice there are two checkboxes here for Power Map on the COM Add-In Window because I was using the preview and Office 365 automatically gave me the non-Preview version of Power Map when I got an automatic Office 365 update. To get the cool new stuff, I must have the Microsoft Power Map for Excel checked, not the Preview one. If you didn’t pay for Power BI with Office 365, you won’t be able to use the new features Microsoft adds in every month.

Now if you have Excel 2013 or Office 365 and didn’t download the preview yet, you still can right here. http://www.microsoft.com/en-us/download/details.aspx?id=38395. Microsoft now calls the Power Map Preview an Unsupported Add-In, but all of the original features still work.

Mapping in Excel

If you have Excel 2013 or Office 365 and you don’t have Power Map Preview installed, you can still use mapping tools for Excel reports. How is this possible? By inserting maps into Power View.  The maps inside of Power View are very interactive, and unlike Power Maps, you can encode these reports as HTML 5 and display them on your phone or tablet.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Powerful Stuff at SQL Saturday 331 – Denver

SQLSaturday 

This will be the fourth SQL Saturday this year that I have honored enough to have been selected as a presenter. I know it is an honor, because I’ve also been turned down, which makes me work harder and keeps me humble.  If you decided today that hanging out at the University of Denver next Saturday is something you want to do, you are out of luck. There is now a waitlist to attend as SQL Saturday Denver is filled to capacity.  It is really awesome to think that they have so many people interested in learning about SQL Server that they are going to have to turn some away. It says something about the great job that Steve Wake, Windy Martin and friends are doing in publicizing this event too.

Power BI

I am going to presenting on Power BI. As a lot of people haven’t yet seen it in action, I will be demonstrating the features of it and explaining the rather complicated licensing issues around Power BI.  Power BI is mostly Excel, and I’ll be explaining what you can do in Excel 2013 without having to buy anything and if you do decide to buy PowerBI what you get.  Because there are some things which won’t really fit in my presentation, I’ll be posting a few things here on Power BI as well.

SQL Community

I don’t get out as much as I would like, and I certainly don’t get to visit the places where a lot of people who read my blog are, like Brazil. If you are attending SQL Saturday Denver please come by and introduce yourself.  I will give you 10,000 reasons to come to my presentation, none of which I am going to say here as it will ruin the surprise.  I am looking forward to hearing presentations from the other speakers too as there are some great topics being covered. I hope to see you there.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Where to go to be in the Know

At the last SQL Server User’s group meeting, I got into a long conversation with someone about where to go to find good training materials online for people at various skill levels. Contrary to popular belief, I do spend time on the internet not on twitter or shoe shopping, and have come up with a list of sites where I think the training material is quite good, and also free. I included a lot of video content, as a lot people find that easy to use for learning new things.

This is not meant to be an exclusive list, just the places I’ve found helpful most recently or have found myself visiting a lot. If you have some suggestions of places you think I should add, please send them to me.

Big Data

MSBIAcademy – This is a really good way to get up to speed on Big Data and other topics.

Apache – If you are interested in Hadoop, you will make your way here to get the latest releases and see what wacky name the latest new tools has.

Hortonworks – I found the training information on Hortonwork’s site to be very good at explaining things.

Learning Map for HDInsight and Azure – SQL Server data is moving to the cloud too and this is a great place to get up to speed.

Mostly SQL Server

A lot of these sites have crossover information too.

SQL University – This site has information for those people just getting started with the Microsoft stack to more advanced topics.

Pragmatic Works Training on the Ts – Every week on Tuesdays and Thursdays, Pragmatic Works provides free webinars on SQL Server and Big data topics with some of the people who wrote the book, ok lots of SQL books about all kinds of SQL and Big Data Stuff. Flip through the archives if you are working during the day and don’t have time to view them live.

Microsoft Virtual Academy – This is Microsoft’s site where they offer free training and you get points. I didn’t know that I wanted points before, but I do now.

Microsoft’s SSIS Tech Net Videos – The audio on these is often very lousy, but the content is pretty good. I am not sure how often these are updated, but you can find good best practice material here.

Ola Hallengren’s Site – At any time you have anyone calling you a DBA, you should know about this site.

CBT Nuggets on YouTube – If you can find anything on YouTube from CBT Nuggets, it probably won’t be a waste of time. This link is for information on SSIS. Be wary of some of the things posted on YouTube, as not everything there is correct or best practices and the quality can be marginal.

Channel Nine – Microsoft has some random-ish videos out here, some of which are really helpful

SQLServerCentral – This is a great resource. Go create an account here as it is free and there is a treasure trove of information.

SQLPass – Last but certainly not least, check out all of the information archived on SQL Pass. They have a lot of virtual groups on a wide variety of SQL related topics. If you can’t attend when they are being held, the videos are available for later viewing on the website. The previous PASS Summit information is awesome. They also have a YouTube channel as well, where you can find interesting things to watch.

SQL Saturday

All of the other stuff I mentioned is archived and available when you have a chance, but there is nothing like being able to ask resident experts about various stuff and network with other SQL Server people. Check out SQL Saturday to see when and where there is going to be an event near you. These events have gone worldwide, so it is very likely there will be an event near you sometime this year.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Inquiring Minds Want to Know about Improving Data Quality

Lost in the sauce of all things Power released as part of Excel 2013, Microsoft snuck in an app that is pretty obscure, so I thought I’d shine some light on it. What’s the app? Inquire. Ok, be honest have you heard of this yet? Microsoft bought Inquire which was previously known as SpreadsheetIQ before Microsoft bought Prodiance and their product on June 6, 2011.

In order to see and use the Inquire, you will need to add it from the list of Com components shipped but not turned on in Excel. This is the same process that you need to follow to add PowerPivot. Check out my previous post blog and follow the same steps you needed to add PowerPivot to add Inquire, which will add yet another tab to Excel. Why would you want to add Inquire? Well, it contains some neat tools to help you analyze what is going on in Excel worksheets. If you are unaware of these tools or are not sure how to use them, you might want to consider taking an excel course online to gain a deeper understanding of the tools and functions available in Microsoft Excel. In addition to helping you improve your speed while doing major data analytics projects, it may help you increase your accuracy as well.

How accurate is your data?

Since more and more data is being stored in spreadsheets people have been starting to study how accurate the data is. In a recent study at the University of Hawaii they have found that spreadsheets are wrong most of the time. It is not surprising that there are tools, like Inquire, being created to help accuracy improve. One of the ways Inquire helps improve data quality is to expose the internals of Excel so they are easier to review, which will make it easier to see what all is included within in Excel. These tools make it easier to analyze the contents all in one place. However, if you are extracting data from a website in a JSON file, you may not be able to use the various data analysis tools as Excel may not support such file formats. In such cases, you may need to look for an online JSON to XML converter, which may help you use extracted website data in Excel.

Workbook Analysis

When you have selected this feature, Inquire will go through your excel spreadsheet and review the contents. The summary contains a lot of useful information. In addition to including things like how many cells create formulas, as well as see how many cells contain errors. The summary also shows Very Hidden Sheets, which is something I didn’t even know you could create via VBA. Data Connections, Validation Criteria, Dependents, Duplicate formulas etc are some of the many items that Inquire includes in the summary. If you accidently put a minus sign in front of a cell and didn’t see notice, Inquire will list the item under Negative Formulas. All of the numeric items you have stored as text are listed too. If you run the detail report, you will see the workbook sheet name and every cell address and formula where these numeric text fields are located, which will be extremely handy when trying to correct this problem. All the information is stored in a report, which naturally is generated in an excel spreadsheet for later review.

Understanding the connections

Inquire’s Workbook Relationship report shows a picture of how workbooks are related together.

Relationship Diagram

The cell relationship does a similar diagram on the individual cell level. If you have two spreadsheets and you want to know what the difference is between the two spreadsheets on a field by field basis, Inquire will show the differences when you select the Compare files button. If you just want to do some cleanup, there is a button for that too. Removing excess formatting can help improve the performance of the spreadsheet by eliminating duplicate and extraneous formatting. Lastly, inquire contains the ability for you to manage your passwords contained throughout the spreadsheet in one place.

With more and more data being stored in excel, either to create PowerBI analysis, Tabular Prototyping or just because excel is your data store of necessity for now, its nice to see Microsoft adding tools to help ensure that you can beat the odds and produce an excel spreadsheet where you can have greater confidence in the data stored within it.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur