Articles By admin

Getting started with PowerBI in Office 365

When speaking at SQL Saturday #292 – Detroit, I received a question which I thought more people might have. How do I get started using Power BI or PowerPivot? I have Office 365 and I can’t get to the PowerPivot Menu and I don’t see Power Query either. What’s wrong?
I am assuming in that case your version of Excel looks like this.

image 1

 

Power BI was officially released on February 10th 2014. You might think that would mean that if you installed Office 365 after February 10th, which I did here, that Power BI would work out of the box. Sorry, that’s just not the case.

Turn on the Power

In order to get PowerPivot, Power Map, Power View or Power Query working you need to turn them on in Excel first. Click on the file tab in Excel and you’ll see this screen

image 2
 

 

 

 

 

 

 

 

 

 

 

Click on the Options menu item and you’ll see the following screen

image 3
Notice there are two sections to this window, Active Application Add-ins and Inactive Application Add-Ins. I’ve highlighted Power Pivot, which is listed under Inactive Application Add-Ins. This is why you don’t see it, it’s not active. Sounds reasonable, just click on the GO button and fix it right? No, wrong.

Type Tricky

The trick to adding add-ins is looking at the type. PowerPivot is type COM Add-in. If you click on the Go button you will see all of the Excel Add-ins, won’t turn on PowerPivot. When is an Excel Add-in not an Excel Add-in? When the Type says so. If you click on Go, this is what you will see

image 4

 

 

 

 

 

 

 

 

 

 

 

PowerPivot is a COM Add-in, so it isn’t on this menu. Cancel out of here and go back to the Add-In screen and change the drop down box to COM Add-in before clicking on the Go Button. This menu is what you need for getting 3 out of the 4 powers.
image 5

 

 

 

 

 

 

 

Go ahead and check the boxes next to PowerPivot, PowerMap and Power View like you see here. Click ok and then you’ll see the PowerPivot tab in Excel.

image 6

You still won’t see the Power Query Tab, because that has to be downloaded separately, which you can do for free here
https://www.microsoft.com/en-us/download/details.aspx?id=39379
Pick the one version you need, either the 64 or 32 bit version and follow the installation instructions, which means getting out of excel first as it will only install when it is not loaded.

image 7

 

 

 

 

 

 

 

 

 

 

 

After you finish the installation, go ahead and open Excel.

Power Signs

The Power Query tab is magically added, as the install takes care of adding the add-in for you.

image 8

 

When Excel looks like this, it’s time to get started with PowerBI!

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

MCSA – Part Deux, Exam 70-462

If you haven’t seen Charlie Sheen’s cinematic triumph Hot Shot’s Part Deux, you’re missing out, which is why I made sure to include the link for it. If you have seen the movie you will wonder how it missed out on an Oscar nomination, or maybe better understand why it is on cable non-stop in reruns. It was also the first thing that came to mind when I thought about Part Deux. So having taken the first exam 70-461, without the book, I felt compelled to go out and get the Official Microsoft Press study book for 70-462 . This was the first book made available for any of the SQL Server MCSA Exams. Yea! That will make studying for the test so much easier, right? No. Wrong. Dead wrong.

So about the book.

If all you do is follow it, make sure you check out this web site http://www.microsoft.com/learning/en-us/second-shot.aspx as you will be doing a Part Deux yourself. This exam covers the administration part of the exam. This is the exam that DBAs should be more familiar with, although I doubt in your environment that they have set up every form of replication and high availability option in SQL Server 2012, so I bet you will have to study too. I set up a server and thumped around the exercises. It wasn’t enough. The book does offer a coupon in the back for a discount on the exam cost as well as practice exams. After I failed the practice exams, I thought hmm this isn’t working. I went back to what I did on the last exam, after all I passed that one. Here is everything you need to know to pass the 70-462 exam. What I did was made a word document out of all the material it said you needed to know by copying things off of MSDN and books online. I made two documents, after all this is part two, and it got sorta long and I didn’t want to blow up Word.

After making sure that I could really do the stuff listed that I didn’t have a chance to mess with when I played DBA, I drove my rolling purse over to the testing center and took the exam. I brought my running watch which has a handy stopwatch which is also fun to set when standing in TSA lines in the airport to see how much of your time is really being wasted. You should try that sometime as generally you will be surprised that it seems like you are spending a lot more time in line than you actually are. The test was like that, as it seemed like it was taking forever but I actually had more time than I needed. I had 30 minutes to spare so I could go get ice cream before I had to be at my next appointment.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

MCSA, Act 1 Exam 70-461

The MCSA is really a play in three acts, one for each test. For SQL Server 2012, Microsoft changed up the whole exam game. Now you have to take three tests, count them three 461, 462 and 463, to get a certification, the Microsoft Certified Solutions Associate [MCSA]. Gone are the good old days when you could take the first exam 461 and then stick Microsoft Certified Professional on your business cards. That was sweet, and in all candor I have done something kinda similar. Unfortunately, these days you will receive nada until you take and pass all three exams. Only then will you receive two logos and a certificate, which I now have ensconced upon my office wall in an exceedingly tasteful and exceedingly stylish frame.

A bit more branding for a lot more work.

So let’s start at the beginning and talk about 461. Unofficially speaking, this test isn’t terrible. Check out the link to see what’s on it http://www.microsoft.com/learning/en-us/exam-70-461.aspx. You just have to make sure you know all the stuff that they tell you on this page. For me the hardest part was learning all of the new stuff on 2012 which I hadn’t gotten around to playing with yet and the XML which no one has asked me to play with for years.

I wish I could comment on the book for it, but I got all motivated and took the test before the book came out. Itzik Ben-Gan is really an impressive guy when it comes to SQL stuff, so I bet it is good. I can tell you this, you can pass the test without it. What I did was took every topic listed in the study guide on Microsoft’s 70-461 site, and made up my own reference from links online. I played around writing code in adventure works too. I ordered the material I studied based on the emphasis that topic had on the test. The topic Work with data has the highest percentage of the material at 27% so I studied that one the most and the rest of them in percentage order. I made sure that I could write the sql code covered and that pretty much did it.

For those of you who wonder what it is like to actually take the test, just follow the process. First off, you make an appointment at a testing site, show up 10 minutes before your time, and sign in. If you bring anything with you other than your two ids and keys, you will have to lock it up there, so empty your pockets and leave all the stuff you usually carry around with you in your car, or as I like to refer to my car, the rolling purse. The proctor will hand you a laminated piece of paper and a dry erase marker and then walk you over to a library carrel with a computer they will sign in. The test is timed, so take a watch as the computer won’t show you what time it is. Whenever I don’t know what time it is, I find a strange fixation to know the time. Yes I get that this makes no sense and a waste brain power, but this is what I am thinking about when I am sitting there. As soon as you hit enter on the last question, sit there for a minute and you will find out if you passed or not. Be kind to the other people taking the test and don’t start banging on the desk because I don’t need your personal issues to mess with my brain, which is still mode locked over not knowing what time it is. You will get a printed copy of your results which show how you did in each section with a Gantt chart. If it is all black you got that section 100% correct. I am dying to know what a blank 1/3 of an inch means, but Microsoft doesn’t really disclose how it is scored, so I guess I’ll never know but I did pass it. Note to self, next time, bring a watch.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

The Scoop on Sqoop

In the weeks following my talk at Desert Code Camp and SQL Saturday in Detroit about Big Data, I have been receiving inquiries at my blog regarding sqoop, so I thought that I might get more specific on how it works. Sqoop is part of the Apache borg-like collective of tools which was created to use databases, any databases. Lots of people have databases and like them. Databases are really good ways to store data. Just think if Oracle would have been cheaper and faster Hadoop may have never been created because Hadoop was created to solve those problems, I guess at least in this situation resistance was far from futile, but I digress. Let’s say you have some data which you would like to load up into your SQL database. Since you are picking the data to load up into SQL Server, I am expecting you are picking some data which is already structured.

A while ago I worked on a GPS tracking application. We collected data on trucks every 10 seconds, which means that we were collecting a lot of data. To decrease the data in the database, the data was archived off after 30 days. If I was working there now, I would recommend that the data be archived to HDFS. You could store it very cheaply that way and using Sqoop, load the data back again if someone threatened to sue or something worse…
Here’s how you make an archive that work using Sqoop and HDFS
1. Create an HDFS datastore
2. Load the drivers for SQL server, because they only give you mySQL
3. Run the Sqoop command
4. This extracts the data and inserts into HDFS
Ok, let’s say you want the data back. The trickiest part is getting back only the data you are interested in and not everything you have. You can run out of space in SQL server by loading all of this data up, so be careful. First you need to know some information about SQL Server. Run this query on your destination
Select CONNECTIONPROPERTY(‘Net_transport’) as net_transport
, CONNECTIONPROPERTY(‘local_tcp_port’) as tcp
, CONNECTIONPROPERTY(‘Client_net_address’) as client_net_address

If it comes back that you have mixed instead of TCP, go into SQL Server configuration manager to change it to TCP. You will need that information to know what to put here. I am of course assuming that you have already created a SQL user id called Hadoop with a password of bigdata.

sqoop import –connect “jdbc:sqlserver://192.168.138.1:1433;database=AdventureWorks;username=hadoop;password=bigdata” –table

Assuming you kicked this off in the right path and all, congratulations, you have just used Sqoop!

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Hadoop Tools Peg Board

 

Looking at all of the tools available for Hadoop reminds me of the work area in my Grandad’s  basement. There he had a giant pegboard, ok maybe it just seemed big because I wasn’t, and he had all these tools on it. Different kinds of hammers, screwdrivers and saws and things I couldn’t identify.  At first glance Hadoop looks a lot like that. There are lots of tools available, but you will get better results when you know when to use the claw hammer versus the ballpeen variety. Sometimes, the difference between tools are not so obvious, like between Hive and Pig.  Other times the difference in tools are substantial for example the difference between Hive and Impala.

Big Data is an overarching term which can portray anything, from a bunch of websites to vehicle GPS tracking information which you get every 10 seconds.  Due to the cheaper costs for storage, businesses want to save everything, and they are relying on the data people they employ to extract the desired answers they want from this reservoir of data, whenever the mood strikes them.  In much of the recent literature, this is known as the Cake-in-the-Lake paradigm. The data is stored in HDFS is a giant pool, or Lake, and the data requested is the Cake. I have to digress and wonder who comes up with these metaphors.  The useful information is the cake, and you need to go diving in the lake to find it. In this metaphor you are searching for soggy pastry.  Wouldn’t it make more sense to go pearl diving for good information?   I guess since “Pearl” or really “Perl” has already been taken as a name, someone thought a rhyme which evokes mental images of ruined bake goods would be better.  Putting aside the metaphors, there are a number of tools and ways to get the good stuff out of the accumulated data pile.

As I am a database person, the tool which has most intrigued me is Cloudera’s Impala.  No longer just your father’s Chevy, this tool is a full on SQL database on top of an HDFS file system. This is very attractive due to it’s high coolness potential as this allows users to write real ANSI SQL statements on top of Hadoop.   Ok here’s my question, so when is that going to work?  One of the big things stored in Hadoop is unstructured data. As I recall the reason that you don’t put unstructured data in a database is that the structure of said data does not lend itself to a formalized schema.  Think about the structure of a series of web pages.  What kind of schema are you going to impose upon that? It won’t work out well.  If on the other hand, the data in the HDFS file structure is a large set of semi-structured data like sensor data or data which is inherently structured, Impala could be a good solution.   Unfortunately, there is no one tool which will work for everything. If you need to parse through social media posts to find trending instances of people interested in buying a house for the first time in various parts of the country, you may have to use Map Reduce.  Map Reduce is a batch process and a pain to write so a lot of tools exist so you don’t have to use it.  Depending on what you are being asked to do, breaking out a Map Reduce program remains the best solution.

With Hadoop, what tool you can use is greatly influenced by what you are storing. Big data or small, you will still need to take a look at it so as to determine how you can categorize what is inside before taking that tool off the pegboard.  And if you are going to be playing around with Hadoop, you are more than likely going to need to know how to use more than one tool.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Power BI Licensing and Constraints

Licensing is a topic which most people reflexively ignore. They just scroll down until they see the I agree button and click it so they might attain the object of their desire, software with which to amuse themselves.  With respect to Power BI, now might be the time to read the fine print.  The license dictates how much data you can display, which you need to evaluate to see if this is the best tool for your environment. Let’s go to the authoritative source regarding such matters, Microsoft.  Here’s a link to their Power BI release and pricing page http://bit.ly/1e9dAEB .  If you didn’t know, as of right now, Microsoft is only licensing Power BI with Office 365.  Based on their page it appears if you don’t have Office 365, you can’t have Power BI.  If you have been playing around with Power BI, you know that’s not really the case.  

Power BI Components

To ensure that we are all starting from the same place, here’s a summary of the Power BI Preview components. Power BI consists of four Excel add-ins, Power View, Power Pivot, Power Query and Power Map.  Power windows are not included.  Together these tools are used to create a Power BI document to be shared on to the Power Cloud, SharePoint on Office 365.  The add-ins Power Query and Power Map were created for the Power BI preview, because two powers are just not enough you need a quad. If you want to be able to deploy a Power BI report to your tablet or smart phone with HTML 5 or use the natural language query feature Q&A to manipulate the data, both of which are part of the preview, you can only do this from Office 365 SharePoint.  If you desire to distribute your Power BI creation to a mobile device such as a smart phone or tablet, you will then need to send the app to the Windows Store so you can distribute your report app to said devices. 

Help! I don’t have Office 365

If you have loaded Office 2013 or Office 2010 service pack 1 to your computers can you still use Power BI? This is where licensing gets tricky. Sure, you can do it, at least at present. Power BI is in Preview Release mode, and as part of the preview you can download the preview add-ins for both of those versions of Excel. Can you use the add-ins you have once it goes into full release?  This remains unclear.  Excel 2010 has Power Pivot included and Excel 2013 has Power View also so both of those components don’t require a Power BI license.  But Power Map and Power Query add-ins both say they are Preview Editions. What happens after the preview is over? Yet another ambiguity.

If you have seen Power View back when SQL Server 2012 was released you may think that you might have all of the same features of Power BI without having Office 365, but that isn’t the case. Power View was released as part of SQL Server 2012, so if you have SQL Server 2012 you can use Power View with SharePoint 2010 or 2013.  Power View for SharePoint has a map feature as well but the feature set is not the same as Power Map. What about Power Query on SharePoint? Power Query is part of Power BI and right now it can’t be loaded on your current SharePoint server.     

Preview Edition

As of this writing, Microsoft has yet to offer Power BI for sale to end users, although you can easily obtain a preview of it.  If you do decide to give Power BI a whirl, you will be granted a preview license.  The preview license terms differ in significant ways from the proposed release version.  Keep this in mind when you noodling around with Power BI.  If you want to load a file to the Cloud (aka SharePoint on Office 365) the maximum file size is 10 MB.  I divined that tidbit here http://bit.ly/1cajntG .

Implications of Excel File Size or Why my Power BI won’t load

You may have learned, like I did in an online presentation, that the maximum size for a Power BI Excel file is 250MB. .  I discovered the file size is dependent on your license, which creates some interesting things to consider. The 250 MB file size does not apply to the preview license version, which seriously dampened my enthusiasm. I was rather chagrined to discover this inconvenient limitation, after my carefully crafted 45 MB presentation file failed to load.  I started looking around for the size that would load, as it was clear mine was too large, making it clear the 250MB limit isn’t true,  as the preview license limits you to 10 MB.  When you buy Office 365 you pay for how much space you have in SharePoint and there are limits which Microsoft reveals to you here http://bit.ly/1bCeI72 . This website states that licensees only have 500MB per user regardless the plan for Office 365.

Power BI is meant to be the tool for all business users to query their data. If you are designing a model for use in Power BI your space is limited.  Given it’s current configuration, it is assumed your entire operational data store could not possibly exceed 250 MB.  If you have a large tabular model, you may find using Power View in SharePoint will provide a better solution as these data model constraints don’t exists there. Power BI is great tool, but the scalability limitations cannot be overlooked.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Power Query and Power Pivot – two ways to load up Excel with data

If you’ve have messed with pivot tables in the past, you know that adding data to an excel spreadsheet is nothing new.  You now can add data three ways, because you can now add data with Power Pivot and Power Query.  The question is which tool should you use when?  Well if you have Office 2013 and didn’t sign up for the Power BI preview, you are limited to using Pivot Tables.  Since I signed up for the Power BI preview I was left with another question.  Which tool is the right one to use for loading data?  Well it’s the same answer you always get when working with computers.  It depends. 

Power Query was designed to readily provide access to data, any data.  Microsoft on their site promotes the fact that you can include, or as they like to say “Mash” data from Wikipedia or other sources together with your business data.  I can see where this feature would be very useful.   I work with a concrete company who has a lot of KPIs that they have for daily production.  Well, when it rains they don’t meet the performance metrics because no one wants to pour concrete in the rain as it affects the way it cures and the concrete can end up bumpy instead of smooth.  If they correlated the weather together with the KPIs they could definitively determine the effect of rain on their business. Power Query also provides a way of merging or appending data together, which makes sense since you may be selecting data from a variety of sources

If you are pulling together internal data so that you can create a Power View report on it, Power Pivot would be the best way to proceed.  Yes, you can do this in Power Query, but Power Query doesn’t provide the same level of filtering that Power Pivot does.  Power Pivot also provides the dangerous feature of allowing users to update data in the database by modifying the data in Power BI a spreadsheets, a feature which most of the time I believe will be turned off. 

With Power Pivot, not only do you load up the data, but you also build the relationships between the data.  It works the same way if you user Power Query for half the tables in your database and Power Pivot for the other half.  I felt compelled to check that out when I was playing around to see if it would make a difference.  Of course Power Pivot can do a lot more than just relationship management of data.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Do I really have the Power?

Is just it me or does anyone else keep on hearing that stupid Power song by the one hit wonder Snap whenever they are hear about Power BI?  Now if you cannot recall this catchy tune this should send it rattling around into your skull http://www.youtube.com/watch?v=_BRv9wGf5pk.  It’s the new Rick Roll.

Is it possible that adding the word Power is supposed to be a subliminal message to people who’s lives are spinning out of control and have been sapped of their personal essence?  But then again I could be wrong.  You add data by Power Query, then manipulate it in Power Pivot, then make a Power View and add a Power Map.  And naturally this task should never be attempted at home and only by Professional Power Users.  I feel that by using it I have increased my power exponentially by a factor of 365 to complete this task.  On the other hand, I could be totally wrong.  Power BI is available as a preview edition, which means for now anyone can load it up as it has not been officially released.  Still though the more time I spend with it the more I generally like it. 

Although you could argue that it might have made more sense for Microsoft to build a reporting app than bloating up Excel, it’s perfectly understandable why they did it.  Based upon what I have seen, there is a real reticence on the part of the IT staff to install anything else on a computer.  Given the things end users wish to install on their computers, one could hardly blame the IT staff.  The idea of self-service BI is for business users to use it, and Excel has been the staple of ad-hoc data storage and analysis for decades.  Since Power BI is a series of task specific bolt-on accessories to the ubiquitous Microsoft spreadsheet, using it isn’t the most intuitive thing in the world either.  It is sort of awkward to click here and there to get the tool to work, but once you figure that out, in the end it turns out to be a cool tool.  It has changed quite a bit since it was first introduced.  Microsoft’s blog from July includes a walk through of Yelp data which will no longer works as written because the expand feature has been removed. 

Power BI really makes Microsoft competitive in an area where they had nothing but a gaping hole which their competitors, namely Tableau, have driven their truck through.  In less than two years they have come up with an application which provides self-service BI to business users, uses in-memory analytics, can be viewed securely on the web, and can be viewed on tablets and phones.  Since Microsoft is slowly but surely getting rid of Silverlight to follow everyone else to HTML5, you can view your Power BI reports on whatever phone and tablet you happen to own. Ok, the caveat is that the application that was created for the Microsoft phone is better, but you are not shut out if you have an iPhone or Samsung tablet either. 

From a marketing perspective, Microsoft has provided perfectly sensible reason that you might want to actually get Office365.  Personally Power BI was the reason that I decided to move to Office 365.  Let’s face it, for most people Office 365 was just another way to load up office.  The whole cloud thing was not really much of a selling feature for end users who didn’t want to use a skydrive when they could easily use a hard drive ensuring your data isn’t spread all over the known universe.  Hosting SharePoint in the cloud provides a feature people might actually use.  There is an argument to be made as to whether or not it is any less annoying to administer SharePoint via a cloud rather than locally, but it is one less server IT needs to monitor.  The price isn’t terrible.  Here’s the info which Microsoft has released on pricing http://www.microsoft.com/en-us/powerBI/pricing.aspx#fbid=5EP_a34ZHLQ

I am giving a talk at SQL Saturday in Albuquerque on Power BI where I will delve the technical features of Power BI in greater detail.

Yours Always,

Ginger Grant

Data aficionado et SQL Raconteur

Data Quality, there is an App for that

On Wednesday, January 8, I will be giving a presentation of an interesting and potentially handy new feature of SQL Server 2012, Data Quality Services [DQS].  I discovered this interesting tool when I was studying for my 70-463 exam, which I did pretty well on and will be discussing in a post coming soon to a screen near you.  This tool is included as part of the standard release of SQL Server 2012.  It’s an application that given it’s usefulness will have a wide implementation in the future.

I will demonstrate how to use this tool in conjunction with third party reference data from the Azure Marketplace to validate some sample address information.  In order to assimilate ever increasing volumes of data, new applications and tools have been created to process it, but the underlying problem still remains.  The data needs to be complete and accurate.  While sometimes it is possible to improve your methods of gathering the data by say tweaking the input phone number field so that you only get valid number in a format you expect, such modifications are not always possible.  There are times where the data providers may benefit by sending more data, not more accurate data, as they are paid to deliver quantity and it is up to the recipient of the data to determine the quality. 

Microsoft has created two tools which they designed to improve the quality of collected data.  The first tool, which was released as part of SQL Server 2008 is Master Data Management.  I have yet to see or hear of anyone who is using this application, and outside of testing I have not used it either.  Master Data Management was designed to assist organizations by providing a framework which transfers ownership of data quality to subject matter experts who know the correct values.

In SQL Server 2012, Data Quality Services was added as a way to fix the data.  As part of the ongoing push to put Business Intelligence into the hands of users and away from IT, DQS has a very user friendly interface designed to develop a set of rules which are used to clean data.  Once the rules have been validated, you can take this knowledge and put it into an SSIS job to automate cleansing of larger quantities of data.  

As this is the first version of DQS, there are some things about it which are a little awkward.  An example of this is the two-step process required to install DQS.  When you install SQL Server 2012, and select DQS as an installation component it appears to be installed, but it isn’t.  After the SQL Server 2012 application is installed, then you have to run DQSInstaller.exe.  One of the other limitations of DQS is it is a bit on the slow side.  Microsoft doesn’t recommend it for millions of records, just a few thousand.  Even with these limitations, DQS can still provide a great deal of useful functionality which can be an important part of ensuring your production data is complete and correct.

Yours Always,

Ginger Grant

Data aficionado et SQL Raconteur

Coming to a “Gizmo” near you…

Tropical-Island-Wallpaper-1Welcome to our uncharted enchanted isle…

Things around here are just getting started.

Please cruise by from time to time to check my progress.

Yours Always,

Ginger Grant

Data aficionado et SQL Raconteur