Data Factory – Copying Data from a Database to an Azure Blob Store

My previous post provides instructions on Getting Started with Data Factory. To borrow a line from Ron White, “I told you that story so I could tell you this one”. If you followed the steps in that post, you have a Data Factory resource. Time to use it. This post describes how to copy data currently stored in a database to an Azure Blob Storage account. To complete that task, you will need to write some JSON. Please be careful when you do this, as writing JSON may involve cursing JSON. Since Jason was in the list of top four male baby names in the US from 1973 to 1982, chances are you may work near someone named Jason when writing JSON, who may think the cursing is directed at him.

Author and Deploy Linked Services

To be able to pass data from a database, a connection needs to be established to it. That connection is made by creating a Linked Service. In Data Factory, if you create a New data store, a new Linked Service JSON template is created for you. After clicking on the New data store icon, a list of data stores will appear. In this example I am going to pick an Azure SQL Database. If I wanted to access a local SQL Server database, a gateway to connect to it would also be required; a sketch of that variant appears a little further down. The JSON Linked Service template will appear under the Drafts menu, labeled Draft-1. A copy of the code is included below.

{"name": "AzureSqlLinkedService",
"properties": {
"type": "AzureSqlDatabase",
"description": "",
"typeProperties": {
"connectionString": "Data Source=tcp:<servername>.database.windows.net,1433;Initial Catalog=<databasename>; UserID=<username>@<servername>;Password=<password>;Integrated Security=False;Encrypt=True;Connect Timeout=30"
}
}
}

If nothing is changed and the Deploy icon at the top of the screen is clicked, a new item AzureSqlLinkedService will appear under Linked services. There will be a circle with an exclamation point next to it, because the connection does not yet contain valid data, but it can be deployed anyway. To get rid of the error, replace all of the default information provided after each colon with valid values, starting with the name, which defaults to AzureSqlLinkedService. The name property dictates what the Linked Service will be called and cannot be changed later. To rename a Linked Service after the fact, highlight its name, click on Clone, change the name, deploy, then highlight the old name, click on the …More text at the top of the screen, select Delete, and confirm the deletion by clicking on the Yes button. Alternatively, just select a good name before clicking on the Deploy icon. Within the connection string, replace all of the items between < and > with valid values. If there are any errors after deploying, fix the data and click on Deploy, and the circle with the exclamation point will go away. Invalid database names are accepted as long as they are in the right format; if the name is not valid, an error will be raised when the Data Factory is run.
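As mentioned above, reaching a local SQL Server database requires a Data Management Gateway installed on premises. Below is a minimal sketch of what that Linked Service might look like, based on the on-premises SQL Server template; treat everything in angle brackets as a placeholder to replace, just like in the Azure SQL example, and note the extra gatewayName property, which must match a gateway you have already installed and registered.

{
    "name": "OnPremSqlLinkedService",
    "properties": {
        "type": "OnPremisesSqlServer",
        "typeProperties": {
            "connectionString": "Data Source=<servername>;Initial Catalog=<databasename>;Integrated Security=False;User ID=<username>;Password=<password>;",
            "gatewayName": "<gatewayname>"
        }
    }
}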

Because we need a connection to both a database and an Azure Blob, two Linked Services are required, one for each type. Prior to completing this step, create an Azure Blob storage account by clicking on Add in All Resources. Create the second Linked Service the same way as the first: click on New data store, then select Azure Storage. Starting from the template for an Azure Blob Storage Linked Service, I have modified it below, adding the “hubName” since it is required.


{
    "name": "GingerAzureBSLinkedService01",
    "properties": {
        "description": "Test Azure Blob Storage Account for DF",
        "hubName": "GingerDataFactoryTest_hub",
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=gingerblobstorage01;AccountKey=**********"
        }
    }
}

The hub name is the name of the Data Factory resource you created earlier, and the account name comes from your storage account. You will need to get the actual Account Key and replace the asterisks with it. Once the JSON is deployed, the value will appear as a line of asterisks again.

Create a Data Factory Dataset

Two datasets representing the data accessed by the two Linked Services must be written in JSON. Click on …More and select New dataset. Unlike Linked Services, there actually is a New dataset option, so the naming is consistent. Creating the JSON for the database is a bit of a pain, as you have to define every single column accessed, as well as the datatype for each one. Both order and capitalization are important: they must exactly match what is in the database, or you will get an error after the code is run. You won’t see any errors when you type it. Here is a sample.

{
    "name": "InputDataSet",
    "properties": {
        "structure": [
            {
                "name": "Age",
                "type": "Int32"
            },
            {
                "name": "workclass",
                "type": "String"
            },
            {
                "name": "education-num",
                "type": "Int32"
            },
            {
                "name": "marital-status",
                "type": "String"
            },
            {
                "name": "occupation",
                "type": "String"
            },
            {
                "name": "relationship",
                "type": "String"
            },
            {
                "name": "race",
                "type": "String"
            },
            {
                "name": "sex",
                "type": "String"
            },
            {
                "name": "hours-per-week",
                "type": "Int32"
            },
            {
                "name": "native-country",
                "type": "String"
            }
        ],
        "published": false,
        "type": "AzureSqlTable",
        "linkedServiceName": "InputLinkedServiceAzureDB01",
        "typeProperties": {
            "tableName": "vCensusInfo"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}

In this sample, I am actually getting data from a view, not a table. Even though the code says tableName, supplying a view name there works fine. The linkedServiceName, of course, refers to the Linked Service created in the previous step.
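If creating a view is not an option, the Copy pipeline’s SqlSource (shown in full later in this post) can run a query instead via the sqlReaderQuery property. This is just a sketch of the source portion of the activity, with a hypothetical query; the rest of the activity JSON stays the same.

"source": {
    "type": "SqlSource",
    "sqlReaderQuery": "SELECT Age, workclass, [education-num] FROM dbo.CensusInfo"
}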

Fortunately, it is not necessary to list every column when inserting values into a Blob, so the JSON for that is much easier. While it is possible to omit the file name, if you ever wish to use the data in the blob store later, you will need to provide one. This JSON will write a file inputdata.csv to a blob store at the path mlinput01. If this is run twice, the file will be overwritten the second time; a sketch of one way to avoid that appears after the JSON below.

{
    "name": "InputDataSetBlob",
    "properties": {
        "published": false,
        "description": "Input Blob Dataset to feed Azure ML",
        "type": "AzureBlob",
        "linkedServiceName": "AzureBlobStorageLinkedService",
        "typeProperties": {
            "fileName": "inputdata.csv",
            "folderPath": "mlinput01/",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        },
        "external": false,
        "policy": {}
    }
}
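If overwriting the same file every run is not what you want, Data Factory can stamp the slice time into the folder path. Here is a minimal sketch of how the typeProperties section above might change, using the partitionedBy syntax with the SliceStart system variable; the particular path layout is just an example.

"typeProperties": {
    "fileName": "inputdata.csv",
    "folderPath": "mlinput01/{Year}/{Month}/{Day}/{Hour}",
    "partitionedBy": [
        { "name": "Year", "value": { "type": "DateTime", "date": "SliceStart", "format": "yyyy" } },
        { "name": "Month", "value": { "type": "DateTime", "date": "SliceStart", "format": "MM" } },
        { "name": "Day", "value": { "type": "DateTime", "date": "SliceStart", "format": "dd" } },
        { "name": "Hour", "value": { "type": "DateTime", "date": "SliceStart", "format": "HH" } }
    ],
    "format": {
        "type": "TextFormat",
        "columnDelimiter": ","
    }
}

With this in place, each hourly slice writes to its own folder, so a second run no longer overwrites the first.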

JSON for a Data Factory Pipeline

The last piece of JSON required provides the action, which in our example is copying from a database to an Azure Blob store. Click on …More and select New pipeline.

The template JSON looks like this:

{
    "name": "PipelineTemplate",
    "properties": {
        "description": "<Enter the pipeline description here>",
        "activities": [],
        "start": "<The start date-time of the duration in which data processing will occur or the data slices will be processed. Example: 2014-05-01T00:00:00Z>",
        "end": "<The end date-time of the duration in which data processing will occur or the data slices will be processed. Example: 2014-05-05T00:00:00Z>"
    }
}

That JSON looks really simple. Unfortunately, for this to work, a lot more JSON is required, especially under the activities element.


{
    "name": "PipelineCopy01",
    "properties": {
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "SqlSource"
                    },
                    "sink": {
                        "type": "BlobSink",
                        "blobWriterAddHeader": true,
                        "writeBatchSize": 0,
                        "writeBatchTimeout": "00:00:00"
                    }
                },
                "inputs": [
                    {
                        "name": "InputDataSet"
                    }
                ],
                "outputs": [
                    {
                        "name": "InputDataSetBlob"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "style": "StartOfInterval"
                },
                "scheduler": {
                    "frequency": "Hour",
                    "interval": 1
                },
                "name": "Copy Activity"
            }
        ],
        "start": "2016-08-07T00:01:00Z",
        "end": "2016-08-08T00:01:00Z",
        "isPaused": false,
        "hubName": "GingerDataFactoryTest_hub",
        "pipelineMode": "Scheduled"
    }
}

Once the JSON has all of the curly braces and commas it needs, it can be deployed without a red circle. Note that the start and end properties define the period during which the pipeline is active; with an hourly frequency, the one-day window above produces twenty-four slices. Scroll back over to the right to get to the Action menu again and select Diagram. The diagram should appear as shown below.

Running Data Factory Pipelines Ad Hoc

This job is scheduled to run on the hour, but chances are you want to run it as soon as the code is done rather than waiting. To run the pipeline, underneath the Contents section, click on Datasets. A list of datasets will appear; select the one for the Azure Blob storage output, which in my example is called InputDataSetBlob. Click on one of the values under Slices, and a window will appear on the right. Click on one of the items in that window, and on the next window that appears, click on Run in the top left corner. Ideally, if your spelling and capitalization all work out, after some duration the status should say Succeeded. To validate that the transfer actually took place, go to your blob storage account, and you should see the file created. You can even download the file to make sure it actually has data in it. If you don’t change the name and run it again, the data will be overwritten and no error will occur, so feel free to run this as much as you like.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

The Future of Datazen – SSRS

There were a number of presentations at PASS Summit where the next versions of software were discussed. Among the more interesting were the conversations about the future of Datazen. For those of you who are unfamiliar with the product, Microsoft bought Datazen on April 14, 2015, as their on-premises mobile solution. For more information on Datazen, please check out my previous posts on it here. In July of this year, when I talked to someone from Microsoft about how Datazen fit into the product offering, he told me that it was going to be the non-cloud application to provide visualizations to mobile phones. Now, with the latest series of product announcements, I fully understand what he meant. Datazen is going to be merged into SQL Server Reporting Services [SSRS]: the Datazen server is being combined into the SSRS server. SSRS is the new Datazen.

SSRS Updates Include Adding Datazen

For anyone who has looked at the product release notes of the last three versions of SQL Server, you won’t find much about SSRS. The picture-perfect reporting system was left alone. I was not surprised to hear from one of the Microsoft employees at their booth at PASS Summit that they were told not even to mention SSRS as a reporting solution a few years ago. Everything was all about Power BI. Power BI is a neat visualization tool, but it doesn’t fit the needs of all users, primarily because it is a cloud-based application, and let’s face it: some people are just not going to go to the cloud.

On Premise Solutions Are No Longer Forgotten

Anyone who has perused Microsoft’s product offerings in the past few years saw lots of things about the cloud.

Microsoft shows support for cloud and on premise software

Applications like Power BI and Azure ML are only available there. It seemed like Microsoft was abandoning people who wanted to maintain their own servers. They are looking to change this impression. I have included the graphic that Microsoft used in a number of their presentations. They are trying to show love for the box, which is the representation of on-premises software.

Datazen SSRS Integration

Datazen is going to maintain the client features that people like, including designing with the grid view and picking the form factor. The server which manages Datazen will no longer be a separate server but will be included in the SSRS server in SQL Server 2016. The idea is that Datazen will inherit some of the features people like about SSRS, such as the subscription model. The details regarding exactly how all of this will work were not made clear, but I imagine more details of the Datazen/SSRS integration will be available in the very near future. One of the things that I wondered was whether the name Datazen is going to go away entirely and the application will be called SSRS Mobile. I heard differing answers to this question, so I am not sure it has been determined yet. What I was assured of is that the features that made Datazen a very good mobile solution are not going away. Microsoft hopes that the ability to publish from SSRS to the phone may be a big reason people decide to upgrade to SQL Server 2016 next year when this functionality is released.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Datazen – The On-site BI Option

I have talked to a number of people who looked at Power BI Preview, read that it will be replacing Power BI for Office 365 as a cloud solution for mobile self-service Business Intelligence [BI], and thought: what are we going to do? Some people are not going to go to the cloud. Since they have made that business decision, they were wondering what Microsoft was planning on offering in the non-cloud space. I was curious too, so I asked my local Microsoft technical sales representative what the plan was for people who wanted on-premises BI. The answer to that question is: use Datazen. Datazen is positioned to be the application for people who don’t want cloud but do want to create a rich self-service environment for doing self-service BI, which of course means that you can send the reports to tablets and phones.

Datazen vs Power BI Preview

When comparing the two products, Datazen and Power BI Preview, there are a number of features which will sway people one direction or another. For example, Datazen doesn’t integrate with Power Pivot models; you have to flatten the model to connect. On the plus side, Datazen has a very robust security and distribution model, which Power BI Preview does not. There are more visualization types in Datazen, and the snap-to-grid makes using them a breeze. Power BI has the ability to connect with QuickBooks and other data sources such as Salesforce, which Datazen cannot do. Datazen is included with a current SQL Server maintenance agreement; Power BI will have an ongoing per-user monthly fee. Both have the ability to allow users to pick their own colors, rather than limit the selection to a series of color palettes, and both deploy to tablets and phones in addition to a web page. All of these factors, plus a whole lot more than I have space for here, weigh into the decision of which product you may select. If the foremost criterion for mobile self-service BI is no cloud, Microsoft’s solution in the self-service BI space is: use Datazen.

Future is Cloudy

Recently most of Microsoft’s new products have been very cloud-based. Azure Machine Learning is a great example of this, as nowhere on Microsoft’s site can you find any document about plans to move it out of the cloud. The purchase of Datazen shows Microsoft is still accommodating people who just want to stay in house, which I missed in their announcement. I’ll be reading the upcoming releases a little more carefully from now on, as I am sure there will be many more.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Azure Data Lake: Why you might want one

On April 29, 2015, Microsoft announced they were offering a new product, Azure Data Lake. For those of us who know what a data lake is, one might have thought that a new data lake product was perhaps redundant, because Microsoft already supported data lakes with HDInsight and Hadoop. To understand why you might want a separate product, let’s look at what a data lake is. I think the best definition of a data lake that I read recently was here. Here’s the TL;DR version: “A ‘data lake’ is a storage repository, usually in Hadoop, that holds a vast amount of raw data in its native format until it is needed.” Ok, so here’s the question: one can already spin up an HDInsight Hadoop cluster on Azure and put all of your data there, which means you can already create a data lake. Since you can already create a data lake, why did Microsoft go and create a new product?

Hardware Optimization and the Data Lake

If you look at Microsoft’s most recent Azure releases, you’ll see they are releasing products designed to operate together. Service Bus, Event Hubs, Streaming Analytics, Machine Learning and Data Factory are designed to process lots of data, especially a lot of short pieces of data, like vehicle GPS messages or other types of real-time status messages. In the product release for Azure Data Lake, Microsoft highlights its ability to store and, more importantly, retrieve this kind of data. The difference between the HDInsight already on Azure and the Data Lake product is the hardware dedicated to the storage and the integration designed to improve access to the data. Data Factory is designed to move your data in the cloud to anywhere, including a data lake. If you look at the graphic Microsoft provides to illustrate what Data Factory is designed to integrate, the rest of the outputs listed have products associated with them. Now there is a product associated with the data lake too.

Data lakes are designed to store all data, but unlike a database or an operational data store, data lakes are designed to have the schema applied when the data is read, not when the data is written. This allows for faster writing of the data, but it does tend to make accessing the data slower. The Azure Data Lake hardware, according to the release, is designed to address this issue by providing computing power designed for massively parallel processing, applied when the data is read and analyzed, not when it is written. This sort of targeted computing power differs from the HDInsight Hadoop offering, which uses a standard hardware model for storage and access. By tailoring the hardware to the needs of the specific type of data stored, in theory this will greatly improve performance, which should increase the adoption of not only the Azure Data Lake but also the tools to analyze and collect the data. It’s going to be interesting to see how the marketplace responds, as this could really push massive amounts of data to the Azure cloud. Time will tell.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur


Azure ML, SSIS and the Modern Data Warehouse

Recently I was afforded the opportunity to speak at several different events, all of which I thoroughly enjoyed. I spoke on Azure Machine Learning first at the Arizona SQL Server Users Group meeting. I really appreciate all who attended, as we had quite a crowd. Since the meeting is held practically on Arizona State University’s Tempe campus, it was great to see a number of students attending, most likely due to Ram’s continued marketing efforts on meetup.com. After talking to him about it, I was impressed at his success at improving attendance by promoting the event on Meetup, and I wonder if many SQL Server User Groups have experienced the same benefits. If you have, please let me know. Thanks, Joe, for taking a picture of the event too.

Modern Data Warehousing Precon

The second event where I had the opportunity to talk about technology was the precon at SQL Saturday in Huntington Beach, where I spoke about Modern Data Warehousing. It was a real honor to be selected for this event, and I really enjoyed interacting with all of the attendees. Special thanks to Alan Faulkner for his assistance. We discussed the changing data environment, including cloud-based storage, analytics, Hadoop, handling ever-increasing amounts of data from different sources, and the increasing demands of users, and we reviewed technology solutions that demonstrate ways to resolve these issues in attendees’ environments.

Talking and More Importantly Listening

The following day was SQL Saturday #389 in Huntington Beach. Thanks to Andrew, Laurie, Thomas and the rest of the volunteers for making this a great event, as I know a little bit about the work that goes into planning and pulling off an event like this. My two sessions, Azure ML: Predicting the future with Machine Learning and Top 10 SSIS Tuning Tricks, were both selected, and I had a great turnout at each. To follow up on a question I received during my SSIS session: Balanced Data Distributor was first released as a new SSIS transform for SQL Server 2008 and 2008 R2, so you can use it for versions prior to SQL Server 2012. I’ve posted more information about it here. I also got a chance to meet a real live data scientist, the first time that has happened. Not only did I get a chance to speak but also a chance to listen. I really enjoyed the sessions from Steve Hughes on Building a Modern Data Warehouse and Analytics Solution in Azure, from Kevin Kline, and from Julie Koesmarno on Interactive & Actionable Data Visualisation With Power View. As always, it’s wonderful to get a chance to visit in person with the people whose technical expertise I read. In addition to hearing technical jokes which people outside of the SQL community would not find humorous, it’s great to discuss technology with other practitioners. Thanks to Mr. Smith for asking me a question to which I didn’t know the answer, which I now feel compelled to go find. I’ll be investigating the scalability of Azure ML and R so that I will have an answer for him the next time I see him. I really enjoy the challenge of not only investigating and applying new technology but also figuring out how to explain what I’ve learned. I look forward to the opportunity to present again, and when I do I’ll be sure to update this site so I hopefully get a chance to meet the people who read it.
Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

What is a Modern Data Warehouse?

As I was honored enough to be selected to give a precon on the Internals of the Modern Data Warehouse, I thought I would take the time to explain why I felt drawn to the topic. There are a lot of places that haven’t given much thought to the changes in technology which have happened over the last few years. The major feature upgrades to SQL Server in 2012 and 2014 mean that they can use columnstore indexes, which make things faster, and perhaps get better High Availability. While those things are certainly valuable improvements, there is a lot more you can do to derive value from your data, and companies want more than just a well-organized, running data warehouse.

Data is a Valuable Asset

In 2010, Borders Group Inc. was allowed by the Federal Trade Commission to sell their customer information to Barnes and Noble as part of their bankruptcy sale of their assets. In 2015, RadioShack is doing the same thing. Businesses understand that data is valuable and they are interested in using it to drive decision making. Amazon, Netflix and Target are well known for their use of customer information to drive sales, but they are far from the only ones doing this. This is one of the bigger trends identified recently in the business press. The heads of companies are now looking for their data teams to do more with their data so that they too can have the dream information systems they are reading about.

Total Destruction of the Existing DW is Not Required

While a lot of the time it might be nice to level everything and start over, that is not always an option. The major reason for this is that the data warehouse environment already in place has a lot of value. You want to add to the value already there, not destroy what you have. Also, it would take a long time to recreate the environment, and no one is patient enough to wait for that. Alternatively, you could expand into areas of new technology as your data grows. Perhaps this means you archive some of your data from your database to a Hadoop cluster instead of backing up the data in some far-off location. This would allow you to use Sqoop to bring the data back when you need it, providing ready access to the data. Perhaps you want to provide the users more self-service BI capabilities, moving the data analysis into the hands of the people who are most familiar with the data. You could add the capabilities of Power View in Excel, Power BI Designer or Tableau to your environment.

Incorporating Social Media Information

The business world no longer operates only on a batch cycle. More and more companies want to know what is being said about them so they can respond appropriately. With tools like Azure Event Hubs, Data Factory, Streaming Analytics, and Machine Learning, this isn’t as hard to do as it might sound. We’ll review these products so that attendees understand how these tools can provide greater insight not only into their own data, but into the data building up about them outside of the company firewall.

For More Information

I really hope you can join me in Huntington Beach on April 10 for a full day of exploring these concepts. I always look forward to events like the precon and of course SQL Saturday #389 – Huntington Beach which is the following day.


Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Licensing of Power BI moves away from Office 365

The gap between Power BI and Excel keeps getting wider. As there is conflicting information about the Excel/Power BI break-up in various places on the internet, I wanted to clarify some of the common discussion questions. One place where you can get definitive answers is the new Licensing Information for Power BI, which I linked in case you missed it. Unfortunately for those of us who have been paying the higher fees for Power BI, the price reduction to ten dollars isn’t immediate. Although I am disappointed, the non-immediate fee reduction makes sense, since the new Excel-free version of Power BI is still a preview edition. No one outside the US is able to even try it yet. The new pricing will be available when the new product is available. This also gives people a chance to migrate their existing reports to the new version of Power BI. Another way of saying this is: going forward, you don’t need Excel or the four Powers – Power Query, Power Map, Power Pivot and Power View – anymore. The only thing you’ll need is the new Power BI.

No SharePoint needed for Power BI

Another thing that the licensing document makes clear is that when the new Power BI is released, the Office 365 version of SharePoint will not be required. To be even clearer: SharePoint will not be needed to use Power BI. There are several places online where I have read conflicting information regarding the need to have SharePoint, so let me clarify by quoting from Microsoft’s Licensing Information for Power BI page: “Power BI service will become a standalone service and will no longer require SharePoint Online”. Since the current version of Power BI uses SharePoint, if this is the only reason you have Office 365 SharePoint, you can get rid of SharePoint, which will be an additional cost savings. How much will it cost? Talk to Microsoft support, as the details must be worked out with them.

Why Did Microsoft Change Power BI to not use Excel ?

While at SQL Saturday in Albuquerque, which was even better than my high expectations, I had a chance to talk to someone from Microsoft, and of course Power BI came up. I asked why he thought Microsoft moved Power BI away from Excel. While not divulging anything covered under an NDA, he mentioned that there were a lot of people who would like to use the features of Power BI, but they didn’t have the right version of Excel within their organizations. Microsoft removed this barrier to adoption by moving to a non-Excel version. Excel also had a lot of features that weren’t needed for data visualization, and support for some of the current features was sort of confusing. I agreed with him. For example, there are three different ways of creating a data connection, which is definitely confusing.

Scheduled Updates in Excel

For those people who like the Power BI add-ins to Excel and want to stay with them, there is one big issue: scheduled updates. Only with Power BI deployed to the Office 365 cloud SharePoint can you get scheduled updates from all the places from which you might be retrieving data. For all those people who, for a variety of reasons, didn’t want to deploy Power BI that way, there is now a solution. If you are looking to update Excel, Power Update is what you need. Kudos to Rob Collie for providing this solution, as I have heard from a number of people that they had SharePoint and didn’t want to go to the cloud for Power Query updates, or didn’t want to have to deploy SharePoint at all. Keeping up with Excel and Power BI is now a wider world than just Microsoft. The one thing you can count on is that things will always be changing.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Presentation Follow Up to Data Analytics and Distribution with Power BI

Thank you to all of the people who took the time to view my session on Data Analytics and Distribution with Power BI. I really enjoy getting the chance to decrease the confusion I hear regarding Power BI, and hope that you will find the question and answer section helpful if you are trying to learn more about the product.

Questions and Answers

In Power BI Designer, is it possible to manipulate the colors of the charts?

While this may change as Power BI Designer is still in preview mode, currently the colors are assigned automatically. As you might guess, this is a feature that other people are interested in so it is on Microsoft’s list of things to add. If Power BI Designer color selections are implemented like they are in Power View, which is very similar to Power BI Designer, it is likely themes of colors will be available, rather than the ability to pick each color like in Report Designer, but we will have to wait and see.

Are Power BI designer/dashboard changes specific to each user?

If you have created a Power BI Designer dashboard, you have the ability to share it with people in your organization. There are a couple of things that need to happen for this to work: the people you share it with must have Power BI accounts, and they have to be in the same domain as you are. When you share the reports, the recipients are only able to read them, not edit them. For more information regarding security and Power BI, see Microsoft’s guide here.

Can I grant access to users outside of our domain?

Power BI’s security model is a separate tenant from the security model for SharePoint in the Office365 cloud, but they are related as you can only grant access to Power BI if those users are able to access your version of Office 365 SharePoint. As stated in the previous question, for Power BI Designer Dashboard, the users must be part of the same domain.

How does Power BI perform predictions? Is it the same logic which is used in Data mining?

Power BI uses the Forecasting and Hindcasting features to perform predictive analytics. There are a number of different analytical categories, and the kind used in Power BI is Time Series analysis. As the name suggests, Time Series models analyze a set of measurements performed over time to determine patterns in the past which can be used as guides going forward. Data Mining looks at variables, which may or may not include time, as it looks for patterns throughout the data. These underlying statistical models are not the same.

Does Power BI have Power Map feature?

Power BI definitely contains Power Map. In fact, Power Map is only supported in preview mode if you do not have Power BI. This link can provide more information about the limits of Power Map in Excel. Power Map is designed to be run as a movie, providing a directed look at the data on the map, rather than the interactive drill-down mapping features available in Power View. You can share a Power Map by saving it as an mp4 video file and posting it anywhere. There are a number of Power Maps on YouTube if you care to search there.

How does “R” play here?

The R language is completely agnostic as to the source of its data; you have the ability to use Excel as a data source if you want to. If you want to use R within Excel, try the RExcel add-in, which is available here.

Is the PowerPivot where the data is stored for Power BI Designer?

No. Data can come from anywhere, not just from Power Pivot in Excel. For example, if you want to use a website as a data source, you can do that too; there are a number of different available data sources, and that is just one of many.

Do you have to use Power BI Designer on the Web?

While Microsoft has designed Power BI Designer as a web project, so that you can create Power BI Designer dashboards as part of the preview, there is also an application available for download here. The desktop application works very similarly to the web version, with the exception that you will need to upload and select a Power BI Designer file as your data source. Should you wish to modify the dashboard once it is loaded, you can do so.

How would you determine anomalies or freak instances in data versus true trends?

The problem of determining anomalies is one with which the practitioners of predictive analytics constantly struggle. For a trend to emerge, the number of occurrences of what was previously considered an anomaly needs to increase. Forecasting within Power BI applies one of the more standard methods for accounting for anomalies: looking at standard deviations and probability. The likelihood that a number will fall within a certain range is based upon the number of times this has happened in the past, which is graphed as a bell curve. The values at the far sides of the bell curve are discarded as being outside the expected variance, which in Forecasting in Power BI is represented as 1δ. That number increases with the amount of variance you wish to represent.
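For the statistically inclined, the usual normal-distribution coverage figures give a feel for what those bands mean; this is textbook probability, not anything specific to Power BI’s implementation, written here with δ for the standard deviation to match the notation above:

\[
P(\mu - \delta \le X \le \mu + \delta) \approx 0.68, \qquad
P(\mu - 2\delta \le X \le \mu + 2\delta) \approx 0.95, \qquad
P(\mu - 3\delta \le X \le \mu + 3\delta) \approx 0.997
\]

So a 1δ band is expected to contain roughly two-thirds of the observed values, and widening the band increases that coverage.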

Does SharePoint on premise support Power BI Designer?

No. Power BI Designer is currently in preview version in the US for Power BI subscribers. You can download the application to play with it.

Will Power BI be available in next version of SharePoint?

While I cannot speak for Microsoft, I can tell you that it isn’t there now. For more information on SharePoint, check out their website here.

Does this work in Office 365 SharePoint? And does this replace the BI features offered in SharePoint on the cloud?

Since I do not work for Microsoft, I am hesitant to talk much about how their licensing plans really work. For more information, please check out their website.

If we want to start learning Power BI, where do we start?

There are a number of great places to learn about Power BI, the best and most up to date being here. I have included some other places where you might want to go to learn more about Power BI.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

The Changing Forecast of Power BI

On Thursday, I have the opportunity to talk about the parts of Power BI which reside outside of Excel. I’ll be showing some of the web components of Power BI and how to use them. Because the install base of the Power BI tenant is not nearly as high as the number of people who have used the Power BI components within Excel, this webinar will highlight the more exotic features. When I signed up to do this webinar, I planned to showcase forecasting and hindcasting, as they are neat ways of visually showing predictive data analytics. Having used them before, I thought that demoing their capabilities once again would be a good way to show some of the neat reasons you might want to use Power BI. Unfortunately, I won’t be able to demo these features.

Forecast Maintenance

Imagine my surprise when I saw this announcement from Microsoft stating that the Power BI forecasting feature was “offline for maintenance”. Since the announcement came in December, I originally thought that this would be no big deal. Surely the feature would be back online soon. I thought wrong. The forecasting feature is gone. When using Power BI, the option never appears. It’s like it never existed. In the excitement to release Designer, the non-Excel dashboarding feature, the disappearance of the previously released tool has not been highlighted, until now.

The Dark Cloud of Iterative Releases

When the forecasting feature was released in May, there was a lot of buzz about the fact that Microsoft was able to release a great new feature in time to coincide with the PASS BA Conference. Well, the pulling of this feature is the nimbostratus of cloud releases, bringing rain. There are a lot of features available for me to show in Power BI, so I still have plenty of demo material, but I find this development rather troubling. Here’s hoping that this giving and taking of features in Power BI doesn’t repeat itself.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Power BI’s First Year in Review

Since 2014 was the year Power BI was released and a lot of changes have been made since then, I thought it might be interesting to review them. For all who saw this headline and thought, “well, technically Power BI is not really a year old because it was released in February”, allow me some creative license to perform some rounding. The number of changes released points to the beauty of the iterative release schedule, which allows changes to be rolled out whenever Microsoft is done with them rather than at fixed intervals, which means changes happen often. There were more updates made than I have listed here, but space and TL;DR limited me, so only selected items are reviewed.

Timeline

January 6th – Microsoft releases pricing info for Power BI. This was one of the final steps taken before the product was released.

February 10th – Power BI is released. Since Power BI was revealed and available for preview on July 8, 2013, the release wasn’t a surprise. Having a released product meant that the previews people had been playing with for free for a while would be going away soon.

March 3rd – There were a lot of little updates in this release. The one that I thought had the most impact was the set of changes made to HTML5 rendering, which got a lot better. Q&A features were also notably improved. Microsoft touted the new connectivity to Business Objects; for those people who wanted it, that is probably quite useful.

April 1st – SQL Server 2014 is released. I realize that this isn’t directly related to Power BI, but the products are designed to work together, which is why I chose to include it.

April 18th – Microsoft announced that they were not going to remove Power Map from non-Office 365 versions of Excel. In January, when Microsoft announced that Power BI was going to be released, they also made this announcement: “On May 30, 2014 if you have the Power Map Preview installed, it will no longer work in any non-Office 365 subscription version of Excel”. As you might guess, this announcement made people unhappy, and they complained to Microsoft. These complaints were heard and the policy was changed. There is a perception sometimes that when you complain, no one hears you. At least this one time, complaints were heard and changes made. Here’s the caveat for Power Map for Excel users: you don’t get any of the new Power Map upgrades. The cool features added later can only be used by Power BI users, like the ability to add sound, a new Power Map feature in this release.

May 8th – Forecasting in Power View was demoed at the PASS Business Analytics Conference. Forecasting added predictive analytics to Power BI. What was nice about the new algorithm, which is not data mining, is that it also included hindcasting to validate the predictions. More information on this feature can be found here.

May 29th – The Power Query updates here made it easier to query information from the Azure Marketplace, and Power Query now remembers what you queried recently, which was most helpful. Q&A was also updated to handle a wider range of natural language queries.

August 19th – Skipping over the summer release items, the data source changes included in this release were really cool. Now you can schedule refreshes from a number of new sources.

September 9th – Custom maps were added in this update. Now, instead of maps being limited to the world, you can map the inside of buildings. I know of a casino operation that really wanted this earlier. This is another cool feature not available to Office 365 Excel users.

September 30th – The Power Query update can be summarized as seeking and spamming. Finally, Microsoft made it possible to not get updates when you didn’t want them. The data source settings can now be sorted and searched, which is really helpful when you have a lot of sources.

October 13th – Microsoft is making nice with the juggernaut Salesforce by adding Power Query support for the data within it. Since it seems Salesforce is everywhere, being able to query the data will definitely be handy.

November 13th – Updates to Power Map mean that you can now play individual scenes instead of playing through the entire tour. This is a most helpful feature when editing a Power Map.

December 19th – New dashboards and native Power BI for the iPad. Microsoft now has an Apple app so that you can see your data visualizations on Apple devices, just in case you wanted to do that. While still in their infancy, the new dashboards are the first real break Power BI is making from Excel, as you no longer need it to create visualizations. I wrote about that new feature here.

New Year, New Updates

Looking back at everything that has happened this year, I can only imagine what next year will bring. Given the current preview-release status of the new dashboarding feature, it would be logical to assume that more changes will be added to it. To be sure, when new things are added to Power BI, you will be able to find out more about them here.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur