Articles

Exploring Azure Synapse

In the past few months, I have been examining Azure Synapse and what it can do.  When it was first released in November of 2019, the first functionality that was rolled out was an update of Azure SQL DW.  For this reason, many people think that Synapse is just an improved version of a cloud data warehouse.  Microsoft did improve SQL DW when it moved it to Synapse.  The biggest architectural design change is the separation of the code from the compute, a theme with many web projects, which allows the compute power to be increased when need dictates and scaled down when computing needs change.  Within Synapse, resources are allocated as Pools and you can define a sql pools to run data warehouse and later change the compute to a different resource.  You will still need to partition your DW as large datasets require partitioning to perform well.  Subsequently Microsoft Released the Azure Synapse Studio to be a container for a larger environment of tools and notebooks to interact with them.

Non-Data Warehouse Elements of Azure Synapse

To me the more interesting parts about Azure Synapse have nothing to do with data warehouses.  Azure Synapse also contains the ability to query files stored in Azure Data Lake Gen 2 as if they were SQL files. This is a great way to analyze large data without first cleaning it up and putting it into a relational environment. Within Synapse you can formulate a query using syntax for selecting parts of files, providing the ability to look at many files as if they were one. You can also create processes which bring data into your synapse environment using Orchestration, a process that people who are familiar with Azure Data Factory will find very familiar. Synapse also contains the ability to analyze data in Cosmos DB without doing ETL or moving the data at all using a scalable architecture which will not impact the transactions being processed simultaneously on the same Cosmos DB.

Azure Synapse and Spark

Check out the promotional video here

By far the most interesting component of Azure Synapse is the Spark connection. Microsoft has added the ability to create Spark Pools into Azure Synapse.  To be honest I was somewhat surprised that this functionality is included here first and not in Azure Machine Learning, where to use Spark you need to access clusters created them in Databricks.  Spark provides the ability to dynamically scale resources when running processes.  This is very handy when writing machine learning code which can really use the performance improvements Spark brings.  Because this is Microsoft’s Spark, you can also write your code to access it in .Net if you like, in addition to the more common Spark Languages, Scala, R or Python.  You can also incorporate the AutoML API created for Azure Machine learning in R and Python so that you can use the power of Azure to select your algorithm and hyperparameters instead of spending time doing it yourself.

Getting up to Speed with Synapse

There is a lot to learn when it comes to Synapse as it combines a lot of different components into one environment. As more and more data is being migrated to the cloud, it is uniquely designed to handle both big data components containing raw data, managed data lakes as well as more traditional data warehouse needs.  It can also be the location where all of the data is processed, secured, cleaned and analyzed using Machine Learning. There is a lot to cover and since it is new, there is not a lot of places yet where you can learn more about it.  If you are interested in a deep dive on Azure Synapse and how to use it in a Modern Data Warehouse, sign up for my precon at PASS Summit 2020 where I will cover the topic in depth.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Gaining Better DAX Skills

If you work with Excel Pivot Tables, Power BI, or Analysis Services Tabular more than likely you are working to improve your DAX skills. I know I am. While I do a lot of work with DAX, I am always looking to get better at writing it and better at explaining how to write it for clients or class attendees.  As I find things which I think others may find helpful, I write about them here to help spread the knowledge, especially to those people who have background in SQL.

DAX History

Microsoft has been working on the DAX language for about 10 years and released it in as part of Excel 2010 Power Pivot Add-in, back when Microsoft was incorporating Power BI functionality in Excel. Microsoft wrote DAX so that it would be like Excel so that people who are familiar with Excel would have an easier time writing it. While DAX is easier than MDX, it is enough different from Excel that it can get quite complex. One of the founders of the DAX language is Jeffery Wang, who is still actively working on enhancing the language further.  He works not only with developers but also people in the technical community who work extensively with DAX to get their input. I also found out recently that he took time out to do a technical review of a book on the DAX language by Phil Seamark b | t.

Leveraging SQL Knowledge to Learn DAX

If like me, your skills like not in Excel but in SQL, reading a book which builds upon SQL knowledge to understand DAX is really helpful.  Beginning DAX with Power BI provides examples of how to write something in SQL then shows how the same functionality would be written in DAX, sort of like a SQL-to-DAX language library.  Showing how to get the answer from SQL and DAX is a great technique to provide a better understanding of how to write DAX. I worked a client recently who validated all of the DAX in the tabular model by writing SQL to ensure they were getting the correct answers. I think the validation process would have been improved with this book which takes DAX out of the black magic category and explains the way it works.

Performance Tuning DAX

Once you start writing a lot of DAX, you will at some point write a query which performs badly.  What do you do to avoid poorly performing DAX? Phil provides some “old-school debugging” techniques to understand what is being generated and where it is going wrong. He also shows how you can use DAX Studio and SSMS to help figure out how and why your DAX is working the way it is. You will definitely want to reference Chapter 8 to learn more about this important topic.

Other Methods for Learning DAX and Improving Power BI skills

If you are looking for a hands-on environment to better understand how DAX works and how to improve your Power BI skills I have two opportunities for you in the US. I will be giving Advanced Power BI Classes in Columbus, Ohio on July 27 and Boston, Massachusetts on September 20.  Of course we will be covering more advanced DAX development and tuning as well as applied visualization theory to improve the interactivity of your Power BI reports as well as hands on labs to implement the new drill-through features and enhancements in Power BI and other class topics.  I hope to meet you there or at some of the other conferences I will be attending this year.  Check out my Engagement page and hopefully find a place where we can meet up as I would love to meet you.

Yours Always,

Ginger Grant

Data aficionado et SQL Raconteur

 

 

Applying Data Science to SQL Server

Data has been getting a lot of attention in the business world for a while now. First there was big data, which was another way to store data so that later the data could be analyzed. Recently the talk has been all about analyzing the data with new tools such as R and Python. The reality is that people who have been working with databases doing work in business intelligence have been analyzing data for a while. Learning a different toolset for analyzing data is not such a big leap, but an expansion of what they know.

As the field is rapidly expanding now, and demand is huge, now is a great time to learn the tools. With the advent of more advanced software, it seems essential to create fast and reliable processing mechanisms. In addition, concepts such as Web3 seem to be developing and becoming realities at an accelerated rate. Data plays an essential role in the development of such marvels of information technology. It is high time companies consider combining traditional data science tools, like MATLAB, Apache Spark, and SQL, with modern tools, like web3 sql. This could help them organize and process the data at relative speeds while reducing data analytics costs.

Traditional Data Science Development

Data scientist have created analysis solutions with data for a number of years. The data is analyzed, cleaned, processed with various algorithms, and results are created. When the process is complete, code has been created to provide meaning from a portion of the data and is ready to be migrated to production. Traditionally there has been a big gap between creating a solution and implementing the solution to be run against data on a regular basis. Data Scientists traditionally are not part of the IT organization, they are actuaries or analysts, not the people who have anything to do with system processing. Recently I did some work for a company and after the data scientists were done creating a solution, they turned over all of their code to the Java team. Six weeks later the code was released into production. This solution made no one happy. Management thought it took too long. The data scientist didn’t believe that the code that they created was what was implemented into production, and the java developers were tired of people blaming them for wrong code which required a long time to implement.

SQL Server Implementation of Data Science

Since SQL Server 2016 incorporates R and SQL Server 2017 has added the ability to include Python code into SQL Server, data science solutions can be incorporated as part of a scheduled process with SQL Server. There is now a dev ops solution for incorporating R and Python into SQL Server. One way of learning about the technology is through blogs and other online training which can help you get up to speed. Many times though there is no substitute for hands on learning. If you are attending PASS Summit 2017, and want to learn not only about data science, but how to incorporate it into SQL Server, I hope you can sign up for my all day training session on Applied Data Science for the SQL Server Professional. I hope to see you there.

I have recently created a You Tube channel where I plan on sharing more data related content where I have included my first video about this conference.

If you are at PASS Summit, please introduce yourself as I would love to meet people who read my blog personally.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Power BI – Beyond the Basics

When helping clients recently with their Power BI implementations, I have noticed that when talking to people about Power BI there seems to be some areas where there continues to be a log of questions.  While it is easy to find a plethora of information about getting started with Power BI, when it comes to implementing a solution, the information is scarce.  How do you handle releases? Should an implementation contain only one data model? Is Power BI’s data secured on the cloud? Is it required to have Office 365 use Power BI? Do you have to have Power BI Premier to have the Power BI run locally?

Advanced Power BI Techniques in Norway

While I have discussed some best practice techniques on my blog, as usual new features released in Power BI have a

Norway Parliament Building in Oslo

Norway Parliament Building in Oslo

tendency to change some of the available options.  For example, App workspaces, the updated take on Content Packs released a few months ago, now offer a new method for releasing not only dashboards but the reports behind them and the ability to easily migrate sources. I am excited that I will have the opportunity to discuss the answers to the questions received by doing a full day of training at SQL Saturday Oslo. I am looking forward to visiting Oslo, which is home to the best preserved Viking Ship, an Opera House designed to be walked on and the home of the guy who painted the Scream.  If you happen to reside somewhere where it is possible to make the journey to Norway, please register to attend this full day of interactive training.  We will cover all of these items and go into detail about Power BI administration, security and new features and design techniques which will improve Power BI implementation techniques.

sqlsat667_osloFor those of you who are unable to attend, I feel obliged to answer some of the questions I posed earlier.  Implementations generally require more than one data model.  Power BI is encrypted both in transit and at rest. You do not need to have Office 365 to run Power BI.  Power BI can be run locally with Power BI Report Server, which is part of SQL Server 2016 Enterprise with Software Assurance, and you do not need to sign up with Power BI Premier to install it.

I hope to see you in Norway.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Upcoming and Recent Events

24HOPPassSpeakingThe PASS organization is a professional organization which sponsors a number of different technical events in the technical community. Recently, I have been honored to be selected to speak at not one but two events hosted by PASS, a professional organization which provides a lot of great resources to improve knowledge of all things SQL Server and related technologies to the world. The PASS Business Intelligence Chapter provides training on all things related to Business Intelligence via the web. I was selected to talk at the last meeting in May. Thank you to all of the people who were able to attend my talk on Top 10 SSIS Tuning Tricks live. If you had to work, no problem all of the talks hosted by the PASS Business Intelligence Virtual Chapter Recordings are available on www.Youtube.com. The recording of my Top 10 SSIS Tuning Tricks session is available here.

24 Hours of PASS

Periodically PASS provides a 24 Hour Training session on SQL Related topics to provide training live to every time zone in the world. As this event is watched by people around the world, it is a real honor to be selected for this event. This time the speakers were selected from people who had not yet spoken at the PASS Summit Convention, as the theme was Growing Our Community. The theme is just another way the PASS organization is working to improve people’s skills. Not only do they provide the opportunity to learn all things data, but also provide professional development through growing the speaking skills by providing many avenues to practice these skills.

Data Analytics with Azure Machine Learning

My abstract on Improving Data Analytics with Azure Machine Learning was selected by the 24 Hours of PASS. As readers of my blog are aware, I have been working on Azure Machine Learning [ML] this year and look forward to discussing how to integrate Azure ML into current environments. Data analytics with ML are yet another way to derive meaning from data being collected and stored. I find the application of data analytic fascinating, and hope to show you why if you are able to attend. There are a number of wonderful talks scheduled at this event, so I encourage you to check out the schedule at attend as many as you can. To be sure I’ll be signing up for a number of sessions as well.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Azure ML, SSIS and the Modern Data Warehouse

Recently I was afforded the opportunity to speak at several different events, all of which I thoroughly enjoyed. I was able to speak on Azure Machine learning first at the Arizona SQL Server Users Group meeting. I really appreciate all who attended as we had quite a crowd. Since the meeting is held MachineLearningTalkpractically on Arizona State University’s Tempe Campus, it was great to see a number of students attending, most likely due to Ram’s continued marketing efforts on meetup.com. After talking to him about it, I was impressed at his success at improving attendance by promoting the event on Meetup, and wonder if many SQL Server User Groups have experienced the same benefits. If you have, please let me know. Thanks Joe for taking a picture of the event too.

Modern Data Warehousing Precon

The second event where I had the opportunity to talk about technology was at the Precon at SQL Saturday in Huntington Beach, where I spoke about Modern Data Warehousing. It was a real honor to be selected for this event, and I really enjoyed interacting with all of the attendees. Special thanks to Alan Faulkner for his assistance. We discussed the changing data environment including cloud based storage, analytics, Hadoop, handling ever increasing amounts of data from different sources, increasing demands of users, the review of technology solutions demonstrate ways to resolve these issues in their environments.

Talking and More Importantly Listening

The following day was SQL Saturday in Huntington Beach #389. Thanks to Andrew, Laurie, Thomas and the rest of the volunteers for making this a great event as I know a little bit about the work that goes into planning and pulling off the event. My sessions on Azure ML, Predicting the future with Machine Learning and Top 10 SSIS Tuning Tricks were both selected and I had great turnout on both sessions. To follow-up with a question I received during my SSIS Session, Balanced Data Distributor was first released as a new SSIS transform for SQL Server 2008 and 2008 R2, so you can use it for versions prior to SQL Server 2012. I’ve posted more information about it here. I also got a chance to meet a real live data scientist, the first time that has happened.  Not only did I get a chance to speak but a chance to listen. I really enjoyed the sessions from Steve Hughes on the Building a Modern Data Warehouse and Analytics Solution in Azure, Kevin Kline on , and Julie Koesmarno on Interactive & Actionable Data Visualisation With Power View. As always it’s wonderful to get a chance to visit in person with the people who’s technical expertise I read. In addition to listening to technical jokes which people outside of the SQL community would not find humorous, it’s great to discuss technology with other practitioners. Thanks to Mr. Smith for providing me a question which I didn’t know the answer, which now I feel compelled to go find. I’ll be investigating the scalability of Azure ML and R so that I will be able to have an answer for him next time I see him. I really enjoy the challenge of not only investigating and applying new technology but figuring out how to explain what I’ve learned. I look forward to the opportunity to present again, and when I do I’ll be sure to update this site so hopefully I get a chance to meet the people who read this.
Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

What is a Modern Data Warehouse?

As I was honored enough to be selected to give a PreCon on the Internals of the Modern Data Warehouse, I thought that I would take the time to explain why I felt drawn to the topic. There are a lot of places that haven’t given much thought to the changes in technology which have happened over the last few years. The major feature upgrades to SQL Server in 2012 and 2014 have meant that they can use column store indexes which makes things faster and maybe better High Availability. While those things are certainly valuable improvements there is a lot more that you can do to derive value from your data and companies want more than just a well-organized, running data warehouse.

Data is a Valuable Asset

In 2010, Borders Group Inc. was allowed by the Federal Trade Commission to sell their customer information to Barnes and Noble as part of their bankruptcy sale of their assets. In 2015, RadioShack is doing the same thing. Businesses understand that data is valuable and they are interested in using it to drive decision making. Amazon, Netflix and Target are well known for their use of customer information to drive sales, but they are far from the only ones doing this. This is one of the bigger trends identified recently in the business press. The heads of companies are now looking for their data teams to do more with their data so that they too can have the dream information systems they are reading about.

Total Destruction of the Existing DW is Not Required

Excavator working with earth and sand in sandpitWhile a lot of the time, it might be nice to level everything and start over, that is not always an option. The major reason for this is that the data warehouse environment already in place has a lot of value. You want to add to the value already there, not destroy what you have. Also it would take a long time to recreate the environment and no one is patient enough to wait for that. Alternatively you could expand into areas of new technology as your data grows. Perhaps this mean you archive some of your data from your database to a Hadoop cluster instead of backing up the data in some far off location. This would allow you to use Sqoop to bring the data back when you need it, providing ready access to the data. Perhaps you want to provide the users more self-service BI capabilities, moving the data analysis into the hands of the people who are more familiar with the data? You could add the capabilities of Power View in Excel, Power Designer or Tableau to your environment.

Incorporating Social Media Information

The business world operates not only on a batch cycle. More and more companies want to know what is being said about them so they can respond appropriately. With tools like Azure Event Hubs, Data Factory, Streaming Analytics, and Machine Learning this isn’t as hard to do as it might sound. We’ll review these products so that attendees will understand how these tools can provide greater insight not only into their own data, but the data building about them outside of the company firewall.

For More Information

I really hope you can join me in Huntington Beach on April 10 for a full day of exploring these concepts. I always look forward to events like the precon and of course SQL Saturday #389 – Huntington Beach which is the following day.

 

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Complex Data Analysis and Azure Machine Learning Presentation Wrap Up

Thank you for all of the people who signed up for my webinar on Data Analysis with Azure Machine Learning [ML]. I hope after watching it that you find reasons to agree that the most important thing you need to know to get started in Machine Learning is not Math, but having good knowledge of the data you want to analyze. There’s no reason not to investigate as Azure Machine Learning is free.  In order to take more time with the questions after the presentation than the webinar format allowed,  I am posting my answers here, where I am able to answer them in greater detail.

How would one choose a subset of data to “train” the model? For example, would I choose a random 1000 rows from my data set?

It is important to select a subset of data which is representative of the data which wish to evaluate. Sometime a random 1000 rows will do that, and other times you will need to use other criteria, like transactions throughout a given date range to be a better representative sample. It all comes down to knowing your data well enough to know that the data used for testing is similar to what you will be ultimately using for analysis.

Do you have to rerun or does it save results?

The process of creating an experiment requires that for each run you need to re-run the data as it does not save results.

Does Azure ML use the same logic as data mining?

In a word, no. If you look at the algorithms used for data mining you will see they overlap with some of the models available in Azure ML. Azure ML provides a richer set of models, plus a greater ability to either call models created by others or write custom models.

How much does Azure ML cost?

There is no cost for Azure ML. You can sign up and use it for free.  Click here for more information on Azure ML.

If I am using Data Factory, can I use Azure ML ?

Data Factory added the ability to call Azure ML in December, providing another place to incorporate Azure ML analytics. When an Azure experiment is complete, it is published as a web service so that the experiment can be called by any program which chooses to call it. Using the Azure ML experiments from directly within Data Factory decreases the need to write custom code, while allowing the logic to be incorporated into routine data collection processes.

http://azure.microsoft.com/blog/2014/12/16/azure-data-factory-updates-integration-with-azure-machine-learning-2/

If you have more questions about Azure ML or would like to see me present on the topic live and live in Southern California, I hope you can attend SQL Saturday #389 – Huntington Beach where I will be presenting on Azure ML and Top ten SSIS tips. I hope to see you there.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

 

Tips on SSIS at SQL Saturday Albuquerque

sqlsat358_ABQOn February 7, I was fortunate enough to be selected to speak at SQL Saturday in Albuquerque, New Mexico on Top 10 SSIS Tuning Tricks. Having worked with SSIS for a number of years, I’ve needed to research what was the best methods to employ to ensure my SSIS ETL was running optimally. I’ve compiled the most valuable items, with examples of course, into this presentation. I’m assuming that everyone attending already has been using SSIS for a while, so I will skip straight into more in-depth ways of tuning SSIS. One of the questions that I know I have heard most often is “When should I do X in SQL or SSIS?” If you are able to attend this session, you will have the answer to that question.

I really enjoy the opportunity to speak on data related topics and meeting people who may have come upon my blog in the past. Having spoken at this event last year, I know what a good job Keith, Chris and Meredith and friends do organizing this event. I want to take the time to say thank you for all of your hard work as I really appreciate it. These events are a great place to learn and keep up with a lot of the changes going on in the industry. I anticipate there will be many lively discussions both before and after the event. That reminds me. If you get a chance, on Friday there are two great precons scheduled on Friday, February 6th , Powershell Basics with Mike Fal and Query Tuning, Troubleshooting and Execution Plans with Jason Kassay. Having been fortunate enough to meet both of them, I know that they are both extremely knowledgeable in their respective topics, and if you are in Albuquerque I encourage you to sign up for either of them as I am sure both will be excellent.

I hope that you will be able to attend as I know I will enjoy seeing you there.

 

Yours Always
Ginger Grant
Data aficionado et SQL Raconteur

Presentation Follow Up to Data Analytics and Distribution with Power BI

Thank you to all of the people who took the time to view my session on Data Analytics and Distribution with Power BI. I really enjoy getting the chance to decrease the confusion I hear regarding Power BI, and hope that you will find the question and answer section helpful if you are trying to learn more about the product.

Questions and Answers

 In Power BI Designer, is it possible to manipulate the colors of the charts?

While this may change as Power BI Designer is still in preview mode, currently the colors are assigned automatically. As you might guess, this is a feature that other people are interested in so it is on Microsoft’s list of things to add. If Power BI Designer color selections are implemented like they are in Power View, which is very similar to Power BI Designer, it is likely themes of colors will be available, rather than the ability to pick each color like in Report Designer, but we will have to wait and see.

Are Power BI designer/dashboard changes specific to each user?

If you have created a Power BI Designer Dashboard, you have the ability to share it with people in your organization. There are a couple of things that need to happen for this to work. The people that you share it with must have Power BI accounts, and they have to be in the same domain as you are. When you share the reports they are only able to read, not edit them. For more information regarding security and Power BI, see Microsoft’s guide here.

Can I grant access to users outside of our domain?

Power BI’s security model is a separate tenant from the security model for SharePoint in the Office365 cloud, but they are related as you can only grant access to Power BI if those users are able to access your version of Office 365 SharePoint. As stated in the previous question, for Power BI Designer Dashboard, the users must be part of the same domain.

How does Power BI perform predictions? Is it the same logic which is used in Data mining?

Power BI uses the Forecasting and Hindcasting features to perform predictive analytics. There are a number of different analytical categories, and the kind used in Power BI use Time Series. As the name suggests, Time Series models analyze a set of measurements performed over time to determine patterns in the past which can be used as guides going forward. Data Mining looks at variables, which may or may not include time, as it looks for patterns throughout the data. These underlying statistical models are not the same.

Does Power BI have Power Map feature?

Power BI definitely contains Power Map. In fact Power Map is only supported in preview mode if you do not have Power BI. This link can provide more information about the limits of Power Map in Excel. Power Map is designed to be run as a movie, and provide a directed look of the data on the map, rather than providing the interactive drill down mapping features which are available in Power View. You can share a Power Map by saving it as an mp4 video file and posting it anywhere. There are a number of Power Maps on You tube if you care to search there.

How does “R” play here?

The R language is completely agnostic as to what is the source of its data, you have the ability to use excel if you want to as a data source. If you want to use R within Excel, try the RExcel add-in, which is available here.

Is the PowerPivot where the data is stored for Power BI Designer?

No. Data can come from anywhere, not just from Power Pivot in Excel. For example, if you want to use a website as a data source, you could do that too as there are a number of different available data sources, and that is one of many.

Do you have to use Power BI Designer on the Web?

While Microsoft has designed the Power BI Designer as a web project, so that you can create Power BI Designer Dashboards as part of the preview, there is also an application available for download here. The desktop application works very similarly to the web version, with the exception of course that you will need to upload and Select Power BI Designer file as your data source. Should you wish to modify the dashboard once it is loaded, you can do so.

How would you determine anomalies or freak instances in data versus true trends?

The problem of determining anomalies is one which the practitioners of predictive analytics are constantly struggling. For trending to occur, the numbers of what was previously considered an anomaly need to increase. Forecasting within Power BI applies one of the more standard methods for accounting for anomalies, looking at the standard deviations and probability. The likelihood a number will fall within a certain range of numbers are based upon the number of times this has happened in the past, which is graphed as a bell curve. The values representing the far sides of the Bell curve are discarded, which is known as variance, which in Forecasting in Power BI is represented as 1δ .The number increases with the more variance you wish to represent.

Does SharePoint on premise support Power BI Designer?

No. Power BI Designer is currently in preview version in the US for Power BI subscribers. You can download the application to play with it.

Will Power BI be available in next version of SharePoint?

While I cannot speak for Microsoft, I can tell you that it isn’t there now. For more information on SharePoint, check out their website here.

Does this work in Office 365 SharePoint? And this replace the bi feature offered in SharePoint on the cloud?

Since I do not work for Microsoft, I am hesitant to talk much about how their licensing plans really work. For more information, please check out their website.

If we want to start learning Power BI, where do we start?

There are a number of great places to learn about Power BI, the best and most up to date being here. I have included some other places where you might want to go to learn more about Power BI

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur