Articles

Creating a Successful BI Project starts with Data Modeling

PowerBIDesktopModel

Recently I was helping someone debug an Analysis Services Multidimensional project, and didn’t come up with much to help the performance. Why? The underlying data model was completely unwieldy and the fix, which no one wanted to do, was to redo it completely. Having worked recently with a number of business analysts to migrate there Excel spreadsheets to Power BI to support the growing trend to Self-Service Business Analysis, has made me think a lot about what makes a project a success.  Self-Service BI has been hyped as the way that analysis can better do their job and not involve technical resources.  While I support the move to the Analysts being more involved in with the data to make good decisions using the data, these kind of projects still need experienced data professions help them make a the project a success. There isn’t a tool which can fix a project with a bad data model. The problems the analyst have are not so much with learning the tool, as Power BI was designed to be easy to use. The problem is with data modeling.

Reporting Views; Modeling for the Moment

A lot of business reporting is developed by using the following process, which you may find where you work. The database team can’t keep up with the report requests, so they create a number of views and provide business analysis with some tool, be it Report Builder, Excel or Access to gather the data to do reports. This method provides the ability for analysts who don’t know much about data modeling to create reports based on the information is provided. This process works for a while. As long as the data people need to do their jobs is provided, reports are created and the Database team doesn’t have to be involved. This whole methodology starts blowing up over time. Why? The reporting time starts to increase.

The Reporting Time Explosion

Once I was working at a company where the person in charge of doing the performance reporting went on an extended medical leave, and trained someone else on what was required to get the data and create the reports. She gathered data from this system and that system, added in some information on a spread sheet, ran some macros did some queries, updated some Excel spreadsheets and after that the reports were generated. This process required three hours every day to do this complicated series of task and a full week for monthly reporting. It took all of about two days for her replacement to be overwhelmed, and the task of doing the reports came to me. After a week, I had gathered all of the data together for the daily reporting and automated it, which took the daily reporting process from three hours a day to seconds of computer time. It took a couple of stored procedures, some SSRS reports and a new process for storing the data not in an Excel Spreadsheet, but in the application where it was supposed to be entered. By the end of the second week, the monthly reporting was completed as well. A task which took the majority the time person spent her day, was automated to button clicks in less than two weeks. Why? The task of gathering the data was given to someone who understood databases and data modeling. That’s the knowledge that is needed to set up a successful BI Project.

Business Knowledge needs to be combined with Technical Knowledge

To be an expert at something takes time and focus. There are only so many hours in a day, and if you are focused on spending those hours on creating technical solutions, you are bound to get really good at applying technical knowledge gained to solving problems. Likewise, if you spend all of your day looking at the data trying to solve business problems and answer questions about how the decisions made impact the data, you are going to get really good at analyzing business data. Tools help provide the ability to answer questions, which can be answered because the data model supports the type of analysis needed. To figure that out, someone who knows about data modeling needs to be involved to ensure the Self Service business intelligence project has a good foundational data model.  If that’s not there, it doesn’t matter what the tool is, the project won’t be successful.

 

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

Creating a Date Dimension with a CTE

I am looking forward to SQL Server 2016, which includes an update to DAX which will generate dates for you, but since it hasn’t been released yet, the need still exists to create a date Table. Since I am creating a Tabular Analysis Service instance, an identified date dimension is needed for time functions. There are a number of different ways to create a date dimension, here I am going to take advantage of recu There are a number of ways you can create a date dimension, here I am going to describe a neat way of creating one using a recursive CTE, which decreases the SQL code required to generate one.

Date Dimension Fields

There are a number of fields which are pretty standard for date tables. The fields can vary if you need to add things like Fiscal years or Month End dates which have business specific rules like last Friday of the month. The most common fields needed for a Date Dimension are listed here.

  • DateKey – Integer key representing the date, for example 20150524
  • CalendarDate – Date time field
  • CalendarYear – Four digit year
  • QuarterNumber – Number from 1 to 4
  • QuarterName – First Quarter, Second Quarter, 1 Qtr 2015, First Quarter 2015 are some of the ways the name is listed. Pick whichever format people would like to see.
  • MonthNumber – 1 for January, 2 for February, 3 for March etc.
  • NameMonth –   January, February, March etc.
  • MonthDayNumber – July 15 would have the value of 15 here
  • WeekDayNumber – Date of the Month. July 4 would have a 4 here.
  • CalendarWeekofMonth – Counts the weeks within each month
  • NameDay – Monday, Tuesday etc.
  • DayNumber – Whether you choose 1 for Sunday or 1 for Monday is a business rule you need to find out first. This example shows 1 for Sunday, 2 for Monday etc.
  • YearDay – Sometimes referred to as the Julian number this counts the days from 1- 365 and can be useful in some calculations
  • YearWeek – Counts the weeks from 1 -52

As a best practice, remember reserved words such as DATE or Index or any other word which shows up as blue or pink in SQL Server Management Studio, should never be the names of any columns. Always select a name other than a reserved word when creating tables. Also friendly names containing spaces are great for exposing to users, but they are annoying in SQL Server, so leave the spaces out of the column names. Evaluate what dates you are going to be needing in the table so that you don’t have to go back and redo it. I am creating a smaller table here, but that is just because it is an example. Look at the dates you will be storing in your data warehouse when determining your starting dates, and set the end dates for probably about five years longer than you think the data warehouse will still be in use.

Using a CTE to Generate a Date Dimension

CTEs, which Microsoft added in 2005, is a great way to generate a date table by harnessing the power of computer to spin through your code, decreasing the need to write a lot of code. I am using a recursive CTE method here first published by Itzik Ben-Gan to generate the number table being used in the code below.

CREATE TABLE [dbo].[DimDate](
[DateKey] int NULL,
[CalendarDate] [smalldatetime] NULL,
[CalendarYear] [int] NULL,
[QuarterNumber] [int] NULL,
[QuarterName] [varchar](14) NULL,
[MonthNumber] [int] NULL,
[NameMonth] [nvarchar](30) NULL,
[MonthDayNumber] [int] NULL,
[CalendarWeekOfMonth] [int] NULL,
[NameDay] [nvarchar](30) NULL,
[DayNumber] [int] NULL,
[YearDay] [int] NULL,
[YearWeek] [int] NULL
) ON [PRIMARY]

GO

/*Make sure you change the start and end dates listed here to the dates you wish to use in your table*/
DECLARE @StartDate smalldatetime = '01/01/2014'
DECLARE @EndDate smalldatetime = '12/31/2016' ; /* don't forget the semi-colon or you will get an error*/

/*This CTE is used to create a list of numbers used to generate the calendar*/
WITH
A00(N) AS (SELECT 1 UNION ALL SELECT 1),
A02(N) AS (SELECT 1 FROM A00 a, A00 b),
A04(N) AS (SELECT 1 FROM A02 a, A02 b),
A08(N) AS (SELECT 1 FROM A04 a, A04 b),
A16(N) AS (SELECT 1 FROM A08 a, A08 b),
A32(N) AS (SELECT 1 FROM A16 a, A16 b),
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM A32),
/*Calendar dates are created here*/
CalendarBase as (
SELECT
DateKey = n
, CalendarDate = DATEADD(day, n - 1, @StartDate )
FROM cteTally
WHERE N <= DATEDIFF(day, @StartDate , @EndDate +1)
)

/*Using the list of dates created above will populate your date table here*/
insert into dbo.DimDate(DateKey
,CalendarDate
,CalendarYear
,QuarterNumber
,QuarterName
,MonthNumber
,NameMonth
,MonthDayNumber
,CalendarWeekOfMonth
,NameDay
,DayNumber
,YearDay
,YearWeek)

SELECT
DateKey       = CONVERT(char(8), CalendarDate, 112)
, CalendarDate
, CalendarYear  = YEAR(CalendarDate)
, QuarterNumber =  (DATEPART(QUARTER,CalendarDate) )
, QuarterName = 'Quarter ' + cast((DATEPART(QUARTER,CalendarDate) ) as char(1)) +' ' + cast(YEAR(CalendarDate) as char(4))
, MonthNumber = MONTH(CalendarDate)
, NameMonth     = DATENAME(Month, CalendarDate)
, WeekDayNumber   = DATEPART(DAY, CalendarDate)
, CalendarWeekOfMonth = DATEDIFF(week, DATEADD(day,1, CalendarDate - DAY(CalendarDate) + 1) -1, CalendarDate) +1
, NameDay       = DATENAME (Weekday,CalendarDate )
, DayNumber       = DATEPART(Dw, CalendarDate)
, YearDay       = DATEPART(DAYOFYEAR, CalendarDate)
, YearWeek      = DATEPART(WEEK, CalendarDate)
FROM CalendarBase

After running this code you will have a date table created and loaded.
Yours Always

Ginger Grant

Data aficionado et SQL Raconteur