Articles

Database Table Design

There are a number of different ways you could organize your data in a database. If you are creating a database to be used in a transactional system, your table design should follow a normalized design as much as possible. Data should be grouped in logical groups, such as customers, products, sales, orders, quotes, and tickets. Redundantly repeating data in multiple places will cause problems in the future. Your design may include hundreds of tables, and that is perfectly fine.

If the purpose of the database is for Power BI or a data warehouse, dimensional modeling techniques should be employed. In this type of database design, generally speaking, there are a number of tables containing descriptive data, such as product and customer, and a few fact tables which contain the actions which happened. The actions include things like sales. The database design will look much like a star, with the fact tables in the center and the dimension tables connected to them like satellites. If you have one dimension connected to another dimension, that design is called a snowflake, and some applications, like Analysis Services Multidimensional, will not process it well. Power BI and Analysis Services Tabular work very well with snowflake dimensions.


Snowflake Data Model from Power BI


Table Design Gone Wrong

To paraphrase Ron White, the reason I described database modeling is so that even people who know nothing about database design can appreciate my interview story. When I was working at a previous job, I helped provide technical reviews for database developer candidates. We asked a number of typical questions about indexing and stored procedures, but I always tried to come up with at least one question which the candidate could not readily answer by cramming interview questions found on the internet. I decided to ask one candidate, who had correctly answered the previous stock questions, something that would let us know what kind of work he had really done. I asked him, “What do you do to determine how to design a table?” I wanted to find out what his thought process was, and to see if he would mention normal form or describe something he had done in the past. I was completely surprised by his answer.

“Well, you can only have 256 columns in a table. After that you have to create a new one.” I was really curious to find out where he had developed this completely warped view of how to determine which fields should go in a table. It turned out that he had learned all of his database skills from a co-worker who had recently retired. The co-worker had worked at the same location for a very long time, starting back when the applications ran on mainframes without any databases. He had migrated some of those applications to databases, and they wrote the tables this way because it “made sense”. After that the interview was over, and we hired someone else.

I challenge anyone who is learning databases to look up on the internet what people tell you to do. This is useful for two reasons. The first is that reading about a concept explained another way will probably help you learn it better. The other is that you can find out whether the person teaching you really knows what they are doing, so you will learn the correct way to do something.


Yours Always

Ginger Grant

Data aficionado et SQL Raconteur


Calculated Tables and Role-Playing Dimensions

Role-playing dimensions, which come up when you have, say, multiple dates in a table and you want to relate them all back to a single date table, have always been problematic in SQL Server Analysis Services Tabular. Tabular models only allow one active relationship to a single column at a time. The picture on the left shows how tabular models represent a role-playing dimension, and the model on the right shows the recommended method for modeling the relationships in Analysis Services Tabular, since users can then filter the data on a number of different date tables.

Tabular Role-Playing Dimension Modeling


Role-Playing Dimension

The big downside to this is that one has to import the date table into the model multiple times, meaning the same data is imported again and again. At least that was the case until SQL Server 2016 was released. This week's T-SQL Tuesday topic, Fixing Old Problems with Shiny New Toys, is a really good reason to describe a better way of handling this problem.


Calculated Tables: The Solution for Role-Playing Dimensions

SQL Server 2016 provides a new method of solving the role-playing dimension problem: a calculated table. Instead of importing the date table again, you create a formula which references a copy of it. First switch to the data view of the model, then select Table->New Calculated Table. The screen will change to the new table screen, with the cursor placed in the formula bar.

In my model I have one table called Date, and I am going to add a calculated table called Order Date. The DAX couldn't be simpler: just reference the table named 'Date', which is shown in the picture below. Rename the table to something more meaningful, like Order Date, and that is it. The modeling required is the same, but now the model size does not increase to accommodate all of the date tables needed, as there is only one copy of the date table, referenced multiple times. If you are using Power BI, this same concept can be used for handling role-playing dimensions as well.
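For reference, the entire formula is just a table reference. A minimal sketch, assuming the existing date table is named 'Date' as in my model (renaming the new table to Order Date happens in the designer, not in the formula):

='Date'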

SQL Server 2016 has a lot of great new features, and in addition to the flashy ones like R integration, there are a lot of enhancements to the Tabular model that are worth investigating as well.


Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

T-SQL Tuesday #80 – SSIS Projects, Packages, and Deployments

Recently I was talking to someone who is just getting started with SSIS and wanted to know about the package deployment model. I hadn't thought about the package deployment model in a while. TL;DR: Do not use package deployment for SSIS. Why? Because SSIS code deployed in the project model is easier to maintain and migrate. Maintaining a large number of packages can be painful, something I know all too well.

Making Maintenance Difficult One Package at a Time

Prior to SQL Server 2012, there was no project deployment. SSIS code was all deployed as packages. These packages could be stored within MSDB, or they could be stored and run from the file system. In disorganized places like the one where I worked, they were deployed in both. Assuming nothing much has changed since I left, they have every version of SQL Server which was released prior to the day the new IT Director started in 2012. There was DTS on SQL Server 7.0 and 2000, and SSIS on 2005, 2008 and 2008 R2. No reason to upgrade anything which still worked was their motto. When space was a problem, one could always go build another server. I think the LAN administrator was happiest when he was able to justify building a new server, as he could spend hours shopping for parts on the internet and assembling the latest machine.

I was given the task of supporting all of the SSIS code, which of course broke periodically. There were 300 packages on a myriad of different servers, all named package1, and deployed every way possible. As a bonus, one could not trust the open source version control system to have the latest code, unless it was a package I had worked on previously. Hunting down the code and the config file used to drive it, and getting access to wherever they were stored, was just part of the maintenance process. If one package called another package, then both packages needed to be found and reviewed. Each of those packages would have a different way of connecting to the same database, too. My favorite was when one package called another package and they each used different IDs to access the exact same database. Ah, the joys of troubleshooting SSIS packages. It was in the top five reasons I was very happy when I quit that job.

SSIS Projects

If you write SSIS code and use the project deployment model, you can create one data connection for all the packages which need one. The code is deployed to one place, the Integration Services Catalog, and all the related code is deployed to one folder. If you need to change a connection which all the packages use, you can do it in one place. You want to pass some parameters for all of the packages to use? No problem. SSIS project deployment offers some great advantages. I cannot think of any reason to use package deployment for SSIS 2012 and beyond. If you are writing SSIS code, it is how you want to deploy packages.


Yours Always

Ginger Grant

Data aficionado et SQL Raconteur

T-SQL Tuesday #79 – Creating R Code to run on SQL Server 2016


As SQL Server 2016 was recently released, many people have not yet used R with SQL Server. I thought that T-SQL Tuesday would be a great way to introduce this topic. This post contains everything you need to run your first R program from the UI, get data from SQL Server, and run the R code on SQL Server from the UI. If you are running open source R, this code will not work. If you are using Microsoft R Open, this code will not work. Only if you are running the version of R which Microsoft released with SQL Server 2016 will this code work.

The Two Versions of Microsoft R

Microsoft has not one version of R but two, and the two different versions exist because they serve two different purposes. Microsoft R Open is open source, fully R compatible, and faster than standard open source R because a number of the algorithms were rewritten to use multi-threaded math libraries. If you want to run R code on SQL Server, this is not the version you want to use. You want the non-open-source version designed to run on R Server, which is included with SQL Server 2016: Microsoft RRE Open. This version will run R code not only in memory but can also swap to disk, can access SQL Server data without needing to create a file, and can run code on the server from the client. The version of RRE Open which is included in SQL Server 2016 is 8.0.3.
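A quick sanity check for which version you ended up with is to look at the RevoScaleR package version from the R interactive window. packageVersion is standard R, and this assumes the library is already installed (setup is covered below):

packageVersion("RevoScaleR")   # the release shipped with SQL Server 2016 reports 8.0.3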

Running R on SQL Server

As a handy mnemonic device, all the RRE functions start with Rx, like prescription drugs. None of these functions will work in R unless you are using the Microsoft RRE Open version. For more information on how to set up Visual Studio 2015 to use the correct libraries, please read my previous post for instructions.

SQL Server R Code Walk-through

This code was created on a PC with SQL Server 2016 Developer Edition installed with the R tools, and the Community Edition of Visual Studio 2015. On my SQL Server instance, I have created a database called TestR and loaded the sample file AirlineDemoSmall.csv, which is included with R Server, as a table with the same name. If you have SQL Server 2016 installed, the sample files can be found in this directory:

C:\Program Files\Microsoft SQL Server\130\R_SERVER\library\RevoScaleR\SampleData

The table dbo.AirlineDemoSmall has 600,000 rows. Prior to running this code, create a table on SQL Server to hold the data. Using some Rx commands, the code below will load the data from SQL Server, run on the R Server, and draw a histogram.

library(RevoScaleR)


sqlConnString <- "Driver=SQL Server;Server=MyLaptop\\SQLSERVER2016;Database=TestR;Uid=ReadData;Pwd=readd@t@"
sqlsampleTable <- "AirlineDemoSmall"
# Set ComputeContext.
sqlShareDir <- paste("C:\\Ginger\\AllShare\\", Sys.getenv("USERNAME"), sep = "")
sqlWait <- TRUE
sqlConsoleOutput <- FALSE
serverside <- RxInSqlServer(connectionString = sqlConnString, shareDir = sqlShareDir,
                            wait = sqlWait, consoleOutput = sqlConsoleOutput)


rxSetComputeContext(serverside)

sqlPlaneDS <- RxSqlServerData(connectionString = sqlConnString, verbose = 1, table = sqlsampleTable)
rxGetInfo(data = sqlPlaneDS, getVarInfo = TRUE, numRows = 3)
rxHistogram( ~ CRSDepTime, data = sqlPlaneDS)

Detailed Description of the R Code

To better understand each line of code, I have provided a description for each line, along with some tips to resolve possible errors.

library("RevoScaleR")

If you get an error running this line, chances are R doesn't know where to find the library, or you may need to install it. If so, run this command in the interactive window:

install.packages('RevoScaleR')

If this command gives you an error, R can't find where the library is. Resolve the issue by adding the path with the command below, run in the immediate window. Notice the slashes go the opposite way from how File Explorer displays them.

.libPaths(c(.libPaths(),"C:/Program Files/Microsoft SQL Server/130/R_SERVER/library"))

After setting the path, run the install command again to resolve the package, and then run the first line again; this should clear up any previous errors.

sqlConnString <- "Driver=SQL Server; Server=MyLaptop\\SQLSERVER2016;Database=TestR;Uid=ReadData;Pwd=readd@t@"

This line sets the value of the connection string. I am running SQL Server 2016 on my laptop, in an instance called SQLServer2016. Notice I had to put two backslashes in front of the instance name to set my connection. I have hard-coded a user id and password in plain text. Outside of a test, I would use Windows authentication, which does require an ODBC connection, so that I would not have to put the user id and password in the code in plain text.
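As a sketch of what that would look like, here is the same connection string with Windows authentication instead of a hard-coded login; Trusted_Connection is the standard ODBC attribute for the SQL Server driver, and the server and database names are the ones from my laptop:

sqlConnString <- "Driver=SQL Server;Server=MyLaptop\\SQLSERVER2016;Database=TestR;Trusted_Connection=Yes"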

sqlsampleTable <- "AirlineDemoSmall"

This line of code sets a variable to the name of the table created in SQL Server with the data from the csv file.

sqlShareDir <- paste("C:\\Ginger\\AllShare\\", Sys.getenv("USERNAME"), sep = "")

R needs a temporary directory to serialize the R objects when the connection is created, which is what I am creating here.

sqlWait <- TRUE

Setting the state to wait means that I am creating a blocking transaction which will prevent the later code from running until this statement is complete. This is a good setting for testing, and also for when you have other commands which cannot run until you have the data, such as rxHistogram, which requires the complete dataset.

sqlConsoleOutput <- FALSE

Setting the console output to false decreases the number of informational messages I get in the immediate window. Since the messages aren't really that helpful, as they show things like how many records were read at a given time, I generally set it to false.

serverside <- RxInSqlServer(connectionString = sqlConnString, shareDir = sqlShareDir,
wait = sqlWait, consoleOutput = sqlConsoleOutput)

This line uses the Revo R function RxInSqlServer (remember, unlike SQL, case matters) to create a connection to SQL Server using the variables we created earlier, assigning the result to a variable called serverside.

rxSetComputeContext(serverside)

Setting the compute context dictates where my code is going to run. If the compute context is set to local, the code runs on my local PC. Since I set it to the variable holding my SQL Server connection, all of my R code will be using the available memory on the SQL Server machine, not mine. Yes, this does mean that I can starve out the resources on the server, a topic I will address at a later time. Since I am running everything on my laptop it doesn't matter here, but it could.
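For reference, switching back is a one-liner; passing the string "local" to rxSetComputeContext is standard RevoScaleR usage:

# Run subsequent rx functions on the local PC instead of the server
rxSetComputeContext("local")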

sqlPlaneDS <- RxSqlServerData(connectionString = sqlConnString, verbose = 1,
                              table = sqlsampleTable)

This line defines the data to get from SQL Server, using the connection string created earlier, and specifies what data to get. I could have used a query to get the data as well, but in this case I grabbed everything from the table.
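As a sketch of the query variant, RxSqlServerData accepts a sqlQuery argument in place of the table argument; the column list here is purely illustrative:

sqlPlaneDS <- RxSqlServerData(connectionString = sqlConnString, verbose = 1,
                              sqlQuery = "SELECT ArrDelay, CRSDepTime, DayOfWeek FROM AirlineDemoSmall")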

rxGetInfo(data = sqlPlaneDS, getVarInfo = TRUE, numRows = 3)

To validate that some data was retrieved, rxGetInfo shows the information retrieved, including three rows of data. Why three rows? Because numRows = 3.

rxHistogram( ~ CRSDepTime, data = sqlPlaneDS)

One of the big strengths of R is the ability to create data visualizations, so I felt compelled to include the command which creates a histogram. The ~ (tilde) goes in front of the column name CRSDepTime from the table AirlineDemoSmall, and the data comes from the variable sqlPlaneDS, where all of the data was loaded.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur