Limitations in Time Series Data Analysis and the growth of Advanced Data Analytics

As someone who regularly analyzes data, I have done my share of time series analysis to determine trends over time.  I am struck by the fallibility of this sort of analysis.  For those who are unfamiliar with this time of analysis, time series analysis is performed to try to identify patters in the noise of data to help predict future trends through the use of algorithms like ARIMA. As I kid I remember hearing in fast announcer voice the following text “Past performance is no indication of future results”.  As a matter of fact this is a rule that the SEC requires mutual funds to tell all of their investors this statement.  Yet I get asked to do it anyway.  While I enjoy working with data and using advance analysis techniques including R and Python, I think it is important to realize the limitations of this sort of analysis.  It is considered a good experiment in Machine Learning if you are 85% right.  This is not acceptable if you are talking about a self-driving car as running people over 15% of the time is generally not considered acceptable. There are times when looking at the future that the data is not always going to provide an answer.  When looking to find answers in data, that needs to be something people keep in mind.  While you can find some answers in data, other answers will require prognostication or plan old guessing.

Impact on Technology realized in 2026

http://michaeljswart.com/2016/06/t-sql-tuesday-079-its-2016/comment-page-1/#comment-186750Data analysis is all about pattern matching, and while I don’t find it to be infallible, looking at a wideset of data has led me to plan accordingly.  While I am no Faith Popcorn, my analysis of what I see in the marketplace has led me to make some changes in my own life as I believe change is coming to the industry.  Adam Mechanic’s prompting for looking ahead to 2016 has provided the impetus to publish these theories.  What I see in the marketplace is the tools which are used to support databases are improving.  I see the ability of software to provide relevant hints and automate tuning of database queries and performance to continually improve, meaning there will be less of a need to employ people to perform this task.  I see with databases being pushed more and more to the cloud and managed services less and less need to employ many people to perform dba roles. Where I see the industry moving is towards more people being employed in analyzing the data to determine meaning from it.  I see that in 2026 very little data analysis being performed with R and most analysis being performed in Python.  This means that if you are looking ahead, and are employed in areas where people are being supplemented with tools, the time is now to learn skills in areas where there is growth. If you have been thinking about learning data science, Python and advanced analytics tools now is the time to start so that you will be prepared for the future.

 

Yours Always,

Ginger Grant

Data aficionado et SQL Raconteur

Using Data Analysis to Pick Super Bowl Winners

I know that there is no way to compete with the major sports networks in the compilation of statistics about the two teams footballplaying in the super bowl. Instead I am going to focus on one feature, self-interest. Like many people, I have money in the stock market and I want my investments to make money next year. For this reason, I am an unqualified supporter of the Atlanta Falcons in the 2017 super bowl. The single data point I am using for my analysis is the fact that the falcons are an NFC Team, and when the NFC wins the stock market goes up. Go Falcons!

Correlation without Causation

Correlation does not imply causation is a common term in statistics and data analysis. It means that just because two variables move in relation to one another one does not mean that there is a cause and effect relationship between the two, even though it may seem like it. Just because when I washed my car it rained does not mean that I can control the precipitation patterns in the desert based on my propensity to visit the car wash. You may be thinking that having an NFC team win the super bowl and the stock market is an example of correlation without causation. After all the NFL does not control the world wide financial markets. If you look at the data though, 80% of the time the markets go up when the NFC wins. That is 50 years of data that supports that the winner does impact the market. Why might that be? Perhaps it follows Quantum Mechanics.

Observer Effect of Quantum Mechanics

When studying physics, specifically quantum mechanics researchers noticed that the observation changed the results. This is QuantumMechanicssomething commonly looked at when creating forecasts. Are the forecasts correct because the models are correct or because people believe them enough to make it happen. The superbowl winner impact on the stockmarket is well known. Perhaps it is for this reason that it becomes a self-fulfilling prophesy. This is the entire belief of many self-help ideas. If you believe it will happen, work to make it happen, it will happen. For whatever the reason, one cannot ignore 50 years of data.

Perhaps Patriots fans may think that I am pulling a lot of esoteric facts out of the air because I want the Patriots to lose. In all seriousness though, it is all about the data, and the observable effects of data knowledge. If you are watching the game and your team did not make the playoffs, and you are wondering who to root for because you do not care about the winner, perhaps this post helped you to decide.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur