Three classes of metrics: centrality, volatility, and bumpiness
Posted by Armando Brito Mendes | Filed under estatística, Investigação Operacional
introduz uma nova classe de estatísticas para séries cronológicas: bumpiness
All statistical textbooks focus on centrality (median, average or mean) and volatility (variance). None mention the third fundamental class of metrics: bumpiness.
Here we introduce the concept of bumpiness and show how it can be used. Two different datasets can have same mean and variance, but a different bumpiness. Bumpiness is linked to how the data points are ordered, while centrality and volatility completely ignore order. So, bumpiness is useful for datasets where order matters, in particular time series. Also, bumpiness integrates the notion of dependence (among the data points), while centrality and variance do not. Note that a time series can have high volatility (high variance) and low bumpiness. The converse is true.
The attached Excel spreadsheet shows computations of the bumpiness coefficient r for various time series. It is also of interest to readers who wish to learn new Excel concepts such a random number generation with Rand, indirect references with Indirect, Rank, Large and other powerful but not well known Excel functions. It is also an example of a fully interactive Excel spreadsheet driven by two core parameters.
Finally, this article shows (1) how a new concept is thought of, (2) then a robust, modern definition materialized, and (3) eventually a more meaningful definition created based on, and compatible with previous science.
Tags: data mining, previsão
Recurrent neural networks, Time series data and IoT – Part One
Posted by Armando Brito Mendes | Filed under estatística, Investigação Operacional, materiais ensino
Utilização de redes neuronais para previsão de séries univariadas
RNNs are already used for Time series analysis. Because IoT problems can often be modelled as a Time series, RNNs could apply to IoT data. In this multi-part blog, we first discuss Time series applications and then discuss how RNNs could apply to Time series applications. Finally, we discuss applicability to IoT.
In this article (Part One), we present the overall thought process behind the use of Recurrent neural networks and Time series applications – especially a type of RNN called Long Short Term Memory networks (LSTMs).
Tags: data mining, machine learning, previsão
Time Series Analysis using R-Forecast package
Posted by Armando Brito Mendes | Filed under estatística, Investigação Operacional
Demonstra algumas das funcionalidades do pacote R forecast
In today’s blog post, we shall look into time series analysis using R package – forecast. Objective of the post will be explaining the different methods available in forecast package which can be applied while dealing with time series analysis/forecasting.
Tags: data mining, previsão, R-software
Avoiding a common mistake with time series
Posted by Armando Brito Mendes | Filed under estatística, Investigação Operacional, materiais para profissionais
Um caso em q a tendência mascara o resto da série criando correlações elevadas
A basic mantra in statistics and data science is correlation is not causation, meaning that just because two things appear to be related to each other doesn’t mean that one causes the other. This is a lesson worth learning.
If you work with data, throughout your career you’ll probably have to re-learn it several times. But you often see the principle demonstrated with a graph like this:
One line is something like a stock market index, and the other is an (almost certainly) unrelated time series like “Number of times Jennifer Lawrence is mentioned in the media.” The lines look amusingly similar. There is usually a statement like: “Correlation = 0.86”. Recall that a correlation coefficient is between +1 (a perfect linear relationship) and -1 (perfectly inversely related), with zero meaning no linear relationship at all. 0.86 is a high value, demonstrating that the statistical relationship of the two time series is strong.
The correlation passes a statistical test. This is a great example of mistaking correlation for causality, right? Well, no, not really: it’s actually a time series problem analyzed poorly, and a mistake that could have been avoided. You never should have seen this correlation in the first place.
The more basic problem is that the author is comparing two trended time series. The rest of this post will explain what that means, why it’s bad, and how you can avoid it fairly simply. If any of your data involves samples taken over time, and you’re exploring relationships between the series, you’ll want to read on.
Tags: previsão
How and Why: Decorrelate Time Series
Posted by Armando Brito Mendes | Filed under estatística, Investigação Operacional, materiais para profissionais
O problemas das autocorrelações nas séries cronológicas.
When dealing with time series, the first step consists in isolating trends and periodicites. Once this is done, we are left with a normalized time series, and studying the auto-correlation structure is the next step, called model fitting. The purpose is to check whether the underlying data follows some well known stochastic process with a similar auto-correlation structure, such as ARMA processes, using tools such as Box and Jenkins. Once a fit with a specific model is found, model parameters can be estimated and used to make predictions.
A deeper investigation consists in isolating the auto-correlations to see whether the remaining values, once decorrelated, behave like white noise, or not. If departure from white noise is found (using a few tests of randomness), then it means that the time series in question exhibits unusual patterns not explained by trends, seasonality or auto correlations. This can be useful knowledge in some contexts such as high frequency trading, random number generation, cryptography or cyber-security. The analysis of decorrelated residuals can also help identify change points and instances of slope changes in time series, or reveal otherwise undetected outliers.
Tags: previsão
Time Series Forecasting and Internet of Things (IoT) in Grain Storage
Posted by Armando Brito Mendes | Filed under estatística, Investigação Operacional
Aplicações reais de previsão com séries cronológicas
Grain storage operators are always trying to minimize the cost of their supply chain. Understanding relationship between receival, outturn, within storage site and between storage site movements can provide us insights that can be useful in planning for the next harvest reason, estimating the throughput capacity of the system, relationship between throughout and inventory. This article explores the potential of scanner data in advance analytics. Combination of these two fields has the potential to be useful for grain storage business. The study describes Grain storage scenarios in the Australian context.
Tags: previsão
How We Combined Different Methods to Create Time Series Prediction
Posted by Armando Brito Mendes | Filed under estatística
Bom texto sobre a decomposição clássica para previsão.
Today, businesses need to be able to predict demand and trends to stay in line with any sudden market changes and economy swings. This is exactly where forecasting tools, powered by Data Science, come into play, enabling organizations to successfully deal with strategic and capacity planning. Smart forecasting techniques can be used to reduce any possible risks and assist in making well-informed decisions. One of our customers, an enterprise from the Middle East, needed to predict their market demand for the upcoming twelve weeks. They required a market forecast to help them set their short-term objectives, such as production strategy, as well as assist in capacity planning and price control. So, we came up with an idea of creating a custom time series model capable of tackling the challenge. In this article, we will cover the modelling process as well as the pitfalls we had to overcome along the way.
Tags: previsão
The 7 Most Important Data Mining Techniques
Posted by Armando Brito Mendes | Filed under materiais para profissionais
Pequena introdução a ulguns dos métodos mais usados em data mining
Data mining is the process of looking at large banks of information to generate new information. Intuitively, you might think that data “mining” refers to the extraction of new data, but this isn’t the case; instead, data mining is about extrapolating patterns and new knowledge from the data you’ve already collected.
Relying on techniques and technologies from the intersection of database management, statistics, and machine learning, specialists in data mining have dedicated their careers to better understanding how to process and draw conclusions from vast amounts of information. But what are the techniques they use to make this happen?
Tags: data mining
A Simple Introduction to Complex Stochastic Processes
Posted by Armando Brito Mendes | Filed under estatística, matemática
Stochastic processes have many applications, including in finance and physics. It is an interesting model to represent many phenomena. Unfortunately the theory behind it is very difficult, making it accessible to a few ‘elite’ data scientists, and not popular in business contexts.
1. Construction of Time-Continuous Stochastic Processes: Brownian Motion
2. General Properties
Tags: análise de sistemas
Data Analysis Method: Mathematics Optimization to Build Decision Making
Posted by Armando Brito Mendes | Filed under Investigação Operacional, matemática, SAD - DSS
Uma pequena introdução à utilização de otimização na análise de dados
Optimization is a problem associated with the best decision that is effective and efficient decisions whether it is worth maximum or minimum by way of determining a satisfactory solution.
Optimization is not a new science. It has grown even since Newton in the 17th century discovered how to count roots. Currently the science of optimization is still evolving in terms of techniques and applications. Many cases or problems in everyday life that involve optimization to solve them. Lately much developed especially in the emergence of new techniques to solve the problem of optimization. To mention some, among others, conic programming, semi definite programming, semi infinite programming and some meta heuristic techniques.
Tags: análise de dados, data mining, otimização