Time series decomposition splits a time series into three components: seasonality, trend, and random fluctuation. To show how this works, we will study the decompose() and stl() functions in the R language.

## Extracting Seasonality and Trend from Data: Decomposition Using R

Posted by Armando Brito Mendes | Filed under statistics, Operations Research, lessons, programming languages, teaching materials, materials for professionals

An excellent description of classical decomposition with Python and R.

## Understanding Decomposition

#### Decompose One Time Series into Multiple Series

Time series decomposition is a mathematical procedure that transforms one time series into several. The original time series is often split into three component series:

- **Seasonal:** Patterns that repeat with a fixed period of time. For example, a website might receive more visits during weekends; this would produce data with a seasonality of 7 days.
- **Trend:** The underlying trend of the metrics. A website increasing in popularity should show a general trend that goes up.
- **Random:** Also called "noise", "irregular", or "remainder", this is the residual of the original time series after the seasonal and trend series are removed.
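As a rough illustration of the three components, here is a minimal Python sketch of an additive decomposition that estimates the trend with a centered moving average and the seasonal part with per-position averages. It is not the code behind R's decompose() or stl(); the function names and the synthetic website-visits data are assumptions made for this example.

```python
def centered_moving_average(series, period):
    """Trend estimate: centered moving average over one full seasonal cycle."""
    n = len(series)
    half = period // 2
    trend = [None] * n  # the ends cannot be estimated and stay None
    for t in range(half, n - half):
        if period % 2 == 1:
            trend[t] = sum(series[t - half : t + half + 1]) / period
        else:
            # Even period: half weight on the two end points keeps the window centered.
            inner = sum(series[t - half + 1 : t + half])
            trend[t] = (0.5 * series[t - half] + inner + 0.5 * series[t + half]) / period
    return trend


def decompose_additive(series, period):
    """Split series into trend + seasonal + remainder (hypothetical helper)."""
    trend = centered_moving_average(series, period)
    # Average the detrended values at each position of the cycle.
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            buckets[t % period].append(y - tr)
    means = [sum(b) / len(b) for b in buckets]
    center = sum(means) / period  # force the seasonal effects to sum to zero
    seasonal = [means[t % period] - center for t in range(len(series))]
    remainder = [
        None if tr is None else y - tr - s
        for y, tr, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, remainder


# Synthetic daily visits (assumed data): upward trend plus a weekend bump, period 7.
weekly_pattern = [0, 0, 0, 0, 0, 10, 10]
visits = [100 + 0.5 * t + weekly_pattern[t % 7] for t in range(56)]
trend, seasonal, remainder = decompose_additive(visits, period=7)
```

Because the synthetic series contains no noise, the remainder is essentially zero wherever the trend is defined; on real data it would hold whatever the trend and seasonal series do not explain.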

Tags: engineering, inference, optimization, forecasting

## How signal processing can be used to identify patterns in complex time series

Posted by Armando Brito Mendes | Filed under statistics, Operations Research

Using signal processing techniques on time series

The trend and seasonality can be accounted for in a linear model by including sinusoidal components with a given frequency. However, finding the appropriate frequency for each sinusoidal component requires a little more digging. This post shows how to use fast Fourier transforms to find these frequencies.
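To make the idea concrete, the sketch below uses a small radix-2 FFT written in plain Python to locate the strongest periodic component of a synthetic signal. The helper names, the signal, and its periods (32 and 8 samples) are assumptions for illustration; the original post may use a library FFT and different data.

```python
import cmath
import math


def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_period(series):
    """Return the period (in samples) of the largest spectral peak."""
    n = len(series)
    mean = sum(series) / n
    # Remove the mean so the zero-frequency bin does not dominate.
    spectrum = fft([y - mean for y in series])
    # For a real-valued signal only bins 1..n/2 carry distinct information.
    magnitudes = [abs(spectrum[k]) for k in range(1, n // 2 + 1)]
    k_best = 1 + magnitudes.index(max(magnitudes))
    return n / k_best


# Assumed test signal: a strong 32-sample cycle plus a weaker 8-sample cycle.
signal = [
    3 * math.sin(2 * math.pi * t / 32) + math.sin(2 * math.pi * t / 8)
    for t in range(256)
]
period = dominant_period(signal)
```

The recovered period can then be used as the frequency of a sinusoidal regressor in a linear model. A series with a strong trend would normally be detrended first, since a trend concentrates energy in the lowest frequency bins.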

## How To Forecast Time Series Data With Multiple Seasonal Periods

Posted by Armando Brito Mendes | Filed under statistics, mathematics, materials for professionals

Analysis of complex series with multiple seasonal periods

Time series data is produced in domains such as IT operations, manufacturing, and telecommunications. Examples include the number of daily client logins to a website, cell phone traffic collected per minute, and hourly temperature variation in a region. Forecasting a time series signal ahead of time helps us make decisions such as planning capacity and estimating demand. Previous time series analysis blog posts focused on processing time series data that resides in a Greenplum database using SQL functions. In this post, I will examine the modeling steps involved in forecasting a time series with multiple seasonal periods. The steps are outlined below:

- Multiple seasonality is modelled with the help of Fourier series with different periods
- External regressors in the form of Fourier terms are added to an ARIMA model to account for the seasonal behavior
- The Akaike Information Criterion (AIC) is used to find the best-fitting model
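The steps above can be sketched in simplified form. The example below builds Fourier terms for two seasonal periods and compares candidate models by AIC, but it swaps the ARIMA model for plain ordinary least squares so that it runs with the standard library alone. The data, the periods (24 and 168 hours), and all function names are assumptions for illustration.

```python
import math
import random


def fourier_terms(n, period, harmonics):
    """Columns sin(2*pi*k*t/period) and cos(2*pi*k*t/period) for k = 1..harmonics."""
    cols = []
    for k in range(1, harmonics + 1):
        cols.append([math.sin(2 * math.pi * k * t / period) for t in range(n)])
        cols.append([math.cos(2 * math.pi * k * t / period) for t in range(n)])
    return cols


def least_squares(cols, y):
    """Solve the normal equations by Gaussian elimination with partial pivoting."""
    p = len(cols)
    a = [
        [sum(ci * cj for ci, cj in zip(cols[i], cols[j])) for j in range(p)]
        + [sum(ci * yi for ci, yi in zip(cols[i], y))]
        for i in range(p)
    ]
    for i in range(p):
        pivot = max(range(i, p), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(i + 1, p):
            f = a[r][i] / a[i][i]
            for c in range(i, p + 1):
                a[r][c] -= f * a[i][c]
    beta = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(a[i][c] * beta[c] for c in range(i + 1, p))
        beta[i] = (a[i][p] - tail) / a[i][i]
    return beta


def aic(cols, y):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2*(parameters)."""
    beta = least_squares(cols, y)
    n = len(y)
    rss = sum(
        (yi - sum(b * c[t] for b, c in zip(beta, cols))) ** 2
        for t, yi in enumerate(y)
    )
    return n * math.log(rss / n) + 2 * (len(cols) + 1)  # +1 for the noise variance


# Assumed data: four weeks of hourly values with daily and weekly seasonality.
rng = random.Random(0)
n = 24 * 7 * 4
y = [
    10
    + 4 * math.sin(2 * math.pi * t / 24)
    + 2 * math.cos(2 * math.pi * 2 * t / 24)
    + 3 * math.sin(2 * math.pi * t / 168)
    + rng.gauss(0, 0.5)
    for t in range(n)
]

intercept = [1.0] * n
scores = {}
for k_day in range(1, 4):
    for k_week in range(1, 4):
        cols = [intercept] + fourier_terms(n, 24, k_day) + fourier_terms(n, 168, k_week)
        scores[(k_day, k_week)] = aic(cols, y)
best = min(scores, key=scores.get)  # (daily harmonics, weekly harmonics) with lowest AIC
```

In the approach the post describes, the same Fourier columns would be passed to an ARIMA model as external regressors, so that the remaining autocorrelation is modelled as well; the least-squares fit here only illustrates how the regressors are built and how AIC ranks the candidates.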

Tags: forecasting

## How To Use Multivariate Time Series Techniques For Capacity Planning on VMs

Posted by Armando Brito Mendes | Filed under statistics, Operations Research, teaching materials

Multivariate methods for time series with VMs

Capacity planning is an arduous, ongoing task for many operations teams, especially for those who rely on Virtual Machines (VMs) to power their business. At Pivotal, we have developed a data science model capable of forecasting hundreds of thousands of models to automate this task using a multivariate time series approach. The technique can be reused in other areas, such as industrial equipment or vehicle engines, and applies broadly wherever regular monitoring data can be collected.

Tags: data mining, machine learning, forecasting

## Three classes of metrics: centrality, volatility, and bumpiness

Posted by Armando Brito Mendes | Filed under statistics, Operations Research

Introduces a new class of statistics for time series: bumpiness

All statistical textbooks focus on centrality (median, average or mean) and volatility (variance). None mention the third fundamental class of metrics: bumpiness.

Here we introduce the concept of *bumpiness* and show how it can be used. Two different datasets can have the same *mean* and *variance*, but a different *bumpiness*. Bumpiness is linked to how the data points are ordered, while centrality and volatility completely ignore order. So, bumpiness is useful for datasets where order matters, in particular time series. Also, bumpiness integrates the notion of dependence (among the data points), while centrality and variance do not. Note that a time series can have high volatility (high variance) and low bumpiness. The converse is also true.
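The point that order matters can be checked with a small Python sketch. This excerpt does not give a formula for bumpiness, so the example uses lag-1 autocorrelation as an assumed stand-in for an order-sensitive statistic; the two datasets and the function names are made up for illustration.

```python
def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def lag1_autocorrelation(xs):
    """Order-sensitive statistic, used here as an assumed proxy for bumpiness."""
    m = mean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


smooth = [1, 2, 3, 4, 5, 6, 7, 8]  # values in increasing order
bumpy = [1, 8, 2, 7, 3, 6, 4, 5]   # the same values in zig-zag order

# Mean and variance ignore ordering, so they cannot tell the two series apart.
same_mean = mean(smooth) == mean(bumpy)
same_variance = variance(smooth) == variance(bumpy)

# The order-sensitive statistic separates them clearly.
r_smooth = lag1_autocorrelation(smooth)
r_bumpy = lag1_autocorrelation(bumpy)
```

Both series share a mean of 4.5 and a variance of 5.25, yet the proxy is strongly positive for the smooth ordering and strongly negative for the zig-zag one.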

The attached Excel spreadsheet shows computations of the bumpiness coefficient r for various time series. It is also of interest to readers who wish to learn new Excel concepts such as random number generation with Rand, indirect references with Indirect, Rank, Large and other powerful but not well-known Excel functions. It is also an example of a fully interactive Excel spreadsheet driven by two core parameters.

Finally, this article shows (1) how a new concept is conceived, (2) how a robust, modern definition then materializes, and (3) how a more meaningful definition is eventually created, based on and compatible with previous science.

Tags: data mining, forecasting

## Recurrent neural networks, Time series data and IoT – Part One

Posted by Armando Brito Mendes | Filed under statistics, Operations Research, teaching materials

Using neural networks to forecast univariate series

RNNs are already used for time series analysis. Because IoT problems can often be modelled as time series, RNNs could apply to IoT data. In this multi-part blog, we first discuss time series applications and then discuss how RNNs could apply to them. Finally, we discuss applicability to IoT.

In this article (Part One), we present the overall thought process behind the use of recurrent neural networks in time series applications, especially a type of RNN called Long Short-Term Memory networks (LSTMs).

Tags: data mining, machine learning, forecasting

## Time Series Analysis using R-Forecast package

Posted by Armando Brito Mendes | Filed under statistics, Operations Research

Demonstrates some of the features of the R forecast package

In today’s blog post, we look at time series analysis using the R package forecast. The objective of the post is to explain the different methods available in the forecast package that can be applied to time series analysis and forecasting.

Tags: data mining, forecasting, R-software

## Avoiding a common mistake with time series

Posted by Armando Brito Mendes | Filed under statistics, Operations Research, materials for professionals

A case in which the trend masks the rest of the series, creating high correlations

A basic mantra in statistics and data science is *correlation is not causation*, meaning that just because two things appear to be related to each other doesn’t mean that one causes the other. This is a lesson worth learning.

If you work with data, throughout your career you’ll probably have to re-learn it several times. But you often see the principle demonstrated with a graph like this:

One line is something like a stock market index, and the other is an (almost certainly) unrelated time series like “Number of times Jennifer Lawrence is mentioned in the media.” The lines look amusingly similar. There is usually a statement like: “Correlation = 0.86”. Recall that a correlation coefficient is between +1 (a perfect linear relationship) and -1 (perfectly inversely related), with zero meaning no linear relationship at all. 0.86 is a high value, demonstrating that the statistical relationship of the two time series is strong.

The correlation passes a statistical test. This is a great example of mistaking correlation for causality, right? Well, no, not really: it’s actually a time series problem analyzed poorly, and a mistake that could have been avoided. You never should have seen this correlation in the first place.

The more basic problem is that the author is comparing two trended time series. The rest of this post will explain what that means, why it’s bad, and how you can avoid it fairly simply. If any of your data involves samples taken over time, and you’re exploring relationships between the series, you’ll want to read on.
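A small simulation shows the effect. The sketch below builds two series that share nothing but an upward trend, then compares the correlation of their levels with the correlation of their period-to-period changes. First differencing is one common way to remove a trend and is an assumption here, not necessarily the remedy the original post recommends; the data and function names are also made up.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def first_difference(xs):
    """Change from one observation to the next; removes a linear trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(42)
n = 300
# Two unrelated series: each has its own upward trend and independent noise.
series_a = [0.5 * t + rng.gauss(0, 5) for t in range(n)]
series_b = [0.3 * t + rng.gauss(0, 5) for t in range(n)]

# Correlation of the raw levels is driven almost entirely by the shared trend.
corr_levels = pearson(series_a, series_b)

# Correlation of the changes reflects whether the series actually move together.
corr_changes = pearson(first_difference(series_a), first_difference(series_b))
```

With these settings the levels are very strongly correlated while the differenced series show a correlation close to zero, which is the honest answer for two series generated independently.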

Tags: forecasting

## How and Why: Decorrelate Time Series

Posted by Armando Brito Mendes | Filed under statistics, Operations Research, materials for professionals

The problem of autocorrelations in time series.

When dealing with time series, the first step consists of isolating trends and periodicities. Once this is done, we are left with a normalized time series, and studying the auto-correlation structure is the next step, called model fitting. The purpose is to check whether the underlying data follows some well-known stochastic process with a similar auto-correlation structure, such as ARMA processes, using tools such as the Box-Jenkins method. Once a fit with a specific model is found, model parameters can be estimated and used to make predictions.

A deeper investigation consists of isolating the auto-correlations to see whether the remaining values, once decorrelated, behave like white noise or not. If a departure from white noise is found (using a few tests of randomness), then the time series in question exhibits unusual patterns not explained by trends, seasonality, or auto-correlations. This can be useful knowledge in some contexts such as high-frequency trading, random number generation, cryptography, or cyber-security. The analysis of decorrelated residuals can also help identify change points and instances of slope changes in time series, or reveal otherwise undetected outliers.
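As a minimal sketch of decorrelation, the Python example below simulates an AR(1) process, estimates its coefficient from the lag-1 auto-correlation, removes the fitted dependence, and inspects the auto-correlations of what is left. The AR(1) model, its coefficient of 0.8, and the rough 1.96/sqrt(n) band are assumptions chosen for illustration; the original article may use a different model and different randomness tests.

```python
import random


def autocorrelation(xs, lag):
    """Sample auto-correlation of xs at the given lag."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[lag:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


# Simulate a zero-mean AR(1) series: x[t] = phi * x[t-1] + noise.
rng = random.Random(7)
phi_true = 0.8  # assumed coefficient for the synthetic data
x = [0.0]
for _ in range(1999):
    x.append(phi_true * x[-1] + rng.gauss(0, 1))

# For an AR(1) process the lag-1 auto-correlation estimates phi.
phi_hat = autocorrelation(x, 1)

# Decorrelate: subtract the part of each value predicted by the previous one.
residuals = [b - phi_hat * a for a, b in zip(x, x[1:])]

# If the model fits, residual auto-correlations should be near zero at every lag.
resid_acf = [autocorrelation(residuals, lag) for lag in range(1, 6)]
band = 1.96 / len(residuals) ** 0.5  # approximate 95% band under white noise
lags_outside_band = [lag for lag, r in enumerate(resid_acf, start=1) if abs(r) > band]
```

Residual auto-correlations that fall well outside the band at several lags would suggest structure the fitted model does not capture, which is the kind of departure from white noise the text describes.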

Tags: forecasting

## Time Series Forecasting and Internet of Things (IoT) in Grain Storage

Posted by Armando Brito Mendes | Filed under statistics, Operations Research

Real-world applications of forecasting with time series

Grain storage operators are always trying to minimize the cost of their supply chain. Understanding the relationships between receival, outturn, movements within a storage site, and movements between storage sites can provide insights that are useful for planning for the next harvest season, estimating the throughput capacity of the system, and understanding the relationship between throughput and inventory. This article explores the potential of scanner data in advanced analytics. The combination of these two fields has the potential to be useful for the grain storage business. The study describes grain storage scenarios in the Australian context.

Tags: forecasting