Avoiding a common mistake with time series
Posted by Armando Brito Mendes | Filed under estatística, Investigação Operacional, materiais para profissionais
A case where the trend masks the rest of the series, creating spuriously high correlations.
A basic mantra in statistics and data science is correlation is not causation, meaning that just because two things appear to be related to each other doesn’t mean that one causes the other. This is a lesson worth learning.
If you work with data, throughout your career you’ll probably have to re-learn it several times. But you often see the principle demonstrated with a graph like this:
One line is something like a stock market index, and the other is an (almost certainly) unrelated time series like “Number of times Jennifer Lawrence is mentioned in the media.” The lines look amusingly similar. There is usually a statement like: “Correlation = 0.86”. Recall that a correlation coefficient is between +1 (a perfect linear relationship) and -1 (perfectly inversely related), with zero meaning no linear relationship at all. 0.86 is a high value, demonstrating that the statistical relationship of the two time series is strong.
The correlation passes a statistical test. This is a great example of mistaking correlation for causality, right? Well, no, not really: it’s actually a time series problem analyzed poorly, and a mistake that could have been avoided. You never should have seen this correlation in the first place.
The more basic problem is that the author is comparing two trended time series. The rest of this post will explain what that means, why it’s bad, and how you can avoid it fairly simply. If any of your data involves samples taken over time, and you’re exploring relationships between the series, you’ll want to read on.
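The pitfall described above is easy to reproduce. The sketch below uses synthetic data (the series and all numbers are made up for illustration): two independent series that share only an upward trend correlate strongly, while their first differences, which remove the linear trend, do not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two unrelated series that both happen to trend upward.
t = np.arange(100)
a = 0.5 * t + rng.normal(0, 5, 100)   # e.g. a stock index
b = 0.3 * t + rng.normal(0, 5, 100)   # e.g. media mentions of an actress

# The shared trend alone produces a large correlation.
r_raw = np.corrcoef(a, b)[0, 1]

# First-differencing removes the linear trend; the correlation of the
# period-to-period changes is the one that actually says something.
r_diff = np.corrcoef(np.diff(a), np.diff(b))[0, 1]
```

On this synthetic data `r_raw` comes out large while `r_diff` hovers near zero, even though the two series were generated independently.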
Tags: previsão
How and Why: Decorrelate Time Series
Posted by Armando Brito Mendes | Filed under estatística, Investigação Operacional, materiais para profissionais
The problem of autocorrelations in time series.
When dealing with time series, the first step consists in isolating trends and periodicities. Once this is done, we are left with a normalized time series, and studying the auto-correlation structure is the next step, called model fitting. The purpose is to check whether the underlying data follows some well-known stochastic process with a similar auto-correlation structure, such as an ARMA process, using tools such as the Box–Jenkins method. Once a fit with a specific model is found, model parameters can be estimated and used to make predictions.
A deeper investigation consists in isolating the auto-correlations to see whether the remaining values, once decorrelated, behave like white noise, or not. If departure from white noise is found (using a few tests of randomness), then it means that the time series in question exhibits unusual patterns not explained by trends, seasonality or auto correlations. This can be useful knowledge in some contexts such as high frequency trading, random number generation, cryptography or cyber-security. The analysis of decorrelated residuals can also help identify change points and instances of slope changes in time series, or reveal otherwise undetected outliers.
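A minimal sketch of this idea, using a hand-fitted AR(1) model on simulated data rather than a full Box–Jenkins workflow: estimate the autocorrelation, remove it, and check whether the residual autocorrelations stay inside the band expected for white noise.

```python
import numpy as np

def acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rng = np.random.default_rng(1)

# Simulated AR(1) series: each value carries 70% of the previous one.
n, phi = 500, 0.7
x = np.zeros(n)
for i in range(1, n):
    x[i] = phi * x[i - 1] + rng.normal()

# "Decorrelate" by removing the fitted AR(1) component;
# phi_hat is estimated as the lag-1 autocorrelation.
phi_hat = acf(x, 1)
resid = x[1:] - phi_hat * x[:-1]

# Crude whiteness check: for white noise, sample autocorrelations
# should mostly fall inside the +/- 2/sqrt(n) band.
band = 2 / np.sqrt(len(resid))
n_exceed = sum(abs(acf(resid, k)) > band for k in range(1, 11))
```

If `n_exceed` is large, the residuals are not white noise and the series still contains structure beyond trend, seasonality, and the fitted autocorrelation.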
Tags: previsão
Time Series Forecasting and Internet of Things (IoT) in Grain Storage
Posted by Armando Brito Mendes | Filed under estatística, Investigação Operacional
Real applications of time series forecasting.
Grain storage operators are always trying to minimize the cost of their supply chain. Understanding the relationships between receivals, outturns, within-site movements, and between-site movements can provide insights that are useful for planning the next harvest season, estimating the throughput capacity of the system, and relating throughput to inventory. This article explores the potential of scanner data in advanced analytics; combining these two fields can be valuable for the grain storage business. The study describes grain storage scenarios in the Australian context.
Tags: previsão
How We Combined Different Methods to Create Time Series Prediction
Posted by Armando Brito Mendes | Filed under estatística
A good article on classical decomposition for forecasting.
Today, businesses need to be able to predict demand and trends to stay in line with any sudden market changes and economy swings. This is exactly where forecasting tools, powered by Data Science, come into play, enabling organizations to successfully deal with strategic and capacity planning. Smart forecasting techniques can be used to reduce any possible risks and assist in making well-informed decisions. One of our customers, an enterprise from the Middle East, needed to predict their market demand for the upcoming twelve weeks. They required a market forecast to help them set their short-term objectives, such as production strategy, as well as assist in capacity planning and price control. So, we came up with an idea of creating a custom time series model capable of tackling the challenge. In this article, we will cover the modelling process as well as the pitfalls we had to overcome along the way.
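The classical decomposition such a model builds on can be sketched on synthetic data (the weekly period, the demand numbers, and the naive forecast rule below are all illustrative assumptions, not the customer's actual model): estimate the trend with a centered moving average, average the detrended values per position in the cycle to get seasonal indices, then extend both.

```python
import numpy as np

rng = np.random.default_rng(2)

period = 7                    # assumed weekly cycle
n = period * 20
t = np.arange(n)
season_true = np.resize([8, 3, 0, -4, -6, -3, 2], n)   # sums to zero
y = 100 + 0.3 * t + season_true + rng.normal(0, 1, n)  # toy demand series

# 1. Trend: centered moving average over one full period.
trend = np.convolve(y, np.ones(period) / period, mode="valid")
half = period // 2

# 2. Seasonal indices: average detrended value per position in the
#    cycle, normalised to sum to zero.
detrended = y[half : n - half] - trend
idx = t[half : n - half] % period
seasonal = np.array([detrended[idx == k].mean() for k in range(period)])
seasonal -= seasonal.mean()

# 3. Naive forecast for the next period: extend the trend linearly
#    and add back the matching seasonal index.
slope = (trend[-1] - trend[0]) / (len(trend) - 1)
future_t = np.arange(1, period + 1)
forecast = trend[-1] + slope * (future_t + half) + seasonal[(t[-1] + future_t) % period]
```

Real projects (including, presumably, the one in the article) layer more machinery on top, but trend + seasonality + remainder is the skeleton.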
Tags: previsão
The 7 Most Important Data Mining Techniques
Posted by Armando Brito Mendes | Filed under materiais para profissionais
A short introduction to some of the most widely used data mining methods.
Data mining is the process of looking at large banks of information to generate new information. Intuitively, you might think that data “mining” refers to the extraction of new data, but this isn’t the case; instead, data mining is about extrapolating patterns and new knowledge from the data you’ve already collected.
Relying on techniques and technologies from the intersection of database management, statistics, and machine learning, specialists in data mining have dedicated their careers to better understanding how to process and draw conclusions from vast amounts of information. But what are the techniques they use to make this happen?
Tags: data mining
A Simple Introduction to Complex Stochastic Processes
Posted by Armando Brito Mendes | Filed under estatística, matemática
Stochastic processes have many applications, including in finance and physics. They are interesting models for representing many phenomena. Unfortunately the theory behind them is very difficult, making them accessible only to a few 'elite' data scientists, and not popular in business contexts.
1. Construction of Time-Continuous Stochastic Processes: Brownian Motion
2. General Properties
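The construction in part 1 can be sketched in a few lines: Brownian motion simulated as the cumulative sum of independent Gaussian increments, each with variance equal to the time step (a numerical sketch of the standard construction, not the full measure-theoretic treatment).

```python
import numpy as np

rng = np.random.default_rng(3)

# Brownian motion on [0, T]: cumulative sum of independent
# Gaussian increments, each with variance dt.
T, n = 1.0, 10_000
dt = T / n
increments = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate([[0.0], np.cumsum(increments)])

# Sanity check of a defining property, Var[W(T)] = T,
# estimated across many simulated paths.
paths = np.cumsum(rng.normal(0.0, np.sqrt(dt), (2000, n)), axis=1)
var_at_T = paths[:, -1].var()
```

With these parameters `var_at_T` should come out close to `T = 1.0`, matching the theoretical variance of Brownian motion at time T.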
Tags: análise de sistemas
Data Analysis Method: Mathematics Optimization to Build Decision Making
Posted by Armando Brito Mendes | Filed under Investigação Operacional, matemática, SAD - DSS
A short introduction to the use of optimization in data analysis.
Optimization is the problem of finding the best decision — one that is both effective and efficient, whether the goal is a maximum or a minimum — by determining a satisfactory solution.
Optimization is not a new science. It has been growing ever since Newton, in the 17th century, discovered how to find the roots of functions. The science of optimization is still evolving today, in both techniques and applications: many everyday problems require optimization to solve, and new solution techniques keep appearing. To mention some, among others: conic programming, semidefinite programming, semi-infinite programming, and various metaheuristic techniques.
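The Newton connection above is concrete: the same root-finding iteration, applied to a function's derivative, minimizes a smooth function. A minimal sketch (the quadratic objective here is a made-up example):

```python
def newton(f, df, x0, tol=1e-10, max_iter=50):
    """Newton's method: iterate x <- x - f(x)/f'(x) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("did not converge")

# Minimise g(x) = (x - 3)^2 + 1 by finding the root of its
# derivative g'(x) = 2(x - 3); the second derivative is constant 2.
g_prime = lambda x: 2 * (x - 3)
g_second = lambda x: 2.0
x_min = newton(g_prime, g_second, x0=0.0)
```

For a quadratic objective Newton's method lands on the minimizer `x = 3` essentially immediately; the modern techniques listed above exist because most real objectives are far less well-behaved.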
Tags: análise de dados, data mining, otimização
An Introduction to Word Embeddings
Posted by Armando Brito Mendes | Filed under Sem categoria
A good article on a technique in NLP – Natural Language Processing.
Part 1: Applications
Written by Aaron Geelon So
If you already have a solid understanding of word embeddings and are well into your data science career, skip ahead to the next part!
Human language is unreasonably effective at describing how we relate to the world. With a few short words, we can convey many ideas and actions with little ambiguity. Well, mostly.
Because we’re capable of seeing and describing so much complexity, a lot of structure is implicitly encoded into our language. It is no easy task for a computer (or a human, for that matter) to learn natural language, for it entails understanding how we humans observe the world, if not understanding how to observe the world.
For the most part, computers can’t understand natural language. Our programs are still line-by-line instructions telling a computer what to do — they often miss nuance and context. How can you explain sarcasm to a machine?
There's good news though. There have been some important breakthroughs in natural language processing (NLP), the domain where researchers try to teach computers human language.
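The core trick behind word embeddings is representing each word as a vector whose geometry encodes meaning: similar words get nearby vectors. A toy sketch (the words and 3-dimensional vectors are made up; real embeddings such as word2vec's have hundreds of dimensions learned from text):

```python
import math

# Made-up toy "embeddings" for illustration only.
vectors = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: ~1 for near-parallel vectors, ~0 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_royal = cosine(vectors["king"], vectors["queen"])
sim_fruit = cosine(vectors["king"], vectors["apple"])
```

With these toy vectors, "king" is far more similar to "queen" than to "apple" — the kind of relationship embedding models learn automatically from large corpora.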
Tags: data mining, machine learning, text mining
Images Created by Vector Fields
Posted by Armando Brito Mendes | Filed under materiais ensino, software, visualização
This website allows you to explore vector fields in real time.
“Vector field” is just a fancy way of saying that each point on a screen has some vector associated with it. This vector could mean anything, but for our purposes we consider it to be a velocity vector.
Now that we have velocity vectors at every single point, let's drop thousands of small particles and see how they move. The resulting visualization could be used by scientists to study vector fields, or by artists for inspiration!
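Moving particles through a vector field is just numerical integration of positions. A minimal sketch for one particle (the rotational field `v = (-y, x)` and the forward-Euler stepping are illustrative assumptions; the site animates thousands of particles on the GPU):

```python
import math

def velocity(x, y):
    """A sample rotational vector field: v(x, y) = (-y, x)."""
    return -y, x

def advect(x, y, steps=628, dt=0.01):
    """Move a particle through the field with forward-Euler steps."""
    for _ in range(steps):
        vx, vy = velocity(x, y)
        x += vx * dt
        y += vy * dt
    return x, y

# A particle starting at (1, 0) in this field traces out (roughly)
# a circle; 628 steps of 0.01 cover about one full revolution.
x, y = advect(1.0, 0.0)
r = math.hypot(x, y)
```

Forward Euler slightly inflates the radius on a rotational field (here `r` ends a few percent above 1), which is why serious integrators use higher-order schemes — but for a visualization the simple version already produces the flowing streaks.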
Learn more about this project on GitHub
Stay tuned for updates on Twitter.
With passion,
Anvaka
Tags: belo
Free Hadoop Tutorial: Master BigData
Posted by Armando Brito Mendes | Filed under lições, materiais ensino, software
Big Data is the latest buzzword in the IT industry. Apache's Hadoop is a leading Big Data platform used by IT giants Yahoo, Facebook & Google. This course is geared to make you a Hadoop expert.
What should I know?
This is an absolute beginner's guide to Hadoop, but knowledge of 1) Java and 2) Linux will help.
Syllabus
Introduction to BIG DATA: Types, Characteristics & Benefits
Hadoop Tutorial: Features, Components, Cluster & Topology
Hadoop Setup Tutorial – Installation & Configuration
HDFS Tutorial: Read & Write Commands using Java API
What is MapReduce? How it Works – Hadoop MapReduce Tutorial
Hadoop & MapReduce Examples: Create your First Program
Hadoop MapReduce Tutorial: Counters & Joins with Example
What is Sqoop? What is FLUME – Hadoop Tutorial
Sqoop vs Flume vs HDFS in Hadoop
Create Your First FLUME Program – Beginner's Tutorial
Hadoop PIG Tutorial: Introduction, Installation & Example
Learn OOZIE in 5 Minutes – Hadoop Tutorial
Big Data Testing: Functional & Performance
Hadoop & MapReduce Interview Questions & Answers
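The MapReduce model that several of the tutorials above cover can be sketched as the classic word count, here in pure Python as a toy single-process stand-in for a distributed Hadoop job (real jobs shard the map and reduce phases across a cluster):

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.lower().split()]

def reducer(pairs):
    """Reduce phase: sum the counts grouped by key (the word)."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data is big", "hadoop handles big data"]
counts = reducer(chain.from_iterable(mapper(l) for l in lines))
```

In Hadoop the framework handles the shuffle between the two phases, so each reducer sees all pairs for its keys; the single-process version above collapses that into one dictionary.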
Tags: big data