Avoiding a common mistake with time series

A case in which the trend masks the rest of the series, creating high correlations.

A basic mantra in statistics and data science is that correlation is not causation: just because two things appear to be related to each other doesn’t mean that one causes the other. This is a lesson worth learning.

If you work with data, throughout your career you’ll probably have to re-learn it several times. But you often see the principle demonstrated with a graph like this:

Dow Jones vs. Jennifer Lawrence

One line is something like a stock market index, and the other is an (almost certainly) unrelated time series like “Number of times Jennifer Lawrence is mentioned in the media.” The lines look amusingly similar. There is usually a statement like: “Correlation = 0.86”. Recall that a correlation coefficient is between +1 (a perfect linear relationship) and -1 (perfectly inversely related), with zero meaning no linear relationship at all. 0.86 is a high value, demonstrating that the statistical relationship of the two time series is strong.

The correlation passes a statistical test. This is a great example of mistaking correlation for causality, right? Well, no, not really: it’s actually a time series problem analyzed poorly, and a mistake that could have been avoided. You never should have seen this correlation in the first place.

The more basic problem is that the author is comparing two trended time series. The rest of this post will explain what that means, why it’s bad, and how you can avoid it fairly simply. If any of your data involves samples taken over time, and you’re exploring relationships between the series, you’ll want to read on.
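The effect is easy to reproduce on synthetic data. The sketch below is only an illustration: the two series, their slopes and their noise levels are made up, and first-differencing is used as one common way of removing a trend, which is an assumption here rather than necessarily the remedy the original post goes on to describe.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
t = np.arange(n)

# Two series generated independently; they share nothing except an upward trend.
index_like = 0.5 * t + rng.normal(0.0, 5.0, n)
mentions_like = 0.3 * t + rng.normal(0.0, 5.0, n)

# Correlation of the raw (trended) levels.
corr_levels = np.corrcoef(index_like, mentions_like)[0, 1]

# Correlation after first-differencing, which removes the linear trend.
corr_diffs = np.corrcoef(np.diff(index_like), np.diff(mentions_like))[0, 1]

print(f"levels: {corr_levels:.2f}, differences: {corr_diffs:.2f}")
```

The levels correlate strongly because both lines go up over time, while the differenced series, which carry only the period-to-period changes, show almost no correlation.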

How and Why: Decorrelate Time Series

The problem of autocorrelations in time series.

When dealing with time series, the first step consists of isolating trends and periodicities. Once this is done, we are left with a normalized time series, and studying the auto-correlation structure is the next step, called model fitting. The purpose is to check whether the underlying data follows some well-known stochastic process with a similar auto-correlation structure, such as an ARMA process, using tools such as the Box-Jenkins method. Once a fit with a specific model is found, model parameters can be estimated and used to make predictions.
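As a rough sketch of these steps, the example below removes a straight-line trend, computes the sample autocorrelation, and reads off an AR(1) parameter. The synthetic series, the helper `sample_acf`, and the choice of AR(1) as the candidate model are all assumptions made for illustration; a real Box-Jenkins analysis would compare several ARMA orders.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )

rng = np.random.default_rng(1)
n, true_phi = 2000, 0.7
t = np.arange(n)

# Synthetic data: linear trend plus an AR(1) process (illustrative values).
noise = rng.normal(size=n)
ar = np.zeros(n)
for i in range(1, n):
    ar[i] = true_phi * ar[i - 1] + noise[i]
series = 0.05 * t + ar

# Step 1: isolate and remove the trend (here a fitted straight line).
slope, intercept = np.polyfit(t, series, 1)
detrended = series - (slope * t + intercept)

# Step 2: study the autocorrelation structure of what is left.
acf = sample_acf(detrended, max_lag=5)

# Step 3: for an AR(1) model, the lag-1 autocorrelation estimates phi.
phi_hat = acf[1]
print(np.round(acf, 2), round(phi_hat, 2))
```

For an AR(1) process the autocorrelations decay roughly geometrically with the lag, which is the kind of signature one looks for when matching data to a known model.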

A deeper investigation consists of isolating the auto-correlations to see whether the remaining values, once decorrelated, behave like white noise or not. If a departure from white noise is found (using a few tests of randomness), then it means that the time series in question exhibits unusual patterns not explained by trends, seasonality or auto-correlations. This can be useful knowledge in some contexts such as high frequency trading, random number generation, cryptography or cyber-security. The analysis of decorrelated residuals can also help identify change points and instances of slope changes in time series, or reveal otherwise undetected outliers.
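A minimal sketch of that check, assuming an AR(1) structure and using the usual approximate 95% band of ±1.96/√n for white-noise autocorrelations as the test of randomness. The data, the helper `lag_corr`, and the number of lags inspected are illustrative choices, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(2)
n, true_phi = 2000, 0.7

# Synthetic, already detrended AR(1) series (illustrative values).
noise = rng.normal(size=n)
x = np.zeros(n)
for i in range(1, n):
    x[i] = true_phi * x[i - 1] + noise[i]

def lag_corr(v, k):
    """Sample autocorrelation of v at lag k >= 1."""
    v = v - v.mean()
    return np.dot(v[:-k], v[k:]) / np.dot(v, v)

# Estimate phi from the lag-1 autocorrelation, then strip that dependence.
phi_hat = lag_corr(x, 1)
residuals = x[1:] - phi_hat * x[:-1]

# Under white noise, sample autocorrelations fall within about
# +/- 1.96 / sqrt(n) roughly 95% of the time.
band = 1.96 / np.sqrt(len(residuals))
resid_acf = np.array([lag_corr(residuals, k) for k in range(1, 11)])
outside = int(np.sum(np.abs(resid_acf) > band))
print(f"phi_hat={phi_hat:.2f}, lags outside band: {outside} of 10")
```

If many lags fell well outside the band, the residuals would still carry structure that the fitted model does not explain.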

Time Series Forecasting and Internet of Things (IoT) in Grain Storage

Real-world applications of time series forecasting.

Grain storage operators are always trying to minimize the cost of their supply chain. Understanding the relationships between receival, outturn, and movements within and between storage sites can provide insights that are useful in planning for the next harvest season, estimating the throughput capacity of the system, and understanding the relationship between throughput and inventory. This article explores the potential of scanner data in advanced analytics. Combining these two fields (time series forecasting and IoT) has the potential to be useful for the grain storage business. The study describes grain storage scenarios in the Australian context.

How We Combined Different Methods to Create Time Series Prediction

A good article on classical decomposition for forecasting.

Today, businesses need to be able to predict demand and trends to stay in line with any sudden market changes and economy swings. This is exactly where forecasting tools, powered by Data Science, come into play, enabling organizations to successfully deal with strategic and capacity planning. Smart forecasting techniques can be used to reduce any possible risks and assist in making well-informed decisions. One of our customers, an enterprise from the Middle East, needed to predict their market demand for the upcoming twelve weeks. They required a market forecast to help them set their short-term objectives, such as production strategy, as well as assist in capacity planning and price control. So, we came up with an idea of creating a custom time series model capable of tackling the challenge. In this article, we will cover the modelling process as well as the pitfalls we had to overcome along the way.
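For readers unfamiliar with classical decomposition, here is a minimal additive version in NumPy. It is a generic textbook sketch, not the custom model built in the article; the function name `classical_decompose`, the synthetic series, and the period of 12 are all assumptions chosen for illustration.

```python
import numpy as np

def classical_decompose(y, period):
    """Additive classical decomposition: y = trend + seasonal + remainder."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Centered moving average; an even period needs a 2 x period average.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # undefined at both ends of the series
    trend[half : n - half] = np.convolve(y, weights, mode="valid")
    # Seasonal component: mean detrended value at each position in the cycle.
    detrended = y - trend
    means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    means -= means.mean()
    seasonal = np.tile(means, n // period + 1)[:n]
    return trend, seasonal, y - trend - seasonal

# Synthetic series: linear trend plus a cycle of length 12 (made-up values).
t = np.arange(120)
y = 0.5 * t + 10.0 * np.sin(2 * np.pi * t / 12)
trend, seasonal, remainder = classical_decompose(y, period=12)
```

A forecast would then extend the trend forward and add back the seasonal pattern; how to extrapolate the trend is a separate modelling choice.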

The 7 Most Important Data Mining Techniques

A short introduction to some of the most widely used data mining methods.

Data mining is the process of looking at large banks of information to generate new information. Intuitively, you might think that data “mining” refers to the extraction of new data, but this isn’t the case; instead, data mining is about extrapolating patterns and new knowledge from the data you’ve already collected.

Relying on techniques and technologies from the intersection of database management, statistics, and machine learning, specialists in data mining have dedicated their careers to better understanding how to process and draw conclusions from vast amounts of information. But what are the techniques they use to make this happen?

A Simple Introduction to Complex Stochastic Processes

Stochastic processes have many applications, including in finance and physics. They are an interesting model for representing many phenomena. Unfortunately, the theory behind them is very difficult, making them accessible only to a few ‘elite’ data scientists, and not popular in business contexts.

1. Construction of Time-Continuous Stochastic Processes: Brownian Motion

2. General Properties

Data Analysis Method: Mathematics Optimization to Build Decision Making

A short introduction to the use of optimization in data analysis.

Optimization is the problem of finding the best decision, one that is both effective and efficient, whether the goal is a maximum or a minimum, by determining a satisfactory solution.

Optimization is not a new science. It has been developing ever since Newton, in the 17th century, discovered how to compute roots. The science of optimization is still evolving today in terms of both techniques and applications. Many problems in everyday life involve optimization. Recent development has been especially strong in new techniques for solving optimization problems, including conic programming, semidefinite programming, semi-infinite programming and several metaheuristic techniques.

An Introduction to Word Embeddings

A good article on a technique in NLP (Natural Language Processing).

Part 1: Applications

Written by Aaron Geelon So

If you already have a solid understanding of word embeddings and are well into your data science career, skip ahead to the next part!

Human language is unreasonably effective at describing how we relate to the world. With a few short words, we can convey many ideas and actions with little ambiguity. Well, mostly.

Because we’re capable of seeing and describing so much complexity, a lot of structure is implicitly encoded into our language. It is no easy task for a computer (or a human, for that matter) to learn natural language, for it entails understanding how we humans observe the world, if not understanding how to observe the world.

For the most part, computers can’t understand natural language. Our programs are still line-by-line instructions telling a computer what to do — they often miss nuance and context. How can you explain sarcasm to a machine?

There’s good news though. There have been some important breakthroughs in natural language processing (NLP), the domain where researchers try to teach computers human language.

Images created by vector fields

This website allows you to explore vector fields in real time.

“Vector field” is just a fancy way of saying that each point on a screen has some vector associated with it. This vector could mean anything, but for our purposes we consider it to be a velocity vector.

Now that we have velocity vectors at every single point, let’s drop thousands of small particles and see how they move. The resulting visualization could be used by scientists to study vector fields, or by artists to get inspiration!
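The idea of dropping particles into a velocity field can be sketched in a few lines. This is not the website's implementation: the rotation field `velocity`, the helper `advect`, the explicit Euler stepping, and all numeric values are assumptions picked to keep the example small.

```python
import numpy as np

def velocity(points):
    """Hypothetical field v(x, y) = (-y, x): rotation around the origin."""
    x, y = points[:, 0], points[:, 1]
    return np.column_stack((-y, x))

def advect(points, dt, steps):
    """Move particles along the field with explicit Euler steps."""
    points = points.copy()
    for _ in range(steps):
        points += dt * velocity(points)
    return points

# Drop a thousand particles at random positions in the square [-1, 1]^2.
rng = np.random.default_rng(3)
particles = rng.uniform(-1.0, 1.0, size=(1000, 2))
moved = advect(particles, dt=0.01, steps=100)
```

In this field every particle circles the origin. Explicit Euler steps make each orbit drift slowly outward, so a smaller `dt` or a higher-order integrator would be a reasonable refinement for longer runs.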

Learn more about this project on GitHub

Stay tuned for updates on Twitter.

With passion,

Anvaka

Free Hadoop Tutorial: Master BigData

Big Data is the latest buzzword in the IT industry. Apache’s Hadoop is a leading Big Data platform used by IT giants Yahoo, Facebook & Google. This course is geared to make you a Hadoop expert.

What should I know?


This is an absolute beginner’s guide to Hadoop, but knowledge of 1) Java and 2) Linux will help.

Syllabus

Introduction to BIG DATA: Types, Characteristics & Benefits
Hadoop Tutorial: Features, Components, Cluster & Topology
Hadoop Setup Tutorial – Installation & Configuration
HDFS Tutorial: Read & Write Commands using Java API
What is MapReduce? How it Works – Hadoop MapReduce Tutorial
Hadoop & Mapreduce Examples: Create your First Program
Hadoop MapReduce Tutorial: Counters & Joins with Example
What is Sqoop? What is FLUME – Hadoop Tutorial
Sqoop vs Flume vs HDFS in Hadoop
Create Your First FLUME Program – Beginner’s Tutorial
Hadoop PIG Tutorial: Introduction, Installation & Example
Learn OOZIE in 5 Minutes – Hadoop Tutorial
Big Data Testing: Functional & Performance
Hadoop & MapReduce Interview Questions & Answers
