Time series decomposition works by splitting a time series into three components: seasonality, trend, and random fluctuation. To show how this works, we will study the decompose() and stl() functions in the R language.
API Integration in Python
Posted by Armando Brito Mendes | Filed under programming languages, materials for professionals, software
Table of Contents
- How to Make Friends and Influence APIs
- Talking REST
- Constructing an API Library
- Coming in Part 2
- Appendix: REST in a nutshell
Axes of evil: How to lie with graphs
Posted by Armando Brito Mendes | Filed under teaching materials, materials for professionals, Uncategorized, visualization
A blog with examples and links to other sites.
As Mark Twain once said, “Never let the truth get in the way of a good story.” Here are a few techniques to hide those pesky numbers and tell the story you feel, not the one you can prove.
Don your handlebar mustache and practice your evil laugh — we’re going in.
A Beginner’s Guide to learn web scraping with python!
Posted by Armando Brito Mendes | Filed under lessons, teaching materials, materials for professionals, software
A good description of web scraping with Python.
Web Scraping with Python
Imagine you have to pull a large amount of data from websites and you want to do it as quickly as possible. How would you do it without manually going to each website and getting the data? Well, “Web Scraping” is the answer. Web Scraping just makes this job easier and faster.
In this article on Web Scraping with Python, you will learn about web scraping in brief and see how to extract data from a website with a demonstration. I will be covering the following topics:
Tags: Python
The Beautiful Hidden Logic of Cities
Posted by Armando Brito Mendes | Filed under GIS maps, teaching materials, materials for professionals, visualization
Patterns identified in city maps.
After finishing my map of the most common road suffixes by length, I realized I could also map each individual road, colored by its suffix. This has led to the loveliest maps I’ve made.
Driving around your city, you’re probably somewhat aware of Avenues and Boulevards and Streets and Roads and so on. Here in Portland, at least, I know that Avenues run north-south and Streets run east-west. However, it’s hard to get an overall view of how all these road designations knit together. By coloring them, we can suddenly see a new, stunning view of what we normally take for granted.
Tags: knowledge capture, image mining, maps
Making of the Illustrations of the Natural Orders of Plants
Posted by Armando Brito Mendes | Filed under materials for professionals
If someone told me when I was young that I would spend three months of my time tracing nineteenth century botanical illustrations and enjoy it, I would have scoffed, but that’s what I did to reproduce Elizabeth Twining’s Illustrations of the Natural Orders of Plants and I loved every minute.
After the unexpected successes of my Byrne’s Euclid and Werner’s Nomenclature of Colours projects (for which I’m very grateful) I got the itch to follow them up with another reproduction of an obscure catalog from the 1800s. However, finding interesting obscure catalogs wasn’t an easy task when I didn’t know what would pique my interest. Anything was fair game but I had an inkling that something based on the sciences would be most interesting. Scientific catalogs are organized, structured, and data can be extracted from them with some elbow grease.
Tags: beautiful, knowledge capture, data mining, machine learning
Extracting Seasonality and Trend from Data: Decomposition Using R
Posted by Armando Brito Mendes | Filed under statistics, Operations Research, lessons, programming languages, teaching materials, materials for professionals
An excellent description of classical decomposition with Python and R.
Understanding Decomposition
Decompose One Time Series into Multiple Series
Time series decomposition is a mathematical procedure which transforms a time series into multiple different time series. The original time series is often split into 3 component series:
- Seasonal: Patterns that repeat with a fixed period of time. For example, a website might receive more visits during weekends; this would produce data with a seasonality of 7 days.
- Trend: The underlying trend of the metrics. A website increasing in popularity should show a general trend that goes up.
- Random: Also called “noise,” “irregular,” or “remainder,” these are the residuals of the original time series after the seasonal and trend series are removed.
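The three components above can be illustrated with a short Python sketch. It assumes the classical additive method (a centered moving average for the trend, then per-position averages for the seasonal part) and is not a reimplementation of R's decompose(). The `visits` data, the function names, and the 7-day period are hypothetical and chosen only to match the website example.

```python
from statistics import mean

def centered_moving_average(series, period):
    """Trend estimate: centered moving average; None where the window does not fit."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # Even period: give the two end points half weight (a 2 x period average).
            trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    return trend

def decompose_additive(series, period):
    """Split series into trend, seasonal and remainder so that they sum to the original."""
    trend = centered_moving_average(series, period)
    detrended = [x - m if m is not None else None for x, m in zip(series, trend)]
    # Average the detrended values at each position within the cycle.
    raw = []
    for pos in range(period):
        vals = [d for i, d in enumerate(detrended) if d is not None and i % period == pos]
        raw.append(mean(vals))
    centre = mean(raw)  # centre the seasonal effects so they average to zero
    cycle = [s - centre for s in raw]
    seasonal = [cycle[i % period] for i in range(len(series))]
    remainder = [
        x - m - s if m is not None else None
        for x, m, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, remainder

# Hypothetical daily visits: a rising trend plus a weekend bump (period of 7 days).
weekly_pattern = [0, 0, 0, 0, 0, 30, 30]
visits = [100 + 2 * day + weekly_pattern[day % 7] for day in range(35)]
trend, seasonal, remainder = decompose_additive(visits, period=7)
```

Because this toy series contains no noise, the remainder is essentially zero wherever the trend is defined. With real data, the remainder holds whatever the trend and seasonal series fail to explain.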
Tags: engineering, inference, optimization, forecasting
Making it easier to discover datasets
Posted by Armando Brito Mendes | Filed under Databases, materials for professionals
A new Google resource for finding datasets.
In today’s world, scientists in many disciplines and a growing number of journalists live and breathe data. There are many thousands of data repositories on the web, providing access to millions of datasets; and local and national governments around the world publish their data as well. To enable easy access to this data, we launched Dataset Search, so that scientists, data journalists, data geeks, or anyone else can find the data required for their work and their stories, or simply to satisfy their intellectual curiosity.
Tags: data analysis
When Variable Reduction Doesn’t Work
Posted by Armando Brito Mendes | Filed under statistics, materials for professionals
A good example of how the usual procedures do not always work.
Summary: Exceptions sometimes make the best rules. Here’s an example of well accepted variable reduction techniques resulting in an inferior model and a case for dramatically expanding the number of variables we start with.
One of the things that keeps us data scientists on our toes is that the well-established rules of thumb don’t always work. Certainly one of the most well-worn of these rules is the parsimonious model: always seek to create the best model with the fewest variables. And woe to you who violate this rule. Your model will overfit, include false random correlations, or at the very least will just be judged to be slow and clunky.
Certainly this is a rule I embrace when building models, so I was surprised and then delighted to find a well-conducted study by Lexis/Nexis that lays out a case where this clearly isn’t true.
Tags: data mining, problems
How To Forecast Time Series Data With Multiple Seasonal Periods
Posted by Armando Brito Mendes | Filed under statistics, mathematics, materials for professionals
Analysis of complex series with multiple seasonal periods.
Time series data is produced in domains such as IT operations, manufacturing, and telecommunications. Examples of time series data include the number of client logins to a website on a daily basis, cell phone traffic collected per minute, and temperature variation in a region by the hour. Forecasting a time series signal ahead of time helps us make decisions such as planning capacity and estimating demand. Previous time series analysis blog posts focused on processing time series data that resides on Greenplum database using SQL functions. In this post, I will examine the modeling steps involved in forecasting a time series sequence with multiple seasonal periods. The various steps involved are outlined below:
- Multiple seasonality is modelled with the help of Fourier series with different periods
- External regressors in the form of Fourier terms are added to an ARIMA model to account for the seasonal behavior
- The Akaike Information Criterion (AIC) is used to find the best-fit model
Tags: forecasting
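As a rough illustration of the first and third steps, the Python sketch below builds sine and cosine columns for several seasonal periods and defines the standard AIC formula. The function names, the hourly data with 24-hour and 168-hour cycles, and the chosen number of harmonics are assumptions made for this example. Fitting the ARIMA model itself is left out.

```python
import math

def fourier_terms(n_obs, periods, orders):
    """Return one row per time step with a sin/cos pair for each harmonic.

    periods: seasonal period lengths in time steps, e.g. [24, 168]
    orders:  number of harmonics K to generate for each period
    """
    rows = []
    for t in range(n_obs):
        row = []
        for period, order in zip(periods, orders):
            for k in range(1, order + 1):
                angle = 2.0 * math.pi * k * t / period
                row.append(math.sin(angle))
                row.append(math.cos(angle))
        rows.append(row)
    return rows

def aic(log_likelihood, n_params):
    """Akaike Information Criterion; among candidate models, lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical hourly series with daily (24) and weekly (168) cycles:
# two harmonics for the daily cycle, one for the weekly cycle.
X = fourier_terms(n_obs=336, periods=[24, 168], orders=[2, 1])
```

Each row of `X` would be passed to an ARIMA routine as external regressors. Candidate values for the number of harmonics can then be compared by computing `aic` from each fitted model's log-likelihood and parameter count.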
Avoiding a common mistake with time series
Posted by Armando Brito Mendes | Filed under statistics, Operations Research, materials for professionals
A case in which the trend masks the rest of the series, creating high correlations.
A basic mantra in statistics and data science is correlation is not causation, meaning that just because two things appear to be related to each other doesn’t mean that one causes the other. This is a lesson worth learning.
If you work with data, throughout your career you’ll probably have to re-learn it several times. But you often see the principle demonstrated with a graph like this:
One line is something like a stock market index, and the other is an (almost certainly) unrelated time series like “Number of times Jennifer Lawrence is mentioned in the media.” The lines look amusingly similar. There is usually a statement like: “Correlation = 0.86”. Recall that a correlation coefficient is between +1 (a perfect linear relationship) and -1 (perfectly inversely related), with zero meaning no linear relationship at all. 0.86 is a high value, demonstrating that the statistical relationship of the two time series is strong.
The correlation passes a statistical test. This is a great example of mistaking correlation for causality, right? Well, no, not really: it’s actually a time series problem analyzed poorly, and a mistake that could have been avoided. You never should have seen this correlation in the first place.
The more basic problem is that the author is comparing two trended time series. The rest of this post will explain what that means, why it’s bad, and how you can avoid it fairly simply. If any of your data involves samples taken over time, and you’re exploring relationships between the series, you’ll want to read on.
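To make the problem concrete, here is a small Python sketch with two simulated series that share nothing except an upward trend. The series, slopes, noise level, and seed are all made up for illustration. The remedy shown, first differencing, is one common way to remove a trend and is an assumption here, since this excerpt does not name the fix.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def first_difference(series):
    """Change from one time step to the next; this removes a linear trend."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 200

# Two hypothetical series: different upward trends plus independent noise.
series_a = [0.5 * t + rng.gauss(0, 5) for t in range(n)]
series_b = [0.3 * t + rng.gauss(0, 5) for t in range(n)]

corr_levels = pearson(series_a, series_b)
corr_diffs = pearson(first_difference(series_a), first_difference(series_b))

print(f"correlation of raw series:         {corr_levels:.2f}")
print(f"correlation of differenced series: {corr_diffs:.2f}")
```

The raw series correlate strongly only because both rise over time. Once the trend is removed by differencing, what remains is the independent noise, and the correlation should fall close to zero.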
Tags: forecasting