Avoiding a common mistake with time series

A case in which the trend masks the rest of the series, creating high correlations.

A basic mantra in statistics and data science is correlation is not causation, meaning that just because two things appear to be related to each other doesn’t mean that one causes the other. This is a lesson worth learning.

If you work with data, throughout your career you’ll probably have to re-learn it several times. But you often see the principle demonstrated with a graph like this:

Dow Jones vs. Jennifer Lawrence

One line is something like a stock market index, and the other is an (almost certainly) unrelated time series like “Number of times Jennifer Lawrence is mentioned in the media.” The lines look amusingly similar. There is usually a statement like: “Correlation = 0.86”. Recall that a correlation coefficient lies between +1 (a perfect positive linear relationship) and -1 (a perfect inverse linear relationship), with zero meaning no linear relationship at all. 0.86 is a high value, suggesting that the statistical relationship between the two time series is strong.

The correlation passes a statistical test. This is a great example of mistaking correlation for causality, right? Well, no, not really: it’s actually a time series problem analyzed poorly, and a mistake that could have been avoided. You never should have seen this correlation in the first place.

The more basic problem is that the author is comparing two trended time series. The rest of this post will explain what that means, why it’s bad, and how you can avoid it fairly simply. If any of your data involves samples taken over time, and you’re exploring relationships between the series, you’ll want to read on.
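
As a rough illustration of the trend problem, the sketch below builds two independent series that share nothing except an upward trend, then compares their correlation before and after first differencing. The slopes, noise level, seed and helper names are assumptions chosen for the demo, and differencing is only one common way to remove a trend; it is not necessarily the remedy the original post has in mind.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def trended_series(n, slope, noise_sd, rng):
    """Linear trend plus independent Gaussian noise (hypothetical data)."""
    return [slope * t + rng.gauss(0.0, noise_sd) for t in range(n)]


def first_difference(x):
    """Change from one sample to the next; removes a linear trend."""
    return [b - a for a, b in zip(x, x[1:])]


rng = random.Random(0)
series_a = trended_series(200, slope=0.5, noise_sd=5.0, rng=rng)
series_b = trended_series(200, slope=0.3, noise_sd=5.0, rng=rng)

# Correlation of the raw series is dominated by the shared trend.
raw_corr = pearson(series_a, series_b)
# Correlation of the differenced series reflects only the (independent) noise.
diff_corr = pearson(first_difference(series_a), first_difference(series_b))

print(f"raw correlation:         {raw_corr:.2f}")
print(f"differenced correlation: {diff_corr:.2f}")
```

On data like this, the raw correlation is typically very high while the correlation of the differences sits near zero, which is what you would expect from two unrelated series.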

How and Why: Decorrelate Time Series

The problem of autocorrelations in time series.

When dealing with time series, the first step consists of isolating trends and periodicities. Once this is done, we are left with a normalized time series, and studying the auto-correlation structure is the next step, called model fitting. The purpose is to check whether the underlying data follows some well-known stochastic process with a similar auto-correlation structure, such as an ARMA process, using tools such as the Box-Jenkins method. Once a fit with a specific model is found, model parameters can be estimated and used to make predictions.

A deeper investigation consists of isolating the auto-correlations to see whether the remaining values, once decorrelated, behave like white noise or not. If a departure from white noise is found (using a few tests of randomness), it means that the time series in question exhibits unusual patterns not explained by trends, seasonality or auto-correlations. This can be useful knowledge in some contexts, such as high-frequency trading, random number generation, cryptography or cyber-security. The analysis of decorrelated residuals can also help identify change points and instances of slope changes in time series, or reveal otherwise undetected outliers.
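
The decorrelation step can be sketched in a few lines. The example below assumes the simplest possible case, a first-order autoregressive, AR(1), model fitted by its lag-1 autocorrelation; the simulated data, the parameter 0.7 and the function names are all illustrative choices, not the article's own procedure. After removing the fitted auto-correlation, the residuals should show almost none if the model is adequate.

```python
import random


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def simulate_ar1(n, phi, rng):
    """Simulate x[t] = phi * x[t-1] + Gaussian noise (hypothetical data)."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def ar1_residuals(x):
    """Estimate phi from the lag-1 autocorrelation and return the residuals."""
    mean = sum(x) / len(x)
    phi_hat = autocorr(x, 1)
    resid = [(x[t] - mean) - phi_hat * (x[t - 1] - mean) for t in range(1, len(x))]
    return phi_hat, resid


rng = random.Random(1)
series = simulate_ar1(2000, phi=0.7, rng=rng)
phi_hat, resid = ar1_residuals(series)

# Rough white-noise check: residual autocorrelations should mostly fall
# inside the approximate 95% band of +/- 1.96 / sqrt(n).
band = 1.96 / len(resid) ** 0.5
print(f"estimated phi: {phi_hat:.3f}, white-noise band: +/-{band:.3f}")
for lag in (1, 2, 3):
    print(f"lag {lag}: series {autocorr(series, lag):.3f}, residuals {autocorr(resid, lag):.3f}")
```

Residual autocorrelations that sit well outside the band would be the kind of departure from white noise the paragraph above describes; a formal test of randomness would be the next step.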

Time Series Forecasting and Internet of Things (IoT) in Grain Storage

Real-world applications of time series forecasting.

Grain storage operators are always trying to minimize the cost of their supply chain. Understanding the relationships between receival, outturn, within-storage-site movements and between-storage-site movements can provide insights that are useful in planning for the next harvest season, estimating the throughput capacity of the system, and understanding the relationship between throughput and inventory. This article explores the potential of scanner data in advanced analytics. The combination of these two fields has the potential to be useful for the grain storage business. The study describes grain storage scenarios in the Australian context.

How We Combined Different Methods to Create Time Series Prediction

A good text on classical decomposition for forecasting.

Today, businesses need to be able to predict demand and trends to keep up with sudden market changes and economic swings. This is exactly where forecasting tools, powered by Data Science, come into play, enabling organizations to deal successfully with strategic and capacity planning. Smart forecasting techniques can be used to reduce risks and assist in making well-informed decisions. One of our customers, an enterprise from the Middle East, needed to predict their market demand for the upcoming twelve weeks. They required a market forecast to help them set their short-term objectives, such as production strategy, as well as assist in capacity planning and price control. So, we came up with the idea of creating a custom time series model capable of tackling the challenge. In this article, we will cover the modelling process as well as the pitfalls we had to overcome along the way.

A Simple Introduction to Complex Stochastic Processes

Stochastic processes have many applications, including in finance and physics. They are interesting models for representing many phenomena. Unfortunately, the theory behind them is very difficult, making it accessible only to a few ‘elite’ data scientists, and not popular in business contexts.

1. Construction of Time-Continuous Stochastic Processes: Brownian Motion

2. General Properties
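
To give a feel for the first item in the outline, here is a minimal sketch of one standard way to approximate Brownian motion: a random walk whose step size shrinks with the square root of the time step. Whether the linked article uses exactly this construction is an assumption; the function name, step counts and seed are illustrative.

```python
import random


def brownian_path(n_steps, horizon, rng):
    """Approximate Brownian motion on [0, horizon] by a rescaled random walk.

    Each step is +sqrt(dt) or -sqrt(dt) with equal probability, so the
    variance of the position at time t is t, as for Brownian motion.
    """
    dt = horizon / n_steps
    step = dt ** 0.5
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + (step if rng.random() < 0.5 else -step))
    return path


rng = random.Random(42)

# Simulate many paths and look at where they end at time 1.
endpoints = [brownian_path(400, 1.0, rng)[-1] for _ in range(2000)]
mean_end = sum(endpoints) / len(endpoints)
var_end = sum((e - mean_end) ** 2 for e in endpoints) / len(endpoints)

print(f"mean of endpoints:     {mean_end:.3f} (theory: 0)")
print(f"variance of endpoints: {var_end:.3f} (theory: 1)")
```

Increasing the number of steps makes each path look more like a continuous curve, while the mean and variance of the endpoint stay close to 0 and to the elapsed time.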

Pianogram

A bar-chart visualization of the notes in songs.

This is what you get when you cross a histogram and piano keys to show note distribution of songs. It’s the pianogram. View examples such as Fur Elise or the classic Chopsticks, or punch in your own MIDI-formatted song for a taste of the distribution ivories.

Here’s the distribution for Kenny Loggins’ Danger Zone.

Because why not.

working with alien SPSS files

Information on several surveys and access to their data.

Close Encounters of the Fourth Kind: working with alien SPSS files

[New page 23 Oct 2014: last updated 5 June 2017]

[NB: Notes and commentaries below may arrive as pdf files in your download folder]

Close Encounters of the Fourth Kind: working with alien SPSS files (pdf)

An alternative working title would have been: Sows’ Ears and Silk Purses: working with other people’s SPSS files, as a follow-up to Old Dog, Old Tricks, my 2006 presentation to ASSESS. I thought about using Old Dog, New Tricks, but it doesn’t carry the same sense of horror and fun.

Slide-shows covered recent work on other people’s files, including a live demo of Jon Peck’s Python code to move question numbers from the end to the beginning of variable labels and to change labels from UPPER to Mixed case text. Also included were some new tricks and demos of things I didn’t know SPSS would do until I tried. I haven’t used PowerPoint since York 2006, but I found [Alt][PrintScreen] and MS Snip incredibly useful for getting screenshots into Word, and they also copied easily into PowerPoint. The presentation ran SPSS live, drawing on my explorations of:

British Social Attitudes
Commentary on SPSS file for British Social Attitudes 2011 (pdf)
Notes on British Social Attitudes 2004 teaching data set (pdf) as used by Marsh and Elliott, 2008

(See also page British Social Attitudes, which has links to later commentaries on the ease of use and understanding of SPSS saved files distributed by UKDS on page British Social Attitudes: Exploring the SPSS files, and detailed accounts of my creation in 2016 of a cumulative mother file for all waves 1983 to 2014 on page British Social Attitudes 1983 to 2014: Cumulative SPSS file.)

Understanding Society
Commentary on Understanding Society 2010 (pdf)

NORC General Social Survey (GSS)
As of March 2016, the NORC GSS website has been completely revamped and is easier to navigate. Some of the content in the following commentaries may now be otiose.
Commentary on full NORC General Social Survey 2008 (pdf)
Commentary on subset of General Social Survey 2008 (pdf) (as used by Sweet & Grace-Martin)
Commentary on GSS 2008 SPSS files for Babbie et al (pdf) (as used by Babbie, Halley, Wagner & Zaino)

(UK) ONS National Well-being
[New page 2 May 2015]
ONS National Well-being

Commentary on Unrestricted Access Teaching Dataset (ONS Opinions Survey, Well-Being Module) (pdf)
Data set and user guide from the Cathie Marsh Centre for Census and Survey Research, Manchester, now renamed the Cathie Marsh Institute for Social Research. This dataset (SN7146) contains a selection of variables from the April 2011 wave of the ONS Opinions Survey, Well-Being Module, April – August 2011 (SN 6893), which in turn is part of the regular government survey Opinions and Lifestyle Survey, run in various guises since 1990.

British Social Attitudes

Survey and corresponding data on social attitudes in the UK.

British Social Attitudes 1983 onwards
Cumulative SPSS file

[New page 22 June 2016: last updated 14 Feb 2017]

Cumulative files 1983 onwards
Attempting analyses across waves became increasingly frustrating as I encountered a range of anomalies, incompatibilities and inconsistencies, not to mention universally incomplete and/or incorrect specifications of measurement levels, missing values and value labels. Accordingly, I set myself the task of generating a complete cumulative SPSS file containing the data from all waves from 1983 to 2014 (one colleague described this undertaking as Herculean) to provide what will hopefully be a valuable resource for teachers, students and researchers. The 2015 wave was added in January 2017.

Index to UKDS downloads for British Social Attitudes 1983 – 2014 is an Excel file detailing, for each wave 1983 – 2014, year of survey, link to UKDS, download filename, size of file, number of cases, number of variables, number of variables with non-numeric formats and the new working filename assigned to amended files. The amended *.sav files were sent to Natcen for approval and possible deposit with UKDS, but are now superseded.

Non-numeric variables in British Social Attitudes is a step-by-step account of identifying, in each wave, variables with the same name but different formats. Several of these variables are specified as Strings with widths varying from A4 to A60, but some are in fact numbers. Others are dates or times in DATE or TIME format and one is in COMMA1. These and other factors prevent merging data from different waves using the SPSS command ADD FILES. It’s been quite complex and tedious tracking them all down, but I eventually managed to create cumulative files for 1983 – 1994 and 2011 – 2014. Merging 1995 – 2005 and 2006 – 2009 was more daunting, as several problems remained to be resolved, but I eventually managed to generate a draft cumulative file for the whole series. Much more meticulous and painstaking detective work and editing were required before a beta version was ready for public release.
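
The kind of check described above, finding variables that keep the same name but change format from wave to wave, can be sketched in plain Python. This is not the procedure actually used for the BSA files: the variable names, years and formats below are hypothetical, and in practice the per-wave metadata would have to be exported from the SPSS files first.

```python
from collections import defaultdict


def find_format_conflicts(waves):
    """Return variables whose format differs between waves.

    `waves` maps a survey year to a {variable name: SPSS format} dict.
    The result maps each conflicting variable to its {year: format} history.
    """
    formats_by_name = defaultdict(dict)
    for year, variables in waves.items():
        for name, fmt in variables.items():
            formats_by_name[name][year] = fmt
    return {
        name: by_year
        for name, by_year in formats_by_name.items()
        if len(set(by_year.values())) > 1
    }


# Hypothetical metadata for three waves (names and formats are made up).
waves = {
    1983: {"SERIAL": "F6.0", "REGION": "A4", "INTDATE": "A10"},
    1984: {"SERIAL": "F6.0", "REGION": "F2.0", "INTDATE": "DATE11"},
    1985: {"SERIAL": "F6.0", "REGION": "F2.0"},
}

conflicts = find_format_conflicts(waves)
for name, by_year in sorted(conflicts.items()):
    history = ", ".join(f"{year}: {fmt}" for year, fmt in sorted(by_year.items()))
    print(f"{name} -> {history}")
```

Variables flagged this way are the ones that would need recoding to a common format before the waves can be merged.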

Cumulative SPSS file 1983 to 2014
This task was completed on 20 June 2016 and the password-protected “mother” file (0.99 GB) has now been lodged (via Dropbox) with Natcen and UKDS for approval and distribution. Custom-written Python code, freely and generously supplied by Jon Peck (retired Senior Software Engineer, IBM-SPSS), has saved me weeks if not months of painstaking needle-in-haystack searches. I also wish to thank Dr Chris Stride (Sheffield), who suggested using the sort facility in Excel to separate variable names with single (positive) missing values from those with paired (positive and equivalent negative) missing values.

For sure, some mini-glitches may remain, but to find and resolve these would at this stage be completely uneconomic of my time. However, users are warned that, because metadata for repeated variables are taken from the most recent wave, the value labels for categories of some variables differ from those of earlier waves. This is particularly true of ordinal variables for income groups.

SSRC Survey Unit Quality of Life

Data and description of a survey on quality of life in the UK.

The abstracts contain details of content, sampling, fieldwork and available data files. The questionnaires are facsimiles of the actual questionnaires used in the field. The user manuals contain questionnaires, unweighted frequency counts on the raw data as well as technical information on fieldwork, sampling, coding, show-cards and interviewer instructions. The SPSS saved files are restorations from original files generated in the 1970s with some editing of SPSS setup files from 1970s versions to SPSS for Windows (11, 15, 18 and 19): a few (self-explanatory) derived variables have been left in.

Quality of Life in Britain: 1st Pilot Survey,  March 1971

1:  Abstract
2:  Questionnaire
3:  User Manual
4: SPSS saved file for 1st pilot

Quality of Life in Britain: 2nd Pilot Survey, Oct-Nov 1971

1:  Abstract
2:  Questionnaire
3:  User Manual
4: SPSS saved file for 2nd pilot

Quality of Life in Britain: 1st National Survey 1973

(replicated simultaneously in Stoke-on-Trent and Sunderland)

1:  Abstract
2: Questionnaire
3:  User Manual for main GB survey

4a: SPSS saved file for main GB survey 1973
4b: SPSS saved file for Stoke-on-Trent survey 1973
4c: SPSS saved file for Sunderland survey 1973

Quality of Life in Britain: 2nd National Survey 1975

1:  Abstract
2:  Questionnaire
3:  User Manual
4: SPSS saved file for main GB survey 1975

British Social Attitudes

Data from a British survey on social attitudes.

The British Social Attitudes survey is the leading social research survey in Britain. Since 1983, the annual surveys conducted by the National Centre for Social Research (Natcen) have continually monitored and interpreted the British public’s changing attitudes towards social, economic, political and moral issues. Its findings are reported and interpreted in a series of annual reports.

The British Social Attitudes Information System is maintained by the Centre for Comparative European Survey Data (CCESD) to provide non-specialist users with on-line access and analysis of a cumulative database of over 20,000 survey questions asked in British Social Attitudes surveys over the last 25 years.

Data and documentation from all surveys from 1983 are routinely deposited with the UK Data Service (UKDS) based at Essex University and can be searched on their page British Social Attitudes Survey. Files are accessible from UKDS: the list of currently available waves is on British Social Attitudes links 1983 onwards. I am currently working on tutorials using data from the 2011 wave and from the 2009 to 2014 waves in the cumulative file. Some of these are already available on page 3.2 Three (or more) variables and in 4.2.1 Income differences – Statistical significance.
