How and Why: Decorrelate Time Series
Posted by Armando Brito Mendes | Filed under statistics, Operations Research, materials for professionals
The problem of autocorrelations in time series.
When dealing with time series, the first step consists of isolating trends and periodicities. Once this is done, we are left with a normalized time series, and studying the auto-correlation structure is the next step, called model fitting. The purpose is to check whether the underlying data follows some well-known stochastic process with a similar auto-correlation structure, such as an ARMA process, using tools such as the Box-Jenkins method. Once a fit with a specific model is found, model parameters can be estimated and used to make predictions.
A deeper investigation consists of isolating the auto-correlations to see whether the remaining values, once decorrelated, behave like white noise. If a departure from white noise is found (using a few tests of randomness), the time series in question exhibits unusual patterns not explained by trends, seasonality or auto-correlations. This can be useful knowledge in some contexts such as high-frequency trading, random number generation, cryptography or cyber-security. The analysis of decorrelated residuals can also help identify change points and instances of slope changes in time series, or reveal otherwise undetected outliers.
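The decorrelation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of the linked article: it assumes the detrended series follows a first-order autoregressive model, AR(1), and the names `autocorr`, `fit_ar1` and `decorrelate_ar1` are hypothetical helpers. The simulated data and the value 0.7 are made up for the example.

```python
import random


def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var


def fit_ar1(x):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + e[t].

    Assumes x has already been centered (zero mean).
    """
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


def decorrelate_ar1(x):
    """Center x, fit an AR(1) model, and return (phi, residuals)."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    phi = fit_ar1(centered)
    residuals = [centered[t] - phi * centered[t - 1]
                 for t in range(1, len(centered))]
    return phi, residuals


# Simulated AR(1) series (assumption: true coefficient 0.7, Gaussian noise).
rng = random.Random(0)
series = [0.0]
for _ in range(1999):
    series.append(0.7 * series[-1] + rng.gauss(0, 1))

phi_hat, residuals = decorrelate_ar1(series)

# Rough 95% band for the autocorrelation of white noise: +/- 1.96 / sqrt(n).
band = 1.96 / len(residuals) ** 0.5

print("estimated phi:", round(phi_hat, 3))
print("lag-1 autocorrelation before:", round(autocorr(series, 1), 3))
print("lag-1 autocorrelation after: ", round(autocorr(residuals, 1), 3))
print("white-noise band: +/-", round(band, 3))
```

If the residual autocorrelations at several lags stay inside the band, the AR(1) model has plausibly absorbed the correlation structure. Values well outside it would point to patterns the model does not explain, which is the situation the excerpt describes.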
Tags: forecasting
The 7 Most Important Data Mining Techniques
Posted by Armando Brito Mendes | Filed under materials for professionals
A short introduction to some of the most widely used data mining methods.
Data mining is the process of looking at large banks of information to generate new information. Intuitively, you might think that data “mining” refers to the extraction of new data, but this isn’t the case; instead, data mining is about extrapolating patterns and new knowledge from the data you’ve already collected.
Relying on techniques and technologies from the intersection of database management, statistics, and machine learning, specialists in data mining have dedicated their careers to better understanding how to process and draw conclusions from vast amounts of information. But what are the techniques they use to make this happen?
Tags: data mining
Playground to Politics
Posted by Armando Brito Mendes | Filed under data sets, statistics, materials for professionals
Data from a questionnaire given to 50 London teachers.
A study of values and attitudes among fifth formers in a North London comprehensive school.
This survey of teenage attitudes and opinions in a North London comprehensive school (11-18 mixed) was designed and conducted, under my guidance and supervision, by three of my sophomore students as part of their group research dissertation for BA Applied Social Studies (Social Research) at the Polytechnic of North London (PNL, now part of London Metropolitan University). It aimed to discover something about pupils’ future expectations and awareness of, and attitudes towards, various current social issues and problems, particularly racism and sexism. It replicates various items and scales from other work (Wilson-Patterson, Eysenck, Himmelweit, Srole-Christie), particularly the St Paul’s Girls senior pupils study (Feb 1973), some of which were also used in the SSRC Survey Unit Quality of Life surveys 1971-75.
The self-completion questionnaire was completed in December 1981 by all fifth form pupils present on the day of the survey (N=142). It was administered during time-tabled Social Studies classes and, time permitting, was followed by discussion with class teachers and the PNL students of the issues covered in the survey.
Given the particularly high quality of this project, a user manual was prepared by John Hall and Alison Walker for use with the postgraduate Survey Analysis Workshop and the undergraduate course Data Management and Analysis. It serves as model documentation for similar small survey projects.
Tags: surveys
KNIME course
Posted by Armando Brito Mendes | Filed under GIS maps, materials for professionals, software, videos, visualization
A very good KNIME course; it is introductory but covers a large number of features.
KNIME Online Self-Training
Welcome to the KNIME Self-training course. The focus of this document is to get you started with KNIME as quickly as possible and guide you through essential steps of advanced analytics with KNIME. Optional and very useful topics such as reporting, KNIME Server and database handling are also included to give you an idea of what else is possible with KNIME.
- Installing KNIME Analytics Platform and Extensions
- Data Import / Export and Database / Big Data
- ETL
- Visualization
- Advanced Analytics
- Reporting
- KNIME Server
Tags: data analysis, big data, data mining, Knime, text mining
MARS – Multivariate Adaptive Regression Splines
Posted by Armando Brito Mendes | Filed under teaching materials, materials for professionals
A good description of these data analysis algorithms by the authors themselves.
An Overview of MARS
What is “MARS”?
MARS®, an acronym for Multivariate Adaptive Regression Splines, is a multivariate non-parametric regression procedure introduced in 1991 by world-renowned Stanford statistician and physicist, Jerome Friedman (Friedman, 1991). Salford Systems’ MARS, based on the original code, has been substantially enhanced with new features and capabilities in exclusive collaboration with Friedman.
Tags: data analysis, data mining, machine learning
How to create a slicer in Excel
Posted by Armando Brito Mendes | Filed under lessons, teaching materials, materials for professionals, software
A good tutorial on how to use one of Excel's new features.
For dashboards and quick filtering, you can’t beat Excel slicers. They’re easy to implement and even easier to use. Here are the basics, plus a few power tips.
Tags: Excel
SAP video analytics
Posted by Armando Brito Mendes | Filed under materials for professionals, videos
SME Solutions and Partner Innovation
Tags: data analysis, data mining
MySQL Documentation
Posted by Armando Brito Mendes | Filed under programming languages, materials for professionals, software
Loads of documentation on all MySQL products.
Tags: SQL
Deeplearning4j Documentation
Posted by Armando Brito Mendes | Filed under materials for professionals, software
The site of a Java package for deep learning, with loads of information on neural networks and related topics.
- How To
- Quickstart: Running Examples and DL4J in Your Projects
- Comprehensive Setup Guide
- Build Locally From Master
- Contribute to DL4J (Developer Guide)
- Choose a Neural Net
- Use the Maven Build Tool
- Vectorize Data With Canova
- Build a Data Pipeline
- Run Benchmarks
- Configure DL4J in Ivy, Gradle, SBT etc
- Find a DL4J Class or Method
- Save and Load Models
- Interpret Neural Net Output
- Visualize Data with t-SNE
- Swap CPUs for GPUs
- Customize an Image Pipeline
- Perform Regression With Neural Nets
- Troubleshoot Training & Select Network Hyperparameters
- Visualize, Monitor and Debug Network Learning
- Speed Up Spark With Native Binaries
- Build a Recommendation Engine With DL4J
- Use Recurrent Networks in DL4J
- Build Complex Network Architectures with Computation Graph
- Train Networks using Early Stopping
- Download Snapshots With Maven
- Customize a Loss Function
- Introduction to Neural Networks
- Multilayer Neural Nets
- Tutorials
- Datasets
- Scaleout
- Text
- Resources
- DL4J, Torch7, Theano and Caffe
- Glossary of Terms for Deep Learning and Neural Nets
- Deep Learning’s Accuracy
- DataVec: ETL for ML
- ND4J Backends: How They Work
- Model Zoo
- Unsupervised Learning: Use Cases
- Eigenvectors, PCA, Covariance and Entropy
- Thought Vectors, AI and NLP
- Questions to Ask When Applying DL
- AI, Machine Learning and Deep Learning
- DL and Reinforcement Learning
- Javadoc: DL4J Methods and Classes
- Canova Javadoc: Canova Methods and Classes
- ND4J User Guide
- ND4J Javadoc
- Scala, Spark and Deep Learning
- Further Reading on Deep Learning
- Deep Learning in Other Languages
- Use Cases
- Architecture
- Features
- Roadmap
- About
- Open Data
- Latest Release Notes
Tags: data analysis, big data, data mining, software development, machine learning
The Many Faces of ROC Analysis
Posted by Armando Brito Mendes | Filed under materials for professionals
A good tutorial on ROC curves.
Receiver Operating Characteristics (ROC) Analysis originated from signal detection theory, as a model of how well a receiver is able to detect a signal in the presence of noise. Its key feature is the distinction between hit rate (or true positive rate) and false alarm rate (or false positive rate) as two separate performance measures. ROC analysis has also widely been used in medical data analysis to study the effect of varying the threshold on the numerical outcome of a diagnostic test. It has been introduced to machine learning relatively recently, in response to classification tasks with varying class distributions or misclassification costs (hereafter referred to as skew). ROC analysis is set to cause a paradigm shift in machine learning. Separating performance on classes is almost always a good idea from an analytical perspective. For instance, it can help us to
- understand the behaviour and skew-sensitivity of many machine learning metrics, including rule learning heuristics and decision tree splitting criteria, by plotting their isometrics in ROC space;
- develop new metrics specifically designed to improve the Area Under the ROC Curve (AUC) of a model;
- understand fundamental algorithms such as the separate-and-conquer or sequential covering rule learning algorithm, by tracing its trajectory through a sequence of ROC spaces.
The goal of this tutorial is to develop the ROC perspective in a systematic way, demonstrating the many faces of ROC analysis in machine learning.
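To make the hit rate / false alarm rate distinction concrete, here is a minimal Python sketch that traces a ROC curve by lowering the decision threshold over a set of classifier scores and then computes the Area Under the ROC Curve (AUC) with the trapezoid rule. The function names `roc_points` and `auc`, and the toy scores and labels, are illustrative assumptions and are not taken from the tutorial.

```python
def roc_points(scores, labels):
    """Return ROC points as (false positive rate, true positive rate) pairs.

    scores: higher means "more likely positive".
    labels: 1 for positive, 0 for negative (both classes must be present).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a ROC curve given as ordered points (trapezoid rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
for fpr, tpr in curve:
    print(f"false alarm rate = {fpr:.2f}, hit rate = {tpr:.2f}")
print("AUC =", auc(curve))
```

Because each point depends only on rates computed within a class, the curve does not move when the class proportions change, which is why ROC space suits the skewed settings mentioned above.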
Tags: data mining, DW \ BI, machine learning