Top Excel Tips For Data Analysts
Posted by Armando Brito Mendes | Filed under Databases, SAD - DSS
Excellent advice on using the latest tools added to Excel.
Tags: Excel, spreadsheet programming
KNIME course
Posted by Armando Brito Mendes | Filed under GIS maps, materials for professionals, software, videos, visualization
A very good KNIME course: it is introductory, but it covers a large number of features.
KNIME Online Self-Training
Welcome to the KNIME Self-training course. The focus of this document is to get you started with KNIME as quickly as possible and guide you through essential steps of advanced analytics with KNIME. Optional and very useful topics such as reporting, KNIME Server and database handling are also included to give you an idea of what else is possible with KNIME.
- Installing KNIME Analytics Platform and Extensions
- Data Import / Export and Database / Big Data
- ETL
- Visualization
- Advanced Analytics
- Reporting
- KNIME Server
Tags: data analysis, big data, data mining, Knime, text mining
Decision trees: Do Splitting Rules Really Matter?
Posted by Armando Brito Mendes | Filed under Uncategorized
A good text on the criteria used to split nodes into subgroups in decision trees.
Do decision-tree splitting criteria matter? Contrary to popular opinion in data mining circles, our experience indicates that splitting criteria do matter; in fact, the difference between using the right rule and the wrong rule could add up to millions of dollars of lost opportunity.
So, why haven’t the differences been noticed? The answer is simple. When data sets are small and highly accurate trees can be generated easily, the particular splitting rule does not matter. When your golf ball is one inch from the cup, which club or even which end you use is not important because you will be able to sink the ball in one stroke. Unfortunately, previous examinations of splitting rule performance, the ones that found no differences, did not look at data-mining problems with large data sets where obtaining a good answer is genuinely difficult.
When you are trying to detect fraud, identify borrowers who will declare bankruptcy in the next 12 months, target a direct mail campaign, or tackle other real-world business problems that do not admit of 90+ percent accuracy rates (with currently available data), the splitting rule you choose could materially affect the accuracy and value of your decision tree. Further, even when different splitting rules yield similarly accurate classifiers, the differences between them may still matter. With multiple classes, you might care how the errors are distributed across classes. Between two trees with equal overall error rates, you might prefer a tree that performs better on a particular class or classes. If the purpose of a decision tree is to yield insight into a causal process or into the structure of a database, splitting rules of similar accuracy can yield trees that vary greatly in their usefulness for interpreting and understanding the data.
This paper explores the key differences between three important splitting criteria: Gini, Twoing and Entropy, for three- and greater-level classification trees, and suggests how to choose the right one for a particular problem type. Although we can recommend which splitting rule suits which type of problem, good practice is to run several splitting rules, compare the resulting trees, and expect each rule to produce a different one. As you work with different types of data and problems, you will learn which splitting rules typically work best for specific problem types. Even so, never rely on a single rule alone; experimentation is always wise.
Gini, Twoing, and Entropy
The best known rules for binary recursive partitioning are Gini, Twoing, and Entropy. Because each rule represents a different philosophy as to the purpose of the decision tree, each grows a different style of tree.
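To make the three rules concrete, here is a minimal sketch that scores a candidate binary split under each of them, assuming the standard CART definitions: Gini impurity 1 − Σ p_j², entropy −Σ p_j log₂ p_j (both scored as the parent's impurity minus the size-weighted impurity of the two children), and the twoing value (p_L p_R / 4)(Σ_j |p(j|L) − p(j|R)|)². The function names and the toy class labels are hypothetical illustrations, not code from the article.

```python
from collections import Counter
from math import log2


def proportions(labels):
    """Class proportions p_j within a node."""
    n = len(labels)
    return {c: k / n for c, k in Counter(labels).items()}


def gini(labels):
    """Gini impurity: 1 - sum_j p_j^2."""
    return 1.0 - sum(p * p for p in proportions(labels).values())


def entropy(labels):
    """Entropy in bits: -sum_j p_j * log2(p_j)."""
    return -sum(p * log2(p) for p in proportions(labels).values() if p > 0)


def impurity_decrease(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    return (impurity(parent)
            - len(left) / n * impurity(left)
            - len(right) / n * impurity(right))


def twoing(parent, left, right):
    """Twoing value: (pL * pR / 4) * (sum_j |p(j|L) - p(j|R)|)^2."""
    n = len(parent)
    p_left, p_right = len(left) / n, len(right) / n
    pl, pr = proportions(left), proportions(right)
    spread = sum(abs(pl.get(c, 0.0) - pr.get(c, 0.0)) for c in set(parent))
    return p_left * p_right / 4 * spread ** 2


# Hypothetical three-class node and two candidate splits of it.
parent = list("AAABBBCCC")
candidates = {
    "A | BC": (list("AAA"), list("BBBCCC")),
    "AB | BC": (list("AAAB"), list("BBCCC")),
}
for name, (left, right) in candidates.items():
    print(name,
          round(impurity_decrease(parent, left, right, gini), 4),
          round(impurity_decrease(parent, left, right, entropy), 4),
          round(twoing(parent, left, right), 4))
```

In a tree-growing loop, each rule would pick the candidate with the highest score; running all three on the same candidates is a quick way to follow the article's advice to compare rules rather than trust one.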
Tags: data analysis, data mining
MARS – Multivariate Adaptive Regression Splines
Posted by Armando Brito Mendes | Filed under teaching materials, materials for professionals
A good description of these data analysis algorithms by their own authors.
An Overview of MARS
What is “MARS”?
MARS®, an acronym for Multivariate Adaptive Regression Splines, is a multivariate non-parametric regression procedure introduced in 1991 by world-renowned Stanford statistician and physicist, Jerome Friedman (Friedman, 1991). Salford Systems’ MARS, based on the original code, has been substantially enhanced with new features and capabilities in exclusive collaboration with Friedman.
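MARS builds its model as a weighted sum of mirrored "hinge" functions, max(0, x − t) and max(0, t − x), placed at knots t. The sketch below fits such a model by ordinary least squares for one fixed, hand-picked knot on a single variable. It is a simplified illustration under that assumption only: full MARS searches over variables and knots in a forward pass and then prunes terms in a backward pass using generalized cross-validation. The function names are hypothetical and are not Salford Systems' API.

```python
def hinge(x, knot, direction=+1):
    """Hinge basis: max(0, x - t) when direction=+1, max(0, t - x) when direction=-1."""
    return max(0.0, direction * (x - knot))


def solve(a, b):
    """Solve the square linear system a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    coef = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        coef[r] = (m[r][n] - sum(m[r][k] * coef[k] for k in range(r + 1, n))) / m[r][r]
    return coef


def fit_hinge_pair(xs, ys, knot):
    """Least squares for y ~ b0 + b1*max(0, x - t) + b2*max(0, t - x) at a fixed knot t."""
    rows = [[1.0, hinge(x, knot, +1), hinge(x, knot, -1)] for x in xs]
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)


# Hypothetical data that bends at x = 4: slope -0.5 on the left, slope 2 on the right.
xs = [i / 2 for i in range(21)]  # 0.0, 0.5, ..., 10.0
ys = [1 + 2 * hinge(x, 4, +1) + 0.5 * hinge(x, 4, -1) for x in xs]
print(fit_hinge_pair(xs, ys, knot=4))  # approximately [1.0, 2.0, 0.5]
```

The pair of hinges lets the fitted line change slope at the knot while staying continuous, which is what makes a sum of such terms a flexible, non-parametric regression.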
Tags: data analysis, data mining, machine learning
Tinker With a Neural Network
Posted by Armando Brito Mendes | Filed under software, visualization
An excellent web application for understanding how neural networks work.
Um, What Is a Neural Network?
It’s a technique for building a computer program that learns from data. It is based very loosely on how we think the human brain works. First, a collection of software “neurons” is created and connected together, allowing them to send messages to each other. Next, the network is asked to solve a problem, which it attempts to do over and over, each time strengthening the connections that lead to success and diminishing those that lead to failure. For a more detailed introduction to neural networks, Michael Nielsen’s Neural Networks and Deep Learning is a good place to start. For a more technical overview, try Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
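The "try, then adjust connections" loop described above can be seen in its simplest form in a single artificial neuron trained with the classic perceptron rule. This is a much simpler learner than the multi-layer networks in the playground, which are trained by gradient descent; it is shown only as a sketch of the idea, and the function names and the OR data set are hypothetical. Labels are +1 and −1, like the playground's two classes.

```python
def predict(weights, bias, x):
    """Fire +1 if the weighted sum of inputs is positive, otherwise -1."""
    s = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if s > 0 else -1


def train_perceptron(data, epochs=20, lr=1.0):
    """Perceptron rule: on every wrong answer, move each weight toward the correct label."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            if predict(weights, bias, x) != y:
                # Inputs that were active get their weights pushed in the
                # direction of the correct label (up for +1, down for -1).
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


# Hypothetical task: logical OR with labels -1 (false) and +1 (true).
data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(data)
print(w, b, [predict(w, b, x) for x, _ in data])
```

Because OR is linearly separable, the perceptron rule is guaranteed to stop making mistakes after a finite number of updates; problems like the playground's spiral data need hidden layers, which is why the playground lets you add them.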
This Is Cool, Can I Repurpose It?
Please do! We’ve open sourced it on GitHub with the hope that it can make neural networks a little more accessible and easier to learn. You’re free to use it in any way that follows our Apache License. And if you have any suggestions for additions or changes, please let us know.
We’ve also provided some controls below to enable you to tailor the playground to a specific topic or lesson. Just choose which features you’d like to be visible below, then save this link or refresh the page.
What Do All the Colors Mean?
Orange and blue are used throughout the visualization in slightly different ways, but in general orange shows negative values while blue shows positive values.
The data points (represented by small circles) are initially colored orange or blue, which correspond to negative one and positive one respectively.
In the hidden layers, the lines are colored by the weights of the connections between neurons. Blue shows a positive weight, which means the network is using that output of the neuron as given. An orange line shows that the network is assigning a negative weight.
In the output layer, the dots are colored orange or blue depending on their original values. The background color shows what the network is predicting for a particular area. The intensity of the color shows how confident that prediction is.
What Library Are You Using?
We wrote a tiny neural network library that meets the demands of this educational visualization. For real-world applications, consider the TensorFlow library.
Credits
This was created by Daniel Smilkov and Shan Carter. This is a continuation of many people’s previous work — most notably Andrej Karpathy’s convnet.js demo and Chris Olah’s articles about neural networks. Many thanks also to D. Sculley for help with the original idea and to Fernanda Viégas and Martin Wattenberg and the rest of the Big Picture and Google Brain teams for feedback and guidance.
Tags: data mining, machine learning, web apps
The Portuguese during the Euro, seen through Multibanco data
Posted by Armando Brito Mendes | Filed under statistics, visualization
A good example of using data to infer behavior, although the section on coincidences between values was unnecessary.
How we won Euro 2016 through Multibanco (with infographic)
Published: 20/07/2016 – 19:11:26
At the time of the final between Portugal and France, the country came to a standstill… and so did withdrawals! Discover this and other curiosities about how the Portuguese used the Multibanco network as the magnificent 23 won Euro 2016.
Tags: beautiful, big data, data mining, DW \ BI
How to create a slicer in Excel
Posted by Armando Brito Mendes | Filed under lessons, teaching materials, materials for professionals, software
A good tutorial on how to use one of Excel's new features.
For dashboards and quick filtering, you can’t beat Excel slicers. They’re easy to implement and even easier to use. Here are the basics–plus a few power tips.
Tags: Excel
SAP video analytics
Posted by Armando Brito Mendes | Filed under materials for professionals, videos
SME Solutions and Partner Innovation
Tags: data analysis, data mining
MySQL Documentation
Posted by Armando Brito Mendes | Filed under programming languages, materials for professionals, software
Plenty of documentation on all MySQL products.
Tags: SQL
Hackers Remotely Kill a Jeep on the Highway
Posted by Armando Brito Mendes | Filed under videos
An example of the security problems that still exist in the IoT.
Two hackers have developed a tool that can hijack a Jeep over the internet. WIRED senior writer Andy Greenberg takes the SUV for a spin on the highway while the hackers attack it from miles away.
Tags: big data, data mining