Top Excel Tips For Data Analysts

Welcome to the KNIME Self-training course. The focus of this document is to get you started with KNIME as quickly as possible and guide you through essential steps of advanced analytics with KNIME. Optional and very useful topics such as reporting, KNIME Server and database handling are also included to give you an idea of what else is possible with KNIME.

Tags: data analysis, big data, data mining, KNIME, text mining

December 16th, 2016

Decision trees: Do Splitting Rules Really Matter?

Posted by Armando Brito Mendes | Filed under Uncategorized

A good text on the criterion for splitting into subgroups in decision trees.

Do decision-tree splitting criteria matter? Contrary to popular opinion in data mining circles, our experience indicates that splitting criteria do matter; in fact, the difference between using the right rule and the wrong rule could add up to millions of dollars of lost opportunity.

So, why haven’t the differences been noticed? The answer is simple. When data sets are small and highly accurate trees can be generated easily, the particular splitting rule does not matter. When your golf ball is one inch from the cup, which club or even which end you use is not important because you will be able to sink the ball in one stroke. Unfortunately, previous examinations of splitting rule performance, the ones that found no differences, did not look at data-mining problems with large data sets where obtaining a good answer is genuinely difficult.

When you are trying to detect fraud, identify borrowers who will declare bankruptcy in the next 12 months, target a direct mail campaign, or tackle other real-world business problems that do not admit of 90+ percent accuracy rates (with currently available data), the splitting rule you choose could materially affect the accuracy and value of your decision tree. Further, even when different splitting rules yield similarly accurate classifiers, the differences between them may still matter. With multiple classes, you might care how the errors are distributed across classes. Between two trees with equal overall error rates, you might prefer a tree that performs better on a particular class or classes. If the purpose of a decision tree is to yield insight into a causal process or into the structure of a database, splitting rules of similar accuracy can yield trees that vary greatly in their usefulness for interpreting and understanding the data.

This paper explores the key differences between three important splitting criteria (Gini, Twoing, and Entropy) for three- and greater-level classification trees, and suggests how to choose the right one for a particular problem type. Although we can recommend which splitting rule is best suited to which type of problem, it is good practice to always try several splitting rules, compare the results, and expect different results from each. As you work with different types of data and problems, you will begin to learn which splitting rules typically work best for specific problem types. Nevertheless, you should never rely on a single rule alone; experimentation is always wise.

Gini, Twoing, and Entropy

The best known rules for binary recursive partitioning are Gini, Twoing, and Entropy. Because each rule represents a different philosophy as to the purpose of the decision tree, each grows a different style of tree.
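To make the three rules concrete, here is a minimal sketch in plain Python. It uses the standard textbook definitions: Gini impurity and entropy are scored as the drop in impurity from parent to children, while Twoing scores the split directly. The function names and the three-class sample node are hypothetical and chosen only for illustration; this is not the excerpted paper's code.

```python
from collections import Counter
from math import log2

def class_proportions(labels):
    """Return {class: share of labels belonging to that class}."""
    total = len(labels)
    return {c: n / total for c, n in Counter(labels).items()}

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the classes present."""
    return -sum(p * log2(p) for p in class_proportions(labels).values())

def impurity_decrease(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted

def twoing(left, right):
    """Twoing value: (pL * pR / 4) * (sum over classes of |p(j|L) - p(j|R)|) ** 2."""
    n = len(left) + len(right)
    p_left, p_right = len(left) / n, len(right) / n
    props_l, props_r = class_proportions(left), class_proportions(right)
    classes = set(props_l) | set(props_r)
    diff = sum(abs(props_l.get(c, 0.0) - props_r.get(c, 0.0)) for c in classes)
    return (p_left * p_right / 4.0) * diff ** 2

# Hypothetical three-class node and one candidate binary split of it.
parent = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
left = ["a"] * 4 + ["b"] * 1
right = ["b"] * 3 + ["c"] * 4

scores = {
    "gini": impurity_decrease(parent, left, right, gini),
    "entropy": impurity_decrease(parent, left, right, entropy),
    "twoing": twoing(left, right),
}
```

Scoring several candidate splits with each function and comparing which split each rule ranks first is a quick way to see where the rules disagree on a given data set.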

Tags: data analysis, data mining

November 18th, 2016

MARS – Multivariate Adaptive Regression Splines

Posted by Armando Brito Mendes | Filed under teaching materials, materials for professionals

A good description of these data analysis algorithms by the authors themselves.

An Overview of MARS

What is “MARS”?

MARS®, an acronym for Multivariate Adaptive Regression Splines, is a multivariate non-parametric regression procedure introduced in 1991 by world-renowned Stanford statistician and physicist, Jerome Friedman (Friedman, 1991). Salford Systems’ MARS, based on the original code, has been substantially enhanced with new features and capabilities in exclusive collaboration with Friedman.

Tags: data analysis, data mining, machine learning

September 23rd, 2016

Tinker With a Neural Network

Posted by Armando Brito Mendes | Filed under software, visualization

An excellent web application for understanding how neural networks work.

Um, What Is a Neural Network?

It’s a technique for building a computer program that learns from data. It is based very loosely on how we think the human brain works. First, a collection of software “neurons” is created and connected together, allowing them to send messages to each other. Next, the network is asked to solve a problem, which it attempts to do over and over, each time strengthening the connections that lead to success and diminishing those that lead to failure. For a more detailed introduction to neural networks, Michael Nielsen’s Neural Networks and Deep Learning is a good place to start. For a more technical overview, try Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
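The idea of repeatedly adjusting connections can be sketched with a single software neuron trained by the classic perceptron rule. This is an illustrative simplification and an assumption on our part: the playground itself trains multi-layer networks, and the function names, the AND data set, and the learning rate below are hypothetical choices made only for this example.

```python
def step(total):
    """Fire (1) when the weighted input is positive, otherwise stay silent (0)."""
    return 1 if total > 0 else 0

def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus bias, passed through the step function."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(total)

def train(samples, epochs=20, learning_rate=1.0):
    """Perceptron rule: nudge each weight in the direction that reduces the error."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # error is +1 (strengthen), -1 (weaken), or 0 (leave unchanged).
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical toy problem: learn the logical AND of two binary inputs.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train(AND_SAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_SAMPLES]
```

A single neuron like this can only separate classes with a straight line, which is why problems such as the spiral data set in the playground need hidden layers.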

This Is Cool, Can I Repurpose It?

Please do! We’ve open sourced it on GitHub with the hope that it can make neural networks a little more accessible and easier to learn. You’re free to use it in any way that follows our Apache License. And if you have any suggestions for additions or changes, please let us know.

We’ve also provided some controls below to enable you to tailor the playground to a specific topic or lesson. Just choose which features you’d like to be visible below, then save this link or refresh the page.

Show test data, Discretize output, Play button, Step button, Reset button, Learning rate, Activation, Regularization, Regularization rate, Problem type, Which dataset, Ratio train data, Noise level, Batch size, # of hidden layers

What Do All the Colors Mean?

Orange and blue are used throughout the visualization in slightly different ways, but in general orange shows negative values while blue shows positive values.

The data points (represented by small circles) are initially colored orange or blue, which correspond to negative one and positive one, respectively.

In the hidden layers, the lines are colored by the weights of the connections between neurons. Blue shows a positive weight, which means the network is using that output of the neuron as given. An orange line shows that the network is assigning a negative weight.

In the output layer, the dots are colored orange or blue depending on their original values. The background color shows what the network is predicting for a particular area. The intensity of the color shows how confident that prediction is.

What Library Are You Using?

We wrote a tiny neural network library that meets the demands of this educational visualization. For real-world applications, consider the TensorFlow library.

Credits

This was created by Daniel Smilkov and Shan Carter. This is a continuation of many people’s previous work — most notably Andrej Karpathy’s convnet.js demo and Chris Olah’s articles about neural networks. Many thanks also to D. Sculley for help with the original idea and to Fernanda Viégas and Martin Wattenberg and the rest of the Big Picture and Google Brain teams for feedback and guidance.

Tags: data mining, machine learning, web apps

July 26th, 2016

The Portuguese during the Euro, seen through Multibanco data

Posted by Armando Brito Mendes | Filed under statistics, visualization

A good example of using data to infer behavior, although the part about coincidences in the values was unnecessary.

How we won Euro 2016 through Multibanco (with infographic)

Published on: 20/07/2016 – 19:11:26

At the time of the final between Portugal and France, the country came to a standstill… and so did cash withdrawals! Discover this and other curiosities that marked how the Portuguese used the Multibanco network as the magnificent 23 won Euro 2016.

Tags: beautiful, big data, data mining, DW \ BI

July 21st, 2016

How to create a slicer in Excel

Posted by Armando Brito Mendes | Filed under lessons, teaching materials, materials for professionals, software

A good tutorial on how to use one of Excel's new features.

For dashboards and quick filtering, you can’t beat Excel slicers. They’re easy to implement and even easier to use. Here are the basics–plus a few power tips.

Tags: Excel

July 20th, 2016

SAP video analytics

Posted by Armando Brito Mendes | Filed under materials for professionals, videos

Lots of SAP videos about analytics.

Digital Enterprise Platform

Analytics (28)
Cloud Platform (4)
Internet of Things (7)
SAP HANA Platform (12)
User Experience (2)

SAP Digital Business Services

SAP Industry

SAP Line of Business

SME Solutions and Partner Innovation

Tags: data analysis, data mining

July 20th, 2016

MySQL Documentation

Posted by Armando Brito Mendes | Filed under programming languages, materials for professionals, software

Lots of documentation on all MySQL products.

MySQL Server

MySQL 5.7 Reference Manual (GA)

MySQL 5.6 Reference Manual (GA)

MySQL 5.6 Reference Manual (GA) (Japanese)

MySQL 5.5 Reference Manual (GA)

MySQL Enterprise

MySQL Enterprise Monitor 3.2

MySQL Enterprise Monitor 3.1

MySQL Enterprise Monitor 3.0

MySQL Enterprise Monitor 3.0 (Japanese)

Oracle Enterprise Manager for MySQL Database

MySQL Enterprise Backup 4.0

MySQL Enterprise Backup 3.12

MySQL Enterprise Backup 3.11

MySQL Enterprise Backup 3.11 (Japanese)

MySQL Enterprise Security

MySQL Enterprise Encryption

MySQL Enterprise Audit

MySQL Enterprise Firewall

MySQL Thread Pool

MySQL Cluster

MySQL Cluster NDB 7.3/7.4 Reference Guide (GA)

MySQL Cluster NDB 7.2 Reference Guide (GA)

MySQL Cluster NDB 6.3/7.0/7.1 Reference Guide (GA)

MySQL Cluster API Developer Guide

memcache and MySQL Cluster

MySQL Cluster Manager 1.4

MySQL Cluster Manager 1.3

MySQL Workbench

Connectors & APIs

Connectors and APIs

Connector/J 5.1 (GA)

Connector/J 6.0 (Milestone)

MySQL for Visual Studio

MySQL Utilities / Fabric

MySQL Router

X DevAPI

X DevAPI User Guide

MySQL Connector/J X DevAPI Reference

MySQL Connector/Net X DevAPI Reference

MySQL Connector/Node.js X DevAPI Reference

MySQL Connector/Python X DevAPI Reference

MySQL Shell X DevAPI Reference

Release Notes

MySQL 5.7 Release Notes

MySQL 5.6 Release Notes

MySQL 5.5 Release Notes

MySQL Cluster 7.4 Release Notes

MySQL Cluster 7.3 Release Notes

MySQL Cluster 7.2 Release Notes

MySQL Cluster 7.1 Release Notes

MySQL Cluster Manager 1.4 Release Notes

MySQL Cluster Manager 1.3 Release Notes

MySQL Enterprise Monitor 3.2 Release Notes

MySQL Enterprise Monitor 3.1 Release Notes

MySQL Enterprise Monitor 3.0 Release Notes

Oracle Enterprise Manager for MySQL Database Release Notes

MySQL Enterprise Backup 4.0 Release Notes

MySQL Enterprise Backup 3.12 Release Notes

MySQL Enterprise Backup 3.11 Release Notes

MySQL Shell Release Notes

Connector/J 5.1 Release Notes

Connector/J 6.0 Release Notes

Connector/ODBC Release Notes

Connector/Net Release Notes

Connector/Node.js Release Notes

Connector/Python Release Notes

Connector/C Release Notes

Connector/C++ Release Notes

MySQL Installer Release Notes

MySQL Notifier Release Notes

MySQL for Excel Release Notes

MySQL for Visual Studio Release Notes

MySQL Workbench Release Notes

MySQL Router Release Notes

MySQL Fabric Release Notes

MySQL Utilities Release Notes

Expert Guides

MySQL Internals

MySQL Test Framework 2.0

Tags: SQL

July 20th, 2016

Hackers Remotely Kill a Jeep on the Highway

Posted by Armando Brito Mendes | Filed under videos

An example of the security problems that still exist in the IoT.

Published on 21/07/2015

Two hackers have developed a tool that can hijack a Jeep over the internet. WIRED senior writer Andy Greenberg takes the SUV for a spin on the highway while the hackers attack it from miles away.

Tags: big data, data mining

July 15th, 2016
