Top Excel Tips For Data Analysts

Click the image to follow the link.

Excellent advice on using the latest tools implemented in Excel.

TIPS FOR DATA CLEANING
1) Change format of numbers from text to numeric
2) Unpivot columns in a data set (Multiple consolidation ranges and Power Query)
3) Merge data from several csv files into a single folder (RDBMerge Add-in and Power Query)
4) Fill empty spaces from content above (Ctrl + Enter trick and Power Query)
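The fill-down behavior in tip 4 can be sketched outside Excel as well; a minimal Python analogue of the Ctrl + Enter trick (the `fill_down` helper is illustrative, not part of any library):

```python
# Mimic Excel's fill-down: each empty cell takes the last value above it.
def fill_down(column):
    """Replace None/empty entries with the most recent non-empty value."""
    filled, last = [], None
    for cell in column:
        if cell not in (None, ""):
            last = cell
        filled.append(last)
    return filled

print(fill_down(["North", "", "", "South", ""]))
# → ['North', 'North', 'North', 'South', 'South']
```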
DATA ANALYSIS
5) Create auto expandable ranges with Excel Tables (Source for pivots, dropdown lists and formulas)
6) How to do two way lookup with INDEX and MATCH
7) Creating OR criteria within SUMIF/COUNTIF (Combination of SUMPRODUCT and SUMIF/COUNTIF)
8) Counting unique items within PivotTables (Using the Excel Data Model)
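Tip 6's two-way lookup has a direct analogue in most languages; a minimal Python sketch (the table contents and the `two_way_lookup` helper are made up for illustration):

```python
# INDEX/MATCH with a row label and a column label collapses to two
# dictionary lookups when the table is keyed both ways.
table = {
    "Jan": {"North": 100, "South": 150},
    "Feb": {"North": 120, "South": 130},
}

def two_way_lookup(table, row, col):
    # INDEX(range, MATCH(row), MATCH(col)) in Excel terms.
    return table[row][col]

print(two_way_lookup(table, "Feb", "South"))  # → 130
```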
DATA VISUALIZATION
9) Quickly visualize trends with Sparklines
10) Create dynamic titles in charts (Use of cell references within chart objects)
11) Dealing with empty cells in charts and sparklines [use NA()]
12) Save time with Quick Analysis


KNIME course

Click the image to follow the link.

A very good KNIME course; it is introductory but covers a large number of features.

KNIME Online Self-Training

Welcome to the KNIME Self-training course. The focus of this document is to get you started with KNIME as quickly as possible and guide you through essential steps of advanced analytics with KNIME. Optional and very useful topics such as reporting, KNIME Server and database handling are also included to give you an idea of what else is possible with KNIME.

  1. Installing KNIME Analytics Platform and Extensions
  2. Data Import / Export and Database / Big Data
  3. ETL
  4. Visualization
  5. Advanced Analytics
  6. Reporting
  7. KNIME Server


Decision trees: Do Splitting Rules Really Matter?

Click the image to follow the link.

A good text on the criteria used to split decision trees into subgroups.

Do decision-tree splitting criteria matter? Contrary to popular opinion in data mining circles, our experience indicates that splitting criteria do matter; in fact, the difference between using the right rule and the wrong rule could add up to millions of dollars of lost opportunity.

So, why haven’t the differences been noticed? The answer is simple. When data sets are small and highly accurate trees can be generated easily, the particular splitting rule does not matter. When your golf ball is one inch from the cup, which club or even which end you use is not important because you will be able to sink the ball in one stroke. Unfortunately, previous examinations of splitting rule performance, the ones that found no differences, did not look at data-mining problems with large data sets where obtaining a good answer is genuinely difficult.

When you are trying to detect fraud, identify borrowers who will declare bankruptcy in the next 12 months, target a direct mail campaign, or tackle other real-world business problems that do not admit of 90+ percent accuracy rates (with currently available data), the splitting rule you choose could materially affect the accuracy and value of your decision tree. Further, even when different splitting rules yield similarly accurate classifiers, the differences between them may still matter. With multiple classes, you might care how the errors are distributed across classes. Between two trees with equal overall error rates, you might prefer a tree that performs better on a particular class or classes. If the purpose of a decision tree is to yield insight into a causal process or into the structure of a database, splitting rules of similar accuracy can yield trees that vary greatly in their usefulness for interpreting and understanding the data.

This paper explores the key differences between three important splitting criteria: Gini, Twoing and Entropy, for three- and greater-level classification trees, and suggests how to choose the right one for a particular problem type. Although we can recommend which splitting rule is best suited to which type of problem, it is good practice to try several splitting rules, compare the results, and expect them to differ. As you work with different types of data and problems, you will learn which splitting rules typically work best for specific problem types. Nevertheless, you should never rely on a single rule alone; experimentation is always wise.

Gini, Twoing, and Entropy

The best known rules for binary recursive partitioning are Gini, Twoing, and Entropy. Because each rule represents a different philosophy as to the purpose of the decision tree, each grows a different style of tree.
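Two of these criteria are easy to state concretely as node-impurity measures; a minimal Python sketch of Gini and Entropy for a node's class counts (Twoing is omitted because it scores candidate binary splits rather than a single node):

```python
import math

# Node impurity under two of the splitting criteria named above.
# Both score a pure node 0; a 50/50 two-class node is maximally impure.
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

print(gini([50, 50]))     # → 0.5
print(entropy([50, 50]))  # → 1.0
```

A splitting rule picks the split that most reduces the (weighted) impurity of the child nodes relative to the parent.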


MARS – Multivariate Adaptive Regression Splines

Click the image to follow the link.

A good description of these data-analysis algorithms by the authors themselves.

An Overview of MARS

What is “MARS”?

MARS®, an acronym for Multivariate Adaptive Regression Splines, is a multivariate non-parametric regression procedure introduced in 1991 by world-renowned Stanford statistician and physicist, Jerome Friedman (Friedman, 1991). Salford Systems’ MARS, based on the original code, has been substantially enhanced with new features and capabilities in exclusive collaboration with Friedman.
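The building blocks of a MARS model are piecewise-linear hinge basis functions of the form max(0, x − t) and max(0, t − x); a toy Python sketch, where the knot and coefficients are made up for illustration and not taken from any fitted model:

```python
# MARS models are weighted sums of hinge ("hockey stick") basis functions.
# The knot t = 3 and the coefficients below are illustrative only.
def hinge(x, t):
    return max(0.0, x - t)

def mirrored_hinge(x, t):
    return max(0.0, t - x)

# y = 2 + 1.5 * max(0, x - 3) - 0.5 * max(0, 3 - x)
def toy_mars_prediction(x):
    return 2.0 + 1.5 * hinge(x, 3.0) - 0.5 * mirrored_hinge(x, 3.0)

print(toy_mars_prediction(5.0))  # → 5.0  (2 + 1.5*2 - 0)
print(toy_mars_prediction(1.0))  # → 1.0  (2 + 0 - 0.5*2)
```

Fitting consists of searching for the knots and coefficients that best explain the data, which is what makes the procedure non-parametric.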


Tinker With a Neural Network

Click the image to follow the link.

An excellent web application for understanding how neural networks work.

Um, What Is a Neural Network?

It’s a technique for building a computer program that learns from data. It is based very loosely on how we think the human brain works. First, a collection of software “neurons” are created and connected together, allowing them to send messages to each other. Next, the network is asked to solve a problem, which it attempts to do over and over, each time strengthening the connections that lead to success and diminishing those that lead to failure. For a more detailed introduction to neural networks, Michael Nielsen’s Neural Networks and Deep Learning is a good place to start. For a more technical overview, try Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
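The learn-by-adjusting-connections loop described above can be caricatured with a single weight; a toy sketch assuming squared-error gradient steps (the sample data, learning rate, and `train` helper are made up for illustration):

```python
# One "neuron" with one weight: each pass nudges the weight toward
# success (smaller error) and away from failure, as described above.
def train(samples, lr=0.1, epochs=50):
    w = 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = w * x
            error = pred - target
            w -= lr * error * x   # strengthen or weaken the connection
    return w

# Learn y = 2x from a few examples; the weight should approach 2.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(round(train(samples), 3))  # → 2.0
```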

This Is Cool, Can I Repurpose It?

Please do! We’ve open sourced it on GitHub with the hope that it can make neural networks a little more accessible and easier to learn. You’re free to use it in any way that follows our Apache License. And if you have any suggestions for additions or changes, please let us know.

We’ve also provided some controls below to enable you to tailor the playground to a specific topic or lesson. Just choose which features you’d like to be visible below, then save this link, or refresh the page.

What Do All the Colors Mean?

Orange and blue are used throughout the visualization in slightly different ways, but in general orange shows negative values while blue shows positive values.

The data points (represented by small circles) are initially colored orange or blue, corresponding to negative one and positive one, respectively.

In the hidden layers, the lines are colored by the weights of the connections between neurons. Blue shows a positive weight, which means the network is using that output of the neuron as given. An orange line shows that the network is assigning a negative weight.

In the output layer, the dots are colored orange or blue depending on their original values. The background color shows what the network is predicting for a particular area. The intensity of the color shows how confident that prediction is.

What Library Are You Using?

We wrote a tiny neural network library that meets the demands of this educational visualization. For real-world applications, consider the TensorFlow library.

Credits

This was created by Daniel Smilkov and Shan Carter. This is a continuation of many people’s previous work — most notably Andrej Karpathy’s convnet.js demo and Chris Olah’s articles about neural networks. Many thanks also to D. Sculley for help with the original idea and to Fernanda Viégas and Martin Wattenberg and the rest of the Big Picture and Google Brain teams for feedback and guidance.


The Portuguese during the Euro, through Multibanco ATM data

Click the image to follow the link.

A good example of using data to infer behavior, though the part about numerical coincidences was dispensable.

How we won Euro 2016 through Multibanco (with infographic)

Published: 20/07/2016 at 19:11:26

At the hour of the final between Portugal and France, the country stopped… and so did ATM withdrawals! Learn about this and other curiosities that marked the behavior of the Portuguese on the Multibanco network as the magnificent 23 won Euro 2016.


How to create a slicer in Excel

Click to follow the link.

A good tutorial on how to use one of Excel's new features.

For dashboards and quick filtering, you can’t beat Excel slicers. They’re easy to implement and even easier to use. Here are the basics, plus a few power tips.


SAP video analytics

Click to follow the link.

Lots of videos about SAP analytics, from these channels:
Digital Enterprise Platform
SAP Digital Business Services
SAPIndustry
SAPLineOfBusiness

SME Solutions and Partner Innovation


MySQL Documentation

Click the image to follow the link.

Lots of documentation on all MySQL products.


Hackers Remotely Kill a Jeep on the Highway

Click the image to follow the link.

An example of the security problems still present in the IoT.

Published on 21/07/2015

Two hackers have developed a tool that can hijack a Jeep over the internet. WIRED senior writer Andy Greenberg takes the SUV for a spin on the highway while the hackers attack it from miles away.
