Spreadsheet Addiction

Posted by Armando Brito Mendes | Filed under statistics, materials for professionals

A good and very thorough account of the shortcomings of MS Excel for data analysis.
Some people will think that the “addiction” in the title is over the top, or at least used metaphorically. It is used literally, and is not an exaggeration.

Addiction is the persistent use of a substance where that use is detrimental to the user. It is not the substance that is the problem — more limited use may be beneficial. It is the extent and circumstances of the use that determine if the behavior is addictive or not.

Spreadsheets are a wonderful invention. They are an excellent tool for what they are good at. The problem is that they are often stretched far beyond their home territory. Dangerous abuse of spreadsheets is only too common.

I know there are many spreadsheets in financial companies that take all night to compute. These are complicated and commonly fail. When such spreadsheets are replaced by code more suited to the task, it is not unusual for the computation time to be cut to a few minutes and the process much easier to understand.

A 2012 example of spreadsheet addiction.

The technology acceptance model holds that there are two main factors that determine the uptake of a technology: the perceived usefulness and the perceived ease-of-use. Perception need not correspond to reality.

The perception of the ease-of-use of spreadsheets is to some extent an illusion. It is dead easy to get an answer from a spreadsheet; however, it is not necessarily easy to get the right answer. Thus the distorted view.

The difficulty of using alternatives to spreadsheets is overestimated by many people. Safety features can give the appearance of difficulty when in fact these are an aid.

The hard way looks easy, the easy way looks hard.

The remainder of this page is divided into the sections:

Spreadsheet Computation
The Treatment Center (Alternatives)
If You Must Persist
Specific Problems with Excel
Additional Links

Tags: data analysis, Excel, spreadsheet programming

Read more | Comments off | December 1st, 2014

Using Open Source Technology in Higher Education

Posted by Armando Brito Mendes | Filed under statistics, software

A blog with many posts about using R

Using R for Basic Cross Tabulation Analysis: Part Three, Using the xtabs Function

crosstabs, r, r programming, r statistics, table analysis

Using R to Work with GSS Survey Data: Cross Tabulation Tables

chi squared, cross tables, crosstabs, r, r programming, r statistics, table analysis

R Tutorial: Using R to Work With Datasets From the NORC General Social Science Survey

create csv file, file conversion, r, r programming, r statistics, r tutorial, read spss files, research

How to Set Up SSH to Remotely Control Your Raspberry Pi

command line, raspberry pi, raspberry pi computing, Raspberry Pi Software Configuration, remote access with ssh, set up ssh, ssh, terminal program

Tags: data analysis, data mining, software development, descriptive statistics, R-software, statistical software

Read more | Comments off | August 21st, 2014

Income inequality seen in satellite images from Google Earth

Posted by Armando Brito Mendes | Filed under statistics, visualization

Using proxies to identify poor neighborhoods

Researchers Pengyu Zhu and Yaoqi Zhang noted in their 2008 paper that “the demand for urban forests is elastic with respect to price and highly responsive to changes in income.” Poor neighborhoods tend to have fewer trees and the rate of forestry growth is slower than that of richer neighborhoods.

Tim De Chant of Per Square Mile wondered if this difference could be seen through satellite images in Google Earth. It turns out that you can see the distinct difference in a lot of places. The image above, for example, shows two areas in Rio de Janeiro: Rocinha on the left and Zona Sul on the right. Notice the tree-lined streets versus the not so green.

De Chant notes:

It’s easy to see trees as a luxury when a city can barely keep its roads and sewers in working order, but that glosses over the many benefits urban trees provide. They shade houses in the summer, reducing cooling bills. They scrub the air of pollution, especially of the particulate variety, which in many poor neighborhoods is responsible for increased asthma rates and other health problems. They also reduce stress, which has its own health benefits. Large, established trees can even fight crime.

Okay, I don’t know about that last part about fighting crime. Without seeing the data, I think that sounds like a correlation more than anything else, but still. Trees. Good.

Tags: data analysis, data mining, image mining, maps

Read more | Comments off | August 12th, 2014

GE.com's data visualization site

Posted by Armando Brito Mendes | Filed under statistics, visualization

A site with many visualization examples, maintained by GE

GE Works. Building, Moving, Powering and Curing the world. In the process, our technologies are generating data on a petabyte scale. This data contains valuable information that will drive insights, innovations, and discoveries, but it can be difficult to access and digest. Using data visualization, we’re pairing science and design to simplify the complexity and drive a deeper understanding of the context in which we operate.

Check out our latest video.

We encourage you to explore the projects below.

For further information about GE’s data visualization program, please contact us at datavizinfo@ge.com

To share your own visualizations, please visit www.visualizing.org

Tags: data analysis, beautiful, data mining, descriptive statistics, maps

Read more | Comments off | August 11th, 2014

Better data centers through machine learning

Posted by Armando Brito Mendes | Filed under materials for professionals

An example of applying machine learning algorithms

It’s no secret that we’re obsessed with saving energy. For over a decade we’ve been designing and building data centers that use half the energy of a typical data center, and we’re always looking for ways to reduce our energy use even further. In our pursuit of extreme efficiency, we’ve hit upon a new tool: machine learning. Today we’re releasing a white paper (PDF) on how we’re using neural networks to optimize data center operations and drive our energy use to new lows.

Tags: data analysis, data mining, forecasting

Read more | Comments off | May 29th, 2014

Chart errors in the news

Posted by Armando Brito Mendes | Filed under statistics, visualization

Three examples of chart errors on news channels

Fox News bar chart gets it wrong

Because Fox News. See also this, this, and this. [Thanks, Meron]

Tags: data analysis, beautiful, data mining, descriptive statistics

Read more | Comments off | April 4th, 2014

Read Histograms and Use Them in R

Posted by Armando Brito Mendes | Filed under statistics, materials for professionals, visualization

A good tutorial on building histograms in R

How to Read Histograms and Use Them in R

By Nathan Yau

The chart type often goes overlooked because people don’t understand it. Maybe this will help.

The histogram is one of my favorite chart types, and for analysis purposes, I probably use it the most. Devised by Karl Pearson (the father of mathematical statistics) in the late 1800s, it’s simple geometrically, robust, and allows you to see the distribution of a dataset.

If you don’t understand what’s driving the chart though, it can be confusing, which is probably why you don’t see it often in general publications.
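To make concrete what drives the chart, here is a minimal sketch of the binning step behind any histogram. The original tutorial works in R; this version uses plain Python instead, and the function name `histogram_counts`, the bin width, and the sample values are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Count how many values fall in each equal-width bin.

    Bin k covers the half-open interval [k * bin_width, (k + 1) * bin_width).
    Returns a dict mapping each bin's left edge to its count.
    """
    counts = Counter(int(v // bin_width) for v in values)
    return {k * bin_width: counts[k] for k in sorted(counts)}


# Hypothetical sample data, made up for this example.
sample = [1, 2, 2, 3, 5, 7, 8, 8, 9, 12]
bins = histogram_counts(sample, bin_width=5)

# Text histogram: one row per bin, one '#' per observation.
for start, count in bins.items():
    print(f"[{start:>2}, {start + 5:>2}) {'#' * count}")
```

The height of each bar is just the number of observations in that interval, so changing `bin_width` changes the picture without changing the data.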

Tags: data analysis, data mining, descriptive statistics, R-software, statistical software

Read more | Comments off | March 13th, 2014

The smartdatacollective.com portal

Posted by Armando Brito Mendes | Filed under materials for professionals

A news portal about data science, big data, and analytics

SmartData Collective, an online community moderated by Social Media Today, provides enterprise leaders access to the latest trends in Business Intelligence and Data Management. Our innovative model serves as a platform for recognized, global experts to share their insights through peer contributions, custom content publishing and alignment with industry leaders. SmartData Collective is a key resource for executives who need to make informed data management decisions.

Tags: data analysis, big data, bioinformatics, knowledge capture, data mining, group decision-making

Read more | Comments off | March 6th, 2014

17 short tutorials all data scientists should read

Posted by Armando Brito Mendes | Filed under statistics, materials for professionals

Excellent, essential reading for data scientists

Here’s the list:

Related link: The Data Science Toolkit

Tags: data analysis, big data, knowledge capture, data mining, Excel, R-software

Read more | Comments off | February 26th, 2014

Apache Spark

Posted by Armando Brito Mendes | Filed under materials for professionals, software

An alternative to Hadoop for in-memory data computation

What is Apache Spark?

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write.

To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.

To make programming faster, Spark provides clean, concise APIs in Scala, Java and Python. You can also use Spark interactively from the Scala and Python shells to rapidly query big datasets.

What can it do?

Spark was initially developed for two applications where placing data in memory helps: iterative algorithms, which are common in machine learning, and interactive data mining. In both cases, Spark can run up to 100x faster than Hadoop MapReduce. However, you can use Spark for general data processing too. Check out our example jobs.

Spark is also the engine behind Shark, a fully Apache Hive-compatible data warehousing system that can run 100x faster than Hive.

While Spark is a new engine, it can access any data source supported by Hadoop, making it easy to run over existing data.
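The point about iterative algorithms can be illustrated with a rough sketch in plain Python. This is not Spark code and uses none of its API; `CountingSource` and `iterative_mean` are hypothetical names, and the "disk" is simulated by a counter, assuming each full scan of the source is the expensive step.

```python
class CountingSource:
    """Hypothetical stand-in for a slow, disk-based data source."""

    def __init__(self, records):
        self._records = list(records)
        self.reads = 0  # number of full scans performed so far

    def scan(self):
        self.reads += 1
        return list(self._records)


def iterative_mean(get_data, steps=20, rate=0.5):
    """Estimate the mean by gradient descent, one pass over the data per step."""
    estimate = 0.0
    for _ in range(steps):
        data = get_data()
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= rate * gradient
    return estimate


records = [2.0, 4.0, 6.0, 8.0]

# Disk-style: every iteration scans the source again.
disk_source = CountingSource(records)
disk_result = iterative_mean(disk_source.scan)

# Memory-style: scan once, then reuse the in-memory copy.
cached_source = CountingSource(records)
cached = cached_source.scan()
cached_result = iterative_mean(lambda: cached)

print(disk_source.reads, cached_source.reads)
```

Both runs reach the same estimate, but the first touches the source once per iteration while the second touches it once in total. That gap is the kind of saving the passage attributes to keeping data in memory.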

Tags: data analysis, big data, data mining, DW \ BI

Read more | Comments off | January 13th, 2014
