Spreadsheet Addiction

A good and very thorough account of the shortcomings of MS Excel for data analysis.
Some people will think that the “addiction” in the title is over the top, or at least used metaphorically. It is used literally, and is not an exaggeration.

Addiction is the persistent use of a substance where that use is detrimental to the user. It is not the substance that is the problem — more limited use may be beneficial. It is the extent and circumstances of the use that determine if the behavior is addictive or not.

Spreadsheets are a wonderful invention. They are an excellent tool for what they are good at. The problem is that they are often stretched far beyond their home territory. Dangerous abuse of spreadsheets is only too common.

I know there are many spreadsheets in financial companies that take all night to compute. These are complicated and commonly fail. When such spreadsheets are replaced by code better suited to the task, it is not unusual for the computation time to be cut to a few minutes and for the process to become much easier to understand.

A 2012 example of spreadsheet addiction.

The technology acceptance model holds that there are two main factors that determine the uptake of a technology: the perceived usefulness and the perceived ease-of-use. Perception need not correspond to reality.

The perception of the ease-of-use of spreadsheets is to some extent an illusion. It is dead easy to get an answer from a spreadsheet; however, it is not necessarily easy to get the right answer. Hence the distorted view.

The difficulty of using alternatives to spreadsheets is overestimated by many people. Safety features can give the appearance of difficulty when in fact these are an aid.

The hard way looks easy, the easy way looks hard.

The remainder of this page is divided into the sections:

Spreadsheet Computation
The Treatment Center (Alternatives)
If You Must Persist
Specific Problems with Excel
Additional Links

Using Open Source Technology in Higher Education

A blog with many posts about using R

Using R for Basic Cross Tabulation Analysis: Part Three, Using the xtabs Function

Using R to Work with GSS Survey Data: Cross Tabulation Tables

R Tutorial: Using R to Work With Datasets From the NORC General Social Science Survey

How to Set Up SSH to Remotely Control Your Raspberry Pi

Income inequality seen in satellite images from Google Earth

Using proxies to identify poor neighborhoods

Researchers Pengyu Zhu and Yaoqi Zhang noted in their 2008 paper that “the demand for urban forests is elastic with respect to price and highly responsive to changes in income.” Poor neighborhoods tend to have fewer trees, and the rate of forestry growth is slower than in richer neighborhoods.

Tim De Chant of Per Square Mile wondered if this difference could be seen through satellite images in Google Earth. It turns out that you can see the distinct difference in a lot of places. One example image shows two areas in Rio de Janeiro: Rocinha on the left and Zona Sul on the right. Notice the tree-lined streets versus the not so green.

De Chant notes:

It’s easy to see trees as a luxury when a city can barely keep its roads and sewers in working order, but that glosses over the many benefits urban trees provide. They shade houses in the summer, reducing cooling bills. They scrub the air of pollution, especially of the particulate variety, which in many poor neighborhoods is responsible for increased asthma rates and other health problems. They also reduce stress, which has its own health benefits. Large, established trees can even fight crime.

Okay, I don’t know about that last part about fighting crime. Without seeing the data, I think that sounds like a correlation more than anything else, but still. Trees. Good.

GE.com data visualization site

A site with many visualization examples, maintained by GE

GE Works. Building, Moving, Powering and Curing the world. In the process, our technologies are generating data on a petabyte scale. This data contains valuable information that will drive insights, innovations, and discoveries, but it can be difficult to access and digest. Using data visualization, we’re pairing science and design to simplify the complexity and drive a deeper understanding of the context in which we operate.

Check out our latest video.

We encourage you to explore the projects below.

For further information about GE’s data visualization program, please contact us at datavizinfo@ge.com

To share your own visualizations, please visit www.visualizing.org

Better data centers through machine learning

An example of applying machine learning algorithms

It’s no secret that we’re obsessed with saving energy. For over a decade we’ve been designing and building data centers that use half the energy of a typical data center, and we’re always looking for ways to reduce our energy use even further. In our pursuit of extreme efficiency, we’ve hit upon a new tool: machine learning. Today we’re releasing a white paper (PDF) on how we’re using neural networks to optimize data center operations and drive our energy use to new lows.

Chart errors in the news

Three examples of chart errors on news channels

Fox News bar chart gets it wrong

Because Fox News. See also this, this, and this. [Thanks, Meron]

Read Histograms and Use Them in R

A good tutorial on building histograms in R

How to Read Histograms and Use Them in R

By Nathan Yau
The chart type is often overlooked because people don’t understand it. Maybe this will help.

The histogram is one of my favorite chart types, and for analysis purposes, it is probably the one I use the most. Devised by Karl Pearson (the father of mathematical statistics) in the late 1800s, it’s simple geometrically, robust, and allows you to see the distribution of a dataset.

If you don’t understand what’s driving the chart though, it can be confusing, which is probably why you don’t see it often in general publications.
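What drives a histogram is just counting: each value is assigned to a fixed-width bin, and the bar height is the number of values in that bin. The sketch below is not from the tutorial (which uses R); it is a minimal Python illustration, and the function names and the `heights` data are hypothetical.

```python
from collections import Counter


def histogram_counts(values, bin_width, start=0):
    """Count how many values fall into each bin [lo, lo + bin_width)."""
    counts = Counter()
    for v in values:
        # Index of the bin that contains v, counted from `start`.
        index = int((v - start) // bin_width)
        counts[start + index * bin_width] += 1
    # Return bins ordered by their lower edge.
    return dict(sorted(counts.items()))


def render(counts, bin_width):
    """Draw one text bar per bin; bar length equals the bin's count."""
    lines = []
    for lo, n in counts.items():
        lines.append(f"{lo:6.1f} to {lo + bin_width:6.1f} | {'#' * n}")
    return "\n".join(lines)


# Hypothetical sample data, invented for illustration only.
heights = [150, 152, 158, 160, 161, 163, 165, 165, 167, 170, 171, 178]
counts = histogram_counts(heights, bin_width=10, start=150)
print(render(counts, 10))
```

Changing `bin_width` changes the shape of the chart without changing the data, which is one reason a histogram can confuse readers who don't know how the bins were chosen.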

The smartdatacollective.com portal

A news portal on data science, big data, and analytics

SmartData Collective, an online community moderated by Social Media Today, provides enterprise leaders access to the latest trends in Business Intelligence and Data Management. Our innovative model serves as a platform for recognized, global experts to share their insights through peer contributions, custom content publishing and alignment with industry leaders. SmartData Collective is a key resource for executives who need to make informed data management decisions.

17 short tutorials all data scientists should read

Excellent foundational texts for data scientists

Here’s the list:

Related link: The Data Science Toolkit

Apache Spark

An alternative to Hadoop for in-memory data computation

What is Apache Spark?

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write.

To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.

To make programming faster, Spark provides clean, concise APIs in Scala, Java and Python. You can also use Spark interactively from the Scala and Python shells to rapidly query big datasets.

What can it do?

Spark was initially developed for two applications where placing data in memory helps: iterative algorithms, which are common in machine learning, and interactive data mining. In both cases, Spark can run up to 100x faster than Hadoop MapReduce. However, you can use Spark for general data processing too. Check out our example jobs.
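To see why keeping data in memory helps iterative algorithms, consider the toy sketch below. It is plain Python, not Spark's API: the `Dataset` class, its `load` and `cached` methods, and the `iterative_mean` function are all hypothetical stand-ins. It only counts how often the data has to be rescanned from simulated disk.

```python
class Dataset:
    """Toy stand-in for a dataset stored on disk (hypothetical, not a Spark API)."""

    def __init__(self, records):
        self._records = list(records)
        self._cache = None
        self.disk_reads = 0  # how many full scans from "disk" have happened

    def load(self):
        """Simulate a full scan from disk."""
        self.disk_reads += 1
        return list(self._records)

    def cached(self):
        """Scan from disk once, then serve every later call from memory."""
        if self._cache is None:
            self._cache = self.load()
        return self._cache


def iterative_mean(read, iterations, step=0.5):
    """Estimate the mean by gradient descent; every iteration reads all the data."""
    estimate = 0.0
    for _ in range(iterations):
        data = read()
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= step * gradient
    return estimate


from_disk = Dataset([1, 2, 3, 4, 5])
in_memory = Dataset([1, 2, 3, 4, 5])

result_disk = iterative_mean(from_disk.load, iterations=20)
result_memory = iterative_mean(in_memory.cached, iterations=20)

print(from_disk.disk_reads, in_memory.disk_reads)
```

Both runs compute the same estimate, but the uncached run rescans the data on every iteration while the cached run scans it once. The larger the dataset and the more iterations an algorithm needs, the more that difference matters.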

Spark is also the engine behind Shark, a fully Apache Hive-compatible data warehousing system that can run 100x faster than Hive.

While Spark is a new engine, it can access any data source supported by Hadoop, making it easy to run over existing data.
