9 “must read” articles on Big Data

Texts on big data

My selection

(*) I disagree with this Harvard Business Review author. Senior data scientists work on high-level data from various sources, use automated processes for EDA (exploratory data analysis), and spend little to no time on tedious, routine, mundane tasks (less than 5% of my time, in my case). I also use robust techniques that work well on relatively dirty data, and … I create and design the data myself in many cases.

Chart errors in the news

Three examples of chart errors on news channels

Fox News bar chart gets it wrong

Because Fox News. See also this, this, and this. [Thanks, Meron]

Exponential water tank

An excellent way to understand exponential growth

Hibai Unzueta, based on a paper by Albert Bartlett, demonstrates exponential growth with a simple animation. It depicts a man standing in a tank with finite capacity and water rising slowly, but at an exponential rate.

Our brains are wired to predict future behaviour based on past behaviour (see here). But what happens when something grows exponentially? For a long time, the numbers are so small in relation to the scale that we hardly see the changes. But even at moderate growth rates, exponential functions reach a point where the numbers grow too fast. Once we confirm that our predictions about the future have failed, very little time to react may be left.

All looks safe at first, because the water rises so slowly, but it seems to rise all of a sudden. Oh, the suspense. What will happen to cartoon pixel man?
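
The arithmetic behind the animation can be sketched in a few lines of Python. This is only an illustration: the doubling growth rate and the 20 steps are assumed values, not numbers taken from the animation or from Bartlett's paper, and the function names are hypothetical.

```python
def fill_fractions(growth_rate, steps_to_full):
    """Fraction of the tank filled at each step, reaching 1.0 at the last step.

    growth_rate is the per-step growth (1.0 means the volume doubles each step).
    Both parameters are illustrative assumptions.
    """
    return [
        (1 + growth_rate) ** (step - steps_to_full)
        for step in range(steps_to_full + 1)
    ]


def steps_left_after(fractions, threshold):
    """Steps remaining once the fill level first reaches `threshold`."""
    for step, fraction in enumerate(fractions):
        if fraction >= threshold:
            return len(fractions) - 1 - step
    return 0


fractions = fill_fractions(growth_rate=1.0, steps_to_full=20)
for step in (0, 10, 14, 19, 20):
    print(f"step {step:2d}: {fractions[step]:.4%} full")

print("steps left once the tank is 1% full:", steps_left_after(fractions, 0.01))
```

Under these assumptions the tank is still under 2% full at step 14 of 20 and only half full one step before it overflows, which is why the water seems to rise all of a sudden.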

What’s your kind of beer?

A good example of a site full of dashboard-style visualizations

Choose your preferred beer strength to begin exploring similar beers.

Explore Similar Beers by:

  • Overall
  • Aroma
  • Taste
  • Appearance

About the Data

Popularity and top beer styles are based on the number of users who rated the beer.

Read Histograms and Use Them in R

A good tutorial for building histograms in R

How to Read Histograms and Use Them in R

By Nathan Yau
The chart type often goes overlooked because people don’t understand it. Maybe this will help.

The histogram is one of my favorite chart types, and for analysis purposes, it is probably the one I use the most. Devised by Karl Pearson (the father of mathematical statistics) in the late 1800s, it’s simple geometrically, robust, and allows you to see the distribution of a dataset.

If you don’t understand what’s driving the chart though, it can be confusing, which is probably why you don’t see it often in general publications.
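
What drives the chart is simply a count of values per bin. The sketch below shows that counting step in plain Python rather than R, so it is not the tutorial's code; the function name, the sample heights, and the bin width are all made up for illustration.

```python
def histogram_counts(values, bin_width, start=None):
    """Count how many values fall into each bin [lower, lower + bin_width).

    Returns a dict mapping each bin's lower edge to its count.
    Bins that contain no values are omitted in this sketch.
    """
    if start is None:
        start = min(values)
    counts = {}
    for value in values:
        index = int((value - start) // bin_width)
        lower_edge = start + index * bin_width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return dict(sorted(counts.items()))


# Made-up sample data, used only to show the binning.
heights = [150, 152, 158, 161, 163, 164, 167, 170, 171, 178]

for lower_edge, count in histogram_counts(heights, bin_width=10, start=150).items():
    print(f"{lower_edge}-{lower_edge + 10}: {'#' * count}")
```

Each bar's height is the count for its bin, so changing `bin_width` changes the shape of the chart even though the data stay the same.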

Useful Videos on Information Visualization

Good videos on data visualization

Noah Iliinsky – Data Visualizations Done Wrong – A Beautiful Collection of Stories and Tips for Success.

The Four Pillars of Data Visualization

Designing Data Visualizations with Noah Iliinsky

Best Practices for Data Visualization

Designing Data Visualizations

Seeing the Story in the Data and Learning to Effectively Communicate – Inspired by Stephen Few Principles, Visualization Guru

David McCandless: “The beauty of data visualization” – Data Detective Telling Stories From Visualization of Information

This also has a nice quiz about visualization principles.

As I collect more, I will consolidate this list.

selfiecity

A study of this type of photo, with very good visualizations

Investigating the style of self-portraits (selfies) in five cities across the world.


Selfiecity investigates selfies using a mix of theoretic, artistic and quantitative methods:

  • We present our findings about the demographics of people taking selfies, their poses and expressions.
  • Rich media visualizations (imageplots) assemble thousands of photos to reveal interesting patterns.
  • The interactive selfiexploratory allows you to navigate the whole set of 3200 photos.
  • Finally, theoretical essays discuss selfies in the history of photography, the functions of images in social media, and methods and dataset.

Selfiecity, from Lev Manovich, Moritz Stefaner, and a small group of analysts and researchers, is a detailed visual exploration of 3,200 selfies from five major cities around the world. The project is both a broad look at demographics and trends and a chance to look closer at the individual observations.

There are several components to the project, but Imageplots (which you might recognize from a couple years ago) and the exploratory section, aptly named Selfiexploratory, will be of most interest.

The two parts let you filter through cities (Bangkok, Berlin, Moscow, New York, and Sao Paulo), age, gender, pose, mood, and a number of other factors, and this information is presented in a grid layout that self-updates as you browse.

So you can get a rough sense of how facets relate. There seems to be a higher proportion of female selfies and average age seems to skew towards younger as you’d expect. The average age of females in this selfie sample seems to be younger than that of males.

However, before you jump to too many conclusions about how countries vary or differences between the sexes, etc., consider the classification process, which was a combination of manual labor via Mechanical Turk and face recognition software. Age, for example, can be tough to estimate from pictures alone since you have outside factors like makeup, angles, and poses. Do these things account for the two- to three-year average difference between the sexes? Maybe. So consider the data. But that should go without saying.

That said, Selfiecity is a fun one I spent a good amount of time browsing. It’s a weird, tiny peek into 3,200 people’s lives, with a dose of quant and art. And don’t miss the theoretical component in essay format, a reflection of social media, communities, and the self.

Data Intelligence and Analytics Resources

Excellent texts on data science and big data

3. Big Data

4. Visualization

5. Best and Worst of Data Science

6. New Analytics Start-up Ideas

7. Rants about Healthcare, Education, etc.

8. Career Stuff, Training, Salary Surveys

9. Miscellaneous

10. DSC Webinar Series – with video access

Interactive maps with R

Libraries for building maps with some interactivity in R

You can make static maps in R relatively well, if you know what packages to use and what to look for, but there isn’t much direct interaction with your graphics. rMaps is a package that helps you create maps that you can mouse over and zoom in to.

Don’t get too excited though. A scan of the docs shows that it’s basically a wrapper around the JavaScript libraries Leaflet, DataMaps and Crosslet, so you could learn those directly instead, and you’d be better for it in the long run if you plan to make more maps. But if you’re just working on a one-off or must stay in R because your life depends on it, rMaps might be an option.

The Dangers of Bling Data Visualizations

An excellent description of errors in visualization

Given the volume of information that’s pouring into the enterprise from so many disparate sources, knowledge workers need to be able to visualize information in order to analyze it and extrapolate insights effectively.

When business users can visualize information, they’re able to process it more effectively and make faster and better decisions, according to Aberdeen research. Business users are constantly seeking the best ways to understand the data behind the data. If a monthly sales figure is low, what are the reasons the sales team is underperforming? The most effective way to help business users understand the data behind the data is by making it visual for them.

Data visualization has recently made its way into the mainstream by way of infographics, business intelligence dashboards and, in some cases, statistical graphics. However, data visualization today comes in many forms, and more often than not there is too much “bling” incorporated into these data representations, leaving an audience with nothing more than a pretty picture. In this article, we contrast some good and bad examples of visualizations by examining the salient features of the graphical displays. We will also demonstrate how poorly designed visualizations can lead to erroneous decisions.
