Income rise hints at recovery

Click the image to follow the link.

Another excellent interactive graphic from an online newspaper.

By Ted Mellnik and Lazaro Gamio, Published: Sept. 18, 2014

Although incomes are still lower than five years ago, most large metropolitan areas showed at least a tiny gain last year. The patterns suggest that while many regional economies may have turned a corner on the recession, incomes are making a slow advance toward 2009 levels. These charts show data for median household incomes released on Thursday by the Census Bureau in its American Community Survey.

PlotDevice: Draw with Python

Click the image to follow the link.

A library of Python functions for building data visualizations.

You’ve been able to visualize data with Python for a while, but the Mac application PlotDevice from Christian Swinehart couples code and graphics more tightly. Write code on the right. Watch graphics change on the left.

The application gives you everything you need to start writing programs that draw to a virtual canvas. It features a text editor with syntax highlighting and tab completion plus a zoomable graphics viewer and a variety of export options.

PlotDevice’s simple but comprehensive set of graphics commands will be familiar to users of similar graphics tools like NodeBox or Processing. And if you’re new to programming, you’ll find there’s nothing better than being able to see the results of your code as you learn to think like a computer.

Looks promising, although when I downloaded it and tried to run it, nothing happened. I’m guessing there are still compatibility issues to iron out at version 0.9.4. Hopefully that clears up soon. [via Waxy]

How People in America Spend Their Day

Click the image to follow the link.

An area chart that shows how Americans spend their time over the course of the day.

From Shan Carter, Amanda Cox, Kevin Quealy, and Amy Schoenfeld of The New York Times is this new interactive stacked time series on how different groups in America spend their day. The data itself comes from the American Time Use Survey. The interactive has a similar feel to Martin Wattenberg’s Baby Name Voyager, but it has the NYT pizazz that we’ve all come to know and love.

Explore time use by gender, race, age, education, and employment. View all activities (e.g. work, traveling) or select a specific action to drill down into the graph. From there, you’ll find time aggregates that you can compare against depending on what filter you’ve selected.

Poverty and Race in America

A good interactive graphic.

Strategies to tackle poverty, inequality, and neighborhood distress must be informed by local data. The history, geography, and politics of individual metro regions all matter profoundly, and any serious policy strategy must be tailored to local realities.
To help take the policy conversation from the general to the specific, we offer a new mapping tool. It lets you explore changes from 1980 to 2010 in where poor people of different races and ethnicities lived, for every metropolitan region nationwide.
Understanding how the geography of poverty has changed can provide essential context for answering questions like: Are some poor neighborhoods isolated from the region’s job opportunities? What would it take to connect them? Where should family support services be targeted? Which neighborhoods should be prioritized for improvements in essential amenities and opportunities? How can poor people across the metro landscape be better connected to the services and opportunities they seek?
For metro regions to systematically reduce poverty and expand opportunity, local civic and political leaders, advocates, and practitioners should start by sitting down together to understand the evolving realities of poverty, race, and place in their communities. We hope our maps help catalyze these conversations.

A World of Terror

An excellent interactive data visualization.

Exploring the reach, frequency and impact of terrorism around the world

The data used in this tool comes from the Global Terrorism Database, the most comprehensive collection of terrorism data available.

Geography: The 25 groups included here have been active in 73 countries on five continents. Of these, the country targeted by the most groups has been France: Al-Qa`ida, Basque Fatherland and Freedom, Hizballah, The IRA, and the Kurdistan Workers’ Party. The group with the greatest geographic spread is Hizballah, responsible for terrorism in 17 countries.
Years: On average, these top 25 groups have been active almost 19 years (during this time frame), while all other groups have been active just over 2 years.
Wounded: The 25 groups listed here are responsible for 48% of all known wounded victims, with ISIS being responsible for the most wounded (10,585). However, Al-Qa`ida is more effective, wounding 230 per event on average.
Killed: Of the total verified fatalities, over half (83,896, or 56%) are attributed to the 25 groups listed here. The greatest number of deaths by these groups, 6,857, happened in 2013.

Using Open Source Technology in Higher Education

A blog with many posts about using R.

Using R for Basic Cross Tabulation Analysis: Part Three, Using the xtabs Function

Using R to Work with GSS Survey Data: Cross Tabulation Tables

R Tutorial: Using R to Work With Datasets From the NORC General Social Science Survey

How to Set Up SSH to Remotely Control Your Raspberry Pi

What are you going to do with that degree?

A good visualization of what graduates do with their degrees.

Jobs by college major

This is a quick Sankey visualization of how college majors relate to professions, based on data from the American Community Survey. On the left are the largest college majors; to the right are the most common professions.

To see broad fields like “Sciences” and “Humanities”, see the edited version of this page.

The width of each stream shows how many people with that major are in that field. (The color shows whether that’s more or fewer people than expected based on how big the major is: hover over to see just how many more it is.)
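The post does not say how "expected" is computed. One common reading, sketched here as an assumption rather than the author's method, is the count you would see if major and profession were independent: the major's total times the profession's total, divided by the grand total. The function name `expected_flow` and the numbers below are made up for illustration.

```python
def expected_flow(counts):
    """counts[major][job] holds observed people; returns expected counts
    under independence (major total * job total / grand total)."""
    major_totals = {major: sum(jobs.values()) for major, jobs in counts.items()}
    job_totals = {}
    for jobs in counts.values():
        for job, n in jobs.items():
            job_totals[job] = job_totals.get(job, 0) + n
    grand_total = sum(major_totals.values())
    return {
        major: {
            job: major_totals[major] * job_totals[job] / grand_total
            for job in jobs
        }
        for major, jobs in counts.items()
    }

# Illustrative counts only, not taken from the American Community Survey.
observed = {
    "History": {"Lawyer": 30, "Teacher": 20},
    "Biology": {"Lawyer": 10, "Teacher": 40},
}
expected = expected_flow(observed)
# A stream is "more than expected" when observed exceeds expected.
more_than_expected = observed["History"]["Lawyer"] > expected["History"]["Lawyer"]
```

With these toy numbers, history majors who became lawyers (30 observed) exceed the 20 expected under independence, so that stream would take the "more than expected" color.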

You surely see that the lines are too small to understand in most cases: to actually see what’s going on with a particular field or job, click on a box and the chart will filter down to just the people who either majored in the field, or ended up employed in the job. (Click on one of the connecting lines to see both at once.)

I have not developed this that far because I am not sure how useful it ultimately is: my basic goal was a quick way to see, for example, what jobs history majors ended up in. (Largest is lawyers, but also schoolteachers; what you would expect, but worth knowing.)

You might also like my visualization of changing college degrees over time.

Tutorial: How to detect spurious correlations

Using robust methods to identify spurious correlations.

Tutorial: How to detect spurious correlations, and how to find the real ones

Specifically designed in the context of big data in our research lab, the new and simple strong correlation synthetic metric proposed in this article should be used whenever you want to check if there is a real association between two variables, especially in large-scale automated data science or machine learning projects. Use this new metric now to avoid being accused of reckless data science and even being sued for wrongful analytic practice.

Income inequality seen in satellite images from Google Earth

Using proxies to identify poor neighborhoods.

Researchers Pengyu Zhu and Yaoqi Zhang noted in their 2008 paper that “the demand for urban forests is elastic with respect to price and highly responsive to changes in income.” Poor neighborhoods tend to have fewer trees, and their tree cover grows more slowly than that of richer neighborhoods.

Tim De Chant of Per Square Mile wondered if this difference could be seen through satellite images in Google Earth. It turns out that you can see the distinct difference in a lot of places. Above, for example, shows two areas in Rio de Janeiro: Rocinha on the left and Zona Sul on the right. Notice the tree-lined streets versus the not so green.

De Chant notes:

It’s easy to see trees as a luxury when a city can barely keep its roads and sewers in working order, but that glosses over the many benefits urban trees provide. They shade houses in the summer, reducing cooling bills. They scrub the air of pollution, especially of the particulate variety, which in many poor neighborhoods is responsible for increased asthma rates and other health problems. They also reduce stress, which has its own health benefits. Large, established trees can even fight crime.

Okay, I don’t know about that last part about fighting crime. Without seeing the data, I think that sounds like a correlation more than anything else, but still. Trees. Good.

A Programmer’s Guide to Data Mining

An online book covering some data mining methods.

A guide to practical data mining, collective intelligence, and building recommendation systems by Ron Zacharski.

About This Book

Before you is a tool for learning basic data mining techniques. Most data mining textbooks focus on providing a theoretical foundation for data mining, and as a result, may seem notoriously difficult to understand. Don’t get me wrong, the information in those books is extremely important. However, if you are a programmer interested in learning a bit about data mining you might be interested in a beginner’s hands-on guide as a first step. That’s what this book provides.
This guide follows a learn-by-doing approach. Instead of passively reading the book, I encourage you to work through the exercises and experiment with the Python code I provide. I hope you will be actively involved in trying out and programming data mining techniques. The textbook is laid out as a series of small steps that build on each other until, by the time you complete the book, you have laid the foundation for understanding data mining techniques. This book is available for download for free under a Creative Commons license (see link in footer). You are free to share the book, and remix it. Someday I may offer a paper copy, but the online version will always be free.

Table of Contents

This book’s contents are freely available as PDF files. When you click on a chapter title below, you will be taken to a webpage for that chapter. The page contains links for a PDF of that chapter and for any sample Python code and data that chapter requires. Please let me know if you see an error in the book, if some part of the book is confusing, or if you have some other comment. I will use these to revise the chapters.

Chapter 1: Introduction

Finding out what data mining is and what problems it solves. What you will be able to do when you finish this book.

Chapter 2: Get Started with Recommendation Systems

Introduction to social filtering. Basic distance measures including Manhattan distance, Euclidean distance, and Minkowski distance. Pearson Correlation Coefficient. Implementing a basic algorithm in Python.
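As a rough sketch of the measures this chapter lists (not the book's own code), the three distances and the Pearson coefficient can be written in plain Python. The function names and the two sample rating lists are assumptions for illustration; both users are assumed to have rated the same items in the same order.

```python
import math

def minkowski(a, b, r):
    """Minkowski distance of order r between two equal-length rating lists."""
    return sum(abs(x - y) ** r for x, y in zip(a, b)) ** (1 / r)

def manhattan(a, b):
    """Minkowski distance with r = 1: sum of absolute differences."""
    return minkowski(a, b, 1)

def euclidean(a, b):
    """Minkowski distance with r = 2: straight-line distance."""
    return minkowski(a, b, 2)

def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either list is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

# Hypothetical ratings of the same four items by two users.
alice = [5, 3, 4, 1]
bob = [4, 2, 3, 0]
print(manhattan(alice, bob), euclidean(alice, bob), pearson(alice, bob))
```

Here Bob rates every item exactly one point lower than Alice, so the distances are nonzero while the Pearson coefficient is at its maximum. That is the usual argument for correlation when users apply different rating scales.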

Chapter 3: Implicit ratings and item-based filtering

A discussion of the types of user ratings we can use. Users can explicitly give ratings (thumbs up, thumbs down, 5 stars, or whatever) or they can rate products implicitly–if they buy an mp3 from Amazon, we can view that purchase as a ‘like’ rating.

Chapter 4: Classification

In previous chapters we used people’s ratings of products to make recommendations. Now we turn to using attributes of the products themselves to make recommendations. This approach is used by Pandora among others.

Chapter 5: Further Explorations in Classification

A discussion on how to evaluate classifiers including 10-fold cross-validation, leave-one-out, and the Kappa statistic. The k Nearest Neighbor algorithm is also introduced.
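To make two of these evaluation ideas concrete, here is a minimal sketch, assuming a square confusion matrix whose rows are true classes and whose columns are predicted classes. The function names are illustrative, not taken from the book, and the fold split does no shuffling.

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k folds of near-equal size (no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]

def cohen_kappa(confusion):
    """Kappa statistic: agreement beyond what chance alone would produce.

    confusion[i][j] = number of items of true class i predicted as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of the marginal proportions, summed over classes.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    if expected == 1:
        return 0.0
    return (observed - expected) / (1 - expected)

folds = k_fold_indices(25, k=10)          # each fold is held out once for testing
kappa = cohen_kappa([[40, 10], [10, 40]])  # 80% accuracy, 50% expected by chance
```

In the toy matrix the classifier is right 80% of the time, but half of that agreement would be expected by chance, so kappa comes out at about 0.6 rather than 0.8.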

Chapter 6: Naïve Bayes

An exploration of Naïve Bayes classification methods. Dealing with numerical data using probability density functions.
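For numerical attributes, a common choice is to assume each attribute is normally distributed within a class and to use the Gaussian density in place of a counted probability. The sketch below assumes that model. The helper names are hypothetical, and the book may present it differently.

```python
import math

def mean_and_std(values):
    """Sample mean and sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance)

def gaussian_pdf(x, mean, std):
    """Density at x of a normal distribution with the given mean and std."""
    exponent = -((x - mean) ** 2) / (2 * std ** 2)
    return math.exp(exponent) / (math.sqrt(2 * math.pi) * std)

# Hypothetical attribute values observed for one class in the training data.
training_values = [2, 4, 4, 4, 5, 5, 7, 9]
mean, std = mean_and_std(training_values)
likelihood = gaussian_pdf(6, mean, std)  # stands in for P(attribute = 6 | class)
```

The classifier would multiply this density with the class prior and the likelihoods of the other attributes, then pick the class with the largest product.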

Chapter 7: Naïve Bayes and unstructured text

This chapter explores how we can use Naïve Bayes to classify unstructured text. Can we classify Twitter posts about a movie as to whether the post was a positive review or a negative one?

Chapter 8: Clustering

Clustering: both hierarchical and k-means clustering.
