income rise hints at recovery
Posted by Armando Brito Mendes | Filed under estatística, materiais para profissionais, visualização
Another excellent interactive graphic from an online newspaper.
By Ted Mellnik and Lazaro Gamio, Published: Sept. 18, 2014
Although incomes are still lower than five years ago, most large metropolitan areas showed at least a tiny gain last year. The patterns suggest that while many regional economies may have turned a corner on the recession, incomes are making a slow advance toward 2009 levels. These charts show data for median household incomes released on Thursday by the Census Bureau in its American Community Survey.
Tags: belo
PlotDevice: Draw with Python
Posted by Armando Brito Mendes | Filed under estatística, materiais para profissionais, software, visualização
A library of Python functions for building data visualizations.
You’ve been able to visualize data with Python for a while, but the Mac application PlotDevice from Christian Swinehart couples code and graphics more tightly. Write code on the right. Watch graphics change on the left.
The application gives you everything you need to start writing programs that draw to a virtual canvas. It features a text editor with syntax highlighting and tab completion plus a zoomable graphics viewer and a variety of export options.
PlotDevice’s simple but comprehensive set of graphics commands will be familiar to users of similar graphics tools like NodeBox or Processing. And if you’re new to programming, you’ll find there’s nothing better than being able to see the results of your code as you learn to think like a computer.
Looks promising, although when I downloaded it and tried to run it, nothing happened. I’m guessing there are still compatibility issues to iron out at version 0.9.4. Hopefully that clears up soon. [via Waxy]
Tags: big data, data mining, desnvolvimento de software, Estat Descritiva
How People in America Spend Their Day
Posted by Armando Brito Mendes | Filed under estatística, visualização
An area chart as a way to visualize how Americans spend their time over the course of the day.
From Shan Carter, Amanda Cox, Kevin Quealy, and Amy Schoenfeld of The New York Times is this new interactive stacked time series on how different groups in America spend their day. The data itself comes from the American Time Use Survey. The interactive has a similar feel to Martin Wattenberg’s Baby Name Voyager, but it has the NYT pizazz that we’ve all come to know and love.
Explore time use by gender, race, age, education, and employment. View all activities (e.g. work, traveling) or select a specific action to drill down into the graph. From there, you’ll find time aggregates that you can compare against depending on what filter you’ve selected.
Tags: belo, big data, data mining, Estat Descritiva
Poverty and Race in America
Posted by Armando Brito Mendes | Filed under estatística, mapas SIG's, visualização
Strategies to tackle poverty, inequality, and neighborhood distress must be informed by local data. The history, geography, and politics of individual metro regions all matter profoundly, and any serious policy strategy must be tailored to local realities.
To help take the policy conversation from the general to the specific, we offer a new mapping tool. It lets you explore changes from 1980 to 2010 in where poor people of different races and ethnicities lived, for every metropolitan region nationwide.
Understanding how the geography of poverty has changed can provide essential context for answering questions like: Are some poor neighborhoods isolated from the region’s job opportunities? What would it take to connect them? Where should family support services be targeted? Which neighborhoods should be prioritized for improvements in essential amenities and opportunities? How can poor people across the metro landscape be better connected to the services and opportunities they seek?
For metro regions to systematically reduce poverty and expand opportunity, local civic and political leaders, advocates, and practitioners should start by sitting down together to understand the evolving realities of poverty, race, and place in their communities. We hope our maps help catalyze these conversations.
Tags: belo, data mining, image mining, mapas
A World of Terror
Posted by Armando Brito Mendes | Filed under estatística, materiais para profissionais, visualização
Exploring the reach, frequency and impact of terrorism around the world
The data used in this tool comes from the Global Terrorism Database, the most comprehensive collection of terrorism data available.
Tags: belo, Estat Descritiva
Using Open Source Technology in Higher Education
Posted by Armando Brito Mendes | Filed under estatística, software
Using R for Basic Cross Tabulation Analysis: Part Three, Using the xtabs Function
crosstabs, r, r programming, r statistics, table analysis
Using R to Work with GSS Survey Data: Cross Tabulation Tables
chi squared, cross tables, crosstabs, r, r programming, r statistics, table analysis
R Tutorial: Using R to Work With Datasets From the NORC General Social Science Survey
create csv file, file conversion, r, r programming, r statistics, r tutorial, read spss files, research
How to Set Up SSH to Remotely Control Your Raspberry Pi
command line, raspberry pi, raspberry pi computing, Raspberry Pi Software Configuration, remote access with ssh, set up ssh, ssh, terminal program
Tags: análise de dados, data mining, desnvolvimento de software, Estat Descritiva, R-software, software estatístico
What are you going to do with that degree?
Posted by Armando Brito Mendes | Filed under estatística, visualização
Jobs by college major
This is a quick Sankey visualization of how college majors relate to professions, based on data from the American Community Survey. On the left are the largest college majors; to the right are the most common professions.
To see broad fields like “Sciences” and “Humanities”, see the edited version of this page.
The width of each stream shows how many people with that major are in that field. (The color shows whether that’s more or fewer people than expected based on how big the major is: hover over to see just how many more it is.)
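As a rough illustration of the width and color encoding described above, the sketch below computes the observed count (the width) for each major-to-job stream and compares it with an expected count (the basis for the color). The counts are made-up placeholder numbers. The expected value assumes independence between major and job (major total × job total / grand total), which is one plausible reading of "expected based on how big the major is" and not necessarily the author's exact formula.

```python
from collections import defaultdict

# Hypothetical counts of people by (major, job); placeholder numbers only.
flows = {
    ("History", "Lawyer"): 30,
    ("History", "Teacher"): 50,
    ("Biology", "Lawyer"): 10,
    ("Biology", "Teacher"): 10,
}


def stream_stats(flows):
    """Return {(major, job): (observed, expected, ratio)} for each stream.

    Assumption: 'expected' is the count under independence of major and job.
    """
    major_totals = defaultdict(int)
    job_totals = defaultdict(int)
    for (major, job), count in flows.items():
        major_totals[major] += count
        job_totals[job] += count
    grand_total = sum(flows.values())

    stats = {}
    for (major, job), observed in flows.items():
        expected = major_totals[major] * job_totals[job] / grand_total
        stats[(major, job)] = (observed, expected, observed / expected)
    return stats


stats = stream_stats(flows)
for (major, job), (observed, expected, ratio) in sorted(stats.items()):
    # A ratio above 1 means more people than expected; otherwise not more.
    label = "more than expected" if ratio > 1 else "not more than expected"
    print(f"{major} -> {job}: width={observed}, expected={expected:.1f}, {label}")
```

A ratio above 1 would map to the "more than expected" color and a ratio below 1 to the "fewer than expected" color.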
You surely see that the lines are too small to understand in most cases: to actually see what’s going on with a particular field or job, click on a box and the chart will filter down to just the people who either majored in the field, or ended up employed in the job. (Click on one of the connecting lines to see both at once.)
I have not developed this that far because I am not sure how useful it ultimately is: my basic goal was a quick way to see, for example, what jobs history majors ended up in. (Largest is lawyers, but also schoolteachers; what you would expect, but worth knowing.)
You might also like my visualization of changing college degrees over time.
Tags: belo, Estat Descritiva
Tutorial: How to detect spurious correlations
Posted by Armando Brito Mendes | Filed under estatística, materiais ensino
Tutorial: How to detect spurious correlations, and how to find the real ones
Designed specifically for big data work in our research lab, the new, simple synthetic metric for strong correlation proposed in this article should be used whenever you want to check whether there is a real association between two variables, especially in large-scale automated data science or machine learning projects. Use this new metric now to avoid being accused of reckless data science, or even being sued for wrongful analytic practice.
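The article's metric is not described in this excerpt, so the sketch below does not implement it. It only illustrates, under stated assumptions, why large-scale automated analysis tends to turn up spurious correlations: among many independent random series, some pairs correlate strongly by pure chance. The number of series, their length, and the seed are arbitrary choices made for this example.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_series=200, length=10, seed=0):
    """Largest |r| among all pairs of independent Gaussian noise series."""
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(length)] for _ in range(n_series)]
    return max(abs(pearson(a, b)) for a, b in itertools.combinations(series, 2))


# The series are unrelated by construction, yet the best pair looks "strong".
print(f"strongest chance correlation: {strongest_chance_correlation():.2f}")
```

With short series and thousands of candidate pairs, a high maximum correlation is expected even though no real association exists, which is why a plain correlation threshold is a weak filter at scale.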
Tags: data mining, Estat Descritiva, inferência
Income inequality seen in satellite images from Google Earth
Posted by Armando Brito Mendes | Filed under estatística, visualização
Researchers Pengyu Zhu and Yaoqi Zhang noted in their 2008 paper that “the demand for urban forests is elastic with respect to price and highly responsive to changes in income.” Poor neighborhoods tend to have fewer trees, and their tree cover grows more slowly than that of richer neighborhoods.
Tim De Chant of Per Square Mile wondered if this difference could be seen through satellite images in Google Earth. It turns out that you can see the distinct difference in a lot of places. One example shows two areas in Rio de Janeiro: Rocinha on the left and Zona Sul on the right. Notice the tree-lined streets versus the not so green.
De Chant notes:
It’s easy to see trees as a luxury when a city can barely keep its roads and sewers in working order, but that glosses over the many benefits urban trees provide. They shade houses in the summer, reducing cooling bills. They scrub the air of pollution, especially of the particulate variety, which in many poor neighborhoods is responsible for increased asthma rates and other health problems. They also reduce stress, which has its own health benefits. Large, established trees can even fight crime.
Okay, I don’t know about that last part about fighting crime. Without seeing the data, I think that sounds like a correlation more than anything else, but still. Trees. Good.
Tags: análise de dados, data mining, image mining, mapas
A Programmer’s Guide to Data Mining
Posted by Armando Brito Mendes | Filed under estatística, materiais para profissionais
A guide to practical data mining, collective intelligence, and building recommendation systems by Ron Zacharski.
About This Book
Before you is a tool for learning basic data mining techniques. Most data mining textbooks focus on providing a theoretical foundation for data mining, and as a result, may seem notoriously difficult to understand. Don’t get me wrong, the information in those books is extremely important. However, if you are a programmer interested in learning a bit about data mining you might be interested in a beginner’s hands-on guide as a first step. That’s what this book provides.
This guide follows a learn-by-doing approach. Instead of passively reading the book, I encourage you to work through the exercises and experiment with the Python code I provide. I hope you will be actively involved in trying out and programming data mining techniques. The textbook is laid out as a series of small steps that build on each other until, by the time you complete the book, you have laid the foundation for understanding data mining techniques. This book is available for download for free under a Creative Commons license (see link in footer). You are free to share the book, and remix it. Someday I may offer a paper copy, but the online version will always be free.
Table of Contents
This book’s contents are freely available as PDF files. When you click on a chapter title below, you will be taken to a webpage for that chapter. The page contains links for a PDF of that chapter and for any sample Python code and data that chapter requires. Please let me know if you see an error in the book, if some part of the book is confusing, or if you have some other comment. I will use these to revise the chapters.
Chapter 1: Introduction
Finding out what data mining is and what problems it solves. What you will be able to do when you finish this book.
Chapter 2: Get Started with Recommendation Systems
Introduction to social filtering. Basic distance measures including Manhattan distance, Euclidean distance, and Minkowski distance. Pearson Correlation Coefficient. Implementing a basic algorithm in Python.
Chapter 3: Implicit ratings and item-based filtering
A discussion of the types of user ratings we can use. Users can explicitly give ratings (thumbs up, thumbs down, 5 stars, or whatever) or they can rate products implicitly–if they buy an mp3 from Amazon, we can view that purchase as a ‘like’ rating.
Chapter 4: Classification
In previous chapters we used people’s ratings of products to make recommendations. Now we turn to using attributes of the products themselves to make recommendations. This approach is used by Pandora among others.
Chapter 5: Further Explorations in Classification
A discussion on how to evaluate classifiers including 10-fold cross-validation, leave-one-out, and the Kappa statistic. The k Nearest Neighbor algorithm is also introduced.
Chapter 6: Naïve Bayes
An exploration of Naïve Bayes classification methods. Dealing with numerical data using probability density functions.
Chapter 7: Naïve Bayes and unstructured text
This chapter explores how we can use Naïve Bayes to classify unstructured text. Can we classify Twitter posts about a movie as to whether the post was a positive review or a negative one?
Chapter 8: Clustering
Clustering – both hierarchical and k-means clustering.
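As a minimal sketch of the measures listed for Chapter 2 (not the book's own code), the functions below compute the Minkowski distance, of which Manhattan (r = 1) and Euclidean (r = 2) are special cases, and the Pearson correlation coefficient between two users' ratings. The rating dictionaries and the names `alice` and `bob` are hypothetical. Comparing only the items both users rated is an assumption of this sketch.

```python
import math


def minkowski(ratings_a, ratings_b, r):
    """Minkowski distance over the items both users rated.

    r = 1 gives Manhattan distance; r = 2 gives Euclidean distance.
    """
    shared = ratings_a.keys() & ratings_b.keys()
    total = sum(abs(ratings_a[k] - ratings_b[k]) ** r for k in shared)
    return total ** (1 / r)


def pearson(ratings_a, ratings_b):
    """Pearson correlation over shared items; 0.0 if it is undefined."""
    shared = sorted(ratings_a.keys() & ratings_b.keys())
    n = len(shared)
    if n == 0:
        return 0.0
    xs = [ratings_a[k] for k in shared]
    ys = [ratings_b[k] for k in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom else 0.0


# Hypothetical ratings on a 1-5 scale; item "D" is rated by only one user.
alice = {"A": 5, "B": 3, "C": 1}
bob = {"A": 2, "B": 3, "C": 5, "D": 4}

print("Manhattan:", minkowski(alice, bob, 1))
print("Euclidean:", minkowski(alice, bob, 2))
print("Pearson:", pearson(alice, bob))
```

A small distance means similar ratings, while a Pearson value near 1 means the two users rank items alike even if one of them rates consistently higher than the other.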
Tags: data mining, previsão