Warm and cold weather anomalies
Posted by Armando Brito Mendes | Filed under visualização
This year’s polar vortex churned up some global warming skeptics, but as we know, it’s more useful to look at trends over significant spans of time than isolated events. And, when you do look at a trend, it’s useful to have a proper baseline to compare against.
To this end, Enigma.io compared warm weather anomalies against cold weather anomalies, from 1964 to 2013. That is, they counted the number of days per year that were warmer than expected and the number that were colder than expected.
An animated map leads the post, but the meat is in the time series. There’s a clear trend towards more warm anomalies.
Since 1964, the proportion of warm and strong warm anomalies has risen from about 42% of the total to almost 67% of the total – an average increase of 0.5% per year. This trend, fitted with a generalized linear model, accounts for 40% of the year-to-year variation in warm versus cold anomalies, and is highly significant with a p-value approaching 0.0. Though we remain cautious about making predictions based on this model, it suggests that this yearly proportion of warm anomalies will regularly fall above 70% in the 2030’s.
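The "0.5% per year" figure in the quote can be checked with simple arithmetic. This is a minimal sketch that assumes only the rounded endpoints quoted above (42% in 1964, 67% in 2013). The function name is illustrative, and an endpoint average is not the generalized linear model that Enigma.io fitted.

```python
def average_annual_change(start_value, end_value, start_year, end_year):
    """Average change per year between two endpoints, in the units of the values."""
    return (end_value - start_value) / (end_year - start_year)


# Rounded figures from the quote: share of warm anomalies, in percent.
change = average_annual_change(42.0, 67.0, 1964, 2013)
print(f"{change:.2f} percentage points per year")
```

Strictly speaking, the result is about half a percentage point per year (25 points over 49 years), which matches the quoted average.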
Explore in full or download the data and analyze it yourself. Nice work. [Thanks, Dan]
High-detail maps with Disser
Posted by Armando Brito Mendes | Filed under mapas SIG's, software, visualização
Open data consultancy Conveyal released Disser, a command-line tool to disaggregate geographic data to show more detail. For example, we’ve seen data represented with uniformly distributed dots to represent populations, which is fine for a zoomed-out view. However, when you get in close, it can be useful to see distributions more accurately represented.
If the goal of disaggregation is to make a reasonable guess at the data in its pre-aggregated form, we’ve done an okay job. There’s an obvious flaw with this map, though. People aren’t evenly distributed over a block — they’re concentrated into residential buildings.
So Disser combines datasets of different granularity, so that you can see spreads and concentrations that are closer to real life.
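To make the idea concrete, here is a minimal sketch of one way to push a block-level count down to individual buildings, weighting by something like floor area. It is not Disser's actual interface or algorithm: the function name, the building IDs, and the floor-area figures are all hypothetical, and the weights are assumed to be non-negative.

```python
def disaggregate(total, weights):
    """Split an integer total across keys in proportion to their weights.

    Uses the largest-remainder method so the parts are whole numbers
    that add up exactly to the total.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    exact = {key: total * w / weight_sum for key, w in weights.items()}
    result = {key: int(share) for key, share in exact.items()}
    leftover = total - sum(result.values())
    # Hand the remaining units to the keys with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - result[k], reverse=True)
    for key in by_remainder[:leftover]:
        result[key] += 1
    return result


# Hypothetical census block: 100 residents, three buildings with made-up floor areas.
floor_area = {"building_a": 600.0, "building_b": 300.0, "building_c": 100.0}
print(disaggregate(100, floor_area))
```

Spreading dots within each building's footprint, rather than over the whole block, would then give a map that looks closer to where people actually live.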
Tags: belo, image mining, mapas
9 “must read” articles on Big Data
Posted by Armando Brito Mendes | Filed under estatística, materiais para profissionais, visualização
My selection
- Big Data – From Descriptive to Prescriptive
- Can big data be racist?
- NodeXL Graph Gallery: Graph Details
- Best Metrics For Digital Marketing: Rock Your Own And Rent Strategies
- Big Data: from mining to meaning
- Beautiful versus useful visualizations (in French, but interesting)
- Learning and Teaching Machine Learning: A Personal Journey
- Big data techniques and technologies
- The Sexiest Job of the 21st Century is Tedious, and that Needs to C… (*)
- From the trenches: 360-degree data science
(*) I disagree with this Harvard Business Review author. Senior data scientists work on high level data from various sources, use automated processes for EDA (exploratory analysis) and spend little to no time in tedious, routine, mundane tasks (less than 5% of my time, in my case). I also use robust techniques that work well on relatively dirty data, and … I create and design the data myself in many cases.
Tags: belo, big data, data mining, Estat Descritiva, grafos
Chart errors in the news
Posted by Armando Brito Mendes | Filed under estatística, visualização
Fox News bar chart gets it wrong
Because Fox News. See also this, this, and this. [Thanks, Meron]
Tags: análise de dados, belo, data mining, Estat Descritiva
Exponential water tank
Posted by Armando Brito Mendes | Filed under estatística, materiais ensino, visualização
Hibai Unzueta, based on a paper by Albert Bartlett, demonstrates exponential growth with a simple animation. It depicts a man standing in a tank with finite capacity and water rising slowly, but at an exponential rate.
Our brains are wired to predict future behaviour based on past behaviour (see here). But what happens when something grows exponentially? For a long time, the numbers are so small in relation to the scale that we hardly see the changes. But even at moderate growth rates exponential functions reach a point where the numbers grow too fast. Once we confirm that our predictions about the future have failed, very little time to react may be left.
All looks safe at first, because the water rises so slowly, but it seems to rise all of a sudden. Oh, the suspense. What will happen to cartoon pixel man?
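The "all of a sudden" effect is easy to reproduce with a few lines of arithmetic. The sketch below is not taken from the animation: the doubling rate, the 20-step horizon, and the function names are assumptions chosen to keep the numbers simple.

```python
def fill_history(initial_fraction, growth_rate, steps):
    """Fraction of the tank filled at each step, growing by growth_rate per step."""
    return [initial_fraction * (1 + growth_rate) ** step for step in range(steps + 1)]


def first_step_at_or_above(history, threshold):
    """Index of the first step whose fill level reaches the threshold, or None."""
    for step, level in enumerate(history):
        if level >= threshold:
            return step
    return None


# Assumed numbers: the water doubles every step and fills the tank at step 20.
history = fill_history(initial_fraction=2 ** -20, growth_rate=1.0, steps=20)
print(first_step_at_or_above(history, 0.01))  # first step with the tank at least 1% full
print(history[19], history[20])               # one step before full, and at full
```

With these numbers the tank is still under 1% full through step 13, only half full at step 19, and full one step later, which is why the last moments feel so abrupt.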
Tags: definição, inferência
What’s your kind of beer?
Posted by Armando Brito Mendes | Filed under estatística, visualização
What’s your kind of beer?
Choose your preferred beer strength to begin exploring similar beers.
Explore Similar Beers by:
- Overall
- Aroma
- Taste
- Appearance
About the Data
Popularity and top beer styles are based on the number of users who rated the beer.
Tags: belo
Read Histograms and Use Them in R
Posted by Armando Brito Mendes | Filed under estatística, materiais para profissionais, visualização
How to Read Histograms and Use Them in R
The histogram is one of my favorite chart types, and for analysis purposes, it’s probably the one I use the most. Devised by Karl Pearson (the father of mathematical statistics) in the late 1800s, it’s simple geometrically, robust, and allows you to see the distribution of a dataset.
If you don’t understand what’s driving the chart though, it can be confusing, which is probably why you don’t see it often in general publications.
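What drives the chart is just counting: split the range of the data into equal-width bins and count how many values land in each. The linked tutorial works in R; the sketch below uses plain Python instead, with a hypothetical function name and made-up data, and it assumes the values are not all identical.

```python
def histogram_counts(values, bin_count):
    """Count values in equal-width bins spanning min(values) to max(values)."""
    low, high = min(values), max(values)
    width = (high - low) / bin_count
    if width == 0:
        raise ValueError("values must not all be equal")
    counts = [0] * bin_count
    for value in values:
        # The maximum value belongs in the last bin rather than a new one.
        index = min(int((value - low) / width), bin_count - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bin_count + 1)]
    return edges, counts


# Made-up sample data, binned into four equal-width bins.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = histogram_counts(data, 4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:4.1f} to {right:4.1f} | {'#' * count}")
```

Each row of the printout is one bar: the bar's length is the count of values in that bin, not the values themselves, which is the part that tends to confuse first-time readers.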
Tags: análise de dados, data mining, Estat Descritiva, R-software, software estatístico
Useful Videos on Information Visualization
Posted by Armando Brito Mendes | Filed under estatística, videos, visualização
Noah Iliinsky – Data Visualizations Done Wrong – A Beautiful Collection of Stories and Tips for Success.
The Four Pillars of Data Visualization
Designing Data Visualizations with Noah Iliinsky
Best Practices for Data Visualization
Designing Data Visualizations
Seeing the Story in the Data and Learning to Effectively Communicate – Inspired by Stephen Few Principles, Visualization Guru
David McCandless: “The beauty of data visualization” – Data Detective Telling Stories From Visualization of Information
This also has a nice quiz about visualization principles.
As I collect more, I will consolidate this list.
Tags: belo, big data, data mining, image mining
selfiecity
Posted by Armando Brito Mendes | Filed under videos, visualização
Investigating the style of self-portraits (selfies) in five cities across the world.
Selfiecity investigates selfies using a mix of theoretic, artistic and quantitative methods:
- We present our findings about the demographics of people taking selfies, their poses and expressions.
- Rich media visualizations (imageplots) assemble thousands of photos to reveal interesting patterns.
- The interactive selfiexploratory allows you to navigate the whole set of 3200 photos.
- Finally, theoretical essays discuss selfies in the history of photography, the functions of images in social media, and methods and dataset.
Selfiecity, from Lev Manovich, Moritz Stefaner, and a small group of analysts and researchers, is a detailed visual exploration of 3,200 selfies from five major cities around the world. The project is both a broad look at demographics and trends, as well as a chance to look closer at the individual observations.
There are several components to the project, but Imageplots (which you might recognize from a couple years ago) and the exploratory section, aptly named Selfiexploratory, will be of most interest.
The two parts let you filter through cities (Bangkok, Berlin, Moscow, New York, and Sao Paulo), age, gender, pose, mood, and a number of other factors, and this information is presented in a grid layout that self-updates as you browse.
So you can get a rough sense of how facets relate. There seems to be a higher proportion of female selfies and average age seems to skew towards younger as you’d expect. The average age of females in this selfie sample seems to be younger than that of males.
However, before you jump to too many conclusions about how countries vary or differences between the sexes, etc., consider the classification process, which was a combination of manual labor via Mechanical Turk and face recognition software. Age, for example, can be tough to estimate from pictures alone since you have outside factors like makeup, angles, and poses. Do these things account for the two- to three-year average difference between the sexes? Maybe. So consider the data. But that should go without saying.
That said, Selfiecity is a fun one I spent a good amount of time browsing. It’s a weird, tiny peek into 3,200 people’s lives, with a dose of quant and art. And don’t miss the theoretical component in essay format, a reflection of social media, communities, and the self.
Tags: belo, data mining, image mining
Data Intelligence and Analytics Resources
Posted by Armando Brito Mendes | Filed under materiais para profissionais, software, videos, visualização
3. Big Data
- Practical illustration of Map-Reduce (Hadoop-style), on real data
- A synthetic variance designed for Hadoop and big data
- Fast Combinatorial Feature Selection with New Definition of Predictive Power
4. Visualization
- Detecting Patterns with the Naked Eye
- 50+ Open Source Tools for Big Data
- 40 maps that explain the world
5. Best and Worst of Data Science
- 175 Analytic and Data Science Web Sites
- 6000 Companies Hiring Data Scientists
- 100 data science, analytics, big data, visualization books
6. New Analytics Start-up Ideas
- Uniquely identify a human being with two questions
- Selling data
- A new type of weapons-grade secure email
- R in your Browser
7. Rants about Healthcare, Education, etc.
- Why statistical community is disconnected from Big Data and how to fix it
- How to eliminate a trillion dollars in healthcare costs
- Job interview question: what is wrong with this picture?
8. Career Stuff, Training, Salary Surveys
- 17 short tutorials all data scientists should read (and practice)
- Why Companies can’t find analytic talent
- Six categories of data scientists …
9. Miscellaneous
- One Page R: A Survival Guide to Data Science with R
- Boosting Algorithms for Better Predictions
- Structuredness coefficient to find patterns and associations
10. DSC Webinar Series – with video access
- Predictive Analytics with Revolution Analytics and Hortonworks, The…
- BI For Big Data
- The Value of a Modern Data Architecture with Apache Hadoop and Tera…
- Accelerating Big Data
Tags: big data, captura de conhecimento, data mining, R-software