An intro to R for new programmers

Introduction to data structures in R

Following the lead of JavaScript for Cats by Maxwell Ogden, Scott Chamberlain and Carson Sievert wrote R for Cats. It’s a playful introduction to R intended for those who have little to no programming experience.

The bulk of it so far is a primer on data structures, and there’s a little bit on functions and some dos and don’ts. It’s stuff you should know before you get into more advanced tutorials.

Mainly though: ooo look, kitty.

Once you’re done with that (it only takes about 30 minutes), there are lots of other resources for getting started with R.

Create a barebones R package from scratch

Creating packages for R is very easy

While we’re on an R kick, Hilary Parker described how to create an R package from scratch, not just to share code with others but to save yourself some time on future projects. It’s not as hard as it seems.

This tutorial is not about making a beautiful, perfect R package. This tutorial is about creating a bare-minimum R package so that you don’t have to keep thinking to yourself, “I really should just make an R package with these functions so I don’t have to keep copy/pasting them like a goddamn luddite.” Seriously, it doesn’t have to be about sharing your code (although that is an added benefit!). It is about saving yourself time. (n.b. this is my attitude about all reproducibility.)

I need to do this. I’ve been meaning to wrap everything up for a while now, but it seemed like such a chore. Sometimes I’d even go back to my own tutorials for some copy and paste action. Now I know better. And that’s half the battle.

Using R in Nonparametric Statistical Analysis

A blog with several tutorials on using simple nonparametric statistics in R

Warm and cold weather anomalies

Another example of good visualization, this time with climate data

This year’s polar vortex churned up some global warming skeptics, but as we know, it’s more useful to look at trends over significant spans of time than isolated events. And, when you do look at a trend, it’s useful to have a proper baseline to compare against.

To this end, Enigma.io compared warm weather anomalies against cold weather anomalies from 1964 to 2013. That is, they counted the number of days per year that were warmer than expected and the number that were colder than expected.

An animated map leads the post, but the meat is in the time series. There’s a clear trend toward more warm anomalies.

Since 1964, the proportion of warm and strong warm anomalies has risen from about 42% of the total to almost 67% of the total – an average increase of 0.5% per year. This trend, fitted with a generalized linear model, accounts for 40% of the year-to-year variation in warm versus cold anomalies, and is highly significant with a p-value approaching 0.0. Though we remain cautious about making predictions based on this model, it suggests that this yearly proportion of warm anomalies will regularly fall above 70% in the 2030’s.
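As a quick sanity check on the quoted rate: going from about 42% to almost 67% over the 49 years between 1964 and 2013 works out to roughly half a percentage point per year. The sketch below only does that endpoint arithmetic with the rounded figures from the quote; it is not the generalized linear model Enigma.io fitted, and `avg_annual_change` is a hypothetical helper name.

```python
def avg_annual_change(start_pct: float, end_pct: float,
                      start_year: int, end_year: int) -> float:
    """Average change in percentage points per year between two endpoints.

    Simple endpoint difference, not a regression fit.
    """
    return (end_pct - start_pct) / (end_year - start_year)


# Rounded shares of warm anomalies quoted in the post: ~42% in 1964, ~67% in 2013
rate = avg_annual_change(42.0, 67.0, 1964, 2013)
print(f"{rate:.2f} percentage points per year")  # 0.51 percentage points per year
```

Note that the "0.5% per year" in the quote is an increase in percentage points of the share, not a 0.5% relative growth rate.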

Explore the analysis in full, or download the data and analyze it yourself. Nice work. [Thanks, Dan]

High-detail maps with Disser

Open source software for working with maps

Open data consultancy Conveyal released Disser, a command-line tool that disaggregates geographic data to show more detail. For example, we’ve seen populations represented with uniformly distributed dots, which is fine for a zoomed-out view. However, when you zoom in, it can be useful to see the distribution represented more accurately.

If the goal of disaggregation is to make a reasonable guess at the data in its pre-aggregated form, we’ve done an okay job. There’s an obvious flaw with this map, though. People aren’t evenly distributed over a block — they’re concentrated into residential buildings.

So Disser combines datasets of different granularity, such as block-level population counts and building-level data, so that you can see spreads and concentrations that are closer to real life.
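The core idea can be sketched as proportional allocation: instead of spreading a block’s population evenly over its area, split the block total across the buildings inside it in proportion to some weight. The sketch below assumes a hypothetical per-building weight (say, housing units) and uses largest-remainder rounding so whole people add back up to the block total. The function name, weights, and rounding rule are illustrative assumptions; this is the general technique, not Disser’s actual implementation or input format.

```python
from math import floor


def disaggregate(total: int, weights: dict[str, float]) -> dict[str, int]:
    """Split an aggregate count across sub-areas in proportion to weights.

    Uses largest-remainder rounding so the integer parts sum back to `total`.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    # Exact (fractional) share for each sub-area
    exact = {k: total * w / weight_sum for k, w in weights.items()}
    # Start from the whole-number part of each share
    alloc = {k: floor(v) for k, v in exact.items()}
    leftover = total - sum(alloc.values())
    # Hand the remaining units to the sub-areas with the largest fractional remainders
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


# Hypothetical block: 100 residents, three buildings weighted by housing units
block = disaggregate(100, {"apartment": 40, "duplex": 2, "house": 1})
print(block)  # {'apartment': 93, 'duplex': 5, 'house': 2}
```

A park or parking lot with weight 0 would receive no residents at all, which is exactly the kind of concentration a uniform dot map hides.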

Why use R? Five reasons

A good blog post on the main reasons to use R

In this post I will go through 5 reasons: zero cost, crazy popularity, awesome power, dazzling flexibility, and mind-blowing support. I believe R is the best statistical programming language to learn. As a blogger who has contributed over 150 posts in Stata and over 100 in R, I have extensive experience with both a proprietary statistical programming language and the open source alternative. In my graduate career I have also had the opportunity to experiment with the proprietary software SPSS, SAS, Mathematica, and Mplus.

9 “must read” articles on Big Data

Reading on big data

My selection

(*) I disagree with this Harvard Business Review author. Senior data scientists work on high-level data from various sources, use automated processes for EDA (exploratory data analysis), and spend little to no time on tedious, routine, mundane tasks (less than 5% of my time, in my case). I also use robust techniques that work well on relatively dirty data, and … I create and design the data myself in many cases.

Chart errors in the news

Three examples of chart errors on news channels

Fox News bar chart gets it wrong

Because Fox News. See also this, this, and this. [Thanks, Meron]

Exponential water tank

An excellent way to understand exponential growth

Hibai Unzueta, based on a paper by Albert Bartlett, demonstrates exponential growth with a simple animation. It depicts a man standing in a tank with finite capacity and water rising slowly, but at an exponential rate.

Our brains are wired to predict future behaviour based on past behaviour (see here). But what happens when something grows exponentially? For a long time, the numbers are so small in relation to the scale that we hardly see the changes. But even at moderate growth rates, exponential functions reach a point where the numbers grow too fast. Once we confirm that our predictions about the future have failed, very little time to react may be left.

All looks safe at first because the water rises so slowly, but then it seems to rise all of a sudden. Oh, the suspense. What will happen to cartoon pixel man?
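The effect is easiest to see with a quantity that doubles at every step. The sketch below assumes the water volume doubles each minute and the tank is exactly full at minute 60; both numbers are arbitrary choices for illustration, not parameters taken from the animation, and `fill_fraction` is a hypothetical helper.

```python
def fill_fraction(minute: int, full_at: int = 60) -> float:
    """Fraction of tank capacity filled at `minute`.

    Assumes the volume doubles every minute and reaches exactly 100% at `full_at`,
    so each minute earlier the tank holds half as much.
    """
    return 2.0 ** (minute - full_at)


for m in (50, 55, 58, 59, 60):
    print(f"minute {m}: {fill_fraction(m):.1%} full")
```

Five minutes before overflowing, the tank is only about 3% full (2^-5), and one minute before, it is merely half full. That is why the rise seems to come all of a sudden.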

Introduction to social network methods

A good online book on social network analysis (SNA)

Robert A. Hanneman and Mark Riddle

Introduction to social network methods

About this book

This on-line textbook introduces many of the basics of formal approaches to the analysis of social networks.  The text relies heavily on the work of Freeman, Borgatti, and Everett (the authors of the UCINET software package). The materials here, and their organization, were also very strongly influenced by the text of Wasserman and Faust, and by a graduate seminar conducted by Professor Phillip Bonacich at UCLA.  Many other users have also made very helpful comments and suggestions based on the first version.   Errors and omissions, of course, are the responsibility of the authors.

You are invited to use and redistribute this text freely — but please acknowledge the source.

Hanneman, Robert A. and Mark Riddle. 2005. Introduction to social network methods. Riverside, CA: University of California, Riverside (published in digital form at http://faculty.ucr.edu/~hanneman/)


Table of contents:

Preface
1. Social network data
2. Why formal methods?
3. Using graphs to represent social relations
4. Working with Netdraw to visualize graphs
5. Using matrices to represent social relations
6. Working with network data
7. Connection
8. Embedding
9. Ego networks
10. Centrality and power
11. Cliques and sub-groups
12. Positions and roles: The idea of equivalence
13. Measures of similarity and structural equivalence
14. Automorphic equivalence
15. Regular equivalence
16. Multiplex networks
17. Two-mode networks
18. Some statistical tools
Afterword

Bibliography