An intro to R for new programmers
Posted by Armando Brito Mendes | Filed under statistics, software
Following the lead of JavaScript for Cats by Maxwell Ogden, Scott Chamberlain and Carson Sievert wrote R for Cats. It’s a playful introduction to R intended for those who have little to no programming experience.
The bulk of it so far is a primer on data structures, and there’s a little bit on functions and some dos and don’ts. It’s stuff you should know before you get into more advanced tutorials.
Mainly though: ooo look, kitty.
Once you’re done with that (it only takes about 30 minutes), there are lots of other resources for getting started with R.
Tags: data mining, R-software, statistical software
Create a barebones R package from scratch
Posted by Armando Brito Mendes | Filed under statistics, software
While we’re on an R kick, Hilary Parker described how to create an R package from scratch, not just to share code with others but to save yourself some time on future projects. It’s not as hard as it seems.
This tutorial is not about making a beautiful, perfect R package. This tutorial is about creating a bare-minimum R package so that you don’t have to keep thinking to yourself, “I really should just make an R package with these functions so I don’t have to keep copy/pasting them like a goddamn luddite.” Seriously, it doesn’t have to be about sharing your code (although that is an added benefit!). It is about saving yourself time. (n.b. this is my attitude about all reproducibility.)
I need to do this. I’ve been meaning to wrap everything up for a while now, but it seemed like such a chore. Sometimes I’d even go back to my own tutorials for some copy and paste action. Now I know better. And that’s half the battle.
Tags: data mining, R-software, statistical software
Using R in Nonparametric Statistical Analysis
Posted by Armando Brito Mendes | Filed under statistics, teaching materials, software
- Using R in Nonparametric Statistical Analysis: The Kruskal-Wallis Test for One-Way Analysis of Variance
- Using R in Nonparametric Statistical Analysis: The Binomial Sign Test
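For a rough sense of what these two tests compute, here is a minimal sketch in plain Python rather than R. It is not taken from the linked articles: the function names are made up for illustration, and the Kruskal-Wallis statistic below omits the usual correction for ties. In R itself, `kruskal.test()` and `binom.test()` from the stats package cover the same ground.

```python
from math import comb


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, without the correction for ties."""
    pooled = sorted(value for group in groups for value in group)
    # Give every distinct value the average of the ranks it occupies
    # (ranks start at 1).
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n_total = len(pooled)
    # Sum over groups of (rank sum squared) / (group size).
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)


def sign_test_p(n_positive, n_negative):
    """Two-sided binomial sign test p-value under H0: P(positive) = 0.5."""
    n = n_positive + n_negative
    k = min(n_positive, n_negative)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical data, chosen only to exercise the functions.
print(f"H = {kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]):.2f}")
print(f"sign test p = {sign_test_p(8, 2):.4f}")
```

Both functions work on ranks or signs only, which is why neither needs an assumption about the shape of the underlying distribution.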
Tags: software development, R-software, statistical software
Warm and cold weather anomalies
Posted by Armando Brito Mendes | Filed under visualization
This year’s polar vortex churned up some global warming skeptics, but as we know, it’s more useful to look at trends over significant spans of time than isolated events. And, when you do look at a trend, it’s useful to have a proper baseline to compare against.
To this end, Enigma.io compared warm weather anomalies against cold weather anomalies, from 1964 to 2013. That is, they counted the number of days per year that were warmer than expected and the number that were colder than expected.
An animated map leads the post, but the meat is in the time series. There’s a clear trend toward more warm anomalies.
Since 1964, the proportion of warm and strong warm anomalies has risen from about 42% of the total to almost 67% of the total – an average increase of 0.5% per year. This trend, fitted with a generalized linear model, accounts for 40% of the year-to-year variation in warm versus cold anomalies, and is highly significant with a p-value approaching 0.0. Though we remain cautious about making predictions based on this model, it suggests that this yearly proportion of warm anomalies will regularly fall above 70% in the 2030’s.
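The quoted rate is easy to check with back-of-the-envelope arithmetic. The sketch below is a plain straight-line calculation from the two endpoint figures in the quote. It is not Enigma.io's generalized linear model, and the variable and function names are illustrative only.

```python
# Endpoint figures taken from the quote above; everything else is arithmetic.
start_year, end_year = 1964, 2013
start_share, end_share = 42.0, 67.0  # percent of anomalies that were warm

# Average change in percentage points per year over the period.
rate = (end_share - start_share) / (end_year - start_year)


def straight_line_share(year):
    """Naive straight-line extrapolation from the 2013 figure.

    This is an assumption made for illustration, not the fitted model
    described in the quote.
    """
    return end_share + rate * (year - end_year)


print(f"average increase: {rate:.2f} points per year")
print(f"straight-line share in 2030: {straight_line_share(2030):.1f}%")
```

The endpoints give about half a percentage point per year, which matches the quoted figure, and a straight line carried forward to 2030 lands above 70%, in line with the quote's cautious projection.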
Explore in full or download the data and analyze yourself. Nice work. [Thanks, Dan]
High-detail maps with Disser
Posted by Armando Brito Mendes | Filed under GIS maps, software, visualization
Open data consultancy Conveyal released Disser, a command-line tool that disaggregates geographic data to show more detail. For example, we’ve seen maps that use uniformly distributed dots to represent populations, which is fine for a zoomed out view. However, when you get in close, it can be useful to see distributions more accurately represented.
If the goal of disaggregation is to make a reasonable guess at the data in its pre-aggregated form, we’ve done an okay job. There’s an obvious flaw with this map, though. People aren’t evenly distributed over a block — they’re concentrated into residential buildings.
Disser therefore combines datasets of different granularity, so that you can see spreads and concentrations that are closer to real life.
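To make the idea concrete, here is a minimal sketch of one way to disaggregate a block-level count using a finer dataset. It is not Disser's implementation or command-line interface: the function name, the building names, and the use of floor area as a weight are all assumptions made for illustration.

```python
import random


def disaggregate(block_population, building_weights, seed=0):
    """Assign each person in a block to a building, with probability
    proportional to that building's weight (for example, floor area)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    buildings = list(building_weights)
    weights = [building_weights[b] for b in buildings]
    counts = {b: 0 for b in buildings}
    for chosen in rng.choices(buildings, weights=weights, k=block_population):
        counts[chosen] += 1
    return counts


# Hypothetical block: 200 residents and three buildings weighted by
# residential floor area. The warehouse has no residential area.
block = {"apartments": 9000.0, "townhouse": 1000.0, "warehouse": 0.0}
print(disaggregate(200, block))
```

Instead of spreading 200 dots evenly over the block, most of them end up in the apartment building and none in the warehouse, which is the kind of concentration the quote describes.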
Tags: beautiful, image mining, maps
Why use R? Five reasons
Posted by Armando Brito Mendes | Filed under materials for professionals, software
In this post I will go through 5 reasons: zero cost, crazy popularity, awesome power, dazzling flexibility, and mind-blowing support. I believe R is the best statistical programming language to learn. As a blogger who has contributed over 150 posts in Stata and over 100 in R, I have extensive experience with both a proprietary statistical programming language and the open source alternative. In my graduate career I have also had the opportunity to experiment with the proprietary software SPSS, SAS, Mathematica, and MPlus.
Tags: big data, definition, R-software, statistical software
9 “must read” articles on Big Data
Posted by Armando Brito Mendes | Filed under statistics, materials for professionals, visualization
My selection
- Big Data – From Descriptive to Prescriptive
- Can big data be racist?
- NodeXL Graph Gallery: Graph Details
- Best Metrics For Digital Marketing: Rock Your Own And Rent Strategies
- Big Data: from mining to meaning
- Beautiful versus useful visualizations (in French, but interesting)
- Learning and Teaching Machine Learning: A Personal Journey
- Big data techniques and technologies
- The Sexiest Job of the 21st Century is Tedious, and that Needs to C… (*)
- From the trenches: 360-degree data science
(*) I disagree with this Harvard Business Review author. Senior data scientists work on high level data from various sources, use automated processes for EDA (exploratory analysis) and spend little to no time in tedious, routine, mundane tasks (less than 5% of my time, in my case). I also use robust techniques that work well on relatively dirty data, and … I create and design the data myself in many cases.
Tags: beautiful, big data, data mining, descriptive statistics, graphs
Chart errors in the news
Posted by Armando Brito Mendes | Filed under statistics, visualization
Fox News bar chart gets it wrong
Because Fox News. See also this, this, and this. [Thanks, Meron]
Tags: data analysis, beautiful, data mining, descriptive statistics
Exponential water tank
Posted by Armando Brito Mendes | Filed under statistics, teaching materials, visualization
Drawing on a paper by Albert Bartlett, Hibai Unzueta demonstrates exponential growth with a simple animation. It depicts a man standing in a tank with finite capacity and water rising slowly, but at an exponential rate.
Our brains are wired to predict future behaviour based on past behaviour (see here). But what happens when something grows exponentially? For a long time, the numbers are so little in relation to the scale that we hardly see the changes. But even at moderate growth rates exponential functions reach a point where the numbers grow too fast. Once we confirm that our predictions about the future have failed, very little time to react may be left.
All looks safe at first, because the water rises so slowly, but then it seems to rise all at once. Oh, the suspense. What will happen to cartoon pixel man?
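The effect is easy to reproduce numerically. The sketch below is not the animation's code. The tank capacity, starting volume, and growth rate are hypothetical numbers chosen only to show how late the halfway mark arrives.

```python
def fill_history(capacity, initial, growth_rate):
    """Return the water volume at each time step until the tank is full.

    The volume grows by a fixed fraction each step, i.e. exponentially.
    """
    volumes = [initial]
    while volumes[-1] < capacity:
        volumes.append(volumes[-1] * (1 + growth_rate))
    return volumes


# Hypothetical numbers chosen only for illustration.
volumes = fill_history(capacity=1000.0, initial=1.0, growth_rate=0.07)
steps_to_full = len(volumes) - 1
steps_to_half = next(i for i, v in enumerate(volumes) if v >= 500.0)

print(f"steps until full: {steps_to_full}")
print(f"steps until half full: {steps_to_half}")
print(f"steps left once half full: {steps_to_full - steps_to_half}")
```

With these numbers the tank takes roughly ninety steps to reach the halfway mark and only about ten more to fill, because at 7% growth per step the volume doubles about every ten steps.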
Tags: definition, inference
Introduction to social network methods
Posted by Armando Brito Mendes | Filed under ARS - SNA, teaching materials
Robert A. Hanneman and Mark Riddle
Introduction to social network methods
About this book
This on-line textbook introduces many of the basics of formal approaches to the analysis of social networks. The text relies heavily on the work of Freeman, Borgatti, and Everett (the authors of the UCINET software package). The materials here, and their organization, were also very strongly influenced by the text of Wasserman and Faust, and by a graduate seminar conducted by Professor Phillip Bonacich at UCLA. Many other users have also made very helpful comments and suggestions based on the first version. Errors and omissions, of course, are the responsibility of the authors.
You are invited to use and redistribute this text freely — but please acknowledge the source.
Hanneman, Robert A. and Mark Riddle. 2005. Introduction to social network methods. Riverside, CA: University of California, Riverside (published in digital form at http://faculty.ucr.edu/~hanneman/)
Table of contents:
Preface
1. Social network data
2. Why formal methods?
3. Using graphs to represent social relations
4. Working with Netdraw to visualize graphs
5. Using matrices to represent social relations
6. Working with network data
7. Connection
8. Embedding
9. Ego networks
10. Centrality and power
11. Cliques and sub-groups
12. Positions and roles: The idea of equivalence
13. Measures of similarity and structural equivalence
14. Automorphic equivalence
15. Regular equivalence
16. Multiplex networks
17. Two-mode networks
18. Some statistical tools
Afterword
Bibliography
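As a small taste of the formal approach the book covers (matrices for representing relations, and centrality), here is a minimal sketch in plain Python. It is not taken from the book, which works with UCINET and Netdraw. The people and ties are made up for illustration.

```python
# Hypothetical friendship ties among four people (undirected).
people = ["Ana", "Ben", "Caro", "Dev"]
ties = [("Ana", "Ben"), ("Ana", "Caro"), ("Ana", "Dev"), ("Ben", "Caro")]

index = {name: i for i, name in enumerate(people)}
n = len(people)

# Adjacency matrix: cell [i][j] is 1 when i and j are tied, else 0.
adjacency = [[0] * n for _ in range(n)]
for a, b in ties:
    adjacency[index[a]][index[b]] = 1
    adjacency[index[b]][index[a]] = 1  # symmetric because ties are undirected

# Degree centrality: number of ties, divided by the n - 1 possible partners.
degree = {name: sum(adjacency[index[name]]) for name in people}
centrality = {name: degree[name] / (n - 1) for name in people}

for row in adjacency:
    print(row)
print(centrality)
```

Ana is tied to everyone else, so her normalized degree centrality is the maximum of 1, while Dev, with a single tie, sits at the periphery.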
Tags: ARS\SNA applications, ARS\SNA intro, graphs