Khan Academy: Normal Distribution

Good video lessons in Portuguese on the standard normal distribution


Original video: ck12.org Exercise: Standard Normal Distribution and the Empirical Rule (http://www.khanacademy.org/video/ck12-org-exercise–standard-normal-distribution-and-the-empirical-rule). Khan Academy Portugal provides free online math explanations from 1st through 12th grade. This video was produced by Khan Academy and translated into Portuguese by Fundação Portugal Telecom (see all available videos at http://fundacao.telecom.pt/kha
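The empirical rule named in the video title says that roughly 68%, 95% and 99.7% of a normal distribution lies within 1, 2 and 3 standard deviations of the mean. The short Python sketch below is not taken from the video. It checks those figures for the standard normal using only `math.erf` and the identity P(−k < Z < k) = erf(k/√2):

```python
import math


def prob_within(k: float) -> float:
    """Probability that a standard normal Z falls in (-k, k).

    Follows from the standard normal CDF Phi(x) = 0.5 * (1 + erf(x / sqrt(2))),
    which gives P(-k < Z < k) = Phi(k) - Phi(-k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))


if __name__ == "__main__":
    for k in (1, 2, 3):
        # Prints roughly 68.27%, 95.45% and 99.73%
        print(f"within {k} SD: {prob_within(k):.2%}")
```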


Khan Academy – probability lessons

Good video lessons in Portuguese on probability


Original video: Compound Probability of Independent Events (https://www.khanacademy.org/math/trigonometry/prob_comb/independent_events_precalc/v/compound-probability-of-independent-events). Khan Academy Portugal provides free online math explanations from 1st through 12th grade. This video was produced by Khan Academy and translated into Portuguese by Fundação Portugal Telecom (see all available videos at http://fundacao.telecom.pt/khanac
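For independent events, the probability that both occur is the product of the individual probabilities: P(A and B) = P(A)·P(B). The small Python sketch below uses a coin flip and a die roll as an illustrative example. This example is assumed here and is not necessarily the one used in the video. The sketch checks the product rule against a brute-force count of all equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]


def prob_by_counting(event) -> Fraction:
    """Exact probability of `event` over all equally likely (coin, die) pairs."""
    outcomes = list(product(coin, die))
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


# Event: heads on the coin AND a 6 on the die, counted directly.
both = prob_by_counting(lambda o: o[0] == "H" and o[1] == 6)

# Product rule for independent events: P(heads) * P(six) = 1/2 * 1/6.
rule = Fraction(1, 2) * Fraction(1, 6)

print(both, rule)  # 1/12 1/12
```

Counting works here because every (coin, die) pair is equally likely. The product rule gives the same answer without listing the outcomes.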


Resources for Getting Started with R

Resources for starting to use R


Resources for Getting Started with R

June 4, 2012  |  Software

R, the open source statistical software environment, is powerful but can be a challenge to approach for beginners. For me, the best way to learn R, especially on the visualization side of things, is to dive right in. Grab some data and make some charts, or better yet, find a graph you like and try to replicate it.

R’s core functionality and the many available packages let you do a lot without having to know what’s going on underneath. I use this approach in Visualize This and the tutorials around here. I like the satisfaction of immediate results. Then I learn the nitty-gritty later.

That said, it doesn’t hurt to familiarize yourself with the environment. Also, visualization is a small part of what you can do with R, so it can help to know what else you can do analysis-wise.


An intro to R for new programmers

An introduction to data structures in R


Following the lead of JavaScript for Cats by Maxwell Ogden, Scott Chamberlain and Carson Sievert wrote R for Cats. It’s a playful introduction to R intended for those who have little to no programming experience.

The bulk of it so far is a primer on data structures, and there’s a little bit on functions and some dos and don’ts. It’s stuff you should know before you get into more advanced tutorials.

Mainly though: ooo look, kitty.

Once you’re done with that (it only takes about 30 minutes), there are lots of other resources for getting started with R.


Create a barebones R package from scratch

Creating packages for R is very easy


While we’re on an R kick, Hilary Parker described how to create an R package from scratch, not just to share code with others but to save yourself some time on future projects. It’s not as hard as it seems.

This tutorial is not about making a beautiful, perfect R package. This tutorial is about creating a bare-minimum R package so that you don’t have to keep thinking to yourself, “I really should just make an R package with these functions so I don’t have to keep copy/pasting them like a goddamn luddite.” Seriously, it doesn’t have to be about sharing your code (although that is an added benefit!). It is about saving yourself time. (n.b. this is my attitude about all reproducibility.)

I need to do this. I’ve been meaning to wrap everything up for a while now, but it seemed like such a chore. Sometimes I’d even go back to my own tutorials for some copy and paste action. Now I know better. And that’s half the battle.


Using R in Nonparametric Statistical Analysis

A blog with several tutorials on using simple nonparametric statistics in R



Warm and cold weather anomalies

Another example of good visualizations, this time with climate data


This year’s polar vortex churned up some global warming skeptics, but as we know, it’s more useful to look at trends over significant spans of time than isolated events. And, when you do look at a trend, it’s useful to have a proper baseline to compare against.

To this end, Enigma.io compared warm weather anomalies against cold weather anomalies from 1964 to 2013. That is, they counted the number of days per year that were warmer than expected and the number that were colder than expected.
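The counting step can be sketched as follows. This is an illustrative Python sketch and not Enigma.io’s actual method. It assumes each day comes with an observed temperature and an expected (baseline) temperature. It also uses a hypothetical tolerance, `threshold`, to decide when a day counts as anomalously warm or cold. The sample readings are made up.

```python
from collections import defaultdict


def warm_share_by_year(days, threshold=2.0):
    """Share of anomalous days per year that were warm rather than cold.

    `days` is an iterable of (year, observed, expected) tuples. A day is a
    warm anomaly if observed > expected + threshold and a cold anomaly if
    observed < expected - threshold. `threshold` is a hypothetical value.
    """
    warm = defaultdict(int)
    cold = defaultdict(int)
    for year, observed, expected in days:
        if observed > expected + threshold:
            warm[year] += 1
        elif observed < expected - threshold:
            cold[year] += 1
    years = set(warm) | set(cold)
    # Years with no anomalies at all are omitted (there is no share to compute).
    return {y: warm[y] / (warm[y] + cold[y]) for y in sorted(years)}


# Made-up readings: (year, observed, expected) in degrees C.
sample = [
    (1964, 15.0, 12.0),  # warm anomaly
    (1964, 8.0, 12.0),   # cold anomaly
    (1964, 12.5, 12.0),  # within tolerance, not counted
    (2013, 16.0, 12.0),  # warm anomaly
    (2013, 15.5, 12.0),  # warm anomaly
    (2013, 9.0, 12.0),   # cold anomaly
]
print(warm_share_by_year(sample))  # {1964: 0.5, 2013: 0.6666666666666666}
```

The yearly shares produced this way are the kind of series a trend model could then be fitted to.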

An animated map leads the post, but the meat is in the time series. There’s a clear trend toward more warm anomalies.

Since 1964, the proportion of warm and strong warm anomalies has risen from about 42% of the total to almost 67% of the total – an average increase of 0.5% per year. This trend, fitted with a generalized linear model, accounts for 40% of the year-to-year variation in warm versus cold anomalies, and is highly significant with a p-value approaching 0.0. Though we remain cautious about making predictions based on this model, it suggests that this yearly proportion of warm anomalies will regularly fall above 70% in the 2030’s.
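Here is a quick check on the quoted figures. Going from about 42% in 1964 to almost 67% in 2013 implies the following average yearly change. The "0.5% per year" is a change in percentage points. This simple endpoint difference is not necessarily identical to the slope of the fitted model.

```latex
\[
\frac{67\% - 42\%}{2013 - 1964}
  = \frac{25\ \text{percentage points}}{49\ \text{years}}
  \approx 0.51\ \text{percentage points per year}
\]
```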

Explore in full or download the data and analyze yourself. Nice work. [Thanks, Dan]


High-detail maps with Disser

Open source software for working with maps


Open data consultancy Conveyal released Disser, a command-line tool that disaggregates geographic data to show more detail. For example, we’ve seen populations represented with uniformly distributed dots, which is fine for a zoomed-out view. However, when you zoom in close, it can be useful to see distributions represented more accurately.

If the goal of disaggregation is to make a reasonable guess at the data in its pre-aggregated form, we’ve done an okay job. There’s an obvious flaw with this map, though. People aren’t evenly distributed over a block — they’re concentrated into residential buildings.

Disser combines datasets of different granularity so that you can see spreads and concentrations that are closer to real life.
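A simple dasymetric allocation can illustrate the idea of combining a coarse count with finer-grained data. This is a hypothetical Python sketch and not Disser’s implementation or interface. It assumes a block’s population count and a per-building weight, such as residential floor area. It splits the count across buildings in proportion to the weights, using largest-remainder rounding so the parts still sum to the block total. Dots could then be placed within each building rather than spread evenly over the block.

```python
import math


def disaggregate(total: int, weights: dict[str, float]) -> dict[str, int]:
    """Split an integer `total` across keys in proportion to `weights`.

    Hypothetical helper for illustration: keys could be building IDs and
    weights their residential floor area. Largest-remainder rounding keeps
    the allocated integers summing exactly to `total`.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must have a positive sum")
    quotas = {k: total * w / weight_sum for k, w in weights.items()}
    counts = {k: math.floor(q) for k, q in quotas.items()}
    leftover = total - sum(counts.values())
    # Give the remaining units to the keys with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


# A block of 100 people; a building with zero residential area gets nobody.
print(disaggregate(100, {"apartment": 3.0, "house": 1.0, "garage": 0.0}))
# {'apartment': 75, 'house': 25, 'garage': 0}
```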


Why use R? Five reasons

A good blog post on the main reasons to use R


Why use R? Five reasons.

In this post I will go through 5 reasons: zero cost, crazy popularity, awesome power, dazzling flexibility, and mind-blowing support. I believe R is the best statistical programming language to learn. As a blogger who has contributed over 150 posts in Stata and over 100 in R I have extensive experience with both a proprietary statistical programming language as well as the open source alternative. In my graduate career I have also had the opportunity to experiment with the proprietary software SPSS, SAS, Mathematica, as well as MPlus.


9 “must read” articles on Big Data

Readings on big data


My selection

(*) I disagree with this Harvard Business Review author. Senior data scientists work on high-level data from various sources, use automated processes for EDA (exploratory data analysis), and spend little to no time on tedious, routine, mundane tasks (less than 5% of my time, in my case). I also use robust techniques that work well on relatively dirty data, and … I create and design the data myself in many cases.
