Khan Academy: probability lessons
Posted by Armando Brito Mendes | Filed under statistics, teaching materials, videos
Original video: Compound Probability of Independent Events (https://www.khanacademy.org/math/trigonometry/prob_comb/independent_events_precalc/v/compound-probability-of-independent-events). Khan Academy Portugal offers free online mathematics explanations from the 1st through the 12th year of schooling. This video was produced by Khan Academy and translated into Portuguese by Fundação Portugal Telecom (see all available videos at http://fundacao.telecom.pt/khanac
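As a rough illustration of the topic named in the video title: when events are independent, the probability that all of them occur is the product of their individual probabilities. The sketch below is not taken from the video; the function name `compound_probability` and the coin and die examples are assumptions chosen for illustration.

```python
from fractions import Fraction


def compound_probability(*probabilities):
    """Probability that all of the given independent events occur.

    For independent events this is the product of the individual
    probabilities. (Hypothetical helper, not from the video.)
    """
    result = Fraction(1)
    for p in probabilities:
        result *= Fraction(p)
    return result


# Two fair coin flips both landing heads: 1/2 * 1/2
two_heads = compound_probability(Fraction(1, 2), Fraction(1, 2))

# A fair coin landing heads and a fair six-sided die showing a six: 1/2 * 1/6
heads_and_six = compound_probability(Fraction(1, 2), Fraction(1, 6))

print(two_heads, heads_and_six)
```

Exact fractions are used instead of floats so the results can be read directly as 1/4 and 1/12.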
Tags: inference
Resources for Getting Started with R
Posted by Armando Brito Mendes | Filed under statistics, software
R, the open source statistical software environment, is powerful but can be a challenge to approach for beginners. For me, the best way to learn R, especially on the visualization side of things, is to dive right in. Grab some data and make some charts, or better yet, find a graph you like and try to replicate it.
R core functionality and the many available packages let you do a lot without having to know what’s going on underneath. I use this approach in Visualize This and the tutorials around here. I like the satisfaction of immediate results. Then I learn the nitty gritty later.
That said, it doesn’t hurt to familiarize yourself with the environment. Also, visualization is a small part of what you can do with R, so it can help to know what else you can do analysis-wise.
Tags: data mining, R-software, statistical software
An intro to R for new programmers
Posted by Armando Brito Mendes | Filed under statistics, software
Following the lead of JavaScript for Cats by Maxwell Ogden, Scott Chamberlain and Carson Sievert wrote R for Cats. It’s a playful introduction to R intended for those who have little to no programming experience.
The bulk of it so far is a primer on data structures, and there’s a little bit on functions and some dos and don’ts. It’s stuff you should know before you get into more advanced tutorials.
Mainly though: ooo look, kitty.
Once you’re done with that (it only takes about 30 minutes), there are lots of other resources for getting started with R.
Tags: data mining, R-software, statistical software
Create a barebones R package from scratch
Posted by Armando Brito Mendes | Filed under statistics, software
While we’re on an R kick, Hilary Parker described how to create an R package from scratch, not just to share code with others but to save yourself some time on future projects. It’s not as hard as it seems.
This tutorial is not about making a beautiful, perfect R package. This tutorial is about creating a bare-minimum R package so that you don’t have to keep thinking to yourself, “I really should just make an R package with these functions so I don’t have to keep copy/pasting them like a goddamn luddite.” Seriously, it doesn’t have to be about sharing your code (although that is an added benefit!). It is about saving yourself time. (n.b. this is my attitude about all reproducibility.)
I need to do this. I’ve been meaning to wrap everything up for a while now, but it seemed like such a chore. Sometimes I’d even go back to my own tutorials for some copy and paste action. Now I know better. And that’s half the battle.
Tags: data mining, R-software, statistical software
Using R in Nonparametric Statistical Analysis
Posted by Armando Brito Mendes | Filed under statistics, teaching materials, software
- Using R in Nonparametric Statistical Analysis: The Kruskal-Wallis Test for One-Way Analysis of Variance
- Using R in Nonparametric Statistical Analysis: The Binomial Sign Test
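To give a feel for the second test in the list, here is a minimal sketch of a two-sided binomial sign test on paired differences. It is written in plain Python rather than R and is not taken from the linked articles; the function name `sign_test` and the sample differences are hypothetical. Under the null hypothesis, positive and negative differences are equally likely, so the smaller of the two counts is compared against a Binomial(n, 0.5) distribution.

```python
from math import comb


def sign_test(differences):
    """Two-sided binomial sign test on paired differences.

    Zero differences are dropped. Returns the number of positive
    differences, the number of negative differences, and the p-value.
    (Hypothetical helper for illustration.)
    """
    positives = sum(1 for d in differences if d > 0)
    negatives = sum(1 for d in differences if d < 0)
    n = positives + negatives
    if n == 0:
        raise ValueError("no non-zero differences to test")
    k = min(positives, negatives)
    # P(X <= k) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail for a two-sided test, capped at 1
    return positives, negatives, min(1.0, 2 * tail)


# Made-up paired differences (after minus before), for illustration only
sample = [1.5, 2.0, -0.5, 3.1, 0.8, 1.2, 0.0, 2.4, 1.9, 0.7]
n_pos, n_neg, p_value = sign_test(sample)
print(n_pos, n_neg, p_value)
```

With this made-up sample there are 8 positive and 1 negative differences (one zero is dropped), giving a two-sided p-value of 20/512, about 0.039.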
Tags: software development, R-software, statistical software
9 “must read” articles on Big Data
Posted by Armando Brito Mendes | Filed under statistics, materials for professionals, visualization
My selection
- Big Data – From Descriptive to Prescriptive
- Can big data be racist?
- NodeXL Graph Gallery: Graph Details
- Best Metrics For Digital Marketing: Rock Your Own And Rent Strategies
- Big Data: from mining to meaning
- Beautiful versus useful visualizations (in French, but interesting)
- Learning and Teaching Machine Learning: A Personal Journey
- Big data techniques and technologies
- The Sexiest Job of the 21st Century is Tedious, and that Needs to C… (*)
- From the trenches: 360-degree data science
(*) I disagree with this Harvard Business Review author. Senior data scientists work on high-level data from various sources, use automated processes for EDA (exploratory data analysis), and spend little to no time on tedious, routine, mundane tasks (less than 5% of my time, in my case). I also use robust techniques that work well on relatively dirty data, and I create and design the data myself in many cases.
Tags: beautiful, big data, data mining, descriptive statistics, graphs
Chart errors in the news
Posted by Armando Brito Mendes | Filed under statistics, visualization
Fox News bar chart gets it wrong
Because Fox News. See also this, this, and this. [Thanks, Meron]
Tags: data analysis, beautiful, data mining, descriptive statistics
Exponential water tank
Posted by Armando Brito Mendes | Filed under statistics, teaching materials, visualization
Hibai Unzueta, based on a paper by Albert Bartlett, demonstrates exponential growth with a simple animation. It depicts a man standing in a tank with finite capacity and water rising slowly, but at an exponential rate.
Our brains are wired to predict future behaviour based on past behaviour (see here). But what happens when something grows exponentially? For a long time, the numbers are so small in relation to the scale that we hardly see the changes. But even at moderate growth rates, exponential functions reach a point where the numbers grow too fast. Once we confirm that our predictions about the future have failed, very little time to react may be left.
All looks safe at first, because the water rises so slowly, but it seems to rise all of a sudden. Oh, the suspense. What will happen to cartoon pixel man?
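The effect can be reproduced with a few lines of arithmetic. The sketch below is not the animation's code; the tank capacity, starting volume, and growth rate are made-up numbers, and `fill_history` is a hypothetical helper. It shows that with a volume that doubles every step, the tank is still under 13% full three steps before it overflows.

```python
def fill_history(capacity, initial, growth_rate):
    """Water volume after each time step, up to the first step at or above capacity.

    The volume is multiplied by (1 + growth_rate) each step.
    (Hypothetical helper; all numbers below are illustrative.)
    """
    levels = [initial]
    while levels[-1] < capacity:
        levels.append(levels[-1] * (1 + growth_rate))
    return levels


# A 1000-litre tank starting with 1 litre, doubling every step
levels = fill_history(capacity=1000.0, initial=1.0, growth_rate=1.0)
steps_to_fill = len(levels) - 1

for step, volume in enumerate(levels):
    percent = min(volume / 1000.0, 1.0) * 100
    print(f"step {step:2d}: {percent:5.1f}% full")
```

In this example the tank fills at step 10, yet at step 7 it holds only 128 litres, which is why the water seems to rise all of a sudden.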
Tags: definition, inference
SPSS Internet Resources
Posted by Armando Brito Mendes | Filed under statistics, software
The SPSS Inc website
SPSS are now owned by IBM. The following links lead to the appropriate IBM pages now.
http://www-01.ibm.com/software/analytics/spss/ The home page of the SPSS Inc. website
http://www-01.ibm.com/software/uk/analytics/spss/ SPSS Inc. UK page
(If at some future time SPSS Inc change the structure of their website, you may find that only the first of the above links still works.)
The ASSESS-NEWS list
http://www.jiscmail.ac.uk/lists/assess-news.html Information about it, and an archive of past messages.
Other useful links
news:comp.soft-sys.stat.spss The SPSS newsgroup (this carries fairly heavy traffic).
Tags: IBM SPSS Statistics, statistical software
SPSS Macros on the Internet
Posted by Armando Brito Mendes | Filed under statistics, materials for professionals, software
What sources of SPSS macros are available on the Internet?
Here are a few that I know about; I hope other people will tell us about ones that should be listed but aren’t.
An obvious starting point is SPSS Inc’s own Macro Library at http://www.spss.com/tech/stat/macros/ (it doesn’t contain very many, though, and they are statistical rather than utilities). If you are planning to adapt or write macros, it’s also worth seeing what’s in SPSS Inc’s AnswerNet Solutions. Go to http://www.spss.com/tech/answer/, specify Product: SPSS Base and Free Text: macro, then click on the page’s Search button.
Raynald Levesque’s site http://pages.infinit.net/rlevesqu/ includes many pages on macros (including examples and some tutorial materials). But you should also look at the examples in his pages on syntax, as some of these are based on macros.
Newsgroups are also a useful source of macros. Searches of their archives can be very rewarding if you can get your search terms right (see our Other Internet Resources page).
Confidence intervals for proportions, differences between proportions and related quantities. See Dr Robert G. Newcombe’s home page at http://www.uwcm.ac.uk/uwcm/ms/Robert.html. Note that these are SPSS programs rather than macros, despite being described as macros by the author.
Polytomous logistic regression (of particular interest to users of SPSS 8.0 and earlier). For macros by John Hendrickx and Prof. Dr. Steffen Kühnel see http://www.sls.wau.nl/bk/bedrijfskunde/jhendrickx/spss/mlogist/
Regression: evaluating collinearity in models with interactions or non-linear terms. For a macro by Ben Pelzer, Manfred te Grotenhuis, Jan Lammers, John Hendrickx, see http://www.sls.wau.nl/bk/bedrijfskunde/jhendrickx/spss/perturb/perturb.html
Tags: descriptive statistics, IBM SPSS Statistics, inference, statistical software