How to Read Histograms and Use Them in R
Posted by Armando Brito Mendes | Filed under statistics, materials for professionals, visualization
The histogram is one of my favorite chart types, and for analysis purposes it is probably the one I use most. Devised by Karl Pearson (the father of mathematical statistics) in the late 1800s, it’s simple geometrically, robust, and allows you to see the distribution of a dataset.
If you don’t understand what’s driving the chart though, it can be confusing, which is probably why you don’t see it often in general publications.
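What drives the chart is the binning: the range of the data is cut into intervals, and each bar's height is the number of values that fall into its interval. The following is a minimal sketch of that idea, written in plain Python rather than R so it stays self-contained; the function name `histogram_counts` and the sample data are made up for illustration and do not come from the original post.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        if width == 0:
            index = 0  # all values are identical, so use a single bin
        else:
            # The maximum value would fall just past the last bin, so clamp it.
            index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical sample data, made up for illustration.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = histogram_counts(data, 4)

# Print a crude text histogram: one '#' per observation in the bin.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"[{left:4.1f}, {right:4.1f})  {'#' * count}")
```

Changing `num_bins` changes the shape of the chart without changing the data, which is one reason a histogram can be confusing at first.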
Tags: data analysis, data mining, Descriptive Stats, R-software, statistical software
The Dangers of Bling Data Visualizations
Posted by Armando Brito Mendes | Filed under statistics, visualization
Given the volume of information that’s pouring into the enterprise from so many disparate sources, knowledge workers need to be able to visualize information in order to analyze it and extrapolate insights effectively.
When business users can visualize information, they’re able to process it more effectively and make faster and better decisions, according to Aberdeen research. Business users are constantly seeking the best ways to understand the data behind the data. If a monthly sales figure is low, what are the reasons the sales team is underperforming? The most effective way to help business users understand the data behind the data is by making it visual for them.
Data visualization has recently made its way into the mainstream by way of infographics, business intelligence dashboards and, in some cases, statistical graphics. However, today data visualization comes in many forms, and more often than not too much “bling” is incorporated into these data representations, leaving an audience with nothing more than a pretty picture. In this article, we contrast some good and bad examples of visualizations by examining the salient features of the graphical displays. We will also demonstrate how poorly designed visualizations can lead to erroneous decisions.
Tags: big data, data mining, Descriptive Stats
introducing R to a non-programmer in one hour
Posted by Armando Brito Mendes | Filed under statistics, teaching materials, software
Biostatistics PhD candidate Alyssa Frazee was tasked with teaching her sister, an undergraduate in sociology, how to use R. She had only one hour.
Once you load in a dataset, things start to get fun. We learned a whole bunch of stuff from this data frame, like how to do basic tabulations and calculate summary statistics, how to figure out if you have missing data, and how to fit a simple linear model. This part was pretty fun because my sister started leading the session: instead of me saying “I’m going to show you how to do this,” it was her asking “Hey, could we make a scatterplot?” or “Do you think we could put the best-fit line on that plot?” I was really glad this happened — I hope it meant she was engaged and enjoying herself!
This is the nice thing about R. There are so many built-in functions and packages that you can get something useful with a few lines of code, and you don’t really even have to know what a function is to get started (although you should eventually). Then you can go as far down the rabbit hole as you want.
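The session described above was done in R, and the original code is not reproduced here. As a rough, language-agnostic illustration of the same steps (tabulating a column, counting missing values, fitting a simple linear model), here is a sketch using only the Python standard library. The records and column names are hypothetical.

```python
from collections import Counter

# Hypothetical records, made up for illustration; None marks a missing value.
rows = [
    {"group": "a", "x": 1, "y": 3},
    {"group": "a", "x": 2, "y": 5},
    {"group": "b", "x": 3, "y": 7},
    {"group": "b", "x": 4, "y": 9},
    {"group": "b", "x": 5, "y": 11},
    {"group": "a", "x": 6, "y": None},
]

# Basic tabulation: how many rows fall into each group.
group_counts = Counter(row["group"] for row in rows)

# Missing data check: how many rows have no y value.
missing_y = sum(row["y"] is None for row in rows)

# Keep only complete cases before computing statistics on y.
complete = [row for row in rows if row["y"] is not None]
xs = [row["x"] for row in complete]
ys = [row["y"] for row in complete]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Ordinary least squares for a single predictor (the best-fit line).
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

print(dict(group_counts), "missing y:", missing_y)
print(f"best-fit line: y = {intercept:.2f} + {slope:.2f} * x")
```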
Tags: data analysis, bioinformatics, Descriptive Stats, R-software, statistical software
The Age of Data
Posted by Armando Brito Mendes | Filed under statistics, videos
Whiteboards
The Age of Data
Actian Big Data Analytics Platform
Actian DataCloud Platform
Big Data Analytics
Creating Value from Big Data and Hadoop
A New World for Analytics
The Need for an Analytic Platform
Seamless Integration
Analytic Offload
Creating Business Value with Analytics
Tags: big data, data mining, DW \ BI, Descriptive Stats
Probability and Monte Carlo methods
Posted by Armando Brito Mendes | Filed under statistics, Academic Qualifications, mathematics, teaching materials
This is a lecture post for my students in the CUNY MS Data Analytics program. In this series of lectures I discuss mathematical concepts from different perspectives. The goal is to ask questions and challenge standard ways of thinking about what are generally considered basic concepts. I also emphasize using programming to help gain insight into mathematics. Consequently these lectures will not always be as rigorous as they could be.
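As a small example of using programming to gain insight into a mathematical idea, the sketch below estimates a definite integral by Monte Carlo simulation: draw uniform random points, average the function values, and scale by the interval length. It is not taken from the lectures; the function name `monte_carlo_integral`, the integrand, and the seed are assumptions chosen for illustration.

```python
import random


def monte_carlo_integral(f, a, b, n, seed=0):
    """Estimate the integral of f over [a, b] from n uniform random draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    # The sample mean of f approximates its average value on [a, b];
    # multiplying by the interval length gives the integral.
    return (b - a) * total / n


# The exact value of the integral of x^2 over [0, 1] is 1/3.
estimate = monte_carlo_integral(lambda x: x * x, 0.0, 1.0, 100_000)
print(f"estimate = {estimate:.4f}, exact = {1 / 3:.4f}")
```

The error of such an estimate shrinks roughly in proportion to one over the square root of the number of draws, so a hundred times more samples buys about one more correct digit.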
Tags
monte carlo, numerical integration, probability, simulation
Tags: Descriptive Stats, R-software, statistical software
Results of the 2011 census
Posted by Armando Brito Mendes | Filed under data sets
The largest national source of information on population, families and housing.
Access the Final Results:
Tags: Descriptive Stats, surveys
Paddy – design a multi-stage survey
Posted by Armando Brito Mendes | Filed under statistics, teaching materials, software
This game is a rice survey based on an actual survey carried out in Sri Lanka. In a small district there are 10 villages with a total of 160 farmers who each have one field in which to grow rice. A census of the area has been undertaken and the acreage cultivated by each farmer is known. There is now to be a crop-cutting survey whose main aim is to estimate the mean yield of rice per acre and hence the total production of rice in the district. The survey will also be used to investigate the use of fertilisers and the different varieties of rice used in the district.
The resources available allow for 30 plots to be sampled. The plots to be harvested are 1/80 acre but the yields are recorded in bushels per acre. Students use a multistage sampling scheme. For example:
- Select x villages
- From each village choose y fields
- Select z plots from each field
The game consists of 10 boxes each containing a number of envelopes, which themselves contain a number of slips of paper. The boxes represent a village so students select the boxes corresponding to their chosen villages. They open the boxes and select the envelopes labelled with their chosen field number. Information on the size of the field, the variety of rice used and the amount of fertiliser applied is also displayed on the envelope label. Finally, they select the slip of paper labelled with their chosen plot number and record the yield.
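The three-stage selection described above (villages, then fields, then plots) can be sketched in a few lines. The counts below are assumptions for illustration only: the game does not say that the 160 farmers are split evenly across the 10 villages, nor how many candidate plots a field has, and the choice x = 5, y = 3, z = 2 is just one way to use the 30 available plots.

```python
import random


def multistage_sample(fields_per_village, plots_per_field, x, y, z, seed=0):
    """Pick x villages, then y fields in each, then z plots in each field."""
    rng = random.Random(seed)
    sample = []
    for village in rng.sample(sorted(fields_per_village), x):
        n_fields = fields_per_village[village]
        for field in rng.sample(range(1, n_fields + 1), y):
            for plot in rng.sample(range(1, plots_per_field + 1), z):
                sample.append((village, field, plot))
    return sample


# Assumption: 160 farmers split evenly, 16 fields in each of the 10 villages.
fields_per_village = {village: 16 for village in range(1, 11)}

# Assumption: each field offers 10 candidate plots to choose from.
sample = multistage_sample(fields_per_village, plots_per_field=10, x=5, y=3, z=2)

print(len(sample), "plots selected")
for village, field, plot in sample[:3]:
    print(f"village {village}, field {field}, plot {plot}")
```

Other combinations with x * y * z = 30 spend the same budget differently, for example more villages with fewer fields in each, which is the trade-off the game asks students to explore.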
Tags: Descriptive Stats, surveys, statistical software
To the Woods – a detailed comparison of Sampling methods
Posted by Armando Brito Mendes | Filed under statistics, teaching materials, software
To the Woods – a detailed comparison of Simple Random Sampling and Stratified Sampling
In this game the aim is to conduct a small survey to estimate the total number of trees in a forest and the proportion of large trees. A tree is considered ‘large’ if its diameter at breast height (DBH) is greater than 30 cm. The area of forest from which the sample is to be taken is divided into two regions (‘East’ and ‘West’) by a river. Within each region it is possible to count the number of trees in any 50 m x 50 m plot. There are 168 plots in total – 96 to the West of the river and 72 to the East.
There are two alternative sampling solutions. Students take a sample of 14 plots and can either use simple random sampling or stratified sampling to choose them. They record the number of small trees, the number of large trees and the total number of trees for each of the 14 observations.
The game consists of 168 small pieces of card, which represent the plots, slipped into slits in a large piece of card representing the forest. A river can be drawn on the large piece of card to divide the forest into two regions. One side is labelled ‘West’ and the other ‘East’. The protruding sections of the plots are labelled with their region (West or East) and plot number (1 to 96 and 1 to 72, respectively). The student pulls out the chosen plots and records the numbers of large and small trees, which are printed on the lower section of the plot.
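To make the comparison concrete, here is a sketch of both estimators of the total number of trees. The plot counts (96 West, 72 East) and the sample size of 14 come from the game; the tree counts per plot are randomly generated stand-ins, and proportional allocation is an assumption, since the game does not say how the 14 plots should be split between regions.

```python
import random

rng = random.Random(1)

# Hypothetical tree counts per plot; in the real game they are printed on the cards.
forest = {
    "West": [rng.randint(5, 15) for _ in range(96)],
    "East": [rng.randint(15, 30) for _ in range(72)],
}


def proportional_allocation(forest, n):
    """Split a sample of n plots across regions in proportion to region size."""
    total_plots = sum(len(counts) for counts in forest.values())
    return {region: round(n * len(counts) / total_plots)
            for region, counts in forest.items()}


def srs_total(forest, n, rng):
    """Simple random sampling: ignore regions, scale the sample mean up."""
    all_plots = [count for counts in forest.values() for count in counts]
    sample = rng.sample(all_plots, n)
    return len(all_plots) * sum(sample) / n


def stratified_total(forest, n, rng):
    """Stratified sampling: estimate each region's total and add them up."""
    allocation = proportional_allocation(forest, n)
    estimate = 0.0
    for region, counts in forest.items():
        sample = rng.sample(counts, allocation[region])
        estimate += len(counts) * sum(sample) / allocation[region]
    return estimate


allocation = proportional_allocation(forest, 14)
srs_estimate = srs_total(forest, 14, rng)
stratified_estimate = stratified_total(forest, 14, rng)
true_total = sum(sum(counts) for counts in forest.values())

print("allocation:", allocation)
print("true total:", true_total)
print("SRS estimate:", round(srs_estimate))
print("stratified estimate:", round(stratified_estimate))
```

With 14 plots, proportional allocation gives 8 plots in the West and 6 in the East. When the two regions differ in tree density, as in this made-up data, the stratified estimate usually varies less from sample to sample than the simple random one.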
Tags: Descriptive Stats, statistical software
Tomato – a game for learning experimental design
Posted by Armando Brito Mendes | Filed under statistics, teaching materials
Tomato – a game to help understand the issues involved in experimental design
Tomato simulates an experiment to test the effect of different factors on the yield of tomatoes grown in a greenhouse. Students simulate the conduct of an experiment starting from the discussion of the appropriate design up to the conclusions. There are three factors (variety, heat, light), each at two levels (Coward/Doger, Standard/Supplementary, Standard/Supplementary). Students have to allocate the eight treatments to the 12 plots in the greenhouse. They are asked to take account of the different sides (North/South) of the greenhouse when allocating the treatments, which introduces a blocking factor. A second blocking factor, year, has also been built into the model; the experiment can be run over two years, resulting in two seasons of the crop. The players can decide which treatments to apply in the first year and use the results to determine which treatments to apply in the second year. Alternatively, they may choose to design the scheme for both years at the start. This means that the game incorporates blocking and the possibility of using unbalanced designs. It also introduces the factorial structure of the treatments.
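The factorial structure can be written out directly: three factors at two levels each give 2 x 2 x 2 = 8 treatment combinations. The sketch below lists them and produces one possible random allocation to the 12 plots. It assumes six plots on each side of the greenhouse, which the description does not state, and it simply repeats four randomly chosen treatments to fill the extra plots; it illustrates the bookkeeping and is not a recommended design.

```python
import itertools
import random

# Factor levels as named in the game description.
factors = {
    "variety": ["Coward", "Doger"],
    "heat": ["Standard", "Supplementary"],
    "light": ["Standard", "Supplementary"],
}

# Every combination of one level per factor: 2 x 2 x 2 = 8 treatments.
treatments = list(itertools.product(*factors.values()))


def allocate(treatments, plots_per_side=6, seed=0):
    """Randomly assign treatments to plots on the two sides of the greenhouse."""
    rng = random.Random(seed)
    n_plots = 2 * plots_per_side
    # Each treatment appears once; the remaining plots repeat randomly
    # chosen treatments, which makes the design unbalanced.
    extra = rng.sample(treatments, n_plots - len(treatments))
    layout = treatments + extra
    rng.shuffle(layout)
    return {"North": layout[:plots_per_side], "South": layout[plots_per_side:]}


plan = allocate(treatments)
for side, plots in plan.items():
    print(side)
    for variety, heat, light in plots:
        print(f"  variety={variety}, heat={heat}, light={light}")
```

A purely random layout like this can leave a treatment on only one side of the greenhouse, so side and treatment effects get mixed up. That is the problem the blocking discussion in the game is meant to bring out.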
Tags: Descriptive Stats, statistical software
visualizing.org
Posted by Armando Brito Mendes | Filed under materials for professionals, visualization
A community of creative people
making sense of complex issues
through data and design — join us
Visualizations: Explore the best in data visualization and infographics created by our community
Visualizations: Upload, host, and showcase your work under CC license
Open Data: Find and discuss new data sets from NGOs, governments and other sources, curated by Visualizing
Data Channels: Engage with the scientists behind the data sets on Visualizing and explore related visualizations uploaded by our community
Visualizing Player: Take advantage of the first-ever player for data visualization and infographics. Embed away!
Challenges: Sharpen your skills and win unique prizes by entering our data visualization challenges
Visualizing Marathons: A one-of-a-kind global series of 24-hour student data viz competitions
Partners: Visualizing collaborates with a wide range of Academic, Knowledge, and Media Partners
Tags: beautiful, Descriptive Stats