Full Of Themselves
Posted by Armando Brito Mendes | Filed under Data Science, visualization
A very well-explained data-processing report
An analysis of title drops in movies
by Dominikus Baur + Alice Thudt
A title drop is when a character in a movie says the title of the movie they’re in. Here’s a large-scale analysis of 73,921 movies from the last 80 years on how often, when and maybe even why that happens.
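Detecting a title drop boils down to searching a movie's dialogue for its own title. The sketch below is not the authors' pipeline. It assumes timestamped subtitle lines given as `(seconds, text)` pairs and a known runtime. The function name `find_title_drops` is hypothetical. For each line that contains the title, it returns where in the movie the drop happens, as a fraction of the runtime.

```python
import re
from typing import List, Tuple


def find_title_drops(title: str,
                     subtitles: List[Tuple[float, str]],
                     runtime_s: float) -> List[float]:
    """Return the relative position (0.0 = start, 1.0 = end) of every
    subtitle line that contains the movie title.

    Assumption: subtitles are (start time in seconds, text) pairs.
    """
    # Case-insensitive whole-phrase match; the \b word boundaries stop
    # "Alien" from matching inside "Aliens".
    pattern = re.compile(r"\b" + re.escape(title) + r"\b", re.IGNORECASE)
    return [t / runtime_s for t, text in subtitles if pattern.search(text)]


subs = [(60.0, "Welcome to the jungle."),
        (3000.0, "This is Jurassic Park!"),
        (5400.0, "jurassic park is closed.")]
print(find_title_drops("Jurassic Park", subs, 7200.0))
```

A real large-scale analysis would also need to handle titles that start or end with punctuation, sequel numbering and subtitle-timing noise. These cases are left out here.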
Tags: data analysis, movies, IMDb, visualization
Who We Spend Time with as We Get Older
Posted by Armando Brito Mendes | Filed under Data Science, statistics, visualization
An animated horizontal bar chart showing changes over time
By Nathan Yau
In high school, we spend most of our days with friends and immediate family. Then we get older and get jobs, get married, and grow our own families to spend more time with co-workers, spouses, and kids. Here’s how things change, based on a decade of data from the American Time Use Survey, from age 15 to 80.
Tags: data analysis, animation, dynamic, bar chart
The World Chess Championship In 5 Charts
Posted by Armando Brito Mendes | Filed under Data Science, statistics, reports, visualization
A description of a chess championship using difference charts, histograms, heat maps and radar charts.
How Magnus Carlsen cemented his GOAT status over 11 games.
By Simran Parwani and Oliver Roeder
Published Dec. 14, 2021
This article is part of our 2021 World Chess Championship series.
The 2021 World Chess Championship ended last week with Magnus Carlsen of Norway, the world No. 1, defending his title against challenger Ian Nepomniachtchi of Russia. It was Carlsen’s fifth victory in the world championship, a title he has held since 2013, and the match went a long way toward cementing his status as the greatest chess player of all time.
The contest featured some of the best chess ever played by humans, nearly flawless even when examined by modern, superhuman machines. It also featured a few inexplicable blunders, and just three bad moves saw Nepomniachtchi’s chances slip quickly and irretrievably away. The match also generated a lot of data! We’ve charted some of it below.
Tags: data analysis, difference charts, histograms
The Most Frequently Used Emoji of 2021
Posted by Armando Brito Mendes | Filed under statistics, reports, visualization
A report on emoji usage with dot plots and high-low charts.
By Jennifer Daniel, Unicode Emoji Subcommittee Chair
The size of each emoji illustrates its relative popularity. Can you guess this year’s number one ranked emoji? 😉
92% of the world’s online population use emoji — but which emoji are we using? Well, it appears that reports of Tears of Joy’s death are greatly exaggerated 😂. According to data collected by the Unicode Consortium, the not-for-profit organization responsible for digitizing the world’s languages, Tears of Joy accounts for over 5% of all emoji use (the only other character that comes close is ❤️ and there is a steeeeeep cliff after that). The top ten emoji used worldwide are 😂 ❤️ 🤣 👍 😭 🙏 😘 🥰 😍 😊.
This collection of mostly positive vibes may seem familiar: it is not terribly different from the last time this data was published in 2019. As infinitely creative and diverse as the world is, the top 100 emoji comprise ~82% of total emoji shares. And yet… there are 3,663 emoji. So why does the Unicode Emoji Subcommittee keep reviewing proposals and adding new ones? 😵💫
This existential question haunts the subcommittee 👻. So, they set out to understand popularity on a more granular level: What are the most frequently used emoji? What do they have in common? Do we have too much of one type but not enough of another? How do we interpret the 83-spot leap (from 97 to 14!) in the use of Pleading Face 🥺? Check out the changes using the interactive tools of the #UnicodeEmojiMirror Project and share your observations!
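The two figures quoted above, the share of all uses covered by the top emoji and the size of a ranking jump, are simple to compute from usage counts. The sketch below uses made-up counts and hypothetical emoji names, not Unicode Consortium data. It is only meant to show the arithmetic.

```python
from collections import Counter


def top_k_share(counts: Counter, k: int) -> float:
    """Fraction of all uses accounted for by the k most-used emoji."""
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(k)) / total


def rank_change(old_rank: int, new_rank: int) -> int:
    """Number of places gained; positive means the emoji moved up
    (rank 1 is the most used, so a smaller number is better)."""
    return old_rank - new_rank


# Hypothetical usage counts, for illustration only.
counts = Counter({"tears_of_joy": 50, "red_heart": 30,
                  "rofl": 10, "thumbs_up": 10})
print(top_k_share(counts, 2))   # share of the top two emoji
print(rank_change(97, 14))      # Pleading Face's leap, as quoted above
```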
Tags: data analysis, emojis, descriptive statistics
What People Spend Most of Their Money On, By Income Group, Relatively Speaking
Posted by Armando Brito Mendes | Filed under Data Science, reports, visualization
A report with many line charts
By Nathan Yau
The more money people come across, the more things they can and tend to buy. More money on average means bigger houses, more expensive cars, and fancier restaurants. But what if you look at relative spending instead of total dollars?
For example, if a lower income group uses 9 percent of their total spending to pay a mortgage, does the higher income group also pay 9 percent? Or does additional income go to other spending categories?
It varies.
The charts below show how different income groups spend their money, based on data from the Bureau of Labor Statistics for 2020. Each chart represents a spending category. Each column represents an income group.
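"Relative spending" means dividing each category's dollars by that group's total spending. The sketch below shows the calculation. The groups, categories and dollar amounts are invented, not BLS figures. They are chosen so that the lower income group spends 9 percent on its mortgage, matching the example above.

```python
from typing import Dict


def spending_shares(spending: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Convert dollars per category into each category's share (%) of
    that income group's total spending."""
    shares = {}
    for group, categories in spending.items():
        total = sum(categories.values())
        shares[group] = {cat: 100 * amount / total
                         for cat, amount in categories.items()}
    return shares


# Hypothetical annual spending, for illustration only.
spending = {
    "lower":  {"mortgage": 900,  "food": 2100, "other": 7000},
    "higher": {"mortgage": 6000, "food": 5000, "other": 39000},
}
shares = spending_shares(spending)
print(shares["lower"]["mortgage"], shares["higher"]["mortgage"])
```

The higher group spends more than six times as many dollars on its mortgage in this toy example, but only a few percentage points more as a share of its budget. Comparing shares rather than totals is what makes the groups comparable.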
Tags: data analysis, descriptive statistics, income
Explained Visually
Posted by Armando Brito Mendes | Filed under Data Science, statistics, mathematics, teaching materials, visualization
Good interactive visual explanations of ML and mathematics concepts
Ordinary Least Squares Regression
EV 9 – 2015/02/12
Principal Component Analysis
Axis of easy.
EV 8 – 2015/01/29
Image Kernels
EV 6 – 2015/01/20
Eigenvectors and Eigenvalues
EV 5 – 2014/11/28
Pi (π)
EV 4 – 2014/11/21
Sine and Cosine
EV 3 – 2014/11/14
Exponentiation
EV 2 – 2014/11/07
Markov Chains
Mark on, Markov
EV 1 – 2014/10/30
Conditional probability
You probably wouldn’t understand.
Tags: data analysis, teaching
How the Longest Running Shows Rated Over Episodes
Posted by Armando Brito Mendes | Filed under Data Science, visualization
A good bar chart packed with information
By Nathan Yau
Most television shows don’t get past the first season, but some manage to stick around. These are the 175 longest-running shows on IMDb that have ratings.
Episodes are colored by average rating. Some shows are consistently good, some shows people seem to love to hate, and then there are shows that are good at some point but eventually drop off.
Tags: data analysis, beautiful, descriptive statistics, bar chart
Age and Occupation
Posted by Armando Brito Mendes | Filed under Data Science, statistics, visualization
A good interactive chart of age intervals, one for each job
By Nathan Yau
Whether it’s because of experience, physical ability, or education level, some jobs tend towards a certain age of worker more than others. For example, fast food counter workers tend to be younger, whereas school bus drivers tend to be older.
These are the age ranges for 529 jobs. Search for your job or look at others.
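An "age range" per job can be summarized by a few percentiles of the workers' ages. The source does not say which percentiles the chart uses. The sketch below assumes the interquartile range (25th to 75th percentile) plus the median, computed with Python's `statistics.quantiles`. The job names and ages are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def age_range(ages: List[float]) -> Tuple[float, float, float]:
    """Return (25th percentile, median, 75th percentile) of the ages.

    Assumption: the interquartile range stands in for the 'age range'.
    """
    q1, median, q3 = statistics.quantiles(ages, n=4)
    return q1, median, q3


def age_ranges_by_job(ages_by_job: Dict[str, List[float]]) -> Dict[str, Tuple[float, float, float]]:
    """Summarize every job, ordered from youngest to oldest median age."""
    ranges = {job: age_range(ages) for job, ages in ages_by_job.items()}
    return dict(sorted(ranges.items(), key=lambda item: item[1][1]))


# Hypothetical ages, for illustration only.
ages_by_job = {
    "school bus driver": [38, 45, 52, 58, 61, 63, 66, 70],
    "fast food counter worker": [16, 17, 18, 19, 21, 24, 30, 45],
}
for job, (q1, med, q3) in age_ranges_by_job(ages_by_job).items():
    print(f"{job}: {q1:.1f}-{q3:.1f} (median {med:.1f})")
```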
Tags: data analysis, beautiful, jobs, descriptive statistics, ages
How Men and Women Spend Their Days
Posted by Armando Brito Mendes | Filed under Data Science, statistics, reports, visualization
A good example of stacked line charts, or difference charts
By Nathan Yau
For the employed, unemployed, and those not in the labor force, the charts below show the percentage of people doing an activity over a day in 2020. Switch between a weekday or a weekend day. Select activities to see individually.
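Each point on these charts is a percentage: of all respondents, how many were doing a given activity at a given time of day. The sketch below assumes a simplified time diary, where each respondent's day is a list of `(start minute, end minute, activity)` episodes. It also counts every respondent equally. Survey data such as the American Time Use Survey normally carries respondent weights, which are omitted here. The data and the function name `share_doing` are hypothetical.

```python
from typing import List, Tuple

# (start minute, end minute exclusive, activity); minutes run 0-1439.
Episode = Tuple[int, int, str]


def share_doing(diaries: List[List[Episode]], activity: str, minute: int) -> float:
    """Percentage of respondents doing `activity` at `minute` of the day.

    Simplification: unweighted; real survey estimates apply weights.
    """
    doing = sum(
        1 for diary in diaries
        if any(start <= minute < end and act == activity
               for start, end, act in diary)
    )
    return 100 * doing / len(diaries)


# Four hypothetical diaries for one weekday.
diaries = [
    [(0, 420, "sleeping"), (420, 1440, "working")],
    [(0, 480, "sleeping"), (480, 1440, "leisure")],
    [(0, 1440, "sleeping")],
    [(0, 360, "sleeping"), (360, 1440, "working")],
]
print(share_doing(diaries, "sleeping", 450))  # at 7:30 a.m.
```

Evaluating `share_doing` for every minute of the day, for each activity, gives one line per activity. Computing the lines separately for men and women yields the two series that the charts compare.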
Tags: data analysis, beautiful, descriptive statistics, men and women, occupation
Biased vs Unbiased: Debunking Statistical Myths
Posted by Armando Brito Mendes | Filed under statistics
A reflection on the biases we rely on in data science.
Anyone who attended statistical training at the college level has been taught the four rules that you should always abide by when developing statistical models and predictions:
- You should only use unbiased estimates
- You should use estimates that have minimum variance
- In any optimization problem (for instance to compute an estimate from a maximum likelihood function, or to detect the best, most predictive subset of variables), you should always shoot for a global optimum, not a local one.
- And if you violate any of the above three rules, at least you need to make sure that your estimate, when the number of observations is large, satisfies them.
As a data scientist and ex-statistician, I violate these rules (especially #1 – #3) almost daily. Indeed, that’s part of what makes data science different from statistical science.
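A classic illustration of why rule #1 is not sacred involves the sample variance. The sketch below is our own example and is not taken from the article. Dividing the sum of squared deviations by n gives a biased estimate, and dividing by n - 1 gives an unbiased one. Yet for small normal samples, the biased version has the lower mean squared error. The simulation assumes samples of size 5 drawn from a standard normal distribution, whose true variance is 1.

```python
import random
from typing import Dict


def simulate(n: int = 5, trials: int = 20000, seed: int = 0) -> Dict[str, float]:
    """Compare the biased (divide by n) and unbiased (divide by n - 1)
    variance estimators on standard-normal samples (true variance 1)."""
    rng = random.Random(seed)
    biased, unbiased = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
        biased.append(ss / n)
        unbiased.append(ss / (n - 1))

    def mse(estimates):
        return sum((e - 1.0) ** 2 for e in estimates) / len(estimates)

    return {
        "biased_mean": sum(biased) / trials,      # theory: (n - 1) / n = 0.8
        "unbiased_mean": sum(unbiased) / trials,  # theory: 1.0
        "biased_mse": mse(biased),                # theory: (2n - 1) / n^2 = 0.36
        "unbiased_mse": mse(unbiased),            # theory: 2 / (n - 1) = 0.5
    }


print(simulate())
```

The unbiased estimator is right on average but scatters more widely. The biased one is systematically low but lands closer to the truth in a typical sample. This kind of trade-off between bias and variance is what the author is getting at.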
Tags: data analysis, data mining