Full Of Themselves

A very well explained data-processing report

An analysis of title drops in movies

by Dominikus Baur + Alice Thudt

A title drop is when a character in a movie says the title of the movie they’re in. Here’s a large-scale analysis of 73,921 movies from the last 80 years on how often, when and maybe even why that happens.
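As a rough illustration of the definition above, a title drop can be detected by checking whether the normalized title appears as a whole phrase in a line of dialogue. This is only a minimal sketch, not the authors' method: the function names `normalize` and `find_title_drops` and the sample dialogue are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so matching is forgiving."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def find_title_drops(title: str, dialogue: list[str]) -> list[int]:
    """Return indices of dialogue lines that contain the title as a whole phrase."""
    needle = normalize(title)
    if not needle:
        return []
    # Pad with spaces so "up" does not match inside "cup".
    padded_needle = f" {needle} "
    return [
        i for i, line in enumerate(dialogue)
        if padded_needle in f" {normalize(line)} "
    ]


# Hypothetical dialogue, made up for illustration only.
lines = [
    "We should head back before dark.",
    "They call this place the Last Harbor.",
    "Harbor? I see no ships.",
]
drops = find_title_drops("The Last Harbor", lines)
print(drops)
```

Matching on whole words matters most for short titles, which would otherwise match inside longer words and inflate the count.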

Who We Spend Time with as We Get Older

An animated horizontal bar chart showing changes over time

By Nathan Yau

In high school, we spend most of our days with friends and immediate family. Then we get older, get jobs, get married, and grow our own families, spending more time with co-workers, spouses, and kids. Here’s how things change, based on a decade of data from the American Time Use Survey, from age 15 to 80.

The World Chess Championship In 5 Charts

A description of a chess championship with difference charts, histograms, heat maps, and radar charts.

How Magnus Carlsen cemented his GOAT status over 11 games.

By Simran Parwani and Oliver Roeder

Published Dec. 14, 2021

This article is part of our 2021 World Chess Championship series.

The 2021 World Chess Championship ended last week with Magnus Carlsen of Norway, the world No. 1, defending his title against challenger Ian Nepomniachtchi of Russia. It was Carlsen’s fifth victory in the world championship, a title he has held since 2013, and the match went a long way toward cementing his status as the greatest chess player of all time.

The contest featured some of the best chess ever played by humans, nearly flawless even when examined by modern, superhuman machines. It also featured a few inexplicable blunders, and just three bad moves saw Nepomniachtchi’s chances slip quickly and irretrievably away. The match also generated a lot of data! We’ve charted some of it below.

The Most Frequently Used Emoji of 2021

A report on emoji usage with dot plots and high-low charts.

By Jennifer Daniel, Unicode Emoji Subcommittee Chair

The size of each emoji illustrates its relative popularity. Can you guess this year’s number one ranked emoji? 😉

92% of the world’s online population use emoji — but which emoji are we using? Well, it appears that reports of Tears of Joy’s death are greatly exaggerated 😂. According to data collected by the Unicode Consortium, the not-for-profit organization responsible for digitizing the world’s languages, Tears of Joy accounts for over 5% of all emoji use (the only other character that comes close is ❤️ and there is a steeeeeep cliff after that). The top ten emoji used worldwide are 😂 ❤️ 🤣 👍 😭 🙏 😘 🥰 😍 😊.

This collection of mostly positive vibes may seem familiar — it is not terribly different from the last time this data was published in 2019. As infinitely creative and diverse as the world is, the top 100 emoji comprise ~82% of total emoji shares. And yet …. There are 3,663 emoji. So, why does the Unicode Emoji Subcommittee keep reviewing proposals and adding new ones? 😵‍💫

This existential question haunts the subcommittee 👻. So, they set out to understand popularity on a more granular level: What are the most frequently used emoji? What do they have in common? Do we have too much of one type but not enough of another? How do we interpret the 83-spot leap (from 97 to 14!) in the use of Pleading Face 🥺? Check out the changes using the interactive tools of the #UnicodeEmojiMirror Project and share your observations!

What People Spend Most of Their Money On, By Income Group, Relatively Speaking

A report with many line charts

By Nathan Yau

The more money people come across, the more things they can and tend to buy. More money on average means bigger houses, more expensive cars, and fancier restaurants. But what if you look at relative spending instead of total dollars?

For example, if a lower income group uses 9 percent of their total spending to pay a mortgage, does the higher income group also pay 9 percent? Or does additional income go to other spending categories?

It varies.

The charts below show how different income groups spend their money, based on data from the Bureau of Labor Statistics for 2020. Each chart represents a spending category. Each column represents an income group.
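The idea of relative spending can be sketched in a few lines: divide each category's dollars by the group's total, so that groups of very different sizes become comparable. This is a minimal sketch under stated assumptions. The function name `relative_spending` is hypothetical, and the dollar amounts are invented for illustration, not taken from the Bureau of Labor Statistics data.

```python
def relative_spending(spending: dict[str, float]) -> dict[str, float]:
    """Convert dollar amounts per category into percentages of total spending."""
    total = sum(spending.values())
    if total <= 0:
        raise ValueError("total spending must be positive")
    return {category: 100 * amount / total for category, amount in spending.items()}


# Made-up annual dollar amounts for two hypothetical income groups;
# these are NOT Bureau of Labor Statistics figures.
groups = {
    "lower": {"housing": 9_000, "food": 4_500, "other": 6_500},
    "higher": {"housing": 27_000, "food": 9_000, "other": 54_000},
}

shares = {name: relative_spending(amounts) for name, amounts in groups.items()}
for name, pct in shares.items():
    print(name, {category: round(value, 1) for category, value in pct.items()})
```

In this invented example the higher group spends three times as many dollars on housing, yet housing takes up a smaller share of its total, which is the kind of difference the relative view is meant to expose.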

Explained Visually

Good interactive visual explanations of ML and math concepts

Ordinary Least Squares Regression

Where do betas come from?

EV 9 – 2015/02/12

Principal Component Analysis

Axis of easy.

EV 8 – 2015/01/29

Image Kernels

The kernel’s secret recipe.

EV 6 – 2015/01/20

Eigenvectors and Eigenvalues

No, no. Do it eigen!

EV 5 – 2014/11/28

Pi (π)

Pi me to the moon.

EV 4 – 2014/11/21

Sine and Cosine

Sine on the line.

EV 3 – 2014/11/14


Growing, growing, gone.

EV 2 – 2014/11/07

Markov Chains

Mark on, Markov

EV 1 – 2014/10/30

Conditional probability

You probably wouldn’t understand.

How the Longest Running Shows Rated Over Episodes

A good bar chart packed with information

By Nathan Yau

Most television shows don’t get past the first season, but there are some that manage to stick around. These are the 175 longest running shows on IMDb that have ratings.

Episodes are colored by average rating. Some shows are consistently good, some shows people seem to love to hate, and then there are shows that are good at some point but eventually drop off.

Age and Occupation

A good interactive chart of confidence intervals for ages, one for each job

By Nathan Yau

Whether it’s because of experience, physical ability, or education level, some jobs tend towards a certain age of worker more than others. For example, fast food counter workers tend to be younger, whereas school bus drivers tend to be older.

These are the age ranges for 529 jobs. Search for your job or look at others.

How Men and Women Spend Their Days

A good example of stacked line charts, or difference charts

By Nathan Yau

For the employed, unemployed, and those not in the labor force, the charts below show the percentage of people doing an activity over a day in 2020. Switch between a weekday or a weekend day. Select activities to see individually.

Biased vs Unbiased: Debunking Statistical Myths

A reflection on the biases we use in data science.

Anyone who has taken college-level statistics has been taught four rules to always abide by when developing statistical models and predictions:

  1. You should only use unbiased estimates.
  2. You should use estimates that have minimum variance.
  3. In any optimization problem (for instance to compute an estimate from a maximum likelihood function, or to detect the best, most predictive subset of variables), you should always shoot for a global optimum, not a local one.
  4. If you violate any of the above three rules, you should at least make sure that your estimate satisfies them when the number of observations is large.

As a data scientist and ex-statistician, I violate these rules (especially #1 – #3) almost daily. Indeed, that’s part of what makes data science different from statistical science.
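One way to see why rule #1 can be worth breaking is that an unbiased estimate is not necessarily the one with the smallest error. The simulation below is a minimal sketch, assuming normally distributed data, and is not taken from the original article. It compares the usual unbiased variance estimate (divisor n-1) with a deliberately biased one (divisor n+1) by mean squared error. The function names, sample size, trial count, and seed are illustrative choices.

```python
import random


def variance_estimate(sample: list[float], divisor: float) -> float:
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / divisor


def simulate(n: int = 5, trials: int = 20_000, seed: int = 0) -> dict[str, dict[str, float]]:
    """Compare variance estimators on standard normal samples (true variance 1)."""
    rng = random.Random(seed)
    true_variance = 1.0
    divisors = {"unbiased (n-1)": n - 1, "biased (n+1)": n + 1}
    estimates = {name: [] for name in divisors}
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for name, divisor in divisors.items():
            estimates[name].append(variance_estimate(sample, divisor))
    results = {}
    for name, values in estimates.items():
        mean_estimate = sum(values) / trials
        mse = sum((v - true_variance) ** 2 for v in values) / trials
        results[name] = {"mean": mean_estimate, "mse": mse}
    return results


results = simulate()
for name, stats in results.items():
    print(f"{name}: mean={stats['mean']:.3f}, mse={stats['mse']:.3f}")
```

For normal data the n+1 divisor systematically underestimates the variance, yet its mean squared error is lower, because the reduction in variance outweighs the added bias. Whether that trade is worthwhile depends on the application.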

