Remembering the lives lost to COVID-19 in America
Posted by Armando Brito Mendes | Filed under Data Science, infographics \ dashboards, visualization
In an attempt to convey the scale of the numbers, the authors present a graphic story based on the size of diamond shapes.
As COVID-19 began to spread in the U.S. in March 2020, Trump administration officials estimated 100,000 to 200,000 Americans might die. A worst-case scenario, they said, meant between 1.6 million and 2.2 million might perish. The figures felt staggeringly high.
Two years later, the U.S. has reached 1 million deaths even as COVID has faded from the headlines.
At this grim milestone, we sought to refocus on the scale of loss suffered.
Tags: beautiful, time series
Who We Spend Time with as We Get Older
Posted by Armando Brito Mendes | Filed under Data Science, statistics, visualization
An animated horizontal bar chart showing how the values change over time.
By Nathan Yau
In high school, we spend most of our days with friends and immediate family. Then we get older and get jobs, get married, and grow our own families to spend more time with co-workers, spouses, and kids. Here’s how things change, based on a decade of data from the American Time Use Survey, from age 15 to 80.
Tags: data analysis, animation, dynamic, bar chart
Data Quality for AI
Posted by Armando Brito Mendes | Filed under Data Science, teaching materials, materials for professionals
An IBM page with several resources on data preprocessing and data quality assessment.
This Data Quality for AI (or DQAI, for short) framework of services provides all the tools to enable model developers and data scientists to implement a formalized and systematic program of data preparation, the preliminary and most time-consuming step of the model development lifecycle. This framework is appropriate for data being readied for supervised classification or regression tasks. It includes the necessary software to:
— implement quality checks,
— execute remediation,
— generate audit reports,
— automate all the above.
While pipelining of tasks is essential for scalability and repeatability, the included capabilities can also be used for custom data exploration and human-guided improvement of models. The included services can be useful at any stage of the model development lifecycle, but the offering is designed to be especially valuable early in data processing, during the data preparation stage.
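The four capabilities listed above (quality checks, remediation, audit reports, and automating all of them) can be illustrated in a few lines of plain Python. This is a hypothetical sketch only: it does not use IBM's DQAI services, and the function names (`check_quality`, `remediate`, `audit_report`, `pipeline`), the toy records, the specific checks (missing values, negative numbers, missing labels), and the remedies (median imputation, dropping unlabeled rows) are all assumptions chosen for brevity.

```python
from statistics import median

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "label": 1},
    {"age": None, "income": 61000, "label": 0},
    {"age": 29, "income": -5, "label": 1},
    {"age": 41, "income": 58000, "label": None},
]

def check_quality(rows, numeric_cols, label_col):
    """Quality checks: return (row_index, column, issue) findings."""
    issues = []
    for i, row in enumerate(rows):
        for col in numeric_cols:
            value = row.get(col)
            if value is None:
                issues.append((i, col, "missing"))
            elif value < 0:
                issues.append((i, col, "negative"))
        if row.get(label_col) is None:
            issues.append((i, label_col, "missing label"))
    return issues

def remediate(rows, issues, numeric_cols, label_col):
    """Remediation: impute flagged numeric cells with the median of the
    unflagged values in that column, and drop rows with no label."""
    bad = {(i, col) for i, col, _ in issues}
    medians = {
        col: median(r[col] for j, r in enumerate(rows) if (j, col) not in bad)
        for col in numeric_cols
    }
    fixed = []
    for i, row in enumerate(rows):
        if (i, label_col) in bad:
            continue  # an unlabeled row is useless for supervised training
        new_row = dict(row)
        for col in numeric_cols:
            if (i, col) in bad:
                new_row[col] = medians[col]
        fixed.append(new_row)
    return fixed

def audit_report(rows, issues, fixed):
    """Audit report: summarize what was found and what was kept."""
    counts = {}
    for _, _, issue in issues:
        counts[issue] = counts.get(issue, 0) + 1
    return {"rows_in": len(rows), "rows_out": len(fixed), "issues": counts}

def pipeline(rows, numeric_cols, label_col):
    """Automation: chain checks, remediation, and reporting."""
    issues = check_quality(rows, numeric_cols, label_col)
    fixed = remediate(rows, issues, numeric_cols, label_col)
    return fixed, audit_report(rows, issues, fixed)

fixed, report = pipeline(rows, ["age", "income"], "label")
print(report)
```

Keeping each stage a separate function is what makes the last step possible: the same checks can be run interactively during exploration or chained unattended in a pipeline.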
In addition to all that can be accomplished on original data sources, there are methods that, starting from an input dataset, can help synthesize new data — either for supplementation or for replacement — by learning constraints in the original data or having them specified by a developer. This can be helpful when regulatory or contractual issues prohibit direct usage of data in a modeling effort, when it is desirable to explore datasets with different constraints, or when more data is needed for training.
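As a much-simplified illustration of constraint-based synthesis, the sketch below learns one constraint per column from an input dataset (a numeric range, or the set of observed categories), lets a developer override a constraint by hand, and samples new rows that satisfy the constraints. The function names and toy data are hypothetical, and this is not a description of how DQAI works internally. Because each column is sampled independently, it also discards correlations between columns, which a real synthesizer would need to preserve.

```python
import random

def learn_constraints(rows):
    """Infer one constraint per column: a numeric range or a set of categories."""
    constraints = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            constraints[col] = ("range", min(values), max(values))
        else:
            constraints[col] = ("choice", sorted(set(values)))
    return constraints

def synthesize(constraints, n, seed=0):
    """Sample n rows, drawing each column independently within its constraint."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for col, spec in constraints.items():
            if spec[0] == "range":
                row[col] = rng.uniform(spec[1], spec[2])
            else:
                row[col] = rng.choice(spec[1])
        rows.append(row)
    return rows

# Hypothetical input data.
original = [
    {"age": 34, "region": "north"},
    {"age": 51, "region": "south"},
    {"age": 27, "region": "north"},
]
constraints = learn_constraints(original)
# A developer can also specify a constraint directly, e.g. widen the age range.
constraints["age"] = ("range", 18, 65)
synthetic = synthesize(constraints, n=5)
print(synthetic)
```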
This offering is appropriate for use on both tabular and time series data, and support for new modalities is being developed.
Tags: data, data preparation, data quality
Age of Moms When Kids are Born
Posted by Armando Brito Mendes | Filed under Data Science, visualization
A good example of pin charts.
By Nathan Yau
People have kids at a wide range of ages, but the timing tends to reflect where we are in life. There are social norms and biological norms. Based on data from the National Center for Health Statistics, we can see how these ranges shift by child number.
Tags: beautiful, descriptive statistics
The World Chess Championship In 5 Charts
Posted by Armando Brito Mendes | Filed under Data Science, statistics, reports, visualization
A description of a chess championship using difference charts, histograms, heat maps, and radar charts.
How Magnus Carlsen cemented his GOAT status over 11 games.
By Simran Parwani and Oliver Roeder
Published Dec. 14, 2021
This article is part of our 2021 World Chess Championship series.
The 2021 World Chess Championship ended last week with Magnus Carlsen of Norway, the world No. 1, defending his title against challenger Ian Nepomniachtchi of Russia. It was Carlsen’s fifth victory in the world championship, a title he has held since 2013, and the match went a long way toward cementing his status as the greatest chess player of all time.
The contest featured some of the best chess ever played by humans, nearly flawless even when examined by modern, superhuman machines. It also featured a few inexplicable blunders, and just three bad moves saw Nepomniachtchi’s chances slip quickly and irretrievably away. The match also generated a lot of data! We’ve charted some of it below.
Tags: data analysis, difference charts, histograms
A catalog of all the Covid visualizations
Posted by Armando Brito Mendes | Filed under Data Science, visualization
Many visualizations, generally very good, and some highly original.
The COVID-19 Online Visualization Collection is a project to catalog Covid-related graphics across countries, sources, and styles. They call it COVIC for short, which seems like a stretch for an acronym and a confusing way to introduce a project to people. But it does categorize over 10,000 figures, which could be useful as a reference and for historical context.
What People Spend Most of Their Money On, By Income Group, Relatively Speaking
Posted by Armando Brito Mendes | Filed under Data Science, reports, visualization
A report with many line charts.
By Nathan Yau
The more money people come across, the more things they can and tend to buy. More money on average means bigger houses, more expensive cars, and fancier restaurants. But what if you look at relative spending instead of total dollars?
For example, if a lower income group uses 9 percent of their total spending to pay a mortgage, does the higher income group also pay 9 percent? Or does additional income go to other spending categories?
It varies.
The charts below show how different income groups spend their money, based on data from the Bureau of Labor Statistics for 2020. Each chart represents a spending category. Each column represents an income group.
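The mortgage question above comes down to one division: a category's share is its dollar amount divided by the group's total spending. The sketch below uses made-up numbers (not Bureau of Labor Statistics figures) and a hypothetical helper `spending_shares` to show how a group can spend far more dollars on a category while its share barely moves.

```python
def spending_shares(spending):
    """Convert dollar amounts per category into fractions of total spending."""
    total = sum(spending.values())
    return {category: amount / total for category, amount in spending.items()}

# Made-up annual spending in dollars; these are NOT Bureau of Labor Statistics figures.
lower_income = {"mortgage": 900, "food": 2500, "other": 6600}
higher_income = {"mortgage": 6000, "food": 8000, "other": 46000}

for name, spending in [("lower", lower_income), ("higher", higher_income)]:
    shares = spending_shares(spending)
    print(name, {cat: round(share, 3) for cat, share in shares.items()})
```

In this toy example the higher group spends more than six times as many dollars on its mortgage (6,000 vs. 900), yet the share rises only from 9 to 10 percent, while food falls from 25 percent to about 13 percent. That is the kind of contrast relative spending reveals and total dollars hide.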
Tags: data analysis, descriptive statistics, income
How to Use t-SNE Effectively
Posted by Armando Brito Mendes | Filed under Data Science, visualization
A fairly thorough explanation of the difficulties in interpreting plots produced by the t-SNE algorithm.
Although extremely useful for visualizing high-dimensional data, t-SNE plots can sometimes be mysterious or misleading. By exploring how it behaves in simple cases, we can learn to use it more effectively.
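Much of the behavior the article explores depends on the perplexity parameter, which (loosely) is a guess at how many close neighbors each point has. One piece that can be shown in isolation is how t-SNE turns a user-chosen perplexity into a per-point Gaussian width σ_i: the original t-SNE paper describes a binary search for the σ_i whose neighbor distribution P_i has perplexity 2^H(P_i) equal to the target. Below is a minimal pure-Python sketch of that calibration step only (no gradient descent, no embedding). The function names are illustrative, not any library's API, and searching over σ on a log scale is an implementation choice assumed here.

```python
import math

def conditional_probs(dists_sq, sigma):
    """Gaussian conditional probabilities p_{j|i} for one point i.

    dists_sq holds squared distances from point i to every *other* point.
    Subtracting the smallest distance before exponentiating leaves the
    normalized result unchanged but avoids dividing by zero for tiny sigma.
    """
    d_min = min(dists_sq)
    weights = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in dists_sq]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    """Perp(P_i) = 2 ** H(P_i), where H is the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy

def find_sigma(dists_sq, target, tol=1e-6, max_iter=200):
    """Bisect on a log scale for the sigma whose perplexity matches target.

    Perplexity grows monotonically with sigma, from 1 (all mass on the
    nearest neighbor) toward len(dists_sq) (uniform over all neighbors),
    so target must lie strictly inside that range.
    """
    lo, hi = 1e-10, 1e10
    sigma = 1.0
    for _ in range(max_iter):
        sigma = math.sqrt(lo * hi)
        perp = perplexity(conditional_probs(dists_sq, sigma))
        if abs(perp - target) < tol:
            break
        if perp > target:
            hi = sigma  # distribution too flat: shrink sigma
        else:
            lo = sigma  # distribution too peaked: widen sigma
    return sigma

dists = [1.0, 4.0, 9.0, 16.0, 25.0]
for target in (1.5, 3.0, 4.5):
    print(target, find_sigma(dists, target))
```

A larger perplexity yields a wider σ_i and so spreads each point's attention over more neighbors, which is one reason the same data can produce very different t-SNE plots as the parameter changes.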
Tags: beautiful, charts, interpretation, t-SNE
Explained Visually
Posted by Armando Brito Mendes | Filed under Data Science, statistics, mathematics, teaching materials, visualization
Good interactive visual explanations of ML and mathematics concepts.
Ordinary Least Squares Regression, EV 9, 2015/02/12
Principal Component Analysis ("Axis of easy."), EV 8, 2015/01/29
Image Kernels, EV 6, 2015/01/20
Eigenvectors and Eigenvalues, EV 5, 2014/11/28
Pi (π), EV 4, 2014/11/21
Sine and Cosine, EV 3, 2014/11/14
Exponentiation, EV 2, 2014/11/07
Markov Chains ("Mark on, Markov"), EV 1, 2014/10/30
Conditional probability ("You probably wouldn’t understand.")
Tags: data analysis, teaching
Map made of candy corn to show corn production
Posted by Armando Brito Mendes | Filed under Data Science, visualization
An example of a map made with physical objects.
With candy corn as her medium, Jill Hubley mapped corn production in the United States, based on data from the USDA. With just three hues of yellow, orange, and white and three heights to match, Hubley was able to clearly show the geographical patterns.