Data Quality for AI

An IBM page with several resources on data preprocessing and data quality assessment.

This Data Quality for AI (or DQAI, for short) framework of services provides all the tools to enable model developers and data scientists to implement a formalized and systematic program of data preparation, the preliminary and most time-consuming step of the model development lifecycle. This framework is appropriate for data being readied for supervised classification or regression tasks. It includes the necessary software to:

— implement quality checks,
— execute remediation,
— generate audit reports,
— automate all the above.

While pipelining of tasks is essential for scalability and repeatability, the included capabilities can also be used for custom data exploration and human-guided improvement of models. Although the included services can be productive at any stage in the model development lifecycle, the offering is designed to be especially valuable early in data processing, in the data preparation stage.
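The four capabilities listed above (checks, remediation, audit reports, automation) can be illustrated with a minimal standard-library sketch. This is not the DQAI API: every function name here is hypothetical, the only check shown is a missing-value count, and median imputation is just one assumed remediation strategy.

```python
from statistics import median

def check_missing(rows, columns):
    """Quality check: count missing (None) values per column."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}

def impute_median(rows, column):
    """Remediation: fill missing numeric values with the column median.
    Assumes at least one observed value exists in the column."""
    observed = [r[column] for r in rows if r.get(column) is not None]
    fill = median(observed)
    return [{**r, column: fill if r.get(column) is None else r[column]} for r in rows]

def audit_report(before, after):
    """Audit report: one line per column with missing counts before and after."""
    return [f"{c}: missing {before[c]} -> {after[c]}" for c in before]

def run_pipeline(rows, columns):
    """Automation: chain check, remediation and report in one call."""
    before = check_missing(rows, columns)
    cleaned = rows
    for c in columns:
        if before[c]:
            cleaned = impute_median(cleaned, c)
    after = check_missing(cleaned, columns)
    return cleaned, audit_report(before, after)

# Made-up toy data, for illustration only.
rows = [
    {"age": 30, "income": 50.0},
    {"age": None, "income": 70.0},
    {"age": 40, "income": None},
]
cleaned, report = run_pipeline(rows, ["age", "income"])
```

Keeping each stage as a separate function means the same pieces can be run one at a time for manual exploration or chained for repeatable pipelines.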

In addition to all that can be accomplished on original data sources, there are methods that, starting from an input dataset, can help synthesize new data — either for supplementation or for replacement — by learning constraints in the original data or having them specified by a developer. This can be helpful when regulatory or contractual issues prohibit direct usage of data in a modeling effort, when it is desirable to explore datasets with different constraints, or when more data is needed for training.

This offering is appropriate for use on both tabular and time series data, and support for new modalities is being developed.



Beyond the Top 1000 Names

A database of names given in the United States over time

To provide popular names and maintain an acceptable performance level on our servers, we provide only the top 1000 names through our forms. However, we provide almost all names for researchers interested in naming trends.

To safeguard privacy, we exclude from these files certain names that would indicate, or would allow the ability to determine, names with fewer than 5 occurrences in any geographic area. We provide these data on both a national and state-specific basis, in two separate collections of files, each zipped into a single file. The format of the data in each collection is described in a “readme” file contained in the respective zip file.
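The suppression rule described above can be sketched roughly as follows. The function name and the `(area, name)` key layout are assumptions made for illustration, not the format of the published files, and the counts are made up.

```python
def suppress_rare(counts, threshold=5):
    """Drop every (area, name) entry whose count is below the threshold."""
    return {key: n for key, n in counts.items() if n >= threshold}

# Made-up counts keyed by (area, name), for illustration only.
counts = {("CA", "Ava"): 120, ("CA", "Zyx"): 4, ("NY", "Bo"): 5}
published = suppress_rare(counts)
```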


GISTEMP Climate Spiral

An excellent visualization of global warming; watch it to the end for quite clear evidence.

The visualization presents monthly global temperature anomalies from 1880 to 2021. These temperatures are based on the GISS Surface Temperature Analysis (GISTEMP v4), an estimate of global surface temperature change. Anomalies are defined relative to a base period of 1951-1980. The data file used to create this visualization can be accessed here.
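To make "anomalies relative to a base period" concrete, here is a minimal sketch that subtracts each calendar month's 1951-1980 mean from the monthly values. It assumes a per-month baseline and uses made-up numbers; the actual GISTEMP processing is considerably more involved, and the function name is hypothetical.

```python
def anomalies(temps, base_start=1951, base_end=1980):
    """temps maps (year, month) to a temperature.
    Returns each value minus the base-period mean for the same calendar month."""
    baseline = {}
    for month in range(1, 13):
        vals = [t for (y, m), t in temps.items()
                if m == month and base_start <= y <= base_end]
        if vals:
            baseline[month] = sum(vals) / len(vals)
    # Months with no base-period data are skipped rather than guessed.
    return {(y, m): t - baseline[m] for (y, m), t in temps.items() if m in baseline}

# Made-up January temperatures, for illustration only.
temps = {(1951, 1): 10.0, (1980, 1): 12.0, (2021, 1): 12.0}
result = anomalies(temps)
```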

The Goddard Institute for Space Studies (GISS) is a NASA laboratory managed by the Earth Sciences Division of the agency’s Goddard Space Flight Center in Greenbelt, Maryland. The laboratory is affiliated with Columbia University’s Earth Institute and School of Engineering and Applied Science in New York.

The ‘climate spiral’ is a visualization designed by climate scientist Ed Hawkins from the National Centre for Atmospheric Science, University of Reading. Climate spiral visualizations have been widely distributed; a version was even part of the opening ceremony of the Rio de Janeiro Olympics.


A detailed guide to colors in data vis style guides

An excellent guide to colors for use in charts

Lisa Charlotte Muth

I’ve heard you’re interested in creating a color palette as part of a data vis style guide. Maybe you decided to use a custom design theme at Datawrapper to make your charts more consistent-looking, and our support team asked you for some colors. Maybe you’re the first proper data vis designer at your organization, and want to bring order to chaos. Or maybe you want to redesign an existing palette because your requirements have changed.

This guide is very extensive — and can be a bit overwhelming. If you’re designing your very first color palette, don’t sweat. It’s simple:


The vehicles of James Bond

Good visualizations in this infographic about the vehicles used in the 25 James Bond films

The name is Bond, James Bond.

2022 marks the 60th anniversary of the first James Bond movie, Dr. No. This movie became a seminal moment in cinema, and established many of the tropes which would become iconic throughout the franchise: the thrilling theme music, the gun barrel sequence, ending the movie in the arms of a beautiful Bond girl… often somewhere out at sea, on a boat.

But between Dr. No and No Time To Die, Bond has used a lot more vehicles than just boats. Let’s explore the cars, airplanes, tanks and space shuttles throughout all 25 official Bond movies!


How the World’s Richest People Are Driving Global Warming

A good report with several uncommon chart types

By Eric Roston, Leslie Kaufman and Hayley Warren, March 24, 2022

It’s the bedrock idea underpinning global climate politics: Countries that got rich by spewing greenhouse gasses have a responsibility to cut emissions faster than those that didn’t while putting up money to help poor nations adapt.

This framework made sense at the dawn of climate diplomacy. Back in 1990, almost two-thirds of all disparities in emissions could be explained by national rankings of pollution. But after more than three decades of rising income inequality worldwide, what if gaps between nation states are no longer the best way to understand the problem?

There’s growing evidence that the inequality between rich and poor people’s emissions within countries now overwhelms the country-to-country disparities. In other words: High emitters have more in common across international boundaries, no matter where they call home.


How Russia will feel the sting of sanctions


An article with good ribbon charts and stacked area charts

By Andrew Van Dam, Youjin Shin and Alyssa Fowers, March 18, 2022 at 9:37 a.m. EDT

The United States, Europe and their allies rely on Russia for some oil and gas, and a few specialized materials. But they also supply Russia with much of the machinery, vehicles, technology and equipment that help Russia’s economy run.

That’s why sanctions can be so effective.

Without global trade, Russian factories would sit idle, businesses would shutter and shelves would sit bare. Even blocking some of those goods from countries that have already imposed sanctions or restrictions could dismember whole sectors in Russia. Some Russian companies that rely on imported components are already reeling — production lines at the automaker Lada reportedly went idle earlier this month.


Who Takes Care of the Kids, By Household Income


An example of a bar chart with sub-bars

By Nathan Yau

Childcare is expensive in the United States. So as you would expect, higher-income households tend to use non-parental childcare more, whereas lower-income households tend more towards only parental care. Here are the percentages, based on 2019 estimates from the National Center for Education Statistics.


Age of Moms When Kids are Born

A good example of pin (lollipop) charts.

By Nathan Yau

People have kids at a wide range of ages, but the moments tend towards where we are in life. There are social norms and biological norms. Based on data from the National Center for Health Statistics, we can see how these ranges shift by child number.


Optimal Wordle Solutions

An application that uses a tree search process to solve the game Wordle

There is a lot of speculation online about what the “best” first word in Wordle is. If we are exploring optimal strategies to solve the original game in the fewest guesses, most of it is wrong.

For humans, almost all of these words are great! However, for optimal strategies, we need to examine all of the guesses, not just the first word. It turns out it’s possible to solve 99% of all puzzles in only 4 guesses, or with an average of ~3.42 guesses per win, but not with most of the “best” words found online.
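The idea of examining every guess, not just the first, can be sketched as an exhaustive tree search over a tiny word list. This is not the author's solver: the function names are hypothetical, guesses are restricted to the remaining candidate answers (a simplification), and the search is exponential, so it only suits toy lists.

```python
from collections import Counter

def feedback(guess, answer):
    """Wordle-style feedback: 'g' green, 'y' yellow, '-' grey."""
    result = ["-"] * len(guess)
    remaining = Counter()
    # First pass: mark greens and count unmatched answer letters.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            remaining[a] += 1
    # Second pass: mark yellows, consuming unmatched letters left to right.
    for i, g in enumerate(guess):
        if result[i] == "-" and remaining[g] > 0:
            result[i] = "y"
            remaining[g] -= 1
    return "".join(result)

def min_total_guesses(candidates):
    """Smallest total number of guesses needed to solve every candidate,
    choosing the best guess at each node of the search tree."""
    if len(candidates) == 1:
        return 1
    best = None
    for guess in candidates:
        # Partition the candidates by the feedback this guess would produce.
        groups = {}
        for answer in candidates:
            groups.setdefault(feedback(guess, answer), []).append(answer)
        total = len(candidates)  # every candidate pays for this guess
        for pattern, group in groups.items():
            if pattern != "g" * len(guess):  # all-green means already solved
                total += min_total_guesses(tuple(group))
        if best is None or total < best:
            best = total
    return best

words = ("crane", "crate", "trace", "brace")
average = min_total_guesses(words) / len(words)
```

On this four-word list the best first guesses separate all remaining answers at once, which is why the whole tree has to be scored rather than the opening word alone.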

Try out my solver with the best strategies that have been found so far.

Jonathan Olson
