Data Quality for AI

An IBM page with several resources on data preprocessing and data quality assessment.

The Data Quality for AI (DQAI) framework of services gives model developers and data scientists the tools to implement a formalized, systematic program of data preparation, the preliminary and most time-consuming step of the model development lifecycle. The framework is appropriate for data being readied for supervised classification or regression tasks. It includes the necessary software to:

— implement quality checks,
— execute remediation,
— generate audit reports,
— automate all the above.
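The four capabilities listed above can be pictured with a minimal, standard-library sketch. This is not the DQAI API: the function names `check_quality`, `remediate`, and `run_pipeline`, the chosen metrics, and the median-fill strategy are all hypothetical, and the sketch assumes every column is numeric.

```python
from statistics import median


def check_quality(rows, columns):
    """Quality check: return simple metrics for a list of dict rows."""
    cells = len(rows) * len(columns)
    missing = sum(1 for r in rows for c in columns if r.get(c) is None)
    distinct = {tuple(r.get(c) for c in columns) for r in rows}
    return {
        "rows": len(rows),
        "missing_fraction": missing / cells if cells else 0.0,
        "duplicate_rows": len(rows) - len(distinct),
    }


def remediate(rows, columns):
    """Remediation: drop exact duplicates, then fill gaps with the column median."""
    seen, unique = set(), []
    for r in rows:
        key = tuple(r.get(c) for c in columns)
        if key not in seen:
            seen.add(key)
            unique.append(dict(r))  # copy so the input rows are not modified
    for c in columns:
        present = [r[c] for r in unique if r.get(c) is not None]
        if not present:
            continue  # nothing to learn a fill value from
        fill = median(present)
        for r in unique:
            if r.get(c) is None:
                r[c] = fill
    return unique


def run_pipeline(rows, columns):
    """Automation: check, remediate, re-check, and return an audit report."""
    before = check_quality(rows, columns)
    cleaned = remediate(rows, columns)
    after = check_quality(cleaned, columns)
    return cleaned, {"before": before, "after": after}


# Toy data (illustrative only): one missing value and one duplicate row.
columns = ["age", "income", "label"]
rows = [
    {"age": 25.0, "income": 50.0, "label": 0},
    {"age": None, "income": 60.0, "label": 1},
    {"age": 40.0, "income": 80.0, "label": 1},
    {"age": 40.0, "income": 80.0, "label": 1},
]
cleaned, report = run_pipeline(rows, columns)
```

Here `report` plays the role of the audit report: it records the same metrics before and after remediation, so a reviewer can see what the automated step changed.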

While pipelining of tasks is essential for scalability and repeatability, the included capabilities can also be used for custom data exploration and human-guided improvement of models. Although the included services can be productive at any stage in the model development lifecycle, the offering is designed to be especially valuable early on, in the data preparation stage.

In addition to all that can be accomplished on original data sources, there are methods that, starting from an input dataset, can help synthesize new data, either to supplement or to replace the original, by learning constraints in the original data or having them specified by a developer. This can be helpful when regulatory or contractual issues prohibit direct usage of data in a modeling effort, when it is desirable to explore datasets with different constraints, or when more data is needed for training.
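As a rough illustration of constraint-based synthesis, the sketch below learns simple per-column constraints from an input dataset, lets a developer override one of them, and samples new rows that satisfy them. The source does not describe how DQAI does this; `learn_constraints`, `synthesize`, and the constraint format are hypothetical. Sampling each column independently is a deliberate simplification that ignores correlations between columns.

```python
import random


def learn_constraints(rows):
    """Infer per-column constraints: a (min, max) range for numeric columns,
    a list of observed values for everything else."""
    constraints = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if numeric:
            constraints[col] = ("range", min(values), max(values))
        else:
            # dict.fromkeys de-duplicates while keeping first-seen order
            constraints[col] = ("choices", list(dict.fromkeys(values)))
    return constraints


def synthesize(constraints, n, seed=0):
    """Sample n rows, drawing each column independently within its constraint."""
    rng = random.Random(seed)  # seeded for repeatable output
    synthetic = []
    for _ in range(n):
        row = {}
        for col, spec in constraints.items():
            if spec[0] == "range":
                row[col] = rng.uniform(spec[1], spec[2])
            else:
                row[col] = rng.choice(spec[1])
        synthetic.append(row)
    return synthetic


# Toy input data (illustrative only).
source = [
    {"age": 25, "plan": "basic"},
    {"age": 40, "plan": "premium"},
    {"age": 31, "plan": "basic"},
]
learned = learn_constraints(source)
learned["age"] = ("range", 30, 40)  # developer-specified override
synthetic = synthesize(learned, 5)
```

The override line shows the second route mentioned above: instead of accepting the learned age range, the developer narrows it to explore a dataset with a different constraint.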

This offering is appropriate for use on both tabular and time series data, and support for additional modalities is being developed.

