Armando B. Mendes

Build Pipelines with Pandas Using pdpipe

Posted by Armando Brito Mendes | Filed under linguagens de programação, software

KDnuggets — clique na imagem para seguir o link

Boa descrição de pipelines com os data.frame do Pandas.

Introduction

Pandas is an amazing library in the Python ecosystem for data analytics and machine learning. They form the perfect bridge between the data world, where Excel/CSV files and SQL tables live, and the modeling world where Scikit-learn or TensorFlow perform their magic.

A data science flow is most often a sequence of steps — datasets must be cleaned, scaled, and validated before they can be ready to be used by that powerful machine learning algorithm.

These tasks can, of course, be done with many single-step functions/methods that are offered by packages like Pandas but a more elegant way is to use a pipeline. In almost all cases, a pipeline reduces the chance of error and saves time by automating repetitive tasks.

In the data science world, great examples of packages with pipeline features are — dplyr in R language, and Scikit-learn in the Python ecosystem.

A data science flow is most often a sequence of steps — datasets must be cleaned, scaled, and validated before they can be ready to be used

Following is a great article about their use in a machine-learning workflow.

Managing Machine Learning Workflows with Scikit-learn Pipelines Part 1: A Gentle Introduction
Are you familiar with Scikit-learn Pipelines? They are an extremely simple yet very useful tool for managing machine…

Pandas also offer a .pipe method which can be used for similar purposes with user-defined functions. However, in this article, we are going to discuss a wonderful little library called pdpipe, which specifically addresses this pipelining issue with Pandas DataFrame.

In almost all cases, a pipeline reduces the chance of error and saves time by automating repetitive tasks

Tags: data mining, pandas, pipelines, Python

Read more | Comments off | February 13th, 2020

About

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec libero. Suspendisse bibendum. Cras id urna. Morbi tincidunt, orci ac convallis aliquam, lectus turpis varius lorem, eu posuere nunc justo tempus leo. Donec mattis, purus nec placerat bibendum, dui pede condimentum odio, ac blandit ante orci ut diam.

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec libero. Suspendisse bibendum. Cras id urna. Learn more...

Armando B. Mendes

Build Pipelines with Pandas Using pdpipe

Categorias de Posts

Palavras chave mais usadas

Arquivo

Recent Posts

Recent Comments

About