{"id":2028,"date":"2020-02-13T09:12:37","date_gmt":"2020-02-13T10:12:37","guid":{"rendered":"http:\/\/sites.uac.pt\/amendes\/?p=2028"},"modified":"2020-02-13T09:21:18","modified_gmt":"2020-02-13T10:21:18","slug":"2028","status":"publish","type":"post","link":"https:\/\/sites.uac.pt\/amendes\/data-mining\/2028\/","title":{"rendered":"Build Pipelines with Pandas Using pdpipe"},"content":{"rendered":"\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><a href=\"https:\/\/www.kdnuggets.com\/2019\/12\/build-pipelines-pandas-pdpipe.html\"><img loading=\"lazy\" decoding=\"async\" width=\"171\" height=\"72\" src=\"https:\/\/sites.uac.pt\/amendes\/files\/2016\/01\/KDnuggets.png\" alt=\"KDnuggets\" class=\"wp-image-1701\" \/><\/a><figcaption>click the image to follow the link<\/figcaption><\/figure><\/div>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>A good description of pipelines with Pandas DataFrames.<\/strong><\/pre>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow\">\nIntroduction\n<p>Pandas is an amazing library in the Python ecosystem for data analytics and machine learning. It forms the perfect bridge between the data world, where Excel\/CSV files and SQL tables live, and the modeling world, where Scikit-learn or TensorFlow perform their magic.<\/p>\n<p>A data science flow is most often a sequence of steps: datasets must be cleaned, scaled, and validated before they are ready to be used by that powerful machine learning algorithm.<\/p>\n<p>These tasks can, of course, be done with many single-step functions\/methods offered by packages like Pandas, but a more elegant way is to use a pipeline. 
In almost all cases, a pipeline reduces the chance of error and saves time by automating repetitive tasks.<\/p>\n<p>In the data science world, great examples of packages with pipeline features are <a href=\"https:\/\/dplyr.tidyverse.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">dplyr in the R language<\/a> and\u00a0<a href=\"https:\/\/scikit-learn.org\/stable\/modules\/compose.html\" target=\"_blank\" rel=\"noreferrer noopener\">Scikit-learn in the Python ecosystem<\/a>.<\/p>\n<p>Here is a great article about their use in a machine-learning workflow:<\/p>\n<p><a href=\"https:\/\/www.kdnuggets.com\/2017\/12\/managing-machine-learning-workflows-scikit-learn-pipelines-part-1.html?source=post_page-----cade6128cd31----------------------\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Managing Machine Learning Workflows with Scikit-learn Pipelines Part 1: A Gentle Introduction<\/strong><\/a><\/p>\n<p>Pandas also offers a\u00a0<code><strong>.pipe<\/strong><\/code>\u00a0method, which can be used for similar purposes with user-defined functions. However, in this article, we are going to discuss a wonderful little library called\u00a0<a href=\"https:\/\/github.com\/shaypal5\/pdpipe\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>pdpipe<\/strong><\/a>, which specifically addresses this pipelining issue for Pandas DataFrames.<\/p>\n<\/div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>A good description of pipelines with Pandas DataFrames. 
Introduction Pandas is an amazing library in the Python ecosystem for data analytics and machine learning. It forms the perfect bridge between the data world, where Excel\/CSV files and SQL tables live, and the modeling world, where Scikit-learn or TensorFlow perform their magic. A data science [&hellip;]<\/p>\n","protected":false},"author":159,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[103,194,137],"tags":[4053,4054,4050],"class_list":["post-2028","post","type-post","status-publish","format-standard","hentry","category-data-mining","category-linguagens-de-programacao","category-software","tag-pandas","tag-pipelines","tag-python"],"_links":{"self":[{"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/posts\/2028","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/users\/159"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/comments?post=2028"}],"version-history":[{"count":3,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/posts\/2028\/revisions"}],"predecessor-version":[{"id":2031,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/posts\/2028\/revisions\/2031"}],"wp:attachment":[{"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/media?parent=2028"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/categories?post=2028"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/tags?post=2028"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}