{"id":1773,"date":"2016-11-18T12:17:54","date_gmt":"2016-11-18T13:17:54","guid":{"rendered":"http:\/\/sites.uac.pt\/amendes\/?p=1773"},"modified":"2016-11-18T12:32:12","modified_gmt":"2016-11-18T13:32:12","slug":"decision-trees-splitting","status":"publish","type":"post","link":"https:\/\/sites.uac.pt\/amendes\/data-mining\/decision-trees-splitting\/","title":{"rendered":"Decision trees: Do Splitting Rules Really Matter?"},"content":{"rendered":"<div id=\"attachment_1774\" style=\"width: 260px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.salford-systems.com\/resources\/whitepapers\/do-splitting-rules-really-matter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-1774\" class=\"size-full wp-image-1774\" src=\"http:\/\/sites.uac.pt\/amendes\/files\/2016\/11\/salford-logo-final.png\" alt=\"click the image to follow the link\" width=\"250\" height=\"75\" \/><\/a><p id=\"caption-attachment-1774\" class=\"wp-caption-text\">click the image to follow the link<\/p><\/div>\n<p><span style=\"color: #ff0000\">A good text on the criterion for splitting into subgroups in decision trees.<\/span><\/p>\n<p style=\"text-align: justify\">Do decision-tree splitting criteria matter? Contrary to popular opinion in data mining circles, our experience indicates that splitting criteria do matter; in fact, the difference between using the right rule and the wrong rule could add up to millions of dollars of lost opportunity.<\/p>\n<p style=\"text-align: justify\">So, why haven&#8217;t the differences been noticed? The answer is simple. When data sets are small and highly accurate trees can be generated easily, the particular splitting rule does not matter. When your golf ball is one inch from the cup, which club or even which end you use is not important because you will be able to sink the ball in one stroke. 
Unfortunately, previous examinations of splitting-rule performance, the ones that found no differences, did not look at data-mining problems with large data sets where obtaining a good answer is genuinely difficult.<\/p>\n<p style=\"text-align: justify\">When you are trying to detect fraud, identify borrowers who will declare bankruptcy in the next 12 months, target a direct-mail campaign, or tackle other real-world business problems that do not admit of 90+ percent accuracy rates (with currently available data), the splitting rule you choose could materially affect the accuracy and value of your decision tree. Further, even when different splitting rules yield similarly accurate classifiers, the differences between them may still matter. With multiple classes, you might care how the errors are distributed across classes. Between two trees with equal overall error rates, you might prefer a tree that performs better on a particular class or classes. If the purpose of a decision tree is to yield insight into a causal process or into the structure of a database, splitting rules of similar accuracy can yield trees that vary greatly in their usefulness for interpreting and understanding the data.<\/p>\n<p style=\"text-align: justify\">This paper explores the key differences between three important splitting criteria: Gini, Twoing, and Entropy, for three- and greater-level classification trees, and suggests how to choose the right one for a particular problem type. Although we can make recommendations as to which splitting rule is best suited to which type of problem, it is good practice to always use several splitting rules and compare the results. You should experiment with several different splitting rules and should expect different results from each. As you work with different types of data and problems, you will begin to learn which splitting rules typically work best for specific problem types. 
Nevertheless, you should never rely on a single rule; experimentation is always wise.<\/p>\n<p style=\"text-align: justify\">Gini, Twoing, and Entropy<\/p>\n<p style=\"text-align: justify\">The best-known rules for binary recursive partitioning are Gini, Twoing, and Entropy. Because each rule represents a different philosophy as to the purpose of the decision tree, each grows a different style of tree.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A good text on the criterion for splitting into subgroups in decision trees. Do decision-tree splitting criteria matter? Contrary to popular opinion in data mining circles, our experience indicates that splitting criteria do matter; in fact, the difference between using the right rule and the wrong rule could add up to millions of dollars [&hellip;]<\/p>\n","protected":false},"author":159,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[103],"tags":[74],"class_list":["post-1773","post","type-post","status-publish","format-standard","hentry","category-data-mining","tag-analise-de-dados"],"_links":{"self":[{"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/posts\/1773","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/users\/159"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/comments?post=1773"}],"version-history":[{"count":3,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/posts\/1773\/revisions"}],"predecess
or-version":[{"id":1776,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/posts\/1773\/revisions\/1776"}],"wp:attachment":[{"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/media?parent=1773"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/categories?post=1773"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/tags?post=1773"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}