{"id":1156,"date":"2014-01-13T12:43:49","date_gmt":"2014-01-13T13:43:49","guid":{"rendered":"http:\/\/sites.uac.pt\/amendes\/?p=1156"},"modified":"2014-01-13T12:43:49","modified_gmt":"2014-01-13T13:43:49","slug":"apache-spark","status":"publish","type":"post","link":"https:\/\/sites.uac.pt\/amendes\/data-mining\/apache-spark\/","title":{"rendered":"Apache Spark"},"content":{"rendered":"<div id=\"attachment_1157\" style=\"width: 310px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/spark.incubator.apache.org\/\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-1157\" class=\"size-medium wp-image-1157\" src=\"https:\/\/sites.uac.pt\/amendes\/files\/2014\/01\/spark-project-header1-cropped-300x150.png\" alt=\"An alternative to Hadoop for in-memory data computing\" width=\"300\" height=\"150\" srcset=\"https:\/\/sites.uac.pt\/amendes\/files\/2014\/01\/spark-project-header1-cropped-300x150.png 300w, https:\/\/sites.uac.pt\/amendes\/files\/2014\/01\/spark-project-header1-cropped.png 438w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-1157\" class=\"wp-caption-text\">An alternative to Hadoop for in-memory data computing<\/p><\/div>\n<h2 id=\"what-is-apache-spark\">What is Apache Spark?<\/h2>\n<p>Apache Spark is an open source cluster computing system that aims to make data analytics <em>fast<\/em> \u2014 both fast to run and fast to write.<\/p>\n<p>To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.<\/p>\n<p>To make programming faster, Spark provides clean, concise APIs in <a href=\"http:\/\/www.scala-lang.org\">Scala<\/a>, <a href=\"http:\/\/spark.incubator.apache.org\/docs\/latest\/quick-start.html#a-standalone-app-in-java\">Java<\/a> and <a 
href=\"http:\/\/spark.incubator.apache.org\/docs\/latest\/quick-start.html#a-standalone-app-in-python\">Python<\/a>. You can also use Spark interactively from the Scala and Python shells to rapidly query big datasets.<\/p>\n<h2 id=\"what-can-it-do\">What can it do?<\/h2>\n<p>Spark was initially developed for two applications where placing data in memory helps: <em>iterative<\/em> algorithms, which are common in machine learning, and <em>interactive<\/em> data mining. In both cases, Spark can run up to <strong>100x<\/strong> faster than Hadoop MapReduce. However, you can use Spark for general data processing too. Check out our <a href=\"http:\/\/spark.incubator.apache.org\/examples.html\">example jobs<\/a>.<\/p>\n<p>Spark is also the engine behind <a href=\"http:\/\/shark.cs.berkeley.edu\">Shark<\/a>, a fully <a href=\"http:\/\/hive.apache.org\">Apache Hive<\/a>-compatible data warehousing system that can run 100x faster than Hive.<\/p>\n<p>While Spark is a new engine, it can access any data source supported by Hadoop, making it easy to run over existing data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>What is Apache Spark? Apache Spark is an open source cluster computing system that aims to make data analytics fast \u2014 both fast to run and fast to write. 
To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster [&hellip;]<\/p>\n","protected":false},"author":159,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[103,150,137],"tags":[74,174,171],"class_list":["post-1156","post","type-post","status-publish","format-standard","hentry","category-data-mining","category-materiais-para-profissionais","category-software","tag-analise-de-dados","tag-big-data","tag-dw-bi"],"_links":{"self":[{"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/posts\/1156","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/users\/159"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/comments?post=1156"}],"version-history":[{"count":2,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/posts\/1156\/revisions"}],"predecessor-version":[{"id":1159,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/posts\/1156\/revisions\/1159"}],"wp:attachment":[{"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/media?parent=1156"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/categories?post=1156"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.uac.pt\/amendes\/wp-json\/wp\/v2\/tags?post=1156"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}