SmartData Collective portal (smartdatacollective.com)
Posted by Armando Brito Mendes | Filed under materials for professionals
SmartData Collective, an online community moderated by Social Media Today, provides enterprise leaders access to the latest trends in Business Intelligence and Data Management. Our innovative model serves as a platform for recognized, global experts to share their insights through peer contributions, custom content publishing and alignment with industry leaders. SmartData Collective is a key resource for executives who need to make informed data management decisions.
Tags: data analysis, big data, bioinformatics, knowledge capture, data mining, group decision-making
Data Intelligence and Analytics Resources
Posted by Armando Brito Mendes | Filed under materials for professionals, software, videos, visualization
3. Big Data
- Practical illustration of Map-Reduce (Hadoop-style), on real data
- A synthetic variance designed for Hadoop and big data
- Fast Combinatorial Feature Selection with New Definition of Predictive Power
4. Visualization
- Detecting Patterns with the Naked Eye
- 50+ Open Source Tools for Big Data
- 40 maps that explain the world
5. Best and Worst of Data Science
- 175 Analytic and Data Science Web Sites
- 6000 Companies Hiring Data Scientists
- 100 data science, analytics, big data, visualization books
6. New Analytics Start-up Ideas
- Uniquely identify a human being with two questions
- Selling data
- A new type of weapons-grade secure email
- R in your Browser
7. Rants about Healthcare, Education, etc.
- Why statistical community is disconnected from Big Data and how to fix it
- How to eliminate a trillion dollars in healthcare costs
- Job interview question: what is wrong with this picture?
8. Career Stuff, Training, Salary Surveys
- 17 short tutorials all data scientists should read (and practice)
- Why Companies can’t find analytic talent
- Six categories of data scientists …
9. Miscellaneous
- One Page R: A Survival Guide to Data Science with R
- Boosting Algorithms for Better Predictions
- Structuredness coefficient to find patterns and associations
10. DSC Webinar Series – with video access
- Predictive Analytics with Revolution Analytics and Hortonworks, The…
- BI For Big Data
- The Value of a Modern Data Architecture with Apache Hadoop and Tera…
- Accelerating Big Data
Tags: big data, knowledge capture, data mining, R-software
17 short tutorials all data scientists should read
Posted by Armando Brito Mendes | Filed under statistics, materials for professionals
Here’s the list:
- Practical illustration of Map-Reduce (Hadoop-style), on real data
- A synthetic variance designed for Hadoop and big data
- Fast Combinatorial Feature Selection with New Definition of Predict…
- A little known component that should be part of most data science a…
- 11 Features any database, SQL or NoSQL, should have
- Clustering idea for very large datasets
- Hidden decision trees revisited
- Correlation and R-Squared for Big Data
- Marrying computer science, statistics and domain expertize
- New pattern to predict stock prices, multiplies return by factor 5
- What Map Reduce can’t do
- Excel for Big Data
- Fast clustering algorithms for massive datasets
- Source code for our Big Data keyword correlation API
- The curse of big data
- How to detect a pattern? Problem and solution
- Interesting Data Science Application: Steganography
Related link: The Data Science Toolkit
Tags: data analysis, big data, knowledge capture, data mining, Excel, R-software
Resource types in Project
Posted by Armando Brito Mendes | Filed under operations research, materials for professionals, planning
Resource types in Project: work, material and cost. Recent articles here on Blogtek have covered the configuration to take care of before entering tasks, as well as costs and calendars; today we look at how Project's resource types can be configured.
Tags: project management
Docear – The Academic Literature Suite
Posted by Armando Brito Mendes | Filed under materials for professionals, bibliographic references, software
Docear is a unique solution to academic literature management, i.e. it helps you organize, create, and discover academic literature. Among other things, Docear offers:
- A single-section user interface that allows the most comprehensive organization of your literature. With Docear, you can sort documents into categories; you can sort annotations (comments, bookmarks, and highlighted text from PDFs) into categories; you can sort annotations within PDFs; and you can view multiple annotations of multiple documents, in multiple categories, all at once.
- A ‘literature suite concept’ that combines several tools in a single application (PDF management, reference management, mind mapping, …). This allows you to draft your own papers, assignments, theses, etc. directly in Docear and copy annotations and references from your collection directly into your draft.
- A recommender system that helps you discover new literature: Docear recommends papers that are free, available in full text, instantly downloadable, and tailored to your information needs.
And did we mention that Docear is free, open source, available for Windows, Linux, and Mac OS X, and not evil?
Tags: project management, search engines, text mining
Apache Spark
Posted by Armando Brito Mendes | Filed under materials for professionals, software
What is Apache Spark?
Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write.
To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.
To make programming faster, Spark provides clean, concise APIs in Scala, Java and Python. You can also use Spark interactively from the Scala and Python shells to rapidly query big datasets.
What can it do?
Spark was initially developed for two applications where placing data in memory helps: iterative algorithms, which are common in machine learning, and interactive data mining. In both cases, Spark can run up to 100x faster than Hadoop MapReduce. However, you can use Spark for general data processing too. Check out our example jobs.
Spark is also the engine behind Shark, a fully Apache Hive-compatible data warehousing system that can run 100x faster than Hive.
While Spark is a new engine, it can access any data source supported by Hadoop, making it easy to run over existing data.
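The point about iterative algorithms can be illustrated without Spark itself. The sketch below is plain Python, not Spark's API: load_points is a hypothetical stand-in for an expensive read from disk, and fit_mean is a toy gradient-descent loop invented for this illustration. It only counts how often the data would be re-read when every pass goes back to storage, compared with loading it once and keeping it in memory.

```python
def load_points():
    """Hypothetical stand-in for an expensive disk read; counts its calls."""
    load_points.calls += 1
    return [1.0, 2.0, 3.0, 4.0]


load_points.calls = 0


def fit_mean(get_points, steps=200, learning_rate=0.1):
    """Toy iterative algorithm: gradient descent whose minimizer is the mean."""
    estimate = 0.0
    for _ in range(steps):
        points = get_points()  # every pass needs the full dataset
        gradient = sum(estimate - p for p in points) / len(points)
        estimate -= learning_rate * gradient
    return estimate


# Disk-style execution: the dataset is re-read on every pass.
load_points.calls = 0
fit_mean(load_points)
reloads_without_cache = load_points.calls

# In-memory execution: read once, then iterate over the cached copy.
load_points.calls = 0
cached = load_points()
estimate = fit_mean(lambda: cached)
reloads_with_cache = load_points.calls

print(reloads_without_cache, reloads_with_cache, round(estimate, 3))
```

Both runs compute the same estimate; the difference is only in how many times the data source is touched, which is the cost that keeping data in memory is meant to avoid.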
Tags: data analysis, big data, data mining, DW \ BI
Big Data or Pig Data?
Posted by Armando Brito Mendes | Filed under materials for professionals
(A fable on huge amounts of data and why we don’t need models)
There was a pig who wanted to be a scientist. He was not interested in models. When asked how he planned on making sense of the world, the pig would say in a deep mysterious voice, “I don’t do models: the world is my model” and then with a twinkle in his eyes, look at his interlocutor smugly.
By his phrase, “I don’t do models, the world is my model”, he meant that the world’s data was enough for him, the pig scientist. The more data he had, the pig declared, the more accurately he would be able to predict what might happen in the world.
Tags: big data, data mining, DW \ BI
Brainstorm
Posted by Armando Brito Mendes | Filed under materials for professionals, planning
Brainstorm, or Brainstorming, literally means “storm of ideas”. In Brazil it is sometimes jokingly called “toró de parpites” (roughly, a “downpour of hunches”). It is a creative technique for obtaining ideas and solutions. Because it is so simple, it is often applied improperly, as if it were just a chat. Here on Blogtek we will look at some techniques for finding solutions to problems.
Brainstorm – definition and applications
Brainstorm – principles
Brainstorm – rules
Brainstorm – stages
Tags: knowledge capture, group decision-making, project management
The Field Guide to Data Science
Posted by Armando Brito Mendes | Filed under materials for professionals
Data Science is the competitive advantage of the future for organizations interested in turning their data into a product through analytics. Industries from health, to national security, to finance, to energy can be improved by creating better data analytics through Data Science. The winners and the losers in the emerging data economy are going to be determined by their Data Science teams.
Booz Allen Hamilton created The Field Guide to Data Science to help organizations of all types and missions understand how to make use of data as a resource. The text spells out what Data Science is and why it matters to organizations as well as how to create Data Science teams. Along the way, our team of experts provides field-tested approaches, personal tips and tricks, and real-life case studies. Senior leaders will walk away with a deeper understanding of the concepts at the heart of Data Science. Practitioners will add to their toolboxes.
Tags: big data, knowledge capture, data mining, DW \ BI
Posted by Armando Brito Mendes | Filed under statistics, materials for professionals
Alex Reinhart, a PhD statistics student at Carnegie Mellon University, covers some of the common analysis mistakes in Statistics Done Wrong.
Statistics Done Wrong is a guide to the most popular statistical errors and slip-ups committed by scientists every day, in the lab and in peer-reviewed journals. Many of the errors are prevalent in vast swathes of the published literature, casting doubt on the findings of thousands of papers. Statistics Done Wrong assumes no prior knowledge of statistics, so you can read it before your first statistics course or after thirty years of scientific practice.
The text is available for free online, and there’s a physical book version on the way.
Tags: data analysis, data mining, medical decision-making, inference