Probability and Monte Carlo methods

A good introductory text on probability and Monte Carlo simulation

This is a lecture post for my students in the CUNY MS Data Analytics program. In this series of lectures I discuss mathematical concepts from different perspectives. The goal is to ask questions and challenge standard ways of thinking about what are generally considered basic concepts. I also emphasize using programming to help gain insight into mathematics. Consequently these lectures will not always be as rigorous as they could be.
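
In that spirit of using programming to probe the mathematics, here is a minimal Monte Carlo sketch in Python (my own illustration, not taken from the lecture): it estimates π by sampling random points in the unit square and counting how many fall inside the quarter circle.

    # Minimal Monte Carlo sketch (illustration only): estimate pi from random points.
    import random

    def estimate_pi(n_samples=1_000_000):
        inside = 0
        for _ in range(n_samples):
            x, y = random.random(), random.random()   # uniform point in the unit square
            if x * x + y * y <= 1.0:                  # does it land inside the quarter circle?
                inside += 1
        return 4.0 * inside / n_samples               # (fraction inside) * 4 approximates pi

    print(estimate_pi())   # typically prints something close to 3.1416

The estimate tightens roughly like 1/√n, the usual Monte Carlo trade-off between sample size and accuracy.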

News, texts, and everything else about big data

Lots of interesting material about big data

A Wakanow Guide to Geography

A web book on cartography and GIS

Follow these links to learn more about cartography:

The Different Stages of Mapmaking

The Compass as a Mapping Device

Navigating with the Celestial Bodies

Machine Learning

A good video course on machine learning

About the Course

Machine learning algorithms can figure out how to perform important tasks by generalizing from examples. This is often feasible and cost-effective when manual programming is not. Machine learning (also known as data mining, pattern recognition and predictive analytics) is used widely in business, industry, science and government, and there is a great shortage of experts in it. If you pick up a machine learning textbook, you may find it forbiddingly mathematical, but in this class you will learn that the key ideas and algorithms are in fact quite intuitive. And powerful!
Most of the class will be devoted to supervised learning (in other words, learning in which a teacher provides the learner with the correct answers at training time). This is the most mature and widely used type of machine learning. We will cover the main supervised learning techniques, including decision trees, rules, instances, Bayesian techniques, neural networks, model ensembles, and support vector machines. We will also touch on learning theory with an emphasis on its practical uses. Finally, we will cover the two main classes of unsupervised learning methods: clustering and dimensionality reduction. Throughout the class there will be an emphasis not just on individual algorithms but on ideas that cut across them and tips for making them work.
In the class projects you will build your own implementations of machine learning algorithms and apply them to problems like spam filtering, clickstream mining, recommender systems, and computational biology. This will get you as close to becoming a machine learning expert as you can in ten weeks!
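
As a taste of the supervised-learning setting described above, here is a minimal sketch (my own, not part of the course materials, assuming scikit-learn is available) that fits a decision tree to a tiny invented spam-like dataset and then classifies an unseen example.

    # Minimal supervised-learning sketch; the tiny "spam" dataset below is invented.
    from sklearn.tree import DecisionTreeClassifier

    # Each row: [number of links, number of ALL-CAPS words, message length]
    X_train = [[8, 5, 120], [0, 0, 300], [6, 7, 80], [1, 0, 450], [9, 9, 60], [0, 1, 500]]
    y_train = [1, 0, 1, 0, 1, 0]          # the "teacher's" answers: 1 = spam, 0 = not spam

    clf = DecisionTreeClassifier(max_depth=2)
    clf.fit(X_train, y_train)             # learn a rule from the labeled examples
    print(clf.predict([[7, 6, 90]]))      # classify a new, unseen message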

Course Syllabus

Week One: Basic concepts in machine learning.
Week Two: Decision tree induction.
Week Three: Learning sets of rules and logic programs.
Week Four: Instance-based learning.
Week Five: Statistical learning.
Week Six: Neural networks.
Week Seven: Model ensembles.
Week Eight: Learning theory.
Week Nine: Support vector machines.
Week Ten: Clustering and dimensionality reduction.

Machine Learning MOOC

A very comprehensive machine learning course

About the Course

Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI. In this class, you will learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself. More importantly, you’ll not only learn about the theoretical underpinnings of learning, but also gain the practical know-how needed to quickly and powerfully apply these techniques to new problems. Finally, you’ll learn about some of Silicon Valley’s best practices in innovation as it pertains to machine learning and AI.

This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition. Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI). The course will also draw from numerous case studies and applications, so that you’ll also learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.
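
To make the unsupervised side of that outline concrete, here is a minimal clustering sketch (mine, not the course's, again assuming scikit-learn and NumPy): k-means receives unlabeled points and has to discover the two groups on its own.

    # Minimal unsupervised-learning sketch: k-means on two invented blobs of points.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    points = np.vstack([rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
                        rng.normal(loc=5.0, scale=0.5, size=(50, 2))])

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
    print(kmeans.cluster_centers_)        # the centers land near (0, 0) and (5, 5)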

FAQ

  • What is the format of the class? The class will consist of lecture videos, which are broken into small chunks, usually between eight and twelve minutes each. Some of these may contain integrated quiz questions. There will also be standalone quizzes that are not part of video lectures, and programming assignments.
  • How much programming background is needed for the course? The course includes programming assignments, and some programming background will be helpful.
  • Do I need to buy a textbook for the course? No, it is self-contained.
  • Will I get a statement of accomplishment after completing this class? Yes. Students who successfully complete the class will receive a statement of accomplishment signed by the instructor.

Data Mining with Weka MOOC

A video course on using WEKA for data mining

Welcome to the free online course Data Mining with Weka

This 5-week MOOC introduced data mining concepts through practical experience with the free Weka tool.

The course will run again in early March 2014. To get notified about dates (enrolment, commencement), please subscribe to the announcement forum.

You can access the course material (videos, slides, etc.) from here.

Why Predictive Modelers Should be Suspicious of Statistical Tests

An excellent example of spurious correlations

Well, the danger is really not the statistical test per se; it is the interpretation of the statistical test.

Yesterday I tweeted (@deanabb) this fun factoid: “Redskins predict Romney wins POTUS #overfit. if Redskins lose home game before election => challenger wins (17/18) http://www.usatoday.com/story/gameon/2012/11/04/nfl-redskins-rule-romney/1681023/” I frankly had never heard of this “rule” before and found it quite striking. It even has its own Wikipedia page (http://en.wikipedia.org/wiki/Redskins_Rule).

For those of us in the predictive analytics or data mining community, and those of us who use statistical tests to help interpret small data, we know that 17/18 is a hugely significant finding. This can frequently be good: statistical tests will help us gain intuition about the value of relationships in data even when they aren’t obvious.
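
To see why 17 out of 18 looks so compelling, here is a quick back-of-the-envelope calculation (my own, in plain Python) of how likely at least 17 agreements in 18 elections would be if the Redskins result were just a fair coin flip:

    # How surprising is 17 (or more) agreements out of 18 under a fair-coin null?
    from math import comb

    n, k = 18, 17
    p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    print(p_value)   # about 7.2e-05 -- far below any conventional significance threshold

Which is exactly the trap the #overfit hashtag points at: scan enough candidate “rules” and a few will look this significant purely by chance.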

8 Mistakes Our Brains Make

Intuitive decision-making frequently falls into traps

1. We surround ourselves with information that matches our beliefs
2. We believe in the “swimmer’s body” illusion
3. We worry about things we’ve already lost
4. We incorrectly predict odds
5. We rationalize purchases we don’t want
6. We make decisions based on the anchoring effect
7. We believe our memories more than facts
8. We pay more attention to stereotypes than we think

blog: Notícias de Gestão de Projetos

A good blog on project management

Engineer solves Big Data Conjecture

A combinatorics problem in a big data context

IBM Distinguished Engineer solves Big Data Conjecture

A mathematical problem related to big data was solved by Jean-Francois Puget, an engineer in the Solutions Analytics and Optimization group at IBM France. The problem was first mentioned on Data Science Central, and an award was offered to the first data scientist to solve it.

Bryan Gorman, Principal Physicist and Chief Scientist at the Johns Hopkins University Applied Physics Laboratory, made a significant breakthrough in July and won $500. Jean-Francois Puget completely solved the problem, independently of Bryan, and won a $1,000 award.
