Machine Learning MOOC

A very comprehensive machine learning course.

About the Course

Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI. In this class, you will learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself. More importantly, you’ll not only learn about the theoretical underpinnings of learning, but also gain the practical know-how needed to quickly and powerfully apply these techniques to new problems. Finally, you’ll learn about some of Silicon Valley’s best practices in innovation as they pertain to machine learning and AI.

This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition. Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI). The course will also draw from numerous case studies and applications, so that you’ll learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.

FAQ

  • What is the format of the class? The class will consist of lecture videos, which are broken into small chunks, usually between eight and twelve minutes each. Some of these may contain integrated quiz questions. There will also be standalone quizzes that are not part of video lectures, and programming assignments.
  • How much programming background is needed for the course? The course includes programming assignments and some programming background will be helpful.
  • Do I need to buy a textbook for the course? No, it is self-contained.
  • Will I get a statement of accomplishment after completing this class? Yes. Students who successfully complete the class will receive a statement of accomplishment signed by the instructor.


LIBSVM — A Library for Support Vector Machines

Home page of the authors of LIBSVM, the most widely used SVM library.

Chih-Chung Chang and Chih-Jen Lin


Version 3.17 released on April Fools’ day, 2013. We slightly adjust the way class labels are handled internally. By default labels are ordered by their first occurrence in the training set. Hence for a set with -1/+1 labels, if -1 appears first, then internally -1 becomes +1. This has caused confusion. Now for data with -1/+1 labels, we specifically ensure that internally the binary SVM has positive data corresponding to the +1 instances. For developers, see changes in the subroutine svm_group_classes of svm.cpp.
We now have a nice page LIBSVM data sets providing problems in LIBSVM format.
A practical guide to SVM classification is available now! (mainly written for beginners)
LIBSVM tools available now!
We now have an easy script (easy.py) for users who know NOTHING about SVM. It makes everything automatic, from data scaling to parameter selection.
The parameter selection tool grid.py generates the following contour of cross-validation accuracy. To use this tool, you also need to install python and gnuplot.
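
The label-handling change described in the version 3.17 note can be illustrated with a small sketch. This is not LIBSVM's code or API: `group_class_labels` is a hypothetical helper that only mimics the behavior the note describes, on the assumption that the first label in the returned order is the one treated internally as the positive class.

```python
def group_class_labels(labels):
    """Return the distinct labels in the order they would be used internally.

    Hypothetical helper for illustration only (not part of LIBSVM).
    Labels are ordered by first occurrence in the training set, except
    that a -1/+1 problem always places +1 first.
    """
    ordered = []
    for label in labels:
        if label not in ordered:
            ordered.append(label)
    # The special case from the 3.17 note: -1 appeared first in a
    # -1/+1 problem, so swap to keep +1 as the positive class.
    if ordered == [-1, 1]:
        ordered = [1, -1]
    return ordered


# -1 appears first, yet +1 is still placed first.
binary_order = group_class_labels([-1, 1, -1, 1])
# Any other label set keeps plain first-occurrence order.
multi_order = group_class_labels([3, 1, 3, 2])
```

For the authoritative logic, the note itself points developers to svm_group_classes in svm.cpp.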


Cross-validation in RapidMiner

Explains how to use cross-validation in RapidMiner.

Cross-validation is a standard statistical method to estimate the generalization error of a predictive model. In k-fold cross-validation a training set is divided into k equal-sized subsets. Then the following procedure is repeated for each subset: a model is built using the other (k - 1) subsets as the training set and its performance is evaluated on the current subset. This means that each subset is used for testing exactly once. The result of the cross-validation is the average of the performances obtained from the k rounds.
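
The procedure described above can be sketched in plain Python. This is a minimal illustration and has nothing to do with how RapidMiner implements it. The function names, the toy majority-class model and the sample labels are all assumptions made for the example. Folds are taken as contiguous blocks for simplicity, whereas real tools usually shuffle or stratify the data first.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, k, fit, score):
    """Train on k - 1 folds, test on the remaining one, and repeat k times.

    Returns the per-fold scores and their average.
    """
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(
            score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx])
        )
    return scores, mean(scores)


# Toy model (an assumption for illustration): always predict the majority class.
def fit_majority(train_x, train_y):
    return max(set(train_y), key=train_y.count)


def accuracy(model, test_x, test_y):
    return sum(1 for y in test_y if y == model) / len(test_y)


# Made-up labels, used only to exercise the code.
xs = list(range(10))
ys = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
fold_scores, average_score = cross_validate(xs, ys, 5, fit_majority, accuracy)
print(fold_scores, average_score)
```

Each sample lands in exactly one test fold, so every subset is used for testing exactly once and the final figure is the mean of the k per-fold scores.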

This post explains how to interpret cross-validation results in RapidMiner.


wekalist: answers to questions

A discussion forum about the WEKA algorithms.

WEKA

This forum is an archive for the mailing list wekalist@list.scms.waikato.ac.nz. Messages posted here will be sent to this mailing list.

WEKA machine learning software discussion


Presentation Graphs

Good advice on choosing charts.

Presentation graphs are key to effective visualisation, and can demonstrate data in a really engaging way. But with so many graphs to choose from, how do presenters know which one to choose? And how can they make the most of basic graphs to create engaging, truly visual slides?

Allow us to present the m62 guide to presentation graphs. We talk about the different types of graphs, and how best to use them in different situations. All of the graphs listed below can be produced quickly and easily with Microsoft PowerPoint live charts (Insert tab > Chart), but combining these with animation and other PowerPoint tools can produce even more effective graphs that will really engage your audience.


visual exploration of US gun murders

A very dramatic animated visualization.

Information visualization firm Periscopic just published a thoughtful interactive piece on gun murders in the United States, in 2010. It starts with the individuals: when they were killed, coupled with the years they potentially lost. Each arc represents a person, with lived years in orange and the difference in potential years in white. A mouseover on each arc shows more details about that person.

You can then select categories and demographics, which provide comparisons between ethnicities, gun type, sex, and others. Roll over the bar in the middle for a density plot representation.

Finally, specific breakouts on the bottom provide notables in the data and what they mean.

There are many routes that you could take with this data. At its core, it’s a multivariate dataset with many observations over an entire year. But Periscopic pays close attention to the context and the sensitivity of the data. They make the data relatable while also providing a view of the big picture—without stripping away what the data means. See it live here.


FlowingData Tutorials

Excellent tutorials on data visualization.

How to Animate Transitions Between Multiple Charts

Getting Started with Charts in R

How to Make an Interactive Choropleth Map

More on Making Heat Maps in R

Mapping with Diffusion-based Cartograms

How to Make an Interactive Network Visualization

A Variety of Area Charts with R

How to Draw in R and Make Custom Plots

How to Visualize and Compare Distributions

How to Make a Sankey Diagram to Show Flow

Interactive Time Series Chart with Filters

Calendar Heatmaps to Visualize Time Series Data

How to Hand Edit R Plots in Inkscape

How to Make a Contour Map

Using Color Scales and Palettes in R

Build Interactive Time Series Charts with Filters

How to map connections with great circles

How to Make Bubble Charts

How to visualize data with cartoonish faces ala Chernoff

How to: make a scatterplot with a smooth fitted line

An Easy Way to Make a Treemap

How to Make a Heatmap – a Quick and Easy Solution

How to Make an Interactive Area Graph with Flare

How to Make a US County Thematic Map Using Free Tools

How to Make a Graph in Adobe Illustrator

How to Make Your Own Twitter Bot – Python Implementation

Grabbing Weather Underground Data with BeautifulSoup


Five years of traffic fatalities

An example of a "carpet"-style map for chronological and geographic data.

John Nelson extended on that, pulling five years of data and subsetting by some factors: alcohol, weather, and whether a pedestrian was involved. He also aggregated by time of day and day of week instead of calendar dates.

For example, the above is the breakdown of accidents that involved alcohol. As you might expect, there’s a higher count of traffic fatalities during the weekend and late night hours since people don’t have to work the next day. Or you can see when weather is a factor:


Handbook of Statistical Analysis and Data Mining Applications

Complete book on Google Books, covering the links between statistics and data mining.

Table of contents


Data Mining for Business Intelligence

Complete book on Google Books, with a good introduction to data mining.

Table of contents
