Data Mining with Weka MOOC

Posted by Armando Brito Mendes | Filed under Academic Qualifications, teaching materials, software, videos

A video course on using WEKA for data mining

Welcome to the free online course Data Mining with Weka

This 5-week MOOC introduced data mining concepts through practical experience with the free Weka tool.

The course featured:

- video lectures by Professor Ian H. Witten
  - available on YouTube and YouKu
  - English captions on YouTube
  - English & Chinese subtitles on YouKu
  - CC-BY videos & slides
- the open-source Weka data mining platform
- access to chapters from Data Mining (3rd Edition)
  - discounts from Morgan Kaufmann
- online assessment leading to a statement of completion

The course will run again in early March 2014. To get notified about dates (enrolment, commencement), please subscribe to the announcement forum.

You can access the course material (videos, slides, etc.) from here.

Tags: data mining, statistical software, WEKA

December 18th, 2013

WEKA: Remote Experiment

Posted by Armando Brito Mendes | Filed under software


Enables distributed computing using a server running WEKA algorithms

Remote experiments enable you to distribute the computing load across multiple computers. The following discusses their setup and operation with HSQLDB and MySQL.
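Conceptually, a remote experiment splits one Experimenter run into independent jobs that separate machines can execute. The Python sketch below illustrates only that scheduling idea: the function `split_experiment`, the host names and the round-robin policy are hypothetical, and it does not model Weka's remote engine or the HSQLDB/MySQL result database.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Job = Tuple[str, int]  # (data set name, run number)


def split_experiment(datasets: Sequence[str], num_runs: int,
                     hosts: Sequence[str], by: str = "run") -> Dict[str, List[Job]]:
    """Hypothetical round-robin assignment of (data set, run) jobs to hosts.

    by="dataset": each host receives all runs of whole data sets.
    by="run": individual runs are spread across hosts.
    """
    assignment: Dict[str, List[Job]] = {h: [] for h in hosts}
    if by == "dataset":
        for i, ds in enumerate(datasets):
            host = hosts[i % len(hosts)]
            assignment[host].extend((ds, r) for r in range(1, num_runs + 1))
    elif by == "run":
        jobs = list(product(datasets, range(1, num_runs + 1)))
        for i, job in enumerate(jobs):
            assignment[hosts[i % len(hosts)]].append(job)
    else:
        raise ValueError("by must be 'dataset' or 'run'")
    return assignment


print(split_experiment(["iris", "labor"], 3, ["h1", "h2"], by="run"))
```

Splitting by data set keeps all runs of a data set on one machine; splitting by run balances the load better when there are only a few data sets.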

Tags: data analysis, data mining, DW \ BI, WEKA

June 28th, 2013

LIBSVM — A Library for Support Vector Machines

Posted by Armando Brito Mendes | Filed under software

Home page of the authors of LIBSVM, the most widely used library for SVMs


Chih-Chung Chang and Chih-Jen Lin

Version 3.17 was released on April Fools' Day, 2013. We have slightly adjusted the way class labels are handled internally. By default, labels are ordered by their first occurrence in the training set. Hence, for a data set with -1/+1 labels, if -1 appears first, then internally -1 becomes the positive (+1) class. This has caused confusion. Now, for data with -1/+1 labels, we explicitly ensure that internally the binary SVM's positive data correspond to the +1 instances. Developers can see the changes in the subroutine svm_group_classes of svm.cpp.
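The ordering rule described above can be sketched in a few lines of Python. This is a hedged re-implementation of the behaviour as the announcement describes it, not the actual C++ code of `svm_group_classes`; the name `group_classes` is hypothetical. The first label in the returned list plays the role of the positive class in a binary problem.

```python
from typing import Hashable, List, Sequence


def group_classes(labels: Sequence[Hashable]) -> List[Hashable]:
    """Return class labels in the internal order (sketch of the 3.17 rule)."""
    order: List[Hashable] = []
    seen = set()
    for y in labels:
        if y not in seen:  # default: order of first occurrence
            seen.add(y)
            order.append(y)
    # Special case: for -1/+1 data, +1 is always the internal positive class,
    # even when -1 appears first in the training set.
    if len(order) == 2 and set(order) == {-1, 1}:
        order = [1, -1]
    return order


print(group_classes([-1, 1, -1, 1]))  # [1, -1]
print(group_classes([2, 0, 2, 1]))    # [2, 0, 1]
```

Other label sets keep the first-occurrence order, so only the -1/+1 case changes.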
We now have a page, LIBSVM data sets, providing problems in LIBSVM format.
A practical guide to SVM classification is now available (written mainly for beginners).
LIBSVM tools available now!
We now have an easy script (easy.py) for users who know NOTHING about SVMs. It automates everything, from data scaling to parameter selection.
The parameter selection tool grid.py generates a contour plot of cross-validation accuracy. To use this tool, you also need to install Python and gnuplot.
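The idea behind grid.py can be sketched as an exhaustive search over exponentially spaced values of C and gamma, following the grid recommended in the practical guide (C = 2^-5, 2^-3, ..., 2^15; gamma = 2^-15, ..., 2^3). In this Python sketch the scoring callback `cv_accuracy` is a hypothetical stand-in for actually running cross-validation (for example with `svm-train -v 5`), and `toy_accuracy` is an invented surface for illustration; this is not grid.py's own code.

```python
import math
from typing import Callable, List, Tuple


def log2_range(begin: int, end: int, step: int) -> List[int]:
    """Inclusive list of exponents, e.g. (-5, 15, 2) -> [-5, -3, ..., 15]."""
    if step == 0:
        raise ValueError("step must be non-zero")
    values = []
    v = begin
    while (step > 0 and v <= end) or (step < 0 and v >= end):
        values.append(v)
        v += step
    return values


def grid_search(cv_accuracy: Callable[[float, float], float],
                c_range: Tuple[int, int, int] = (-5, 15, 2),
                g_range: Tuple[int, int, int] = (3, -15, -2)) -> Tuple[float, float, float]:
    """Return (C, gamma, accuracy) with the highest cross-validation accuracy."""
    best = (math.nan, math.nan, -math.inf)
    for log2c in log2_range(*c_range):
        for log2g in log2_range(*g_range):
            c, g = 2.0 ** log2c, 2.0 ** log2g
            acc = cv_accuracy(c, g)
            if acc > best[2]:  # keep the first best pair on ties
                best = (c, g, acc)
    return best


# Toy accuracy surface peaking at C = 2**3, gamma = 2**-5 (illustration only).
def toy_accuracy(c: float, g: float) -> float:
    return 90.0 - (math.log2(c) - 3) ** 2 - (math.log2(g) + 5) ** 2


print(grid_search(toy_accuracy))  # (8.0, 0.03125, 90.0)
```

Searching exponents rather than raw values covers several orders of magnitude with only 11 x 10 = 110 evaluations.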

Tags: knowledge capture, data mining, optimization, R-software, RapidMiner, WEKA

May 30th, 2013

wekalist: answers to questions

Posted by Armando Brito Mendes | Filed under software

A discussion forum about the WEKA algorithms

WEKA

This forum is an archive for the mailing list wekalist@list.scms.waikato.ac.nz. Messages posted here will be sent to this mailing list.

WEKA machine learning software discussion

Tags: knowledge capture, data mining, WEKA

May 23rd, 2013

WEKA Cost Benefit Analysis

Posted by Armando Brito Mendes | Filed under SAD - DSS, software, visualization

Cost-benefit analysis for model evaluation

The Cost/Benefit analysis component is a new visualization tool that was released in Weka versions 3.6.2 and 3.7.1. The tool is particularly useful for the analysis of predictive analytic outcomes for direct mail campaigns (or any ranking application where costs are involved). It allows the user to explore various cost/benefit tradeoffs by interactively selecting different population sizes from the ranked list of prospects or by varying the threshold on the predicted probability of the positive class.
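To make the population-size and threshold trade-off concrete, here is a simplified Python sketch. It assumes a hypothetical two-number cost model (a fixed cost per mailed prospect and a fixed benefit per responding donor) rather than the full cost/benefit matrix used by Weka's tool; the function names and the sample scores are illustrative only.

```python
from typing import List, Sequence


def cost_benefit_curve(scores: Sequence[float], labels: Sequence[int],
                       cost_per_contact: float,
                       benefit_per_positive: float) -> List[float]:
    """Net benefit of contacting the top-k ranked prospects, for k = 0..n.

    labels are 1 for a responder (donor) and 0 otherwise.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    curve = [0.0]
    positives = 0
    for k, (_, y) in enumerate(ranked, start=1):
        positives += y
        curve.append(positives * benefit_per_positive - k * cost_per_contact)
    return curve


def net_benefit_at_threshold(scores: Sequence[float], labels: Sequence[int],
                             threshold: float, cost_per_contact: float,
                             benefit_per_positive: float) -> float:
    """Net benefit of contacting everyone whose predicted probability >= threshold."""
    contacted = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(contacted) * benefit_per_positive - len(contacted) * cost_per_contact


scores = [0.9, 0.8, 0.3, 0.1]  # predicted P(donor), made-up values
labels = [1, 0, 1, 0]
curve = cost_benefit_curve(scores, labels, cost_per_contact=1.0, benefit_per_positive=10.0)
best_k = max(range(len(curve)), key=curve.__getitem__)
print(curve, best_k)  # [0.0, 9.0, 8.0, 17.0, 16.0] 3
print(net_benefit_at_threshold(scores, labels, 0.5, 1.0, 10.0))  # 8.0
```

Choosing a population size k and choosing a probability threshold are two views of the same ranked list: every threshold corresponds to contacting some top-k prefix.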

The Cost/Benefit analysis tool is available from both the Explorer and Knowledge Flow user interfaces. In the figure below, the Knowledge Flow is being used to build a predictive model for a real-world direct mail application. The data is historical campaign data from a mail-out soliciting donations to a charitable organization. The data set contains 47,706 records with 476 variables (summary variables for donor lifetime giving history, overlay demographics, etc.). Approximately 5% of the records are donors. A 10-fold cross-validation is used to generate predictions from a naive Bayes classifier, and these are then passed to the Cost/Benefit analysis tool.
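The pipeline in this paragraph, where 10-fold cross-validation produces naive Bayes probability estimates for every record, can be sketched in plain Python. This is not Weka's implementation: it is a minimal categorical naive Bayes with add-one smoothing, it uses simple random (unstratified) folds whereas Weka stratifies folds for a nominal class, and all function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    seen_values = defaultdict(set)       # attribute index -> values seen in training
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, y)][v] += 1
            seen_values[j].add(v)
    return class_counts, value_counts, seen_values


def prob_of_class(model, row, target):
    """Posterior P(target | row) with add-one smoothing; one extra slot per
    attribute reserves probability mass for values unseen in training."""
    class_counts, value_counts, seen_values = model
    if target not in class_counts:
        return 0.0
    n = sum(class_counts.values())
    log_scores = {}
    for c, nc in class_counts.items():
        s = math.log(nc / n)
        for j, v in enumerate(row):
            k = len(seen_values[j]) + 1
            s += math.log((value_counts[(j, c)][v] + 1) / (nc + k))
        log_scores[c] = s
    m = max(log_scores.values())  # subtract the max for numerical stability
    total = sum(math.exp(s - m) for s in log_scores.values())
    return math.exp(log_scores[target] - m) / total


def cross_val_probabilities(rows, labels, target, folds=10, seed=1):
    """Out-of-fold estimates of P(target | row): each record is scored by a
    model trained on the other folds only."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    probs = [0.0] * len(rows)
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = train_nb([rows[i] for i in train], [labels[i] for i in train])
        for i in test:
            probs[i] = prob_of_class(model, rows[i], target)
    return probs


rows = [("a",)] * 10 + [("b",)] * 10
labels = [1] * 10 + [0] * 10
print(cross_val_probabilities(rows, labels, target=1))
```

The resulting out-of-fold probabilities, paired with the true labels, are the kind of ranked scores a cost/benefit analysis works on; because each record is scored by a model that never saw it, the ranking is not inflated by training-set fit.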

Tags: data mining, WEKA

February 25th, 2013