WEKA: Remote Experiment
Posted by Armando Brito Mendes | Filed under software
Remote experiments enable you to distribute the computing load across multiple computers. In the following we will discuss the setup and operation for HSQLDB and MySQL.
Tags: análise de dados, data mining, DW \ BI, WEKA
Survs: Ferramenta para inquéritos on-line
Posted by Armando Brito Mendes | Filed under estatística, software
Create online surveys with your team easily and efficiently.
Survs is a web-based tool to create, distribute, and analyze online surveys. Its friendly interface and compelling features provide everything you need to get feedback.
Tags: inquéritos, software estatístico
List of R Resources
Posted by Armando Brito Mendes | Filed under estatística, materiais para profissionais, software
There is a wealth of resources on the Web and elsewhere to learn more about R. Here are some of the best.
Tags: data mining, Estat Descritiva, R-software, software estatístico
LIBSVM — A Library for Support Vector Machines
Posted by Armando Brito Mendes | Filed under software
LIBSVM — A Library for Support Vector Machines
Chih-Chung Chang and Chih-Jen Lin
Version 3.17 released on April Fools’ day, 2013. We slightly adjust the way class labels are handled internally. By default labels are ordered by their first occurrence in the training set. Hence for a set with -1/+1 labels, if -1 appears first, then internally -1 becomes +1. This has caused confusion. Now for data with -1/+1 labels, we specifically ensure that internally the binary SVM has positive data corresponding to the +1 instances. For developers, see changes in the subrouting svm_group_classes of svm.cpp.
We now have a nice page LIBSVM data sets providing problems in LIBSVM format.
A practical guide to SVM classification is available now! (mainly written for beginners)
LIBSVM tools available now!
We now have an easy script (easy.py) for users who know NOTHING about svm. It makes everything automatic–from data scaling to parameter selection.
The parameter selection tool grid.py generates the following contour of cross-validation accuracy. To use this tool, you also need to install python and gnuplot.
Tags: captura de conhecimento, data mining, otimização, R-software, RapidMiner, WEKA
Cross-validation in RapidMiner
Posted by Armando Brito Mendes | Filed under software
Cross-validation is a standard statistical method to estimate the generalization error of a predictive model. In -fold cross-validation a training set is divided into equal-sized subsets. Then the following procedure is repeated for each subset: a model is built using the other subsets as the training set and its performance is evaluated on the current subset. This means that each subset is used for testing exactly once. The result of the cross-validation is the average of the performances obtained from the rounds.
This post explains how to interpret cross-validation results in RapidMiner.
Tags: captura de conhecimento, data mining, RapidMiner
wekalist – resposta a questões
Posted by Armando Brito Mendes | Filed under software
WEKA
WEKA machine learning software discussion
Tags: captura de conhecimento, data mining, WEKA
WEKA Cost Benefit Analysis
Posted by Armando Brito Mendes | Filed under SAD - DSS, software, visualização
The Cost/Benefit analysis component is a new visualization tool that was released in Weka versions 3.6.2 and 3.7.1. The tool is particularly useful for the analysis of predictive analytic outcomes for direct mail campaigns (or any ranking application where costs are involved). It allows the user to explore various cost/benefit tradeoffs by interactively selecting different population sizes from the ranked list of prospects or by varying the threshold on the predicted probability of the positive class.
The Cost/Benefit analysis tool is available from both the Explorer and Knowledge Flow user interfaces. In the figure below, the Knowledge Flow is being used to build a predictive model for a real-world direct mail application. The data is historical campaign data from a mail out to solicit donations to a charitable organization. The data set contains 47,706 records with 476 variables (summary variables for donor lifetime giving history, overlay demographics etc.). The percentage of donors in the data is approximately 5%. A 10-fold cross-validation is used to generate predictions from a naive Bayes classifier, and these are then passed to the Cost/Benefit analysis tool.
Tags: data mining, WEKA
Mathematical Programming in MathProg
Posted by Armando Brito Mendes | Filed under Investigação Operacional, SAD - DSS, software
GNU MathProg (part of the open source GNU GLPK project) is a modeling language for describing linear and discrete optimization problems. Use this page to create and solve MathProg models using the glpk.js solver. You can either open a model from your computer, select an example from the menu bar, or create your own from scratch. To use —
- Create a MathProg model in the Model Editor.
- Click ‘Solve’ to generate the solution.
- Click ‘Save As…’ to save the model locally to your computer.
Tags: otimização, software de otimização
How to: network animation with R and the iGraph
Posted by Armando Brito Mendes | Filed under ARS - SNA, software
This article lists the steps I take to create a network animation in R, provides some example source code that you can copy and modify for your own work, and starts a discussion about programming and visualization as an interpretive approach in research. Before I start, take a look at this network animation created with R and the iGraph package. This animation is of a retweet network related to #BankTransferDay. Links (displayed as lines) are retweets, nodes (displayed as points) are user accounts. For each designated period of time (in this case, an hour), retweets are drawn and then fade out over 24 hours.
Tags: captura de conhecimento, data mining, grafos, R-software
Advanced Statistics
Posted by Armando Brito Mendes | Filed under estatística, materiais ensino, software, visualização
Welcome to Malbowges, the part of Nether Hell dominated by thieves, counsellors of Fraud (or should that just be counsellors), falsifiers and sowers of discord. It’s not a nice place for Sunday lunch. You must wade through rivers of Lucifer’s sputum to reach the answers you seek, and when you find those answers, you’ll probably wish you hadn’t bothered. Revenge is mine, ah ha ha, yah ha ha, ya ha ha ha ha ha ha ha ha ha …
Tags: captura de conhecimento, data mining, decisão médica, IBM SPSS Statistics, inferência, software estatístico