WEKA: Remote Experiment
Posted by Armando Brito Mendes | Filed under software
Remote experiments enable you to distribute the computing load across multiple computers. In the following we will discuss the setup and operation for HSQLDB and MySQL.
Tags: data analysis, data mining, DW \ BI, WEKA
Survs: A Tool for Online Surveys
Posted by Armando Brito Mendes | Filed under statistics, software
Create online surveys with your team easily and efficiently.
Survs is a web-based tool to create, distribute, and analyze online surveys. Its friendly interface and compelling features provide everything you need to get feedback.
Tags: surveys, statistical software
List of R Resources
Posted by Armando Brito Mendes | Filed under statistics, materials for professionals, software
There is a wealth of resources on the Web and elsewhere to learn more about R. Here are some of the best.
Tags: data mining, descriptive statistics, R-software, statistical software
Introduction to R for SAS and SPSS Users
Posted by Armando Brito Mendes | Filed under statistics, materials for professionals
R is free software for data analysis and graphics that is similar to SAS and SPSS. Two million people are part of the R Open Source Community. Its use is growing very rapidly and Revolution Analytics distributes a commercial version of R that adds capabilities that are not available in the Open Source version. This 60-minute webinar is for people who are familiar with SAS or SPSS who want to know how R can strengthen their analytics strategy. It will include:
- What R is and how it compares to SAS and SPSS
- An overview of how to install and maintain it
- How to find R add-on modules comparable to those for SAS and SPSS
- Which of R’s many user interfaces are most like those of SAS and SPSS
- How to run R from within SAS and SPSS
- What a simple R program looks like
- Q&A with Bob Muenchen
Replay the webcast and find out how SAS and SPSS users can take advantage of R.
Tags: data mining, IBM SPSS Statistics, R-software, statistical software
Using Metadata to Find Paul Revere
Posted by Armando Brito Mendes | Filed under SNA, visualization
It’s just metadata. What can you do with that? Kieran Healy, a sociology professor at Duke University, shows what you can do with just some basic social network analysis. Using metadata from Paul Revere’s Ride on the groups that people belonged to, Healy sniffs out Paul Revere as a main target. Bonus points for writing the summary from the point of view of an 18th-century analyst.
What a nice picture! The analytical engine has arranged everyone neatly, picking out clusters of individuals and also showing both peripheral individuals and—more intriguingly—people who seem to bridge various groups in ways that might perhaps be relevant to national security. Look at that person right in the middle there. Zoom in if you wish. He seems to bridge several groups in an unusual (though perhaps not unique) way. His name is Paul Revere.
You can grab the R code and dataset on github, too, if you want to follow along.
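As a rough illustration of the idea (this is not Healy's R code), the sketch below turns group-membership metadata into person-to-person ties by counting shared groups, then ranks people by how many others they are tied to. The names and clubs are made-up toy data, not the Paul Revere dataset, and degree is only one of several centrality measures one could use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy metadata: who belongs to which group.
memberships = {
    "person_a": {"club_1", "club_2", "club_3"},
    "person_b": {"club_1"},
    "person_c": {"club_2"},
    "person_d": {"club_3"},
    "person_e": {"club_1", "club_2"},
}

def co_membership(memberships):
    """Count the groups shared by each pair of people (pairs with none are omitted)."""
    shared = Counter()
    for p, q in combinations(sorted(memberships), 2):
        n = len(memberships[p] & memberships[q])
        if n:
            shared[(p, q)] = n
    return shared

def degree(shared):
    """Number of distinct people each person is tied to through a shared group."""
    deg = Counter()
    for p, q in shared:
        deg[p] += 1
        deg[q] += 1
    return deg

shared = co_membership(memberships)
ranking = degree(shared).most_common()
print(ranking[0])  # the person who bridges the most others in this toy data
```

Nothing here looks at what anyone said or did; membership lists alone are enough to single out the person who connects otherwise separate groups.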
Tags: SNA applications, SNA intro, beautiful, graphs
Stupid Calculations
Posted by Armando Brito Mendes | Filed under mathematics, visualization
Josh Orter takes back-of-the-napkin math to the next level with Stupid Calculations, which promises to turn practical facts into utterly useless ones. Stupid calculation number one is the size of a giant iPhone screen if you combined all the iPhone screens ever sold into one.
The eye-glazing calculations are laid out below for those who appreciate the dirty work but, skipping ahead, the Kubrick-inspired monophone would stretch 5,059 feet into the sky and have a base measuring 2,846 feet across (Central Park is 2,640 feet wide). Its surface area would take in 2.07 billion square inches. That’s 14.39 million square feet or 330.54 acres. The new World Trade Center, by comparison, will have a surface area of 23 glass-clad acres, giving us enough screenage to watch Game of Thrones on all four sides of fourteen WTCs.
See also how long it would take to drink the water in an Olympic-sized pool through a straw.
Tags: beautiful, descriptive statistics
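The unit conversions in the quoted passage are easy to re-check. The sketch below starts from the rounded 2.07 billion square inches, so it lands slightly below the quoted 14.39 million square feet and 330.54 acres, which presumably come from an unrounded figure. The constant and function names are just illustrative.

```python
SQ_IN_PER_SQ_FT = 144      # 12 in x 12 in
SQ_FT_PER_ACRE = 43_560    # definition of the acre
WTC_ACRES = 23             # glass-clad surface area quoted in the post

def square_inches_to_acres(sq_in):
    """Convert square inches to (square feet, acres)."""
    sq_ft = sq_in / SQ_IN_PER_SQ_FT
    return sq_ft, sq_ft / SQ_FT_PER_ACRE

sq_ft, acres = square_inches_to_acres(2.07e9)
towers = int(acres // WTC_ACRES)  # whole towers that could be covered

print(f"{sq_ft / 1e6:.2f} million sq ft, {acres:.2f} acres, {towers} towers")
```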
Map Blog Dashboard
Posted by Armando Brito Mendes | Filed under videos, visualization
Videos uploaded within 48 hours may not yet appear in age and gender breakdowns.
Tags: beautiful
Agile & Scrum Portugal
Posted by Armando Brito Mendes | Filed under materials for professionals, planning
Agile & Scrum Portugal 2013 is getting ready to be another awesome event!
This year’s program combines AgilePT with the ScrumPT annual gathering, so it will accommodate the interests of the whole agile community in Portugal.
Tags: project management
LIBSVM — A Library for Support Vector Machines
Posted by Armando Brito Mendes | Filed under software
Chih-Chung Chang and Chih-Jen Lin
Version 3.17 released on April Fools’ day, 2013. We slightly adjust the way class labels are handled internally. By default labels are ordered by their first occurrence in the training set. Hence for a set with -1/+1 labels, if -1 appears first, then internally -1 becomes +1. This has caused confusion. Now for data with -1/+1 labels, we specifically ensure that internally the binary SVM has positive data corresponding to the +1 instances. For developers, see changes in the subroutine svm_group_classes of svm.cpp.
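The label handling described in this release note can be pictured with a small sketch. This is a plain Python illustration, not LIBSVM's C++ code: the function name mirrors svm_group_classes only for readability, and putting +1 first is an assumption about how "positive data corresponding to the +1 instances" is achieved.

```python
def group_classes(labels):
    """Return the distinct class labels in the order the classifier would use."""
    order = []
    for y in labels:
        if y not in order:          # default: order by first occurrence
            order.append(y)
    # Adjustment described for 3.17 (assumed form): with -1/+1 data,
    # make +1 the first, i.e. internally positive, class.
    if len(order) == 2 and order[0] == -1 and order[1] == 1:
        order = [1, -1]
    return order

print(group_classes([-1, 1, 1, -1]))  # -1 appears first, but +1 is placed first
print(group_classes([3, 1, 3, 2]))    # other label sets keep first-occurrence order
```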
We now have a nice page LIBSVM data sets providing problems in LIBSVM format.
A practical guide to SVM classification is available now! (mainly written for beginners)
LIBSVM tools available now!
We now have an easy script (easy.py) for users who know NOTHING about svm. It makes everything automatic–from data scaling to parameter selection.
The parameter selection tool grid.py generates a contour plot of cross-validation accuracy. To use this tool, you also need to install python and gnuplot.
Tags: knowledge capture, data mining, optimization, R-software, RapidMiner, WEKA
Cross-validation in RapidMiner
Posted by Armando Brito Mendes | Filed under software
Cross-validation is a standard statistical method to estimate the generalization error of a predictive model. In k-fold cross-validation a training set is divided into k equal-sized subsets. Then the following procedure is repeated for each subset: a model is built using the other k-1 subsets as the training set and its performance is evaluated on the current subset. This means that each subset is used for testing exactly once. The result of the cross-validation is the average of the performances obtained from the k rounds.
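The procedure can be sketched in a few lines of plain Python. This is a minimal illustration, not how RapidMiner implements it: the folds are contiguous and unshuffled, the "model" is a hypothetical majority-class predictor, and the data are toy values chosen only to make the rounds easy to follow.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, test on the held-out fold, and average the k scores."""
    results = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        results.append(score(model, [xs[i] for i in test_idx],
                             [ys[i] for i in test_idx]))
    return mean(results), results

def fit_majority(xs, ys):
    """Toy 'model': always predict the most frequent training label."""
    return max(set(ys), key=ys.count)

def accuracy(model, xs, ys):
    """Fraction of test labels equal to the predicted label."""
    return sum(1 for y in ys if y == model) / len(ys)

xs = list(range(10))
ys = ["a"] * 7 + ["b"] * 3
average, per_fold = cross_validate(xs, ys, 5, fit_majority, accuracy)
print(per_fold, average)
```

The per-fold scores vary a lot on this tiny example, which is why the averaged figure, rather than any single round, is reported as the result.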
This post explains how to interpret cross-validation results in RapidMiner.