Novel Views: Les Miserables
Posted by Armando Brito Mendes | Filed under SAD - DSS, visualização
Jeff Clark took a detailed look at Victor Hugo’s Les Miserables via character mentions, word connections, and word usage. The above is character mentions with color showing sentiment. Red means negative, and blue positive.
Characters are listed from top to bottom in their order of appearance. The horizontal space is segmented into the 5 volumes of the novel. Each volume is subdivided further with a faint line indicating the various books and, finally, small rectangles indicate the chapters within the books. In the 5 volumes there are a total of 48 books and 365 chapters. The height of the small rectangles indicate how frequently that character is mentioned in that particular chapter.
There’s a good amount of blue towards the end, when everyone decides everyone else isn’t so bad.
Tags: captura de conhecimento, data mining, Estat Descritiva, text mining
How to: network animation with R and the iGraph
Posted by Armando Brito Mendes | Filed under ARS - SNA, software
This article lists the steps I take to create a network animation in R, provides some example source code that you can copy and modify for your own work, and starts a discussion about programming and visualization as an interpretive approach in research. Before I start, take a look at this network animation created with R and the iGraph package. This animation is of a retweet network related to #BankTransferDay. Links (displayed as lines) are retweets, nodes (displayed as points) are user accounts. For each designated period of time (in this case, an hour), retweets are drawn and then fade out over 24 hours.
Tags: captura de conhecimento, data mining, grafos, R-software
Women as Academic Authors, 1665-2010
Posted by Armando Brito Mendes | Filed under estatística, visualização
The Chronicle of Higher Education has a look at the percentage of academic papers published by women, over the past five centuries.
The articles and authors described in this data were drawn from the corpus of JSTOR, a digital archive of scholarly papers, by researchers at the Eigenfactor Project at the University of Washington. About two million articles, representing 1765 fields and sub-fields, were examined, spanning a period from 1665 to 2011. The data are presented here for three time periods, the latest one ending in 2010, and a view that combines all periods.
Percentage of female authors is on the horizontal, and each bubble is a subfield sized by total number of authors. The graphic starts with publishing for all years, but be sure to click on the tabs for each time span to see changes.
The data is based on the archive of about two million articles from JSTOR, and a hierarchical map equation method is used to determine subfields.
The gender classification they used for names seems like it could be nifty for some applications. Gender is inferred by comparing names against the ones kept by the U.S. Social Security Administration, which includes gender. If a name was used for female at least 95 percent of the time, it was classified as a female name, and the same was done with male. Anything ambiguous was not included in the study
Tags: análise de dados, captura de conhecimento, data mining, Estat Descritiva
Advanced Statistics
Posted by Armando Brito Mendes | Filed under estatística, materiais ensino, software, visualização
Welcome to Malbowges, the part of Nether Hell dominated by thieves, counsellors of Fraud (or should that just be counsellors), falsifiers and sowers of discord. It’s not a nice place for Sunday lunch. You must wade through rivers of Lucifer’s sputum to reach the answers you seek, and when you find those answers, you’ll probably wish you hadn’t bothered. Revenge is mine, ah ha ha, yah ha ha, ya ha ha ha ha ha ha ha ha ha …
Tags: captura de conhecimento, data mining, decisão médica, IBM SPSS Statistics, inferência, software estatístico
Introductory Statistics
Posted by Armando Brito Mendes | Filed under estatística, materiais ensino, videos, visualização
Welcome to Limbo, where the lustful, gluttonous and wrathful wander in endless torment. Here you can uncover the searing agony of SPSS, the stomach churning fear of central tendency and the rancid bile of z-scores. Good luck, you’ll need it.
Tags: análise de dados, captura de conhecimento, data mining, decisão médica, Estat Descritiva, IBM SPSS Statistics, inferência, inquéritos, software estatístico
Determining the Number of Components and Factors Using Parallel Analysis and Velicer’s MAP Test
Posted by Armando Brito Mendes | Filed under estatística, software, visualização
Popular statistical software packages do not have the proper procedures for determining the number of components or factors in correlation matrices. Parallel analysis and Velicer’s minimum average partial (MAP) test are validated procedures that are widely recommended by statisticians. This paper described brief and efficient programs for conducting parallel analyses and the MAP test using SPSS, SAS, and MATLAB.
Scale development using popular statistical software packages often produces results that are baffling or misunderstood by many users, which can lead to inappropriate substantive interpretations and item selection decisions. High internal consistencies do not indicate unidimensionality; item-total correlations are inflated because each item is correlated with its own error as well as the common variance among items; and the default number-of-eigenvalues-greater-than-one rule, followed by principal components analysis and varimax rotation, produces inflated loadings and the possible appearance of numerous uncorrelated factors for items that measure the same construct (Gorsuch, 1997a, 1997b). Concerned investigators may then neglect the higher order general factor in their data as they use misleading statistical output to trim items and fashion unidimensional scales.
These problems can be circumvented in exploratory factor analysis by using more appropriate factor analytic procedures and by using extension analysis as the basis for adding items to scales. Extension analysis provides correlations between nonfactored items and the factors that exist in a set of core items. The extension item correlations are then used to decide which factor, if any, a prospective item belongs to. The decisions are unbiased because factors are defined without being influenced by the extension items. One can also examine correlations between extension items and any higher order factor(s) in the core items. The end result is a comprehensive, undisturbed, and informative picture of the correlational structure that exists in a set of core items and of the potential contribution and location of additional items to the structure.
Tags: análise de dados, captura de conhecimento, decisão médica, desnvolvimento de software, IBM SPSS Statistics, software estatístico
SPSS Programs for Analyzing Lag-Sequential Categorical Data
Posted by Armando Brito Mendes | Filed under estatística, materiais ensino, software
This paper describes simple and flexible programs for conducting lag sequential event analyses using SAS and SPSS. The programs read a stream of codes and produce a variety of lag sequential statistics, including transitional frequencies, expected transitional frequencies, transitional probabilities, z values, adjusted residuals, Yule’s Q values, likelihood ratio tests of stationarity across time and homogeneity across groups or segments, transformed kappas for unidirectional dependence, bidirectional dependence, parallel and nonparallel dominance, and significance levels based on both parametric and randomization tests.
Tags: captura de conhecimento, data mining, desnvolvimento de software, IBM SPSS Statistics, previsão, qualidade, software estatístico
Survey Design Tips
Posted by Armando Brito Mendes | Filed under estatística, materiais para profissionais
Survey Design Tips
Writing great questions is an art that like all arts requires a great amount of work, practice, and help from others. The following discussion is one that identifies some of the common pitfalls in survey design and helps in creating a great survey.
Avoid loaded or leading words or questions
Misplaced questions
Mutually non-exclusive response categories
Nonspecific questions
Confusing or unfamiliar words
Non-directed questions give respondents excessive latitude
Forcing answers
Non-exhaustive listings
Unbalanced listings
Double barreled questions
Dichotomous questions
Long questions
Questions on future intentions
Tags: captura de conhecimento, inquéritos
Survey Design
Posted by Armando Brito Mendes | Filed under estatística, materiais para profissionais
The Steps in a Survey Project
- Establish the goals of the project – What you want to learn
- Determine your sample – Whom you will interview
- Choose interviewing methodology – How you will interview
- Create your questionnaire – What you will ask
- Pre-test the questionnaire, if practical – Test the questions
- Conduct interviews and enter data – Ask the questions
- Analyze the data – Produce the reports
Tags: captura de conhecimento, inquéritos
SPSS Topics Logistic Regression
Posted by Armando Brito Mendes | Filed under estatística, materiais ensino, software
SPSS Topics
Logistic Regression
These pages contain links from all parts of our web site and others web sites on logistic and related types of regression. The topics will vary from introductory to advanced.
Data Analysis Examples
Annotated Output
SPSS Frequently Asked Questions
Textbook Examples
SPSS Library
SPSS Code Fragments
Tags: análise de dados, captura de conhecimento, data mining, IBM SPSS Statistics, inferência, inquéritos, software estatístico