Novel Views: Les Miserables

Visualizações de dados inovadoras baseadas em text mining

Visualizações de dados inovadoras baseadas em text mining

Jeff Clark took a detailed look at Victor Hugo’s Les Miserables via character mentions, word connections, and word usage. The above is character mentions with color showing sentiment. Red means negative, and blue positive.

Characters are listed from top to bottom in their order of appearance. The horizontal space is segmented into the 5 volumes of the novel. Each volume is subdivided further with a faint line indicating the various books and, finally, small rectangles indicate the chapters within the books. In the 5 volumes there are a total of 48 books and 365 chapters. The height of the small rectangles indicate how frequently that character is mentioned in that particular chapter.

There’s a good amount of blue towards the end, when everyone decides everyone else isn’t so bad.

Tags: , , ,

How to: network animation with R and the iGraph

Construir animações de iformação relacional com R

Construir animações de informação relacional com R

This article lists the steps I take to create a network animation in R, provides some example source code that you can copy and modify for your own work, and starts a discussion about programming and visualization as an interpretive approach in research. Before I start, take a look at this network animation created with R and the iGraph package. This animation is of a retweet network related to #BankTransferDay. Links (displayed as lines) are retweets, nodes (displayed as points) are user accounts. For each designated period of time (in this case, an hour), retweets are drawn and then fade out over 24 hours.

Tags: , , ,

Women as Academic Authors, 1665-2010

Exemplo de representação gráfica interativa com bolhas num eixo

Exemplo de representação gráfica interativa com bolhas num eixo

The Chronicle of Higher Education has a look at the percentage of academic papers published by women, over the past five centuries.

The articles and authors described in this data were drawn from the corpus of JSTOR, a digital archive of scholarly papers, by researchers at the Eigenfactor Project at the University of Washington. About two million articles, representing 1765 fields and sub-fields, were examined, spanning a period from 1665 to 2011. The data are presented here for three time periods, the latest one ending in 2010, and a view that combines all periods.

Percentage of female authors is on the horizontal, and each bubble is a subfield sized by total number of authors. The graphic starts with publishing for all years, but be sure to click on the tabs for each time span to see changes.

The data is based on the archive of about two million articles from JSTOR, and a hierarchical map equation method is used to determine subfields.

The gender classification they used for names seems like it could be nifty for some applications. Gender is inferred by comparing names against the ones kept by the U.S. Social Security Administration, which includes gender. If a name was used for female at least 95 percent of the time, it was classified as a female name, and the same was done with male. Anything ambiguous was not included in the study

Tags: , , ,

Advanced Statistics

Bons slides e outros materiais sobre clusters, AFE, SEM, reg logistica, meta-análise, MANOVA, Reliability

Bons slides e outros materiais sobre clusters, AFE, SEM, reg logistica, meta-análise, MANOVA, Reliability

Welcome to Malbowges, the part of Nether Hell dominated by thieves, counsellors of Fraud (or should that just be counsellors), falsifiers and sowers of discord. It’s not a nice place for Sunday lunch. You must wade through rivers of Lucifer’s sputum to reach the answers you seek, and when you find those answers, you’ll probably wish you hadn’t bothered. Revenge is mine, ah ha ha, yah ha ha, ya ha ha ha ha ha ha ha ha ha …

Tags: , , , , ,

Introductory Statistics

Muito boas aulas e slides sobre testes não paramétricos e SPSS

Muito boas aulas e slides sobre testes não paramétricos e SPSS

Welcome to Limbo, where the lustful, gluttonous and wrathful wander in endless torment. Here you can uncover the searing agony of SPSS, the stomach churning fear of central tendency and the rancid bile of z-scores. Good luck, you’ll need it.

Tags: , , , , , , , ,

Determining the Number of Components and Factors Using Parallel Analysis and Velicer’s MAP Test

Softwere para análise do nº de componentes em ACP e AF

Softwere para análise do nº de componentes em ACP e AF

Popular statistical software packages do not have the proper procedures for determining the number of components or factors in correlation matrices. Parallel analysis and Velicer’s minimum average partial (MAP) test are validated procedures that are widely recommended by statisticians. This paper described brief and efficient programs for conducting parallel analyses and the MAP test using SPSS, SAS, and MATLAB.

Métodos para facilitar a interpretação da AFE

Métodos para facilitar a interpretação da AFE

Scale development using popular statistical software packages often produces results that are baffling or misunderstood by many users, which can lead to inappropriate substantive interpretations and item selection decisions. High internal consistencies do not indicate unidimensionality; item-total correlations are inflated because each item is correlated with its own error as well as the common variance among items; and the default number-of-eigenvalues-greater-than-one rule, followed by principal components analysis and varimax rotation, produces inflated loadings and the possible appearance of numerous uncorrelated factors for items that measure the same construct (Gorsuch, 1997a, 1997b). Concerned investigators may then neglect the higher order general factor in their data as they use misleading statistical output to trim items and fashion unidimensional scales.

These problems can be circumvented in exploratory factor analysis by using more appropriate factor analytic procedures and by using extension analysis as the basis for adding items to scales. Extension analysis provides correlations between nonfactored items and the factors that exist in a set of core items. The extension item correlations are then used to decide which factor, if any, a prospective item belongs to. The decisions are unbiased because factors are defined without being influenced by the extension items. One can also examine correlations between extension items and any higher order factor(s) in the core items. The end result is a comprehensive, undisturbed, and informative picture of the correlational structure that exists in a set of core items and of the potential contribution and location of additional items to the structure.

Tags: , , , , ,

SPSS Programs for Analyzing Lag-Sequential Categorical Data

Programas em SPSS e SAS para análise de séries categóricas

Programas em SPSS e SAS para análise de séries categóricas

This paper describes simple and flexible programs for conducting lag sequential event analyses using SAS and SPSS.  The programs read a stream of codes and produce a variety of lag sequential statistics, including transitional frequencies, expected transitional frequencies, transitional probabilities, z values, adjusted residuals, Yule’s Q values, likelihood ratio tests of stationarity across time and homogeneity across groups or segments, transformed kappas for unidirectional dependence, bidirectional dependence, parallel and nonparallel dominance, and significance levels based on both parametric and randomization tests.

Tags: , , , , , ,

Survey Design Tips

Conselho no desenho de questões de um questionário

Conselho no desenho de questões de um questionário

Survey Design Tips
Writing great questions is an art that like all arts requires a great amount of work, practice, and help from others. The following discussion is one that identifies some of the common pitfalls in survey design and helps in creating a great survey.

Avoid loaded or leading words or questions
Misplaced questions
Mutually non-exclusive response categories
Nonspecific questions
Confusing or unfamiliar words
Non-directed questions give respondents excessive latitude
Forcing answers
Non-exhaustive listings
Unbalanced listings
Double barreled questions
Dichotomous questions
Long questions
Questions on future intentions

Tags: ,

Survey Design

Bons conselhos para realização de inquéritos

Bons conselhos para realização de inquéritos

The Steps in a Survey Project

  1. Establish the goals of the project – What you want to learn
  2. Determine your sample – Whom you will interview
  3. Choose interviewing methodology – How you will interview
  4. Create your questionnaire – What you will ask
  5. Pre-test the questionnaire, if practical – Test the questions
  6. Conduct interviews and enter data – Ask the questions
  7. Analyze the data – Produce the reports

Tags: ,

SPSS Topics Logistic Regression

Vários materiais sobre Regressão Logística

Vários materiais sobre Regressão Logística

SPSS Topics
Logistic Regression

These pages contain links from all parts of our web site and others web sites on logistic and related types of regression.  The topics will vary from introductory to advanced.
Data Analysis Examples
Annotated Output
SPSS Frequently Asked Questions
Textbook Examples
SPSS Library
SPSS Code Fragments

Tags: , , , , , ,