Portal smartdatacollective.com
Posted by Armando Brito Mendes | Filed under materials for professionals
SmartData Collective, an online community moderated by Social Media Today, provides enterprise leaders access to the latest trends in Business Intelligence and Data Management. Our innovative model serves as a platform for recognized, global experts to share their insights through peer contributions, custom content publishing and alignment with industry leaders. SmartData Collective is a key resource for executives who need to make informed data management decisions.
Tags: data analysis, big data, bioinformatics, knowledge capture, data mining, group decision-making
Useful Videos on Information Visualization
Posted by Armando Brito Mendes | Filed under statistics, videos, visualization
Noah Iliinsky – Data Visualizations Done Wrong – A Beautiful Collection of Stories and Tips for Success.
The Four Pillars of Data Visualization
Designing Data Visualizations with Noah Iliinsky
Best Practices for Data Visualization
Designing Data Visualizations
Seeing the Story in the Data and Learning to Effectively Communicate – Inspired by Stephen Few Principles, Visualization Guru
David McCandless: “The beauty of data visualization” – Data Detective Telling Stories From Visualization of Information
This also has a nice quiz about visualization principles.
As I collect more, I will consolidate this list.
Tags: beautiful, big data, data mining, image mining
selfiecity
Posted by Armando Brito Mendes | Filed under videos, visualization
Investigating the style of self-portraits (selfies) in five cities across the world.
Selfiecity investigates selfies using a mix of theoretic, artistic and quantitative methods:
- We present our findings about the demographics of people taking selfies, their poses and expressions.
- Rich media visualizations (imageplots) assemble thousands of photos to reveal interesting patterns.
- The interactive selfiexploratory allows you to navigate the whole set of 3200 photos.
- Finally, theoretical essays discuss selfies in the history of photography, the functions of images in social media, and methods and dataset.
Selfiecity, from Lev Manovich, Moritz Stefaner, and a small group of analysts and researchers, is a detailed visual exploration of 3,200 selfies from five major cities around the world. The project is both a broad look at demographics and trends, as well as a chance to look closer at the individual observations.
There are several components to the project, but Imageplots (which you might recognize from a couple years ago) and the exploratory section, aptly named Selfiexploratory, will be of most interest.
The two parts let you filter through cities (Bangkok, Berlin, Moscow, New York, and Sao Paulo), age, gender, pose, mood, and a number of other factors, and this information is presented in a grid layout that self-updates as you browse.
So you can get a rough sense of how facets relate. There seems to be a higher proportion of female selfies, and average age seems to skew younger, as you’d expect. The average age of females in this selfie sample seems to be younger than that of males.
However, before you jump to too many conclusions about how countries vary or differences between the sexes, etc., consider the classification process, which was a combination of manual labor via Mechanical Turk and face recognition software. Age, for example, can be tough to estimate from pictures alone since you have outside factors like makeup, angles, and poses. Do these things account for the two- to three-year average difference between the sexes? Maybe. So consider the data. But that should go without saying.
That said, Selfiecity is a fun one I spent a good amount of time browsing. It’s a weird, tiny peek into 3,200 people’s lives, with a dose of quant and art. And don’t miss the theoretical component in essay format, a reflection of social media, communities, and the self.
Tags: beautiful, data mining, image mining
Data Intelligence and Analytics Resources
Posted by Armando Brito Mendes | Filed under materials for professionals, software, videos, visualization
3. Big Data
- Practical illustration of Map-Reduce (Hadoop-style), on real data
- A synthetic variance designed for Hadoop and big data
- Fast Combinatorial Feature Selection with New Definition of Predictive Power
4. Visualization
- Detecting Patterns with the Naked Eye
- 50+ Open Source Tools for Big Data
- 40 maps that explain the world
5. Best and Worst of Data Science
- 175 Analytic and Data Science Web Sites
- 6000 Companies Hiring Data Scientists
- 100 data science, analytics, big data, visualization books
6. New Analytics Start-up Ideas
- Uniquely identify a human being with two questions
- Selling data
- A new type of weapons-grade secure email
- R in your Browser
7. Rants about Healthcare, Education, etc.
- Why statistical community is disconnected from Big Data and how to fix it
- How to eliminate a trillion dollars in healthcare costs
- Job interview question: what is wrong with this picture?
8. Career Stuff, Training, Salary Surveys
- 17 short tutorials all data scientists should read (and practice)
- Why Companies can’t find analytic talent
- Six categories of data scientists …
9. Miscellaneous
- One Page R: A Survival Guide to Data Science with R
- Boosting Algorithms for Better Predictions
- Structuredness coefficient to find patterns and associations
10. DSC Webinar Series – with video access
- Predictive Analytics with Revolution Analytics and Hortonworks, The…
- BI For Big Data
- The Value of a Modern Data Architecture with Apache Hadoop and Tera…
- Accelerating Big Data
Tags: big data, knowledge capture, data mining, R-software
17 short tutorials all data scientists should read
Posted by Armando Brito Mendes | Filed under statistics, materials for professionals
Here’s the list:
- Practical illustration of Map-Reduce (Hadoop-style), on real data
- A synthetic variance designed for Hadoop and big data
- Fast Combinatorial Feature Selection with New Definition of Predict…
- A little known component that should be part of most data science a…
- 11 Features any database, SQL or NoSQL, should have
- Clustering idea for very large datasets
- Hidden decision trees revisited
- Correlation and R-Squared for Big Data
- Marrying computer science, statistics and domain expertize
- New pattern to predict stock prices, multiplies return by factor 5
- What Map Reduce can’t do
- Excel for Big Data
- Fast clustering algorithms for massive datasets
- Source code for our Big Data keyword correlation API
- The curse of big data
- How to detect a pattern? Problem and solution
- Interesting Data Science Application: Steganography
Related link: The Data Science Toolkit
Tags: data analysis, big data, knowledge capture, data mining, Excel, R-software
Little Book of R for Time Series!
Posted by Armando Brito Mendes | Filed under statistics, software
- How to install R
- Using R for Time Series Analysis
Tags: forecasting, R-software
Resource types in Project
Posted by Armando Brito Mendes | Filed under Operations Research, materials for professionals, planning
Resource types in Project: work, material, and cost. In recent articles here on Blogtek we have looked at the configuration steps to take before starting to enter tasks, at costs, and at calendars; today we will see how the resource types in Project can be configured.
Tags: project management
Analytic Hierarchy Process (AHP)
Posted by Armando Brito Mendes | Filed under MCDA - multicriteria, Operations Research, planning, DSS
- To disseminate knowledge and resources on the Analytic Hierarchy Process (AHP) based Multi Criteria Decision Making (MCDM) technique
- To create a forum for AHP users
- To disseminate AHP related activities taking place globally
- To help people and institutions make complex decisions
Tags: group decision-making, medical decision-making
How many statisticians does it take to split a bill?
Posted by Armando Brito Mendes | Filed under statistics, software
Some thoughts on the Fall term, now that Spring is well under way [edit: added a few more points]:
- RMarkdown and knitr are amazing. When I next teach a course using R, my students will be turning in homeworks using these tools: The output immediately shows whether the code runs and what its results are. This is much better than students copying and pasting possibly-broken code and unconnected output into a text file or (gasp) Word document.
- I’m glad my cohort socializes outside the office, taking each other out for birthday lunches or going to see a Pirates game. Some of the older PhD students are so focused on their thesis work that they don’t take time for a social break, and I’d like to avoid getting stuck in that rut.
However! Our lunches always lead us back to the age old question: How many statisticians does it take to split a bill? Answer: too long. I threw together a Shiny app, DinneR, to help us answer this question.
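For a sense of the arithmetic involved, here is a minimal sketch in Python. It is not the DinneR app or its code: the function name `split_bill`, the equal split of shared items, and applying the tip to the pre-tax subtotal are all assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

def split_bill(personal, shared_total, tax_rate, tip_rate):
    """Split a restaurant bill (hypothetical helper, not part of DinneR).

    personal:     dict mapping each name to the pre-tax cost of their own items
    shared_total: pre-tax cost of items shared equally by everyone
    tax_rate, tip_rate: fractions such as "0.07"; both are applied to the
                        pre-tax subtotal (an assumption, conventions vary)
    All amounts are passed as strings so Decimal can represent them exactly.
    """
    n = len(personal)
    multiplier = Decimal("1") + Decimal(tax_rate) + Decimal(tip_rate)
    cent = Decimal("0.01")
    shares = {}
    for name, own in personal.items():
        # Each person pays for their own items plus an equal part of the shared ones.
        subtotal = Decimal(own) + Decimal(shared_total) / n
        # Round each share to the nearest cent.
        shares[name] = (subtotal * multiplier).quantize(cent, rounding=ROUND_HALF_UP)
    return shares

shares = split_bill(
    {"Ana": "12.00", "Ben": "18.00", "Cy": "10.00"},
    shared_total="9.00",
    tax_rate="0.07",
    tip_rate="0.18",
)
for name, amount in shares.items():
    print(name, amount)
```

Because each share is rounded to the cent separately, the shares can differ from the exact total by a cent or two on other inputs; someone still has to cover the difference.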
Tags: big data, data mining, R-software, statistical software
Using Dates and Times in R
Posted by Armando Brito Mendes | Filed under statistics, software
Today at the Davis R Users’ Group, Bonnie Dixon gave a tutorial on the various ways to handle dates and times in R. Bonnie provided this great script which walks through essential classes, functions, and packages. Here it is piped through knitr::spin. The original R script can be found as a gist here.