PlotDevice: Draw with Python
Posted by Armando Brito Mendes | Filed under estatística, materiais para profissionais, software, visualização
A library of Python functions for building data visualizations.
You’ve been able to visualize data with Python for a while, but Mac application PlotDevice from Christian Swinehart couples code and graphics more tightly. Write code on the right. Watch graphics change on the left.
The application gives you everything you need to start writing programs that draw to a virtual canvas. It features a text editor with syntax highlighting and tab completion plus a zoomable graphics viewer and a variety of export options.
PlotDevice’s simple but comprehensive set of graphics commands will be familiar to users of similar graphics tools like NodeBox or Processing. And if you’re new to programming, you’ll find there’s nothing better than being able to see the results of your code as you learn to think like a computer.
Looks promising. Although when I downloaded it and tried to run it, nothing happened. I’m guessing there are still compatibility issues to iron out at version 0.9.4. Hopefully that clears up soon. [via Waxy]
Tags: big data, data mining, desenvolvimento de software, Estat Descritiva
How People in America Spend Their Day
Posted by Armando Brito Mendes | Filed under estatística, visualização
An area chart as a way to visualize how Americans spend their time over the course of the day.
From Shan Carter, Amanda Cox, Kevin Quealy, and Amy Schoenfeld of The New York Times is this new interactive stacked time series on how different groups in America spend their day. The data itself comes from the American Time Use Survey. The interactive has a similar feel to Martin Wattenberg’s Baby Name Voyager, but it has the NYT pizazz that we’ve all come to know and love.
Explore time use by gender, race, age, education, and employment. View all activities (e.g. work, traveling) or select a specific action to drill down into the graph. From there, you’ll find time aggregates that you can compare against depending on what filter you’ve selected.
Tags: belo, big data, data mining, Estat Descritiva
Big data: The next frontier for innovation
Posted by Armando Brito Mendes | Filed under materiais para profissionais
The amount of data in our world has been exploding, and analyzing large data sets—so-called big data—will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus, according to research by MGI and McKinsey’s Business Technology Office. Leaders in every sector will have to grapple with the implications of big data, not just a few data-oriented managers. The increasing volume and detail of information captured by enterprises, the rise of multimedia, social media, and the Internet of Things will fuel exponential growth in data for the foreseeable future.
Tags: big data, data mining, modelos empresariais
Why use R? Five reasons
Posted by Armando Brito Mendes | Filed under materiais para profissionais, software
In this post I will go through 5 reasons: zero cost, crazy popularity, awesome power, dazzling flexibility, and mind-blowing support. I believe R is the best statistical programming language to learn. As a blogger who has contributed over 150 posts in Stata and over 100 in R, I have extensive experience with both a proprietary statistical programming language and the open source alternative. In my graduate career I have also had the opportunity to experiment with the proprietary software SPSS, SAS, Mathematica, and MPlus.
Tags: big data, definição, R-software, software estatístico
9 “must read” articles on Big Data
Posted by Armando Brito Mendes | Filed under estatística, materiais para profissionais, visualização
My selection
- Big Data – From Descriptive to Prescriptive
- Can big data be racist?
- NodeXL Graph Gallery: Graph Details
- Best Metrics For Digital Marketing: Rock Your Own And Rent Strategies
- Big Data: from mining to meaning
- Beautiful versus useful visualizations (in French, but interesting)
- Learning and Teaching Machine Learning: A Personal Journey
- Big data techniques and technologies
- The Sexiest Job of the 21st Century is Tedious, and that Needs to C… (*)
- From the trenches: 360-degree data science
(*) I disagree with this Harvard Business Review author. Senior data scientists work on high level data from various sources, use automated processes for EDA (exploratory analysis) and spend little to no time in tedious, routine, mundane tasks (less than 5% of my time, in my case). I also use robust techniques that work well on relatively dirty data, and … I create and design the data myself in many cases.
Tags: belo, big data, data mining, Estat Descritiva, grafos
Portal smartdatacollective.com
Posted by Armando Brito Mendes | Filed under materiais para profissionais
SmartData Collective, an online community moderated by Social Media Today, provides enterprise leaders access to the latest trends in Business Intelligence and Data Management. Our innovative model serves as a platform for recognized, global experts to share their insights through peer contributions, custom content publishing and alignment with industry leaders. SmartData Collective is a key resource for executives who need to make informed data management decisions.
Tags: análise de dados, big data, bioinformatica, captura de conhecimento, data mining, decisao em grupo
Useful Videos on Information Visualization
Posted by Armando Brito Mendes | Filed under estatística, videos, visualização
Noah Iliinsky – Data Visualizations Done Wrong – A Beautiful Collection of Stories and Tips for Success.
The Four Pillars of Data Visualization
Designing Data Visualizations with Noah Iliinsky
Best Practices for Data Visualization
Designing Data Visualizations
Seeing the Story in the Data and Learning to Effectively Communicate – Inspired by Stephen Few Principles, Visualization Guru
David McCandless: “The beauty of data visualization” – Data Detective Telling Stories From Visualization of Information
This also has a nice quiz about visualization principles.
As I collect more, I will consolidate this list.
Tags: belo, big data, data mining, image mining
Data Intelligence and Analytics Resources
Posted by Armando Brito Mendes | Filed under materiais para profissionais, software, videos, visualização
3. Big Data
- Practical illustration of Map-Reduce (Hadoop-style), on real data
- A synthetic variance designed for Hadoop and big data
- Fast Combinatorial Feature Selection with New Definition of Predictive Power
4. Visualization
- Detecting Patterns with the Naked Eye
- 50+ Open Source Tools for Big Data
- 40 maps that explain the world
5. Best and Worst of Data Science
- 175 Analytic and Data Science Web Sites
- 6000 Companies Hiring Data Scientists
- 100 data science, analytics, big data, visualization books
6. New Analytics Start-up Ideas
- Uniquely identify a human being with two questions
- Selling data
- A new type of weapons-grade secure email
- R in your Browser
7. Rants about Healthcare, Education, etc.
- Why statistical community is disconnected from Big Data and how to fix it
- How to eliminate a trillion dollars in healthcare costs
- Job interview question: what is wrong with this picture?
8. Career Stuff, Training, Salary Surveys
- 17 short tutorials all data scientists should read (and practice)
- Why Companies can’t find analytic talent
- Six categories of data scientists …
9. Miscellaneous
- One Page R: A Survival Guide to Data Science with R
- Boosting Algorithms for Better Predictions
- Structuredness coefficient to find patterns and associations
10. DSC Webinar Series – with video access
- Predictive Analytics with Revolution Analytics and Hortonworks, The…
- BI For Big Data
- The Value of a Modern Data Architecture with Apache Hadoop and Tera…
- Accelerating Big Data
Tags: big data, captura de conhecimento, data mining, R-software
17 short tutorials all data scientists should read
Posted by Armando Brito Mendes | Filed under estatística, materiais para profissionais
Here’s the list:
- Practical illustration of Map-Reduce (Hadoop-style), on real data
- A synthetic variance designed for Hadoop and big data
- Fast Combinatorial Feature Selection with New Definition of Predict…
- A little known component that should be part of most data science a…
- 11 Features any database, SQL or NoSQL, should have
- Clustering idea for very large datasets
- Hidden decision trees revisited
- Correlation and R-Squared for Big Data
- Marrying computer science, statistics and domain expertize
- New pattern to predict stock prices, multiplies return by factor 5
- What Map Reduce can’t do
- Excel for Big Data
- Fast clustering algorithms for massive datasets
- Source code for our Big Data keyword correlation API
- The curse of big data
- How to detect a pattern? Problem and solution
- Interesting Data Science Application: Steganography
Related link: The Data Science Toolkit
Tags: análise de dados, big data, captura de conhecimento, data mining, Excel, R-software
How many statisticians does it take to split a bill?
Posted by Armando Brito Mendes | Filed under estatística, software
Some thoughts on the Fall term, now that Spring is well under way [edit: added a few more points]:
- RMarkdown and knitr are amazing. When I next teach a course using R, my students will be turning in homeworks using these tools: The output immediately shows whether the code runs and what its results are. This is much better than students copying and pasting possibly-broken code and unconnected output into a text file or (gasp) Word document.
- I’m glad my cohort socializes outside the office, taking each other out for birthday lunches or going to see a Pirates game. Some of the older PhD students are so focused on their thesis work that they don’t take time for a social break, and I’d like to avoid getting stuck in that rut.
However! Our lunches always lead us back to the age old question: How many statisticians does it take to split a bill? Answer: too long. I threw together a Shiny app, DinneR, to help us answer this question.
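The post does not say how DinneR does its arithmetic, so the following is only a minimal sketch of one common way to split a bill, not the app's method. It assumes each diner pays for their own items plus a share of tax and tip proportional to what they ordered. The function name `split_bill`, its parameters, and the diner names are all hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def split_bill(subtotals, tax, tip):
    """Return what each diner owes: own items plus a proportional share of tax and tip.

    Assumption (not from the post): tax and tip are shared in proportion
    to each diner's subtotal.
    """
    # Convert via str() so float inputs such as 4.80 become exact decimals.
    subtotals = {name: Decimal(str(amount)) for name, amount in subtotals.items()}
    extras = Decimal(str(tax)) + Decimal(str(tip))
    food_total = sum(subtotals.values(), Decimal("0"))
    if food_total <= 0:
        raise ValueError("the bill needs at least one positive subtotal")

    owed = {
        name: (amount + extras * amount / food_total).quantize(CENT, rounding=ROUND_HALF_UP)
        for name, amount in subtotals.items()
    }

    # Rounding each share to the cent can leave the sum slightly off the bill total;
    # assign that leftover to the diner with the largest subtotal.
    bill_total = (food_total + extras).quantize(CENT, rounding=ROUND_HALF_UP)
    leftover = bill_total - sum(owed.values(), Decimal("0"))
    biggest = max(subtotals, key=subtotals.get)
    owed[biggest] += leftover
    return owed


if __name__ == "__main__":
    shares = split_bill({"Ana": 12.00, "Ben": 18.00, "Cy": 30.00}, tax=4.80, tip=12.00)
    for name, amount in shares.items():
        print(f"{name}: {amount}")
```

With subtotals of 12.00, 18.00 and 30.00, tax of 4.80 and a tip of 12.00, the shares come to 15.36, 23.04 and 38.40, which add up to the 76.80 bill. An even split would be simpler, but it is also the kind of choice that restarts the lunch-table argument.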
Tags: big data, data mining, R-software, software estatístico