U.S. CENSUS DATA FOR SOCIAL, ECONOMIC, AND HEALTH RESEARCH
Posted by Armando Brito Mendes | Filed under Data Science, data sets, statistics
A good source of U.S. census and health data.
IPUMS USA collects, preserves and harmonizes U.S. census microdata and provides easy access to this data with enhanced documentation. Data includes decennial censuses from 1790 to 2010 and American Community Surveys (ACS) from 2000 to the present.
Use it for GOOD — never for EVIL
Tags: census data, data, health
THREE RESOURCES TO STUDY TIME USE
Posted by Armando Brito Mendes | Filed under data sets
A good source of free data.
IPUMS provides census and survey data from around the world integrated across time and space. IPUMS integration and documentation make it easy to study change, conduct comparative research, merge information across data types, and analyze individuals within family and community contexts. Data and services are available free of charge.
Tags: census, maps, microdata, health, time, multiple countries
2020 Census Results
Posted by Armando Brito Mendes | Filed under Data Science, data sets, statistics
A source of U.S. census data.
Decennial Census P.L. 94-171 Redistricting Data Summary Files
Includes the official data, documentation, and support materials to assist in accessing P.L. 94-171 Redistricting Data.
2020 Census Apportionment Results
On August 12, we released the redistricting data to the states and the public. States may use these data in redrawing congressional, legislative, and local district boundaries. The Census Bureau will also deliver the final redistricting data toolkit to all states and the public by September 30. COVID-19-related delays and prioritizing the delivery of these apportionment results delayed our original plan.
More 2020 Census population results will be available later including statistics on age, sex, race and ethnicity, and details about the center of population. The results for the U.S. Island Areas will also be provided in a separate release at a later date.
Tags: census, data, population
European Marine Observation and Data Network (EMODnet)
Posted by Armando Brito Mendes | Filed under Data Science, data sets
A good source of ocean data.
Bathymetry
Data on bathymetry (water depth), coastlines, and the geographical location of underwater features such as wrecks.
Biology
Data on temporal and spatial distribution of species abundance and biomass from several taxa.
Chemistry
Data on the concentration of nutrients, organic matter, pesticides, heavy metals, radionuclides and antifoulants in water, sediment and biota.
Geology
Data on seabed substrate, sea-floor geology, coastal behaviour, geological events, and minerals.
Human activities
Data on the intensity and spatial extent of human activities at sea.
Physics
Data on salinity, temperature, waves, currents, sea-level, light attenuation, and FerryBoxes.
Seabed habitats
Data, maps and models on the spatial distribution and extent of seabed habitats and communities.
Tags: data, sea, ocean
spatula for writing maintainable web scrapers
Posted by Armando Brito Mendes | Filed under Data Science, data sets, software
A good Python library for web scraping.
spatula is a modern Python library for writing maintainable web scrapers.
Source: https://github.com/jamesturk/spatula
Documentation: https://jamesturk.github.io/spatula/
Issues: https://github.com/jamesturk/spatula/issues
All the passes
Posted by Armando Brito Mendes | Filed under data sets, reports, visualization
A visualization of more than 882 thousand football passes.
A visualisation of 882,536 passes from 890 matches played in various major leagues/cups such as
- Champions League 1999
- FA Women’s Super League 2018
- FIFA World Cup 2018
- La Liga 2004 – 2020
- NWSL 2018
- Premier League 2003 – 2004
- Women’s World Cup 2019
Data provided by StatsBomb
Original inspiration: Alexander Varlamov’s blog post
Tags: football, dot plot, movement
Seeing How Much We Ate Over the Years
Posted by Armando Brito Mendes | Filed under Data Science, data sets, visualization
An excellent report on what Americans have been eating for as long as records exist. Excellent charts.
The United States Department of Agriculture keeps track of food availability for over 200 items, which can be used to estimate food consumption at the national level. They have data for 1970 through 2019, so we can, for example, see how much beef Americans consume per year on average and how that has changed over nearly five decades.
So that’s what I did.
How long will chicken reign supreme? Who wins between lemon and lime? Is nonfat ice cream really ice cream? Does grapefruit ever make a comeback? Find out in the charts below.
The rankings are broken into six main food groups: proteins, vegetables, fruits, dairy, grains, and added fats.
Tags: alimentação, area charts, food
Why Am I Numb To The Numbers?
Posted by Armando Brito Mendes | Filed under Data Science, data sets, statistics, visualization
On the numbness caused by big numbers.
COMIC: For My Job, I Check Death Tolls From COVID. Why Am I Numb To The Numbers?
April 25, 2021, 8:16 AM ET
Each week I check the latest deaths from COVID-19 for NPR. After a while, I didn’t feel any sorrow at the numbers. I just felt numb. I wanted to understand why — and how to overcome that numbness.
Tags: comics, big number dumb, psychology
data.world
Posted by Armando Brito Mendes | Filed under data sets
https://data.world/datasets/us-crime
An open data repository on crime in the U.S.
There are 21 US crime datasets available on data.world.
Find open data about US crime contributed by thousands of users and organizations across the world.
TOP OPEN DATA TOPICS
waterfowl (5557)
geodata (6752)
transportation (6398)
geospatial (3771)
wildlife (2176)
health (3294)
ngda (3911)
transect (2872)
statistic (3122)
active (3022)
oregon (2214)
cso (3142)
environment (3601)
hxl (9076)
education (3705)
boundaries (2608)
completed (10013)
inlandwaters (2401)
doi (14169)
statbank (3110)
Tags: data
OpenML Data
Posted by Armando Brito Mendes | Filed under Data Science, data sets
A good site with lots of data for machine learning.
Only showing active (verified) datasets.
3239 results
credit-g (1) This dataset classifies people described by a set of attributes as good or bad credit risks. This dataset comes with a cost matrix:

                 Good  Bad  (predicted)
  Good (actual)     0    1
  Bad (actual)      5    0

It is worse…
505934 runs, 19 likes, 239 downloads, 258 reach, 28 impact
1000 instances – 21 features – 2 classes – 0 missing values
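The cost matrix quoted in the credit-g description can be applied to a set of predictions as in the minimal sketch below. It assumes the matrix is read with actual classes as rows and predicted classes as columns, so a bad risk predicted as good costs 5 and a good risk predicted as bad costs 1. The function name `total_cost` and the sample labels are hypothetical, made up for illustration; they are not part of OpenML or the dataset.

```python
# Cost matrix from the credit-g description, keyed by (actual, predicted).
# Assumption: rows are actual classes, columns are predicted classes.
COST = {
    ("good", "good"): 0,
    ("good", "bad"): 1,
    ("bad", "good"): 5,
    ("bad", "bad"): 0,
}


def total_cost(actual, predicted):
    """Sum the misclassification cost over paired actual/predicted labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(COST[(a, p)] for a, p in zip(actual, predicted))


# Hypothetical labels, invented for illustration only (not dataset rows).
actual = ["good", "good", "bad", "bad"]
predicted = ["good", "bad", "good", "bad"]
print(total_cost(actual, predicted))
```

With these four made-up cases the total is 6: one good applicant rejected (cost 1) plus one bad applicant accepted (cost 5). A classifier tuned for plain accuracy would treat both mistakes equally, which is why the matrix matters when comparing models on this dataset.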
blood-transfusion-service-center (1) Data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan; this is a classification problem. To demonstrate the RFMTC marketing model (a modified version of RFM), this study…
467766 runs, 5 likes, 86 downloads, 91 reach, 41 impact
748 instances – 5 features – 2 classes – 0 missing values
monks-problems-2 (1) Once upon a time, in July 1991, the monks of Corsendonk Priory were faced with a school held in their priory, namely the 2nd European Summer School on Machine Learning. After listening more than one…
394293 runs, 2 likes, 27 downloads, 29 reach, 37 impact
601 instances – 7 features – 2 classes – 0 missing val