Repositório de Dados da Universidade do Minho
Posted by Armando Brito Mendes | Filed under data sets
A data repository from the University of Minho
For sharing, publishing, and managing research data.
Tags: data, research
WildChat
Posted by Armando Brito Mendes | Filed under data sets, LLMs
A data set with one million ChatGPT questions and answers
The WildChat Dataset is a corpus of 1 million real-world user-ChatGPT interactions, characterized by a wide range of languages and a diversity of user prompts. It was constructed by offering free access to ChatGPT and GPT-4 in exchange for consensual chat history collection. Using this dataset, we finetuned Meta’s Llama-2 and created WildLlama-7b-user-assistant, a chatbot which is able to predict both user prompts and assistant responses.
To learn more: dataset / model / paper
National Longitudinal Surveys
Posted by Armando Brito Mendes | Filed under data sets, visualization
Data from American surveys, with very large files
Accessing NLS Data
Public-Use Data
NLS public-use data for each cohort are available at no cost via Investigator, an online search and extraction site that enables you to review NLS variables and create your own data sets. It is not necessary to get an account to browse data, but an account is necessary to save datasets online.
The Investigator User’s Guide describes how to use this website.
An available tutorial also teaches how to search for variables in the Investigator.
For users who have the capacity to utilize extremely large data files and the programs to handle them, downloads are available for NLSY97, NLSY79, and NLSY79 Child and Young Adult.
Tags: surveys, labor, statistics
NBA Apps and data Database
Posted by Armando Brito Mendes | Filed under data sets, statistics, visualization
A list of sites with basketball data, visualizations, and apps
Sravan January 10, 2024 [NBA] #apps #shiny
This database has a list of apps and websites related to NBA Data and Visualizations.
Tags: app, basketball, data, charts
Survey of Consumer Finances (SCF)
Posted by Armando Brito Mendes | Filed under data sets
The 2022 Survey of Consumer Finances (SCF) is the most recent survey conducted. Below are links to the bulletin article, interactive chartbook, historical bulletin tables, full public dataset, extract dataset, replicate weight files, and documentation.
How to get a notification of changes: If you would like to receive notification about additions to the web page and updates to these surveys, please sign our guest book.
How to send a comment or question: To send a comment about the SCF website or to make technical inquiries about the SCF, please fill out our feedback form. To ensure that your question is properly routed, please select the Survey of Consumer Finances as the “Economic Data” and select no other options above the field labeled “Type your message.”
Tags: data, finance, survey
Data for Tat
Posted by Armando Brito Mendes | Filed under data sets
Data for: Tat will tell: Tattoos and time preferences
Published: 21 October 2019 | Version 1 | DOI: 10.17632/p7xw6yvd5c.1
Contributors: Bradley Ruffle,
Description
Dataset in Stata 10 format, collected from incentivized experiments and survey.
Categories: Economics, Social Psychology
This dataset is supplement to
figshare – a home for research outputs
Posted by Armando Brito Mendes | Filed under Data Science, data sets, statistics
An excellent source of data and studies
The repository built to showcase all of your institution’s research outputs in one place.
Tags: data, studies, research
zenodo – open science
Posted by Armando Brito Mendes | Filed under data sets
An excellent site with loads of data of all kinds
Passionate about Open Science!
Built and developed by researchers, to ensure that everyone can join in Open Science.
The OpenAIRE project, in the vanguard of the open access and open data movements in Europe, was commissioned by the EC to support their nascent Open Data policy by providing a catch-all repository for EC funded research. CERN, an OpenAIRE partner and pioneer in open source, open access and open data, provided this capability and Zenodo was launched in May 2013.
In support of its research programme, CERN has developed tools for Big Data management and extended Digital Library capabilities for Open Data. Through Zenodo these Big Science tools could be effectively shared with the long tail of research.
Open Science knows no borders!
The need for a catch-all is not restricted to one funder, or one nation, so the concept caught on, and Zenodo rapidly started welcoming research from all over the world, and from every discipline.
The digital revolution has necessitated a retooling of the scholarly processes to handle data and software, but this is proceeding at varying speeds across different communities, disciplines, and nations. To ensure no one is left behind through lack of access to the necessary tools and resources, Zenodo makes the sharing, curation and publication of data and software a reality for all researchers.
Where is there more livestock than people?
Posted by Armando Brito Mendes | Filed under Data Science, data sets, GIS maps, visualization
Continuing my investigation of the USDA Quickstats site I first used here…
Notes on inspiration
I was first inspired to do this piece when I saw these analogous maps for France:
I figured that the USDA data I’d already been digging into had to have the data for the USA, and in fact, it did!
The data has holes in it: a county may appear one year but not the next. I got around this by using the most recent post-2010 data available for each county+animal type. When comparing these values to the human population, I made sure to use the ACS data for that same year.
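The gap-handling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's actual code: the `Inventory` record, the function names, the county identifiers, and every number are invented, and "post-2010" is read here as 2011 onward.

```python
from typing import NamedTuple


class Inventory(NamedTuple):
    county: str  # hypothetical county identifier
    animal: str
    year: int
    head: int    # number of animals reported


def latest_per_county_animal(rows, min_year=2011):
    """Keep the most recent row from min_year onward for each (county, animal)."""
    latest = {}
    for row in rows:
        if row.year < min_year:
            continue
        key = (row.county, row.animal)
        if key not in latest or row.year > latest[key].year:
            latest[key] = row
    return latest


def animals_per_person(latest, population):
    """Divide each head count by the population of the same county and year.

    `population` maps (county, year) to a resident count. Pairs without a
    matching population figure are skipped rather than guessed.
    """
    ratios = {}
    for key, row in latest.items():
        people = population.get((row.county, row.year))
        if people:
            ratios[key] = row.head / people
    return ratios


# Invented toy values, for illustration only.
rows = [
    Inventory("county_a", "hogs", 2007, 90_000),
    Inventory("county_a", "hogs", 2012, 120_000),
    Inventory("county_a", "hogs", 2017, 150_000),
    Inventory("county_b", "hogs", 2012, 40_000),    # no 2017 report: a "hole"
    Inventory("county_b", "cattle", 2009, 5_000),   # only pre-2011 data: dropped
]
population = {("county_a", 2017): 7_500, ("county_b", 2012): 4_000}

latest = latest_per_county_animal(rows)
ratios = animals_per_person(latest, population)
```

Keying the population lookup on the year of the selected livestock row is what keeps each comparison within a single year, even when neighbouring counties end up using different years.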
The aesthetics came together very quickly. I considered doing the same thing as Jules Grandin and keeping the maps ultra simple, but ultimately couldn’t resist showing the animal:human ratios instead of just which counties had more animals.
The first map ended up scratching that “ultra simple” itch, but with a bit of a twist. I chose not to show ratios in that one because it already has so much going on; I think adding gradients of color would just have made it hard to read. I’m also quite proud of my Venn diagram legend there!
Tags: animals, descriptive statistics, maps
Beyond the Top 1000 Names
Posted by Armando Brito Mendes | Filed under data sets
A database of names given to Americans over time
To provide popular names and maintain an acceptable performance level on our servers, we provide only the top 1000 names through our forms. However, we provide almost all names for researchers interested in naming trends.
To safeguard privacy, we exclude from these files certain names that would indicate, or would allow the ability to determine, names with fewer than 5 occurrences in any geographic area. We provide these data on a national, state-specific, and territory-specific basis, in three separate collections of files, each zipped into a single file. The format of the data in the three file collections is described in a “readme” file contained in the respective zip files.
- National data (7Mb)
- State-specific data (21Mb)
- Territory-specific data (300Kb)
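As a rough sketch of how one of these zipped collections might be read, the Python below opens a text file inside a zip archive and totals the counts per name. It assumes comma-separated lines of the form name,sex,count; that layout is an assumption here, and the readme inside each zip remains the authority on the real format. The member name `yob_sample.txt` and all rows are invented so that the example runs offline.

```python
import csv
import io
import zipfile


def read_name_counts(archive, member):
    """Return (name, sex, count) tuples from one text file inside a zip.

    Assumes comma-separated lines of the form name,sex,count.
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return [(name, sex, int(count)) for name, sex, count in csv.reader(text)]


# Build a tiny in-memory archive with invented rows (hypothetical file name).
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr("yob_sample.txt", "Alex,M,12\nAlex,F,7\nSam,F,5\n")
sample.seek(0)

records = read_name_counts(sample, "yob_sample.txt")

# Total occurrences per name, summed across both sexes.
totals = {}
for name, sex, count in records:
    totals[name] = totals.get(name, 0) + count
```

With a downloaded archive, the same function could be pointed at the zip file's path and one of its member names. Because names with fewer than 5 occurrences are excluded, no count in the real files should fall below 5.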