The WildChat Dataset is a corpus of 1 million real-world user-ChatGPT interactions, characterized by a wide range of languages and a diversity of user prompts. It was constructed by offering free access to ChatGPT and GPT-4 in exchange for consensual chat history collection. Using this dataset, we finetuned Meta’s Llama-2 and created WildLlama-7b-user-assistant, a chatbot which is able to predict both user prompts and assistant responses.
To learn more: dataset / model / paper


National Longitudinal Surveys

Accessing NLS Data

Public-Use Data

NLS public-use data for each cohort are available at no cost via Investigator, an online search and extraction site that enables you to review NLS variables and create your own data sets. It is not necessary to get an account to browse data, but an account is necessary to save datasets online.

The Investigator User’s Guide describes how to use this website.

A tutorial is also available that teaches how to search for variables in Investigator.

For users who have the capacity to work with extremely large data files and the programs to handle them, downloads are available for NLSY97, NLSY79, and NLSY79 Child and Young Adult.


NBA Apps and data Database

A list of sites with data, visualizations, and apps about basketball

Sravan January 10, 2024 [NBA] #apps #shiny

This database lists apps and websites related to NBA data and visualizations.

Survey of Consumer Finances (SCF)


The 2022 Survey of Consumer Finances (SCF) is the most recent survey conducted. Below are links to the bulletin article, interactive chartbook, historical bulletin tables, full public dataset, extract dataset, replicate weight files, and documentation.

How to get a notification of changes: If you would like to receive notification about additions to the web page and updates to these surveys, please sign our guest book.

How to send a comment or question: To send a comment about the SCF website or to make technical inquiries about the SCF, please fill out our feedback form. To ensure that your question is properly routed, please select the Survey of Consumer Finances as the “Economic Data” and select no other options above the field labeled “Type your message.”


Data for Tat

Data for: Tat will tell: Tattoos and time preferences

Published: 21 October 2019| Version 1 | DOI: 10.17632/p7xw6yvd5c.1

Contributors: Bradley Ruffle,


Dataset in Stata 10 format, collected from incentivized experiments and a survey.


Categories: Economics, Social Psychology

This dataset is supplement to


*provided by DataCite


figshare – a home for research outputs

An excellent source of data and studies

The repository built to showcase all of your institution’s research outputs in one place


zenodo – open science

An excellent site with loads of data of all kinds

Passionate about Open Science!

Built and developed by researchers, to ensure that everyone can join in Open Science.

The OpenAIRE project, in the vanguard of the open access and open data movements in Europe, was commissioned by the EC to support their nascent Open Data policy by providing a catch-all repository for EC-funded research. CERN, an OpenAIRE partner and pioneer in open source, open access and open data, provided this capability and Zenodo was launched in May 2013.

In support of its research programme, CERN has developed tools for Big Data management and extended Digital Library capabilities for Open Data. Through Zenodo, these Big Science tools could be effectively shared with the long tail of research.

Open Science knows no borders!

The need for a catch-all is not restricted to one funder, or one nation, so the concept caught on, and Zenodo rapidly started welcoming research from all over the world, and from every discipline.

The digital revolution has necessitated a retooling of the scholarly processes to handle data and software, but this is proceeding at varying speeds across different communities, disciplines, and nations. To ensure no one is left behind through lack of access to the necessary tools and resources, Zenodo makes the sharing, curation and publication of data and software a reality for all researchers.

Where is there more livestock than people?

Continuing my investigation of the USDA Quickstats site I first used here

Notes on inspiration

I was first inspired to do this piece when I saw these analogous maps for France:

I figured that the USDA data I’d already been digging into had to have the data for the USA, and in fact, it did!

The data has holes in it: a county may appear one year but not the next. I got around this by using the most recent post-2010 data available for each combination of county and animal type. When comparing these values to the human population, I made sure to use the ACS data for that same year.
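For readers who want to try the same gap-filling approach, here is a minimal Python sketch of the idea described above. It is not the code behind the maps: the `LivestockRow` record, its field names, the function names, and the sample numbers are all hypothetical, and "post-2010" is assumed to mean years strictly after 2010.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class LivestockRow(NamedTuple):
    """Hypothetical record layout, not the USDA Quick Stats schema."""
    county: str   # county identifier, e.g. a FIPS code
    animal: str   # animal type, e.g. "hogs"
    year: int     # survey or census year
    head: int     # number of animals reported


def latest_post_2010(rows: Iterable[LivestockRow]) -> Dict[Tuple[str, str], LivestockRow]:
    """Keep the most recent row after 2010 for each (county, animal) pair."""
    latest: Dict[Tuple[str, str], LivestockRow] = {}
    for row in rows:
        if row.year <= 2010:  # assumption: "post-2010" excludes 2010 itself
            continue
        key = (row.county, row.animal)
        if key not in latest or row.year > latest[key].year:
            latest[key] = row
    return latest


def animals_per_person(
    rows: Iterable[LivestockRow],
    population: Dict[Tuple[str, int], int],
) -> Dict[Tuple[str, str], float]:
    """Divide each selected animal count by the population of the same county and year."""
    ratios: Dict[Tuple[str, str], float] = {}
    for key, row in latest_post_2010(rows).items():
        people = population.get((row.county, row.year))
        if people:  # skip counties with no (or zero) population for that year
            ratios[key] = row.head / people
    return ratios


# Made-up sample data, for illustration only.
sample_rows = [
    LivestockRow("19001", "hogs", 2012, 90000),
    LivestockRow("19001", "hogs", 2017, 120000),
    LivestockRow("19001", "cattle", 2007, 50000),  # too old, ignored
    LivestockRow("19003", "cattle", 2012, 30000),
]
sample_population = {("19001", 2012): 7500, ("19001", 2017): 7000, ("19003", 2012): 4000}
sample_ratios = animals_per_person(sample_rows, sample_population)
```

A ratio above 1 means the county has more of that animal than people. Keying the population lookup on both county and year is what keeps the comparison tied to the same year as the livestock count.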

The aesthetics came together very quickly. I considered doing the same thing as Jules Grandin and keeping the maps ultra simple, but ultimately couldn’t resist showing the animal:human ratios instead of just which counties had more animals.

The first map ended up scratching that “ultra simple” itch, but with a bit of a twist. I chose not to show ratios in that one because it already has so much going on. I think adding in gradients of color just would have made it hard to read. I’m also quite proud of my Venn diagram legend there!


Beyond the Top 1000 Names

A database of American names over time

To provide popular names and maintain an acceptable performance level on our servers, we provide only the top 1000 names through our forms. However, we provide almost all names for researchers interested in naming trends.

To safeguard privacy, we exclude from these files certain names that would indicate, or would allow the ability to determine, names with fewer than 5 occurrences in any geographic area. We provide these data on both a national and state-specific basis, in two separate collections of files, each zipped into a single file. The format of the data in the three file collections is described in a “readme” file contained in the respective zip files.



A good source of American data on the census and health

IPUMS USA collects, preserves and harmonizes U.S. census microdata and provides easy access to this data with enhanced documentation. Data includes decennial censuses from 1790 to 2010 and American Community Surveys (ACS) from 2000 to the present.

Use it for GOOD — never for EVIL
