Datasets
Here are some datasets that you might consider using for your final project:
Kaggle has data on just about anything you can think of, and the data is generally clean and easy to use. For your project, stick to the social science datasets. You can download CSV files directly from Kaggle or access the data through the Kaggle API.
fredr is an R package that provides access to the Federal Reserve Economic Data (FRED) API. FRED is a comprehensive database of economic data maintained by the Federal Reserve Bank of St. Louis. The package allows you to search for and download data from FRED directly into R.
The US Census (censusapi) is a low-level interface to the U.S. Census Bureau’s APIs. Use this when you want direct, flexible access to any Census endpoint (ACS, Population Estimates, Economic Census, etc.) and prefer to write your own API queries.
The US Census (tidycensus) is a tidyverse-friendly wrapper around Census APIs.
NOAA Climate & Weather (rnoaa) is an R package that provides access to historical weather, climate, and ocean data from NOAA services.
NASA POWER Climate Data (nasapower) is an R package that provides access to global solar radiation, temperature, and climate datasets from NASA.
Google Public Data Explorer contains information about dozens of databases related to governance and the economy. You cannot download the raw data from Google, but you can use the site to visualize the data and then follow the link to the original source.
ILOSTAT is the statistical database of the International Labour Organization. It has data on labor, working conditions, industrial relations, poverty, and inequality.
OECD DATA provides data related to the performance of high-income countries.
Our World in Data is a good general resource for political economy data. The site is centered around blog posts, but you can also search for a topic, view a visualization related to that topic, and then download the data used to create it.
UN Data provides access to a wide range of international statistics collected by the United Nations and its agencies. You can browse by theme (e.g., education, environment, health, population), explore country profiles, and download tables in CSV format. It’s especially useful for finding official, globally comparable indicators across countries and years.
peacesciencer is an R package maintained by Steve Miller that compiles data from a number of sources useful for peace and conflict studies.
Statista is a good place to look for data on more niche topics.
UNCTADstat is the statistical database of the United Nations Conference on Trade and Development. It provides harmonized data on a range of topics related to economic performance and trade.
The UN Human Development Reports include a number of important indicators related to human development, gender and sustainable development goals (SDGs).
The unvotes package provides data on United Nations General Assembly voting patterns.
Varieties of Democracy (V-Dem) provides original measures of the quality of democracy for every country dating back to the 18th century. You can access V-Dem data through the vdemdata package.
World Bank Development Indicators (WDI) is the primary World Bank database for development data from officially recognized international sources. You can access the World Bank Development Indicators through the WDI package or the wbstats package.
The World Bank DataBank provides access to dozens of additional World Bank databases on topics such as regional development, governance, education, gender, and the environment. You can access World Bank data through the wbstats package.
Recommendation systems – example datasets: 10 open-source datasets
For information on more specific resources available, see this page on the Gelman Library website.