Finding Data

Finding data for your visualization projects and assignments should be easy. Here are 20 resources for finding rich datasets.

General Datasets

  1. UCI Machine Learning Repository: A diverse collection of datasets (360 at the time of writing, and still growing) for analytics and machine learning. http://archive.ics.uci.edu/ml/
  2. Kaggle datasets: Perfect for exploring data through visualization. https://www.kaggle.com/datasets
  3. Amazon Public Datasets: Large datasets, ranging from gigabytes to terabytes in size. https://aws.amazon.com/public-datasets/
  4. Google Public Data: Datasets provided by Google, including a book corpus, US names, genome datasets, BigQuery datasets, and many more. https://cloud.google.com/public-datasets/
  5. Open Data by Socrata: Thousands of free datasets for exploration. https://opendata.socrata.com/
  6. Data.gov: A website dedicated to supplying datasets from different domains, e.g. education, nutrition, sports. https://catalog.data.gov/dataset?res_format=CSV
  7. Datahub: Just as its tagline states, “The easy way to get, share data”. https://datahub.io/dataset?tags=weather
  8. Harvard Dataverse: Find many of the datasets used for research purposes and cited in publications. https://dataverse.harvard.edu/
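
Many of the sources above offer CSV downloads. Before building a visualization, it can help to take a quick look at a file's columns and how complete they are. The following is a minimal sketch using only Python's standard library; the helper name summarize_csv and the sample rows are hypothetical placeholders, not data from any of the listed sites.

```python
import csv
import io
from collections import Counter

# Placeholder rows standing in for a downloaded file; this is NOT real data.
SAMPLE_CSV = """city,category,value
Springfield,Education,12
Shelbyville,Sports,7
Springfield,Sports,3
"""


def summarize_csv(handle):
    """Return (column names, row count, non-empty cell count per column)."""
    reader = csv.DictReader(handle)
    filled = Counter()
    row_count = 0
    for row in reader:
        row_count += 1
        for name, cell in row.items():
            # Count only cells that actually contain a value.
            if cell not in (None, ""):
                filled[name] += 1
    return reader.fieldnames, row_count, dict(filled)


# Summarize the in-memory sample; a real file handle works the same way.
columns, rows, filled = summarize_csv(io.StringIO(SAMPLE_CSV))
print(columns, rows, filled)
```

For a real download, the same function should work on a file opened with open(path, newline=""), where path is wherever you saved the CSV.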

Challenge-Based Datasets

  1. KDD Data Center: Having trouble coming up with a problem statement? KDD provides both datasets and problem statements through its challenges. http://www.kdd.org/kdd-cup
  2. CrowdANALYTIX: More challenges to solve with datasets. https://www.crowdanalytix.com/community
  3. DrivenData: Problems for data scientists to solve. https://www.drivendata.org/competitions/
  4. Big Data Innovation Challenge: Tackle real problems with analytics and compete to win a challenge. https://bigdatainnovationchallenge.org/challenges/food-security-nutrition/

Census Datasets

  1. Open Census Data: Population details for cities in different countries are just a click away with this open data. http://census.okfn.org/en/latest/
  2. Census.gov: Census data for the United States. http://www.census.gov/data.html

Weather/Climate Datasets

  1. Wunderground: Want to work with weather data? Use Wunderground’s API to get your own dataset. https://www.wunderground.com/weather/api/
  2. National Centers for Environmental Information: Climate datasets available for analytics. https://www.ncdc.noaa.gov/cdo-web/datasets

News Datasets

  1. BBC Dataset: It consists of documents from the BBC news website corresponding to stories in five topical areas. http://mlg.ucd.ie/datasets/bbc.html
  2. The Guardian: A regularly updated collection of news datasets from The Guardian. https://www.theguardian.com/news/datablog/interactive/2013/jan/14/all-our-datasets-index

Food and Nutrition Datasets

  1. United States Department of Agriculture: Data provided by the Center for Nutrition Policy and Promotion, including a food prices dataset and the Healthy Eating Index. https://www.cnpp.usda.gov/data
  2. Nutritional Science Blog: A blog listing some datasets related to nutrition. http://nutsci.org/open-nutrition-food-data/