List of Data Catalogs

 


 Let Google Search work for you!

 

Take a look at some more datasets here: Datasets to be included

Where media types are listed, the following legend applies:

a – audio
g – graphs
t – text (all kinds)
p – images
v – video
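
The media-types column on this page combines these letters into codes such as pt or agtv. Below is a minimal Python sketch for expanding such a code into readable names. The names MEDIA_TYPES and decode_media_types are illustrative and not part of this page, and skipping characters outside the legend (such as the "?" marks on uncertain entries) is an assumption about how those entries should be handled.

```python
# Legend from this page: one letter per media type.
MEDIA_TYPES = {
    "a": "audio",
    "g": "graphs",
    "t": "text",
    "p": "images",
    "v": "video",
}


def decode_media_types(code: str) -> list[str]:
    """Expand a media-type code such as "agtv" into readable names.

    Characters that are not in the legend (for example "?") are
    skipped instead of raising an error. This is an assumption.
    """
    return [MEDIA_TYPES[ch] for ch in code if ch in MEDIA_TYPES]


if __name__ == "__main__":
    print(decode_media_types("agtv"))
    print(decode_media_types("??pt"))
```

The names come back in the order the letters appear in the code, so "pt" and "tp" give the same media types in a different order.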

Color code of catalog quality: unknown, low, medium, high

(Low means you should not blindly rely on the correctness of the data; it says nothing about the data format.) The ranking is based on the subjective assessment of Noam Cohen.

Name | Link | Number of datasets | Media types | Comment
Open data | index.okfn.org | 15 topics | t |
Technion | library.technion.ac.il/he/libraries-worldwide/ | | |
Semantic Scholar | api.semanticscholar.org/ | 1 | g | Downloadable archive of scientific papers (metadata). 200M scientific papers.
UCI machine learning repository | https://archive.ics.uci.edu/ml//index.php | 600 | pt |
Kaggle | link | | ??pt? | Good-quality datasets, since they are verified by Kaggle.
GitHub | github.com/datasets/awesome-data | | | A list of interesting datasets, in particular the "awesome-data" list. The data is accessed through https://datahub.io/collections, which requires sign-up and might be broken.
FiveThirtyEight | https://github.com/fivethirtyeight | hundreds | t | News and sports platform; open data related to the USA.
AWS public datasets | https://registry.opendata.aws/ | 300 | ??pt | High-quality datasets, mainly from large organizations and governments.
Data.world | data.world/datasets/open-data | 134K | | The social network for data professionals.
BuzzFeed News | https://github.com/BuzzFeedNews/everything | 5 | t |
Google archive of datasets | www.tensorflow.org/datasets/catalog/overview | hundreds | agtv | High-quality datasets.
Academic Torrents | https://academictorrents.com/browse.php | 2400 | agtpv | Distributed system for researchers to share big datasets from scientific papers.
Our World in Data | https://ourworldindata.org, https://github.com/owid/owid-datasets | hundreds | t | Small datasets, mainly time series.
NY City data | opendata.cityofnewyork.us/, https://data.cityofnewyork.us/browse | 3500 | | Anything the city thinks is worth keeping. Updated daily.
HuggingFace | huggingface.co/datasets | 5000 | t |
MIMIC-III | https://www.nature.com/articles/sdata201635 | 50000 | tp | A large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers[…]
Zenodo.org | https://zenodo.org/ | 8000 | | Zenodo is a general-purpose open repository. It allows researchers to deposit research papers, data sets, research software, reports, and any other research-related digital artefacts.
arxiv.org | local arxiv page | 1 | t | arXiv is a free distribution service and an open-access archive for nearly 2.4 million scholarly articles in the fields of physics, mathematics, computer science[…]

 

 

Government catalogs

State | Link | Number of datasets | Media types | Comment
UK | https://en.wikipedia.org/wiki/Data.gov.uk | | |
USA | https://en.wikipedia.org/wiki/Data.gov | | |
France | www.data.gouv.fr/fr/datasets/ | 37K | |
India | data.gov.in | 10K catalogs | |
Israel (CBS, the Central Bureau of Statistics) | https://www.cbs.gov.il/he/Statistics/Pages | | |