Datasets for academic use
On this page you can find links to datasets for your projects. The data itself is not stored on this site; instead, each entry links to the place where the data can be downloaded.
What is a Data Catalog?
A centralized place that keeps metadata on datasets and helps users find out what can be done with a dataset.
For example: What usage is allowed? Who owns the dataset? What is its quality?
Finding a dataset
The hard way
Search the data catalogs manually. Some of them have good search engines, and others do not.
The easy way
Each dataset is described by a formal set of parameters called a dataset schema.
Because the description is standardized, a search engine (Google has one for this) can filter the hundreds of thousands of available datasets and return high-quality results.
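To illustrate why a standard description makes search easier, here is a minimal sketch in Python. The DatasetRecord fields, the sample catalog, and the search function are all hypothetical and exist only for illustration; real schemas define their own field names, and real search engines work differently.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DatasetRecord:
    # Hypothetical metadata fields; a real schema defines its own names.
    name: str
    description: str
    license: str
    keywords: List[str] = field(default_factory=list)


def search(records: List[DatasetRecord], keyword: str, allowed_licenses: Set[str]) -> List[str]:
    """Return the names of records that mention `keyword` and carry an allowed license."""
    kw = keyword.lower()
    return [
        r.name
        for r in records
        if r.license in allowed_licenses
        and (kw in r.description.lower() or kw in [k.lower() for k in r.keywords])
    ]


# Made-up sample catalog, not real datasets.
catalog = [
    DatasetRecord("city-weather", "Daily weather observations", "CC-BY-4.0", ["climate"]),
    DatasetRecord("bird-counts", "Annual bird survey", "proprietary", ["ecology"]),
    DatasetRecord("sea-levels", "Monthly sea level readings", "CC0-1.0", ["climate", "ocean"]),
]

result = search(catalog, "climate", {"CC-BY-4.0", "CC0-1.0"})
print(result)
```

Because every record exposes the same fields, the filter can check license and topic without knowing anything else about the individual datasets.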
Examples of schemas:
http://dataatwork.org/guides/data-package/
https://github.com/Kaggle/kaggle-api/wiki/Dataset-Metadata
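As a rough idea of what such a description can look like, the sketch below builds a small descriptor in Python and writes it out as JSON. The layout is loosely modeled on the Data Package style (name, title, licenses, resources with a field list); the dataset, file name, and column names are made up, and the exact required fields should be checked against the guides linked above.

```python
import json

# Hypothetical descriptor; all values are invented for illustration.
descriptor = {
    "name": "city-weather",
    "title": "Daily City Weather Observations",
    "licenses": [{"name": "CC-BY-4.0"}],
    "resources": [
        {
            "name": "observations",
            "path": "observations.csv",
            "schema": {
                "fields": [
                    {"name": "date", "type": "date"},
                    {"name": "city", "type": "string"},
                    {"name": "temperature_c", "type": "number"},
                ]
            },
        }
    ],
}


def column_names(desc):
    """List every column declared across the descriptor's resources."""
    return [f["name"] for res in desc["resources"] for f in res["schema"]["fields"]]


# Serialize to JSON text, then read it back the way a catalog tool might.
text = json.dumps(descriptor, indent=2)
columns = column_names(json.loads(text))
print(columns)
```

A tool that reads this file can learn the license and the columns of the dataset without downloading the data itself.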