Datasets

Datasets for academic use

On this page you can find links to datasets for your projects. The data itself is not stored on this site; instead, each entry links to where it can be downloaded.

What is a Data Catalog?

A data catalog is a centralized place that keeps metadata on datasets and helps users find out what can be done with each dataset.

For example: What usage is allowed? Who owns the dataset? What is its quality?
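The questions above can be pictured as fields of a catalog record. The following is only a minimal sketch: the `CatalogEntry` class, its field names, the `allowed_for` helper, and the two sample entries (including the example.org URLs) are all made up for illustration and do not come from any particular catalog.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; the field names are illustrative only."""
    name: str
    owner: str          # who owns the dataset?
    license: str        # what usage is allowed?
    quality_note: str   # what is its quality? (free text)
    url: str            # where the data can be downloaded


def allowed_for(entries, accepted_licenses):
    """Return the entries whose license is in the accepted set."""
    return [e for e in entries if e.license in accepted_licenses]


# Made-up sample entries, not real datasets.
catalog = [
    CatalogEntry("city-weather", "Example University", "CC-BY-4.0",
                 "hourly readings, some gaps", "https://example.org/city-weather"),
    CatalogEntry("retail-sales", "Example Corp", "proprietary",
                 "complete, audited", "https://example.org/retail-sales"),
]

# Keep only datasets whose license permits academic reuse.
usable = allowed_for(catalog, {"CC0-1.0", "CC-BY-4.0"})
print([e.name for e in usable])
```

A real catalog stores many more fields, but even this small record is enough to filter by allowed usage before downloading anything.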

Finding a dataset

The hard way

Search the data catalogs manually. Some of them have good search engines, and some do not.

The easy way

Each dataset is described by a formal set of parameters called a dataset schema.

When datasets are described in a standard way, a search engine (Google offers one for this, Dataset Search) can filter the hundreds of thousands of available datasets and return high-quality results.

Examples of schemas:

http://dataatwork.org/guides/data-package/

https://github.com/Kaggle/kaggle-api/wiki/Dataset-Metadata
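To give a feeling for what such a schema looks like, here is a rough sketch of a metadata descriptor written as a Python dictionary and printed as JSON. The keys (`name`, `title`, `licenses`, `resources`) are modeled on the Data Package style of descriptor; the dataset itself, the file name `readings.csv`, and the `missing_fields` helper are hypothetical, and the list of "required" keys is an assumption made for this example, not a statement of what either schema mandates. Check the guides linked above for the authoritative field lists.

```python
import json

# Made-up descriptor in the style of a Data Package metadata file.
descriptor = {
    "name": "city-weather",
    "title": "City Weather Readings",
    "licenses": [{"name": "CC-BY-4.0"}],
    "resources": [{"name": "readings", "path": "readings.csv"}],
}


def missing_fields(desc, required=("name", "title", "licenses", "resources")):
    """List the keys from `required` that are absent or empty in `desc`.

    The default `required` tuple is an assumption for this sketch only.
    """
    return [field for field in required if not desc.get(field)]


# Serialize the descriptor the way it would be stored next to the data.
print(json.dumps(descriptor, indent=2))

# An empty list means every key checked here is present and non-empty.
print(missing_fields(descriptor))
```

Because the description is machine-readable, a search tool can index fields such as the license or title without opening the data files themselves.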