OpenDataology is an open source dataset license compliance analysis project. It helps users of publicly available datasets, as well as users who curate a dataset from multiple data sources (particularly for use in machine learning models), identify potential license compliance risks. The project consists of three key components:

  • A dataset license compliance analysis workflow that ascertains the final allowed rights and required obligations associated with using, for any purpose, a publicly available dataset or a dataset curated from multiple data sources.
  • A growing database and web portal that document the final rights and obligations (as determined by the license compliance analysis) associated with the datasets and data sources analyzed in the project. The database also records the metadata collected and used to conduct the compliance workflow.
  • An online license generation toolkit that lets dataset creators generate custom licenses based on the exact rights and obligations they want to grant, instead of having to rely on the limited set of existing dataset-specific licenses.
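To illustrate what "final allowed rights and required obligations" can mean for a dataset curated from several sources, here is a minimal sketch. It assumes that each source's license has already been reduced to a set of rights and a set of obligations. The `LicenseTerms` type, the `combine_sources` function, and the right and obligation labels are hypothetical and are not part of OpenDataology. The sketch uses a deliberately naive rule: a right survives only if every source grants it, and an obligation applies if any source imposes it. A real compliance workflow would also need to handle conflicting terms and provenance, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical record of what one data source's license allows and requires."""
    rights: frozenset[str]
    obligations: frozenset[str]


def combine_sources(sources: list[LicenseTerms]) -> LicenseTerms:
    """Naively aggregate license terms across data sources.

    Rights are intersected (every source must grant a right for the combined
    dataset to keep it); obligations are unioned (any source's obligation
    carries over to the combined dataset).
    """
    if not sources:
        raise ValueError("at least one data source is required")
    rights = frozenset.intersection(*(s.rights for s in sources))
    obligations = frozenset.union(*(s.obligations for s in sources))
    return LicenseTerms(rights, obligations)


# Usage with two made-up sources.
web_text = LicenseTerms(frozenset({"use", "modify", "commercial_use"}),
                        frozenset({"attribution"}))
research_corpus = LicenseTerms(frozenset({"use", "modify"}),
                               frozenset({"attribution", "share_alike"}))
combined = combine_sources([web_text, research_corpus])
print(sorted(combined.rights))       # ['modify', 'use']
print(sorted(combined.obligations))  # ['attribution', 'share_alike']
```

Under this simple rule, mixing in a single source that forbids commercial use removes that right from the whole curated dataset, while its obligations (here, share-alike) are added to everyone else's.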