MetaCrowd: Crowdsourcing Biomedical Metadata Quality Assessment

In this project, MetaCrowd, we use crowdsourcing to harness non-experts for metadata quality assessment, specifically of the metadata in the Gene Expression Omnibus (GEO).

Abstract

To reuse the enormous amounts of biomedical data available on the Web, good-quality metadata is urgently needed. It is essential for making data maximally Findable, Accessible, Interoperable and Reusable. The Gene Expression Omnibus (GEO) allows users to specify metadata as textual key: value pairs (e.g. sex: female). However, because no structured vocabulary or format is enforced, its 44,000,000+ key: value pairs suffer from numerous quality issues. Curation by domain experts is not only time-consuming but also does not scale. Thus, in our approach, MetaCrowd, we apply crowdsourcing as a means of GEO metadata quality assessment. Our results show that crowdsourcing is a reliable and feasible way to identify both similar and erroneous metadata in GEO. This is highly useful to data consumers and producers for curating and providing good-quality metadata.
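To make the "similar metadata" problem concrete, here is a minimal Python sketch that parses GEO-style key: value strings and groups keys whose spellings differ only in case, spacing or punctuation. It is an illustration, not part of MetaCrowd's method: the function names, the normalization rule and the sample strings are hypothetical.

```python
import re
from collections import defaultdict


def parse_pair(text):
    """Split a 'key: value' string at the first colon (hypothetical helper).

    Returns (key, value) with surrounding whitespace stripped, or None
    if the string contains no colon.
    """
    if ":" not in text:
        return None
    key, value = text.split(":", 1)
    return key.strip(), value.strip()


def normalize_key(key):
    """Assumed normalization: lowercase, collapse non-alphanumeric runs to one space."""
    return re.sub(r"[^a-z0-9]+", " ", key.lower()).strip()


def group_similar_keys(pairs):
    """Group raw keys whose normalized forms coincide.

    Only groups with more than one distinct spelling are returned; these are
    candidate duplicates that a human would still need to confirm.
    """
    groups = defaultdict(set)
    for text in pairs:
        parsed = parse_pair(text)
        if parsed is None:
            continue  # skip strings that are not key: value pairs
        key, _value = parsed
        groups[normalize_key(key)].add(key)
    return {norm: sorted(keys) for norm, keys in groups.items() if len(keys) > 1}


if __name__ == "__main__":
    # Illustrative strings only, not taken from GEO.
    sample = [
        "sex: female",
        "Sex: F",
        "SEX : male",
        "cell_type: HeLa",
        "cell type: HeLa",
        "age: 45",
        "malformed entry",
    ]
    print(group_similar_keys(sample))
    # {'sex': ['SEX', 'Sex', 'sex'], 'cell type': ['cell type', 'cell_type']}
```

A surface rule like this only catches spelling variants. Deciding whether keys such as sex and gender mean the same thing, or whether a value is erroneous, still requires human judgment, which is where crowdsourcing comes in.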

Project Team