Tasks

Text mining

extraction of process-environment-organism associations buried in the scientific literature, in free-text descriptions of biological database records, and in dedicated community web pages
development of a named entity recognition module to identify environmental process mentions in text
estimation of association strength from co-mention statistics
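
As an illustration of the co-mention idea, the sketch below scores each organism-process pair by comparing how often the two are mentioned in the same document with how often that would be expected by chance. This is a generic observed/expected log-ratio and not necessarily the scoring scheme used in PREGO; the function name and the input layout are assumptions made for the example.

```python
from collections import Counter
from itertools import product
import math


def comention_scores(documents):
    """Score organism-process pairs by co-mention frequency.

    `documents` is a list of (organisms, processes) tuples, one per
    document, where each element is a set of recognised entity names.
    Returns {(organism, process): log2(observed / expected)}.
    """
    n_docs = len(documents)
    organism_counts = Counter()
    process_counts = Counter()
    pair_counts = Counter()

    for organisms, processes in documents:
        # Each entity is counted at most once per document.
        organism_counts.update(organisms)
        process_counts.update(processes)
        pair_counts.update(product(organisms, processes))

    scores = {}
    for (organism, process), observed in pair_counts.items():
        # Expected number of co-mentions if the two were independent.
        expected = organism_counts[organism] * process_counts[process] / n_docs
        scores[(organism, process)] = math.log2(observed / expected)
    return scores
```

A positive score means the pair is co-mentioned more often than chance would predict, a negative score less often.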

Knowledge gathering and association extraction

collection of process-environment-organism evidence from public data record metadata and computational analysis record annotations
confidence score assignment to each annotation based on its evidence source
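
One simple way to picture source-based confidence assignment is a lookup table from evidence source to score. The sketch below assumes exactly that; the source names and numeric values are arbitrary placeholders chosen for illustration, not the categories or scores used in PREGO.

```python
# Hypothetical evidence sources with placeholder confidence values.
SOURCE_CONFIDENCE = {
    "curated_annotation": 0.9,
    "sample_metadata": 0.6,
    "computational_prediction": 0.3,
}


def assign_confidence(annotations, default=0.1):
    """Attach a confidence score to each annotation.

    `annotations` is a list of (organism, process, source) tuples.
    Sources missing from the table receive the `default` score.
    """
    scored = []
    for organism, process, source in annotations:
        score = SOURCE_CONFIDENCE.get(source, default)
        scored.append((organism, process, source, score))
    return scored
```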

Homology-based annotation transfer

prediction (based on sequence homology) of related processes and environments for novel sequences and/or sequences with insufficient metadata
facilitation of sequence-based searches against the PREGO platform
upload of in-house sequences (along with their metadata) to the PREGO platform
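
The transfer step can be sketched as follows, assuming that a sequence search has already produced a list of hits with percent identities. The identity threshold, the function name, and the data layout are all assumptions made for the example; the source does not specify how PREGO performs the transfer.

```python
def transfer_annotations(hits, subject_annotations, min_identity=90.0):
    """Transfer annotations from annotated subjects to query sequences.

    `hits` is a list of (query_id, subject_id, percent_identity) tuples.
    `subject_annotations` maps subject_id to a set of process or
    environment terms. Hits below `min_identity` are ignored.
    Returns {(query_id, term): best supporting percent identity}.
    """
    transferred = {}
    for query_id, subject_id, identity in hits:
        if identity < min_identity:
            continue
        for term in subject_annotations.get(subject_id, ()):
            key = (query_id, term)
            # Keep the strongest hit supporting each transferred term.
            transferred[key] = max(transferred.get(key, 0.0), identity)
    return transferred
```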

Association unification

unification of the calculated process-environment-organism associations
design and implementation of an overall confidence score for the PREGO associations
management and storage of the unified associations in a database
periodic update of the PREGO associations and liaison to emerging data infrastructures
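
To illustrate what an overall confidence score might look like, the sketch below combines per-channel scores (for example text mining, metadata, and homology) with a noisy-OR rule, so that independent supporting evidence raises the combined score without exceeding 1. The choice of noisy-OR is an assumption made for the example, not the scoring scheme described by the source.

```python
def combine_confidence(channel_scores):
    """Combine per-channel confidence scores with a noisy-OR rule.

    `channel_scores` is an iterable of values in [0, 1], one per
    evidence channel. Returns 1 - product(1 - score).
    """
    remaining_doubt = 1.0
    for score in channel_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        remaining_doubt *= 1.0 - score
    return 1.0 - remaining_doubt


print(combine_confidence([0.5, 0.5]))  # prints 0.75
```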

Association presentation

provision of the PREGO associations to researchers via a web platform
facilitation of text and sequence searches against the PREGO process-environment-organism associations
confidence and supporting evidence display for retrieved associations
enrichment analysis and network-based association exploration
provision of programmatic access and bulk download options
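
For the enrichment analysis task, a common approach is a one-sided hypergeometric test: given a user-supplied set of organisms, it asks whether a process is associated with more of them than expected by chance. The sketch below uses only the Python standard library; the function and parameter names are assumptions, and the source does not state which test PREGO applies.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    population: total number of organisms in the background
    annotated:  background organisms associated with the process
    sample:     size of the user-supplied organism set
    hits:       organisms in the set associated with the process
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total
```

When many processes are tested at once, the resulting p-values would normally also need a multiple-testing correction.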