Content
Resources
The PREGO backend retrieves data from the following sources.
Source | # items | Data type | Metadata | License |
---|---|---|---|---|
MEDLINE and PubMed | 33 million | abstracts (text) | no | NLM Copyright |
PubMed Central OA Subset | 2.7 million | full articles (text) | no | CC (commercial and non-commercial use) |
JGI IMG | 9,644 | annotated genomes of isolates | yes | JGI Data Policy |
Struo | 21,276 | Annotated genomes | no | MIT, CC BY-SA 4.0 |
BioProject | 18,752 | Annotated genomes with abstracts (text) | yes | INSDC policy |
MG-RAST | 16,096 | marker gene samples | yes | CC0 |
MG-RAST | 7,965 | metagenomic samples | yes | CC0 |
MGnify | 10,500 | marker gene samples | yes | CC-BY, CC0 |
Channels
PREGO’s contents are divided into three channels of information, based on data origin and format. The Literature channel applies text mining tools to scientific publications, i.e. abstracts and open access full-text articles. Through the Annotated Genomes and Isolates channel, PREGO retrieves genome annotations and their accompanying metadata. Finally, the Environmental Samples channel integrates metagenomic analyses (from both amplicon and shotgun studies), i.e. taxonomic and functional profiles along with their corresponding metadata.
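The channel layout described above can be pictured as a simple lookup table. The sketch below is illustrative only: `CHANNELS` and `channel_of` are hypothetical names, not part of PREGO, and the grouping of sources follows the resource and statistics tables on this page.

```python
# Hypothetical mapping of PREGO channels to their data sources,
# following the resource and statistics tables on this page.
CHANNELS = {
    "Literature": ["MEDLINE and PubMed", "PubMed Central OA Subset"],
    "Annotated Genomes and Isolates": ["JGI IMG", "Struo", "BioProject"],
    "Environmental Samples": ["MG-RAST", "MGnify"],
}


def channel_of(source: str) -> str:
    """Return the channel a source belongs to (illustrative helper)."""
    for channel, sources in CHANNELS.items():
        if source in sources:
            return channel
    raise KeyError(f"unknown source: {source}")
```

For example, `channel_of("MGnify")` returns `"Environmental Samples"`.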
Statistics
Entities
The table below lists the entities of PREGO after named entity recognition (NER) and mapping of every source. It reports counts of distinct Taxa, Environments (ENVO terms), Biological Processes (Gene Ontology Biological Process) and Molecular Functions (Gene Ontology Molecular Function).
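Channel | Source | Taxonomic rank | Taxa | Environments | Biological Processes | Molecular Functions |
---|---|---|---|---|---|---|
Literature | MEDLINE PubMed - PMC OA | Strains | 8,929 | 1,077 | 15,079 | 7,318 |
Literature | MEDLINE PubMed - PMC OA | Species | 240,377 | | | |
Literature | MEDLINE PubMed - PMC OA | Total | 342,506 | | | |
Environmental samples | MG-RAST amplicon | Strains | 1,392 | 162 | - | - |
Environmental samples | MG-RAST amplicon | Species | 4,324 | | | |
Environmental samples | MG-RAST amplicon | Total | 5,859 | | | |
Environmental samples | MG-RAST metagenome | Strains | 2,522 | 258 | - | 3,839 |
Environmental samples | MG-RAST metagenome | Species | 4,406 | | | |
Environmental samples | MG-RAST metagenome | Total | 7,157 | | | |
Environmental samples | MGnify amplicon | Strains | 2 | 216 | 11 | - |
Environmental samples | MGnify amplicon | Species | 1,471 | | | |
Environmental samples | MGnify amplicon | Total | 2,955 | | | |
Annotated Genomes and Isolates | JGI IMG isolates | Strains | 2,398 | 241 | - | 3,670 |
Annotated Genomes and Isolates | JGI IMG isolates | Species | 11,203 | | | |
Annotated Genomes and Isolates | JGI IMG isolates | Total | 13,849 | | | |
Annotated Genomes and Isolates | Struo | Strains | 6 | - | - | 2,789 |
Annotated Genomes and Isolates | Struo | Species | 19,289 | | | |
Annotated Genomes and Isolates | Struo | Total | 19,325 | | | |
Annotated Genomes and Isolates | BioProject | Strains | 5,754 | 309 | 626 | - |
Annotated Genomes and Isolates | BioProject | Species | 3,373 | | | |
Annotated Genomes and Isolates | BioProject | Total | 9,393 | | | |
Total | All | Strains | 12,840 | 1,090 | 15,091 | 7,971 |
Total | All | Species | 258,352 | | | |
Total | All | Total | 364,508 | | | |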
Associations
The table below lists the associations between PREGO entities after co-occurrence analysis. The supported association types are Environments - Biological Processes, Environments - Molecular Functions, Taxa - Environments, Taxa - Biological Processes and Taxa - Molecular Functions.
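Channel | Source | Environments - Processes | Environments - Functions | Taxonomic rank | Taxa - Environments | Taxa - Processes | Taxa - Functions |
---|---|---|---|---|---|---|---|
Literature | MEDLINE PubMed - PMC OA | 883,997 | 422,579 | Strains | 69,968 | 590,630 | 384,079 |
Literature | MEDLINE PubMed - PMC OA | | | Species | 778,877 | 3,501,635 | 1,961,920 |
Literature | MEDLINE PubMed - PMC OA | | | Total | 1,669,608 | 7,969,310 | 4,613,827 |
Environmental samples | MG-RAST amplicon | - | - | Strains | 13,645 | - | - |
Environmental samples | MG-RAST amplicon | | | Species | 39,007 | - | - |
Environmental samples | MG-RAST amplicon | | | Total | 53,439 | - | - |
Environmental samples | MG-RAST metagenome | - | 620,846 | Strains | 262,106 | - | 8,626,328 |
Environmental samples | MG-RAST metagenome | | | Species | 103,913 | - | 10,715,548 |
Environmental samples | MG-RAST metagenome | | | Total | 372,301 | - | 19,950,096 |
Environmental samples | MGnify amplicon | - | - | Strains | 18 | - | - |
Environmental samples | MGnify amplicon | | | Species | 30,122 | 351 | - |
Environmental samples | MGnify amplicon | | | Total | 111,976 | 2,097 | - |
Annotated Genomes and Isolates | JGI IMG isolates | - | - | Strains | 8,229 | - | 3,461,693 |
Annotated Genomes and Isolates | JGI IMG isolates | | | Species | 42,141 | - | 13,216,559 |
Annotated Genomes and Isolates | JGI IMG isolates | | | Total | 50,888 | - | 16,821,850 |
Annotated Genomes and Isolates | Struo | - | - | Strains | - | - | 1,803 |
Annotated Genomes and Isolates | Struo | | | Species | - | - | 4,070,195 |
Annotated Genomes and Isolates | Struo | | | Total | - | - | 4,079,312 |
Annotated Genomes and Isolates | BioProject | - | - | Strains | 3,263 | 7,473 | - |
Annotated Genomes and Isolates | BioProject | | | Species | 4,187 | 4,294 | - |
Annotated Genomes and Isolates | BioProject | | | Total | 7,641 | 12,169 | - |
Total | All | 883,997 | 1,043,425 | Strains | 357,229 | 598,103 | 12,473,903 |
Total | All | | | Species | 998,247 | 3,506,280 | 29,964,222 |
Total | All | | | Total | 2,265,853 | 7,983,576 | 45,465,085 |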
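To illustrate what a co-occurrence count over the five supported association types could look like, here is a minimal sketch. It is not PREGO's actual implementation or scoring scheme, which this page does not describe: the input format, the function name `count_cooccurrences` and the toy identifiers (`taxon_A`, `env_X`, etc.) are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# The five supported association types, as unordered pairs of entity types.
SUPPORTED_PAIRS = {
    frozenset({"environment", "process"}),
    frozenset({"environment", "function"}),
    frozenset({"taxon", "environment"}),
    frozenset({"taxon", "process"}),
    frozenset({"taxon", "function"}),
}


def count_cooccurrences(records):
    """Count in how many records each supported entity pair appears together.

    `records` is an iterable of collections of (entity_type, identifier)
    tuples, one collection per document or sample (hypothetical format).
    """
    counts = Counter()
    for entities in records:
        # Deduplicate and sort so each pair is counted once per record,
        # always in the same order.
        for a, b in combinations(sorted(set(entities)), 2):
            if frozenset({a[0], b[0]}) in SUPPORTED_PAIRS:
                counts[(a, b)] += 1
    return counts


# Toy records with made-up identifiers.
records = [
    [("taxon", "taxon_A"), ("environment", "env_X"), ("process", "proc_P")],
    [("taxon", "taxon_A"), ("environment", "env_X")],
    [("process", "proc_P"), ("function", "func_F")],  # unsupported pair type
]
counts = count_cooccurrences(records)
```

In this toy input the taxon and the environment are seen together in two records, while the process and function pair is skipped because Biological Processes - Molecular Functions is not among the supported association types.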