Downloads
All data in Rhea is freely available and can be downloaded from our FTP site in different formats. The complete current and previous releases (starting from release 100) can also be downloaded as tar archives.
Reactions
Reactions are available in these data formats:
- RDF
This is a representation of the Rhea data in the Resource Description Framework (RDF) format, which can also be queried directly at the Rhea SPARQL endpoint.
- BioPAX level 3
This is a community standard data exchange format for biological pathway data in an OWL RDF/XML serialization. It covers the core Rhea data types, but some aspects of Rhea cannot be expressed in BioPAX (e.g. residues of Rhea macromolecules or polymerization indexes of Rhea polymers) and these are added as "bp:COMMENT".
- RXN
This is an MDL CT file format that represents unidirectional processes. For this reason, bidirectional reactions and reactions with undefined directions cannot be described in this format.
- RD
This is an MDL CT file format that consists of a set of records, each of them defining a reaction - in RXN format - and any associated data.
- TSV (tab-separated values)
- rhea-reaction-smiles.tsv: Reaction SMILES for Rhea directed reactions (left-to-right and right-to-left), computed with RDKit using the Rhea RXN file as input (beta release).
- rhea-directions.tsv: A mapping of undirected reactions to their corresponding left-to-right, right-to-left and bidirectional reactions.
- rhea-relationships.tsv: The Rhea hierarchical reaction classification.
- rhea-obsoletes.tsv: The list of obsolete reactions.
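As a minimal sketch of how rhea-directions.tsv might be used, the Python snippet below builds a lookup from any reaction ID to its undirected reaction. The column names (RHEA_ID_MASTER, RHEA_ID_LR, RHEA_ID_RL, RHEA_ID_BI), the presence of a header row, and the sample rows are assumptions made for illustration and should be checked against the downloaded file.

```python
import csv
import io

# Assumed column names; verify them against the header of the real file.
MASTER, LR, RL, BI = "RHEA_ID_MASTER", "RHEA_ID_LR", "RHEA_ID_RL", "RHEA_ID_BI"


def load_directions(handle):
    """Map every reaction ID in the file to its undirected reaction ID."""
    to_undirected = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        master = row[MASTER]
        for column in (MASTER, LR, RL, BI):
            to_undirected[row[column]] = master
    return to_undirected


# Illustrative sample standing in for the downloaded file.
SAMPLE = (
    "RHEA_ID_MASTER\tRHEA_ID_LR\tRHEA_ID_RL\tRHEA_ID_BI\n"
    "10000\t10001\t10002\t10003\n"
    "10004\t10005\t10006\t10007\n"
)

directions = load_directions(io.StringIO(SAMPLE))
print(directions["10002"])  # undirected reaction for a right-to-left ID
```

With a real download, `io.StringIO(SAMPLE)` would be replaced by something like `open("rhea-directions.tsv", newline="")`.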
Reaction participants
Information about the chemical entities that participate in Rhea reactions is available in the following files:
- rhea-mol.tar.gz (and mol directory): A tar file (and directory with individual files) of the 2D structure of each participant in the MDL Molfile format.
- rhea.sdf.gz: A single file with the 2D structures of all participants in the MDL Molfile format.
- chebiId_name.tsv: A list of the participants (small molecules only) that are used in Rhea reactions as a tab-separated values file with the columns 1. participant ID, 2. participant name.
- chebi_pH7_3_mapping.tsv: A mapping of ChEBI entities to the major microspecies at pH 7.3 as a tab-separated values file with the columns 1. ChEBI ID, 2. ChEBI ID of major microspecies at pH 7.3, 3. origin of mapping ('computation' or 'curation').
- chebi.owl.gz: A representation of the ChEBI data in the Web Ontology Language (OWL) format, which can also be queried directly at the Rhea SPARQL endpoint. Please note that the file contains a snapshot of the ChEBI data that is synchronized with the Rhea (and UniProt) release cycle, and that the OWL representation differs slightly from ChEBI's OWL model: it lacks Axiom elements about most synonyms, but has additional properties to facilitate queries by a) the participant names that are used in Rhea (http://purl.uniprot.org/core/name) and b) compounds with different protonation states (http://purl.obolibrary.org/obo/chebi#has_major_microspecies_at_pH_7_3).
- TSV (tab-separated values)
- rhea-chebi-smiles.tsv: Canonical SMILES for the subset of ChEBI used in Rhea, computed with RDKit using the ChEBI Molfile as input (beta release).
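The three-column layout described above for chebi_pH7_3_mapping.tsv (ChEBI ID, ChEBI ID of the major microspecies at pH 7.3, origin of mapping) could be read along the following lines. This is only a sketch: the sample IDs and the header row are made up, and the parser simply skips any row whose third column is not 'computation' or 'curation', so it should work whether or not the real file has a header.

```python
import csv
import io

# The two origin values named in the file description.
VALID_ORIGINS = {"computation", "curation"}


def load_ph_mapping(handle):
    """Return {ChEBI ID: (major microspecies ChEBI ID, origin)}."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3 or row[2] not in VALID_ORIGINS:
            continue  # skips a header row, blank lines, or malformed rows
        chebi_id, microspecies_id, origin = row[:3]
        mapping[chebi_id] = (microspecies_id, origin)
    return mapping


# Made-up sample rows (hypothetical IDs), including a hypothetical header.
SAMPLE = (
    "CHEBI\tCHEBI_PH7_3\tORIGIN\n"
    "1001\t2001\tcomputation\n"
    "1002\t1002\tcuration\n"
)

ph_mapping = load_ph_mapping(io.StringIO(SAMPLE))
print(ph_mapping["1001"])
```

Filtering on the origin column rather than on a fixed header name keeps the sketch independent of how the real header, if any, is spelled.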
Cross-references
Rhea cross-references to other databases are available as tab-separated values (TSV) files:
- Enzymes
- rhea2ec.tsv (Enzyme Classification)
- rhea-ec-iubmb.tsv (Enzyme Classification)
- rhea2uniprot_sprot.tsv (UniProtKB/Swiss-Prot)
- rhea2uniprot_trembl.tsv.gz (UniProtKB/TrEMBL)
- rhea2go.tsv (Gene Ontology 'Molecular Function' concepts)
- Other reaction resources
- rhea2xrefs.tsv (all except UniProtKB)
All Rhea TSV files (except UniProtKB) can be downloaded in a single archive: rhea-tsv.tar.gz
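Once rhea-tsv.tar.gz has been downloaded, its TSV files can be read without unpacking the archive to disk. The sketch below uses Python's standard tarfile module. The member path `tsv/rhea2ec.tsv`, the placeholder file content, and the UTF-8 encoding are assumptions for illustration; the demo archive is built in memory so that the example runs without a download.

```python
import io
import tarfile


def read_tsv_members(archive, wanted_suffix=".tsv"):
    """Return {member name: decoded text} for matching files in a tar.gz archive."""
    contents = {}
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(wanted_suffix):
                extracted = tar.extractfile(member)
                contents[member.name] = extracted.read().decode("utf-8")
    return contents


def build_demo_archive():
    """Create a tiny in-memory tar.gz so the sketch runs without a download."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        data = b"placeholder\tcontent\n"  # not real Rhea data
        info = tarfile.TarInfo(name="tsv/rhea2ec.tsv")  # assumed member path
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


files = read_tsv_members(build_demo_archive())
print(sorted(files))
```

For a real archive, `build_demo_archive()` would be replaced by a file opened in binary mode, for example `open("rhea-tsv.tar.gz", "rb")`.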
NLP datasets
The curated EnzChemRED dataset is available as a set of BioC files, which can be downloaded in a single archive: EnzChemRED.tar.gz