This pipeline produces network objects and the corresponding metabolite annotations for metabolite and lipid networks based on the Rhea database. These objects are used for analysis by Shiny GATOM. The pipeline is executed with Snakemake and Singularity and runs the scripts provided in the repository.
- Snakemake
- Singularity v3.5.3
- At least 10 GB of disk space
- Clone the repository:
git clone git@github.com:ctlab/Rhea-network-pipeline.git
- Activate the conda environment that provides Snakemake and Singularity:
conda activate snakemake
- Execute the pipeline:
snakemake --use-singularity --cores 8
Execution requires at least 10 GB of free disk space and can take from 2 to 10 days, depending on the setup.
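Because a run can last for days, it may be worth confirming the free disk space before starting. The following is a minimal sketch, not part of the pipeline itself; the function names `free_gb` and `has_enough_space` are hypothetical, and the 10 GB threshold is taken from the requirement above.

```python
import shutil

REQUIRED_GB = 10  # threshold taken from the stated requirement


def free_gb(path="."):
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return True if at least `required_gb` GiB are free at `path`."""
    return free_gb(path) >= required_gb


if __name__ == "__main__":
    # Report the free space in the current working directory.
    print(f"Free space: {free_gb():.1f} GiB, enough: {has_enough_space()}")
```

Running the script from the cloned repository directory reports whether the working directory meets the threshold.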
The pipeline is designed to run in a specifically constructed Docker image, which can be found here. The image contains all necessary dependencies for the Python and R code as well as the Reaction Decoder Tool, and Snakemake uses it automatically during execution.
Some steps of the pipeline execute the included Python or R scripts, while simpler steps only run bash commands from the Snakefile.
The pipeline takes as input the files stored in the pre_data folder:
- polymers_to_chebi.tsv contains mappings between ChEBI IDs and Rhea polymer IDs;
- mapFromSpecies.csv contains preprocessed mapping tables between lipid species and ChEBI IDs.
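A quick check that both input files are present can save a failed run. This is a minimal sketch that assumes the script is run from the repository root, so that pre_data is a relative path; the helper name `missing_inputs` is hypothetical and not part of the pipeline.

```python
from pathlib import Path

# Input file names as listed above.
INPUT_FILES = ("polymers_to_chebi.tsv", "mapFromSpecies.csv")


def missing_inputs(pre_data_dir="pre_data", names=INPUT_FILES):
    """Return the names of expected input files that are absent from `pre_data_dir`."""
    base = Path(pre_data_dir)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All input files are present.")
```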
The output of the pipeline is a network rds object and a metabolite annotation rds object for each of the two kinds of networks: the metabolite network and the lipid subnetwork.
The files are stored in the network folder.
- A list of undirected Rhea reactions is downloaded;
- To construct the atom network, atom mapping has to be performed with the Reaction Decoder Tool. The tool takes RXN files as input; however, undirected reactions do not have RXN files. Thus, the following steps are performed:
  a. All RXN files for Rhea reactions are downloaded;
  b. The mapping table between undirected, left-to-right and right-to-left reaction IDs is downloaded to distinguish left-to-right from right-to-left RXN files;
  c. Only left-to-right reactions are kept;
- The Reaction Decoder Tool is used for atom mapping of the RXN files;
- Atom mapping tables are created with ChemmineR from the RXN files processed with the Reaction Decoder Tool;
- The supplementary annotation files for the metabolite and lipid networks and the corresponding metabolites are created;
  - Metabolite annotation includes the metabolite ChEBI ID and link extracted from ChEBI, the HMDB to ChEBI mapping obtained via the metaboliteIDmapping R package, and the ChEBI to KEGG mapping extracted from KEGG;
  - Network annotation includes the reaction Rhea ID and link extracted from Rhea, the reaction-enzyme mapping obtained from Rhea, and the processed atom mapping data;
- The network and metabolite annotation objects for the metabolite and lipid network are created:
  a. The metabolite annotation object for the metabolite network is created;
  b. The network object for the metabolite network is created;
- The network and corresponding metabolite annotation objects for the lipid subnetwork are created.
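The selection of left-to-right RXN files described in steps a to c above can be sketched as follows. This is an illustration rather than the pipeline's actual code. It assumes that the mapping table is tab-separated with the columns RHEA_ID_MASTER, RHEA_ID_LR, RHEA_ID_RL and RHEA_ID_BI, and that each RXN file is named after its reaction ID. The function names and the sample rows are hypothetical.

```python
import csv
import io


def left_to_right_ids(directions_tsv):
    """Map each left-to-right reaction ID to its undirected (master) ID.

    `directions_tsv` is any iterable of lines, e.g. an open file.
    The column names are an assumption about the mapping table layout.
    """
    reader = csv.DictReader(directions_tsv, delimiter="\t")
    return {row["RHEA_ID_LR"]: row["RHEA_ID_MASTER"] for row in reader}


def keep_left_to_right(rxn_names, lr_ids):
    """Keep only RXN file names whose stem is a left-to-right reaction ID."""
    return [name for name in rxn_names if name.rsplit(".", 1)[0] in lr_ids]


# Illustrative sample rows, not real pipeline data.
SAMPLE = (
    "RHEA_ID_MASTER\tRHEA_ID_LR\tRHEA_ID_RL\tRHEA_ID_BI\n"
    "10000\t10001\t10002\t10003\n"
    "10004\t10005\t10006\t10007\n"
)

lr_to_master = left_to_right_ids(io.StringIO(SAMPLE))
kept = keep_left_to_right(
    ["10001.rxn", "10002.rxn", "10005.rxn", "10006.rxn"], lr_to_master
)
print(kept)
```

Keeping the left-to-right to undirected mapping as a dictionary means the atom mapping result of each kept RXN file can later be attributed back to its undirected reaction.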