Bioinformatics Data Manipulation:
Molecular Online Tools & BioExtract Server
Theme: FXN Gene and Pancreatic Cancer.
Lab #1
Etienne Z. Gnimpieba
BRIN WS 2013
Mount Marty College – June 24th 2013
Etienne.gnimpieba@usd.edu
Context
0. Specification & Aims
Statement of problem / Case study: The FXN gene provides instructions for making a protein called frataxin. This protein is found in cells throughout the body, with the highest levels in the heart,
spinal cord, liver, pancreas, and the muscles used for voluntary movement (skeletal muscles). Within cells, frataxin is found in energy-producing structures called mitochondria. Although
its function is not fully understood, frataxin appears to help assemble clusters of iron and sulfur molecules that are critical for the function of many proteins, including those needed for energy
production. Mutations in the FXN gene cause Friedreich ataxia, a genetic condition that affects the nervous system and causes movement problems. Most people with Friedreich
ataxia begin to experience the signs and symptoms of the disorder around puberty.
Bioinformatics Molecular Online Tools and Server
Keywords:
Bio: FXN, frataxin, pancreatic cancer, CDKN2A
Math: HMM
Informatics: programming, bioinformatics tools, getting and exporting data
Reduced expression of frataxin is the cause of Friedreich's ataxia (FRDA), a lethal neurodegenerative disease. What about liver cancer?
Aim: This lab introduces online tools for exploring large-scale human biological data
(metabolic, proteomic, genomic, …). We apply them to the FXN gene and pancreatic
cancer to show how a researcher can identify and cross-reference the biological
knowledge available in data banks.
Acquired skills
Online and server tools:
- Query biological DB (FASTA, HTML, txt, figure formats)
- Sequence tools (protein and gene): alignment (showalign, clustalw2), similarity, …
- Manage result data (select, keep, map, export)
- Build and reuse workflows
Biological Hypothesis
[Figure: FXN on chromosome 9 and frataxin molecule structure (PyMOL), linked by “?” to pancreas anatomy and pancreatic cancer, via biological databases and tools]
Resolution Process
T1. Metabolomics
Objective: Use metabolic data repositories to understand the frataxin protein mechanism.
T1.1. Finding the Enzyme and Pathway related to Frataxin using KEGG
T1.2. Finding the Reaction involved with Frataxin using Reactome
T1.3. Using BRENDA for enzyme data on Frataxin
T1.4. Using Collected data for Analysis
T1.5. Redo the process with Pancreatic Cancer Results
T2. Genome exploration
Objective: Use Ensembl to locate FXN on the human genome and identify the genes implicated in pancreatic cancer.
T2.1. Locate a given gene on human genome
T2.2. Get a genomic sequence from NCBI
T2.3. Get the protein data and sequence from EBI
T2.4. Save the exported sequence data in the data folder
T3. Sequences manipulation
Objective: Find similar sequences using BLAST tools and align the given sequences.
T3.1. Find similar sequences using BLAST tool
T3.2. Align generated sequences with ClustalW tool
T3.3. Visualize the result as a phylogenetic tree in Jalview
T4. Protein Data and Structural Biology Knowledge
Objective: Study frataxin at the protein level and its connection with pancreatic cancer (functional and structural data).
T4.1. Structural Knowledge on Frataxin using SBKB
T4.2. Using Uniprot for Frataxin Protein Study
T4.3. Protein-Protein Interaction using STRING
T4.4. Using same method for Pancreatic Cancer and compare
T5. BioExtract server
Objective: Use a server tool to optimize the data manipulation process, applied on the BioExtract server.
T5.1. Server Initialization
T5.2. Pancreatic cancer & Frataxin (FXN)
T5.3. Mapping, Alignment
T5.4. Workflow save & reuse
Data Manipulation Molecular Online Tools and BioExtract Server
T1. Metabolomics
Objective: Use metabolic data repositories to understand the frataxin protein mechanism.
Theme: Frataxin (FXN) implication in the pancreatic cancer genesis
T1.1. Finding the Enzyme and Pathway related to Frataxin using KEGG
On the KEGG Database website: http://www.genome.jp/kegg/
o Search frataxin, and select the first result under KEGG Gene Database (hsa:2395)
o Copy the E.C. number given in “Definition” (EC:1.16.3.1)
o To find the related pathway, search the E.C. number in the general KEGG Database search (click on the KEGG
logo at the top)
o Select the result given in the KEGG Enzyme Database at the bottom. Here you can see how this enzyme is involved in the
listed metabolism.
T1.2. Finding the Reaction involved with Frataxin using Reactome
On the Reactome website: http://www.reactome.org/ReactomeGWT/entrypoint.html
o Search frataxin and select the 4th result with Frataxin in the title. This shows the pathway model related to frataxin
and how frataxin is involved in it.
T1.3. Using BRENDA to find information on Frataxin
On the BRENDA Database website: http://www.brenda-enzymes.org/
o Search using the E.C. number obtained in T1.1 and select the result given. This website gives a wealth of information on
the enzyme, including the reaction, related species, and so on. At the very bottom of the webpage you can select other
databases that have information on the same compound or protein.
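The KEGG lookup in T1.1 can also be scripted. The sketch below is a minimal illustration, not part of the original lab: it assumes the public KEGG REST "get" operation at rest.kegg.jp and pulls E.C. numbers out of the returned flat-file text. The function names and the `SAMPLE` record are hypothetical; `SAMPLE` is a hand-written stand-in, not a real download.

```python
import re
import urllib.request

# Assumed endpoint: the KEGG REST "get" operation.
KEGG_GET = "https://rest.kegg.jp/get/{entry}"


def fetch_kegg_entry(entry: str) -> str:
    """Download a KEGG flat-file record, e.g. entry='hsa:2395' (needs network)."""
    with urllib.request.urlopen(KEGG_GET.format(entry=entry), timeout=30) as resp:
        return resp.read().decode("utf-8")


def extract_ec_numbers(flat_file: str) -> list[str]:
    """Return E.C. numbers found in (EC:...) or [EC:...] tags, without duplicates."""
    found: list[str] = []
    for group in re.findall(r"[\[(]EC:([^\])]+)[\])]", flat_file):
        for ec in group.split():
            if ec not in found:
                found.append(ec)
    return found


# Hand-written, abbreviated stand-in for a KEGG gene record (illustrative only).
SAMPLE = """NAME        FXN
DEFINITION  frataxin (EC:1.16.3.1)
"""

if __name__ == "__main__":
    print(extract_ec_numbers(SAMPLE))
```

With network access, `extract_ec_numbers(fetch_kegg_entry("hsa:2395"))` would run the same extraction on the live record; the exact layout of that record may differ from the stand-in above.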
T1.4. Using Collected Information to Analyze the Data
On the BioModels website: http://www.ebi.ac.uk/biomodels-main/
o Search using the E.C. number obtained in T1.1 and select the first result given. Here you can download the SBML file
for this pathway (top left corner), save it in the student folder, and analyze it on the semanticSBML website:
http://semanticsbml.org/semanticSBML/simple/index
o Click on the first box, “Find Similar Models”, click “Browse”, and select the file you just saved from BioModels. On this
website you can use multiple tools to analyze the model and compare it with other models.
T1.5. Same Process Searching for Pancreatic Cancer Results (Optional)
o Repeat the same process, searching for pancreatic cancer instead.
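Before uploading a model file downloaded from BioModels, it can help to check what it contains. The following sketch lists the species and reaction identifiers in an SBML document using only the Python standard library. The function name and the toy model are hypothetical; the toy model is a made-up two-species example and not a BioModels entry.

```python
import xml.etree.ElementTree as ET


def summarize_sbml(sbml_text: str) -> dict:
    """List species and reaction ids in an SBML document, whatever its level/version."""
    root = ET.fromstring(sbml_text.strip())
    species, reactions = [], []
    for elem in root.iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
        if tag == "species":
            species.append(elem.get("id"))
        elif tag == "reaction":
            reactions.append(elem.get("id"))
    return {"species": species, "reactions": reactions}


# Toy SBML document written for this example (not a real curated model).
TOY_SBML = """
<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfSpecies>
      <species id="Fe2" compartment="c"/>
      <species id="Fe3" compartment="c"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="ferroxidase_step">
        <listOfReactants><speciesReference species="Fe2"/></listOfReactants>
        <listOfProducts><speciesReference species="Fe3"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""

if __name__ == "__main__":
    print(summarize_sbml(TOY_SBML))
```

To inspect a downloaded file, read it with `open(path, encoding="utf-8").read()` and pass the text to `summarize_sbml`; if the file starts with an XML declaration that names an encoding, read it in binary mode instead, since `ET.fromstring` also accepts bytes.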
T2. Genome Exploration
Objective: Use Ensembl online tools to locate FXN on the human genome and identify the genes implicated in pancreatic
cancer. Next, retrieve the corresponding sequence data in FASTA format.
T2.1. Locate a given gene on human genome
On the Ensembl website: http://uswest.ensembl.org/index.html
o Select the species “human”
o Do a keyword search using the term “FXN”
o Follow the link of the “Gene” drop-down feature
o Click the link for “Location”
o Export this gene as a FASTA sequence in an HTML file by clicking “Export data” (left sidebar)
o Click “Next”
o Click the “HTML” link
o Repeat the process, searching for “pancreatic cancer”. When you find the list of genes, select the CDKN2A gene
T2.2. Get a genomic sequence from NCBI (42 DataBases)
On the NCBI website: http://www.ncbi.nlm.nih.gov/guide/
o Pull down “All Databases” and select the “Gene” database, then do a keyword search using the term FXN
o Click the corresponding Homo sapiens FXN gene (first result)
o Scroll down to the “NCBI Reference Sequences” title and go to the subtitle “mRNA and Proteins”
o Click on the accession number of the first transcript variant (NM_000144.4)
o Get the same sequence in FASTA format by clicking on the “FASTA” link
o Click “Send” (top right, in blue), select complete record, file, FASTA, and “Create File”, then save in the
student folder if possible (the file is saved in downloads automatically)
T2.3. Get the protein information and sequence from EBI
The common protein name for FXN is frataxin.
On the EBI website: http://www.ebi.ac.uk/
o Type “FXN” in the search box and click on “Find”
o Select the Homo sapiens frataxin entry to get all the information about the protein (function, domains, structure, gene expression, …)
o Don’t close the window
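The FASTA download from NCBI can be reproduced without the browser. This is a hedged sketch, assuming the NCBI E-utilities `efetch` endpoint with its `db`, `id`, `rettype`, and `retmode` parameters; the helper names are hypothetical, and the toy FASTA text is a made-up placeholder rather than the real FXN transcript.

```python
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities efetch.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str, db: str = "nuccore") -> str:
    """Build the efetch URL that returns one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def download(url: str) -> str:
    """Fetch a URL as text (needs network; not called in this example)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its sequence joined into one string."""
    chunks: dict[str, list[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            chunks[header] = []
        elif header is not None:
            chunks[header].append(line)
    return {h: "".join(parts) for h, parts in chunks.items()}


# Made-up placeholder record, only to show the parser's behaviour.
TOY_FASTA = ">toy_record placeholder\nACGT\nTTGA\n"

if __name__ == "__main__":
    print(efetch_fasta_url("NM_000144.4"))
    print(parse_fasta(TOY_FASTA))
```

Calling `parse_fasta(download(efetch_fasta_url("NM_000144.4")))` would fetch the transcript used in T2.2; saving the downloaded text to the student folder gives the same file as the “Send” menu.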
T3. Sequences Manipulation
Objective: Find similar sequences using BLAST tools and align the given sequences.
T3.1. Find similar sequences using BLAST tool
o Continuing from Task T2.3, select the “Protein” tab on the left and select “view sequence in Uniprot”
o You can get the FASTA format of the protein by clicking on “fasta” in the top right
o Go back to the previous page (using the browser’s back button) and check the box next to the first sequence under
the “Sequences” title.
o Select the “Blast” tool in the drop-down menu, then click on “Go”.
o The best-matched sequences will appear on the first page (green indicates a better match). To see other
sequences, click on “next”. BLAST parameters can be modified by clicking on “Options” at the top.
T3.2. Align generated sequences with ClustalW tool
o Select about 10 different species, then click on “Align” at the bottom of the screen. The selected sequences are
inserted directly into the ClustalW tool, which runs automatically.
o From the right menu, you can highlight similarities, polar residues, aromatic residues, etc.
o On the same page you can add further sequences to the alignment if needed. You can also access
the phylogenetic tree. More details about the residues and the distances can be obtained by clicking on
“Jalview” (top right, in orange). You may have to open Jalview manually.
o In Jalview, click “file”, “add sequences”, “from file”, then select the sequence file you saved earlier.
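To make concrete what “similarity” between two aligned sequences means, the sketch below computes percent identity over the columns of a pairwise alignment, skipping any column that contains a gap. This is one common convention chosen for illustration; BLAST and ClustalW report their own scores, which are computed differently. The function name and the example strings are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Two made-up aligned fragments; "-" marks a gap.
    print(percent_identity("MKT-AL", "MRTQAL"))  # 4 of 5 gap-free columns match
```

Computing this value for every pair of sequences in an alignment gives a similarity table; one minus the identity fraction is a simple distance of the kind a tree-building step can start from.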
T4. Protein Data and Structure Data
Objective: Study frataxin at the protein level and its connection with pancreatic cancer
(functional and structural data).
T4.1. Structural Knowledge on Frataxin using SBKB
On the Structural Biology Knowledgebase (SBKB): http://www.sbkb.org/
o Select “by text” (options on left) and search “frataxin”.
o For our example, select the link next to “Structures and annotations…”. Here you can obtain information
on all the different hits, such as structure, by looking under each of the given tabs.
T4.2. Using Uniprot for Frataxin Protein Study
On the Uniprot Database: http://www.uniprot.org/
o Search frataxin, select the first 3 results given, and click “Download” in the top right. You can then
“Open” or “Download” any of the results given.
T4.3. Protein-Protein Interaction using STRING
On the STRING Database: http://string-db.org/
o Under “search by name”, search for “FXN”.
o Select the first result given and click “Continue”. Here you can look at the protein-protein
interaction model and obtain more information on a given protein or interaction by clicking on it
in the model, as well as use many other useful tools.
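FASTA files downloaded from UniProt carry structured header lines, which is useful when several results are downloaded at once. The sketch below parses the documented UniProtKB header layout (`>db|accession|entry_name protein name OS=organism OX=taxon GN=gene ...`). The function names are hypothetical, the URL pattern assumes the current UniProt REST service, and the sample header was written by hand following that layout, so its version fields may not match the live entry.

```python
import re

# Assumed URL pattern of the UniProt REST service for a single entry in FASTA.
UNIPROT_FASTA = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"

HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<protein_name>.+?)\s+OS=(?P<organism>.+?)\s+OX=(?P<taxon_id>\d+)"
    r"(?:\s+GN=(?P<gene>\S+))?"
)


def uniprot_fasta_url(accession: str) -> str:
    """Build the FASTA download URL for one UniProtKB accession."""
    return UNIPROT_FASTA.format(accession=accession)


def parse_uniprot_header(header: str) -> dict:
    """Split a UniProtKB FASTA header line into named fields."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError("not a UniProtKB FASTA header: " + header)
    return match.groupdict()


# Hand-written header for human frataxin, following the documented layout.
SAMPLE_HEADER = (
    ">sp|Q16595|FRDA_HUMAN Frataxin, mitochondrial "
    "OS=Homo sapiens OX=9606 GN=FXN PE=1 SV=2"
)

if __name__ == "__main__":
    print(parse_uniprot_header(SAMPLE_HEADER))
```

The `organism` and `gene` fields make it easy to keep one record per species, which is the selection the alignment tasks ask for.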
T4.4. Using same method for Pancreatic Cancer and compare
o Go back to the STRING Database home page and, under “multiple names”, search for “frataxin” and
“pancreatic cancer”. Select the first result.
o Select all three results given and click “Continue”. The three selected proteins are shown,
but this database shows no interactions between them.
o You can widen the results by changing the search to cancer in general.
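The multiple-name STRING query can be approximated programmatically. This sketch assumes the STRING `network` API method with tab-separated output, identifiers joined by a carriage return, and the column names `preferredName_A`, `preferredName_B`, and `score`; treat all of these as assumptions to check against the STRING API documentation. It uses FXN and CDKN2A (the gene selected in T2) as example identifiers, and the sample rows are hypothetical placeholders, not real STRING output.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint: STRING "network" method, TSV output.
STRING_API = "https://string-db.org/api/tsv/network"


def string_network_url(identifiers: list[str], species: int = 9606) -> str:
    """Build a network query URL; 9606 is the NCBI taxon id for human."""
    params = {"identifiers": "\r".join(identifiers), "species": species}
    return STRING_API + "?" + urlencode(params)


def parse_network_tsv(tsv_text: str, min_score: float = 0.0) -> list[tuple]:
    """Return (protein_a, protein_b, score) for rows at or above min_score."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    edges = []
    for row in reader:
        score = float(row["score"])
        if score >= min_score:
            edges.append((row["preferredName_A"], row["preferredName_B"], score))
    return edges


# Hypothetical rows with placeholder partner names and invented scores.
SAMPLE_TSV = (
    "preferredName_A\tpreferredName_B\tscore\n"
    "FXN\tPARTNER_1\t0.9\n"
    "FXN\tPARTNER_2\t0.4\n"
)

if __name__ == "__main__":
    print(string_network_url(["FXN", "CDKN2A"]))
    print(parse_network_tsv(SAMPLE_TSV, min_score=0.7))
```

An empty edge list for a pair of proteins corresponds to what the web page shows when no interaction between the selected proteins is recorded.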
T5. BioExtract Server
Objective: Use a Workflow Management System (WMS), the BioExtract server, to optimize data manipulation processes.
http://bioextract.org
T5.1. Server Initialization
o Register on BioExtract Server to be able to create and save your own workflows.
o Click on the “workflows” tab, then click “create and import workflows”. Now click “record workflow”, then “close”.
o To obtain the workflow at the end of the lab: from the “workflows” tab, click on “create and import workflows”, then click on “save records”.
T5.2. Pancreatic cancer & Frataxin (FXN) data
o Select the query tab. Then select the protein sequences and check the box next to the NCBI protein database. Select “gene” as the search field and type “FXN”. Click
on “Add Search Line”, select “Species”, and type “Human”. Submit the query.
o Results will appear on the “extract page”. You can get the GenBank view of each sequence by clicking on “View record”. We need only the Homo sapiens
frataxin: click “select records”, check the corresponding box, then click on “keep only selected records”. The results can
be saved or extracted in FASTA or txt format (export the records in FASTA format).
o Click on the “tools” tab, then click on “Alignment Tools” and “showalign”. Select “Use records on extract page formatted in Fasta”.
o Click on “execute” to run the tool. When execution is complete, results can be retrieved by selecting the desired format and clicking on “view results”.
o (Optional) Repeat the search process with “pancreatic cancer”. Make sure you change the first search field to “all text”.
T5.3. Mapping, Alignment
o (Skip this step if the previous step was skipped.) Go to the query tab again and search “FXN”. Select a few listings and
export them as done in T5.2. Go to the tools tab.
o Select similarity search tools, then select “blastp”. Select “use records on extract page formatted as Fasta”. Under “choose search set”, select the
database “swissprot”.
o When execution is complete, go to the extract page and select 10 different sequences belonging to 10 different species, including human, then click “keep
only selected records”. Export the records again.
o Go to the tools tab again, select “iPlant”, then “clustal w2”. Select “use records on extract page formatted as Fasta”. Your 10 protein sequences
are automatically used as input to the clustalw2 tool. Execute the tool. Use the pull-down for “Search Results” and select “clustalw2.fa”
before viewing the results.
T5.4. Workflow save & reuse
o Go back to the “workflow” tab and click “create and import workflows”. Write a name and a description for your workflow, then click on “Save”. All
the previous steps will be saved in this workflow.
o Once the workflow is saved, you will find it at the bottom of the workflow list. Click on the name of the workflow to see a schematic view of it.
Run the workflow by clicking on “start”.
o Get and verify all the results by clicking on “provenance”. The general report can be saved for later analysis. Results of each tool can be viewed or
saved by clicking on “view file”.
o The same workflow can be executed for another query by simply modifying the accession number of the protein. (Click save in the “create and
import workflows” section to temporarily save the new query.)
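The idea behind recording a workflow and re-running it with a different query can be shown in a few lines. The sketch below is a conceptual illustration only: it does not use or imitate the BioExtract API, and the `Workflow` class, the step functions, and the toy records (with made-up sequences) are all hypothetical.

```python
from typing import Any, Callable


class Workflow:
    """Record named steps once, then replay them on any query."""

    def __init__(self, name: str, description: str = "") -> None:
        self.name = name
        self.description = description
        self.steps: list[tuple[str, Callable[[Any], Any]]] = []

    def record(self, label: str, func: Callable[[Any], Any]) -> "Workflow":
        self.steps.append((label, func))
        return self

    def run(self, query: Any) -> tuple[Any, list[tuple[str, Any]]]:
        """Apply every step in order; also return each step's output (provenance)."""
        provenance = []
        data = query
        for label, func in self.steps:
            data = func(data)
            provenance.append((label, data))
        return data, provenance


# Toy records standing in for a protein database (sequences are invented).
TOY_DB = [
    {"gene": "FXN", "species": "Homo sapiens", "seq": "MWTLG"},
    {"gene": "FXN", "species": "Mus musculus", "seq": "MWAFG"},
    {"gene": "CDKN2A", "species": "Homo sapiens", "seq": "MEPAA"},
]


def search_gene(gene: str) -> list[dict]:
    return [r for r in TOY_DB if r["gene"] == gene]


def keep_human(records: list[dict]) -> list[dict]:
    return [r for r in records if r["species"] == "Homo sapiens"]


def export_fasta(records: list[dict]) -> str:
    return "".join(f">{r['gene']}|{r['species']}\n{r['seq']}\n" for r in records)


wf = Workflow("fxn_lab", "query, keep selected records, export")
wf.record("query", search_gene).record("keep", keep_human).record("export", export_fasta)

if __name__ == "__main__":
    result, provenance = wf.run("FXN")
    print(result)
    print(wf.run("CDKN2A")[0])  # same workflow, different query
```

Keeping every intermediate output alongside its step label plays the role of the “provenance” report: each tool's result can be inspected after the run.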

More Related Content

PPTX
Session i overview bioinfo dm and app mmc
PPT
iEvobIO
PPT
Chapter - 8.4 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
PPTX
PhoenixBio 2020 Stanford Workshop on PhyloGenes
PPTX
Introduction to the Proteomics Bioinformatics Course 2017
PPTX
Data Mining
PPTX
Introduction to the Proteomics Bioinformatics Course 2016
PDF
UniProt and the Semantic Web
Session i overview bioinfo dm and app mmc
iEvobIO
Chapter - 8.4 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
PhoenixBio 2020 Stanford Workshop on PhyloGenes
Introduction to the Proteomics Bioinformatics Course 2017
Data Mining
Introduction to the Proteomics Bioinformatics Course 2016
UniProt and the Semantic Web

What's hot (20)

PPTX
Protein database
PPT
Protein database
PDF
Evolution Phylogenetic
PPTX
Proteomics repositories
PDF
Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...
PPT
UniProt & Ontologies
PDF
GPKB: Genomic and Proteomic Knowledge Base
PDF
Ontologies for life sciences: examples from the gene ontology
PPTX
Bioinformatics
PPTX
Bioinformatics
PPTX
Protein Data Bank
DOCX
Protein sequence databases
PPTX
Protein structure
PPT
PROTEIN STRUCTURE DATABANK
PPTX
02.databases slides
PDF
Curation Introduction - Apollo Workshop
PPTX
protein data bank
PDF
Role of genomics proteomics, and bioinformatics.
PPTX
Bioinformatics
Protein database
Protein database
Evolution Phylogenetic
Proteomics repositories
Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformat...
UniProt & Ontologies
GPKB: Genomic and Proteomic Knowledge Base
Ontologies for life sciences: examples from the gene ontology
Bioinformatics
Bioinformatics
Protein Data Bank
Protein sequence databases
Protein structure
PROTEIN STRUCTURE DATABANK
02.databases slides
Curation Introduction - Apollo Workshop
protein data bank
Role of genomics proteomics, and bioinformatics.
Bioinformatics
Ad

Viewers also liked (6)

PPTX
Session ii g1 lab genomics and gene expression mmc-corr
PPTX
Session ii g2 overview metabolic network modeling mcc
PDF
Huber brin pb1_f2_poster_2012
PPTX
Lab Gene Expression Data Analysis
PPTX
Session ii g1 overview genomics and gene expression mmc-good
PPTX
Visualization Tools
Session ii g1 lab genomics and gene expression mmc-corr
Session ii g2 overview metabolic network modeling mcc
Huber brin pb1_f2_poster_2012
Lab Gene Expression Data Analysis
Session ii g1 overview genomics and gene expression mmc-good
Visualization Tools
Ad

Similar to Session i lab bioinfo dm and app mmc (20)

PPTX
Lab Online Molecular Tools and BioExtract Server
PDF
Bioinformatics complete manual
PPTX
Introduction to Gene Mining Part A: BLASTn-off!
DOCX
UniProt
PDF
PPTX
BioInformatics Tools -Genomics , Proteomics and metablomics
PDF
Genotype Phenotype Coupling 1st Edition Stefan Zielonka
PDF
Article
PPTX
Introduction to databases.pptx
DOCX
Major biological nucleotide databases
PPTX
Bioinformatics_1_ChenS.pptx
DOC
Practical 7 dna, rna and the flow of genetic information5
PPTX
Working with Chromosomes
DOC
Epigeneticsand methylation
PPTX
Sequencedatabases
PPTX
Bioinformatics
PPTX
Bioinformatics for beginners (exam point of view)
PDF
Principles Of Gene Manipulation 6th Edition Sandy B Primrose
DOCX
TaskDifferentiate the following terms and provide an image obtain.docx
Lab Online Molecular Tools and BioExtract Server
Bioinformatics complete manual
Introduction to Gene Mining Part A: BLASTn-off!
UniProt
BioInformatics Tools -Genomics , Proteomics and metablomics
Genotype Phenotype Coupling 1st Edition Stefan Zielonka
Article
Introduction to databases.pptx
Major biological nucleotide databases
Bioinformatics_1_ChenS.pptx
Practical 7 dna, rna and the flow of genetic information5
Working with Chromosomes
Epigeneticsand methylation
Sequencedatabases
Bioinformatics
Bioinformatics for beginners (exam point of view)
Principles Of Gene Manipulation 6th Edition Sandy B Primrose
TaskDifferentiate the following terms and provide an image obtain.docx

More from USD Bioinformatics (20)

PPTX
Clinical Application of RNA Sequencing - Bladder Cancer
PPTX
Clinical Application 1.0
PPTX
Clinical Application 2.0
PPTX
Bridge Amplification Part 2
PPTX
Bridge Amplification Part 1
PPTX
Basic Steps of the NGS Method
PPTX
True Single Molecule Sequencing
PPTX
Small Molecule Real Time Sequencing
PPTX
Sanger Dideoxy Method
PPTX
Pyrosequencing 454
PPTX
Ion Torrent Sequencing
PPTX
Next Generation Sequencing - the basics
PPTX
Illumina Sequencing
PPTX
Session ii g3 overview epidemiology modeling mmc
PPTX
Session ii g3 overview behavior science mmc
PPTX
Session ii g3 lab behavior science mmc
PPTX
Session ii g2 overview protein modeling mmc
PPTX
Session ii g2 overview chemical modeling mmc
PPTX
Session ii g2 lab modeling mmc
PDF
Swiss model evaluation
Clinical Application of RNA Sequencing - Bladder Cancer
Clinical Application 1.0
Clinical Application 2.0
Bridge Amplification Part 2
Bridge Amplification Part 1
Basic Steps of the NGS Method
True Single Molecule Sequencing
Small Molecule Real Time Sequencing
Sanger Dideoxy Method
Pyrosequencing 454
Ion Torrent Sequencing
Next Generation Sequencing - the basics
Illumina Sequencing
Session ii g3 overview epidemiology modeling mmc
Session ii g3 overview behavior science mmc
Session ii g3 lab behavior science mmc
Session ii g2 overview protein modeling mmc
Session ii g2 overview chemical modeling mmc
Session ii g2 lab modeling mmc
Swiss model evaluation

Recently uploaded (20)

PDF
Microsoft Solutions Partner Drive Digital Transformation with D365.pdf
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Heart disease approach using modified random forest and particle swarm optimi...
PDF
project resource management chapter-09.pdf
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Zenith AI: Advanced Artificial Intelligence
PDF
Hybrid model detection and classification of lung cancer
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
1 - Historical Antecedents, Social Consideration.pdf
PDF
ENT215_Completing-a-large-scale-migration-and-modernization-with-AWS.pdf
PDF
NewMind AI Weekly Chronicles - August'25-Week II
PDF
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
PDF
Accuracy of neural networks in brain wave diagnosis of schizophrenia
PPTX
Group 1 Presentation -Planning and Decision Making .pptx
PDF
Web App vs Mobile App What Should You Build First.pdf
PDF
Getting Started with Data Integration: FME Form 101
PPTX
TLE Review Electricity (Electricity).pptx
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
Transform Your ITIL® 4 & ITSM Strategy with AI in 2025.pdf
PPTX
SOPHOS-XG Firewall Administrator PPT.pptx
Microsoft Solutions Partner Drive Digital Transformation with D365.pdf
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Heart disease approach using modified random forest and particle swarm optimi...
project resource management chapter-09.pdf
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Zenith AI: Advanced Artificial Intelligence
Hybrid model detection and classification of lung cancer
Encapsulation_ Review paper, used for researhc scholars
1 - Historical Antecedents, Social Consideration.pdf
ENT215_Completing-a-large-scale-migration-and-modernization-with-AWS.pdf
NewMind AI Weekly Chronicles - August'25-Week II
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
Accuracy of neural networks in brain wave diagnosis of schizophrenia
Group 1 Presentation -Planning and Decision Making .pptx
Web App vs Mobile App What Should You Build First.pdf
Getting Started with Data Integration: FME Form 101
TLE Review Electricity (Electricity).pptx
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Transform Your ITIL® 4 & ITSM Strategy with AI in 2025.pdf
SOPHOS-XG Firewall Administrator PPT.pptx

Session i lab bioinfo dm and app mmc

  • 1. Bioinformatics Data Manipulation: Molecular Online Tools & BioExtract Server Theme: FXN Gene and Pancreatic Cancer. Lab #1 Etienne Z. Gnimpieba BRIN WS 2013 Mount Marty College – June 24th 2013 Etienne.gnimpieba@usd.edu
  • 2. Context 0. Specification & Aims . Statement of problem / Case study: The FXN gene provides instructions for making a protein called frataxin. This protein is found in cells throughout the body, with the highest levels in the heart, spinal cord, liver, pancreas, and muscles. The protein is used for voluntary movement (skeletal muscles). Within cells, frataxin is found in energy-producing structures called mitochondria. Although its function is not fully understood, frataxin appears to help assemble clusters of iron and sulfur molecules that are critical for the function of many proteins, including those needed for energy production. Mutations in the FXN gene cause Friedreich ataxia. Friedreich ataxia is a genetic condition that affects the nervous system and causes movement problems. Most people with Friedreich ataxia begin to experience the signs and symptoms of the disorder around puberty. Bioinformatics Molecular Online Tools and Server Keywords: Bio: FXN, Frataxin, pancreatic cancer, CDKN4 Math: HMM, Informatics: programing, bioinformatics tools, getting and exporting data Reduced expression of frataxin is the cause of Friedrich's ataxia (FRDA), a lethal neurodegenerative disease, how about liver cancer? Aim: The purpose of this lab is to initiate online biological exploration tools of the human model large scale data study (metabolic, proteic, genomic, …). We simulated the application on FXN gene and pancreatic cancer disease. Now we can understand how a researcher can come to identify cross biological knowledge available in data banks. Acquired skills Online and server tools: - Query biological DB (fasta, Html, txt, figure formats) - Sequence tools (protein and gene) Alignment (showalign, clustalw2), similarity, … - Manage data result (select, keep, map, export) - Build and reuse workflow Biological Hypothesis FXN on chromosome 9 Frataxin molecule structure (pymol) Pancreatic cancerPancreasanatomy ? BiologicalDB Tools Resolution Process T2. 
Genome exploration: Objective: Use of Ensembl to localize the FXN on the human genome and identify the genes implicate in pancreatic cancer disease. T3. Sequences manipulation Objective: Find similar sequence using BLAST tools and make an alignment on given sequences. T2.1. Locate a given gene on human genome T2.2. Get a genomic sequence from NCBI T2.3. Get the protein data and sequence from EBI T2.4. Save the export sequences data in data folder T3.1. Find similar sequences using BLAST tool T3.2. Align generated sequences with ClustalW tool T3.3. Visualized result using phylogenic tree on Jalview T5. BioExtract server Objective: used server tool to optimized data manipulation process, apply on BioExtract server. T5.1. Server Initialization T5.2. Pancreatic cancer & Frataxin (FXN) T5.3. Mapping, Alignment T5.4. Workflow save & reused T4. Protein Data and Structural Biology Knowledge Objective: To provide protein levels of frataxin study and its connection with pancreatic cancer (functional ad structural data) T1. Metabolomics Objective: Use metabolic data repository to understand the frataxin protein mechanism T1.1. Finding the Enzyme and Pathway related to Frataxin using KEGG T1.2. Finding the Reaction involved with Frataxin using Reactome T1.3. Using BRENDA for enzyme data on Frataxin T1.4. Using Collected data for Analysis T1.5. Redu the process with Pancreatic Cancer Results T4.1. Structural Knowledge on Frataxin using SBKB T4.2. Using Uniprot for Frataxin Protein Study T4.3. Protein-Protein Interaction using STRING T4.4. Using same method for Pancreatic Cancer and compare
  • 3. Data Manipulation Molecular Online Tools and BioExtract Server T1. Metabolomics Objective : Use metabolic data repository to understand the frataxin protein mechanism Theme: Frataxin (FXN) implication in the pancreatic cancer genesis T1.1. Finding the Enzyme and Pathway related to Frataxin using KEGG T1.2. Finding the Reaction involved with Frataxin using Reactome T1.3. Using BRENDA to find information on Frataxin On the Reactome website: http://www.reactome.org/ReactomeGWT/entrypoint.html o Search frataxin and select the 4th result with Frataxin in the title. This shows you the pathway model related to frataxin and how frataxin is involved in it. On the BRENDA Database website: http://www.brenda-enzymes.org/ o Search using the E.C. number obtained in T1.1 and select the result given. This website gives multitudes of information on the enzyme including the reaction, related species, and so on. At the very bottom of the webpage you can select other databases that have infromation on the same compound or protein On the KEGG Database website: http://www.genome.jp/kegg/ o Search frataxin, and select the first result under KEGG Gene Database (hsa:2395) o Copy the E.C. number given in “Definition” (EC:1.16.3.1) o In order to find the related pathway, search the E.C. number in the general KEGG Database search (click on the KEGG logo on top) o Select the result given in the KEGG Enzyme Database at the bottom. Here you can see how this enzyme is involved in the metabolism given. Etienne Z. Gnimpieba BRIN WS 2013 Mount Marty College – June 24th 2013 T1.4. Using Collected Information to Analyze the Data On the BioModels website: http://www.ebi.ac.uk/biomodels-main/ o Search using the E.C. number obtained in T1.1 and select the first result given. Here you can download the SMBL file (in student folder) for this pathway (top left corner) and analyze it in the Sematic SBML website. 
http://semanticsbml.org/semanticSBML/simple/index o Click on the first box “Find Similar Models” and click “Browse” and select the file you just saved from BioModels. In this website you can use multiple tools to analyze the model and compare with other models as well. T1.5. Same Process Searching for Pancreatic Cancer Results (Optional) o Use the same process searching instead for pancreatic cancer results.
  • 4. Molecular Online Tools and BioExtract Server T2. Genome Exploration Objective: Use Ensembl online tools to localize the FXN on the human genome and identify the genes implicated in pancreatic cancer disease. Next, find an appropriate data (sequence) on FASTA format. Theme: Frataxin (FXN) implication in the pancreatic cancer genesis On the NCBI website: http://www.ncbi.nlm.nih.gov/guide/ o Pull down “All Databases” and select “Gene” database, then do a keyword search using term FXN o Click the corresponding Homo-sapiens FXN gene (first result) o Scroll down and look for the “NCBI Reference Sequences” title and go to subtitle “mRNA and Proteins” o Click on the corresponding accession number of the first transcript variant (NM_000144.4) o Get the same sequence in FASTA format by clicking on “FASTA” link o Click Send on the top right in blue, select complete record, file, FASTA, and Create File – then save in student folder if possible (will save in downloads automatically) T2.1. Locate a given gene on human genome T2.2. Get a genomic sequence from NCBI (42 DataBases) The common protein name for FXN is Frataxin On the EBI website: http://www.ebi.ac.uk/ o Type “FXN” in the search and click on “find” o Select the Homo Sapien Frataxin to get all the information about the protein (function, domains, structure, gene expression..) o Don’t close the window T2.3. Get the protein information and sequence from EBI On the Ensembl web site http://uswest.ensembl.org/index.html o Select our species "human“ o Do a keyword search using the term "FXN“ o Follow the link of the “Gene” drop down feature o Click the link for “Location” o Export this gene by clicking “Export data” (left side bar) in html file as a FASTA sequence. o Click Next o Click the “HTML” link o Do the same process by searching for “pancreatic cancer”. When you find the list of genes, select the CDKN2A gene Data Manipulation Etienne Z. 
Gnimpieba BRIN WS 2013 Mount Marty College – June 24th 2013
  • 5. Data Manipulation Molecular Online Tools and BioExtract Server T3. Sequences Manipulation Objective : Find similar sequence using BLAST tools and make alignment on given sequences. Theme: Frataxin (FXN) implication in the pancreatic cancer genesis T3.1. Find similar sequences using BLAST tool T3.2. Align generated sequences with ClustalW tool o Select about 10 different species then click on “Align” at the bottom of the screen. Selected sequences will be directly inserted in ClustalW tool and the tool will run automatically. o From the right menu, it is possible to select similarities, polar residues, aromatic residues, etc. if interested… o Through the same page you may add further sequences to the same alignment if needed. You can also access the phylogenetic tree. More details about the residues and the distances can be obtained by clicking on “Jalview” on the top right in orange. (May have to open Jalview manually) o In Jalview, click “file”, “add sequences”, “from file”, then select the sequence file you save earlier. o Continuing from Task T2.3, select the “Protein” tab on the left and select “view sequence in Uniprot” o You can get the Fasta format of the protein by clicking on “fasta” in the top right o Go back to previous page (using browser’s back button) and check the box next to the first sequence under “Sequences” title. o Select the “Blast” tool in the drop down menu then click on “Go” . o The best matched sequences will appear on the first page (green indicates a better match). To see other sequences you can click on next. Blast parameters can be modified by clicking on “Options” at the top Etienne Z. Gnimpieba BRIN WS 2013 Mount Marty College – June 24th 2013
  • 6. Data Manipulation: Molecular Online Tools and BioExtract Server
    T4. Protein Data and Structure Data
    Objective: Study frataxin at the protein level and its connection with pancreatic cancer (functional and structural data).
    Theme: Frataxin (FXN) implication in the pancreatic cancer genesis
    T4.1. Structural Knowledge on Frataxin using SBKB
    On the Structural Biology Knowledgebase (SBKB): http://www.sbkb.org/
    o Select “by text” (options on the left) and search “frataxin”.
    o For our example, select the link next to “Structures and annotations…”. Here you can obtain information on all the different hits, such as structure, by looking under the given tabs.
    T4.2. Using Uniprot for Frataxin Protein Study
    On the Uniprot Database: http://www.uniprot.org/
    o Search “frataxin”, select the first 3 results, and click “Download” in the top right. You can then “Open” or “Download” any of the results.
    T4.3. Protein-Protein Interaction using STRING
    On the STRING Database: http://string-db.org/
    o Under “search by name”, search “FXN”.
    o Select the first result and click “Continue”. Here you can look at the protein-protein interaction model, obtain more information on a given protein or interaction by clicking on it in the model, and use many other tools.
    T4.4. Using same method for Pancreatic Cancer and compare
    o Go back to the STRING Database home page and, under “multiple names”, search “frataxin” and “pancreatic cancer”. Select the first result.
    o Select all three results and click “Continue”. The 3 selected proteins are shown; however, this database shows no interactions between them.
    o You can widen the results by changing the search to cancer in general.
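The STRING lookups in T4.3 and T4.4 can also be expressed as a programmatic query. The sketch below is a hedged illustration, not part of the original lab: it assumes STRING's REST `network` endpoint, the human NCBI taxon id 9606, and the TSV column names `preferredName_A`, `preferredName_B` and `score`. The helper names and the default score cutoff are hypothetical choices, so check the current STRING API documentation before relying on them.

```python
import csv
import io
from urllib.parse import urlencode

# Assumption: base URL of the STRING REST API.
STRING_API = "https://string-db.org/api"


def string_network_url(identifiers, species=9606, fmt="tsv"):
    """Build a network query URL; identifiers are joined with carriage returns."""
    params = {"identifiers": "\r".join(identifiers), "species": species}
    return f"{STRING_API}/{fmt}/network?" + urlencode(params)


def partners(tsv_text, protein, min_score=0.4):
    """List (partner, score) pairs for one protein from network TSV text.

    min_score is an illustrative cutoff; results are sorted best first.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    found = []
    for row in reader:
        a, b = row["preferredName_A"], row["preferredName_B"]
        score = float(row["score"])
        if protein in (a, b) and score >= min_score:
            found.append((b if a == protein else a, score))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

For example, `string_network_url(["FXN", "CDKN2A"])` builds the URL for a two-protein query similar to the “multiple names” search. An empty list from `partners` would correspond to the “no interactions shown” outcome described in T4.4.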
  • 7. Data Manipulation: Molecular Online Tools and BioExtract Server
    T5. BioExtract Server
    Objective: Use a Workflow Management System (WMS) to optimize data manipulation processes (BioExtract Server).
    Theme: Frataxin (FXN) implication in the pancreatic cancer genesis
    http://bioextract.org
    T5.1. Server Initialization
    o Register on BioExtract Server to be able to create and save your own workflows.
    o Click on the “workflows” tab, then click “create and import workflows”. Now click “record workflow”, then “close”.
    o To obtain the workflow at the end of the lab: from the “workflows” tab, click on “create and import workflows”, then click on “save records”.
    T5.2. Pancreatic cancer & Frataxin (FXN) data
    o Select the “query” tab. Then select the protein sequences and check the box next to the NCBI protein database. Select “gene” as the search field and type “FXN”. Click on “Add Search Line”, select “Species”, and type “Human”. Submit the query.
    o Results appear on the “extract” page. You can get the GenBank view of each sequence by clicking on “View record”. We need only the Homo sapiens Frataxin: click “select records”, check the corresponding box, then click on “keep only selected records”. The results can be saved or extracted in FASTA or txt format (export the records in FASTA format).
    T5.3. Mapping, Alignment
    o Click the “tools” tab, then click on “Alignment Tools” and “showalign”. Select “Use records on extract page formatted in Fasta”.
    o Click on “execute” to run the tool. When execution is complete, results can be retrieved by selecting the desired format and clicking on “view results”.
    o (Optional) Repeat the search process with “pancreatic cancer”. Make sure you change the first search field to “all text”.
    o (If the previous step was skipped, skip this step as well.) Go to the “query” tab again and search “FXN”. Search and select a few listings. Export them as done in T5.2. Go to the “tools” tab.
    o Select similarity search tools, then select “blastp”. Select “use records on extract page formatted as Fasta”. Under “choose search set”, select the database “swissprot”.
    o When execution is complete, go to the “extract” page and select 10 different sequences belonging to 10 different species, including human, then click “keep only selected records”. Export the records again.
    o Go to the “tools” tab again, select “iPlant”, then “clustalw2”. Select “use records on extract page formatted as Fasta”. Your 10 protein sequences are automatically used as input to the clustalw2 tool. Execute the tool. Use the pull-down for “Search Results” and select “clustalw2.fa” before viewing the results.
    T5.4. Workflow save & reuse
    o Go back to the “workflows” tab and click “create and import workflows”. Write a name and a description for your workflow, then click on “Save”. All the previous steps are saved in this workflow.
    o Once the workflow is saved, you will find it at the bottom of the workflow list. Click on the name of the workflow to see a schematic view of it. Run the workflow by clicking on “start”.
    o Get and verify all the results by clicking on “provenance”. The general report can be saved for later analysis. Results of each tool can be viewed or saved by clicking on “view file”.
    o The same workflow can be executed for another query by simply modifying the accession number of the protein. (Click “save” in the “create and import workflows” section to temporarily save the new query.)
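The record, save, re-run and provenance cycle described above can be sketched in a few lines. This is a conceptual toy, not BioExtract's implementation or API: the `Workflow` class and the three stand-in steps are hypothetical, and the placeholder sequence `NNNN` is made up.

```python
class Workflow:
    """Toy recorder: stores named steps, replays them, keeps a provenance log."""

    def __init__(self, name):
        self.name = name
        self.steps = []       # (step name, function) in recorded order
        self.provenance = []  # (step name, output) from the most recent run

    def record(self, step_name, func):
        """Append one step; returns self so calls can be chained."""
        self.steps.append((step_name, func))
        return self

    def run(self, query):
        """Replay every recorded step, feeding each output to the next step."""
        self.provenance = []
        data = query
        for step_name, func in self.steps:
            data = func(data)
            self.provenance.append((step_name, data))
        return data


# Hypothetical stand-ins for the query, selection and export steps.
def toy_query(term):
    return [f"{term}_record_{i}" for i in range(1, 4)]


def keep_first(records):
    return records[:1]


def to_fasta(records):
    return "".join(f">{name}\nNNNN\n" for name in records)


wf = (
    Workflow("fxn_demo")
    .record("query", toy_query)
    .record("keep only selected records", keep_first)
    .record("export FASTA", to_fasta)
)
result = wf.run("FXN")
```

Running `wf.run("CDKN2A")` replays the same steps with a different query, which mirrors re-executing a saved workflow after changing only its input. `wf.provenance` then lists each step with the output it produced.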

Editor's Notes

  • #2: Welcome to this bioinformatics lab on data manipulation using online and server tools. As the theme, we have chosen to study the interaction between Frataxin and pancreatic cancer.
  • #3: This is the lab template. The context is a biological one, based on a real biological problem and a given hypothesis. I don’t use “computer science”; it is a strong word. When you read this template, you have a different view than an informatician. You want to understand the process used to build the tools, the architecture of the system, the algorithm implementation, the quality of the resulting data, and so on.