Session i lab bioinfo dm and app mmc

Bioinformatics Data Manipulation:
Molecular Online Tools & BioExtract Server
Theme: FXN Gene and Pancreatic Cancer.
Lab #1
Etienne Z. Gnimpieba
BRIN WS 2013
Mount Marty College – June 24th 2013
Etienne.gnimpieba@usd.edu

Context
0. Specification & Aims
Statement of problem / Case study: The FXN gene provides instructions for making a protein called frataxin. This protein is found in cells throughout the body, with the highest levels in the heart,
spinal cord, liver, pancreas, and muscles used for voluntary movement (skeletal muscles). Within cells, frataxin is found in energy-producing structures called mitochondria. Although
its function is not fully understood, frataxin appears to help assemble clusters of iron and sulfur molecules that are critical for the function of many proteins, including those needed for energy
production. Mutations in the FXN gene cause Friedreich ataxia. Friedreich ataxia is a genetic condition that affects the nervous system and causes movement problems. Most people with Friedreich
ataxia begin to experience the signs and symptoms of the disorder around puberty.
Bioinformatics Molecular Online Tools and Server
Keywords:
Bio: FXN, Frataxin, pancreatic cancer, CDKN2A
Math: HMM
Informatics: programming, bioinformatics tools, getting and exporting data
Reduced expression of frataxin is the cause of Friedreich's ataxia (FRDA), a lethal neurodegenerative disease. How about liver cancer?
Aim: The purpose of this lab is to introduce online tools for exploring large-scale human data
(metabolic, proteomic, genomic, …). We apply them to the FXN gene and pancreatic cancer to show
how a researcher can identify and cross-reference the biological knowledge available in data banks.
Acquired skills
Online and server tools:
- Query biological DBs (FASTA, HTML, TXT, figure formats)
- Sequence tools (protein and gene): alignment (showalign, clustalw2), similarity, …
- Manage result data (select, keep, map, export)
- Build and reuse workflows
Biological Hypothesis
[Figures: FXN on chromosome 9; frataxin molecule structure (PyMOL); pancreatic cancer; pancreas anatomy. Diagram labels: Biological DB, Tools.]
Resolution Process
T1. Metabolomics
Objective: Use metabolic data repositories to understand the frataxin protein mechanism.
T1.1. Finding the Enzyme and Pathway related to Frataxin using KEGG
T1.2. Finding the Reaction involved with Frataxin using Reactome
T1.3. Using BRENDA for enzyme data on Frataxin
T1.4. Using Collected data for Analysis
T1.5. Redo the process with Pancreatic Cancer
T2. Genome exploration
Objective: Use Ensembl to locate FXN on the human genome and identify the genes implicated in pancreatic cancer.
T2.1. Locate a given gene on human genome
T2.2. Get a genomic sequence from NCBI
T2.3. Get the protein data and sequence from EBI
T2.4. Save the exported sequence data in the data folder
T3. Sequences manipulation
Objective: Find similar sequences using BLAST tools and align the given sequences.
T3.1. Find similar sequences using BLAST tool
T3.2. Align generated sequences with ClustalW tool
T3.3. Visualize the result as a phylogenetic tree in Jalview
T4. Protein Data and Structural Biology Knowledge
Objective: Study frataxin at the protein level and its connection with pancreatic cancer (functional and structural data).
T4.1. Structural Knowledge on Frataxin using SBKB
T4.2. Using Uniprot for Frataxin Protein Study
T4.3. Protein-Protein Interaction using STRING
T4.4. Using same method for Pancreatic Cancer and compare
T5. BioExtract server
Objective: Use a server tool (BioExtract Server) to optimize the data manipulation process.
T5.1. Server Initialization
T5.2. Pancreatic cancer & Frataxin (FXN)
T5.3. Mapping, Alignment
T5.4. Workflow save & reuse

T1. Metabolomics
Objective: Use metabolic data repositories to understand the frataxin protein mechanism
Theme: Implication of frataxin (FXN) in the genesis of pancreatic cancer
T1.1. Finding the Enzyme and Pathway related to Frataxin using KEGG
On the KEGG Database website: http://www.genome.jp/kegg/
o Search frataxin, and select the first result under KEGG Gene Database (hsa:2395)
o Copy the E.C. number given in “Definition” (EC:1.16.3.1)
o To find the related pathway, search the E.C. number in the general KEGG Database search (click on the KEGG logo at the top)
o Select the result given in the KEGG Enzyme Database at the bottom. Here you can see how this enzyme is involved in the metabolism shown.
T1.2. Finding the Reaction involved with Frataxin using Reactome
On the Reactome website: http://www.reactome.org/ReactomeGWT/entrypoint.html
o Search frataxin and select the 4th result with Frataxin in the title. This shows the pathway model related to frataxin and how frataxin is involved in it.
T1.3. Using BRENDA to find information on Frataxin
On the BRENDA Database website: http://www.brenda-enzymes.org/
o Search using the E.C. number obtained in T1.1 and select the result given. This page gives a wealth of information on the enzyme, including the reaction, related species, and so on. At the very bottom of the page you can select other databases that have information on the same compound or protein.
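The KEGG lookup in T1.1 can also be scripted. The sketch below is a minimal illustration and not part of the original lab. It assumes KEGG's public REST service at rest.kegg.jp with its `get` and `link` operations, and the helper names (`kegg_get_url`, `kegg_pathways_for_ec_url`, `extract_ec_numbers`) are made up for this example. It only builds the request URLs and pulls the E.C. number out of a definition line; it sends no network request.

```python
import re
from typing import List
from urllib.parse import quote

# Assumption: KEGG's public REST endpoint.
KEGG_REST = "https://rest.kegg.jp"


def kegg_get_url(entry_id: str) -> str:
    """URL of the flat-file record for one KEGG entry, e.g. 'hsa:2395'."""
    return f"{KEGG_REST}/get/{quote(entry_id, safe=':')}"


def kegg_pathways_for_ec_url(ec_number: str) -> str:
    """URL listing the KEGG pathways linked to an E.C. number."""
    return f"{KEGG_REST}/link/pathway/ec:{ec_number}"


def extract_ec_numbers(definition: str) -> List[str]:
    """Return every E.C. number found after an 'EC:' tag in a definition line."""
    numbers: List[str] = []
    for group in re.findall(r"EC:\s*([0-9.\- ]+)", definition):
        numbers.extend(group.split())
    return numbers


# Identifiers taken from the lab text; the definition string mimics what the page shows.
definition_line = "frataxin (EC:1.16.3.1)"
ec_numbers = extract_ec_numbers(definition_line)

print(kegg_get_url("hsa:2395"))
for ec in ec_numbers:
    print(kegg_pathways_for_ec_url(ec))
```

Opening the printed URLs in a browser (or fetching them with any HTTP client) should give the same gene record and pathway links that the manual search reaches, provided the service still exposes these operations.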
T1.4. Using Collected Information to Analyze the Data
On the BioModels website: http://www.ebi.ac.uk/biomodels-main/
o Search using the E.C. number obtained in T1.1 and select the first result given. Here you can download the SBML file (in
student folder) for this pathway (top left corner) and analyze it on the semanticSBML website:
http://semanticsbml.org/semanticSBML/simple/index
o Click on the first box “Find Similar Models”, click “Browse”, and select the file you just saved from BioModels. On this
website you can use multiple tools to analyze the model and compare it with other models.
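Before uploading an SBML file from T1.4, it can help to check what it contains. The sketch below is an illustration only, not part of the original lab: it uses Python's standard XML parser to list the species and reaction identifiers of a model. The embedded model is a toy example written for this sketch, not the file downloaded from BioModels, and `summarize_sbml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_sbml(xml_text: str) -> dict:
    """Collect the ids of all <species> and <reaction> elements in an SBML document."""
    root = ET.fromstring(xml_text)
    species = [el.get("id") for el in root.iter() if local_name(el.tag) == "species"]
    reactions = [el.get("id") for el in root.iter() if local_name(el.tag) == "reaction"]
    return {"species": species, "reactions": reactions}


# Toy SBML model (made up for illustration; real BioModels files are much larger).
TOY_SBML = """
<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfSpecies>
      <species id="Fe2" compartment="c"/>
      <species id="Fe3" compartment="c"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="iron_oxidation"/>
    </listOfReactions>
  </model>
</sbml>
"""

summary = summarize_sbml(TOY_SBML)
print(summary["species"])
print(summary["reactions"])
```

To inspect the downloaded model instead, read the saved file into a string and pass it to `summarize_sbml`.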
T1.5. Repeat the Process for Pancreatic Cancer (Optional)
o Repeat the steps above, searching for pancreatic cancer instead of frataxin.

T2. Genome Exploration
Objective: Use Ensembl online tools to locate FXN on the human genome and identify the genes implicated in pancreatic
cancer. Then retrieve the corresponding sequence data in FASTA format.
T2.1. Locate a given gene on human genome
On the Ensembl website: http://uswest.ensembl.org/index.html
o Select the species “human”
o Do a keyword search using the term “FXN”
o Follow the link of the “Gene” drop-down feature
o Click the link for “Location”
o Export this gene as a FASTA sequence in an HTML file by clicking “Export data” (left sidebar)
o Click Next
o Click the “HTML” link
o Repeat the process, searching for “pancreatic cancer”. When you find the list of genes, select the CDKN2A gene
T2.2. Get a genomic sequence from NCBI (42 DataBases)
On the NCBI website: http://www.ncbi.nlm.nih.gov/guide/
o Pull down “All Databases” and select the “Gene” database, then do a keyword search using the term FXN
o Click the corresponding Homo sapiens FXN gene (first result)
o Scroll down to the “NCBI Reference Sequences” title and go to the subtitle “mRNA and Proteins”
o Click on the accession number of the first transcript variant (NM_000144.4)
o Get the same sequence in FASTA format by clicking on the “FASTA” link
o Click “Send” (top right, in blue), select Complete Record, File, FASTA, and Create File, then save the file in the
student folder if possible (by default it is saved in the downloads folder)
T2.3. Get the protein information and sequence from EBI
The common protein name for FXN is Frataxin
On the EBI website: http://www.ebi.ac.uk/
o Type “FXN” in the search box and click on “find”
o Select the Homo sapiens Frataxin to get all the information about the protein (function, domains, structure, gene expression, ...)
o Don’t close the window
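The Ensembl and NCBI steps of T2 can be approximated from a script. The sketch below is a hedged illustration, not part of the original lab. It assumes the Ensembl REST service at rest.ensembl.org (its `lookup/symbol` endpoint and the `seq_region_name`, `start`, `end`, `strand` fields of the JSON it returns) and the NCBI E-utilities `efetch` endpoint. All function names are hypothetical, and the coordinates and sequence used as inputs are toy values, not real FXN data. No network request is sent; the code only builds the URLs and parses text you would get back.

```python
from urllib.parse import urlencode

# Assumptions: public endpoints of the Ensembl REST API and NCBI E-utilities.
ENSEMBL_REST = "https://rest.ensembl.org"
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def ensembl_lookup_url(symbol, species="homo_sapiens"):
    """Ensembl REST URL that returns a gene's genomic location as JSON."""
    return f"{ENSEMBL_REST}/lookup/symbol/{species}/{symbol}?content-type=application/json"


def format_location(record):
    """Turn a lookup record into 'chromosome:start-end (strand)'."""
    strand = "+" if record["strand"] == 1 else "-"
    return f"{record['seq_region_name']}:{record['start']}-{record['end']} ({strand})"


def efetch_fasta_url(accession, db="nuccore"):
    """E-utilities URL that returns one sequence record in FASTA format."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy inputs: placeholder coordinates and a made-up sequence, not real FXN data.
toy_record = {"seq_region_name": "9", "start": 1000, "end": 2000, "strand": 1}
toy_fasta = ">demo toy sequence, not the real FXN transcript\nACGTAC\nGTTA\n"

print(ensembl_lookup_url("FXN"))
print(format_location(toy_record))
print(efetch_fasta_url("NM_000144.4"))
print(parse_fasta(toy_fasta))
```

The same `parse_fasta` helper can be applied to the FASTA file saved from NCBI in T2.2, which is a quick way to confirm that the export contains the record you expect.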

T3. Sequences Manipulation
Objective: Find similar sequences using BLAST tools and align the given sequences.
T3.1. Find similar sequences using BLAST tool
o Continuing from Task T2.3, select the “Protein” tab on the left and select “view sequence in Uniprot”
o You can get the FASTA format of the protein by clicking on “fasta” in the top right
o Go back to the previous page (using the browser’s back button) and check the box next to the first sequence under the
“Sequences” title.
o Select the “Blast” tool in the drop-down menu, then click on “Go”.
o The best-matched sequences will appear on the first page (green indicates a better match). To see other
sequences, click on next. Blast parameters can be modified by clicking on “Options” at the top
T3.2. Align generated sequences with ClustalW tool
o Select about 10 different species, then click on “Align” at the bottom of the screen. The selected sequences are
inserted directly into the ClustalW tool, which runs automatically.
o From the right menu you can highlight similarities, polar residues, aromatic residues, etc., if interested.
o From the same page you can add further sequences to the same alignment if needed. You can also access
the phylogenetic tree. More details about the residues and the distances can be obtained by clicking on
“Jalview” on the top right in orange. (You may have to open Jalview manually.)
o In Jalview, click “file”, “add sequences”, “from file”, then select the sequence file you saved earlier.
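To make the notion of similarity in an alignment concrete, the sketch below computes percent identity between already-aligned sequences. It is an illustration only, not the scoring used by BLAST, ClustalW, or Jalview: identity is defined here as matching positions divided by positions where neither sequence has a gap, which is one of several conventions. The sequences are toy strings and the function names are made up for this example.

```python
from itertools import combinations


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def pairwise_identities(alignment):
    """Percent identity for every pair of sequences in a {name: aligned_sequence} dict."""
    return {
        (a, b): percent_identity(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }


# Toy alignment (made-up sequences, not frataxin).
toy_alignment = {
    "human": "ACGT-A",
    "mouse": "ACCTTA",
    "fly": "AC-TTA",
}

for pair, identity in pairwise_identities(toy_alignment).items():
    print(pair, round(identity, 1))
```

A real run would take the aligned sequences from the ClustalW output rather than from a hand-written dictionary.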

T4. Protein Data and Structure Data
Objective: Study frataxin at the protein level and its connection with pancreatic cancer
(functional and structural data)
T4.1. Structural Knowledge on Frataxin using SBKB
On the Structural Biology Knowledgebase (SBKB): http://www.sbkb.org/
o Select “by text” (options on left) and search “frataxin”.
o For our example, select the link next to “Structures and annotations…” Here you can obtain information
on all the different hits, such as structure, by looking under each of the given tabs.
T4.2. Using Uniprot for Frataxin Protein Study
On the Uniprot Database: http://www.uniprot.org/
o Search frataxin, select the first 3 results given, and click “Download” in the top right. You can then
“Open” or “Download” any of the results given.
T4.3. Protein-Protein Interaction using STRING
On the STRING Database: http://string-db.org/
o Under “search by name”, search “FXN”.
o Select the first result given and click “Continue”. Here you can look at the Protein-Protein
Interaction model and obtain more information on a given protein or interaction by clicking on it
in the model, as well as use many other useful tools.
T4.4. Using same method for Pancreatic Cancer and compare
o Go back to the STRING Database home page and, under “multiple names”, search “frataxin” and
“pancreatic cancer”. Select the first result.
o Select all three results given and click “Continue”. The three selected proteins are shown,
but this database shows no interactions between them.
o You can widen the results by changing the search to cancer in general.
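The STRING searches in T4.3 and T4.4 can also be expressed as API requests. The sketch below is a hedged illustration, not part of the original lab. It assumes the STRING web API at string-db.org/api with its `network` method, carriage-return-separated identifiers, and `preferredName_A`, `preferredName_B`, and `score` columns in the TSV output. The function names are hypothetical, and the table used as input is made up (`GENE_X` and `GENE_Y` are placeholder names, the scores are invented). No request is sent.

```python
import csv
import io
from urllib.parse import urlencode

# Assumption: base URL of the STRING web API.
STRING_API = "https://string-db.org/api"


def network_url(identifiers, species=9606):
    """URL for the interaction network of the given protein names (9606 = human)."""
    params = {"identifiers": "\r".join(identifiers), "species": species}
    return f"{STRING_API}/tsv/network?{urlencode(params)}"


def parse_edges(tsv_text, min_score=0.4):
    """Read a STRING network table and keep interactions scoring at least min_score."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    edges = []
    for row in reader:
        score = float(row["score"])
        if score >= min_score:
            edges.append((row["preferredName_A"], row["preferredName_B"], score))
    return edges


# Toy table: placeholder partner names and invented scores, for illustration only.
TOY_TSV = (
    "preferredName_A\tpreferredName_B\tscore\n"
    "FXN\tGENE_X\t0.95\n"
    "FXN\tGENE_Y\t0.30\n"
)

print(network_url(["FXN"]))
print(network_url(["FXN", "CDKN2A"]))
print(parse_edges(TOY_TSV))
```

An empty list from `parse_edges` for a multi-protein query would correspond to the situation described in T4.4, where the selected proteins appear without any interaction between them.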

T5. BioExtract Server
Objective: Use a Workflow Management System (WMS) to optimize data manipulation processes (BioExtract Server).
http://bioextract.org
T5.1. Server Initialization
o Register on BioExtract Server to be able to create and save your own workflows.
o Click on the “workflows” tab, then click “create and import workflows.” Now click “record workflow” then “close.”
o To obtain the workflow at the end of the lab: from the “workflows” tab click on “create and import workflows”, then click on “save records”.
T5.2. Pancreatic cancer & Frataxin (FXN) data
o Select the query tab. Then select the protein sequences and check the box next to NCBI protein database. Select “gene” as the search field and type “FXN”. Click
on “Add Search Line”, select “Species”, and type “Human”. Submit the query.
o Results will appear on the “extract page”. You can get the GenBank view of each sequence by clicking on “View record”. We need only the Homo sapiens
Frataxin. For that, click “select records”, then check the corresponding box of your choosing. Click on “keep only selected records”. The results can
be saved or extracted in FASTA or TXT format (export the records in FASTA format).
T5.3. Mapping, Alignment
o Click the “tools” tab, then click on “Alignment Tools”, and “showalign”. Select “Use records on extract page formatted in Fasta”.
o Click on “execute” to run the tool. When execution is complete, results can be retrieved by selecting the desired format and clicking on “view results”.
o Repeat the search process with “pancreatic cancer”. Make sure you change the first search field to “all text” (Optional)
o (If the previous step was skipped, skip this step as well) Go to the query tab again and search “FXN”. Search and select a few listings.
Export them as done in T5.2. Go to the tools tab.
o Select similarity search tools, then select “blastp”. Select “use records on extract page formatted as Fasta”. Under “choose search set” select the
database “swissprot”.
o When execution is complete, go to the extract page and select 10 different sequences belonging to 10 different species, including human, then “keep
only selected records.” Export the records again.
o Go to the tools tab again, select “iPlant”, then “clustal w2”. Select “use records on extract page formatted as Fasta”. Your 10 protein sequences
will be automatically used as input to the clustalw2 tool. Execute the tool. Use the pull-down for “Search Results” and select “clustalw2.fa”
before viewing the results.
T5.4. Workflow save & reuse
o Go back to the “workflow” tab and click “create and import workflows”. Write a name and a description for your workflow, then click on Save. All
the previous steps will be saved in this workflow.
o Once the workflow is saved, you will find it at the bottom of the workflow list. Click on the name of the workflow to see a schematic view of it.
Run the workflow by clicking on “start”.
o Get and verify all the results by clicking on “provenance”. The general report can be saved for later analysis. Results of each tool can be viewed or
saved by clicking on “view file”.
o The same workflow can be executed for another query by simply modifying the accession number of the protein. (Click save in the “create and
import workflows” section to temporarily save the new query)
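The record-selection step of T5.3 (ten sequences from ten different species, including human) can be mimicked in a few lines. The sketch below is an illustration only and does not reproduce how BioExtract selects records. It assumes FASTA headers in the UniProt/Swiss-Prot style, where the organism follows `OS=`, and both function names and all sample records are made up.

```python
import re


def species_of(header):
    """Read the organism name from a UniProt-style header ('... OS=Homo sapiens OX=9606 ...')."""
    match = re.search(r"OS=(.+?)(?= [A-Z]{2}=|$)", header)
    return match.group(1) if match else None


def one_per_species(records, limit=10, required="Homo sapiens"):
    """Keep the first record of each species, up to `limit`, listing `required` first."""
    # Stable sort: records of the required species move to the front, others keep their order.
    ordered = sorted(records, key=lambda rec: species_of(rec[0]) != required)
    chosen = []
    seen = set()
    for header, sequence in ordered:
        species = species_of(header)
        if species is None or species in seen:
            continue
        seen.add(species)
        chosen.append((header, sequence))
        if len(chosen) == limit:
            break
    return chosen


# Toy records: invented accessions and sequences, for illustration only.
toy_records = [
    ("sp|X00001|TOY1_MOUSE toy protein OS=Mus musculus OX=10090", "MAAA"),
    ("sp|X00002|TOY1_HUMAN toy protein OS=Homo sapiens OX=9606", "MBBB"),
    ("sp|X00003|TOY2_MOUSE toy protein OS=Mus musculus OX=10090", "MCCC"),
]

for header, _ in one_per_species(toy_records):
    print(species_of(header), "<-", header.split()[0])
```

Saving each step as a function like this follows the same idea as the recorded workflow in T5.4: the selection can be rerun on a new query without repeating the manual clicks.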
