IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. I (May-Jun. 2016), PP 20-22
www.iosrjournals.org
DOI: 10.9790/0661-1803012022
Computational Analysis of Sequences to Determine Expectation Value Commonly Used in Bioinformatics Database
Uma Kumari1, Ashok Kumar Choudhary2
1 Department of Biotechnology, Jharkhand Rai University, Ranchi-835222, Jharkhand, India.
2 Department of Botany, Ranchi University, Ranchi-834008, Jharkhand, India.
Abstract: Solanum lycopersicum (tomato) is an economically important crop worldwide and an intensively
investigated model system for genetic studies in plants. Variability is a measure of the spread of a data set.
Genome analysis and annotation draw on genomic libraries, with automatic annotation performed using BLAST
(Basic Local Alignment Search Tool). Low-complexity sequences have an unusual composition that can create
problems in sequence similarity searching; the color bars in the BLAST graphic summarize the search results.
Customized data extraction utilities have been developed for some commonly used databases and tools such as
NCBI, FASTA, BLAST, ORF and NEBcutter. BLAST is a bioinformatics algorithm that aligns a query sequence
against the sequences found in a database search. When the expect value (E-value) threshold is increased from
its default value, a larger list with more low-scoring hits can be reported. NCBI thus provides a common data
extraction platform for sequence analysis, in which the E-value decreases exponentially as the score of the
match increases. The desired subset of data compiled using BLAST can subsequently be used to observe the
expectation value for analysis and knowledge discovery.
Keywords: Bioinformatics, Biological database, Customized data retrieval, Sequence analysis, Data
compilation, Expectation value.
I. Introduction
The data available for biological systems are diverse in nature and include sequences, structures,
expression data, interactions, pathways and system-level data. The rate at which these data are generated
has increased exponentially owing to technological advances in genomics, transcriptomics, proteomics,
structural genomics, systems biology, etc. As biological data constitute an important component of big data,
genome sequencing creates a demand to curate, compile, organize, archive, query and analyse the
sequences (Higgins Des, Taylor Willie 2000). Various types of primary data, along with their annotation,
continue to be useful for processing, analysis and interpretation so as to generate higher-order information
and knowledge. The existing databases archive molecular data and are built around a focused theme, with
relevant annotations. Tools that facilitate analysis of the data and integration of diverse data types are
therefore the need of the hour. A number of online as well as offline tools and servers are available for
accessing and retrieving large amounts of data from public-domain resources. NCBI E-utilities, for instance,
provide customized data retrieval utilities for the various databases available at NCBI. These utilities require
the generation of URLs (Utility web services, http://www.ncbi.nlm.nih.gov), which can be produced in a format
specific to the respective database either manually or by writing scripts; multiple URLs can be generated in
this way. The NCBI Computational Biology Branch focuses on theoretical, analytical and applied computational
approaches to a broad range of fundamental problems in molecular biology. A sequence in FASTA format is
represented as a series of lines, each of which should be no longer than 120 characters and usually does not
exceed 80 characters. As an initial step towards this goal (Casey, R. M. 2005), NCBI, BLAST and FASTA tools
have been developed to make the existing search utilities more effective and productive for the computational
analysis of sequences.
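The script-based generation of E-utilities URLs mentioned above can be sketched as follows. This is a minimal illustration and not the procedure used in this study: the function name build_efetch_url is hypothetical, and the choice of the nuccore database with FASTA text output is an assumption made for the example. The base address and the db, id, rettype and retmode parameters belong to the public NCBI EFetch service. The code only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities; efetch.fcgi returns full records.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_efetch_url(accessions, db="nuccore", rettype="fasta", retmode="text"):
    """Return an EFetch URL for one or more accession numbers.

    build_efetch_url is a hypothetical helper written for this example;
    the default database and return type are assumptions.
    """
    params = {
        "db": db,                    # target Entrez database
        "id": ",".join(accessions),  # comma-separated list of identifiers
        "rettype": rettype,          # record format, here FASTA
        "retmode": retmode,          # plain-text output
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


# Single entry: the tomato mRNA record discussed in this paper.
url = build_efetch_url(["NM_001320673.1"])
print(url)
```

Passing several accession numbers in the list yields one URL covering multiple entries, which corresponds to the single-entry and multiple-entry modes described in the text.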
II. Method
Computational analysis of sequence alignment relies on computer programming for bioinformatics and data
management. NCBI focuses on theoretical, analytical and applied computational approaches, alongside widely
used primary databases such as the European Nucleotide Archive and UniProtKB/Swiss-Prot, and a widely used
method for the assignment of secondary structure. FASTA is a sequence alignment software package used to
infer functional and evolutionary relationships between sequences. Operating systems: UNIX, Linux, MS-Windows.
2.1 Database And Corresponding Web Services
Database name    Web services type: URL
NCBI             E-Utility web services (http://www.ncbi.nlm.nih.gov)
BLAST            www.ebi.ac.uk/tools/sss/ncbiblast
FASTA            www.ebi.ac.uk/tools
EMBL/EBI         EMBL-EBI web services (http://www.ebi.ac.uk/tools/)
UniProtKB        Programmatic access services (http://www.uniprot.org)
III. Results And Discussion
Searching and browsing databases and generating curated datasets are essential for processing, analysis
and interpretation. NCBI, FASTA and BLAST address this need for retrieving subsets of sequences or data from
a few widely used databases, and BLAST reports an E-value for each hit: the number of alignments expected to
occur by chance with the observed score or a higher score. NCBI BLAST provides a platform with a
user-friendly interface. A common feature of all the utilities is computational analysis of the sequence using
NCBI, BLAST and FASTA. All utilities support data from a single entry as well as multiple entries. Data are
exchanged among these databases on a daily basis. NCBI houses a series of databases relevant to
bioinformatics tools and services. Major databases include GenBank for DNA sequences and PubMed for
biomedical literature. The Epigenomics database of the NCBI (National Center for Biotechnology Information)
at the NIH (National Institutes of Health) is meant to collect maps of epigenetic modifications and their
occurrence across the human genome. A list of accession numbers may be provided interactively or by
uploading a text file.
>gi|1002623395|ref|NM_001320673.1| Solanum lycopersicum cysteine proteinase inhibitor A (LOC543632), mRNA
GCTTTAATCAAACGCGCTCCATTAAATTCGTTGATTGTGACTGACTATTCTTCTTCTTCTTCTTATATAT
CTCAAAAACCCCATTTACAGAGACTCAAAAATGGCGACATTAGGAGGAATTCGTGAAGCTGGAGGATCAG
AAAACAGCCTAGAGATCAACGATCTTGCTCGTTTTGCTGTTGATGAACACAATAAGAAACAGAATGCTCT
TTTGGAGTTTGGAAAGGTTGTGAATGTGAAGGAACAAGTGGTTGCTGGAACCATGTACTACATAACACTA
GAGGCGACTGAAGGTGGTAAGAAGAAAGCATACGAAGCCAAAGTCTGGGTGAAGCCATGGCAGAACTTCA
AGCAAGTTGAAGACTTCAAGCTTATTGGGGATGCTGCTACTGCTTAACAAGCGCTGAACGATGTATGACT
CTTATGTCCTGAAAATAAAGCTAAACATATTTTAGCTTGTTCGTATTTGAATATCATAAAGTAAGTTCAT
AACTCTATCGTGGATCTAAATTACGGATAACTATAGCTTTACAACGTTCCTTTTTCGTTCTATGCTCTTA
TCTTATATACGATTTTGCTTTTCTGTTGCTAATAATATCTGAGAAACACAAGC
(Nucleotide sequence)
(Source: FASTA sequence of Solanum lycopersicum retrieved from NCBI.)
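A record in FASTA format, such as the one shown above, consists of a header line beginning with ">" followed by sequence lines. The sketch below shows one possible way to read such records and to re-wrap a sequence to a fixed line width. The function names parse_fasta and wrap_sequence and the short demo record are hypothetical and are given only for illustration; they are not part of any NCBI tool.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def wrap_sequence(seq, width=70):
    """Split a sequence into lines of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


# Hypothetical two-line record used only to demonstrate the functions.
example = """>demo_record hypothetical example
GCTTTAATCAAACGCGCTCC
ATTAAATTCG
"""

records = parse_fasta(example)
header, seq = records[0]
print(header)
print(len(seq))
print(wrap_sequence(seq, width=20))
```

Keeping the sequence as a single string after parsing makes it straightforward to check the line-length recommendation for FASTA files by re-wrapping at 70 or 80 characters.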
3.1 Advanced Features Of Sequence Analysis
The feature table block is an important section of a nucleotide sequence entry. Sequence analysis by
database accession number and cross-linking to relevant entries is carried out using the base URLs. These soft
links could be established dynamically using properties such as homology, structural or functional similarity,
membership of a certain biological process, etc.
The score is a measure of similarity between the sequences. The E-value is a statistical calculation based on
the quality of the alignment obtained from one database. An E-value of 1e-3 means that an alignment with this
score has roughly a 0.001 chance of occurring in the database by chance, for instance in a database of 10000
sequences; more precisely, the E-value is the number of such alignments expected by chance, which for small
values is close to that probability. If the database contains 610 sequences, one might expect the alignment to
occur perhaps 7 times. An E-value of 0 is actually a rounded-down value (perhaps 1e-250 or smaller) and simply
indicates that there is almost no chance that the alignment occurred by chance.
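The dependence of the E-value on the score and on the size of the search space can be illustrated with the Karlin-Altschul relation E = K * m * n * exp(-lambda * S), where S is the alignment score, m the query length and n the total database length. The sketch below is illustrative only: the values chosen for K and lambda are placeholder assumptions rather than parameters from this study, and the function names are hypothetical. The conversion P = 1 - exp(-E) gives the probability of observing at least one such alignment by chance.

```python
import math


def expect_value(score, query_len, db_len, K=0.1, lam=0.3):
    """Karlin-Altschul estimate E = K * m * n * exp(-lambda * S).

    K and lam are illustrative placeholders (assumptions); real values
    depend on the scoring system used by the search program.
    """
    return K * query_len * db_len * math.exp(-lam * score)


def p_value(e):
    """Probability of at least one chance alignment, P = 1 - exp(-E)."""
    return -math.expm1(-e)  # numerically stable form of 1 - exp(-e)


# Hypothetical query of 500 bases against a database of 1,000,000 bases.
for s in (40, 50, 60):
    e = expect_value(s, query_len=500, db_len=1_000_000)
    print(s, e, p_value(e))
```

Each increase in the score multiplies E by a constant factor smaller than one, which is the exponential decrease described in the abstract, while doubling the database length doubles E. For small values such as 1e-3, P and E are nearly identical, which is why an E-value is often read as a probability.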
IV. Conclusion
Sequence analysis has been developed with the objective of providing a single platform for customizable
data retrieval from some of the major biological databases. When the E-value threshold is increased from its
default value, a larger list with more low-scoring hits can be reported. The E-value itself is based on the
quality of the alignment (the score) and the size of the database, and is obtained by applying the sequence
alignment method and bioinformatics tools. The closer the E-value is to 0, the better the alignment.
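The effect of the E-value threshold on the number of reported hits can be sketched as below. The hit names and E-values are invented for illustration and are not results of this study. The function filter_hits is hypothetical, and the default cut-off of 10.0 is an assumption chosen to mirror a commonly used BLAST expect threshold.

```python
# Hypothetical hits as (identifier, E-value) pairs; not real search output.
hits = [
    ("hit_A", 1e-50),
    ("hit_B", 2e-5),
    ("hit_C", 0.5),
    ("hit_D", 8.0),
    ("hit_E", 40.0),
]


def filter_hits(hits, max_evalue=10.0):
    """Keep hits whose E-value is at or below the threshold, best first."""
    kept = [h for h in hits if h[1] <= max_evalue]
    return sorted(kept, key=lambda h: h[1])


strict = filter_hits(hits, max_evalue=1e-3)    # only highly significant hits
default = filter_hits(hits)                    # assumed default threshold
relaxed = filter_hits(hits, max_evalue=100.0)  # raised threshold

print(len(strict), len(default), len(relaxed))
```

Raising the threshold lengthens the list only by adding hits with larger E-values, that is, lower-scoring and less reliable matches, while the hits closest to 0 remain at the top.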
V. Acknowledgement
We extend our sincere thanks to Dr. Savita, Vice Chancellor of Jharkhand Rai University, Ranchi, India, for kindly providing the
platform to carry out the research.
References
[1]. Baxevanis, Andreas D.; Ouellette, B. F. Francis. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd Edition, October 2004. Published by John Wiley and Sons.
[2]. Brudno M, Malde S, Poliakov A, Do CB, Couronne O, Dubchak I, Batzoglou S (2003). "Glocal alignment: finding rearrangements during alignment". Bioinformatics 19 Suppl 1 (90001): i54–62. doi:10.1093/bioinformatics/btg1005. PMID 12855437.
[3]. Casey, R. M. (2005). "BLAST Sequences Aid in Genomics and Proteomics". Business Intelligence Network.
[4]. Eddy SR (2008). Rost, Burkhard, ed. "A probabilistic model of local sequence alignment that simplifies statistical significance estimation". PLoS Comput Biol 4 (5): e1000069. doi:10.1371/journal.pcbi.1000069. PMC 2396288. PMID 18516236.
[5]. Higgins, Des; Taylor, Willie (2000). Bioinformatics: Sequence, Structure and Databanks. A Practical Approach, 1st Edition, October 2000. Published by Oxford University Press.
[6]. Claverie, Jean-Michel; Notredame, Cedric. Bioinformatics for Dummies. Wiley Publishing.
[7]. Lipman DJ, Pearson WR (1985). "Rapid and sensitive protein similarity searches". Science 227 (4693): 1435–41. doi:10.1126/science.2983426. PMID 2983426.
[8]. Mount, David (2004). Bioinformatics: Sequence and Genome Analysis. Published by Cold Spring Harbor Laboratory Press.
[9]. Oehmen C, Nieplocha J (2006). "ScalaBLAST: A Scalable Implementation of BLAST for High-Performance Data-Intensive Bioinformatics Analysis". IEEE Transactions on Parallel and Distributed Systems 17 (8): 740. doi:10.1109/TPDS.2006.112.
[10]. "Program Selection Tables of the Blast NCBI web site".
[11]. Pearson WR, Lipman DJ (1988). "Improved tools for biological sequence comparisons". Proceedings of the National Academy of Sciences of the United States of America 85 (8): 2444–8. doi:10.1073/pnas.85.8.2444. PMC 280013. PMID 3162770.
[12]. Rick CM, Yoder JI (1988). "Classical and Molecular Genetics of Tomato". Annu Rev Genet.
[13]. http://www.ncbi.nlm.nih.gov/BLAST/full_options.html
[14]. Scott JW, Harbaugh BK (1989). Micro-Tom: A miniature dwarf tomato. Florida Agricultural Experiment Station.
[15]. Taylor, Willie; Higgins, Des (2000). Bioinformatics: Sequence, Structure and Databanks. A Practical Approach, 1st Edition, October 2000. Published by Oxford University Press.
[16]. Whitworth, W.A. (1901). Choice and Chance with One Thousand Exercises. Fifth edition. Deighton Bell, Cambridge. [Reprinted by Hafner Publishing Co., New York, 1959.]
[17]. Zhao K, Chu X (2014). "G-BLASTN: accelerating nucleotide alignment by graphics processors". Bioinformatics 30 (10): 1384–91. doi:10.1093/bioinformatics/btu047. PMID 24463183.
