DATA RETRIEVAL
FASTA format
Definition:
FASTA is a text-based format for representing nucleotide or protein
sequences. It is widely used in bioinformatics for storing sequence
data and exchanging it between databases and software.
Purpose:
The format keeps sequence data easy to manipulate and analyse,
so researchers can readily access and process biological
sequences.
Structure and components
A typical FASTA file contains sequence identifiers and the
sequences themselves. Each record begins with a header line that
starts with a '>' symbol, followed by the identifier and an optional
description. The subsequent lines contain the sequence data, which
can span multiple lines for long sequences. The format is simple
and human-readable, which keeps it compatible with a wide range
of bioinformatics tools.
Structure of a FASTA file:
1. Header line:
Starts with a '>' (greater-than) symbol, followed by an identifier and
an optional description.
2. Sequence lines:
One or more lines containing the actual sequence (DNA, RNA, or
protein), without spaces or numbers.
Example
For a DNA sequence:
>sequence1 description
ATGCGTAATAGCTAGCTAGCTAATCG
CGATCGATCGATCGTAGCTAGCTA
For a protein sequence:
>protein1 hypothetical protein
MKWVTFISLLFLFSSAYSRGVFRRDTHKSEIAHRFKDLG
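The header-plus-sequence layout shown above can be read with a few lines of code. The following is a minimal sketch in Python, assuming well-formed input; the function name read_fasta is hypothetical and is not taken from any library.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are joined so that line wrapping is removed.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>sequence1 description
ATGCGTAATAGCTAGCTAGCTAATCG
CGATCGATCGATCGTAGCTAGCTA
"""
records = read_fasta(example)
```

Each record comes back as one header string and one unwrapped sequence string, regardless of how many lines the sequence occupied in the file.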
Notes:
1. A sequence can be split across multiple lines for readability.
2. There is no strict limit on line length, though 60 to 80 characters per
line is common.
3. The format is used by many bioinformatics tools for tasks such as
alignment (e.g., BLAST), sequence assembly, and annotation.
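The line-wrapping convention in note 2 can be illustrated with a small writer. This is a sketch only; format_fasta and its width parameter are hypothetical names, and the default of 60 is one common choice rather than a requirement of the format.

```python
def format_fasta(header, sequence, width=60):
    """Return one FASTA record with the sequence wrapped at `width` characters."""
    lines = [">" + header]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# A narrow width is used here only to make the wrapping visible.
record = format_fasta(
    "protein1 hypothetical protein",
    "MKWVTFISLLFLFSSAYSRGVFRRDTHKSEIAHRFKDLG",
    width=20,
)
```

Joining the wrapped lines back together yields the original sequence, so wrapping changes only the presentation and not the data.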
Application in bioinformatics:
The FASTA format plays a central role in many areas of
bioinformatics, including sequence alignment, similarity searches,
and genome assembly.
It is commonly used with tools such as BLAST (Basic Local
Alignment Search Tool), which lets researchers compare their
sequences against a large database to identify homologous
sequences and functional elements. The format is also used
as input to software for phylogenetic analysis, where it helps
determine evolutionary relationships between species.
Methods for Accessing FASTA Data
FASTA data can be accessed in several ways. Command-line
tools such as wget or curl can download files directly from
public databases. Web-based APIs are also popular, enabling
researchers to retrieve sequences programmatically using specific
queries. For larger projects, local databases can be maintained
from bulk downloads to ensure quick access to frequently used
sequences.
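As one illustration of programmatic retrieval, the sketch below builds a request to the NCBI E-utilities efetch endpoint, which can return a record as FASTA text. It assumes the nuccore database and a RefSeq accession used purely as an example; the helper names build_efetch_url and fetch_fasta are hypothetical. Only the URL is constructed when the block runs; calling fetch_fasta needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore"):
    """Build an efetch URL that asks for a record in FASTA text form."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accession, db="nuccore", timeout=30):
    """Download one record as FASTA text (requires network access)."""
    with urlopen(build_efetch_url(accession, db), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example accession chosen for illustration only.
url = build_efetch_url("NM_007294.4")
```

The same URL could equally be passed to wget or curl; building it in code mainly helps when many accessions have to be retrieved in a loop.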
Tools for processing FASTA files
Various bioinformatics tools are designed for processing
FASTA files, allowing the manipulation and analysis of sequence
data. Examples include SeqKit, a command-line toolkit for FASTA/Q
file processing, and Bioconductor packages in R, which provide
extensive functionality for data analysis.
In addition, Python libraries such as Biopython offer
comprehensive capabilities for parsing, searching, and analysing
FASTA sequences.
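To give a sense of what such tools compute, here is a small sketch that reports the length and GC content of each record. It uses only the Python standard library and is not the implementation of SeqKit, Bioconductor, or Biopython; the function names gc_content and summarize are hypothetical, and records are assumed to be (header, sequence) pairs.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def summarize(records):
    """Return (identifier, length, GC fraction) for each (header, sequence) pair."""
    summary = []
    for header, seq in records:
        parts = header.split()
        identifier = parts[0] if parts else ""  # text before the first space
        summary.append((identifier, len(seq), round(gc_content(seq), 3)))
    return summary


records = [
    ("sequence1 description",
     "ATGCGTAATAGCTAGCTAGCTAATCGCGATCGATCGATCGTAGCTAGCTA"),
]
summary = summarize(records)
```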
Integration with bioinformatics databases:
FASTA files integrate well with major bioinformatics
databases such as NCBI, Ensembl, and UniProt. These
databases accept FASTA input for sequence search,
annotation, and comparative analysis.
Because the format is so widely supported, sequence data can be
cross-referenced with genomic annotations, protein structures,
and molecular function, which makes bioinformatics research
more effective.
Example for a specific species /
Analysis related to FASTA format
Here is a FASTA-format example for a gene from Homo sapiens (human): the BRCA1 gene, which is
commonly analysed in cancer research.
Example: BRCA1 (partial sequence, shown for illustration; the authoritative record is available from NCBI RefSeq)
>NM_007294.4 Homo sapiens BRCA1, DNA repair associated (BRCA1), mRNA
ATGAAAAGCTCAGAGGAGGAAGAGGAAAGGAGGAAGAGGAGGAGGAAGAGGAAGAGGAAGAGGAA
AGAGGAGGAAGAGGAAGAGGAAGAGGAAGAGGAAGAGGAAGAGGAAGAGGAAGAGGAAGAGGAAG
TGTTTATTTTTTATTTTGATTTTTTTTTTTGAGACAGAGTCTCGCTCTGTCGCCCAGGCTGGAGT
GCAGTGGCACGATCTTGGCTCACTGCAAGCTCCGCCTCCCAGGTTCAAGCAATTCTCCTGCCTCA
GCCTCCCGAGTAGCTGGGACTACAGGCACCCGCCACCACGCCTGGCTAATTTTTGTATTTTTAGT
AGAGATAGGGTTTCACCATGTTGGCCAGGCTGGTCTCAAACTCCTGACCTCGTGATCCACCCGC
Example (continued)
How this is used:
● This FASTA file could be given as input to tools like BLAST, Clustal
Omega, or MAFFT to compare the sequence against
others.
● Researchers use such sequences for mutation analysis, gene
expression studies, or PCR primer design.
● The accession number NM_007294.4 tells you this is an
NCBI RefSeq mRNA sequence.
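The way the accession encodes the record type can be sketched in code. The prefix table below lists a few common RefSeq prefixes and is deliberately incomplete; parse_header is a hypothetical helper, and it assumes the identifier is the first whitespace-delimited token of the header.

```python
# A few common RefSeq accession prefixes (not an exhaustive list).
REFSEQ_PREFIXES = {
    "NM_": "mRNA",
    "NR_": "non-coding RNA",
    "NP_": "protein",
    "XM_": "predicted mRNA",
    "XP_": "predicted protein",
}


def parse_header(header_line):
    """Split a FASTA header into accession, version, molecule type, and description."""
    text = header_line.lstrip(">").strip()
    identifier, _, description = text.partition(" ")
    accession, _, version = identifier.partition(".")
    molecule = REFSEQ_PREFIXES.get(accession[:3], "unknown")
    return {
        "accession": accession,
        "version": version,
        "molecule": molecule,
        "description": description,
    }


info = parse_header(
    ">NM_007294.4 Homo sapiens BRCA1, DNA repair associated (BRCA1), mRNA"
)
```

Here the NM_ prefix maps to mRNA and the suffix after the dot is the version of the record.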
Alignment of sequences in FASTA format
There are several ways to align or analyse a sequence in FASTA format,
depending on your goals (e.g., comparing with other sequences, identifying
mutations, or finding similar sequences).
Here is a breakdown of the most common methods:
● BLAST (Basic Local Alignment Search Tool)
Purpose:
Find regions of similarity between your sequence and others in a
database.
BLAST (continued)
How:
a. Go to
https://blast.ncbi.nlm.nih.gov/Blast.cgi
b. Choose a tool (e.g., nucleotide BLAST if your sequence is DNA).
c. Paste your FASTA sequence into the query box.
d. Choose the database (e.g., human genome or all organisms).
e. Click BLAST and view the results, such as similar sequences, percent identity, and E-values.
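Percent identity, one of the values reported in the results, can be illustrated on a pair of sequences that are already aligned. This is a simplified sketch: BLAST finds local alignments itself and its exact scoring is not reproduced here, and E-values are not computed. The function name percent_identity is hypothetical, and '-' is assumed to mark a gap.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    # Gap characters never count as a match.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


identity = percent_identity("ATGC-TAA", "ATGCGTAT")
```

In this toy pair, six of the eight columns match, giving 75 percent identity over the alignment length.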
● Multiple Sequence Alignment (MSA)
Purpose:
Align multiple sequences to find conserved regions.
Tools:
○ Clustal Omega
○ MAFFT
○ MUSCLE
MSA (continued)
How:
a. Collect the FASTA sequences you want to align.
b. Paste them into the tool.
c. Run the alignment.
d. Download or view the alignment file, which is often used for evolutionary
analysis or primer design.
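Once an alignment has been produced, conserved positions can be located by scanning it column by column. The sketch below assumes the aligned sequences are already available as equal-length strings with '-' for gaps; it does not perform the alignment itself, and conserved_columns is a hypothetical name.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns where every sequence has the same residue."""
    if not alignment:
        return []
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for index, column in enumerate(zip(*alignment)):
        # A column is conserved if it holds a single residue and no gap.
        if column[0] != "-" and len(set(column)) == 1:
            conserved.append(index)
    return conserved


alignment = [
    "ATG-CA",
    "ATGTCA",
    "ACG-CA",
]
columns = conserved_columns(alignment)
```

Runs of adjacent conserved columns are the kind of region that is usually of interest for primer design or evolutionary comparison.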
● Primer Design / Mutation Analysis
Tools:
○ Primer-BLAST
○ SnapGene Viewer
○ Benchling
● You can input your FASTA sequence and look for:
○ Specific mutations (SNPs, insertions, deletions)
○ Potential primers
○ Protein translation
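Two of these checks, protein translation and the detection of single-base substitutions, can be sketched in plain Python. The codon table is the standard genetic code; the sequences are made-up toy data, and the function names translate and point_differences are hypothetical. The comparison assumes both sequences have the same length, so insertions and deletions would first need an alignment.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered by BASES at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA sequence in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[start:start + 3], "X"))
    return "".join(protein)


def point_differences(reference, sample):
    """List (1-based position, reference base, sample base) for each mismatch."""
    return [
        (pos + 1, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]


reference = "ATGGCCAAATAA"  # toy sequence, not a real gene
sample = "ATGGCCGAATAA"     # same sequence with one substitution
differences = point_differences(reference, sample)
reference_protein = translate(reference)
sample_protein = translate(sample)
```

In this toy case the single A-to-G substitution at position 7 changes the third codon from AAA (lysine) to GAA (glutamate).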
Command-line tools (for bioinformatics scripting)
If you are comfortable with coding or scripting:
● Use Biopython (a Python library) to parse and analyse FASTA files.
● Use tools like MAFFT, BLAST+, or SAMtools in a terminal for bulk analysis.
Conclusion:
In summary, the FASTA format is central to the representation and
manipulation of biological sequence data in bioinformatics. Its simplicity
supports a wide range of applications, while a variety of methods and
tools allow for effective data retrieval and processing.
Its integration with prominent bioinformatics databases further enhances its
utility, making it an indispensable format in the global bioinformatics community.