Applications and Trends in
       Data Mining


        Data Mining
             For
   Biological Data Analysis
Factors that led to the
           development
• The past decade has seen an explosive growth in:
     1.Genomics
     2.Proteomics
     3.Functional genomics
     4.Biomedical research

• Identification and comparative analysis of genomes of humans
  and other species for investigation of genetic networks.

• Development of new pharmaceuticals and advances in cancer
  therapies.
• DNA sequences form the foundation of genetic codes of all
  living organisms.

• DNA sequences are composed of four basic building blocks
  called nucleotides:
      1.adenine (A)
      2.cytosine (C)
      3.guanine (G)
      4.thymine (T)

• These four nucleotides (or bases) are combined to form long
  chains that resemble a twisted ladder.
• DNA sequence      … CTA CAC ACG TGT AAC …

• A gene usually comprises hundreds of individual nucleotides
  arranged in a particular order.

• A genome is the complete set of genes of an organism.

• Genomics is the analysis of genome sequences.

• A proteome is the complete set of protein molecules present
  in a cell, tissue, or organism.

• Proteomics is the study of proteome sequences.
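
As a small illustration of treating a DNA sequence as data, the fragment shown on this slide can be stored as a string and summarized. This is only a sketch: the function names `nucleotide_counts` and `triplets` are made up for this example and do not come from the slides or from any particular library.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def clean(sequence):
    """Remove spaces and upper-case a DNA string."""
    return sequence.replace(" ", "").upper()

def nucleotide_counts(sequence):
    """Count how often each of the four bases occurs."""
    bases = clean(sequence)
    unknown = set(bases) - VALID_BASES
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    return Counter(bases)

def triplets(sequence):
    """Split a DNA string into consecutive groups of three bases."""
    bases = clean(sequence)
    usable = len(bases) - len(bases) % 3  # drop an incomplete trailing group
    return [bases[i:i + 3] for i in range(0, usable, 3)]

fragment = "CTA CAC ACG TGT AAC"  # the fragment shown on the slide
counts = nucleotide_counts(fragment)
groups = triplets(fragment)
```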
Data mining may contribute to
the biological data analysis in
    the following aspects.
Biological data mining has
become an essential part of
a new research field called
     bioinformatics.
1) Semantic integration of
heterogeneous, distributed genomic and
proteomic databases.
• Genomic and proteomic data sets are often generated at
  different labs and by different methods.

• They are distributed, heterogeneous, and of a wide variety.

• Integration of such data is essential to cross-site analysis of
  biological data.

• Such integration and linkage analysis would facilitate the
  systematic and coordinated analysis of genome and biological
  data.
• This has promoted the development of integrated data
  warehouses to store and manage derived biological data.

• Data cleaning, data integration, reference
  reconciliation, classification, and clustering methods will
  facilitate the integration of biological data and the
  construction of data warehouses for biological data analysis.
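
The cleaning, integration, and reconciliation steps above can be sketched on a toy scale. Everything in this example is an assumption made for illustration: the two lab record layouts, the field names, the gene labels, and the rule that two records refer to the same entity when gene and organism agree after normalization.

```python
def from_lab_a(record):
    """Map a hypothetical Lab A record onto a shared schema."""
    return {
        "gene": record["gene_symbol"].strip().upper(),
        "organism": record["species"].strip().lower(),
        "sequence": record["seq"].replace(" ", "").upper(),
    }

def from_lab_b(record):
    """Map a hypothetical Lab B record onto the same shared schema."""
    return {
        "gene": record["GeneName"].strip().upper(),
        "organism": record["Organism"].strip().lower(),
        "sequence": record["dna"].replace(" ", "").upper(),
    }

def integrate(records):
    """Keep one record per (gene, organism) key; the first one seen wins."""
    merged = {}
    for record in records:
        key = (record["gene"], record["organism"])
        merged.setdefault(key, record)
    return list(merged.values())

# Placeholder data: the same gene reported by both labs in different formats.
lab_a = [{"gene_symbol": "geneX", "species": "Homo sapiens", "seq": "cta cac"}]
lab_b = [
    {"GeneName": "GENEX ", "Organism": "homo sapiens", "dna": "CTACAC"},
    {"GeneName": "geneY", "Organism": "Mus musculus", "dna": "ACGTGT"},
]

warehouse = integrate(
    [from_lab_a(r) for r in lab_a] + [from_lab_b(r) for r in lab_b]
)
```

A real warehouse would need a far more careful reconciliation rule, for example one that handles gene synonyms or conflicting sequences; keeping the first record seen is simply the shortest rule that shows the idea.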
2)Alignment, indexing, similarity search, and
comparative analysis of multiple nucleotide/protein
sequences.
• BLAST and FASTA, in particular, are tools for the systematic
  analysis of genomic and proteomic data.

• Biological sequence analysis methods differ from many
  sequential pattern analysis algorithms proposed in data
  mining.

• For protein sequences, two amino acids should also be
  considered a “match” if one can be derived from the other by
  substitutions that are likely to occur in nature.
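
The relaxed notion of a "match" for amino acids can be sketched as follows. The residue groups below are a rough illustrative grouping by chemical similarity chosen for this example; they are an assumption, not a published scoring scheme. Practical tools typically use substitution matrices such as BLOSUM or PAM instead.

```python
# Illustrative groups of one-letter amino acid codes (an assumption of this sketch).
SIMILAR_GROUPS = [
    set("ILVM"),  # hydrophobic, aliphatic side chains
    set("FYW"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("ST"),    # small, hydroxyl-containing
    set("NQ"),    # amide-containing
]

def is_match(a, b):
    """True if the residues are identical or fall in the same group."""
    a, b = a.upper(), b.upper()
    if a == b:
        return True
    return any(a in group and b in group for group in SIMILAR_GROUPS)

def count_matches(seq1, seq2):
    """Count matching positions in two equal-length protein sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(is_match(x, y) for x, y in zip(seq1, seq2))
```

With these groups, `count_matches("LKDS", "IRET")` treats all four positions as matches even though no residue is identical.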
• There is a combinatorial number of ways to approximately
  align multiple sequences. Methods developed to address this
  include:
  1) reducing a multiple alignment to a series of pairwise
  alignments and then combining the results;
  2) using Hidden Markov Models (HMMs).

• Multiple alignments can be used to identify highly conserved
  residues among genomes, and such residues can be used to build
  phylogenetic trees to infer evolutionary relationships among
  species.

• Genomic and proteomic sequences isolated from diseased
  and healthy tissues can be compared to identify critical
  differences between them.

• Sequences occurring more frequently in the diseased samples
  may indicate the genetic factors of the disease.
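
Two ideas from this slide can be sketched in a few lines: scoring a single pairwise alignment, which is the building block of the first method listed, and comparing diseased with healthy samples. The scoring values (match +1, mismatch -1, gap -1), the function names, and the choice of fixed-length subsequences (k-mers) are assumptions of this sketch. The comparison is a deliberately strict simplification: it keeps only subsequences present in every diseased sample and absent from every healthy one, and such candidates are starting points for investigation, not evidence of a cause.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of a and b (Needleman-Wunsch recurrence)."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def kmers(sequence, k):
    """All distinct substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def disease_specific_kmers(diseased, healthy, k):
    """k-mers found in every diseased sample and in no healthy sample."""
    if not diseased:
        return set()
    in_all_diseased = set.intersection(*(kmers(s, k) for s in diseased))
    in_any_healthy = set().union(*(kmers(s, k) for s in healthy))
    return in_all_diseased - in_any_healthy

# Placeholder sequences, invented for this example.
diseased_samples = ["ACGTTGCA", "TTACGTTG"]
healthy_samples = ["ACGTAGCA"]
candidates = disease_specific_kmers(diseased_samples, healthy_samples, k=4)
```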
3)Discovery of structural patterns and analysis of
genetic networks and protein pathways.
• Protein sequences are folded into 3D structures, and such
  structures interact with each other based on the relative
  position and distances between them.

• Such complex interactions lead to the formation of genetic
  networks and protein pathways.

• It is important to develop powerful and scalable data mining
  methods to discover patterns and to study regularities and
  irregularities among complex biological networks.
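
The idea that interactions depend on relative position and distance can be illustrated with a toy network. In this sketch each structure is reduced to a single 3D point, and two structures are treated as interacting when they lie within a cutoff distance. The labels, coordinates, cutoff, and the single-point simplification are all assumptions made for illustration and say nothing about how real protein interactions are determined.

```python
from itertools import combinations
from math import dist

def contacts(positions, cutoff):
    """Pairs of labels whose points lie within the cutoff distance."""
    return [
        (a, b)
        for a, b in combinations(sorted(positions), 2)
        if dist(positions[a], positions[b]) <= cutoff
    ]

def neighbours(pairs):
    """Adjacency map of the interaction network built from contact pairs."""
    network = {}
    for a, b in pairs:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network

# Placeholder coordinates in arbitrary units.
positions = {"P1": (0.0, 0.0, 0.0), "P2": (3.0, 4.0, 0.0), "P3": (10.0, 0.0, 0.0)}
pairs = contacts(positions, cutoff=6.0)
network = neighbours(pairs)
```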
4) Association and path analysis: identifying
co-occurring gene sequences and linking genes to
different stages of disease development.
• Many studies have focused on the comparison of one gene
  with another.

• Most diseases are not triggered by a single gene but by a
  combination of genes acting together.

• Association analysis methods can be used to determine the
  kinds of genes that are likely to co-occur in target samples.

• A group of genes may contribute to a disease process; here,
  path analysis is expected to play an important role.
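
The co-occurrence idea behind association analysis can be sketched by counting how often pairs of genes appear together across samples and keeping the pairs whose support reaches a threshold. The gene labels, the sample contents, and the threshold are placeholders invented for this example. Real association mining would also consider larger gene sets and measures beyond support.

```python
from collections import Counter
from itertools import combinations

def frequent_gene_pairs(samples, min_support):
    """Gene pairs whose share of samples containing both is at least min_support."""
    pair_counts = Counter()
    for genes in samples:
        for pair in combinations(sorted(set(genes)), 2):
            pair_counts[pair] += 1
    total = len(samples)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }

# Each set lists the genes observed in one target sample (placeholder labels).
samples = [
    {"G1", "G2", "G3"},
    {"G1", "G2"},
    {"G2", "G3"},
    {"G1", "G2", "G4"},
]
frequent = frequent_gene_pairs(samples, min_support=0.75)
```

Here `G1` and `G2` appear together in three of the four samples, so they form the only pair that reaches the 0.75 threshold.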
5)Visualization tools in genetic data analysis.

• Alignments among genomic or proteomic sequences and
  interactions between them can be expressed in graphic forms,
  transformed into various kinds of easy-to-understand
  visual displays.
• They facilitate pattern understanding, knowledge
  discovery, and interactive data exploration.
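
Even a plain-text display can make an alignment easier to read. The sketch below prints two already-aligned sequences with a marker line between them, using `|` where the residues are identical and `-` for a gap. The function name and the marker convention are assumptions of this example; graphical tools offer far richer displays.

```python
def render_alignment(top, bottom):
    """Three-line text view of two aligned sequences of equal length."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    markers = "".join(
        "|" if x == y and x != "-" else " "
        for x, y in zip(top, bottom)
    )
    return "\n".join([top, markers, bottom])

view = render_alignment("ACG-T", "ACGAT")
print(view)
```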
Thank you
