The application of artificial intelligence techniques in bioinformatics
Presented by: Pallavi Vashistha
Outline
• Bioinformatics Today
• Artificial Intelligence application
• Examples:
  - Symbolic machine learning
  - Nearest neighbour approach
  - Clustering
  - Identification trees
• Major Challenges and Research Issues
History of Bioinformatics
    Year    Subject name               Size (millions of base pairs)
    1995    Haemophilus influenzae          1.8
    1996    Baker's yeast                  12.1
    1997    E. coli                         4.7
    2000    Pseudomonas aeruginosa          6.3
            A. thaliana                   100
            D. melanogaster               180
    2001    Human genome                3,000
    2002    House mouse                 2,500
Bioinformatics Today
• Sequence analysis
  - Sequence alignment
  - Structure and function prediction
  - Gene finding

• Structure analysis
  - Protein structure comparison
  - Protein structure prediction
  - RNA structure modeling

• Expression analysis
  - Gene expression analysis
  - Gene clustering

• Pathway analysis
  - Metabolic pathway
  - Regulatory networks
Artificial Intelligence application
There are several important problems where AI
approaches are particularly promising:

  • Prediction of Protein Structure
  • Semiautomatic drug design
  • Knowledge acquisition from genetic data
Artificial Intelligence application

How to classify biological sequences
• SVM (support vector machine), neural nets,
  decision trees, rules
How to cluster biological entities
• Bi-clustering, k-means, hierarchical clustering
How to select features
• LDA (linear discriminant analysis), PCA (principal
  component analysis), SVM-RFE (SVM with recursive
  feature elimination)
Nearest neighbour approach
Decision tree:
• each node is connected to a set of possible answers,
• each non-leaf node is connected to a test which splits
  its set of possible answers into subsets corresponding
  to different test results,
• each branch carries a particular test result’s subset
  to another node.
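
The three properties above can be sketched as a small data structure. This is only an illustration: the `Node` class, the `descend` helper, and the width test are hypothetical names and values, not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """A decision-tree node holding the answers that are still possible."""
    answers: List[str]
    # Non-leaf nodes carry a test; leaf nodes leave it as None.
    test: Optional[Callable[[dict], str]] = None
    # Each branch maps one test result to the child holding that subset.
    branches: Dict[str, "Node"] = field(default_factory=dict)


def descend(node: Node, sample: dict) -> List[str]:
    """Follow test results from the given node until a leaf is reached."""
    while node.test is not None:
        node = node.branches[node.test(sample)]
    return node.answers


# Hypothetical tree: one test on width splits three answers into two subsets.
root = Node(
    answers=["red", "green", "blue"],
    test=lambda s: "wide" if s["width"] >= 4 else "narrow",
    branches={
        "wide": Node(answers=["red"]),
        "narrow": Node(answers=["green", "blue"]),
    },
)

print(descend(root, {"width": 5}))  # ['red']
print(descend(root, {"width": 2}))  # ['green', 'blue']
```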
Nearest neighbour approach
Example: To see how decision trees are useful for nearest
  neighbour calculations, consider 8 blocks of known width,
  height and colour (Winston, 1992). A new block then appears
  of known size but unknown colour. On the basis of existing
  information, can we make an informed guess as to what the
  colour of the new block is?

Solution: To answer this question, we need to assume a
  consistency heuristic, as follows. Find the most similar
  case, as measured by known properties, for which the
  property is known; then guess that the unknown property is
  the same as the known property. This is the basis of all
  nearest neighbour calculations.
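
The consistency heuristic can be sketched in a few lines of Python. The eight blocks below are made-up values, not Winston's data, and using Euclidean distance on width and height as the similarity measure is an assumption; a plain linear scan is used here rather than a decision tree.

```python
import math

# Hypothetical blocks as (width, height, colour); values are illustrative only.
known_blocks = [
    (1.0, 6.0, "orange"),
    (2.0, 5.0, "red"),
    (2.0, 2.0, "violet"),
    (4.0, 1.0, "blue"),
    (5.0, 2.0, "green"),
    (6.0, 5.0, "purple"),
    (5.0, 6.0, "yellow"),
    (4.0, 4.0, "pink"),
]


def nearest_colour(width, height, blocks):
    """Guess the colour of a new block from its most similar known block."""
    nearest = min(blocks, key=lambda b: math.dist((width, height), b[:2]))
    return nearest[2]


# A new block of known size but unknown colour.
print(nearest_colour(1.0, 4.0, known_blocks))  # red
```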
Clustering
• Clustering follows the principles of nearest neighbour
  calculations but attempts to look at all the attributes
  (positions) of biosequences rather than just one
  attribute (position) for identifying similarities.
• This is typically achieved by averaging the amount of
  similarity between two biosequences across all
  attributes.
• For example, imagine that we have a table of
  information concerning four organisms with five
  characteristics:
• Given this table, can we calculate how similar each organism is to every other
  organism?

• The nearest neighbour approach described earlier would work through the
  attributes ('characteristics') one at a time. For short biosequences this may be
  feasible, but for biosequences with hundreds of attributes (e.g. DNA bases) this is
  not desirable, since we could probably classify all the samples with just the first
  few attributes.
Clustering can be demonstrated in the following way:

• The first step is to calculate a simple matching coefficient for
  every pair of organisms in the table across all attributes.
• For instance, the matching coefficient for A and B is the
  number of identical characteristics divided by the total
  number of characteristics: (1+0+1+1+1)/5 = 4/5 = 0.8. Similarly,
• A and C = 0.4 ((0+0+0+1+1)/5 = 2/5 = 0.4)
• A and D = 0.2 ((0+0+0+0+1)/5 = 1/5 = 0.2)
• B and C = 0.6 ((0+1+0+1+1)/5 = 3/5 = 0.6)
• B and D = 0.4 ((0+1+0+0+1)/5 = 2/5 = 0.4)
• C and D = 0.8 ((1+1+1+0+1)/5 = 4/5 = 0.8)
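
A minimal sketch of the simple matching coefficient follows. The binary table below is hypothetical, chosen only so that it reproduces the match patterns listed above, and `simple_matching` is an illustrative name.

```python
from itertools import combinations

# Hypothetical table (four organisms, five binary characteristics),
# chosen to reproduce the per-attribute match patterns listed above.
organisms = {
    "A": [1, 1, 1, 1, 1],
    "B": [1, 0, 1, 1, 1],
    "C": [0, 0, 0, 1, 1],
    "D": [0, 0, 0, 0, 1],
}


def simple_matching(x, y):
    """Number of identical characteristics divided by the total number."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


coefficients = {
    (p, q): simple_matching(organisms[p], organisms[q])
    for p, q in combinations(organisms, 2)
}

for (p, q), value in coefficients.items():
    print(f"{p} and {q} = {value}")
```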
• We then find the highest matching coefficient to form the first 'cluster' of
  similar bacteria. Since we have two candidates here (AB and CD both have
  0.8), we randomly choose one cluster to start the process: AB.

• The steps are then repeated, using AB as one 'organism' and taking partial
  matches into account.

• The average matching coefficient for AB and C = 0.5
  ((0+0.5+0+1+1)/5 = 2.5/5 = 0.5), where the 0.5 in second position within
  the parentheses refers to C sharing its second feature with B but not A.

• The matching coefficient for AB and D = 0.3 ((0+0.5+0+0+1)/5 = 1.5/5 = 0.3),
  and for C and D = 0.8 (as before).

• Since C and D have the highest coefficient, they form the second cluster.
Finally, we calculate the average matching coefficient for the new 'clusters' of
organisms, taking AB as one organism and CD as another: 0.4
((0+0.5+0+0.5+1)/5 = 2/5 = 0.4), again taking partial matches into account. We can
then construct a similarity tree using these coefficients: A and B join at 0.8,
C and D join at 0.8, and the clusters AB and CD join at 0.4.
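
The merging procedure can be sketched as average-linkage clustering. This sketch assumes that averaging partial matches per attribute is equivalent to averaging the pairwise coefficients across the two clusters (which holds for the numbers in this example), and it breaks ties by taking the first best pair rather than choosing at random. The names `pairwise`, `average_similarity` and `cluster` are illustrative.

```python
from itertools import combinations

# Pairwise simple matching coefficients from the worked example.
pairwise = {
    frozenset("AB"): 0.8, frozenset("AC"): 0.4, frozenset("AD"): 0.2,
    frozenset("BC"): 0.6, frozenset("BD"): 0.4, frozenset("CD"): 0.8,
}


def average_similarity(c1, c2):
    """Mean coefficient over every cross-cluster pair of organisms."""
    values = [pairwise[frozenset((a, b))] for a in c1 for b in c2]
    return sum(values) / len(values)


def cluster(names):
    """Repeatedly merge the two most similar clusters; return the merge list."""
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        # max() keeps the first best pair, so ties go to the earliest pair.
        best = max(combinations(clusters, 2),
                   key=lambda pair: average_similarity(*pair))
        score = average_similarity(*best)
        clusters = [c for c in clusters if c not in best] + [best[0] + best[1]]
        merges.append((best[0], best[1], round(score, 3)))
    return merges


for left, right, score in cluster("ABCD"):
    print("".join(left), "+", "".join(right), "at", score)
```

Read bottom-up, the printed merges give the levels of the similarity tree.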
Identification tree




The task now is to determine which of the attributes contribute towards someone
being sunburned or not. First, we need to introduce a disorder formula and
associated log values to rank attributes in terms of their influence on who is and
who isn’t sunburned.
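
Assuming the slides use the standard average-disorder measure for identification trees (Winston, 1992), the formula can be written as:

```latex
\text{Average disorder} = \sum_{b} \frac{n_b}{n_t}
  \left( \sum_{c} -\frac{n_{bc}}{n_b} \log_2 \frac{n_{bc}}{n_b} \right)
```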
where n_b is the number of samples in branch b, n_t is the total number of samples in all
branches, and n_bc is the number of samples in branch b of class c.



• The idea is to divide samples into subsets in which as many of the samples have
  the same classification as possible (as homogeneous subsets as possible). The
  higher the disorder value, the less homogeneous the classification.

• We now work through each attribute in turn, identifying which of the samples fall
  within the branches (attribute values) of that attribute, and noting into which
  class each of the samples falls.
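
A short sketch of ranking attributes by disorder follows. It assumes the average-disorder measure (a weighted sum of per-branch entropies in base 2), and the per-branch counts are assumed values for an eight-sample sunburn example, since the original table is not reproduced in the text. `average_disorder` is an illustrative name.

```python
import math


def average_disorder(branches):
    """branches maps each attribute value to its per-class sample counts."""
    n_t = sum(sum(counts.values()) for counts in branches.values())
    total = 0.0
    for counts in branches.values():
        n_b = sum(counts.values())
        branch_disorder = 0.0
        for n_bc in counts.values():
            if n_bc:  # an empty class contributes nothing (0 * log 0 := 0)
                p = n_bc / n_b
                branch_disorder -= p * math.log2(p)
        total += (n_b / n_t) * branch_disorder
    return total


# Assumed counts per branch (not taken from the slides).
hair_colour = {
    "blond": {"sunburned": 2, "not sunburned": 2},
    "red": {"sunburned": 1, "not sunburned": 0},
    "brown": {"sunburned": 0, "not sunburned": 3},
}
lotion = {
    "used": {"sunburned": 0, "not sunburned": 3},
    "not used": {"sunburned": 3, "not sunburned": 2},
}

print(average_disorder(hair_colour))  # 0.5
print(average_disorder(lotion))
```

Under these assumed counts, hair colour has the lower disorder, so it would be chosen as the first test in the tree.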
Given the full identification tree, we can then derive rules by following
all paths from the root to the leaf nodes, as follows:

• (a) If a person's hair colour is brown, then the person is not
  sunburned.

• (b) If a person's hair colour is red, then the person is
  sunburned.

• (c) If a person's hair colour is blond and that person has used
  sun tan lotion, then the person is not sunburned.

• (d) If a person's hair colour is blond and that person has not
  used sun tan lotion, then the person is sunburned.
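
Rules (a) to (d) translate directly into code. The function name and the choice to raise an error for hair colours not covered by the rules are assumptions made for this sketch.

```python
def is_sunburned(hair_colour: str, used_lotion: bool) -> bool:
    """Apply rules (a) to (d) read off the identification tree."""
    hair = hair_colour.lower()
    if hair == "brown":               # rule (a)
        return False
    if hair == "red":                 # rule (b)
        return True
    if hair in ("blond", "blonde"):   # rules (c) and (d)
        return not used_lotion
    raise ValueError(f"no rule covers hair colour {hair_colour!r}")


print(is_sunburned("blond", used_lotion=True))   # False
print(is_sunburned("blond", used_lotion=False))  # True
```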
Major Challenges and Research Issues

• Requires individuals with knowledge of both
  disciplines
• Requires collaboration of individuals from diverse
  disciplines
Major Challenges and Research Issues

• Data generation in biology/bioinformatics is
  outpacing methods of data analysis
• Data interpretation and generation of
  hypotheses require intelligence
• AI offers established methods for knowledge
  representation and “intelligent” data
  interpretation
