STATISTICAL METHODS OF QSAR
Rani T. Bhagat
M. Pharmacy,
(Pharmaceutical Chemistry)
CONTENT
INTRODUCTION
METHOD
CHEMOMETRIC TOOLS
QUALITY METRICS
IMPORTANCE
REFERENCES
INTRODUCTION
• Statistical methods are the mathematical formulas, models and techniques used in the statistical analysis of research data.
• A QSAR model is a mathematical equation correlating the response of chemicals (activity or property) with their structural and physicochemical information, expressed as numerical quantities, i.e. descriptors.
• Regression-based approaches are employed when the response data are entirely numerical, i.e. quantitative; semi-quantitative responses are modelled using classification techniques.
• Developed QSAR models are subjected to several validation tests to check the reliability of the developed correlation.
• After its development, a QSAR model is usually verified by multiple statistical validation strategies that estimate its predictivity and stability.
• Statistical tools are used for data pre-treatment, feature selection, model development and validation of QSAR models.
• Machine-learning-based methods are also useful in developing QSAR models.
METHODS
1) Chemometric tools:
• Various chemometric tools in QSAR
• Pre-treatment of the data table
• Feature selection
• Multiple linear regression
• Partial least squares
• Cluster analysis
2) Quality metrics:
• Importance of metrics for determining the quality of QSAR models
• Types of validation
• Validation metrics for regression-based QSAR models
• Validation metrics employed in classification-based QSAR
• Parameters for receiver operating characteristic (ROC) analysis
1) Chemometric tools
Various chemometric tools used in QSAR:
1) Regression-based approaches
a) Multiple Linear Regression (MLR)
b) Partial Least Squares (PLS)
2) Classification-based approaches
a) Linear Discriminant Analysis (LDA)
b) Cluster Analysis (CA)
Pre-treatment of the data table:
• Molecular structures must be drawn correctly.
• Biological activity or other response data should be taken from an authentic source.
• Descriptor values should be computed using validated software.
• Response data used for QSAR modelling should follow a normal distribution pattern.
• Care should also be taken to avoid duplicates in the data set.
• For the computation of 3D descriptors, structure optimization should be carried out first.
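Two of the checks above, removing duplicate compounds and inspecting the distribution of the response, can be sketched in Python. This is only an illustration: the function names, the (identifier, response) tuple layout and the example values are assumptions, not part of the original slides.

```python
from statistics import mean, pstdev

def drop_duplicates(records):
    """Keep the first record for each structure identifier (e.g. a canonical SMILES)."""
    seen = set()
    unique = []
    for structure_id, response in records:
        if structure_id not in seen:
            seen.add(structure_id)
            unique.append((structure_id, response))
    return unique

def skewness(values):
    """Population skewness; values near 0 suggest a roughly symmetric response."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical data set: (structure identifier, response such as pIC50)
records = [("CCO", 4.0), ("CCN", 5.0), ("CCO", 4.0), ("CCC", 6.0)]
clean = drop_duplicates(records)
responses = [response for _, response in clean]
response_skew = skewness(responses)
```

A strongly skewed response is often transformed (for example to a logarithmic scale) before modelling; whether that is appropriate depends on the data set.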
Feature selection:
• Selecting appropriate descriptors for model development from a large pool of descriptors is an important step in QSAR modelling.
• Selection can be done in a variety of ways.
• Stepwise selection: descriptors are added to or removed from the model according to a partial F-statistic, using an 'F for inclusion' and an 'F for exclusion' threshold.
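The slides do not give the formula for the partial F-statistic. A minimal sketch, assuming the standard definition F = ((RSS_reduced - RSS_full) / q) / (RSS_full / (n - p - 1)), where q descriptors are added, p descriptors are in the full model and n is the number of compounds, might look like this. The threshold values and residual sums of squares are hypothetical.

```python
def partial_f(rss_reduced, rss_full, n_compounds, n_descriptors_full, n_added=1):
    """Partial F for the descriptors added when going from the reduced to the full model."""
    df_residual = n_compounds - n_descriptors_full - 1  # -1 accounts for the intercept
    if df_residual <= 0:
        raise ValueError("not enough compounds for this many descriptors")
    return ((rss_reduced - rss_full) / n_added) / (rss_full / df_residual)

F_TO_ENTER = 4.0   # hypothetical 'F for inclusion' threshold
F_TO_REMOVE = 3.9  # hypothetical 'F for exclusion' threshold

# Hypothetical residual sums of squares before and after adding one descriptor
f_value = partial_f(rss_reduced=12.0, rss_full=8.0, n_compounds=23, n_descriptors_full=2)
include_descriptor = f_value >= F_TO_ENTER
```

In a stepwise procedure, a descriptor already in the model would be dropped when its partial F falls below the exclusion threshold.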
Multiple Linear Regression:
It is widely used in QSAR due to its simplicity, transparency, reproducibility and interpretability.
Y = a0 + a1 × X1 + a2 × X2 + a3 × X3 + … + an × Xn
Where, Y - response (dependent variable)
a0 - constant term
X1, X2, …, Xn - descriptors (independent variables)
a1, a2, …, an - regression coefficients
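To make the equation concrete, the coefficients a0 … an can be estimated by ordinary least squares. The sketch below solves the normal equations in plain Python; the helper names and the small data set (generated from Y = 1 + 2 × X1 + 3 × X2) are illustrative assumptions, and a numerical library would normally be used instead.

```python
def fit_mlr(X, y):
    """Return [a0, a1, ..., an] minimising the squared error of Y = a0 + sum(ai * Xi)."""
    rows = [[1.0] + [float(v) for v in row] for row in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y]
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; coefficients are not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for idx in range(k):
            if idx != col:
                factor = aug[idx][col] / aug[col][col]
                aug[idx] = [a - factor * b for a, b in zip(aug[idx], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def predict(coeffs, descriptors):
    """Evaluate Y = a0 + a1*X1 + ... + an*Xn for one compound."""
    return coeffs[0] + sum(a * x for a, x in zip(coeffs[1:], descriptors))

# Hypothetical descriptors (X1, X2) and responses generated from Y = 1 + 2*X1 + 3*X2
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
coeffs = fit_mlr(X, y)
new_prediction = predict(coeffs, (3, 2))
```

The collinearity check reflects a practical limitation of MLR: strongly intercorrelated descriptors make the coefficients unstable.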
Partial Least Squares:
• PLS is a generalization of MLR and is often the better choice, for example when the descriptors are numerous or intercorrelated.
• It is used for predicting pharmacokinetic, pharmacodynamic and toxicological properties from structure-derived physicochemical and structural features.
• The models are developed using regression analysis.
Linear Discriminant Analysis:
• LDA separates two or more classes of objects and is used for classification problems.
• LDA shows the difference between classes of data; predicted class membership is calculated by computing a discriminant function (DF) score.
• An object is assigned to a class according to whether its DF value is smaller or larger than a cutoff value.
DF = C1 × X1 + C2 × X2 + … + Cm × Xm + a
Where, DF - discriminant function
C - discriminant coefficient
X - score (value) of the corresponding predictor variable
a - constant
m - number of predictor variables
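The discriminant function above can be evaluated directly once the coefficients are known. The following sketch uses made-up coefficients and a cutoff of 0; the direction of the rule (scores above the cutoff labelled "active") is an assumption, since the slides do not state it.

```python
def discriminant_score(coefficients, descriptors, constant):
    """DF = C1*X1 + C2*X2 + ... + Cm*Xm + a"""
    if len(coefficients) != len(descriptors):
        raise ValueError("one coefficient is needed per predictor variable")
    return sum(c * x for c, x in zip(coefficients, descriptors)) + constant

def classify(df_value, cutoff=0.0):
    # Assumption: scores above the cutoff are labelled "active"
    return "active" if df_value > cutoff else "inactive"

coefficients = [0.8, -0.5]  # hypothetical discriminant coefficients C1, C2
constant = -1.0             # hypothetical constant a

score_a = discriminant_score(coefficients, [3.0, 1.0], constant)
score_b = discriminant_score(coefficients, [1.0, 2.0], constant)
labels = [classify(score_a), classify(score_b)]
```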
Cluster Analysis:
• Clusters are defined through analysis of the data.
• Cluster analysis maximizes the similarity of cases within each cluster.
• It also maximizes the dissimilarity between groups, which are not known initially.
• Hierarchical clustering starts with each case as a separate cluster and then combines clusters sequentially, reducing the number of clusters at each step until only one cluster is left.
[Figure: dendrogram showing the grouping of cases into Cluster 1, Cluster 2 and Cluster 3]
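The stepwise merging described above can be sketched as a small agglomerative procedure. Single linkage (distance between the closest members) on one-dimensional values is used here purely for illustration; the slides do not specify a linkage rule or distance measure, and the data are hypothetical.

```python
def single_linkage(points, n_clusters=1):
    """Merge the two closest clusters repeatedly; return (clusters, merge_history)."""
    clusters = [[p] for p in points]  # each case starts as its own cluster
    history = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters], history

values = [1.0, 1.2, 5.0, 5.3, 9.0]  # hypothetical 1-D descriptor values
three_clusters, _ = single_linkage(values, n_clusters=3)
one_cluster, merges = single_linkage(values, n_clusters=1)
```

The recorded merge history (which clusters were joined, and at what distance) is the information a dendrogram displays.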
2) Quality metrics
Importance of metrics for determining the quality of QSAR models
• Advances in fast and economical computational resources make it feasible to compute a large number of descriptors using various software tools.
• A developed QSAR model must be checked for its predictivity for new, untested molecules.
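Types of validation
• OECD principles:
Principle 1: a defined endpoint
Principle 2: an unambiguous algorithm
Principle 3: a defined domain of applicability
Principle 4: appropriate measures of goodness-of-fit, robustness and predictivity
Principle 5: a mechanistic interpretation, if possible
• Internal validation
• External validation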
Validation Metrics for Regression-Based QSAR
1) Metrics for internal validation:
• Leave-one-out (LOO) cross-validation
• Leave-many-out (LMO) cross-validation
2) Metrics for external validation
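Leave-one-out cross-validation can be illustrated with a one-descriptor model. The sketch below assumes the commonly used definition Q² = 1 - PRESS / SS_total, with SS_total taken about the mean of all training responses; the slides do not state which variant is intended, and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single descriptor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

def q2_loo(xs, ys):
    """Q2 = 1 - PRESS / SS_total, leaving out one compound at a time."""
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total

# Hypothetical descriptor values and responses
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [2.1, 3.9, 6.2, 7.8, 10.1]
q2 = q2_loo(descriptor, response)
```

Leave-many-out validation follows the same pattern but omits a group of compounds in each round; external validation applies the final model to compounds that were never used in fitting.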
Validation Metrics Employed in Classification-Based QSAR
Validation metrics assess the performance of classification-based models in terms of accurate qualitative prediction of the dependent variable.
Parameters for:
1) Goodness-of-fit and quality determination
2) Model performance:
a) True Positive (TP)
b) False Negative (FN)
c) False Positive (FP)
d) True Negative (TN)
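The four counts above are usually combined into summary measures. The following sketch tallies them from observed and predicted class labels and derives sensitivity, specificity and accuracy; the label strings and the example predictions are hypothetical.

```python
def confusion_counts(observed, predicted, positive="active"):
    """Return (TP, FN, FP, TN) for a two-class problem."""
    tp = fn = fp = tn = 0
    for obs, pred in zip(observed, predicted):
        if obs == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn

def performance(tp, fn, fp, tn):
    return {
        "sensitivity": tp / (tp + fn),            # fraction of actives found
        "specificity": tn / (tn + fp),            # fraction of inactives found
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

observed = ["active"] * 4 + ["inactive"] * 4
predicted = ["active", "active", "active", "inactive",
             "inactive", "inactive", "inactive", "active"]
counts = confusion_counts(observed, predicted)
metrics = performance(*counts)
```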
Parameters for Receiver Operating Characteristic (ROC) Analysis
1) ROC curve
TP rate - true positive rate, plotted on the Y-axis
FP rate - false positive rate, plotted on the X-axis
2) Metrics for the pharmacological distribution diagram (PDD)
a) Activity expectancy
b) Inactivity expectancy
Activity expectancy: Ea = (% of actives) / (% of inactives + 100)
Inactivity expectancy: Ei = (% of inactives) / (% of actives + 100)
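Both parameters can be sketched in a few lines of Python. The ROC part lowers a score threshold step by step and records (FP rate, TP rate) pairs, assuming no tied scores; the area under the curve is added as a common summary that the slides do not mention. The expectancy functions follow the two formulas given for the PDD. All scores, labels and percentages are hypothetical.

```python
def roc_points(scores, labels):
    """(FP rate, TP rate) pairs obtained by lowering the threshold through the scores.

    labels: 1 for active, 0 for inactive. Assumes no tied scores.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in sorted(zip(scores, labels), key=lambda pair: -pair[0]):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

def activity_expectancy(pct_actives, pct_inactives):
    """Ea = (% of actives) / (% of inactives + 100) for one interval of the PDD."""
    return pct_actives / (pct_inactives + 100.0)

def inactivity_expectancy(pct_actives, pct_inactives):
    """Ei = (% of inactives) / (% of actives + 100) for one interval of the PDD."""
    return pct_inactives / (pct_actives + 100.0)

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]  # hypothetical model scores
labels = [1, 1, 0, 1, 0, 0]              # hypothetical observed classes
curve = roc_points(scores, labels)
auc = area_under_curve(curve)

ea = activity_expectancy(pct_actives=30.0, pct_inactives=10.0)
ei = inactivity_expectancy(pct_actives=30.0, pct_inactives=10.0)
```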
IMPORTANCE
• In computational chemistry, molecular structures are represented as numerical models and their behaviour is simulated with the help of quantum mechanics.
• Energy-related properties, such as electronic and spectroscopic properties, can be computed for a molecule.
• Software is used to compute constitutional descriptors (molecular weight; counts of atoms, bonds and rings) and topological descriptors (connectivity of the molecule).
• One of the most significant and widely used approaches is the use of software-computed descriptors in QSAR.
Application of Statistics
• Equations generated or established in QSAR studies are linear regression equations.
• A number of equations may be generated for one problem under study. Statistics help in selecting the most suitable, best-fit equation among them.
• This may be done by checking the standard deviation or variance and other related statistical parameters for the data set (series of compounds) used in the QSAR study.
• The correlation coefficient computed for the data set under study also helps in selecting the appropriate QSAR equation.
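As an illustration of choosing between candidate equations, the sketch below compares two hypothetical sets of predictions by their squared correlation coefficient (R²) and standard error of estimate. The candidate names, predicted values and the rule of picking the highest R² are assumptions made for the example; in practice validation metrics would also be considered.

```python
from math import sqrt

def r_squared(observed, predicted):
    """R2 = 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def standard_error(observed, predicted, n_descriptors):
    """Standard error of estimate with n - p - 1 degrees of freedom."""
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return sqrt(ss_res / (len(observed) - n_descriptors - 1))

observed = [5.0, 6.0, 7.0, 8.0, 9.0]  # hypothetical activities
# Hypothetical predictions from two candidate equations: (predicted values, number of descriptors)
candidates = {
    "equation A": ([5.2, 5.9, 7.1, 7.8, 9.0], 1),
    "equation B": ([5.5, 6.5, 6.5, 8.5, 8.5], 2),
}
summary = {
    name: (r_squared(observed, pred), standard_error(observed, pred, k))
    for name, (pred, k) in candidates.items()
}
best = max(summary, key=lambda name: summary[name][0])
```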