Rohan Jagdale
Pharmaceutical Analysis II
T. Y. B. Pharm
YTIP, UNIVERSITY OF MUMBAI
STATISTICAL DATA HANDLING
Contents
❏ Introduction
❏ Normal distribution
❏ Confidence limits
❏ F-test
❏ T-test (paired & unpaired)
❏ Linear regression analysis
❏ Correlation coefficient
❏ Rejection of data (Q test)
Introduction
Pharmaceutical statistics is the application of statistics to matters concerning the
pharmaceutical industry. This ranges from the design of experiments, to the analysis of
drug trials, to the commercialization of a medicine. Statistical methods are used:
▪To evaluate the activity of a drug, e.g. the effect of caffeine on attention, or to compare the
analgesic effect of a plant extract with that of an NSAID
▪To explore whether the changes produced by a drug are due to the action of the drug or to
chance
▪To compare the action of two or more different drugs, or of different dosages of the same
drug
▪To find an association between a disease and its risk factors, such as coronary artery disease
and smoking
Normal distribution
Normal distribution, also known as the Gaussian distribution, is
a probability distribution that is symmetric about the mean,
showing that data near the mean are more frequent in
occurrence than data far from the mean. In graph form, normal
distribution will appear as a bell curve.
For example, heights, blood pressure, measurement error, and
IQ scores are approximately normally distributed.
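As a minimal sketch of what "data near the mean are more frequent" means in numbers, the snippet below uses Python's standard-library `statistics.NormalDist` to compute the fraction of a normal population lying within 1, 2 and 3 standard deviations of the mean. The function name `fraction_within` and the mean and standard deviation in the usage lines are illustrative assumptions, not values from these slides.

```python
from statistics import NormalDist


def fraction_within(k, mean=0.0, sd=1.0):
    """Fraction of a normal population lying within k standard deviations of the mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


# Hypothetical assay with mean 100.0 and standard deviation 2.0 (illustrative only).
for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k, mean=100.0, sd=2.0):.4f}")
```

The fractions do not depend on the particular mean or standard deviation chosen, which is why the bell curve has the same shape for every normal distribution.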
Confidence limit
● Two extreme values within which an observation is expected to lie
● End points of the confidence interval
● Larger confidence-
● A measure of the reliability (R)
● The reliability of a mean ($\bar{x}$) increases as more
measurements are taken
● $R = k\sqrt{n}$, where n is the number of measurements and k is a constant
● Reliability increases with the square root of the number of
measurements
● A condition of limiting (diminishing) returns is reached quickly
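The square-root relationship in the bullets above can be sketched as follows. This is only an illustration: the proportionality constant k is not given in the slides and is assumed to be 1, and `relative_reliability` is a hypothetical helper name.

```python
import math


def relative_reliability(n, k=1.0):
    """R = k * sqrt(n); k is an unspecified constant, assumed to be 1 here."""
    return k * math.sqrt(n)


# Each fourfold increase in the number of measurements only doubles R.
for n in (1, 4, 16, 64):
    print(n, relative_reliability(n))
```

This is the "limiting return" in practice: going from 4 to 16 measurements buys the same relative gain as going from 1 to 4, at four times the effort.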
A point estimate is a single number
A confidence interval contains a certain set of possible values of
the parameter.
[Figure: lower confidence limit, point estimate and upper confidence limit, with the width of the confidence interval marked]
Confidence Intervals and the Normal Distribution
A confidence interval is a range of values that gives the user a sense of how
precisely a statistic estimates a parameter. The most familiar use of a confidence
interval is likely the "margin of error" reported in news stories about polls: "The
margin of error is plus or minus 3 percentage points." But confidence intervals are
useful in contexts that go well beyond that simple situation.
Confidence intervals can be used with distributions that aren't normal—that are
highly skewed or in some other way non-normal. But it's easiest to understand what
they're about in symmetric distributions, so the topic is introduced here. Don't let that
get you thinking that you can use confidence intervals with normal distributions only.
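A minimal sketch of a confidence interval for a mean, assuming the common t-based form mean ± t·s/√n. The replicate values are hypothetical, and the critical value 2.776 (two-sided 95%, 4 degrees of freedom) is taken from a standard t table and passed in by hand, since the slides do not specify how it is obtained.

```python
import math
from statistics import mean, stdev


def confidence_interval(data, t_crit):
    """Return (lower, upper) confidence limits for the mean of data.

    t_crit is the two-sided critical t value for len(data) - 1 degrees
    of freedom, looked up by the caller in a t table.
    """
    centre = mean(data)
    margin = t_crit * stdev(data) / math.sqrt(len(data))
    return centre - margin, centre + margin


# Hypothetical replicate results; 2.776 is the tabulated t for 95%, 4 d.f.
replicates = [10.1, 10.3, 9.9, 10.2, 10.0]
lower, upper = confidence_interval(replicates, t_crit=2.776)
print(f"{lower:.3f} to {upper:.3f}")
```

The point estimate (the sample mean) sits midway between the two limits, and the width of the interval shrinks as more replicates are added.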
F-test or analysis of variance (ANOVA)
An “F-test” is a catch-all term for any test that uses the F-distribution. In
most cases, when people talk about the F-test, what they are actually
talking about is the F-test to compare two variances. However, the
F-statistic is used in a variety of tests, including regression analysis, the
Chow test and the Scheffé test (a post-hoc ANOVA test).
An F-test is any statistical test in which the test statistic has an
F-distribution under the null hypothesis. It is most often used when
comparing statistical models that have been fitted to a data set, in order to
identify the model that best fits the population from which the data were
sampled.
Why do we use the F-test?
To find out whether there is a significant difference among
the means of two or more independent groups.
When do we use the F-test?
When the data are normally distributed and the level of measurement
is interval or ratio, just as for the t-test and the z-test.
F-test in one way ANOVA
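The slide's formula is not reproduced in this transcript. As a hedged sketch, the standard one-way ANOVA F statistic (between-group mean square divided by within-group mean square) can be computed as below. The function name and the three small groups are illustrative assumptions.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data for three treatment groups.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w)
```

The resulting F is then compared with the tabulated critical value for (df_between, df_within) degrees of freedom at the chosen significance level.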
F-tests for Equality of Two Variances
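The formula on this slide is likewise missing from the transcript. A minimal sketch, assuming the usual variance-ratio form F = s₁²/s₂² with the larger sample variance placed in the numerator (a common convention, not confirmed by the slides); the helper name and the two data sets are hypothetical.

```python
from statistics import variance


def f_ratio(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


# Hypothetical results from two analytical methods.
f_stat, df_num, df_den = f_ratio([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f_stat, df_num, df_den)
```

If the computed F exceeds the tabulated critical value for those degrees of freedom, the two variances are judged to differ significantly.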
Student’s t-test / t-test
The Student's t-test was formulated by W. S. Gosset in the early 1900s.
His employer (a brewery) had regulations concerning trade secrets that
prevented him from publishing his discovery, but in light of the
importance of the t distribution, Gosset was allowed to publish under
the pseudonym "Student".
The t-test is typically used to compare the means of two populations.
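$t = \dfrac{\bar{x} - \mu}{s / \sqrt{N}}$
where $\bar{x}$ is the sample mean, $\mu$ the assumed population mean, s the sample
standard deviation and N the number of measurements.
● The critical value of t depends on the desired confidence level
● Degrees of freedom: N - 1
● This test is used when the population variance is
unknown, as is usually the case in the social sciences.
● The standard error of the sampling distribution of the
sample mean is estimated from the sample.
● A t distribution is used to obtain critical values and to
construct confidence intervals.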
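A minimal sketch of the one-sample calculation, assuming the usual form t = (x̄ − μ)/(s/√N) with N − 1 degrees of freedom. The function name, the replicate values and the assumed mean of 10.0 are hypothetical.

```python
import math
from statistics import mean, stdev


def one_sample_t(data, mu):
    """Return (t, degrees_of_freedom) for testing the mean of data against mu."""
    n = len(data)
    standard_error = stdev(data) / math.sqrt(n)  # estimated from the sample
    return (mean(data) - mu) / standard_error, n - 1


# Hypothetical replicate results tested against an assumed true value of 10.0.
t_stat, dof = one_sample_t([10.1, 10.3, 9.9, 10.2, 10.0], mu=10.0)
print(round(t_stat, 3), dof)
```

The computed t is compared with the tabulated critical t for the same degrees of freedom; a larger absolute value indicates a significant difference from the assumed mean.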
The t - formula
Paired t test
Used when the samples happen to be small
The variances of the two populations need not be equal
The populations are normal
May be one-sided or two-sided
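As a hedged sketch, the paired test can be carried out by applying the one-sample t calculation to the within-pair differences, which is the standard textbook approach (the slides do not show the formula). The function name and the before/after readings are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, degrees_of_freedom) for paired measurements."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # Null hypothesis: the mean of the differences is zero.
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


# Hypothetical readings on the same five subjects before and after treatment.
t_stat, dof = paired_t([10, 12, 9, 11, 13], [12, 14, 10, 13, 14])
print(round(t_stat, 3), dof)
```

Because each subject acts as its own control, only the spread of the differences matters, not the spread between subjects.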
Unpaired t-test
The unpaired t method tests the null hypothesis that the population means related
to two independent, random samples from an approximately normal distribution
are equal (Altman, 1991; Armitage and Berry, 1994).
Assuming equal variances, the test statistic is calculated as:
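The formula itself did not survive in this transcript. A minimal sketch, assuming the standard pooled-variance form: t = (x̄₁ − x̄₂) / √(s_p²(1/n₁ + 1/n₂)), with s_p² = ((n₁ − 1)s₁² + (n₂ − 1)s₂²)/(n₁ + n₂ − 2) and n₁ + n₂ − 2 degrees of freedom. The function name and both samples are hypothetical.

```python
import math
from statistics import mean, variance


def unpaired_t_equal_var(sample_1, sample_2):
    """Return (t, degrees_of_freedom) for two independent samples, equal variances assumed."""
    n1, n2 = len(sample_1), len(sample_2)
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / dof
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample_1) - mean(sample_2)) / standard_error, dof


# Hypothetical results for two independent groups.
t_stat, dof = unpaired_t_equal_var([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t_stat, dof)
```

An F-test on the two sample variances is a common way to check whether the equal-variance assumption is reasonable before pooling.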
t-test applications
The t-test is used to compare the means of two samples, dependent or
independent.
It can also be used to determine whether a sample mean differs from an assumed
mean.
The t-test is also applied in determining the confidence interval for a sample mean.
Regression
A statistical technique that attempts to determine the strength of
the relationship between one dependent variable (usually
denoted by Y) and a series of other changing variables (known as
independent variables).
It is used to forecast the value of a dependent variable (Y) from the values of the
independent variables (X1, X2, ...).
Regression Analysis
In statistics, regression analysis includes many techniques for
modeling and analyzing several variables, when the focus is on
the relationship between a dependent variable and one or more
independent variables.
Regression analysis is widely used for prediction and
forecasting.
Dependent and independent variables
▪Independent variables are regarded as inputs to a system and may take
on different values freely.
▪Dependent variables are those values that change as a consequence of
changes in other values in the system.
▪The independent variable is also called the predictor or explanatory variable
and is denoted by X.
▪The dependent variable is also called the response variable and is denoted
by Y.
Linear regression
▪The simplest mathematical relationship between two variables x
and y is a linear relationship.
▪In a cause and effect relationship, the independent variable is the
cause, and the dependent variable is the effect.
▪Least squares linear regression is a method for predicting the
value of a dependent variable Y, based on the value of an
independent variable X.
Example of simple linear regression which has one
independent variable
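The slide's figure is not available here. As a minimal sketch of least-squares fitting with one independent variable, the code below computes the slope and intercept of the line Y = intercept + slope·X from the usual sums of squares. The function name and the calibration-style data are hypothetical.

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical calibration data: X = concentration, Y = instrument response.
slope, intercept = least_squares_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.0, 8.1, 9.9])
print(round(slope, 3), round(intercept, 3))

# Predict the response for a new X value from the fitted line.
predicted = intercept + slope * 2.5
```

Once the line is fitted, a value of Y can be forecast for any X within the range covered by the data.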
Correlation Coefficient
Definition
Correlation refers to a technique used to measure the relationship between
two or more variables.
A correlation coefficient is a statistical measure of the degree to which
changes to the value of one variable predict change to the value of
another.
A correlation can only indicate the presence or absence of a relationship,
not the nature of the relationship. Correlation is not causation.
Correlation Coefficient formula overview
Correlation coefficient formulas are used to find how strong a
relationship is between data. The formulas return a value
between -1 and 1, where:
▪1 indicates a perfect positive relationship.
▪-1 indicates a perfect negative relationship.
▪A result of zero indicates no linear relationship at all.
Positive correlation
▪Association between variables such that high scores on one variable tend to have
high scores on the other variable
▪A direct relation between the variables
Negative correlation
▪Association between variables such that high scores on one variable tend to have
low scores on the other variable.
▪An inverse relation between the variables
Correlation Coefficient formula
▪One of the most commonly used formulas in statistics is Pearson's correlation
coefficient formula.
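The formula image is missing from this transcript. A minimal sketch of Pearson's coefficient, assuming its standard definition r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²); the function name and the data are illustrative.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for paired values xs and ys."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: y rises with x in the first call and falls in the second.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))
```

The first call illustrates a positive (direct) relation and the second a negative (inverse) one.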
Rejection of data /Q test / Dixon’s Q test
It is a statistical test for deciding if an outlier can be
removed from a set of data. It is used for small data
sets.
It is simpler to apply, as it does not require
calculation of the mean and standard deviation.
Rejection of result (Q test)
● Used for small data sets
● 90% CL is typically used
● Arrange data in increasing order
● Calculate range = highest value - lowest value
● Calculate gap = |suspected value - nearest value|
● Calculate Q ratio = gap/range
● Reject outlier if Qcal > Qtab
● Q tables are available
Example: Is 167 an outlier in this set of data? Test at the 95%
confidence level (i.e. at an alpha level of 5%).
167, 180, 188, 177, 181, 185, 189
Step 1: Sort your data into ascending order (smallest to largest).
167, 177, 180, 181, 185, 188, 189.
Step 2: Find the Q statistic using the following formula:
$Q = \dfrac{x_2 - x_1}{x_n - x_1}$
Where:
$x_1$ is the smallest (suspect) value,
$x_2$ is the second smallest value,
and $x_n$ is the largest value.
Inserting the values into the formula, we get:
Q = (177 – 167) / (189 – 167) = 10/22 = 0.455.
Step 3: Find the Q critical value in the Q table.
For a sample size of 7 and an alpha level of 5%, the critical value is 0.568.
Step 4: Compare the Q statistic from Step 2 with the Q critical value in Step 3. If the Q
statistic is greater than the Q critical value, the point is an outlier.
Qstatistic = 0.455.
Qcritical value = 0.568.
Solution: 0.455 is not greater than 0.568, so this point is not an outlier at an alpha level of
5%.
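The procedure and worked example above can be sketched as follows. The function name `dixon_q` is an assumption for illustration. The critical value 0.568 is the one quoted in the example for n = 7 at an alpha level of 5% and is entered by hand rather than looked up.

```python
def dixon_q(data, suspect):
    """Return Q = gap / range for a suspect value at either end of the data."""
    ordered = sorted(data)
    data_range = ordered[-1] - ordered[0]
    if suspect == ordered[0]:
        gap = ordered[1] - ordered[0]      # suspect is the smallest value
    elif suspect == ordered[-1]:
        gap = ordered[-1] - ordered[-2]    # suspect is the largest value
    else:
        raise ValueError("the suspect must be the smallest or largest value")
    return gap / data_range


values = [167, 180, 188, 177, 181, 185, 189]
q = dixon_q(values, suspect=167)
q_critical = 0.568  # from the example: n = 7, alpha = 5%
print(round(q, 3), q > q_critical)
```

Since the computed Q does not exceed the critical value, 167 is retained, in agreement with the worked solution.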