CENTRAL LIMIT THEOREM
INFERENCE
Critical Regions
Neha Jain
School of Biotechnology
Types of Biological Variables
• There are three main types of variables: measurement variables,
which are expressed as numbers (such as 3.7 mm); nominal
variables, which are expressed as names (such as "female"); and
ranked variables, which are expressed as positions (such as
"third").
• Measurement variables
• Measurement variables are, as the name implies, things you can
measure. An individual observation of a measurement variable is
always a number. Examples include length, weight, pH, and bone
density. Other names for them include "numeric" or
"quantitative" variables.
• Nominal variables
• Nominal variables classify observations into discrete categories. Examples of nominal
variables include sex (the possible values are male or female), genotype (values
are AA, Aa, or aa), or ankle condition (values are normal, sprained, torn ligament, or
broken).
• A good rule of thumb is that an individual observation of a nominal variable can be
expressed as a word, not a number. If you have just two values of what would
normally be a measurement variable, it's nominal instead: think of it as "present" vs.
"absent" or "low" vs. "high".
• Nominal variables are often used to divide individuals up into categories, so that
other variables may be compared among the categories. In the comparison of head
width in male vs. female isopods, the isopods are classified by sex, a nominal
variable, and the measurement variable head width is compared between the sexes.
• Independent and dependent variables
• Another way to classify variables is as independent or dependent variables. An independent
variable (also known as a predictor, explanatory, or exposure variable) is a variable that you
think may cause a change in a dependent variable (also known as an outcome or response
variable).
• For example, if you grow isopods with 10 different mannose concentrations in their food
and measure their growth rate, the mannose concentration is an independent variable and
the growth rate is a dependent variable, because you think that different mannose
concentrations may cause different growth rates. Any of the three variable types
(measurement, nominal or ranked) can be either independent or dependent.
• For example, if you want to know whether sex affects body temperature in mice, sex would
be an independent variable and temperature would be a dependent variable. If you wanted
to know whether the incubation temperature of eggs affects sex in turtles, temperature
would be the independent variable and sex would be the dependent variable.
• The normal distribution
• Many measurement variables in biology fit the normal distribution fairly
well.
• According to the central limit theorem, if you add together several
independent variables, each with its own distribution of values and a finite
variance, the sum follows the normal distribution fairly well. The shape of
the distribution of the individual variables matters little: the sum will
still be approximately normal, and it fits the normal distribution more
closely as the number of variables increases. The accompanying graphs are
frequency histograms of 5,000 numbers. The first graph shows the
distribution of a single number with a uniform distribution between 0
and 1. The other graphs show the distributions of the sums of two, three, or
four random numbers.
• As you can see, as more random numbers are added together, the frequency
distribution of the sum quickly approaches a bell-shaped curve.
• This is analogous to a biological variable that is the result of several different factors. For
example, let's say that you've captured 100 lizards and measured their maximum
running speed. The running speed of an individual lizard would be a function of its
genotype at many genes; its nutrition as it was growing up; the diseases it's had; how
full its stomach is now; how much water it's drunk; and how motivated it is to run fast
on a lizard racetrack. Each of these variables might not be normally distributed; the
effect of disease might be to either subtract 10 cm/sec if it has had lizard-slowing
disease, or add 20 cm/sec if it has not; the effect of gene A might be to add 25 cm/sec
for genotype AA, 20 cm/sec for genotype Aa, or 15 cm/sec for genotype aa. Even though
the individual variables might not have normally distributed effects, the running speed
that is the sum of all the effects would be approximately normally distributed.
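The histogram experiment described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the original graphs: the seed, the helper name `excess_kurtosis`, and the choice of excess kurtosis as a measure of "bell-shapedness" are assumptions made here. Only the standard library is used.

```python
import random
import statistics


def excess_kurtosis(values):
    """Excess kurtosis: about -1.2 for a uniform variable, 0 for a normal one."""
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)
    fourth_moment = statistics.fmean((v - mean) ** 4 for v in values)
    return fourth_moment / variance ** 2 - 3.0


rng = random.Random(0)  # fixed seed (an arbitrary choice) so the run is repeatable
results = {}
for k in (1, 2, 3, 4):
    # 5,000 sums of k independent uniform(0, 1) numbers, as in the slide's graphs
    sums = [sum(rng.random() for _ in range(k)) for _ in range(5000)]
    results[k] = (
        statistics.fmean(sums),
        statistics.pvariance(sums),
        excess_kurtosis(sums),
    )

for k, (mean, variance, kurt) in results.items():
    print(f"k={k}: mean={mean:.3f} variance={variance:.3f} excess kurtosis={kurt:.2f}")
```

In theory the sum of k uniform(0, 1) numbers has mean k/2, variance k/12, and excess kurtosis -1.2/k, so the last column should move toward 0 (the value for a normal distribution) as k grows.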
CENTRAL LIMIT THEOREM
• In probability theory, the central limit theorem (CLT) states that, given certain
conditions, “the mean of a sufficiently large number of independent random
variables, each with finite mean and variance, will be approximately normally
distributed.”
• Put another way, if you draw a sample from a population, calculate the mean of
the sample, and repeat this many times, the sample means will form an
approximately normal distribution around the true mean of the original population,
provided the samples are reasonably large. This means that even if the original
population has a wild distribution, the means of repeated samples cluster around
the true mean, and more tightly so as the sample size grows.
• When sampling is from a normal population, the means of samples drawn from
such a population are themselves normally distributed.
• But when sampling is not from a normal population, the size of the sample plays a
critical role.
• When the sample size n is small, the shape of the sampling distribution of the mean
will depend largely on the shape of the parent population, but as n gets large (a
common rule of thumb is n > 30), the shape of the sampling distribution will become
more and more like a normal distribution, irrespective of the shape of the parent
population.
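The role of sample size can be checked with a small simulation. The sketch below is an illustration under stated assumptions: the parent population is taken to be exponential with mean 1 (a strongly right-skewed choice made here, not one from the slides), and the helper names `skewness` and `sample_means` are hypothetical. Skewness is 0 for a normal distribution, so a value closer to 0 means a more symmetric, more normal-looking sampling distribution.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: 0 for a symmetric distribution such as the normal."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values, mu=mean)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


def sample_means(n, repeats, rng):
    """Means of `repeats` samples, each of size n, from an exponential(1) population."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repeats)
    ]


rng = random.Random(1)  # fixed seed (an arbitrary choice) so the run is repeatable
skew_by_n = {}
grand_mean_by_n = {}
for n in (2, 30):
    means = sample_means(n, 5000, rng)
    skew_by_n[n] = skewness(means)
    grand_mean_by_n[n] = statistics.fmean(means)
    print(f"n={n}: mean of sample means={grand_mean_by_n[n]:.3f} skewness={skew_by_n[n]:.2f}")
```

With n = 2 the sample means are still clearly skewed like the parent population; with n = 30 the skewness is much smaller, while the sample means stay centred on the population mean of 1.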
• The theorem which explains this sort of
relationship between the shape of the population
distribution and the sampling distribution of the
mean is known as the central limit theorem.
• This theorem is by far the most important
theorem in statistical inference.
• It assures that the sampling distribution of the
mean approaches a normal distribution as the
sample size increases.
• In formal terms, the central limit theorem states that “the
distribution of means of random samples taken from a
population having mean µ and finite variance σ²
approaches the normal distribution with mean µ and
variance σ²/n as n goes to infinity.”
• The significance of the central limit theorem lies in the
fact that it permits us to use sample statistics to make
inferences about population parameters without knowing
anything about the shape of the frequency distribution of
that population other than what we can get from the
sample.
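The claim that the sample means have variance σ²/n can be checked numerically. The sketch below assumes a uniform(0, 1) population, whose variance is 1/12; the population, the sample sizes, and the helper name `variance_of_sample_means` are illustrative choices made here rather than anything specified in the slides.

```python
import random
import statistics

POPULATION_VARIANCE = 1 / 12  # variance of a uniform(0, 1) population


def variance_of_sample_means(n, repeats, rng):
    """Variance of the means of `repeats` samples, each of size n."""
    means = [
        statistics.fmean(rng.random() for _ in range(n))
        for _ in range(repeats)
    ]
    return statistics.pvariance(means)


rng = random.Random(2)  # fixed seed (an arbitrary choice) so the run is repeatable
observed = {n: variance_of_sample_means(n, 4000, rng) for n in (5, 20, 80)}

for n, variance in observed.items():
    expected = POPULATION_VARIANCE / n
    print(f"n={n}: observed variance={variance:.5f} sigma^2/n={expected:.5f}")
```

Each fourfold increase in n should cut the variance of the sample means to roughly a quarter, which is why larger samples give more precise estimates of the population mean.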
INFERENCE
• An inference is the act of coming to a logical
conclusion without actually witnessing or
having firsthand knowledge of certain events.
• Biological network inference is the process of
making inferences and predictions about
biological networks.
• Authors don’t always tell every detail or give
every bit of information in nonfiction or in
fiction stories.
• Readers make inferences to supply information
that authors leave out.
• When you make an inference, you add what you
already know to what an author has told you.
• Sometimes you have to come to a conclusion
when you don’t have all the facts.
• You can use the clues given to help you make an
inference (a guess based on known facts).
Examples
• What the author said + what I know = my inference
  What the author said: The weather had been scorching for weeks.
  What I know: Summer is the hottest time of the year.
  My inference: It is summer.
• What the author said + what I know = my inference
  What the author said: Alvin took out a pitcher of cold lemonade.
  What I know: You keep things cold in a refrigerator.
  My inference: Alvin took the lemonade out of the refrigerator.
Inference vs. Hypothesis
• An inference is a logical conclusion based on
experiments, and a hypothesis is what one thinks is
going to happen (an educated guess).
• In science, an inference refers to a reasonable
conclusion or possible hypothesis drawn from a
small sample of data.
• Scientists make inferences all the time. These may
establish correlations but do not prove causation. In fact, most
“known” scientific facts are inferences, since it would
be impossible to gather all the evidence on a subject.
Critical Regions
Critical Value(s)
• A critical value for a hypothesis test is a
threshold against which the value of the test
statistic in a sample is compared to determine
whether or not the null hypothesis is rejected.
• The critical value for any hypothesis test
depends on the significance level at which the
test is carried out, and whether the test is
one-sided or two-sided.
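As a concrete case, the critical values of a z-test (a test statistic that follows the standard normal distribution under the null hypothesis) can be computed from the significance level. The z-test itself, the function name `z_critical_values`, and the tail labels are assumptions chosen for illustration; the slides do not name a specific test. `statistics.NormalDist` is part of the Python standard library (3.8+).

```python
from statistics import NormalDist


def z_critical_values(alpha, tail="two-sided"):
    """Critical value(s) of a z-test at significance level alpha.

    tail is "two-sided", "upper" (one-sided, reject for large z),
    or "lower" (one-sided, reject for small z).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two-sided":
        # alpha is split equally between the two tails
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    if tail == "upper":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "lower":
        return (std_normal.inv_cdf(alpha),)
    raise ValueError(f"unknown tail: {tail!r}")


print(z_critical_values(0.05))           # about (-1.96, 1.96)
print(z_critical_values(0.05, "upper"))  # about (1.645,)
print(z_critical_values(0.01))           # about (-2.576, 2.576)
```

The same significance level gives a smaller threshold for a one-sided test than for a two-sided test, because all of alpha sits in a single tail.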
Critical Region
• The set of all values of the test statistic that
would cause a rejection of the
null hypothesis.
• The sample space for the test statistic is
partitioned into two regions; one region (the
critical region) will lead us to reject the null
hypothesis H0, the other will not. So, if the
observed value of the test statistic is a
member of the critical region, we conclude
"Reject H0"; if it is not a member of the critical
region then we conclude "Do not reject H0".
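The decision rule above can be written as a membership check. The sketch below again assumes a z-test, and the names `in_critical_region` and `decide` are hypothetical helpers introduced for illustration.

```python
from statistics import NormalDist


def in_critical_region(z, alpha=0.05, tail="two-sided"):
    """Return True if the observed z statistic falls in the critical region."""
    std_normal = NormalDist()
    if tail == "two-sided":
        return abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    if tail == "upper":
        return z >= std_normal.inv_cdf(1 - alpha)
    if tail == "lower":
        return z <= std_normal.inv_cdf(alpha)
    raise ValueError(f"unknown tail: {tail!r}")


def decide(z, alpha=0.05, tail="two-sided"):
    """Apply the decision rule: reject H0 only inside the critical region."""
    return "Reject H0" if in_critical_region(z, alpha, tail) else "Do not reject H0"


print(decide(2.3))                # Reject H0 (|2.3| is beyond 1.96)
print(decide(1.8))                # Do not reject H0 (|1.8| is inside 1.96)
print(decide(1.8, tail="upper"))  # Reject H0 (1.8 is beyond 1.645)
```

The last two lines show that the same observed statistic can lead to different conclusions depending on whether the test is one-sided or two-sided, so the choice should be made before looking at the data.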
