Guessing the Unknown
The Quest
What can we say about this black box?

Observations: 5, 12, 3, 9, 28

E.g., what is the probability that it generates a number bigger than 5?
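
With only the five observations on this slide, the most direct answer to the question is the fraction of observed values that exceed 5. The sketch below computes that fraction; the function name is hypothetical, and with so few observations the result is a crude estimate rather than the true probability.

```python
from fractions import Fraction

def empirical_prob_greater(observations, threshold):
    """Fraction of observations strictly greater than threshold."""
    hits = sum(1 for x in observations if x > threshold)
    return Fraction(hits, len(observations))

observations = [5, 12, 3, 9, 28]  # the values shown on the slide
estimate = empirical_prob_greater(observations, 5)
print(estimate)  # 3/5 (12, 9 and 28 exceed 5; 5 itself does not)
```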
Distributions

What if we had many, many observations?

Value   Frequency
-1      0.3
0       0.2
1       0.1
2       0.1
3       0.1
4       0.1
5       0.1

The sum of the frequencies is 1.
This table is the distribution associated with this black box.
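
Once the distribution table is known, questions about the black box reduce to adding up frequencies. A minimal sketch using the table from this slide (the helper name `prob_greater` is hypothetical):

```python
from fractions import Fraction

# Distribution table from the slide: value -> relative frequency.
distribution = {
    -1: Fraction(3, 10),
    0: Fraction(2, 10),
    1: Fraction(1, 10),
    2: Fraction(1, 10),
    3: Fraction(1, 10),
    4: Fraction(1, 10),
    5: Fraction(1, 10),
}

def prob_greater(dist, threshold):
    """Probability that the black box emits a value strictly above threshold."""
    return sum((f for v, f in dist.items() if v > threshold), Fraction(0))

total = sum(distribution.values())
print(total)                          # 1, as the slide states
print(prob_greater(distribution, 2))  # 3/10 (values 3, 4 and 5)
```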
Distributions Graphically




[Figure: the distribution plotted over the values -1 to 5. The area under the curve is 1.]
The Challenge
We do not have many, many observations.

So we cannot infer the distribution from the observations.

What can we do then?
What can we do with few observations?
Assume the distribution is known (from prior knowledge or other means), e.g., Normal, Binomial, etc.

I.e., model approximately using a canonical distribution.

But the parameters are not known.

Can these parameters be determined from the observations?
Why Canonical Distributions

Value   Frequency
-1      0.3
0       0.2
1       0.1
2       0.1
3       0.1
4       0.1
5       0.1

Too verbose a description for the distribution.

Can the entire distribution be described (even approximately) by just a few parameters, while still modeling the data accurately?
Example: Binomial Distribution
A coin that yields 1 with probability p and 0 with probability 1 - p, tossed n times, independently.

Observations: 1 0 1 1 1 ….

Number of 1's? (Table of Value against Frequency, for the values 0, 1, 2, ..., n-1, n.)

Distribution: μ = np, σ² = np(1-p)

Can one determine p from the (few) observations?
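
One natural way to approach the question on this slide is to estimate p by the fraction of 1's among the tosses. The sketch below does that for the five tosses visible on the slide (the rest of the sequence is elided there, so only these five are used), then builds the binomial frequency table for that estimate and checks it against μ = np and σ² = np(1-p). The function names are hypothetical, and the standard binomial formula is assumed for the frequencies.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k ones in n independent tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def estimate_p(tosses):
    """Fraction of 1's among the observed tosses."""
    return sum(tosses) / len(tosses)

tosses = [1, 0, 1, 1, 1]    # the tosses visible on the slide
p_hat = estimate_p(tosses)  # 4 ones out of 5 tosses
n = len(tosses)

# Frequency table for the number of 1's: values 0, 1, ..., n.
pmf = [binomial_pmf(k, n, p_hat) for k in range(n + 1)]
mean = sum(k * f for k, f in enumerate(pmf))
variance = sum((k - mean) ** 2 * f for k, f in enumerate(pmf))

print(p_hat)
print(mean, n * p_hat)                   # both close to np
print(variance, n * p_hat * (1 - p_hat)) # both close to np(1-p)
```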
Other Canonical Distributions
Normal: μ, σ²
Poisson: μ = r, σ² = r
Negative Binomial: μ = rp/(1-p), σ² = rp/(1-p)²
Gamma: μ = kθ, σ² = kθ²

What are these? Later talk.
Back to the Quest

We have few observations.

Assume these are from a known distribution family, but with unknown parameters.

How do we determine the parameters?

How do we determine μ, σ²?
Estimating Mean


[Figure: distribution with parameters μ, σ²]

Estimate for the mean; a good estimate?
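
The usual candidate for estimating the mean is the sample mean, the sum of the observations divided by their count. That this is the estimate the slide has in mind is an assumption. A minimal sketch, reusing the five observations from "The Quest" as example data:

```python
def sample_mean(observations):
    """Sum of the observations divided by how many there are."""
    return sum(observations) / len(observations)

observations = [5, 12, 3, 9, 28]  # example data from "The Quest" slide
mu_hat = sample_mean(observations)
print(mu_hat)  # 11.4
```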
[Figure: distribution with parameters μ, σ²]

What is the mean and variance of this distribution?
Normal!! For modest n.
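
Assuming the estimate in question is the sample mean of n independent observations, each with mean μ and variance σ², its own mean and variance follow from linearity of expectation and independence:

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\mathbb{E}[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i] = \mu, \qquad
\operatorname{Var}(\bar{X}) = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{\sigma^{2}}{n}.
```

By the central limit theorem, the sample mean is approximately Normal with mean μ and variance σ²/n once n is moderately large.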
[Figure: distribution with parameters μ, σ²]

Unbiased.
Tight as n grows larger.
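
Both properties can be checked by simulation. The sketch below assumes the estimate is the sample mean and uses a fair six-sided die as a stand-in black box (an illustrative choice, with μ = 3.5 and σ² = 35/12). It draws many samples of size n, then looks at the average and the spread of the resulting sample means.

```python
import random
from statistics import fmean, pvariance

TRUE_MEAN = 3.5          # mean of a fair six-sided die
TRUE_VARIANCE = 35 / 12  # variance of a fair six-sided die

def sample_means(n, trials, rng):
    """Sample means of `trials` independent samples, each of n die rolls."""
    return [fmean([rng.randint(1, 6) for _ in range(n)]) for _ in range(trials)]

rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for n in (5, 50):
    means = sample_means(n, 10_000, rng)
    # (average of the estimates, spread of the estimates)
    results[n] = (fmean(means), pvariance(means))
    print(n, results[n], TRUE_VARIANCE / n)
```

The average of the estimates stays near 3.5 for both sample sizes (unbiased), while their variance tracks σ²/n and so shrinks as n grows (tight).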
Estimating Variance


[Figure: distribution with parameters μ, σ²]

Estimate for the variance; a good estimate?
[Figure: distribution with parameters μ, σ²]

Bias
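
Assuming the variance estimate under discussion averages the squared deviations from the sample mean, dividing by n, the bias can be worked out directly for independent observations with mean μ and variance σ²:

```latex
\begin{align*}
\sum_{i=1}^{n} (X_i - \bar{X})^2
  &= \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2, \\
\mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2\right]
  &= \frac{1}{n}\left(n\sigma^2 - n\cdot\frac{\sigma^2}{n}\right)
   = \frac{n-1}{n}\,\sigma^2 .
\end{align*}
```

The estimate is therefore too small on average by a factor of (n-1)/n, which matters most when n is small.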
Estimating Variance Correctly


[Figure: distribution with parameters μ, σ²]

Unbiased!!
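
The standard correction is to divide the sum of squared deviations by n - 1 instead of n; that this is the formula on the slide is an assumption. The sketch below checks the claim exactly, without simulation, by enumerating every possible sample of size 3 from a fair six-sided die (an illustrative stand-in with σ² = 35/12) and averaging each estimate over all samples.

```python
from fractions import Fraction
from itertools import product

def variance_estimate(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by divisor."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor

n = 3
true_variance = Fraction(35, 12)  # variance of a fair six-sided die

# Every equally likely sample of n rolls.
samples = list(product(range(1, 7), repeat=n))

avg_div_n = sum(variance_estimate(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance_estimate(s, n - 1) for s in samples) / len(samples)

print(avg_div_n)          # 35/18, i.e. (n-1)/n times the true variance
print(avg_div_n_minus_1)  # 35/12, the true variance
```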
A Mind Reading Game
• Your friend chooses a number (one of 1,3,5) in his/her
  mind
   – Call this i

• He/She then rolls a six-sided die n = 30 times, privately
   – For each roll, he/she declares Heads if the number on the
     die is <= i, and Tails otherwise

• Your goal is to guess i solely from this sequence of n
  Heads and Tails.

• Can you read your friend’s mind?
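
One possible strategy, offered as a sketch rather than as the intended answer: each roll comes up Heads with probability i/6, so the number of Heads in 30 rolls is binomial with p = i/6. Guess the candidate i whose p makes the observed Heads count most likely. The function names below are hypothetical.

```python
import random
from math import log

CANDIDATES = (1, 3, 5)  # the numbers your friend may choose from

def guess_i(num_heads, num_rolls=30):
    """Pick the candidate i that maximizes the binomial log-likelihood."""
    def log_likelihood(i):
        p = i / 6  # chance that a single roll is <= i
        return num_heads * log(p) + (num_rolls - num_heads) * log(1 - p)
    return max(CANDIDATES, key=log_likelihood)

def play(i, num_rolls=30, rng=random):
    """Simulate the friend's rolls and return the number of Heads declared."""
    return sum(1 for _ in range(num_rolls) if rng.randint(1, 6) <= i)

rng = random.Random(1)  # fixed seed so the run is repeatable
trials = 1000
correct = 0
for _ in range(trials):
    secret = rng.choice(CANDIDATES)
    if guess_i(play(secret, rng=rng)) == secret:
        correct += 1
print(correct / trials)  # fraction of games in which the guess was right
```

Because the three candidate probabilities (1/6, 1/2, 5/6) are far apart, 30 rolls are usually enough to tell them apart, though the guess is not guaranteed to be right.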
Thank You


Introduction to statistics
