Statistical Inference
Statistical Inference Scenario
• Data Generation
• The world we live in is complex, random, and uncertain. At the same
time, it’s one big data-generating machine.
• As we commute to work on subways and in cars, as our blood moves
through our bodies, as we’re shopping, emailing, procrastinating at
work by browsing the Internet and watching the stock market, as
we’re building things, eating things, talking to our friends and family
about things, while factories are producing products, this all at least
potentially produces data.
Data Collection and Sampling Methods
• Imagine spending 24 hours looking out the window, and for every
minute, counting and recording the number of people who pass by.
Or gathering up everyone who lives within a mile of your house and
making them tell you how many email messages they receive every
day for the next year. Imagine heading over to your local hospital and
rummaging around in the blood samples looking for patterns in the
DNA. That all sounded creepy, but it wasn’t supposed to. The point
here is that the processes in our lives are actually data-generating
processes.
Role of the Data Scientist
• We’d like ways to describe, understand, and make sense of these
processes, in part because as scientists we just want to understand
the world better, but many times, understanding these processes is
part of the solution to problems we’re trying to solve.
• Data represents the traces of the real-world processes, and exactly
which traces we gather are decided by our data collection or sampling
method.
• You, the data scientist, the observer, are turning the world into data.
Challenges in Data Interpretation
• Once you have all this data, you have somehow captured the world,
or certain traces of the world. But you can’t walk around with a
huge Excel spreadsheet or a database of millions of transactions,
look at it, and, with a snap of a finger, understand the world and
the process that generated it.
• Simplification through Mathematical models
• So you need a new idea, and that’s to simplify those captured traces
into something more comprehensible, to something that somehow
captures it all in a much more concise way, and that something could
be mathematical models or functions of the data, known as statistical
estimators.
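As a minimal sketch of that idea, the Python snippet below reduces a large list of numbers to a handful of estimates. The transaction amounts are simulated and the `summarize` helper is a hypothetical name chosen for illustration; neither comes from the slides.

```python
import random
import statistics

# Hypothetical data: 100,000 simulated transaction amounts (an assumption,
# standing in for the "huge spreadsheet" described above).
random.seed(0)
transactions = [random.expovariate(1 / 50.0) for _ in range(100_000)]


def summarize(values):
    """Reduce a list of numbers to a few statistical estimates."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


summary = summarize(transactions)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Each entry in `summary` is a function of the data, which is what makes it an estimator: four numbers stand in for 100,000 rows.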
Statistical Inference: Definition and Importance
• This overall process of going from the world to the data, and then
from the data back to the world, is the field of statistical inference.
• More precisely, statistical inference is the discipline that concerns
itself with the development of procedures, methods, and theorems
that allow us to extract meaning and information from data that has
been generated by stochastic (random) processes.
• Statistical inference is the process of drawing conclusions or making
predictions about a population based on data from a sample. In other
words, it is a set of methods used to make inferences about an entire
population based on a smaller subset of the population (the sample).
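To make the sample-to-population step concrete, here is a small Python sketch that estimates a population mean from a random sample and attaches an approximate 95% interval. The population is simulated, and the sample size of 400 and the normal critical value 1.96 are assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical population: daily email counts for 50,000 people (simulated).
population = [random.gauss(40, 12) for _ in range(50_000)]

# Draw a simple random sample without replacement.
sample = random.sample(population, 400)

# Point estimate of the population mean and its standard error.
sample_mean = statistics.fmean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Approximate 95% confidence interval using the normal critical value 1.96.
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(f"estimate: {sample_mean:.2f}")
print(f"approx. 95% interval: ({ci_low:.2f}, {ci_high:.2f})")
```

In practice the full population is not available; it is simulated here only so that the estimate can be compared against `statistics.fmean(population)`.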
Applications of Statistical Inference
• Statistical inference is widely used in many fields, including
science, engineering, economics, and social sciences, among
others. It plays a critical role in making informed decisions and
drawing meaningful conclusions from data.
Population and Sample
• In classical statistical literature, a distinction is made between the
population and the sample. The word population immediately makes
us think of the entire Pakistani population of 220 million people, or
the entire world’s population of 7 billion people.
• But put that image out of your head, because in statistical inference
a population does not have to consist of people. It could be any
set of objects or units, such as tweets, photographs, or stars.
• If we could measure or extract the characteristics of all those
objects, we’d have a complete set of observations, and the
convention is to use N to represent the total number of observations
in the population.
Population and Sample
• Suppose your population was all emails sent last year by
employees at a huge corporation, BigCorp. Then a single
observation could be a list of things: the sender’s name, the list
of recipients, date sent, text of email, number of characters in
the email, number of sentences in the email, number of verbs in
the email, and the length of time until first reply.
• When we take a sample, we take a subset of the units of size n
in order to examine the observations to draw conclusions and
make inferences about the population.
• In the BigCorp email example, you could make a list of all the
employees, select 1/10th of those people at random, and
take all the email they sent last year; that would be your
sample.
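The BigCorp sampling scheme can be sketched in Python as follows. The employee names, the number of emails, and the `(sender, n_chars)` record layout are all hypothetical stand-ins for the fields listed above.

```python
import random

random.seed(42)

# Hypothetical population: 20,000 emails, each recorded as (sender, n_chars).
employees = [f"employee_{i}" for i in range(1000)]
emails = [
    (random.choice(employees), random.randint(20, 2000))
    for _ in range(20_000)
]
N = len(emails)  # total number of observations in the population

# Select 1/10th of the employees at random, without replacement.
sampled_employees = set(random.sample(employees, len(employees) // 10))

# The sample is every email sent by a selected employee.
sample = [email for email in emails if email[0] in sampled_employees]
n = len(sample)  # number of observations in the sample

print(f"N = {N}, n = {n}")
```

Note that the unit being randomized here is the employee, not the email, so n is only roughly N/10 and varies from draw to draw.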
What is a model?
• Humans try to understand the world around them by representing it
in different ways.
• Architects capture attributes of buildings through blueprints and
three-dimensional, scaled-down versions.
• Molecular biologists capture protein structure with three-
dimensional visualizations of the connections between amino acids.
• Statisticians and data scientists capture the uncertainty and
randomness of data-generating processes with mathematical
functions that express the shape and structure of the data itself.
What is a model?
• A model is our attempt to understand and represent the nature of
reality through a particular lens, be it architectural, biological, or
mathematical.
• A model is an artificial construction where all extraneous detail has
been removed or abstracted. Attention must always be paid to these
abstracted details after a model has been analyzed to see what might
have been overlooked.
Statistical Modeling
• Statistical modeling is the use of mathematical models and statistical
assumptions to generate sample data and make predictions about
the real world. A statistical model is a collection of probability
distributions on a set of all possible outcomes of an experiment.
• Statistical modeling refers to the data science process of applying
statistical analysis to datasets. A statistical model is a mathematical
relationship between one or more random variables and other
non-random variables. The application of statistical modeling to raw data
helps data scientists approach data analysis in a strategic manner,
providing intuitive visualizations that aid in identifying relationships
between variables and making predictions.
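One simple instance of a relationship between a random variable and a non-random variable is a straight line plus noise, y = a + b·x + error. The Python sketch below simulates such data and recovers the line by ordinary least squares; the true coefficients (2.0 and 0.5), the noise level, and the `fit_line` helper are assumptions chosen for illustration.

```python
import random

random.seed(7)

# Hypothetical data: x is fixed (non-random), y is random because of the noise.
xs = [float(i) for i in range(1, 101)]
ys = [2.0 + 0.5 * x + random.gauss(0, 1.0) for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(xs, ys)

# Use the fitted model to predict y at a new x value.
prediction = intercept + slope * 120
print(f"intercept={intercept:.2f}, slope={slope:.3f}, prediction={prediction:.2f}")
```

The fitted intercept and slope are estimates of the assumed true values, and the last line shows the model being used for prediction.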
Statistical Modeling
• Common data sets for statistical analysis include data from
Internet of Things (IoT) sensors, census data, public health data,
social media data, imagery data, and other public sector data
that benefit from real-world predictions.
Probability
• Probability is a mathematical measure of how likely an event is to
occur. Probability values lie between 0 and 1 inclusive: 0 for an
impossible event and 1 for a certain event. The same basic rules of
probability also underlie probability distributions.
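As a small worked example, the Python sketch below computes the probability of rolling an even number on a fair six-sided die in two ways: exactly, by counting equally likely outcomes, and approximately, as a relative frequency over simulated rolls. The die and the 10,000 rolls are assumptions made for illustration.

```python
import random
from fractions import Fraction

# Exact probability: favourable outcomes divided by all equally likely outcomes.
outcomes = [1, 2, 3, 4, 5, 6]
even = [o for o in outcomes if o % 2 == 0]
p_even = Fraction(len(even), len(outcomes))

# Empirical estimate: relative frequency over many simulated rolls.
random.seed(3)
rolls = [random.choice(outcomes) for _ in range(10_000)]
freq_even = sum(1 for r in rolls if r % 2 == 0) / len(rolls)

print(f"exact: {p_even}, simulated: {freq_even:.3f}")
```

Both values fall between 0 and 1, and the simulated frequency approaches the exact probability as the number of rolls grows.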
Probability Distribution
• A probability distribution describes the possible outcomes of a random
experiment and how likely each one is. It is defined over the underlying
sample space, the set of all possible outcomes of the experiment. That
set could be a set of real numbers, a set of vectors, or a set of any
other entities. Probability distributions are a core topic in
probability and statistics.
• A probability distribution is a table or an equation that links each
possible value that a random variable can assume with its probability
of occurrence.
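The "table" form of a probability distribution can be illustrated with the sum of two fair dice, an example chosen here for illustration and not taken from the slides. The Python sketch below enumerates all 36 equally likely rolls and links each possible sum to its probability.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die1, die2) pairs give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Link every possible value of the random variable X (the sum) to its probability.
distribution = {
    total: Fraction(count, 36) for total, count in sorted(counts.items())
}

for total, prob in distribution.items():
    print(f"P(X = {total:2d}) = {prob}")

# A valid probability distribution sums to exactly 1.
total_probability = sum(distribution.values())
print(f"sum of probabilities = {total_probability}")
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 does not depend on floating-point rounding.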
Statistical Inference for development statistical model.pptx
Introduction to R
• R is a programming language created by statisticians for statistics,
specifically for working with data.
• It is used by business analysts, data analysts, data scientists, and
scientists.
• R is unusual in that it is not a general-purpose language: it is
specialized for statistical analysis and data visualization.
• Academics, scientists, and researchers use it to analyze the results of
experiments.
• Businesses of all sizes and in every industry use it to extract insights
from the increasing amount of daily data they generate.
