EXST 7005:
Statistical Techniques I
Information:
• Lectures: Tue., Thu. 13:30-14:50
• Lecture Hall: Audubon 0104
• Labs: Thu. 15:00-16:50, Fr. 12:30-14:20, 44 Woodin Hall
• Office: 53 Woodin Hall
• Office Hours: Tue. 3:00-4:00, Thu. 11:00-12:00 or by appointment (through ZOOM)
• Email: agentimis1@lsu.edu
• Moodle: Check it often!
• Book: “Statistical Methods” Freund, Wilson, Mohr, 3rd edition
Thematic Units Covered:
• Graphical and numerical Summary of Data
• Elements of Probability
• Random Variables
• Basic Distributions
• Confidence Intervals
• Inference and t-tests
• Two sample inference
• Proportions tests
• ANOVA
• Linear Regression
• Categorical Data Analysis (Maybe?)
Grades Breakdown:
Assignment        Points each   Total number   Number counted   Available points
Lab Assignments   5             12             8                5 x 8 = 40
Exams             20            2              2                2 x 20 = 40
Project           20            1              1                1 x 20 = 20
Total                                                           40 + 40 + 20 = 100
Letter Grade   Points          Letter Grade   Points
A+             x ≥ 99          C+             80 > x ≥ 78
A              99 > x ≥ 92     C              78 > x ≥ 72
A-             92 > x ≥ 90     C-             72 > x ≥ 70
B+             90 > x ≥ 88     D+             70 > x ≥ 68
B              88 > x ≥ 82     D              68 > x ≥ 62
B-             82 > x ≥ 80     F              62 > x
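The grading scale can be read as a simple lookup from total points to a letter. The sketch below is only an illustration: the name `letter_grade` is hypothetical, and it assumes that x is the total number of points out of 100.

```python
# Lower bounds taken from the grading table. A grade applies to scores at or
# above its bound and below the bound of the next higher grade.
GRADE_BOUNDS = [
    (99, "A+"), (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"),
]


def letter_grade(points: float) -> str:
    """Return the letter grade for a total score out of 100 (hypothetical helper)."""
    for lower_bound, letter in GRADE_BOUNDS:
        if points >= lower_bound:
            return letter
    return "F"


# Available points: 8 counted labs x 5, 2 exams x 20, 1 project x 20.
TOTAL_AVAILABLE = 8 * 5 + 2 * 20 + 1 * 20
```

Listing the bounds from highest to lowest means the first match is the correct grade, so no upper bounds need to be stored.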
What is Statistics? How is it used?
• Data Collection:
• Design Experiments
• Sampling
• Data Description:
• Summary Statistics
• Graphs
• Probability:
• Randomness
• Uncertainty
• Statistical Inference:
• Conclusions from Data
• Hypothesis testing
“The science of collecting, analyzing and interpreting data.”
What time do my colleagues come to work?
• Underlying Phenomenon: “My colleagues come to work around a certain time”
• Observations: “The common doorway is locked if they have not arrived or is unlocked if they have”
• Data: “A collection of times with the characterization U (unlocked) and L (locked)”
Generalities about data:
• A data set is composed of information from a set of units.
• Information from a unit is known as an observation.
• An observation consists of one or more pieces of information about the unit; these are called variables.
Types of variables:
• What is the time variable in this example? DISCRETE (as recorded here, to the nearest minute)!
• Definition 1.4: A discrete variable can assume only a countable number of values.
• Typically, discrete variables are frequencies of observations having specific characteristics, but not all discrete variables are frequencies.
• Definition 1.5: A continuous variable is one that can take any one of an uncountable number of values in an interval.
• Continuous variables are usually measured on a scale and, although they may appear discrete due to imprecise measurement, they can conceptually take any value in an interval and cannot therefore be enumerated.
Types of variables:
• Why should I care? Statistical programs such as R, SAS and Stata are very picky about the type of variables you are using! Lots of errors can occur if the variables are not defined properly!
• The status variable is also called a categorical variable.
• The two possibilities, Locked and Unlocked, are called names, levels or types.
• We say that the STATUS variable follows a nominal scale, the weakest type of scale to do statistics with.
• The time variable is called quantitative.
• The status variable is called qualitative.
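To see why the variable type matters in software, here is a minimal Python sketch. The names `Status`, `observations` and `parse_status` are hypothetical, and the three minute values are illustrative only. The status is declared as a categorical variable with two fixed levels, while the time is stored as a number.

```python
from collections import Counter
from enum import Enum
from statistics import mean


class Status(Enum):
    """Nominal (categorical) variable with two levels."""
    LOCKED = "L"
    UNLOCKED = "U"


# Each observation has two variables: a quantitative one (minutes)
# and a qualitative one (status). The values are illustrative.
observations = [(45, Status.LOCKED), (43, Status.LOCKED), (70, Status.UNLOCKED)]

# Quantitative variable: arithmetic such as the mean is meaningful.
mean_minutes = mean(minutes for minutes, _ in observations)

# Qualitative variable: counting levels is meaningful, averaging is not.
status_counts = Counter(status for _, status in observations)


def parse_status(code: str) -> Status:
    """Reject anything that is not a declared level (raises ValueError)."""
    return Status(code)
```

Declaring the levels up front means that a mistyped code such as "X" fails immediately instead of silently becoming a third category.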
Now what?
Time in   Status      Time in   Status
8:45      L           9:10      U
8:34      L           8:50      U
8:43      L           8:45      U
9:40      L           9:05      U
9:40      L           9:10      U
8:45      L           9:05      U
16:00     L           8:45      U
9:10      L           8:44      U
8:58      L           10:20     U
8:39      L           11:30     U
8:48      L           12:15     U
8:33      L           10:00     U
11:15     L           9:41      U
8:48      L           9:10      U
8:33      L           10:54     U
                      8:50      U
                      9:29      U
                      10:40     U
                      11:20     U
                      9:30      U
                      8:45      U
                      9:27      U
                      8:50      U
                      8:45      U
                      9:42      U
• To give an answer to our problem we need to employ some Numerical Descriptive Statistics.
• First, though, we need to “clean up” our data set.
“In projects, cleaning up datasets is 80% of the work. Statistics is 20%. Convincing people about them is 20%. Revising is 80%. You need to give 200% every time!”
Transformations:
Status   Time      Status   Time
L        45        U        70
L        43        U        50
L        100       U        45
L        195       U        65
L        48        U        70
L        33        U        65
L        34        U        45
L        45        U        44
L        480       U        140
L        70        U        210
L        58        U        255
L        39        U        120
L        100       U        101
L        48        U        70
L        33        U        174
                   U        50
                   U        89
                   U        160
                   U        200
                   U        90
                   U        45
                   U        87
                   U        50
                   U        45
                   U        102
• It is difficult to do descriptive statistics with “absolute time”.
• It is much easier to change to “minutes after …”.
• What we are doing is called “change of scale” (a useful technique), and it is a form of “transformation”!
• Make sure you read about the different scales on page 10.
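The change of scale can be sketched in a few lines of Python. The reference time of 8:00 is an assumption: the slide does not state it, but it is consistent with the data, where 8:45 becomes 45 and 16:00 becomes 480. The helper name `minutes_after` is hypothetical.

```python
def minutes_after(clock: str, reference: str = "8:00") -> int:
    """Convert an 'H:MM' clock time to minutes after a reference time.

    The default reference of 8:00 is an assumption inferred from the data.
    """
    def to_minutes(text: str) -> int:
        hours, minutes = text.split(":")
        return int(hours) * 60 + int(minutes)

    return to_minutes(clock) - to_minutes(reference)


# A few of the recorded times and their transformed values.
raw_times = ["8:45", "9:10", "16:00", "12:15"]
transformed = [minutes_after(t) for t in raw_times]
```

After the transformation every observation is a plain number, so means, medians and quartiles can be computed directly.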
Ideas??
Locked
Min   1st Q.   Median   Mean    3rd Q.   Max
33    41       48       91.4    85       480
• We want to find the behavior “on average”.
• We can use the mean or the median.
• We can draw information from the quartiles.
• We can draw information from the max and min.
Unlocked
Min   1st Q.   Median   Mean    3rd Q.   Max
44    50       70       97.68   120      255
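The two summary tables can be reproduced from the transformed data with Python's standard library. This is a sketch, not necessarily the tool used for the slides; the function name `six_number_summary` is hypothetical. The quartiles match the tables when the "inclusive" interpolation method is used, and other quartile conventions can give slightly different values.

```python
from statistics import mean, median, quantiles

# Transformed times (minutes) from the Transformations table.
locked = [45, 43, 100, 195, 48, 33, 34, 45, 480, 70, 58, 39, 100, 48, 33]
unlocked = [70, 50, 45, 65, 70, 65, 45, 44, 140, 210, 255, 120, 101, 70, 174,
            50, 89, 160, 200, 90, 45, 87, 50, 45, 102]


def six_number_summary(values):
    """Min, Q1, median, mean, Q3 and max, as in the summary tables."""
    # method="inclusive" interpolates between data points and treats the
    # data as including its own minimum and maximum.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values), "q1": q1, "median": median(values),
        "mean": round(mean(values), 2), "q3": q3, "max": max(values),
    }


locked_summary = six_number_summary(locked)
unlocked_summary = six_number_summary(unlocked)
```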
More Ideas?
• We can use a boxplot, which displays Q1, Q3 and the median.
Outliers!
Status   Time      Status   Time
L        45        U        70
L        43        U        50
L        48        U        45
L        34        U        65
L        45        U        70
L        70        U        65
L        58        U        45
L        39        U        44
L        48        U        101
                   U        70
                   U        50
                   U        89
                   U        90
                   U        45
                   U        87
                   U        50
                   U        45
                   U        102
• Due to errors
• Due to abnormal situations
• May skew our results
• Outlier analysis is a whole branch of analytics
“If you have no intelligent way to describe outliers, remove them and report that you are doing so.”
• Here we trimmed the tails: we removed the values at or above Q3 and, in the Locked group, the two lowest values (33).
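For comparison, a common convention for flagging outliers is Tukey's rule, which marks values more than 1.5 times the interquartile range beyond the quartiles. This is a different and more conservative rule than the trimming applied on this slide, and it is shown only as a sketch; the function name `tukey_outliers` is hypothetical.

```python
from statistics import quantiles

# Transformed times (minutes) before cleaning.
locked = [45, 43, 100, 195, 48, 33, 34, 45, 480, 70, 58, 39, 100, 48, 33]
unlocked = [70, 50, 45, 65, 70, 65, 45, 44, 140, 210, 255, 120, 101, 70, 174,
            50, 89, 160, 200, 90, 45, 87, 50, 45, 102]


def tukey_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR], sorted."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(v for v in values if v < low or v > high)


locked_flagged = tukey_outliers(locked)      # [195, 480]
unlocked_flagged = tukey_outliers(unlocked)  # [255]
```

Whichever rule is used, the advice quoted above still applies: report which observations were removed and why.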
Review new data:
Unlocked (clean)
Min   1st Q.   Median   Mean    3rd Q.   Max
44    46.25    65       65.72   82.75    102
• Notice now that the mean becomes more relevant, perhaps even more informative than the median.
• Page 37 has a nice review of outliers; make sure to check it.
Locked (clean)
Min   1st Q.   Median   Mean    3rd Q.   Max
34    43       45       47.78   48       70
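The effect of cleaning on the mean can be checked numerically. The sketch below uses the Locked group before and after cleaning and compares the gap between mean and median; the helper name `mean_median_gap` is hypothetical. A large gap is a hint that a few extreme values are pulling the mean.

```python
from statistics import mean, median

# Locked group in minutes, before and after removing outliers.
locked_raw = [45, 43, 100, 195, 48, 33, 34, 45, 480, 70, 58, 39, 100, 48, 33]
locked_clean = [45, 43, 48, 34, 45, 70, 58, 39, 48]


def mean_median_gap(values):
    """Absolute difference between the mean and the median."""
    return abs(mean(values) - median(values))


gap_raw = mean_median_gap(locked_raw)      # mean 91.4 vs median 48
gap_clean = mean_median_gap(locked_clean)  # mean about 47.78 vs median 45
```

The median barely moves (48 to 45) while the mean drops by almost half, which is why the mean only becomes a useful summary once the extreme values are handled.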
New Box Plot:
• We can “infer” that people come between 47.78 and 65.72 minutes, that is, some time between roughly 8:48 and 9:06.
• Or, if we use the medians, between 8:45 and 9:05.
“This corroborates the anecdotal evidence that most professors and students come to work at 9:00 am.”
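Converting the results back to clock time reverses the earlier change of scale. The sketch below assumes the reference time is 8:00, which is inferred from the data rather than stated on the slides; `minutes_to_clock` is a hypothetical helper. Note that rounding 47.78 to the nearest minute gives 8:48, while dropping the seconds gives 8:47.

```python
def minutes_to_clock(minutes_after_ref: float, reference_hour: int = 8) -> str:
    """Convert minutes after reference_hour:00 to 'H:MM', rounded to the nearest minute.

    The default reference hour of 8 is an assumption inferred from the data.
    """
    total = reference_hour * 60 + round(minutes_after_ref)
    return f"{total // 60}:{total % 60:02d}"


# Arrival window based on the cleaned means and on the cleaned medians.
mean_window = (minutes_to_clock(47.78), minutes_to_clock(65.72))
median_window = (minutes_to_clock(45), minutes_to_clock(65))
```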
Improve the result:
• Get More Data
• Analyze variances
• Do a survey
• Install cameras
Comment
• The idea of statistics is not “natural” for the human brain
• It was developed WAY after Calculus, Algebra and Trigonometry
If math is the language of science, statistics is the grammar!

1_1_First Class_F2023.pptx statistics course
