1. Descriptive Statistics – Summarizing and Visualizing Data
Once data has been collected, the first step is to describe it. Descriptive statistics provide the tools to summarize and organize
data so we can see patterns and trends. The goal here isn’t to draw conclusions about a population (that’s inferential statistics),
but to get a solid grasp of the sample at hand.
There are two major forms of descriptive statistics: measures of central tendency and measures of variability (dispersion).
Measures of Central Tendency tell us where the center of a dataset lies:
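•Mean (average): Add up all values and divide by the number of values. It’s sensitive to outliers: a single extreme value can pull it noticeably.
•Median: The middle value when the data is ordered (with an even number of values, the average of the two middle ones). It’s robust against outliers.
•Mode: The most frequently occurring value(s). It is the only one of the three that applies to categorical data with no natural order.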
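As a minimal sketch, the three measures above can be computed with Python's standard statistics module. The sample values and colour labels are made up for illustration; the single large value (12) is there to show how the mean reacts to an outlier while the median does not.

```python
import statistics

# Hypothetical sample with one extreme value (12).
values = [2, 3, 3, 4, 5, 3, 12]

mean = statistics.mean(values)        # sum of the values / number of values
median = statistics.median(values)    # middle value of the sorted data
modes = statistics.multimode(values)  # list of the most frequent value(s)

# The mode also works for categorical data (hypothetical labels).
colour_modes = statistics.multimode(["red", "blue", "red", "green"])

print(f"mean={mean:.2f} median={median} modes={modes} colour_modes={colour_modes}")
```

In this sample the outlier pulls the mean above the median, which is the sensitivity described in the first bullet.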
Measures of Variability show us how spread out the data is:
•Range: Difference between the highest and lowest values.
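•Variance: Average of squared deviations from the mean (dividing by n for a whole population, or by n − 1 for a sample). Larger values mean the data points are more spread out.
•Standard Deviation: The square root of the variance. It is in the same units as the data and gives a sense of the typical distance from the mean.
•Interquartile Range (IQR): The range of the middle 50% of data, between Q1 (25th percentile) and Q3 (75th percentile). Because it ignores
the extremes, it is less affected by outliers than the range.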
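The measures of variability above can be sketched the same way. The sample is hypothetical, and the quartiles use the default method of statistics.quantiles; other tools follow different quartile conventions and may report slightly different Q1 and Q3 for the same data.

```python
import statistics

# Hypothetical sample with one extreme value (12).
values = [2, 3, 3, 4, 5, 3, 12]

data_range = max(values) - min(values)    # highest minus lowest value

pop_var = statistics.pvariance(values)    # squared deviations divided by n
sample_var = statistics.variance(values)  # squared deviations divided by n - 1
sample_sd = statistics.stdev(values)      # square root of the sample variance

# Quartiles: the three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                             # spread of the middle 50% of the data

print(f"range={data_range} variance={sample_var:.2f} sd={sample_sd:.2f} iqr={iqr}")
```

Removing the value 12 would shrink the range considerably while leaving the IQR much closer to what it was, which is why the IQR is preferred when outliers are present.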
Data visualization is a powerful companion to numeric descriptions. Common tools include:
•Histograms (for frequency distribution of numerical data),
•Box plots (to visualize median, quartiles, and outliers),
•Bar charts (for categorical variables),
•Scatter plots (to show relationships between two quantitative variables).
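To make the box plot concrete, the sketch below computes the numbers one draws: the quartiles, the whisker ends, and the points flagged as outliers. It assumes the common 1.5 × IQR fence convention, which is not stated above, and the function name box_plot_summary is a hypothetical helper, not part of any library.

```python
import statistics

def box_plot_summary(values):
    """Return the quantities a box plot displays (assumes 1.5 * IQR fences)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1

    # Points beyond the fences are drawn individually as outliers.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value within the fences
        "whisker_high": inside[-1],  # largest value within the fences
        "outliers": outliers,
    }

# Hypothetical sample with one extreme value (12).
summary = box_plot_summary([2, 3, 3, 4, 5, 3, 12])
print(summary)
```

Plotting libraries perform an equivalent calculation internally, though their quartile method and fence rule may differ from the ones assumed here.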
Another useful concept is distribution shape. Data can be symmetrical, skewed, uniform, or bimodal. A classic example is the
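normal distribution (bell curve), which underpins much of inferential statistics. Whether your data is approximately normal
affects later decisions, such as whether the mean and standard deviation or the median and IQR summarize it better, and whether
methods that assume normality are appropriate.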
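One rough numeric check on shape is skewness. The sketch below uses the moment-based coefficient (third central moment divided by the second central moment raised to the power 1.5); this is one common definition among several, and sample_skewness is a hypothetical helper name. Values near zero suggest symmetry, positive values a longer right tail, and negative values a longer left tail.

```python
import statistics

def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using divide-by-n moments."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    # Note: m2 is zero when all values are identical, so skewness is undefined there.
    return m3 / m2 ** 1.5

# Hypothetical samples for illustration.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [2, 3, 3, 3, 4, 5, 12]

print(sample_skewness(symmetric))     # zero for perfectly symmetric data
print(sample_skewness(right_skewed))  # positive: long tail to the right
```

A skewness value alone does not establish normality; it is best read alongside a histogram of the same data.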