Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
Independent Sample T-test
Basic Terminologies
• Sample data is a subset of the population data that is used to represent the entire group as a whole
• For instance, to find the average value of all cars in the United States, it is impractical to assess the value of every car in the country, add these numbers and divide by the total number of cars
• Instead, we can randomly select some of the cars, say 200, record the value of each of them, and take the average of these 200 numbers
• These 200 values of randomly selected cars are called sample data, drawn from the values of all cars in the United States (the population data)
• There are various sampling techniques, such as simple random sampling, stratified sampling and systematic sampling, which are explained in the annexure section
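The car example can be sketched in a few lines of Python. This is only an illustration: the population below is made-up data (uniformly distributed values between 5,000 and 50,000), and the seed and variable names are assumptions, not part of the original slides.

```python
import random
import statistics

# Hypothetical population: 100,000 made-up car values (not real data).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.uniform(5_000, 50_000) for _ in range(100_000)]

# Simple random sample of 200 cars, drawn without replacement.
sample = rng.sample(population, 200)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"Population mean:     {population_mean:,.0f}")
print(f"Sample mean (n=200): {sample_mean:,.0f}")
```

The two printed means will usually be close but not identical, which is why a statistical test is needed before calling a difference between two samples significant.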
Basic Terminologies
• Null hypothesis: in an independent sample t-test, the general statement that there is no statistically significant difference between the two samples
• Alternative hypothesis: in an independent sample t-test, the statement that there is a statistically significant difference between the two samples
• For instance, an online store marketing manager decides to test the hypothesis that females have a significantly higher tendency to shop online than males
• In this case, the null and alternative hypotheses would be:
  • Null hypothesis : There is no significant difference between males and females in terms of tendency to shop online
  • Alternative hypothesis : There is a statistically significant difference between males and females in terms of tendency to shop online
• P-value : the probability of observing a difference at least as large as the one seen in the samples if the null hypothesis were true; in an independent sample t-test, a small p-value indicates a statistically significant difference between the two samples
• Depending on the confidence level desired, the p-value is checked against a different threshold and the inference is made accordingly
  • For instance, for a confidence level of 95% (5% chance of error), check the p-value against the threshold of 0.05
    • If p-value < 0.05, the difference is significant; otherwise it is insignificant
  • Similarly, for a confidence level of 98% (2% chance of error), check the p-value against the threshold of 0.02
    • If p-value < 0.02, the difference is significant; otherwise it is insignificant, and so on
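The threshold rule can be written as a small helper. This is a minimal sketch; the function name `is_significant` and the p-value of 0.03 are illustrative assumptions, not something the slides define.

```python
def is_significant(p_value: float, confidence_level: float) -> bool:
    """Return True when p_value is below the threshold 1 - confidence_level."""
    # Rounding avoids floating-point noise such as 1 - 0.95 = 0.050000000000000044.
    threshold = round(1.0 - confidence_level, 10)
    return p_value < threshold


# Hypothetical p-value checked at the two confidence levels discussed above.
for level in (0.95, 0.98):
    verdict = "significant" if is_significant(0.03, level) else "insignificant"
    print(f"confidence {level:.0%}: difference is {verdict}")
```

The same p-value can therefore lead to different conclusions depending on the confidence level chosen before the test is run.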
Introduction
• The independent sample t-test is a statistical test that determines whether there is a statistically significant difference between the means of two independent samples
• For instance, checking whether the average value of sedan cars is significantly different from that of SUVs
• Here the hypotheses would be set as follows :
  • Null hypothesis : There is no significant difference in value between the SUV and sedan car types
  • Alternative hypothesis : The values of SUVs and sedans differ significantly
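In symbols, the two hypotheses and the test statistic can be written as below. The slides do not say which variant of the test is used, so this sketch assumes the unequal-variance (Welch) form; the symbols $\mu$, $\bar{x}$, $s^2$ and $n$ (population mean, sample mean, sample variance and sample size of each group) are notation introduced here for illustration.

```latex
H_0 : \mu_1 = \mu_2
\qquad
H_1 : \mu_1 \neq \mu_2

t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```

The p-value is then the two-sided tail probability of $|t|$ under a Student's t distribution with $\nu$ degrees of freedom.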
Example : Input
Let’s conduct the independent sample t-test on the following two variables: a dimension containing two values (the two independent groups) and a measure (the dependent variable) :
Group    Value
A        90
A        95
A        80
B        78
B        75
B        70
B        65
Example : Output
Group “A” Mean Value    88.33
Group “B” Mean Value    72.0
Mean Difference         16.33
P-value                 0.041
• At 95% confidence level (5% chance of error) :
  • As the p-value of 0.041 is less than 0.05, there is a statistically significant difference between the means of groups A and B
  • The mean of Group A is significantly higher than that of Group B
• At 98% confidence level (2% chance of error) :
  • As the p-value of 0.041 is greater than 0.02, there is no statistically significant difference between the means of groups A and B
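The example can be reproduced with the standard library alone. This is a sketch under stated assumptions: it uses the unequal-variance (Welch) variant of the test, which the slides do not name, and it obtains the p-value by numerically integrating the t density; the helper names (`t_pdf`, `two_sided_p`, `welch_t_test`) are made up for this illustration.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (integral of the density from 0 to |t|)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # Simpson's rule; steps must be even
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def welch_t_test(a, b):
    """Return (mean_a, mean_b, t, df, p) for two independent samples."""
    a, b = [float(v) for v in a], [float(v) for v in b]
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Squared standard error contributed by each group.
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return mean_a, mean_b, t, df, two_sided_p(t, df)


group_a = [90, 95, 80]
group_b = [78, 75, 70, 65]
mean_a, mean_b, t_stat, df, p_value = welch_t_test(group_a, group_b)

print(f"Group A mean:    {mean_a:.2f}")
print(f"Group B mean:    {mean_b:.2f}")
print(f"Mean difference: {mean_a - mean_b:.2f}")
print(f"t = {t_stat:.3f}, df = {df:.2f}, p-value = {p_value:.3f}")
```

Under these assumptions the p-value falls between the 0.02 and 0.05 thresholds, so the difference is significant at the 95% confidence level but not at 98%. A statistics library would normally replace the hand-written integration.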
Standard input parameters & sample UI
Sample output 1 : Interpretation
Sample output 2 : Model Summary
Sample output 3 : OUTLIERS
• Outliers are data values that differ greatly from the majority of the values in a data set.
Limitations
• Can be applied to only two samples at a time (one dimension with two values and one measure)
• Observations within each group must be independent
• The values in each group must be normally distributed
• The number of data points should be at least 30
General applications
• Medicine
  • Has the quality of life improved for patients who took drug A as opposed to patients who took drug B?
• Sociology
  • Are men more satisfied with their jobs than women? Do they earn more?
• Biology
  • Are foxes in one specific habitat larger than in another?
• Economics
  • Is the economic growth of developing nations larger than the economic growth of the first world?
• Marketing
  • Does customer segment A spend more on groceries than customer segment B?
Use case 1
Business problem :
• An HR manager wants to find out whether male employees earn more than female employees.
• Here the dependent variable would be ‘Total Annual Income’.
Business benefit:
• Once the test is completed, a p-value is generated that indicates whether there is a statistically significant difference between the incomes of the two groups.
• Based on this value, a manager can easily conclude whether the average income earned by female employees is statistically different from that of male employees and, if the difference is statistically significant, which gender earns more.
Use case 1 : Input Dataset
Gender Income
Male 21000
Male 15000
Male 25600
Male 23000
Female 19750
Female 25000
Female 21250
Female 14400
Female 10000
Use case 1 : Output
                              Value
“Male” Mean Income Value      21150.0
“Female” Mean Income Value    18080.0
Mean Difference               3070.0
P-value                       0.406
The p-value of 0.406 (> 0.05) indicates that there is no significant difference between the incomes of males and females.
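The two mean rows in the output are per-group averages of the input dataset. A minimal sketch of that step, using the rows listed in the input dataset (the variable names are illustrative):

```python
from collections import defaultdict
from statistics import fmean

# Rows copied from the use case 1 input dataset: (Gender, Income).
rows = [
    ("Male", 21000), ("Male", 15000), ("Male", 25600), ("Male", 23000),
    ("Female", 19750), ("Female", 25000), ("Female", 21250),
    ("Female", 14400), ("Female", 10000),
]

# Collect the incomes of each group separately before averaging.
incomes = defaultdict(list)
for gender, income in rows:
    incomes[gender].append(income)

means = {gender: fmean(values) for gender, values in incomes.items()}
difference = means["Male"] - means["Female"]

for gender, mean in means.items():
    print(f"{gender} mean income: {mean}")
print(f"Mean difference: {difference}")
```

Averaging all nine rows together instead of per group would give a single overall mean, which is not what the test compares.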
Use case 2
Business problem :
• A grocery store sales manager wants to know whether customer segment A spends more on groceries than customer segment B.
• Here the dependent variable would be ‘Purchase Amount’.
Business benefit:
• Once the test is completed, a p-value is generated that indicates whether there is a statistically significant difference between the purchase amounts of the two segments.
• Based on this value, the grocery store manager can decide on marketing strategies for better sales and increased revenue.
Use case 3
Business problem :
• Suppose a medical researcher decides to investigate whether exercise or diet control is more effective in lowering cholesterol levels. There are two groups : a calorie-controlled diet group and an exercise-training group.
• Here the dependent variable would be ‘Cholesterol concentration’.
Business benefit:
• Once the test is completed, a p-value is generated that indicates whether there is a statistically significant difference between the cholesterol concentrations of the two groups.
• Based on this value, the researcher can conclude whether exercise was more effective than diet control in lowering cholesterol levels and suggest better treatment to patients.
Sampling Methods
• There are three main types of sampling :
• Simple random sampling:
  • Here, the selection is based purely on chance and every item has an equal chance of being selected
  • A lottery system is an example of simple random sampling
• Stratified sampling:
  • Here, the population data is divided into subgroups known as strata
  • The members of each subgroup have similar attributes and characteristics in terms of demographics, income, location etc.
  • A random sample is taken from each of these subgroups in proportion to the subgroup's size relative to the population size
  • These subgroup samples are then combined to form the final stratified random sample
  • This method achieves higher statistical precision because variability within each subgroup is low, and it requires a smaller sample size than simple random sampling
  • Government policymakers generally use stratified random sampling to come up with better targeted solutions
• Systematic sampling:
  • Here, the researcher first decides the sample size and then the sampling interval, which is the fixed distance between consecutive sampled elements
  • Divide the total population size by the sample size to get this interval
  • For instance, say you want to create a systematic random sample of 1,000 people from a population of 10,000. The interval is 10,000 / 1,000 = 10.
  • Using a list of the total population, number each person from 1 to 10,000.
  • Then, randomly choose a number within the first interval, like 4, as the number to start with. This means that the person numbered "4" would be your first selection, and then every tenth person from then on would be included in your sample.
  • Your sample, then, would be composed of persons numbered 4, 14, 24, 34, 44, 54, and so on down the line until you reach the person numbered 9,994
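Two of the sampling methods above lend themselves to a short sketch: picking positions for a systematic sample and working out how many items each stratum contributes to a stratified sample. The function names and the strata (`urban`, `suburban`, `rural`) are hypothetical, chosen only for illustration.

```python
def systematic_sample(population_size, sample_size, start):
    """Return the selected 1-based positions: start, start + k, start + 2k, ..."""
    interval = population_size // sample_size  # 10,000 // 1,000 = 10
    if not 1 <= start <= interval:
        raise ValueError("start must lie within the first interval")
    return list(range(start, population_size + 1, interval))[:sample_size]


def proportional_allocation(strata_sizes, sample_size):
    """Sample count per stratum, proportional to each stratum's share.

    Rounding can make the counts sum to slightly more or less than
    sample_size when the shares do not divide evenly.
    """
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}


# Systematic sample of 1,000 from 10,000, starting at person 4.
selected = systematic_sample(10_000, 1_000, start=4)
print(selected[:5], "...", selected[-1])  # [4, 14, 24, 34, 44] ... 9994

# Hypothetical strata for a stratified sample of 100 people.
strata = {"urban": 6_000, "suburban": 3_000, "rural": 1_000}
print(proportional_allocation(strata, 100))
```

Once the per-stratum counts are known, a simple random sample of that size is drawn inside each stratum and the pieces are combined.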
Want to Learn More?
Get in touch with us at support@Smarten.com, and do check out the Learning section on Smarten.com
June 2018

What is the Independent Samples T Test Method of Analysis and How Can it Benefit an Organization?
