What is the Independent Samples T Test Method of Analysis and How Can it Benefit an Organization?

Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics

Basic Terminologies
• Sample data is a subset of the population data that is used to represent the entire group as a whole
• For instance, to find the average value of all cars in the United States, it is impractical to
assess the value of every car in the country, add these numbers and divide by the total
number of cars
• Instead, we can randomly select some of the cars, say 200, obtain the value of each of these 200
cars and find the average of these 200 numbers
• These 200 values of randomly selected cars are called sample data of the values of all cars in
the United States (population data)
• There are various sampling techniques, such as simple random sampling, stratified sampling
and systematic sampling, which are explained in the annexure section
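
The car example can be sketched in a few lines of Python. The population below is made up purely for illustration (50,000 hypothetical car values between 5,000 and 60,000); only the idea of averaging a random sample of 200 comes from the text.

```python
import random
import statistics

# Hypothetical population: 50,000 made-up car values (illustrative only).
rng = random.Random(42)
population = [rng.uniform(5_000, 60_000) for _ in range(50_000)]

# Simple random sample of 200 cars, drawn without replacement.
sample = rng.sample(population, 200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean     : {population_mean:,.0f}")
print(f"Sample mean (n=200) : {sample_mean:,.0f}")
```

The sample mean will usually land close to the population mean, which is why a well-drawn sample can stand in for the whole population.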

• Null hypothesis in case of the independent samples t-test is a general statement that there is no
statistically significant difference between the means of the two samples
• Alternative hypothesis in case of the independent samples t-test is the one that states that there is a
statistically significant difference between the means of the two samples
• For instance, an online store marketing manager decides to test the hypothesis that females
have a significantly higher tendency to shop online than males
• In this case, the null and alternative hypotheses would be as follows:
• Null hypothesis : There is no significant difference between males and females in terms
of tendency to shop online
• Alternative hypothesis : There is a statistically significant difference between males and
females in terms of tendency to shop online

• P-value : In case of the independent samples t-test, it indicates whether there is a
statistically significant difference between the means of the two samples
• For different confidence levels, the p-value is checked against different
thresholds and the inference is made accordingly
• For instance, for a confidence level of 95% (chance of error = 5%), the p-value is
checked against the threshold of 0.05
• If p-value < 0.05, the difference is significant; otherwise the difference is
insignificant
• Similarly, for a confidence level of 98% (chance of error = 2%), the p-value is checked
against the threshold of 0.02
• If p-value < 0.02, the difference is significant; otherwise the difference is
insignificant, and so on
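
The threshold rule above can be written as a small helper. This is a minimal sketch: the function name `is_significant` and the p-value of 0.041 are illustrative assumptions, not part of any particular tool.

```python
def is_significant(p_value: float, confidence_level: float) -> bool:
    """Return True when p_value is below the threshold (1 - confidence level)."""
    threshold = 1.0 - confidence_level
    return p_value < threshold


p_value = 0.041  # illustrative p-value

for confidence_level in (0.95, 0.98):
    verdict = "significant" if is_significant(p_value, confidence_level) else "insignificant"
    print(f"Confidence {confidence_level:.0%}: threshold {1 - confidence_level:.2f} -> {verdict}")
```

The same p-value can therefore be significant at one confidence level and insignificant at a stricter one.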

Introduction
• The independent samples t-test is a statistical test that determines
whether there is a statistically significant difference between the
means of two independent samples
• For instance, checking whether the average value of sedans is significantly
different from the average value of SUVs
• Here the hypotheses would be set as follows :
• Null hypothesis : There is no significant difference between the average values of SUVs and sedans
• Alternative hypothesis : The average values of SUVs and sedans differ significantly

Example : Input
Let’s conduct the independent samples t-test on the following two variables: one is a dimension
containing two values (the two independent groups) and the other is a measure (the dependent variable) :
Group Value
A 90
A 95
A 80
B 78
B 75
B 70
B 65

Example : Output
Group “A” Mean Value 88.33
Group “B” Mean Value 72.0
Mean Difference 16.33
P-value 0.041
• At 95% confidence level (5% chance of error) :
• As the p-value of 0.041 is less than 0.05, there is a statistically significant
difference between the means of the two groups A and B
• Mean of Group A is significantly higher than that of Group B
• At 98% confidence level (2% chance of error) :
• As the p-value of 0.041 is greater than 0.02, there is no statistically
significant difference between the means of the two groups A and B
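
As a rough illustration of where such a p-value comes from, the sketch below runs an unequal-variance (Welch) t-test on the seven rows listed in the input. The deck does not state which variant of the test the tool uses, so the choice of Welch's test, the helper names (`t_pdf`, `two_sided_p_value`, `welch_t_test`) and the numerical integration are all assumptions made for illustration. In practice a library routine such as `scipy.stats.ttest_ind(a, b, equal_var=False)` would normally be used instead.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # the density is symmetric around zero
    return 1.0 - central_area


def welch_t_test(a, b):
    """Return (mean_a, mean_b, t statistic, degrees of freedom, p-value)."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a = statistics.variance(a) / len(a)  # squared standard error of mean_a
    var_b = statistics.variance(b) / len(b)  # squared standard error of mean_b
    t_stat = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1))
    return mean_a, mean_b, t_stat, df, two_sided_p_value(t_stat, df)


group_a = [90, 95, 80]
group_b = [78, 75, 70, 65]

mean_a, mean_b, t_stat, df, p_value = welch_t_test(group_a, group_b)
print(f"Mean A = {mean_a:.2f}, Mean B = {mean_b:.2f}, difference = {mean_a - mean_b:.2f}")
print(f"t = {t_stat:.3f}, df = {df:.2f}, p-value = {p_value:.3f}")
```

On these rows the p-value falls between 0.02 and 0.05, which is consistent with the interpretation above at both confidence levels.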

Standard input parameters & sample UI

Sample output 1 : Interpretation

Sample output 2 : Model Summary

Sample output 3 : OUTLIERS
Outliers : They are the data values that differ greatly from the majority of a set of data.

Limitations
• Can be applied on only two samples (one dimension with two values
and one measure at a time)
• Observations within each group must be independent
• The values in each group must be normally distributed
• Number of data points should be at least 30

General applications
• Medicine
• Has the quality of life improved for patients who took drug A as opposed to patients
who took drug B?
• Sociology
• Are men more satisfied with their jobs than women? Do they earn more?
• Biology
• Are foxes in one specific habitat larger than in another?
• Economics
• Is the economic growth of developing nations larger than the economic growth of
the first world?
• Marketing
• Does customer segment A spend more on groceries than customer segment B?

Use case 1
Business problem :
• An HR manager wants to find out whether male employees earn more than female employees.
• Here the dependent variable would be ‘Total Annual Income’.
Business benefit:
• Once the test is completed, a p-value is generated which indicates whether there is a
statistically significant difference between the incomes of the two groups.
• Based on this value, a manager can easily conclude whether the average income earned by
female employees is statistically different from that of male employees and, if the difference
is statistically significant, which gender earns more.

Use case 1 : Input Dataset
Gender Income
Male 21000
Male 15000
Male 25600
Male 23000
Female 19750
Female 25000
Female 21250
Female 14400
Female 10000

Use case 1 : Output
Value
“Male” Mean Income Value 21150.0
“Female” Mean Income Value 18080.0
Mean Difference 3070.0
P-value 0.406
P-value : 0.406 (> 0.05) indicates that there is no significant difference
between income of males and females.
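
The group means and their difference can be checked directly from the nine rows of the input dataset. The sketch below is only an arithmetic check with illustrative variable names; it does not compute the p-value.

```python
from collections import defaultdict
import statistics

# Rows copied from the use case 1 input dataset: (Gender, Income)
rows = [
    ("Male", 21000), ("Male", 15000), ("Male", 25600), ("Male", 23000),
    ("Female", 19750), ("Female", 25000), ("Female", 21250),
    ("Female", 14400), ("Female", 10000),
]

# Group the incomes by gender.
incomes = defaultdict(list)
for gender, income in rows:
    incomes[gender].append(income)

# Mean income per group and the difference between the two means.
means = {gender: statistics.mean(values) for gender, values in incomes.items()}
mean_difference = means["Male"] - means["Female"]

for gender, mean_income in means.items():
    print(f"{gender} mean income: {mean_income:.2f}")
print(f"Mean difference: {mean_difference:.2f}")
```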

Use case 2
Business problem :
• A grocery store sales manager wants to know whether customer segment A spends more on
groceries than customer segment B.
• Here the dependent variable would be ‘Purchase Amount’.
Business benefit:
• Once the test is completed, a p-value is generated which indicates whether there is a
statistically significant difference between the purchase amounts of the two segments.
• Based on this value, the grocery store manager can decide on marketing strategies for
better sales and increased revenue.

Use case 3
Business problem :
• Suppose a medical researcher decides to investigate whether exercise or diet control is more
effective in lowering cholesterol levels. There are two groups : a calorie-controlled diet group
and an exercise-training group.
• Here the dependent variable would be ‘Cholesterol concentration’.
Business benefit:
• Once the test is completed, a p-value is generated which indicates whether there is a
statistically significant difference between the cholesterol concentrations of the two groups.
• Based on this value, the researcher can conclude whether exercise was more effective than
diet control in lowering cholesterol levels and suggest a better treatment to patients.

Sampling Methods
• There are three main types of sampling :
• Simple random sampling:
• Here, the selection is based purely on chance and every item has an equal chance of being
selected
• A lottery system is an example of simple random sampling
• Stratified sampling:
• Here, the population data is divided into subgroups known as strata
• The members of each subgroup have similar attributes and characteristics in
terms of demographics, income, location etc.
• A random sample is taken from each of these subgroups in proportion to the subgroup size
relative to the population size
• These subsamples are then combined to form the final stratified random sample
• Higher statistical precision is achieved through this method because of the low variability within
each subgroup; a smaller sample size is also required for this method when compared to
simple random sampling

• Government policymakers generally use the stratified random sampling method to
come up with better targeted solutions
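
Proportional stratified sampling can be sketched as follows. The strata names, their sizes and the helper `stratified_sample` are hypothetical and chosen only to illustrate the idea of sampling each subgroup in proportion to its share of the population.

```python
import random


def stratified_sample(strata, sample_size, seed=0):
    """Draw a random sample from each stratum, proportional to the stratum size."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Share of the total sample allotted to this stratum (rounded)
        k = round(sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population of 10,000 people split into three strata
strata = {
    "urban": list(range(0, 6000)),
    "suburban": list(range(6000, 9000)),
    "rural": list(range(9000, 10000)),
}

sample = stratified_sample(strata, sample_size=100)
for name, members in sample.items():
    print(f"{name}: {len(members)} sampled")
```

With other stratum sizes the rounded allocations may not add up exactly to the requested sample size, so a real implementation would need a rule for distributing the remainder.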
• Systematic sampling:
• Here, the researcher has to decide the sample size first and then the sampling
interval, that is, the standard distance between each sampled element
• Divide the total population size by the sample size to come up with this interval
• For instance, say you want to create a systematic sample of 1,000 people
from a population of 10,000; the interval is then 10,000 / 1,000 = 10
• Using a list of the total population, number each person from 1 to 10,000
• Then, randomly choose a number, like 4, as the number to start with. This means that the
person numbered "4" would be your first selection, and then every tenth person from then
on would be included in your sample
• Your sample, then, would be composed of persons numbered 4, 14, 24, 34, 44, 54, and so on
down the line until you reach the person numbered 9,994
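
The systematic sampling example above can be reproduced with list slicing. The starting number is fixed at 4 here to match the text; in practice it would be picked at random between 1 and the interval.

```python
population = list(range(1, 10_001))  # persons numbered 1 to 10,000
sample_size = 1_000

interval = len(population) // sample_size  # 10,000 / 1,000 = 10
start = 4  # fixed to match the example; normally chosen at random in 1..interval

# Take the person numbered `start`, then every `interval`-th person after that.
sample = population[start - 1::interval]

print(f"Interval: {interval}")
print(f"First selections: {sample[:6]}")
print(f"Last selection: {sample[-1]}, sample size: {len(sample)}")
```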

Want to Learn More?
Get in touch with us @ support@Smarten.com
And do check out the Learning section on Smarten.com
June 2018
