Using Google Sheets statistics functions

a111chen @2023
Using Google Sheets For Statistics

Objective
• Practice using Google Sheets functions, especially LET and ARRAYFORMULA
• The LET function
• It names each variable in the formula; a variable can be an array or a range
• It makes different statistical formulas easier to compare.
• In future, just copy the formula, change the variables, and it is done.
• The ARRAYFORMULA function
• In my opinion, mistakes in a formula are easier to spot with Google Sheets ARRAYFORMULA than in Microsoft Excel

Table of Contents
• Construct Frequency Table with Normal Distribution
• Normality testing
• Normal Approximation to the Binomial Distribution
• Sample Size
• Hypothesis Testing
• One Sample
• Two Sample
• Correlation between Two Sample
• K-Independent Sample

Construct Frequency Table
• Lowest
• =MIN(Dataset)
• Highest
• =MAX(Dataset)
• Total Class No
• =ROUNDUP(LN(COUNT(Dataset))/LN(2),0)
• Class Width
• =ROUNDUP((Highest-Lowest)/TotalNo,0)
• Frequency
• =COUNTIFS(Dataset,">="&Lower,Dataset,"<"&Upper)
https://www.youtube.com/watch?v=YfVu7xGHgnA
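The frequency-table formulas above can be cross-checked outside the spreadsheet. The following is a minimal Python sketch, assuming that classes start at the lowest value and that each class is closed on the left and open on the right, as in the COUNTIFS formula. The function name `frequency_table` and the sample scores are illustrative, not from the slides.

```python
import math

def frequency_table(dataset):
    """Build (lower, upper, count) rows the way the sheet formulas do."""
    lowest = min(dataset)                                  # =MIN(Dataset)
    highest = max(dataset)                                 # =MAX(Dataset)
    # =ROUNDUP(LN(COUNT(Dataset))/LN(2),0), written here with log2
    n_classes = math.ceil(math.log2(len(dataset)))
    # =ROUNDUP((Highest-Lowest)/TotalNo,0)
    width = math.ceil((highest - lowest) / n_classes)
    rows = []
    for i in range(n_classes):
        lower = lowest + i * width
        upper = lower + width
        # =COUNTIFS(Dataset,">="&Lower,Dataset,"<"&Upper)
        count = sum(1 for x in dataset if lower <= x < upper)
        rows.append((lower, upper, count))
    return rows

# Hypothetical scores, used only to exercise the function.
scores = [70, 72, 75, 77, 79, 81, 83, 84, 85, 86,
          87, 88, 89, 90, 91, 92, 93, 95, 97, 99]
for lower, upper, count in frequency_table(scores):
    print(f"{lower} - {upper}: {count}")
```

One thing to watch: because the upper bound is exclusive, the highest value is left out whenever (Highest-Lowest)/TotalNo is already a whole number. In that case one more class is needed.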

Frequency Graph
[Figure: frequency chart. X-axis: Class Boundaries (70 - 75, 75 - 80, 80 - 85, 85 - 90, 90 - 95, 95 - 100). Y-axis: Frequency (0 to 25). Marked values: Min, Max, Mean, Median, Q1, Q3.]
https://www.youtube.com/watch?v=39lsUsJsc2c

What graph can show the Normality?
[Figure: Frequency in % with Normal Distribution. X-axis: Class Boundaries (70 - 75 to 95 - 100). Series: Count in %, with data labels 8.3%, 5.0%, 20.0%, 21.7%, 38.3%, 6.7%. Skew: -0.75, Kurt: 0.06.]
[Figure: QQ Plot. X-axis: Normal theoretical quantile (Z-score). Y-axis: Data quantiles (Z-score). Both axes run from -3 to 3.]
https://www.youtube.com/watch?v=g5DTW2IQwxk

Construct Normal Distribution
• Normal Distribution
• =NORM.DIST(Midpoint,Mean,SD,FALSE)*ClassWidth
• Normal theoretical quantile (Z-score)
• =NORM.S.INV((RANK.AVG(x,Dataset,1)-0.5)/n)
• Data quantiles (Z-score)
• =STANDARDIZE(x,Mean,SD)
• Normality test
• Skew
• =SKEW(Dataset)
• Kurt
• =KURT(Dataset)
• Positive skewness: the tail extends toward more positive values.
• Negative skewness: the tail extends toward more negative values.
• Positive kurtosis indicates a relatively peaked distribution.
• Negative kurtosis indicates a relatively flat distribution.
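The two QQ-plot columns can be reproduced with Python's standard library. This is a sketch under stated assumptions: `rank_avg` is a hypothetical helper that imitates RANK.AVG in ascending order (ties get the average rank), and `statistics.stdev` is used as the counterpart of STDEV.S.

```python
from statistics import NormalDist, mean, stdev

def rank_avg(x, data):
    """Ascending rank of x in data; ties share the average rank."""
    smaller = sum(1 for v in data if v < x)
    equal = sum(1 for v in data if v == x)
    return smaller + (equal + 1) / 2

def qq_points(data):
    """Return (theoretical z, observed z) pairs for a QQ plot."""
    n = len(data)
    m, s = mean(data), stdev(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for x in data:
        # =NORM.S.INV((RANK.AVG(x,Dataset,1)-0.5)/n)
        theoretical = std_normal.inv_cdf((rank_avg(x, data) - 0.5) / n)
        # =STANDARDIZE(x,Mean,SD)
        observed = (x - m) / s
        points.append((theoretical, observed))
    return points

# Hypothetical data; points close to the diagonal suggest normality.
for theoretical, observed in qq_points([1, 2, 3, 4, 5]):
    print(round(theoretical, 3), round(observed, 3))
```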

Common use
• Total sample number, n
• =COUNT(Dataset)
• =let(f,F2:F6, SUM(f))
• Mean
• =AVERAGE(Dataset)
• =let(n,F7,x,$E$2:$E$6,f,F2:F6, SUMPRODUCT(x,f)/n)
• Standard deviation,SD
• =STDEV.S(Dataset)
• =let(n,F7, x,$E$2:$E$6,f,F2:F6, Mean,F8, sqrt(SUMPRODUCT(f,x^2)/n-Mean^2))
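The grouped-data versions of n, Mean and SD can be checked with a short Python sketch. Note that the LET formula for SD divides by n, so it is the population form, whereas STDEV.S divides by n-1; the two will differ slightly on the same data. The function name and the sample midpoints and frequencies are illustrative assumptions.

```python
import math

def grouped_stats(midpoints, freqs):
    """n, mean and SD from class midpoints x and frequencies f."""
    n = sum(freqs)                                            # SUM(f)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n   # SUMPRODUCT(x,f)/n
    # SUMPRODUCT(f,x^2)/n - Mean^2  (population variance)
    variance = sum(f * x ** 2 for x, f in zip(midpoints, freqs)) / n - mean ** 2
    return n, mean, math.sqrt(variance)

# Hypothetical table: midpoints 1, 2, 3 with frequencies 1, 2, 1.
n, mean, sd = grouped_stats([1, 2, 3], [1, 2, 1])
print(n, mean, round(sd, 4))
```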

Binomial Distribution
• Number of trials, n
• probability of success, p
• P(X=x)
• = BINOM.DIST(x,n,p,FALSE)
• Probability that X falls within a range (from a table)
• =PROB(RangeX,RangeP(X=x),LowerLimit,UpperLimit)
• =SUMIFS(RangeP(X=x), RangeX,">="& LowerLimit, RangeX,"<="& UpperLimit)
PROB returns an error value:
• If any value in prob_range ≤ 0 or if any value in prob_range > 1,
• If the sum of the values in prob_range is not equal to 1,
• If x_range and prob_range contain a different number of data points.
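As a cross-check on BINOM.DIST and the range lookup, here is a minimal Python sketch. `binom_pmf` mirrors BINOM.DIST(x,n,p,FALSE) and `prob_between` sums P(X=x) over an inclusive range, as the SUMIFS formula does. Both function names are illustrative.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for n trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def prob_between(n, p, lower, upper):
    """P(lower <= X <= upper), both limits inclusive."""
    return sum(binom_pmf(x, n, p) for x in range(lower, upper + 1))

# Hypothetical case: 4 fair coin tosses.
print(binom_pmf(2, 4, 0.5))        # probability of exactly 2 heads
print(prob_between(4, 0.5, 1, 3))  # probability of 1 to 3 heads
```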

Normal Approximation to the Binomial Distribution
• Rough guideline:
• np >=10 and n(1-p) >=10
• Example
• n = 75, p = 0.6
• Mean = 45
• Std Dev. = 4.24
[Figure: Normal Approximation to the Binomial Distribution (n = 75, p = 0.6, Mean = 45, Std Dev. = 4.24). X-axis: x from 0 to 75. Y-axis: probability, -0.02 to 0.10. Legend: P(X=x), x, Norm.]
https://www.youtube.com/watch?v=CCqWkJ_pqNU&t=313s https://www.youtube.com/watch?v=A2sd09qCvcg
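The example with n = 75 and p = 0.6 can be verified numerically. The sketch below recomputes the mean and standard deviation, checks the rough guideline, and measures how far the normal curve is from the exact binomial probabilities. The variable names are illustrative.

```python
import math
from statistics import NormalDist

n, p = 75, 0.6
mean = n * p                       # 45
sd = math.sqrt(n * p * (1 - p))    # about 4.24
# Rough guideline: np >= 10 and n(1-p) >= 10
rule_ok = n * p >= 10 and n * (1 - p) >= 10

def binom_pmf(x, n, p):
    """Exact binomial probability P(X = x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

normal = NormalDist(mean, sd)
# Largest difference between the exact pmf and the normal density at x = 0..75
largest_gap = max(abs(binom_pmf(x, n, p) - normal.pdf(x)) for x in range(n + 1))

print(round(mean, 2), round(sd, 2), rule_ok)
print(largest_gap)
```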

Simulation with Random Number
• Random between 0 to 1
• =RAND()
• Random within specific range
• =RAND()*([UpperLimit]-[LowerLimit])+[LowerLimit]
• =RANDBETWEEN([LowerLimit],[UpperLimit])
• Typically simulate about 1,000 data points
• Example
• =let(Price,$B$2,VariableCost,RANDBETWEEN($D$6*100,$E$6*100)/100,DemandQty,RANDBETWEEN($D$7*100,$E$7*100)/100,FixedCost,$B$3,(Price-VariableCost)*DemandQty-FixedCost)
https://www.indeed.com/career-advice/career-development/how-to-randomize-numbers-in-excel
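The profit simulation can be sketched in Python to see what roughly 1,000 rows of the LET formula would produce. All input values below (price, fixed cost, and the two ranges) are hypothetical, since the slides do not show the cell contents, and `simulate_profit` is an illustrative name. A fixed seed is used so that the run is repeatable, which RAND and RANDBETWEEN in a sheet are not.

```python
import random

def simulate_profit(price, fixed_cost, var_cost_range, demand_range,
                    trials=1000, seed=0):
    """Return one simulated profit per trial."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        # RANDBETWEEN(low*100, high*100)/100 gives a value with two decimals
        variable_cost = rng.randint(round(var_cost_range[0] * 100),
                                    round(var_cost_range[1] * 100)) / 100
        demand_qty = rng.randint(round(demand_range[0] * 100),
                                 round(demand_range[1] * 100)) / 100
        # (Price-VariableCost)*DemandQty-FixedCost
        results.append((price - variable_cost) * demand_qty - fixed_cost)
    return results

# Hypothetical inputs, not taken from the slides.
profits = simulate_profit(price=10, fixed_cost=300,
                          var_cost_range=(4, 6), demand_range=(100, 200))
print(sum(profits) / len(profits))  # average simulated profit
```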

Sample Size
https://www.checkmarket.com/blog/how-to-estimate-your-population-and-survey-sample-size/

Hypothesis Testing
Type I Error
• Critical value of the sample mean (1)
• =let(n,B4,mean,B2,SD,B3,Alpha,B7, mean+ CONFIDENCE.NORM(Alpha,SD,n))
• Critical value of the sample mean (2)
• =let(n,B4,mean,B2,SD,B3,Alpha,B7, mean- CONFIDENCE.NORM(Alpha,SD,n))
Type II Error
• Power of the test
• =let(n,B4, mean,B12, SD,B3, cvalue,B13, SE,SD/SQRT(n),Zscore,(cvalue-mean)/SE,Beta,NORM.S.DIST(Zscore)-NORM.S.DIST(0)+0.5,1-Beta)
• Reject (Power of test < 0.8) or accept
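The critical values and the power calculation can be sketched in Python. CONFIDENCE.NORM(Alpha,SD,n) is the two-sided margin z × SD/√n, and because NORM.S.DIST(0) is 0.5, the Beta expression in the LET formula reduces to NORM.S.DIST(Zscore). The function names and the input numbers are assumptions for illustration only.

```python
import math
from statistics import NormalDist

def critical_values(mean, sd, n, alpha):
    """Lower and upper critical values of the sample mean."""
    # Same margin as CONFIDENCE.NORM(alpha, sd, n)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * sd / math.sqrt(n)
    return mean - margin, mean + margin

def power(true_mean, sd, n, cvalue):
    """1 - Beta, where Beta = P(sample mean < cvalue | true_mean)."""
    se = sd / math.sqrt(n)
    zscore = (cvalue - true_mean) / se
    beta = NormalDist().cdf(zscore)  # NORM.S.DIST(Zscore)
    return 1 - beta

# Hypothetical inputs: hypothesised mean 50, SD 10, n 25, alpha 0.05.
low, high = critical_values(50, 10, 25, 0.05)
print(round(low, 2), round(high, 2))
print(power(true_mean=54, sd=10, n=25, cvalue=high))
```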

T-test for One sample, and Parametric
• T-test
• =let(n,B25,Meanp,B26,Means,B27,SDs,B28, SE,SDs/SQRT(n),(Means-Meanp)/SE)
• T-test critical
• =let(n,B25, alpha,B24,DF,n-1,abs(T.INV(alpha,DF)))
• Reject (t > t-test critical) or accept
Note:
• Parametric: the data are normally distributed
• Non-parametric: the data are not normally distributed

Observed Data and Expected Data, What test?
• Independent
• Chi-Square : Σ (Oi - Ei)^2/Ei
• =let(OA,B82:B84,OB,C82:C84,EA,E82:E84,EB,F82:F84, sum(ARRAYFORMULA((OA-EA)^2/EA),ARRAYFORMULA((OB-EB)^2/EB)))
• Chi-Square Critical
• =let(nCol,B74,nRow,B75,alpha,B77,DF,(nCol-1)*(nRow-1),CHISQ.INV.RT(alpha,DF))
• Reject (Chi-Square > Chi-square critical) or accept
• P-Value
• =let(nCol,B74,nRow,B75, chisquare,B88,df,(nCol-1)*(nRow-1),CHISQ.DIST.RT(chisquare,df))
• =Let(Oi,B82:C84,Ei,E82:F84,CHISQ.TEST(Oi,Ei))
[Figure: observed counts (Yes (O1), No (O2)) and expected counts (Yes (E1), No (E2)) for Heavy smoker, Moderate, and Nonsmoker. Y-axis: 0 to 40.]
Eij = (Sum of Col(i) * Sum of Row(j)) / Grand total
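The expected counts and the chi-square statistic can be sketched in Python. Each expected cell is its row total times its column total divided by the grand total, and the statistic sums (O - E)^2/E over all cells. The observed counts below are hypothetical and the function names are illustrative. The critical value and p-value are left to CHISQ.INV.RT and CHISQ.DIST.RT, because Python's standard library has no chi-square distribution.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom)."""
    expected = expected_counts(observed)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (nRow-1)*(nCol-1)
    return stat, df

# Hypothetical 2 x 2 table of observed counts.
stat, df = chi_square([[10, 20], [20, 10]])
print(stat, df)
```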

Chi-square Test
Related
• McNemar test
• =let(FF,B130,NFNF,C131,(abs(FF-NFNF)-1)^2/(FF+NFNF))
• Chi-square critical
• =let(nCol,B122,nRow,B123,alpha,B124,DF,(nCol-1)*(nRow-1),CHISQ.INV.RT(alpha,DF))
• Reject (McNemar > Chi-square critical) or accept
[Figure: 2 x 2 Before/After table with categories Favor and Not Favor; cell counts 10, 60, 90, 40.]

Chi-Square related Statistic
• Statistic, Phi
• =let(ChiSquare,B62,N,B60, sqrt((Chisquare/N)))
• Statistic, Cramer’s V
• =let(ChiSquare,B62,N,B60,K,B61,sqrt((ChiSquare/(N*(K-1)))))
• Statistic, Contingency coefficient C
• =let(ChiSquare,B62,N,B60, sqrt((ChiSquare/(ChiSquare+N))))
• Probability of error, P(A)
• =let(n,D74,totalCol,B74:C74,PercentCol,B75:C75, 1-SUMPRODUCT(totalCol,PercentCol)/n)
• Probability of error, P(B)
• =let(OA,B71:B73,OB,C71:C73,TotalRow,D71:D73,n,D74,1-sum(ARRAYFORMULA(OA^2/TotalRow+OB^2/TotalRow))/n)
• Goodman & Kruskal tau:
• =Let(PA,B81,PB,B82,(PA-PB)/PA)
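The three chi-square based measures of association are simple enough to check by hand. The following Python sketch mirrors the LET formulas. In the usual definition of Cramér's V, k is the smaller of the number of rows and the number of columns; the slides do not say what cell B61 holds, so that reading is an assumption here.

```python
import math

def phi(chi_square, n):
    """Phi = sqrt(chi-square / N)."""
    return math.sqrt(chi_square / n)

def cramers_v(chi_square, n, k):
    """Cramer's V = sqrt(chi-square / (N * (k - 1)))."""
    return math.sqrt(chi_square / (n * (k - 1)))

def contingency_c(chi_square, n):
    """Contingency coefficient C = sqrt(chi-square / (chi-square + N))."""
    return math.sqrt(chi_square / (chi_square + n))

# Hypothetical values: chi-square 4 on N = 100 observations, k = 2.
print(phi(4, 100), cramers_v(4, 100, 2), contingency_c(4, 100))
```

With k = 2, Cramér's V equals Phi, which is a quick sanity check on a 2 x 2 table.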

Correlation Graph, How to test?
[Figure: scatter plot. X-axis: X (Temperature), -5 to 30. Y-axis: Y (Price), -1000 to 6000.]

T-Test Correlation Test
• Pearson correlation (Parametric)
• =let(n,F4,y,B3:B12,x,C3:C12,r,PEARSON(y,x), r/sqrt((1-r^2)/(n-2)))
• =let(n,F4,y,B3:B12,x,C3:C12,r, CORREL(y,x), r/sqrt((1-r^2)/(n-2)))
• =let(y,B3:B12,x,C3:C12,n,count(x),a,n*SUMPRODUCT(x,y)-(sum(x)*sum(y)),b,n*SUMPRODUCT(x^2)-sum(x)^2,c,n*SUMPRODUCT(y^2)-sum(y)^2,a/SQRT(b*c))
• Spearman’s Rho (Non Parametric)
• =let(x,E114:E123,y,F114:F123,n,count(x),RankX,MAP(x,LAMBDA(r,RANK.AVG(r,x))),RankY,MAP(y,LAMBDA(r,RANK.AVG(r,y))),rs,1-6*SUMXMY2(RankX,RankY)/(n^3-n),rs*sqrt((n-2)/(1-rs^2)))
• T-Test Critical
• =let(n,F4, alpha,F7,df,n-2,abs(T.INV(alpha,df)))
• Reject (Pearson correlation > T-Test critical) or accept
• Reject (Spearman’s Rho > T-Test critical) or accept
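The manual Pearson formula and the t statistic built from it can be cross-checked with a short Python sketch. `pearson_r` follows the a/SQRT(b*c) version of the LET formula and `t_from_r` follows r/sqrt((1-r^2)/(n-2)). The function names and the data are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from raw sums, as in the manual LET formula."""
    n = len(x)
    a = n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)
    b = n * sum(xi ** 2 for xi in x) - sum(x) ** 2
    c = n * sum(yi ** 2 for yi in y) - sum(y) ** 2
    return a / math.sqrt(b * c)

def t_from_r(r, n):
    """t statistic for a correlation r with n pairs (df = n - 2)."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4), round(t_from_r(r, len(x)), 4))
```

The resulting t value is then compared with the critical value from T.INV in the sheet, since the standard library has no t distribution.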

F-Test Correlation Test
• F-Ratio (Parametric)
• =let(n,B41,k,B42,x,C23:C32,y,B23:B32,Intercept,INTERCEPT(y,x),Slope,SLOPE(y,x),RegMeanSq,devsq(ARRAYFORMULA(x*Slope+Intercept))/(k-1),ResidualMeanSq,SUMXMY2(ARRAYFORMULA(x*Slope+Intercept),y)/(n-k),RegMeanSq/ResidualMeanSq)
• F-test Critical
• =let(n,B41,k,B42,dfA,k-1,dfB,n-k,F.INV.RT(0.05,dfA,dfB))
• Reject (F-Ratio > F-Test critical) or accept

T-Test Two Sample Test, Parametric
• Related
• T-test
• =let(YA,B103:B112,YB,C103:C112,n,B97,df,n-1,sd,sqrt((SUMXMY2(YA,YB)-sum(ARRAYFORMULA(YA-YB))^2/n)/df),(sum(ARRAYFORMULA(YA-YB))/n)/(sd/sqrt(n)))
• T-test critical
• =let(n,B97, alpha,B98,df,n-1, abs(T.INV.2T(alpha,df)))
• Reject (T-Test > T-Test critical) or accept
• P-Value
• =let(n,B97,df,n-1,Ttest,B115,T.DIST.2T(abs(Ttest),df))
• =let(YA,B103:B112,YB,C103:C112,T.TEST(YA,YB,2,1))
T.TEST arguments:
tails - Specifies the number of distribution tails.
• If 1: uses a one-tailed distribution.
• If 2: uses a two-tailed distribution.
type - Specifies the type of t-Test.
• If 1: a paired test is performed.
• If 2: a two-sample equal variance (homoscedastic) test is performed.
• If 3: a two-sample unequal variance (heteroscedastic) test is performed.
ABS(T.INV(alpha,df)) = T.INV.2T(2*alpha,df)
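The paired (related-samples) t statistic can be reproduced in Python as a check on the LET formula. The sketch works on the differences YA - YB: their sum of squares corresponds to SUMXMY2(YA,YB), and the standard deviation uses n-1 degrees of freedom. The function name and the data are illustrative assumptions.

```python
import math

def paired_t(ya, yb):
    """Return (t statistic, degrees of freedom) for paired samples."""
    diffs = [a - b for a, b in zip(ya, yb)]
    n = len(diffs)
    df = n - 1
    sum_d = sum(diffs)                        # sum(ARRAYFORMULA(YA-YB))
    sum_d_sq = sum(d ** 2 for d in diffs)     # SUMXMY2(YA,YB)
    sd = math.sqrt((sum_d_sq - sum_d ** 2 / n) / df)
    t = (sum_d / n) / (sd / math.sqrt(n))
    return t, df

# Hypothetical before/after measurements.
t, df = paired_t([5, 7, 9, 11], [4, 5, 8, 8])
print(round(t, 4), df)
```

The p-value still comes from T.DIST.2T or T.TEST in the sheet; only the statistic is recomputed here.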

K-Independent Sample
[Figure: chart with Y-axis from 0 to 100. k (number of groups): 1, 2, 3. b (number of replicates): 1, 2. n (total number of group interactions).]

Analysis of variance (One Way)
• F Value
• =let(K,B140,GA,H145:H164,GB,I145:I164,GC,J145:J164,N,count(GA,GB,GC),nG,count(GA),sum2n,sum(GA,GB,GC)^2/N,sumx2n,sum(sum(GA)^2,sum(GB)^2,sum(GC)^2)/nG,sumSQn,sum(SUMSQ(GA),SUMSQ(GB),SUMSQ(GC)),SSBetween,(sumx2n-sum2n)/(k-1),SSWithin,(sumSQn-sumx2n)/(n-k),SSBetween/SSWithin)
• F-test Critical value
• =let(n,B139,k,B140,dfA,k-1,dfB,n-k,F.INV.RT(0.05,dfA,dfB))
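The one-way ANOVA F value can be recomputed in Python using the same sums as the LET formula. In the formula, SSBetween and SSWithin are already divided by their degrees of freedom, so they are mean squares; the sketch keeps that structure. It divides each squared group sum by that group's own size, which matches the formula when all groups are the same size. The function name and the data are illustrative assumptions.

```python
def one_way_anova_f(groups):
    """Return (F, df between, df within) for a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_sum = sum(sum(g) for g in groups)
    sum2n = grand_sum ** 2 / n                            # sum(GA,GB,GC)^2/N
    sumx2n = sum(sum(g) ** 2 / len(g) for g in groups)    # squared group sums / group size
    sum_sq = sum(x ** 2 for g in groups for x in g)       # SUMSQ over all groups
    ms_between = (sumx2n - sum2n) / (k - 1)
    ms_within = (sum_sq - sumx2n) / (n - k)
    return ms_between / ms_within, k - 1, n - k

# Hypothetical data: three groups of three observations.
f_value, df_a, df_b = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_value, df_a, df_b)
```

The F value is then compared with F.INV.RT(0.05,dfA,dfB) in the sheet.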

ANOVA: Two-Factor With Replication
• [A]
• =let(GA,H145:H164,GB,I145:I164,GC,J145:J164,n,count(GA),
sum(sum(GA)^2,sum(GB)^2,sum(GC)^2)/n)
• [BA]
• =let(A,H145:J154,B,H155:J164,n,count(A),sum(sum(A)^2,sum(B)^2)/n)
• [AB]
• =let(GAA,H145:H154,GBA,I145:I154,GCA,J145:J154,GAB,H155:H164,GBB,I155:I164,GCB,J155:J164,n,count(GAA),sum(sum(GAA)^2,sum(GBA)^2,sum(GCA)^2,sum(GAB)^2,sum(GBB)^2,sum(GCB)^2)/n)
• [Y]
• =let(A,H145:J154,B,H155:J164,SUMSQ(A,B))
• [T]
• =let(A,H145:J154,B,H155:J164,SUM(A,B)^2/count(A,B))

ANOVA: Two-Factor With Replication
• Within Group (S/AB)
• =Let(Y,E178,AB,E177,k,B168,b,B169,n,B170,(Y-AB)/(k*b*(n-1)))
• F Value - Between Group A
• =Let(Y,E178,AB,E177,k,B168,b,B169,n,B170,A,E175,T,E179,Within,D187,(A-T)/(k-1)/Within)
• F Value - Between Group B
• =Let(Y,E178,AB,E177,k,B168,b,B169,n,B170,BA,E176,T,E179,Within,D187,(BA-T)/(b-1)/Within)
• F Value - Interaction (AxB)
• =Let(Y,E178,AB,E177,k,B168,b,B169,n,B170,BA,E176,A,E175,T,E179,Within,D187,(AB-A-BA+T)/(k-1)/(b-1)/Within)
• F-test Critical value
• =let(k,$B$168,b,$B$169,n,$B$170, dfWithin,(k*b*(n-1)),df,C184,F.INV.RT(0.05,df,dfWithin))
• Reject (F Value > F-Test critical) or accept

Conclusion
• This deck has shown some examples of the LET and ARRAYFORMULA functions in statistics
• For a deeper understanding of statistics for business use, refer to
• Main Reference: Business Research Methods, Pamela Schindler, 14th Edition.
• Beyond understanding the examples above, the approach can also be
• Applied to business use, for example marketing research and organizational behavior research
• Combined with the LAMBDA function, as in our Spearman’s Rho example
• Used to create new formulas
