OPERATOR  DESCRIPTION  EXAMPLE
i.  specify indicators
    regress price i.rep78
    specify rep78 variable to be an indicator variable
ib.  specify base indicator
    regress price ib(3).rep78
    set the third category of rep78 to be the base category
fvset  command to change base
    fvset base frequent rep78
    set the base to most frequently occurring category for rep78
c.  treat variable as continuous
    regress price i.foreign#c.mpg i.foreign
    treat mpg as a continuous variable and specify an interaction between foreign and mpg
o.  omit a variable or indicator
    regress price io(2).rep78
    set rep78 as an indicator; omit the indicator for rep78 == 2 (the observations stay in the sample)
#  specify interactions
    regress price mpg c.mpg#c.mpg
    create a squared mpg term to be used in regression
##  specify factorial interactions
    regress price c.mpg##c.mpg
    create all possible interactions with mpg (mpg and mpg^2)
CATEGORICAL VARIABLES
identify a group to which an observation belongs
INDICATOR VARIABLES
denote whether something is true or false
CONTINUOUS VARIABLES
measure something
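The # and ## rows of the operator table describe two spellings of the same quadratic model. A minimal sketch, assuming auto.dta is available through sysuse; the scalar names b_sq_manual and b_sq_factorial are illustrative and not part of the original sheet:

```stata
* Sketch: two ways of adding a squared mpg term should give the same fit.
sysuse auto, clear

* Main effect typed by hand, plus the squared term built with #
regress price mpg c.mpg#c.mpg
scalar b_sq_manual = _b[c.mpg#c.mpg]

* Factorial operator ## expands to mpg and c.mpg#c.mpg
regress price c.mpg##c.mpg
scalar b_sq_factorial = _b[c.mpg#c.mpg]

* Show both coefficients on the squared term side by side
display "squared term, typed by hand: " b_sq_manual
display "squared term, via ##:        " b_sq_factorial
```

Using c.mpg#c.mpg rather than a hand-generated squared variable keeps the link between mpg and its square visible to margins.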
Declare Data
TIME SERIES
webuse sunspot, clear
tsset time, yearly
declare sunspot data to be yearly time series
tsreport
report time series aspects of a dataset
generate lag_spot = L1.spot
create a new variable of annual lags of sunspots
tsline spot
plot time series of sunspots
arima spot, ar(1/2)
estimate an autoregressive model with 2 lags
PANEL / LONGITUDINAL
webuse nlswork, clear
xtset id year
declare national longitudinal data to be a panel
xtdescribe
report panel aspects of a dataset
xtsum hours
summarize hours worked, decomposing standard deviation into between and within components
xtline ln_wage if id <= 22, tlabel(#3)
plot panel data as a line plot
xtreg ln_w c.age##c.age ttl_exp, fe vce(robust)
estimate a fixed-effects model with robust standard errors
SURVEY DATA
webuse nhanes2b, clear
svyset psuid [pweight = finalwgt], strata(stratid)
declare survey design for a dataset
svydescribe
report survey data details
svy: mean age, over(sex)
estimate a population mean for each subpopulation
svy, subpop(rural): mean age
estimate a population mean for rural areas
svy: tabulate sex heartatk
report two-way table with tests of independence
svy: reg zinc c.age##c.age female weight rural
estimate a regression using survey weights
SURVIVAL ANALYSIS
webuse drugtr, clear
stset studytime, failure(died)
declare survival-time data
stsum
summarize survival-time data
stcox drug age
estimate a Cox proportional hazards model
USEFUL ADD-INS
tscollap
compact time series into means, sums and end-of-period values
carryforward
carry non-missing values forward from one obs. to the next
tsspell
identify spells or runs in time series
pwmean mpg, over(rep78) pveffects mcompare(tukey)
estimate pairwise comparisons of means with equal variances; include multiple comparison adjustment
webuse systolic, clear
anova systolic drug
analysis of variance and covariance
ttest mpg, by(foreign)
estimate t test on equality of means for mpg by foreign
tabulate foreign rep78, chi2 exact expected
tabulate foreign and repair record and return chi2
and Fisher’s exact statistic alongside the expected values
prtest foreign == 0.5
one-sample test of proportions
ksmirnov mpg, by(foreign) exact
Kolmogorov-Smirnov equality-of-distributions test
ranksum mpg, by(foreign) exact
equality tests on unmatched data (independent samples)
By declaring data type, you enable Stata to apply data munging and analysis functions specific to certain data types
TIME SERIES OPERATORS
L.   lag                          x_{t-1}
L2.  2-period lag                 x_{t-2}
F.   lead                         x_{t+1}
F2.  2-period lead                x_{t+2}
D.   difference                   x_t - x_{t-1}
D2.  difference of difference     (x_t - x_{t-1}) - (x_{t-1} - x_{t-2})
S.   seasonal difference          x_t - x_{t-1}
S2.  lag-2 (seasonal difference)  x_t - x_{t-2}
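The operator definitions can be checked by hand. A minimal sketch, assuming the sunspot data can be fetched with webuse (needs an internet connection); the variable names lag1, diff1, diff2 and sdiff2 are illustrative:

```stata
* Sketch: verify the time-series operator definitions on the sunspot data.
webuse sunspot, clear
tsset time, yearly

* Store each operator result in double precision
generate double lag1   = L.spot
generate double diff1  = D.spot
generate double diff2  = D2.spot
generate double sdiff2 = S2.spot

* Each assert stops with an error if the identity fails on any row
assert abs(diff1 - (spot - lag1)) < 1e-8 if !missing(diff1)
assert abs(diff2 - ((spot - L.spot) - (L.spot - L2.spot))) < 1e-8 if !missing(diff2)
assert abs(sdiff2 - (spot - L2.spot)) < 1e-8 if !missing(sdiff2)

* Inspect the first rows; leading values are missing because no lag exists
list time spot lag1 diff1 diff2 sdiff2 in 1/5
```

The operators only work after tsset (or xtset) has declared the time variable.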
logit foreign headroom mpg, or
estimate logistic regression and
report odds ratios
regress price mpg weight, robust
estimate ordinary least squares (OLS) model of price
on mpg and weight, apply robust standard errors
probit foreign turn price, vce(robust)
estimate probit regression with
robust standard errors
rreg price mpg weight, genwt(reg_wt)
estimate robust regression to downweight outliers
regress price mpg weight if foreign == 0, cluster(rep78)
regress price only on domestic cars, cluster standard errors
bootstrap, reps(100): regress mpg weight gear foreign
estimate regression with bootstrapping
jackknife r(mean), double: sum mpg
jackknife standard error of sample mean
Examples use auto.dta (sysuse auto, clear)
unless otherwise noted
Data Analysis Cheat Sheet with Stata 14.1
For more info see Stata’s reference manual (stata.com)
Summarize Data
Statistical Tests
Estimation with Categorical & Factor Variables
Tim Essam (tessam@usaid.gov) • Laura Hughes (lhughes@usaid.gov)
inspired by RStudio’s awesome Cheat Sheets (rstudio.com/resources/cheatsheets)
geocenter.github.io/StataTraining
updated March 2016
CC BY NC
Disclaimer: we are not affiliated with Stata. But we like it.
display _b[length]
display _se[length]
return coefficient estimate or standard error for length
from most recent regression model
margins, dydx(length)
return the estimated marginal effect for length
margins, eyex(length)
return the estimated elasticity of price with respect to length
predict yhat if e(sample)
create predictions for sample on which model was fit
predict double resid, residuals
calculate residuals based on last fit model
test length = 0
test linear hypothesis that length estimate equals zero
lincom headroom - length
test linear combination of estimates (headroom = length)
regress price headroom length
used in all postestimation examples
more details at http://www.stata.com/manuals14/u25.pdf
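The postestimation commands above all read from the most recently fitted model. A minimal sketch of that workflow, assuming auto.dta; the scalar name t_length is illustrative:

```stata
* Sketch: fit the model once, then query it with postestimation commands.
sysuse auto, clear
regress price headroom length

* Coefficient, standard error, and the t statistic built from them
display _b[length]
display _se[length]
scalar t_length = _b[length] / _se[length]
display t_length

* Fitted values and residuals for the estimation sample
predict double yhat if e(sample)
predict double resid, residuals

* Fitted value plus residual reproduces the outcome
assert abs(price - (yhat + resid)) < 1e-6 if e(sample)

* Wald test of a single coefficient, then a linear combination
test length = 0
lincom headroom - length
```

For a single restriction after regress, the F statistic from test equals the square of the t statistic shown in the coefficient table.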
pwcorr price mpg weight, star(0.05)
return all pairwise correlation coefficients with sig. levels
correlate mpg price
return correlation or covariance matrix
mean price mpg
estimates of means, including standard errors
proportion rep78 foreign
estimates of proportions, including standard errors for
categories identified in varlist
ratio
estimates of ratio, including standard errors
total price
estimates of totals, including standard errors
ci mpg price, level(99)
compute standard errors and confidence intervals
stem mpg
return stem-and-leaf display of mpg
summarize price mpg, detail
calculate a variety of univariate summary statistics
frequently used commands are
highlighted in yellow
univar price mpg, boxplot
calculate univariate summary, with box-and-whiskers plot
(user-written: ssc install univar)
margins returns e-class information when the post option is used
Type help regress postestimation plots
for additional diagnostic plots
estat hettest
test for heteroskedasticity
estat vif
report variance inflation factor
estat ovtest
test for omitted variable bias
dfbeta(length)
calculate measure of influence
rvfplot, yline(0)
plot residuals against fitted values
[Figure: residuals plotted against fitted values]
avplots
plot all partial-regression leverage plots in one graph
[Figure: partial-regression plots of price against mpg, rep78, headroom, and weight]
1 Estimate Models
2 Diagnostics
not appropriate with robust standard errors
3 Postestimation
commands that use a fitted model
Results are stored as either r-class or e-class. See Programming Cheat Sheet
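The r-class and e-class distinction determines where a command leaves its results. A minimal sketch, assuming auto.dta; the scalar name mean_mpg is illustrative:

```stata
* Sketch: r-class commands fill r(), estimation commands fill e().
sysuse auto, clear

* summarize is r-class: results are listed by return list
summarize mpg
return list
scalar mean_mpg = r(mean)

* regress is e-class: results are listed by ereturn list
regress price mpg weight
ereturn list
display e(N)
display e(r2)

* The saved scalar survives even if later commands overwrite r()
display mean_mpg
```

Copying a result into a scalar right after the command runs is the safe habit, since the next r-class command replaces the contents of r().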
[Figure: tsline plot of the number of sunspots over time]
[Figure: xtline plot of wage relative to inflation over time, one panel per id (id 1 to id 4)]
ADDITIONAL MODELS
(commands marked * are user-written; the rest are built-in Stata commands)
ivregress, ivreg2*  instrumental variables
pca  principal components analysis
factor  factor analysis
poisson • nbreg  count outcomes
tobit  censored data
diff*  difference-in-difference
rd*  regression discontinuity
xtabond, xtabond2*  dynamic panel estimator
psmatch2*  propensity score matching
synth*  synthetic control analysis
oaxaca*  Blinder-Oaxaca decomposition
ssc install ivreg2
install a user-written command
