CAPMAS
International Statistics Day
20-10-2015
Egyptian Economic Census
Workshop 2012/2013
Introduction to STATA
By Ali Rashed
Population Council
17th – 19th October 2015
 STATA is a complete, integrated statistical
package that provides everything you
need for data analysis, data management,
and graphics.
 STATA is not sold in pieces, which means
you get everything you need in one
package without annual license fees.
 Fast, Accurate, and Easy to use:
WHY STATA
• You can access all of STATA’s data
management, statistical, and analysis features
from the menus and associated dialogs.
• Command syntax: simple and consistent
• Online help, with a topical index built into the
help system
• All analyses can be reproduced and
documented for publication and review.
WHY STATA, Cont.
• Run Stata, open a data set, describe its contents, and exit:
• Run the Stata program from the "Start" button
• "use" command:
• Open a Stata data set from the "File" pull-down menu
• Example:
• cd "D:\My Documents\Training Courses\UNDP Jordan June 2011\Jordan LMPS2010"
• use "JLMPS indiv public v1_ 0.dta", clear
• "describe" command
• dir and cd commands work just like in DOS
• STATA commands are case-sensitive.
Type them in lowercase letters (use, not Use)
Opening (Using) a data set
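The open-and-describe sequence can be tried without the course files. The sketch below assumes the example data set auto.dta that ships with Stata is installed; the file name auto_copy.dta is hypothetical and is written to the current working directory.

```stata
* Sketch only: auto.dta is Stata's shipped example data, not the course data.
sysuse auto, clear                 // load the shipped example data set
describe                           // variable names, storage types, labels
save "auto_copy.dta", replace      // hypothetical file name
use "auto_copy.dta", clear         // reopen it the same way as any .dta file
describe, short                    // observation and variable counts only
```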
 Note the FOUR main Windows:
1. Command: to issue the commands to Stata
2. Result: to see the results
3. Variables: Shows the list of variables of the data set
active in memory: Click on a variable name to put it
into the command window
4. Review: Keeps track of the commands issued, so
each command you type is displayed here.
 Click on a command to put it into the command window for
editing
 Double-click on a command to execute it directly
The STATA Display
• You can resize these 4 windows independently, and you can
resize the outer window as well.
• To save your window size changes, click on Edit, Preferences,
Save Preferences Set
Main FOUR Windows, Cont.
File types:
• xxxx.do files → text files with your
commands, for future reference and editing
• xxxx.log files → text files with your output,
for future reference and printing
• xxxx.dta files → data files in Stata format
• xxxx.gph files → graph files in Stata format
• xxxx.ado files → programs in Stata
STATA File Types
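As a rough illustration of how a do-file and a log file work together, the lines below could be saved in a .do file and run. The log file name session_example.log is hypothetical, and the shipped auto.dta data set is assumed to be available.

```stata
* Sketch of a small do-file that records its own output in a text log.
capture log close                              // close any log left open earlier
log using "session_example.log", text replace  // hypothetical log file name
sysuse auto, clear                             // shipped example data
summarize price mpg                            // this output is copied to the log
log close
```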
• Log file: for good documentation of operations and
output
• Variable storage type:
• byte: 1 byte (integers from -127 to 100)
• int: 2 bytes (integers from -32,767 to 32,740)
• long: 4 bytes (integers up to about ±2.1 billion, so up to 9 digits)
• float: 4 bytes (about 7 digits of accuracy)
• double: 8 bytes (about 16 digits of accuracy)
• "compress" command (reduces each variable to the
smallest storage type that holds its values)
• set memory 500m, perm (needed only in Stata 11 and earlier; from Stata 12 memory is managed automatically)
Describing Data
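A minimal sketch of what compress does, using a made-up variable x rather than the census data: whole numbers stored as double are demoted to the smallest type that still holds them.

```stata
* Sketch: x is an illustrative variable, not part of the course data.
clear
set obs 5
generate double x = _n    // whole numbers 1..5, stored wastefully as double
describe x                // storage type is double
compress                  // demotes x to the smallest adequate type
describe x                // storage type is now byte
```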
• Commands
• summarize (or summarize x y z)
• provides summary statistics for all or a subset of variables
• remember STATA commands are case-sensitive
you can always use abbreviations if they are
not ambiguous
• e.g. sum x
• summarize by subgroup
• sort groupvar
• by groupvar: sum varname
• or in one step: bysort groupvar: sum varname
Summarizing
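Summaries by subgroup can be sketched on a tiny made-up data set; the variable names group and value are illustrative only.

```stata
* Sketch: four invented observations in two groups.
clear
input group value
1 10
1 20
2 30
2 50
end
summarize value                  // overall summary
bysort group: summarize value    // sorts by group, then summarizes each group
```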
• in qualifier
• Defines the range of observations that the command
applies to
• Examples:
list in 5/10
list gov pubpriv frame sector4d sector2d in 4/l
(the letter l refers to last)
• edit command
Specifying Subsets of the Data
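The in qualifier can be tried on a small generated data set. The variable id is made up for this sketch; a negative number in the range counts back from the last observation.

```stata
* Sketch: ten invented observations numbered 1..10.
clear
set obs 10
generate id = _n
list id in 5/10     // observations 5 through 10
list id in -3/l     // the last three observations (l = last)
count in 5/10
```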
if qualifier
• Defines observations that satisfy a certain
condition
• Example:
• sum empl weight totprod totsales totwage outputfc totva netva netindtax profit1 population if pubpriv == 1
• sum empl weight totprod totsales totwage outputfc totva netva netindtax profit1 population hhcount if profit1 >= 0 & profit1 < 200
• count if profit1 < 0 & pubpriv == 1 // how many?
• tab gov if profit1 < 0 & pubpriv == 1
• tab sector4d if profit1 < 0 & pubpriv == 1
• tab sector2d if profit1 < 0 & pubpriv == 1
 == is equal to
 != is not equal to ( ~= also works)
 > is greater than
 < is less than
 >= is greater than or equal to
 <= is less than or equal to
Relational operators (combine conditions with the logical operators & for and, | for or, ! for not)
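A short sketch of if conditions built from these operators, on invented data (profit and pubpriv here are stand-ins, not the census variables):

```stata
* Sketch: five invented establishments.
clear
input profit pubpriv
-5 1
10 1
-2 2
150 2
250 1
end
count if profit < 0 & pubpriv == 1      // both conditions hold
count if profit >= 0 & profit < 200     // a range of values
count if profit < 0 | pubpriv != 1      // either condition holds
```

One point worth keeping in mind: Stata treats a missing value as larger than any number, so a condition such as profit > 100 also selects observations where profit is missing unless !missing(profit) is added.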
 “generate” command is used to create new
variables
 “replace” command is used to modify an existing
variable
 Examples:
sum profit1 // Net Profit
gen LnProfit=ln(profit1)
generate durEstablished= 2015-firmage
sum durEstablished
replace durEstablished =. if durEstablished <0
recode durEstablished (min/0 =. ) // alternative to replace; note that min/0 also sets 0 to missing
Transforming Variables
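The generate and replace pattern can be sketched on three invented values. The variable names are illustrative; the point to notice is that ln() of zero or a negative number yields a missing value.

```stata
* Sketch: profit is an invented variable with one positive, one zero, one negative value.
clear
input profit
100
0
-20
end
generate lnprofit = ln(profit)    // missing where profit <= 0
generate loss = 0                 // start with 0 for every observation
replace loss = 1 if profit < 0    // then flag the loss-making cases
list
```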
Basic descriptive commands
• describe or d
Gives a summary of the current data file:
•Number of observations/variables
•Data file size
•List of variables (name, type, value label)
• codebook
– Variables summary:
•Type, range, values, frequency
• list or l
– Display the values of the variables for each
observation
Basic data set management
• Sort
– Sort the data set
– Examples: sort gov or sort gov sector2d
• Keeping variables
– Examples:
• keep id gov pubpriv sector2d profit1 : will only keep these variables
• Dropping variables
– Examples:
• drop gov pubpriv frame : will drop these variables
• drop gov-prjs : will drop all variables from gov to prjs
• drop w* : will drop all variables beginning with w
Creation of variables
• Command: generate (or gen)
• Create string variables
• gen str10 cityname = "Cairo"
• Create numeric variables
• gen Net_Profit = profit1 - netindtax (type float by default)
• gen float Sales_Per_worker = totsales / saleworktot (a ratio needs float or double; byte holds only integers from -127 to 100)
• Change a variable type:
• gen str7 cluster = substr(id,7,12) (substr(s, n1, n2) returns n2 characters of s starting at position n1)
• edit id cluster
• gen str4 year = "2015"
• destring year, replace
• Rename variables
– ren oldname newname, e.g. ren id firm_id
• Recode variable values
• recode profit1 netindtax (min/0=.) (replaces the older form: for var profit1 netindtax : recode X (min/0=.))
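A small sketch of the string-to-numeric conversion and renaming steps. The variables year, years_since, and elapsed are made up for illustration.

```stata
* Sketch: convert a string year to a number, compute with it, then rename.
clear
set obs 3
generate str4 year = "2015"              // string variable
destring year, replace                   // now numeric
generate years_since = 2020 - year       // arithmetic works after destring
rename years_since elapsed               // rename oldname newname
describe
```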
Variables Labels and Values
• Labelling variable names
•label var gov "Governorate"
•label var profit1 "Net Profit in ,000"
• Labelling variables values (2 steps)
•label def yesno 0 "No" 2 "Yes"
•label val public yesno
• Changing label values
•label def yesno 1 "Yes" 2 "No", modify
•label val public yesno
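The two labelling steps can be sketched as follows. The 0/1 coding and the variable label text are assumptions made for this example, not taken from the census file.

```stata
* Sketch: public is an invented 0/1 indicator.
clear
input public
0
1
1
end
label variable public "Public-sector establishment"   // variable label
label define yesno 0 "No" 1 "Yes"                      // step 1: define the value labels
label values public yesno                              // step 2: attach them to the variable
tabulate public                                        // table shows No/Yes instead of 0/1
```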
Identify and Delete duplicated
observations
• duplicates list id
• duplicates report id
• duplicates browse
• duplicates tag id,gen(tag)
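• duplicates drop (drops observations that are identical on all variables)
• duplicates drop id, force (force is required when a variable list is given)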
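A sketch of the duplicates workflow on four invented rows in which id 2 appears twice; dup is a hypothetical name for the tag variable.

```stata
* Sketch: find, inspect, and drop observations that repeat an id.
clear
input id value
1 10
2 20
2 25
3 30
end
duplicates report id                 // how many ids are repeated
duplicates tag id, generate(dup)     // dup = number of extra copies of each id
list if dup > 0                      // look at the repeats before deleting anything
duplicates drop id, force            // keeps the first observation of each id
```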
 tabulate command produces frequency cross-tabs of
one or two variables
 tabu gov
 tabu gov sector2d
 tabu gov sector2d,col
 tabu gov sector2d, col row missing nolabel nofreq
 tab1 varlist - performs one-way tables for varlist (tab1 gov
sector4d sector2d )
 tab2 varlist - performs all possible 2-way tables for varlist
(tab2 age sector2d sector4d)
 Table Command
Tabulation
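The tabulation commands can be tried on a few invented rows; gov and sector below are stand-ins for the census variables.

```stata
* Sketch: one-way and two-way frequency tables.
clear
input gov sector
1 1
1 2
2 1
2 1
3 2
end
tabulate gov                       // one-way table
tabulate gov sector, row column    // two-way table with row and column percentages
tab1 gov sector                    // a separate one-way table for each variable
```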
 Several types of weights
- fweight or frequency weights: are weights that indicate the
number of duplicated observations
- aweight or analytical weights: are weights that are inversely
proportional to the variance of an observation.
- iweights or importance weights: are weights that indicate
the "importance" of the observation in some vague sense.
- pweight or probability weight: or sampling weights, are
weights that denote the inverse of the probability that the
observation is included due to the sampling design.
Using Weights
 EXAMPLES
 Frequency weights
• tabu gov sector2d [fweight=int(weight)], ro co
• Analytical weights
• tabu gov sector2d [aweight=weight]
Using Weights
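To see the effect of frequency weights, the sketch below uses three invented rows with a made-up weight variable w. Each row counts as w observations, so the weighted total is 10.

```stata
* Sketch: compare unweighted and frequency-weighted results.
clear
input gov w
1 2
1 3
2 5
end
tabulate gov                   // unweighted: 3 observations
tabulate gov [fweight = w]     // weighted: 5 in gov 1, 5 in gov 2
summarize gov [fweight = w]    // weighted mean of gov
```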
 To add observations from two files with the same
variables
 append command
 To add variables from two files with similar
observations
 merge
 To add variables from two files with different
observations (e.g. individuals and household)
 merge idvar
Combining 2 or More STATA Files
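Adding observations with append can be sketched with a temporary file. The data are invented, and the block should be run as one do-file so that the tempfile macro stays defined.

```stata
* Sketch: stack two files that share the same variables.
clear
input id x
3 30
end
tempfile second
save `second'              // temporary file, deleted when the do-file ends

clear
input id x
1 10
2 20
end
append using `second'      // adds the observations from the second file
list
```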
 Merging by unique id allows you to combine variables from
two different STATA data sets
 Examples
 Merging an individual’s employment variables to his/her
demographic characteristics
 Merging the parent’s info to the individual’s demographic file.
 Merging information on a parent who is present in the
household to an individual’s demographic file
 Merging community information to the individual or
household level files
Merging Files
 The objective is to match observations that share a
unique id from two files
 The master file: the file to merge into
 The using file: the file to merge from
 Examples with two files containing indiv. information
 open the file containing the variables you need
 use filename, clear
 keep the unique id and the variables you need
 keep indid hhid gov pn varnames
Match Merge
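• sort by unique id
• sort id
• save under new name
• save temp1
• use master data set
• use "ORIGINAL FILE.dta", clear
• sort by unique id
• sort id
• merge by unique id
• merge id using temp1 (syntax of Stata 10 and earlier; from Stata 11 write merge 1:1 id using temp1, and the sorting steps are no longer required)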
Match Merge (2)
 checking how successful your merge was
 tabu _merge
• _merge==3 observation in both master and using
 _merge==2 observations in using but not in master
 _merge==1 observation in master but not in using
 drop _merge
 update option
 substitute missing values in master with nonmissing values in
using for same variables
 replace option
 replaces any value in master with non-missing value in using
Match Merge (3)
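The match-merge steps can be sketched with the merge 1:1 syntax available from Stata 11 onward. All data and names below are invented, and the block is meant to be run as one do-file so the tempfile macro stays defined.

```stata
* Sketch: one-to-one match merge on a unique id, then inspect _merge.
clear
input id wage
1 100
2 200
3 300
end
tempfile using_data
save `using_data'                   // this becomes the using file

clear
input id age
1 30
2 40
4 50
end                                 // this is the master file
merge 1:1 id using `using_data'
tabulate _merge                     // 1 = master only, 2 = using only, 3 = matched
```

Here ids 1 and 2 match, id 4 exists only in the master, and id 3 only in the using file.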
 1- Merging individual-level data into individual level files
 2- Merging household level data into individual-level file
 3- Merging individual-level data into household-level file
Types of Match Merge
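The second type, household-level data merged into an individual-level file, might look like this with the Stata 11+ syntax. The variables hhid, hhsize, and indid are invented for the sketch.

```stata
* Sketch: many individuals per household, one household record each (m:1).
clear
input hhid hhsize
1 4
2 2
end
tempfile hh
save `hh'                       // household-level using file

clear
input indid hhid
1 1
2 1
3 2
end                             // individual-level master file
merge m:1 hhid using `hh'       // every individual receives the household's hhsize
list
```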
 On-line help is one of the most useful aspects of
STATA
 Now connected to STATA Corp web site through the
net
 Help menu
 search
 stata commands
 Stata Technical Bulletin
Using STATA’s on-line help
 What’s new in STATA
 STATA is web-aware
 use data sets over the web
• example: use http://www.stata.com/manual/oddeven.dta, clear
 updates
 update query
 check out help menu
For Advanced Users
 Stata can accept data in several forms.
 Stata Editor:
• Enter a small data set consisting of 6 observations and three
variables, where var1 is the name of the individual, var2 is his/her
income, and var3 is his/her consumption.
 Then, “list”, “describe”, and “save”.
 Stata can read ASCII (text) file,
 Delimited ASCII, data separated by : spaces, comma, tab.
 Fixed length ASCII file
• Utilities to transform data sets from one form (say SPSS,
Excel, etc.) into other forms (Stat/Transfer).
Inputting and Reading Data
 ASCII delimited files are text files where data are
separated by delimiters
 If missing observations are spaces, then delimiter
should not be a space, use comma instead
 For space delimited data, the command to use is:
 infile x y z using data.txt
 x y z should be names equal in number to the variables in
each record
 if x y z is omitted, STATA assigns v1 v2 v3
 describe
 compress
 infile assumes numeric format unless otherwise
specified
 Assume x is a string (alphanumeric) variable
• infile str10 x y z using data.txt
Reading Delimited ASCII files
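A self-contained sketch of infile on space-delimited text. It first writes a two-line temporary file so there is something to read; the names and values are invented, and the block should be run as one do-file.

```stata
* Sketch: write a small space-delimited file, then read it with infile.
tempname fh
tempfile raw
file open `fh' using "`raw'", write text
file write `fh' "Cairo 10 7" _n
file write `fh' "Giza 20 15" _n
file close `fh'

clear
infile str10 name income consumption using "`raw'"   // name is read as a string
list
```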
 Another common format is comma or tab delimited
data
 Variables names are assumed to be in first row, also
comma or tab delimited
 No need to identify string variables in comma or tab
delimited files
• The appropriate STATA command is insheet (import delimited replaces it from Stata 13 on)
• insheet using filename.csv, comma
• insheet using filename.txt, tab
• A utility program such as Stat/Transfer can be
used to read most data formats, including SPSS,
Excel, SAS, dBase, Access, etc.
Reading Delimited ASCII files
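Reading comma-delimited data can be sketched the same way. The block writes a small temporary CSV with variable names in the first row and reads it back with insheet, which is the command the slide uses and which newer releases still run although import delimited supersedes it. All names and values are invented.

```stata
* Sketch: write a small CSV, then read it with insheet.
tempname fh
tempfile csv
file open `fh' using "`csv'", write text
file write `fh' "name,income" _n
file write `fh' "Cairo,10" _n
file write `fh' "Giza,20" _n
file close `fh'

insheet using "`csv'", comma names clear   // first row supplies the variable names
describe                                   // name is a string, income is numeric
```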
• Fixed format ASCII files have no separators
between variables, but each variable always
appears in the same positions
 This is how data typically come from data
entry packages
 Two ways of doing it:
 Without data dictionary
infix rectyp 1-2 gov 3-4 qism 5-6 psu 7-9 urbrur 10 hhgov 11-14 hhpsu 15-16 using rec02.dat
 With data dictionary
Prepare dictionary file using text editor as
explained in handout
Reading Fixed Format ASCII files
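Reading by column position, without a dictionary, can be sketched on two invented records. The column layout below (gov in columns 1-2, qism in 3-4, psu in 5-7) is an assumption for this example and is not the layout of rec02.dat.

```stata
* Sketch: write two fixed-width records, then read them with infix.
tempname fh
tempfile fixed
file open `fh' using "`fixed'", write text
file write `fh' "0102003" _n
file write `fh' "1215120" _n
file close `fh'

clear
infix gov 1-2 qism 3-4 psu 5-7 using "`fixed'"   // variables are cut out by column
list
```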
Using STATA Graphs
• graph twoway: scatterplots, line plots, etc.
• graph matrix: scatterplot matrices
• graph bar: bar charts
• graph dot: dot charts
• graph box: box-and-whisker plots
• graph pie: pie charts
• histogram
• graph save
• graph use
• graph display
• graph combine
• graph export
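A few of these commands in sequence, as a sketch. It assumes the shipped auto.dta example data and a Stata installation that can draw graphs; example_scatter.gph is a hypothetical file name written to the working directory.

```stata
* Sketch: draw, save, and reload a graph.
sysuse auto, clear
twoway scatter price mpg                     // scatterplot of price against mpg
graph save "example_scatter.gph", replace    // store the graph in Stata format
histogram price                              // a second graph replaces the first on screen
graph use "example_scatter.gph"              // bring the saved scatterplot back
```

graph export can then write the current graph to a format such as PNG or PDF, depending on what the platform supports.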
Macros
• A macro is a shorthand—one thing standing for
another. For instance:
• local list "age weight sex"
• regress outcome `list'
is the same as
• regress outcome age weight sex
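• local or global?
What is the difference? Which one should I use?
A local macro exists only inside the do-file or program that defines it; a global macro stays defined for the whole session and is visible everywhere
Globals can get you into a mess
Better to stick with local macros rather than get in over
your head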
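The macro example can be made runnable by swapping in variables from the shipped auto.dta data, which is an assumption of this sketch. The macro names list and controls are illustrative. Run the lines together as one do-file, because a local macro disappears when the do-file that defined it ends.

```stata
* Sketch: the same variable list held in a local and in a global macro.
sysuse auto, clear
local list "mpg weight foreign"
regress price `list'            // same as: regress price mpg weight foreign

global controls "mpg weight"
regress price $controls         // a global is referenced with a dollar sign
macro drop controls             // remove the global so it does not linger in the session
```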
Thank you