Data Manipulation: Transforming, Recoding, and Splitting/Grouping
Lecture Overview
• Transformation (computing) of data
• Recoding
• Splitting/Grouping
Transformation (computing) of Data
• SPSS has powerful capabilities for creating new variables as functions of existing variables, for example:
  • to create averages of existing variables
  • to rescale existing variables
  • to compute difference scores by subtracting one variable from another
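In command syntax, each of these uses is a single COMPUTE statement. A minimal sketch: score1 to score3, pretest, and posttest are hypothetical variable names, and salary is assumed to hold a salary in dollars.

```spss
* Variable names in this sketch are hypothetical.
* Average of existing variables (MEAN ignores missing arguments).
COMPUTE score_avg = MEAN(score1, score2, score3).
* Rescale an existing variable, here dollars to thousands of dollars.
COMPUTE salary_k = salary / 1000.
* Difference score: one variable subtracted from another.
COMPUTE gain = posttest - pretest.
EXECUTE.
```

Unlike (score1 + score2 + score3) / 3, which is system-missing whenever any score is missing, MEAN averages whichever values are present; MEAN.3(score1, score2, score3) would require all three.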
Computing a New Variable
In the Compute Variable dialog (Transform -> Compute Variable...):
• Target Variable: the name of the new variable.
• Numeric Expression: the formula that defines the new variable.
• Every operand in the Numeric Expression must be either an existing variable or a number.
• Type & Label: set the type and label of the new variable.
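The dialog fields map onto syntax roughly as follows. A sketch of the salary-change example, assuming salary and salbegin are the current and beginning salary variables (names assumed from the SPSS Employee data sample file) and salchange is a hypothetical target name:

```spss
* Assumed names: salary, salbegin. Hypothetical target: salchange.
* Target Variable = Numeric Expression.
COMPUTE salchange = salary - salbegin.
* Label and display format for the new variable.
VARIABLE LABELS salchange 'Salary change since hire'.
FORMATS salchange (F10.2).
EXECUTE.
```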
Conditionally Computing a New Variable
• Variables can also be computed conditionally. For instance, in the example above you might be interested only in the change in salary for people who began working for the company within the last six years.
• If... button: sets an optional case selection condition, so the expression is computed only for cases that meet it.
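The If... condition becomes an IF command. A sketch assuming jobtime counts months since hire, so "the last six years" is taken as 72 months; salary and salbegin are assumed names as before, and salchange6 is a hypothetical target:

```spss
* Assumption: jobtime is months since hire, so six years = 72 months.
* Cases that fail the condition stay system-missing in the new variable.
IF (jobtime <= 72) salchange6 = salary - salbegin.
EXECUTE.
```

A new target name is used on purpose: if the target already exists, IF leaves its values unchanged for cases that fail the condition, so old and new results would be mixed.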
Transforming an Existing Variable
• For example, the variable jobtime records months of experience on the job, but we may wish to analyze the data in years on the job (jobtime / 12).
• For the Target Variable, give a new variable name, or keep the existing name to overwrite the original values.
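A sketch of the months-to-years conversion; jobyears and jobyears_full are hypothetical names:

```spss
* Hypothetical targets: jobyears, jobyears_full.
* Years on the job, possibly fractional.
COMPUTE jobyears = jobtime / 12.
* Completed years only.
COMPUTE jobyears_full = TRUNC(jobtime / 12).
EXECUTE.
```

Writing COMPUTE jobtime = jobtime / 12. instead keeps the existing name but overwrites the months, so the variable label should then be updated as well.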
Recoding Variables
• Recoding is another way to modify the values of existing variables in your dataset. In the Data Editor: Transform -> Recode
• Into Same Variables: changes the values of the existing variables.
• Into Different Variables: creates a new variable with the recoded values, so your original data are not overwritten.
• The two options work the same way, except that recoding into a different variable requires you to supply a new variable name.
Recoding Variables
Categories to Categories
For example, the variable jobcat codes an employee's status in three categories, but for a particular analysis you may want to combine two of these categories into one. The original codes are mapped as follows:
• Clerical = 1 -> 1
• Custodial = 2 -> 2
• Manager = 3 -> 2
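In syntax, both recode options are a RECODE command; recoding into a different variable just adds INTO and the new name. A sketch of the jobcat example; jobcat2 is a hypothetical name, and the two forms are alternatives, not consecutive steps:

```spss
* Hypothetical target: jobcat2.
* Into Different Variables: jobcat is kept, jobcat2 holds the new codes.
RECODE jobcat (1=1) (2,3=2) INTO jobcat2.
VALUE LABELS jobcat2 1 'Clerical' 2 'Custodial or Manager'.
EXECUTE.
* Into Same Variables alternative: the codes in jobcat are overwritten.
* RECODE jobcat (3=2).
* EXECUTE.
```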
Recoding Variables
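• Select the input variable -> type the output variable name -> click the Change button -> define the old and new values (Old and New Values...).

jobcat
Old          New
3            2
2            2
1            1
0 (missing)  ?

(Screenshots: the Old and New Values dialog with System-missing selected and without System-missing selected.)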
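The answer for code 0 depends on how missing values are mapped. When recoding into a new variable, any value not named in a specification becomes system-missing in that variable, and the System-missing option in the dialog corresponds to the SYSMIS keyword. A sketch of two explicit choices; jobcat_sys and jobcat_um are hypothetical names:

```spss
* Hypothetical targets: jobcat_sys, jobcat_um.
* Choice 1: turn the missing code 0 into system-missing.
RECODE jobcat (0=SYSMIS) (1=1) (2,3=2) INTO jobcat_sys.
* Choice 2: keep 0 and declare it user-missing in the new variable.
RECODE jobcat (0=0) (1=1) (2,3=2) INTO jobcat_um.
MISSING VALUES jobcat_um (0).
EXECUTE.
```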
Recoding Variables
Numeric to Categories
• Recode a continuous variable into a categorical variable, which is useful in cross-tabulation.
• Group ranges of the variable's values into categories.
For example, suppose we need to code an employee's current salary into two categories:
• Less than or equal to $27,000 (Low) -> 1
• More than $27,000 (High) -> 2
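A sketch of this recode, assuming salary holds current salary in whole dollars (name assumed from the sample file) and salcat is a hypothetical target:

```spss
* Assumed name: salary. Hypothetical target: salcat.
* RECODE uses the first specification that matches a value.
* So a salary of exactly 27000 is coded 1 (Low).
RECODE salary (LOWEST THRU 27000=1) (27000 THRU HIGHEST=2) INTO salcat.
VALUE LABELS salcat 1 'Low' 2 'High'.
EXECUTE.
```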
Recoding Variables
Numeric to Categories
• What if there are more than two groups?
• Where does the value 27,000 go?
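For more than two groups, add one range per group. In the dialog, LOWEST THRU and THRU HIGHEST correspond to the "Range, LOWEST through value" and "Range, value through HIGHEST" options. The cut points 40000 and salband below are hypothetical, chosen only for illustration:

```spss
* Hypothetical cut point 40000 and hypothetical target salband.
* Overlapping boundaries go to the lower band because the first match wins.
RECODE salary (LOWEST THRU 27000=1) (27000 THRU 40000=2) (40000 THRU HIGHEST=3) INTO salband.
VALUE LABELS salband 1 'Low' 2 'Middle' 3 'High'.
EXECUTE.
```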
Splitting/Grouping
• In some situations, you may want to perform the same analysis on different groups within the same dataset.
• Such analyses can be run by first selecting the Split File function from the Data menu in the Data Editor: Data -> Split File...
Splitting/Grouping
• The split-file setting remains in effect until you turn it off, so reset this option (Analyze all cases, do not create groups) when you no longer want a split-file analysis.
• If you select the option "Sort the file by grouping variables," SPSS runs the SORT CASES command in the background.
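A sketch of the whole cycle in syntax, assuming gender is the grouping variable and salary the analysis variable, as in the output that follows: sort, split, analyze, then switch the split off.

```spss
* Assumed names: gender (grouping) and salary (analysis variable).
* Sort so that each group forms one contiguous block of cases.
SORT CASES BY gender.
* SEPARATE gives one table per group; LAYERED puts all groups in one table.
SPLIT FILE SEPARATE BY gender.
DESCRIPTIVES VARIABLES=salary /STATISTICS=MEAN STDDEV MIN MAX.
* Reset: otherwise every later analysis is still split by gender.
SPLIT FILE OFF.
```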
Splitting/Grouping
• Compare groups and Organize output by groups produce the same values in the output, whatever analysis is run; they differ only in how the output is presented.
• Compare groups

Descriptive Statistics
Gender                      N    Minimum  Maximum  Mean      Std. Deviation
Female  Current Salary      216  15750    58125    26031.92  7558.021
        Valid N (listwise)  216
Male    Current Salary      258  19650    135000   41441.78  19499.214
        Valid N (listwise)  258

SORT CASES BY gender.
SPLIT FILE LAYERED BY gender.
Splitting/Grouping
• Organize output by groups

Gender = Female
Descriptive Statistics(a)
                    N    Minimum  Maximum  Mean      Std. Deviation
Current Salary      216  15750    58125    26031.92  7558.021
Valid N (listwise)  216
a. Gender = Female

Gender = Male
Descriptive Statistics(a)
                    N    Minimum  Maximum  Mean      Std. Deviation
Current Salary      258  19650    135000   41441.78  19499.214
Valid N (listwise)  258
a. Gender = Male

SORT CASES BY gender.
SPLIT FILE SEPARATE BY gender.