Predictive Analytics 101:
An overview of how to create a dataset and
model to identify students at risk of attrition
Karen DeSantis
Senior Analyst
Office of Planning, Assessment and Institutional Research
Pace University
Pace’s Inaugural Retention Conference
June 16, 2017
Data Types and Sources
• Demographic
• Economic
• High school specific
• Pace specific
• Dates and deadlines
• Census
• Applications (Pace University and Financial Aid)
• Orientation
– BCSSE (Beginning College Survey of Student Engagement)
– Placement tests
• Historical data
Variables
• Demographic
– Gender, Age, Race, International, Underrepresented Minority
• Economic
– Financial Aid package, Tuition, Unmet need, Grants
• High school specific
– GPA, test scores (SAT, ACT, etc.)
– BCSSE responses, Placement data (from Orientation)
• Pace specific
– School, Campus, Residence, Major, CAP or Honors, Legacy, Athlete
• Dates and commitment
– Deposit Date, Attended orientation
• End of Semester Data:
– Starfish, Event attendance, End of semester GPA
Models
• Identified the dependent variable: whether a student will
leave the University
– One semester (Fall to Spring semesters) – only a small percentage leave
– One year (Fall to Fall semesters) – up to 25% leave
• Gathered historical data for the 2013, 2014, and 2015 first-year,
full-time cohorts
• Gathered data for the 2016 first-year, full-time cohort
• Data cleaning takes more time than you expect
– Variables may be missing
– Some students did not take BCSSE, SATs or complete FAFSA forms
– Recoding of variables into binary variables (0,1)
– Computing variables, such as financial aid, on a relative scale
rather than as absolute values
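
The cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used at Pace: the field names, the sample values, and the choice to express aid as a share of tuition are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical raw records; field names and values are invented for illustration.
students = [
    {"id": 1, "residence": "On campus", "sat": 1150, "aid": 18000, "tuition": 42000},
    {"id": 2, "residence": "Commuter", "sat": None, "aid": 30000, "tuition": 42000},
    {"id": 3, "residence": "On campus", "sat": 1290, "aid": None, "tuition": 42000},
]

def recode_binary(value: Optional[str], positive: str) -> Optional[int]:
    """Return 1 if value equals the positive category, 0 otherwise, None if missing."""
    if value is None:
        return None
    return 1 if value == positive else 0

def aid_share(aid: Optional[float], tuition: Optional[float]) -> Optional[float]:
    """Express financial aid as a share of tuition instead of a dollar amount."""
    if aid is None or not tuition:
        return None
    return aid / tuition

cleaned = []
for s in students:
    row = {
        "id": s["id"],
        "on_campus": recode_binary(s["residence"], "On campus"),
        "sat": s["sat"],
        "aid_share": aid_share(s["aid"], s["tuition"]),
    }
    # Flag records with any missing value so they can be reviewed before modeling.
    row["has_missing"] = any(v is None for v in row.values())
    cleaned.append(row)
```

Keeping missing values as None, rather than silently substituting 0, makes it visible how many students a later analysis would drop.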
Model – Variable selection
• Which variables correlated with the Dependent variable for
the historical data?
o SAT scores
o High School GPA
o Placement scores
o Undecided majors
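
One way to screen candidate variables is to compute each one's correlation with the 0/1 attrition outcome on the historical cohorts. The sketch below assumes a Pearson correlation (which equals the point-biserial correlation when the outcome is binary); the deck does not say which statistic was used, and the records are invented.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; with a 0/1 outcome this is the point-biserial correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented historical records: attrited is 1 if the student left, 0 if they returned.
historical = [
    {"hs_gpa": 3.8, "undecided": 0, "attrited": 0},
    {"hs_gpa": 3.5, "undecided": 0, "attrited": 0},
    {"hs_gpa": 3.1, "undecided": 1, "attrited": 0},
    {"hs_gpa": 2.9, "undecided": 0, "attrited": 1},
    {"hs_gpa": 2.6, "undecided": 1, "attrited": 1},
    {"hs_gpa": 2.4, "undecided": 1, "attrited": 1},
]

outcome = [r["attrited"] for r in historical]
correlations = {
    name: pearson([r[name] for r in historical], outcome)
    for name in ("hs_gpa", "undecided")
}
# Rank candidate predictors by the strength of their association with attrition.
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
```

In this toy data, high school GPA is negatively correlated with attrition and ranks ahead of the undecided-major flag.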
Analysis
• Binary Logistic analysis
– Binary selected because there are two outcomes: Return or Attrite
• Statistical package selected affects analysis
– SPSS requires all variables to have a value to include a case (student) in the
analysis
• If a case has one variable empty, it will not be included in the SPSS analysis
– Created a binary “Dataset” variable so the model was fit only on students
with a known Attrition value (2013 to 2015 cohorts) and then applied to the
2016 students, who do not yet have an Attrition value
• Saved Predicted values
– Analysis provided a predicted value for all students in the model
• Compared predicted values for each of the 2013 to 2015
cohorts to see how well the model fit with the students who
already left
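
The workflow on this slide can be approximated outside SPSS as follows. This is a sketch under stated assumptions: it fits a binary logistic model with plain gradient descent on the log-loss for a fixed number of steps, which is not SPSS's estimation routine, and the records, predictor names, learning rate, and step count are invented. It drops cases with a missing predictor (mimicking listwise deletion), fits on the rows flagged as historical, and saves a predicted value for every usable case.

```python
import math

# Invented records. dataset = 1 marks historical cohorts with a known outcome;
# dataset = 0 marks the current cohort, whose outcome is not yet known.
records = [
    {"id": 1, "dataset": 1, "hs_gpa": 3.8, "undecided": 0, "attrited": 0},
    {"id": 2, "dataset": 1, "hs_gpa": 3.4, "undecided": 1, "attrited": 0},
    {"id": 3, "dataset": 1, "hs_gpa": 2.5, "undecided": 0, "attrited": 0},
    {"id": 4, "dataset": 1, "hs_gpa": 3.3, "undecided": 1, "attrited": 1},
    {"id": 5, "dataset": 1, "hs_gpa": 2.6, "undecided": 1, "attrited": 1},
    {"id": 6, "dataset": 1, "hs_gpa": 2.7, "undecided": 0, "attrited": 1},
    {"id": 7, "dataset": 1, "hs_gpa": None, "undecided": 1, "attrited": 1},
    {"id": 8, "dataset": 0, "hs_gpa": 3.7, "undecided": 0, "attrited": None},
    {"id": 9, "dataset": 0, "hs_gpa": 2.5, "undecided": 1, "attrited": None},
]
PREDICTORS = ("hs_gpa", "undecided")
LEARNING_RATE = 0.1   # hypothetical tuning values for this toy example
STEPS = 5000

def complete(r):
    """Listwise deletion: keep a case only if every predictor has a value."""
    return all(r[p] is not None for p in PREDICTORS)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def features(r):
    return [1.0] + [float(r[p]) for p in PREDICTORS]  # leading 1.0 is the intercept

def score(weights, r):
    """Predicted probability of attrition for one record."""
    return sigmoid(sum(w * v for w, v in zip(weights, features(r))))

usable = [r for r in records if complete(r)]
train = [r for r in usable if r["dataset"] == 1]  # only rows with a known outcome

weights = [0.0] * (len(PREDICTORS) + 1)
for _ in range(STEPS):
    grad = [0.0] * len(weights)
    for r in train:
        error = score(weights, r) - r["attrited"]
        for j, v in enumerate(features(r)):
            grad[j] += error * v
    weights = [w - LEARNING_RATE * g / len(train) for w, g in zip(weights, grad)]

# Save a predicted value for every usable case, historical and current alike.
predicted = {r["id"]: score(weights, r) for r in usable}
```

Student 7 receives no predicted value because a predictor is missing, which is the behavior the slide warns about.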
Lists of Students
• Students with the highest predicted value for attrition were
identified for the 2016 cohort
• List of top 500 students was isolated and shared with the
Division of Student Success
• Using financial aid variables as well as the predicted attrition
variable, identified students who had the highest financial need
within the 2016 cohort
• List of top 500 students with highest financial need shared
with Financial Aid
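
Building the two lists amounts to sorting and truncating. The sketch below uses invented values and a list size of 2 instead of 500. The deck does not say how the financial aid variables and the predicted attrition value were combined, so the risk cutoff shown here is only one possible reading and is labeled as hypothetical.

```python
# Invented predicted attrition probabilities and unmet need (in dollars).
cohort = [
    {"id": 101, "predicted": 0.62, "unmet_need": 4000},
    {"id": 102, "predicted": 0.15, "unmet_need": 12000},
    {"id": 103, "predicted": 0.81, "unmet_need": 9000},
    {"id": 104, "predicted": 0.47, "unmet_need": 15000},
    {"id": 105, "predicted": 0.08, "unmet_need": 0},
]

def top_n(rows, key, n):
    """Return the ids of the n rows with the largest value of key."""
    ranked = sorted(rows, key=lambda r: r[key], reverse=True)
    return [r["id"] for r in ranked[:n]]

TOP_N = 2  # the deck used 500; a small value keeps the toy example readable

# List for student-success outreach: highest predicted attrition.
outreach_list = top_n(cohort, "predicted", TOP_N)

# Hypothetical rule for the financial aid list: keep students whose predicted
# attrition is at or above a cutoff, then rank them by unmet need.
RISK_CUTOFF = 0.4
at_risk = [r for r in cohort if r["predicted"] >= RISK_CUTOFF]
financial_aid_list = top_n(at_risk, "unmet_need", TOP_N)
```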
Assessment of Model
• Identify 2016 cohort students who attrite from Fall to Spring
• Assessed the identified students' predicted scores from the two
models
• Identifying top predicted students in each cohort year and
comparing attrition rates for the two models
• Comparing attrition rates of the top predicted students from 2016
with those of the top predicted students from previous years
• Future: After Fall 2017 census, compare attrition of 2016
students who were contacted with attrition of the whole
class.
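
A simple check of this kind compares the observed attrition rate among the top predicted students with the rate for the whole cohort. The example below uses invented predictions and outcomes; the ratio of the two rates (called lift here, a term not used in the deck) shows how much more concentrated attrition is in the flagged group.

```python
# Invented historical cohort: predicted probability and observed outcome (1 = left).
cohort = [
    {"predicted": 0.90, "attrited": 1},
    {"predicted": 0.75, "attrited": 1},
    {"predicted": 0.60, "attrited": 0},
    {"predicted": 0.40, "attrited": 1},
    {"predicted": 0.30, "attrited": 0},
    {"predicted": 0.20, "attrited": 0},
    {"predicted": 0.10, "attrited": 0},
    {"predicted": 0.05, "attrited": 0},
]

def attrition_rate(rows):
    """Share of students in rows who left."""
    return sum(r["attrited"] for r in rows) / len(rows)

def top_group_rate(rows, n):
    """Attrition rate among the n students with the highest predicted value."""
    top = sorted(rows, key=lambda r: r["predicted"], reverse=True)[:n]
    return attrition_rate(top)

overall = attrition_rate(cohort)       # 3 of 8 students left: 0.375
top_rate = top_group_rate(cohort, 4)   # 3 of the top 4 left: 0.75
lift = top_rate / overall              # 2.0
```

Running the same comparison for each cohort year, and for both the one-semester and one-year models, gives a like-for-like view of how well each model concentrates at-risk students in its top group.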
Outreach Feedback
• Feedback from DSS and Financial Aid
– How many students were actually contacted?
• What were their difficulties contacting some students?
– Comments and suggestions by those who performed the outreach
• Were students already on advisors'/counselors' radar?
– How outreach was performed and by whom
– What outcomes happened after DSS outreach?
– Did FA outreach result in additional financial aid awards for the
following year?
Next steps
• Remove 2013 data from analysis
– BCSSE data is more complete beginning with the 2014 cohort when it was
included in orientation
• Plans for Fall 2017 cohort
Additional Ideas
• What new variables can we add to the model?
– Grades from math courses or the first course in the major
– Blackboard engagement
Concerns?
Suggestions?
Questions?
Thank you
Karen DeSantis
kdesantis@pace.edu
