The Analytics & Data Science Landscape
Philip E. Bourne
peb6a@virginia.edu
Analytics Challenges in Modern Tax Administration
November 16, 2020
Disclaimer
• I pay taxes but typically do not get it right
• My PhD is actually in physical chemistry
• I did work for the NIH for 3 years
On a Positive Note
• I have been working with “big” data for many years
• As Dean I am very interested in mapping the capabilities of our students to the needs of the workplace
• As a researcher I am concerned that the research our school undertakes is for societal benefit
This moment in time….
What of the future?
One view is the 6D’s
[Figure: the 6D’s unfolding over time as volume, velocity and variety grow: Digitization, Deception, Disruption, Demonetization, Dematerialization, Democratization]
Example - photography (from a presentation to the Advisory Board to the NIH Director):
• Digital camera invented by Kodak but shelved
• Megapixels & quality improve slowly; Kodak slow to react
• Film market collapses; Kodak goes bankrupt
• Phones replace cameras
• Instagram, Flickr become the value proposition
• Digital media becomes a bona fide form of communication
Everything is Digital Data to be Analyzed…
Play the data science game: pick an object/subject and you will immediately see a reason why data science is important…
If I were a tie maker I would be undertaking a data science analysis right now…
Large collection of random images with metadata before and during the pandemic
Who is still wearing ties?
• Age
• Profession
• Ethnicity
• Socioeconomic status
• Location
• …
Causality – Does the pandemic represent a shift in tie wearing? If so, by how much?
Prediction – What will the market be post-COVID?
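One way to start on the causality question is to compare the share of tie-wearers before and during the pandemic. The sketch below assumes each image has already been labeled (by a person or a classifier) with a period and a tie/no-tie flag; the labels and counts are hypothetical. A pooled two-proportion z statistic indicates whether the shift is larger than sampling noise would suggest. On its own it does not establish causation, since confounders such as profession and location (listed above) would still need to be controlled for.

```python
import math
from collections import Counter


def tie_rates(records):
    """Count (tie-wearers, total images) per period from (period, wears_tie) pairs."""
    ties, totals = Counter(), Counter()
    for period, wears_tie in records:
        totals[period] += 1
        ties[period] += int(wears_tie)
    return {period: (ties[period], totals[period]) for period in totals}


def two_proportion_z(x1, n1, x2, n2):
    """Return the change p2 - p1 and its pooled two-proportion z statistic.

    The standard error is zero (division error) if nobody, or everybody,
    wears a tie in both periods combined.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return p2 - p1, (p2 - p1) / se


# Hypothetical labels: 1 = tie visible in the image, 0 = no tie.
records = ([("before", 1)] * 30 + [("before", 0)] * 70
           + [("during", 1)] * 10 + [("during", 0)] * 90)

rates = tie_rates(records)
diff, z = two_proportion_z(*rates["before"], *rates["during"])
print(f"change in tie-wearing rate: {diff:+.2f} (z = {z:.2f})")
```

With these made-up counts (30% before, 10% during) the script reports a change of -0.20 with z of about -3.54.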
https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist)
https://www.microsoft.com/en-us/research/wp-content/uploads/2009/10/Fourth_Paradigm.pdf
https://twitter.com/aip_publishing/status/856825353645559808
This is a paradigm shift…
A Paradigm Shift Reflected in the Workforce
Increased Demand over the Past Five Years
• Artificial Intelligence specialists: 74%. Top industries hiring this talent: computer software, internet, information technology and services, higher education, consumer electronics
• Data Scientist: 37%. Top industries hiring this talent: information technology and services, computer software, internet, financial services, higher education
• Data Engineer: 33%. Top industries hiring this talent: information technology and services, internet, computer software, financial services, hospital and healthcare
How is Academia Responding?
Every University has Some Initiative
Workforce Demand Outweighs Supply – A Problem for the IRS?
The Rising Demand for Data Scientists
UVA School of Data Science Graduate Job Placement*
• 2019: 100%
• 2018: 100%
• 2017: 100%
• 2016: 98%
• 2015: 97%
*for graduates seeking employment
Roles
Machine Learning Engineer, Director of Data Science, Deep Learning Research Scientist, Senior Data Analyst, Data Science Developer, Consultant, Product Data Analyst, Financial Engineer, Engagement Manager & more
Industries
● Finance
● Government
● Healthcare & Medicine
● Professional Sports
● Commerce
● Media
● Higher Ed
● Technology
Recent Poll of Machine Learning PhD Students
A New School for a New Century
A School Without Walls
Mission
To be a national and international leader in responsible data science emphasizing interdisciplinary collaboration which results in furthering discovery, sharing knowledge, and societal benefit
Our Working Definition
• Use of the ever-increasing amount of open, complex, diverse digital data, frequently in ubiquitous cloud environments
• Finding ways to ask and then answer relevant questions by combining such diverse data sets
• Arriving at statistically significant conclusions not otherwise obtainable
• Sharing such findings in a useful way
• Translating such findings into actions that improve the human condition
Use Case – Data Integration
Researcher and Assistant Professor of Medicine Dr. Thomas Hartka, also a current online Master's in Data Science student, is combining two disparate data sets, electronic health records and DMV crash data, to save lives after motor vehicle crashes.
“I enrolled in the MSDS program to expand my research on automotive safety. I have already used techniques from classes in my work. I hope to expand my research to real-time analytics to improve emergency room care.”
– Dr. Thomas Hartka, UVA School of Medicine
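The core step in a project like this is record linkage: deciding which crash report and which hospital record describe the same event. The minimal sketch below uses deterministic linkage on a composite key. The field names (`date`, `county`, `age`, `speed_mph`, `injury_score`), the function name and the records are all hypothetical; real EHR/DMV linkage typically needs probabilistic matching to cope with typos and missing values.

```python
from collections import defaultdict


def link_records(crashes, ehr_visits, keys=("date", "county", "age")):
    """Deterministic linkage: pair crash reports with ED visits that agree on every key field."""
    # Index the hospital visits by their composite key (a hash join).
    index = defaultdict(list)
    for visit in ehr_visits:
        index[tuple(visit[k] for k in keys)].append(visit)

    # Look up each crash report and merge the fields of every matching visit.
    linked = []
    for crash in crashes:
        for visit in index.get(tuple(crash[k] for k in keys), []):
            linked.append({**crash, **visit})
    return linked


# Hypothetical records from the two sources.
crashes = [
    {"crash_id": "C1", "date": "2020-03-02", "county": "Albemarle", "age": 34, "speed_mph": 55},
    {"crash_id": "C2", "date": "2020-03-05", "county": "Greene", "age": 61, "speed_mph": 40},
]
ehr_visits = [
    {"visit_id": "V9", "date": "2020-03-02", "county": "Albemarle", "age": 34, "injury_score": 12},
    {"visit_id": "V7", "date": "2020-03-06", "county": "Greene", "age": 61, "injury_score": 4},
]

for row in link_records(crashes, ehr_visits):
    print(row["crash_id"], row["visit_id"], row["speed_mph"], row["injury_score"])
```

Only C1 and V9 are linked here; C2 and V7 differ by one day, which is exactly the kind of near-miss that motivates fuzzier matching rules.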
Guiding Principles
• Excellence
• Integrity
• Diversity
• Openness and transparency
• FAIR data - the ability to Find, Access, Interoperate and Reuse data
• Innovation
• For the social good
• Data/code as first-class citizens – part of promotion and tenure
http://www.ncbi.nlm.nih.gov/pubmed/26207759
Why FAIR? [Adapted from Carole Goble]
Only 12% of data from research is preserved
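To make the four FAIR properties concrete, the sketch below checks a dataset's metadata record for fields that support each one. The field names and their assignment to principles are a simplified, hypothetical checklist for illustration, not the official FAIR metrics.

```python
# Hypothetical checklist: which metadata fields support each FAIR principle.
FAIR_CHECKS = {
    "Findable": ("identifier", "title", "keywords"),
    "Accessible": ("access_url",),
    "Interoperable": ("format", "vocabulary"),
    "Reusable": ("license", "provenance"),
}


def fair_gaps(record):
    """Return {principle: [missing fields]}, omitting principles that are fully covered."""
    gaps = {}
    for principle, fields in FAIR_CHECKS.items():
        missing = [field for field in fields if not record.get(field)]
        if missing:
            gaps[principle] = missing
    return gaps


# Hypothetical metadata record for a research dataset.
dataset = {
    "identifier": "example-dataset-001",
    "title": "Example survey responses",
    "keywords": ["survey", "example"],
    "access_url": "https://example.org/data/survey.csv",
    "format": "text/csv",
    "license": "CC-BY-4.0",
}

print(fair_gaps(dataset))
```

This record is findable and accessible but lacks a controlled vocabulary and provenance, so it is flagged under Interoperable and Reusable.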
Infrastructure
Commons - Platform Stack
https://datascience.nih.gov/commons
• Compute Platform: Cloud or HPC
• Services: APIs, Containers, Indexing
• Software: Services & Tools (scientific analysis tools/workflows)
• Data: “Reference” Data Sets, User defined data
• Digital Object Compliance
• App store/User Interface
Why not more like AirBnB?
https://doi.org/10.1371/journal.pbio.2001818
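The indexing service and the digital object compliance layer both depend on giving every data object a stable identifier that can be resolved to wherever its bytes live. The toy index below assumes content-derived identifiers (a SHA-256 checksum of the data), which is one common design. The class, its method names and the storage location strings are hypothetical, not an NIH Commons API.

```python
import hashlib


class DigitalObjectIndex:
    """Toy in-memory index mapping checksum-based IDs to object metadata."""

    def __init__(self):
        self._objects = {}

    @staticmethod
    def make_id(data: bytes) -> str:
        # The identifier is derived from the content itself.
        return "sha256:" + hashlib.sha256(data).hexdigest()

    def register(self, data: bytes, location: str, kind: str) -> str:
        """Record one storage location for the object; identical bytes share one entry."""
        object_id = self.make_id(data)
        entry = self._objects.setdefault(
            object_id, {"kind": kind, "size": len(data), "locations": []}
        )
        if location not in entry["locations"]:
            entry["locations"].append(location)
        return object_id

    def resolve(self, object_id: str) -> dict:
        return self._objects[object_id]

    def verify(self, object_id: str, data: bytes) -> bool:
        # True only if the bytes still match the identifier they were registered under.
        return self.make_id(data) == object_id


index = DigitalObjectIndex()
data = b"gene,count\nTP53,12\n"
ref_id = index.register(data, "s3://hypothetical-bucket/ref.csv", kind="reference")
same_id = index.register(data, "gs://hypothetical-mirror/ref.csv", kind="reference")
print(ref_id == same_id, index.resolve(ref_id)["locations"])
```

Because identical bytes always receive the same identifier, copies held on different clouds collapse into one entry with several locations, and `verify` detects any change to the content.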
How and What are We Teaching …
The 4+1 Model
The model is based on the core insight that all definitions of data science assume a pipeline and that this pipeline forms a parallel process
[From Raf Alvarado]
Our Representation of Data Science
The 4+1 Model
• Value – assuring societal benefit
• Design – communication of the value of data
• Systems – the means to communicate and convey benefit
• Analytics – models and methods
• Practice – where everything happens
[From Raf Alvarado]
The 4+1 Model Interplay
[From Raf Alvarado]
• Value + Design = openness, responsibility
• Value + Analytics = human-centered AI, algorithmic bias
• Value + Systems = sustainability, access, environmental impact
• Design + Analytics = literate programming, visualization
• Design + Systems = dashboards, engineering design
• Analytics + Systems = ML engineering
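One of the simplest quantities behind the "algorithmic bias" pairing is the demographic parity difference: the gap between the highest and lowest rate of positive predictions across groups. The sketch below computes it from hypothetical predictions and group labels. It is only one of several fairness metrics (others, such as equalized odds, also condition on the true outcome), and a small gap does not by itself show that a model is fair.

```python
def selection_rates(predictions, groups):
    """Fraction of positive (1) predictions within each group."""
    counts, positives = {}, {}
    for pred, group in zip(predictions, groups):
        counts[group] = counts.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + int(pred)
    return {group: positives[group] / counts[group] for group in counts}


def demographic_parity_difference(predictions, groups):
    """Largest minus smallest group selection rate (0 means equal rates)."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical model outputs (1 = approved) and group membership.
preds = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(selection_rates(preds, groups))
print(demographic_parity_difference(preds, groups))
```

Group A is approved 75% of the time and group B 25%, giving a gap of 0.5 that would warrant a closer look at the model and its training data.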
The 4+1 Model
• Integration: Practice of DS, Capstones
• Analytics: Linear Models, Data Mining, Bayesian ML, Deep Learning, Text Analytics, Foundations of CS
• Systems: Programming and Systems, Big Data Analytics
• Value: Ethics of Big Data
• Design: Practice of DS, Visualization
We strive to build a curriculum that aligns with our model
Distinctive Features
• Foundational topics in analytics, from linear models to data mining and machine learning
• An integrated curriculum developed in consultation with practicing data scientists that incorporates a challenging capstone experience
• Applications and data drawn from different disciplines, e.g., science, business, and health
• Instruction in best practices for the management and conduct of data science projects
• Computational methods built on the latest techniques in R and Python
• Required course in data ethics
• Emphasis on team science: data science is a team sport
Analytics and Machine Learning
STAT 6021 Linear Models - Multiple linear regression, logistic regression. (R)
SYS 6018 Data Mining - Tree-based methods, kernel methods, unsupervised learning. Uses An Introduction to Statistical Learning by James, Witten, Hastie and Tibshirani. (R)
SYS 6014 Bayesian Machine Learning - Methods to handle uncertainty and to apply per-variable weight distributions (as opposed to a single optimal value). (Python)
SYS 6016 Machine Learning - Focuses on neural networks, including deep learning, convolutional neural networks, recurrent neural networks, and autoencoders. (Python)
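The contrast drawn in SYS 6014, a distribution over each weight versus a single optimal value, shows up in the simplest Bayesian model. The sketch below assumes linear regression with a known noise variance and a zero-mean Gaussian prior on the weights, for which the posterior is Gaussian in closed form. The data are simulated, and the prior and noise variances are illustrative choices rather than anything taken from the course.

```python
import numpy as np


def ols(X, y):
    """Least-squares fit: one optimal weight vector, no uncertainty."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def bayesian_linear_regression(X, y, noise_var=1.0, prior_var=10.0):
    """Gaussian posterior over w for y = X w + eps.

    Assumes eps ~ N(0, noise_var) and prior w ~ N(0, prior_var * I).
    Posterior precision = X^T X / noise_var + I / prior_var,
    posterior mean = covariance @ X^T y / noise_var.
    """
    d = X.shape[1]
    precision = X.T @ X / noise_var + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / noise_var
    return mean, cov


# Simulated data: intercept 1.0, slope 2.0, unit-variance noise.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
true_w = np.array([1.0, 2.0])
y = X @ true_w + rng.normal(scale=1.0, size=200)

w_hat = ols(X, y)
mean, cov = bayesian_linear_regression(X, y)
std = np.sqrt(np.diag(cov))
for name, w, m, s in zip(["intercept", "slope"], w_hat, mean, std):
    print(f"{name}: OLS {w:.2f}, posterior {m:.2f} +/- {s:.2f}")
```

With 200 observations and a weak prior, the posterior mean nearly coincides with the OLS estimate. What the Bayesian version adds is the posterior standard deviation, which quantifies how uncertain each weight still is and shrinks roughly as 1/√n.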
Computer Science
CS 5010 Programming and Systems for Data Science -- Python, Pandas, data analysis at scale and in context, some development practices.
CS 5021 Foundations of CS -- Data structures, algorithms, complexity, relational and NoSQL databases; "CS in a box."
Data Ethics
Virtuous cycle between {computer science, statistics, applied mathematics} and the humanities
Exemplified with use cases
It’s a culture, not a tick of the box
Practice and Application of Data Science
DS 6001 and 6003 focus on data design
Flow of data between Human and Machine domains
H → M: Establishing data so that it can be analyzed
M → H: Presenting results of analysis to the world
6001: Data engineering pipeline -- acquiring, cleaning, exploring
6003: Data product development -- presenting, visualizing, app development
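The H → M and M → H flow can be pictured as a chain of small functions, one per stage. In the sketch below the first three stages (acquire, clean, explore) correspond to the data engineering pipeline of DS 6001 and the last (present) to the data product side of DS 6003. The county income data and the function names are hypothetical, and the mapping to the courses is illustrative rather than course material.

```python
import csv
import io
import statistics

# Hypothetical raw input; one row is missing its income value.
RAW = """county,income
Albemarle,72000
Greene,
Louisa,61000
Nelson,58000
"""


def acquire(text):
    """H -> M: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Drop rows with a missing income and convert income to int."""
    cleaned = []
    for row in rows:
        income = (row.get("income") or "").strip()
        if income:
            cleaned.append({"county": row["county"], "income": int(income)})
    return cleaned


def explore(rows):
    """Compute simple summary statistics."""
    incomes = [row["income"] for row in rows]
    return {"n": len(incomes), "mean": statistics.mean(incomes), "median": statistics.median(incomes)}


def present(summary):
    """M -> H: turn the summary into a human-readable sentence."""
    return (f"{summary['n']} counties, mean income ${summary['mean']:,.0f}, "
            f"median ${summary['median']:,.0f}")


print(present(explore(clean(acquire(RAW)))))
```

Keeping each stage a separate function means a stage can be tested or swapped (for example, a chart instead of a sentence in `present`) without touching the others.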
Electives
CS 6160 Theory of Computation
CS 6444 Parallel Computing
CS 6501 Text Mining
CS 6750 Database Systems
DS 5001 Exploratory Text Analytics
DS 6559 Biomedical Cloud Computing Seminar
SARC 5400 Data Visualization
STAT 6250 Longitudinal Data Analysis
STAT 6260 Categorical Data Analysis
SYS 6023 Cognitive Systems Engineering
SYS 6050 Risk Analysis
SYS 6582 Reinforcement Learning
SYS 7001 System and Decision Sciences
Capstones
A parallel and culminating experience that focuses on a real-world data science problem
Emphasizes problem definition and scoping
Employs project management
Involves developing, evaluating, and creating a data product for a client
Requires presentations, a proposal and a published paper (IEEE)
Teams of students work on separate projects under the guidance of an advisor
Furthering Discovery to Build a Better World
RESEARCH
Cybersecurity
Detecting broad-spectrum cyber threats almost immediately after they are launched, through a $7.6 million Defense Advanced Research Projects Agency (DARPA) grant.
Environment
Using NASA data collected aboard the International Space Station to examine climate change in the Shenandoah National Forest and beyond, and to find solutions.
Health & Medicine
Securing high-performance computing equipment and personnel to allow collaboration across the university on brain science research on autism, Alzheimer’s, mental health disorders, traumatic brain injuries and more.
Business
Discovering what makes a job interview successful for the candidate and the recruiter, and how to mitigate bias in the recruiting process.
Democracy
Investigating how terrorist groups recruit women through propaganda and examining risk and threat assessment for extremist violence perpetrated by women.
Education
Helping economically disadvantaged, underrepresented populations pursue tailored educational workforce pathways that have a higher probability of leading them to success.
Applying Data Science Across Industries
“To tackle challenges in science and medicine.” – Elizabeth Driskell, MSDS ’20
“To inform public policy and government.” – Bradley Katcher, MSDS ’20
“I want to use data science to find a new way of thinking.” – Alex Gromadzki, MSDS ’21
“I want to use data science to solve complex business problems.” – Ruslan Askerov, MSDS ’21
“To address poverty and income inequality.” – Arti Patel, MSDS ’20
Growing the School
[Figure: timeline of planned growth, with team size (FTEs) on an axis marked 5, 40, 60, 80, 120]
• 2020: M.S. in Data Science, residential & online
• 2020-2023: undergraduate courses increase to 18 courses per AY
• 2021: Ph.D. program
• 2023: undergraduate major
• Building occupied
• Exec. Ed.
SDS and IRS - Actions
• Workforce pipeline - awareness
• Continuing Ed opportunities
• Provision of synthetic data
• Funded and collaborative research
• Faculty, Capstone, Presidential Fellowship, PhD Internships
• IRS Admits – MSDS, PhD
• Join the corporate commons
• ….
QUESTIONS?
peb6a@virginia.edu
@pebourne
SDS Faculty Research
Data science faculty members and affiliated faculty: website and research interests
• Nada Basit (https://engineering.virginia.edu/faculty/nada-basit): Machine Learning, Bioinformatics, Data Mining, Pattern Recognition
• Phil Bourne (https://engineering.virginia.edu/faculty/philip-e-bourne): Multiscale Modeling Using Data Science Techniques; Early Stage Drug Discovery and Drug Repurposing; Early Stage Drug Methods and Tools for Macromolecular…
• Don Brown (https://engineering.virginia.edu/faculty/donald-e-brown-phd): Data Fusion, Knowledge Discovery, and Simulation Optimization
• Sallie Keller (https://biocomplexity.virginia.edu/sallie-keller): social and decision informatics, statistical underpinnings of data science, and data access and confidentiality
• Daniel Mietchen (https://tools.wmflabs.org/scholia/author/Q20895785): Computational Biology, Biodiversity; integrating research workflows with the World Wide Web through open licensing, open standards, and open collaboration via…
• Rafael Alvarado (http://transducer.ontoligent.com/): Cultural Analytics and Machine Learning, Digital Humanities, Text Analysis
• Heman Shakeri (https://www.hemanshakeri.com/): structure and function of interconnected networks, often expressed via graphs that comprise a set of nodes and a set of connections between them
• Jonathan Kropko (https://facultydirectory.virginia.edu/faculty/jk8sd): methods to examine historical data, to test theories of voting in U.S. presidential elections, and to handle nonresponse in surveys
• Michael Porter (https://engineering.virginia.edu/faculty/michael-d-porter): event prediction, pattern and anomaly detection, and data linkage, with applications for Criminology, Transportation, Terrorism, Defense, Security, Forensics, Business
• Mohammad Fallahi-Sichani (new hire): designing and building new experimental and computational tools to enable the analysis, interpretation and rational modulation of multi-scale processes that…
• Jack Van Horn (https://scholar.google.com/citations?user=i9bGqbgAAAAJ&hl=en): Psychology and Data Science, Cognitive Neuroscience
• Pete Alonzi (https://github.com/alonzi)
• Vicente Ordonez (https://engineering.virginia.edu/faculty/vicente-ordonez-roman): Computer Vision, Natural Language Processing and Machine Learning
• Tim Clark (https://scholar.google.com/citations?user=k-iwlCUAAAAJ&hl=en): next-generation approaches for biomedical communications and data integration, including semantically integrated data repositories, claims and…
• Gerard Learmonth (https://www.researchgate.net/profile/Gerard_Learmonth): Generation and testing of pseudorandom number generators; Abstract database design; Strategic applications of information systems and technology
• Hongning Wang (http://www.cs.virginia.edu/~hw5x/): data mining, machine learning, and information retrieval, with a special emphasis on computational user behavior modeling
• Stephen Adams (http://www.nsfcvdi.org/wordpress/cvdi_personnel/steven-adams-ph-d/): Adaptive Decision Systems Lab at UVA; research applied to several domains, including activity recognition and prognostics and health management for manufacturing
• Aidong Zhang (https://engineering.virginia.edu/faculty/aidong-zhang): ML, Data mining, bioinformatics
• Jundong Li (http://people.virginia.edu/~jl6qk/): Data Mining, Machine Learning, Social Computing, and Deep Learning
• Brian Wright (https://www.linkedin.com/in/brian-wright-ph-d-90063027/)

More Related Content

PPTX
1. Data Analytics-introduction
PDF
Introduction to data analytics
PPTX
Big Data Analytics
PPTX
Data analytics vs. Data analysis
PPTX
Data science life cycle
PPTX
How different between Big Data, Business Intelligence and Analytics ?
PPTX
Introduction to Data Science
PPTX
Data quality and data profiling
1. Data Analytics-introduction
Introduction to data analytics
Big Data Analytics
Data analytics vs. Data analysis
Data science life cycle
How different between Big Data, Business Intelligence and Analytics ?
Introduction to Data Science
Data quality and data profiling

What's hot (20)

PDF
Feature Engineering in Machine Learning
PDF
Big data Analytics
PDF
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
PDF
Data Visualization in Data Science
PPTX
Data analytics
PPTX
Introduction to data analytics
PPTX
Introduction to Data Analytics
PDF
Data integration
PDF
Exploratory Data Analysis in Spark
PPT
Analytics with Descriptive, Predictive and Prescriptive Techniques
PDF
Data Mining Techniques
PPTX
Data analytics
PPTX
Big Data Analytics
PPTX
Data clustring
PPTX
Kdd process
PPTX
3 Data Mining Tasks
PDF
Introduction To Data Science
PDF
DI&A Slides: Descriptive, Prescriptive, and Predictive Analytics
PDF
Data science presentation
PDF
Data Warehousing
Feature Engineering in Machine Learning
Big data Analytics
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Visualization in Data Science
Data analytics
Introduction to data analytics
Introduction to Data Analytics
Data integration
Exploratory Data Analysis in Spark
Analytics with Descriptive, Predictive and Prescriptive Techniques
Data Mining Techniques
Data analytics
Big Data Analytics
Data clustring
Kdd process
3 Data Mining Tasks
Introduction To Data Science
DI&A Slides: Descriptive, Prescriptive, and Predictive Analytics
Data science presentation
Data Warehousing
Ad

Similar to The Analytics and Data Science Landscape (20)

PPTX
University of Virginia School of Data Science
PDF
AI for Marking Industry application for.pdf
PDF
Data_Science_Applications_&_Use_Cases.pdf
PPTX
Data_Science_Applications_&_Use_Cases.pptx
PPTX
Data_Science_Applications_&_Use_Cases.pptx
PPTX
Real-time applications of Data Science.pptx
PDF
50YearsDataScience.pdf
PPTX
Biomedical Data Science: We Are Not Alone
PPTX
The UVA School of Data Science
PDF
From Rocket Science to Data Science
PDF
Luciano uvi hackfest.28.10.2020
PPTX
UVA School of Data Science
PDF
iTrain Malaysia: Data Science by Tarun Sukhani
PPTX
Data science and visualization power point
PPTX
DataScienceandVisualization_Mod_1_ppt.pptx
PPTX
What Data Science Will Mean to You - One Person's View
PPTX
One View of Data Science
PPT
data science ppt of emngineering studnets
PPTX
50 Years of Data Science
PDF
Data+Science : A First Course
University of Virginia School of Data Science
AI for Marking Industry application for.pdf
Data_Science_Applications_&_Use_Cases.pdf
Data_Science_Applications_&_Use_Cases.pptx
Data_Science_Applications_&_Use_Cases.pptx
Real-time applications of Data Science.pptx
50YearsDataScience.pdf
Biomedical Data Science: We Are Not Alone
The UVA School of Data Science
From Rocket Science to Data Science
Luciano uvi hackfest.28.10.2020
UVA School of Data Science
iTrain Malaysia: Data Science by Tarun Sukhani
Data science and visualization power point
DataScienceandVisualization_Mod_1_ppt.pptx
What Data Science Will Mean to You - One Person's View
One View of Data Science
data science ppt of emngineering studnets
50 Years of Data Science
Data+Science : A First Course
Ad

More from Philip Bourne (20)

PPTX
Your Science Needs You - More Than Ever Before
PPTX
The Biological Data Sustainability Paradox: A Time to Think Differently
PPTX
Data Science and AI in Biomedicine: The World has Changed
PPTX
Data Science and AI in Biomedicine: The World has Changed
PPTX
AI in Medical Education A Meta View to Start a Conversation
PPTX
AI+ Now and Then How Did We Get Here And Where Are We Going
PPTX
Thoughts on Biological Data Sustainability
PPTX
What is FAIR Data and Who Needs It?
PPTX
Data Science Meets Biomedicine, Does Anything Change
PPTX
Data Science Meets Drug Discovery
PPTX
BIMS7100-2023. Social Responsibility in Research
PPTX
AI from the Perspective of a School of Data Science
PPTX
Novo Nordisk 080522.pptx
PPTX
Towards a US Open research Commons (ORC)
PPTX
COVID and Precision Education
PPTX
Cancer Research Meets Data Science — What Can We Do Together?
PPTX
Data Science Meets Open Scholarship – What Comes Next?
PPTX
Data to Advance Sustainability
PPTX
Frontiers of Computing at the Cellular and Molecular Scales
PPTX
Social Responsibility in Research
Your Science Needs You - More Than Ever Before
The Biological Data Sustainability Paradox: A Time to Think Differently
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
AI in Medical Education A Meta View to Start a Conversation
AI+ Now and Then How Did We Get Here And Where Are We Going
Thoughts on Biological Data Sustainability
What is FAIR Data and Who Needs It?
Data Science Meets Biomedicine, Does Anything Change
Data Science Meets Drug Discovery
BIMS7100-2023. Social Responsibility in Research
AI from the Perspective of a School of Data Science
Novo Nordisk 080522.pptx
Towards a US Open research Commons (ORC)
COVID and Precision Education
Cancer Research Meets Data Science — What Can We Do Together?
Data Science Meets Open Scholarship – What Comes Next?
Data to Advance Sustainability
Frontiers of Computing at the Cellular and Molecular Scales
Social Responsibility in Research

Recently uploaded (20)

PDF
Complications of Minimal Access Surgery at WLH
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PPTX
Cell Structure & Organelles in detailed.
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PDF
Sports Quiz easy sports quiz sports quiz
PDF
O7-L3 Supply Chain Operations - ICLT Program
PPTX
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
PPTX
Lesson notes of climatology university.
PDF
Microbial disease of the cardiovascular and lymphatic systems
PPTX
Institutional Correction lecture only . . .
PPTX
Pharma ospi slides which help in ospi learning
PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PPTX
Cell Types and Its function , kingdom of life
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
Basic Mud Logging Guide for educational purpose
Complications of Minimal Access Surgery at WLH
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Supply Chain Operations Speaking Notes -ICLT Program
Cell Structure & Organelles in detailed.
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
human mycosis Human fungal infections are called human mycosis..pptx
Microbial diseases, their pathogenesis and prophylaxis
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Sports Quiz easy sports quiz sports quiz
O7-L3 Supply Chain Operations - ICLT Program
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
Lesson notes of climatology university.
Microbial disease of the cardiovascular and lymphatic systems
Institutional Correction lecture only . . .
Pharma ospi slides which help in ospi learning
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
Cell Types and Its function , kingdom of life
Final Presentation General Medicine 03-08-2024.pptx
Basic Mud Logging Guide for educational purpose

The Analytics and Data Science Landscape

  • 1. The Analytics & Data Science Landscape Philip E. Bourne peb6a@virginia.edu Analytics Challenges in Modern Tax Administration November 16, 2020
  • 2. Disclaimer • I pay taxes but typically do not get it right • My PhD is actually in physical chemistry • I did work for the NIH for 3 years
  • 3. On a Positive Note • I have been working with “big” data for many years • As Dean I am very interested in mapping the capabilities of our students to the needs of the workplace • As a researcher I am concerned that the research our school undertakes is for societal benefit
  • 4. This moment in time….
  • 5. What of the future? One view is the 6D’s 5
  • 6. Digitization Deception Disruption Demonetization Dematerialization Democratization Time Volume,Velocity,Variety Digital camera invented by Kodak but shelved Megapixels & quality improve slowly; Kodak slow to react Film market collapses; Kodak goes bankrupt Phones replace cameras Instagram, Flickr become the value proposition Digital media becomes bona fide form of communication From a presentation to the Advisory Board to the NIH Director Example - photography 6
  • 7. Everything is Digital Data to be Analyzed… Play the data science game – pick an object/subject and you will immediately see a reason why data science is important …
  • 9. If I were a tie maker I would be undertaking a data science analysis right now… Large collection of random images with metadata before and during the pandemic Who is still wearing ties? • Age • Profession • Ethnicity • Socioeconomic status • Location • ….. Causality – Does the pandemic represent a shift in tie wearing? If so by how much? Prediction – What will be the market post COVID?
  • 11. A Paradigm Shift Reflected in the Workforce Increased Demand over the Past Five Years 74% Artificial Intelligence specialists Top industries hiring this talent: Computer software, internet, information technology and services, higher education, consumer electronics 37% Data Scientist Top industries hiring this talent: Information technology and services, computer software, internet, financial services, higher education 33% Data Engineer Top industries hiring this talent: Information technology and services, internet, computer software, financial services, hospital and healthcare
  • 12. How is Academia Responding?
  • 13. Every University has Some Initiative
  • 14. Workforce Demand Outweighs Supply – A Problem for the IRS?
  • 15. The Rising Demand for Data Scientists *for graduates seeking employment 100% 100% 100% 98% 97% UVA School of Data Science Graduate Job Placement 2019 2018 2017 2016 2015 * Roles Machine Learning Engineer, Director of Data Science, Deep Learning Research Scientist, Senior Data Analyst, Data Science Developer, Consultant, Product Data Analyst, Financial Engineer, Engagement Manager & more Industries ● Finance ● Government ● Healthcare & Medicine ● Professional Sports ● Commerce ● Media ● Higher Ed ● Technology
  • 16. Recent Poll of Machine Learning PhD Students
  • 17. A New School for a New Century A School Without Walls Mission To be a national and international leader in responsible data science emphasizing interdisciplinary collaboration which results in furthering discovery, sharing knowledge, and societal benefit
  • 18. Our Working Definition • Use of the ever increasing amount of open, complex, diverse digital data frequently in ubiquitous cloud environments • Finding ways to ask and then answer relevant questions by combining such diverse data sets • Arriving at statistically significant conclusions not otherwise obtainable • Sharing such findings in a useful way • Translating such findings into actions that improve the human condition
  • 19. Use Case – Data Integration Researcher and Assistant Professor of Medicine Dr. Thomas Hartka, also a current online Masters in Data Science student, is combining two disparate data sets—electronic health records and DMV crash data—to save lives after motor vehicle crashes. “I enrolled in the MSDS program to expand my research on automotive safety. I have already used techniques from classes in my work. I hope to expand my research to real-time analytics to improve emergency room care.” — Dr. Thomas Hartka, UVA School of Medicine
  • 20. Guiding Principles • Excellence • Integrity • Diversity • Openness and transparency • FAIR data - the ability to Find, Access, Interoperate and Reuse data • Innovation • For the social good
  • 21. • Data/code as first class citizen – Part of promotion and tenure http://www.ncbi.nlm.nih.gov/pubmed/26207759 Why FAIR? [Adapted from Carole Goble] Only 12% of data from research is preserved
  • 22. Infrastructure Commons - Platform Stack https://datascience.nih.gov/commons Compute Platform: Cloud or HPC Services: APIs, Containers, Indexing, Software: Services & Tools scientific analysis tools/workflows Data “Reference” Data Sets User defined data DigitalObjectCompliance App store/User Interface Why not more like AirBnB? https://doi.org/10.1371/journal.pbio.2001818
  • 23. How and What are We Teaching …
  • 24. The 4+1 Model The model is based on the core insight that all definitions of data science assume a pipeline and that this pipeline forms a parallel process [From Raf Alvarado]
  • 25. Our Representation of Data Science The 4+1 Model • Value – assuring societal benefit • Design - Communication of the value of data • Systems – the means to communicate and convey benefit • Analytics – models and methods • Practice – where everything happens [From Raf Alvarado]
  • 26. The 4+1 Model Interplay [From Raf Alvarado] • Value + Design = Openness, responsibility • Value + Analytics = Human centered AI, algorithmic bias • Value + Systems = sustainability, access, environmental impact • Design + Analytics = literate programming, visualization • Design + Systems = dashboards, engineering design • Analytics + Systems = ML engineering
  • 27. The 4+1 Model 27 Integration Practice of DS, Capstones Analytics Linear Models, Data Mining, Bayesian ML, Deep Learning, Text Analytics, Foundations of CS Systems Programming and Systems, Big Data Analytics Value Ethics of Big Data Design Practice of DS, Visualization We strive to build a curriculum that aligns with our model
  • 28. Distinctive Features 28  Foundational topics in analytics from linear models to data mining and machine learning  An integrated curriculum developed in consultation with practicing data scientists that incorporates a challenging capstone experience  Applications and data drawn from different disciplines, e.g., science, business, and health  Instruction in the best practices in the management and conduct of data science projects  Computational methods built on the latest techniques in R and Python  Required course in data ethics  Emphasis on team science — data science is a team sport
  • 29. Analytics and Machine Learning 29 STAT 6021 Linear Models - Multiple linear regression, logistic regression. (R) SYS 6018 Data Mining - Tree-based methods, kernel methods, unsupervised learning. Uses An Introduction to Statistical Learning by James, Witten, Hastie and Tibshirani. (R) SYS 6014 Bayesian Machine Learning - Methods to handle uncertainty and to apply per variable weight distributions (as opposed to a single optimal value). (Python) SYS 6016 Machine Learning - Focuses on neural networks, including deep learning, convolutional neural networks, recurrent neural networks, and autoencoders. (Python)
  • 30. Computer Science 30 CS 5010 Programming and Systems for Data Science -- Python, Pandas, data analysis at scale and in context, some development practices. CS 5021 Foundations of CS -- Data structures, algorithms, complexity, relational and noSQL databases; "CS in a box."
  • 31. Data Ethics 31 Virtuous cycle between {computer science, statistics, applied mathematics} and the humanities Exemplified with use cases It’s a culture not a tick of the box Computer Science Statistics Applied Mathematics Humanities
  • 32. Practice and Application of Data Science 32 DS 6001 and 6003 focus on data design Flow of data between Human and Machine domains H → M: Establishing data so that it can be analyzed M → H: Presenting results of analysis to the world 6001: Data engineering pipeline -- acquiring, cleaning, exploring 6003: Data product development -- presenting, visualizing, app dev
  • 33. Electives 33 CS 6160 Theory of Computation CS 6444 Parallel Computing CS 6501 Text Mining CS 6750 Database Systems DS 5001 Exploratory Text Analytics DS 6559 Biomedical Cloud Computing Seminar SARC 5400 Data Visualization STAT 6250 Longitudinal Data Analysis STAT 6260 Categorical Data Analysis SYS 6023 Cognitive Systems Engineering SYS 6050 Risk Analysis SYS 6582 Reinforcement Learning SYS 7001 System and Decision Sciences
Capstones
• A parallel and culminating experience that focuses on a real-world data science problem
• Emphasizes problem definition and scoping
• Employs project management
• Involves developing, evaluating, and creating a data product for a client
• Requires presentations, a proposal, and a published paper (IEEE)
• Teams of students work on separate projects under the guidance of an advisor
Furthering Discovery to Build a Better World: Research
• Cybersecurity: detecting broad-spectrum cyber threats almost immediately after they are launched, supported by a $7.6 million Defense Advanced Research Projects Agency (DARPA) grant
• Environment: using NASA data collected aboard the International Space Station to examine climate change in the Shenandoah National Forest and beyond, and to find solutions
• Health & Medicine: securing high-performance computing equipment and personnel to enable collaboration across the university on brain science research into autism, Alzheimer's disease, mental health disorders, traumatic brain injuries, and more
• Business: discovering what makes a job interview successful for both the candidate and the recruiter, and how to mitigate bias in the recruiting process
• Democracy: investigating how terrorist groups recruit women through propaganda, and examining risk and threat assessment for extremist violence perpetrated by women
• Education: helping economically disadvantaged, underrepresented populations pursue tailored educational workforce pathways that have a higher probability of leading them to success
Applying Data Science Across Industries
• "To tackle challenges in science and medicine." (Elizabeth Driskell, MSDS ’20)
• "To inform public policy and government." (Bradley Katcher, MSDS ’20)
• "I want to use data science to find a new way of thinking." (Alex Gromadzki, MSDS ’21)
• "I want to use data science to solve complex business problems." (Ruslan Askerov, MSDS ’21)
• "To address poverty and income inequality." (Arti Patel, MSDS ’20)
Growing the School
• M.S. in Data Science, residential & online: 2020
• Undergraduate courses increase to 18 courses per AY: 2020-2023
• Ph.D. program: 2021
• Undergraduate major: 2023
• Building occupied
• Exec. Ed.
• Team size (FTEs) over the period: 5, 40, 60, 80, 120
SDS and IRS: Actions
• Workforce pipeline: awareness
• Continuing education opportunities
• Provision of synthetic data
• Funded and collaborative research
• Faculty, capstone, Presidential Fellowship, and PhD internships
• IRS admits: MSDS, PhD
• Join the corporate commons
• …
SDS Faculty Research
Data science faculty member or affiliated faculty, website, and research interests:
• Nada Basit (https://engineering.virginia.edu/faculty/nada-basit): machine learning, bioinformatics, data mining, pattern recognition
• Phil Bourne (https://engineering.virginia.edu/faculty/philip-e-bourne): multiscale modeling using data science techniques; early-stage drug discovery and drug repurposing; methods and tools for macromolecular …
• Don Brown (https://engineering.virginia.edu/faculty/donald-e-brown-phd): data fusion, knowledge discovery, and simulation optimization
• Sallie Keller (https://biocomplexity.virginia.edu/sallie-keller): social and decision informatics, statistical underpinnings of data science, and data access and confidentiality
• Daniel Mietchen (https://tools.wmflabs.org/scholia/author/Q20895785): computational biology, biodiversity; integrating research workflows with the World Wide Web through open licensing, open standards, and open collaboration via …
• Rafael Alvarado (http://transducer.ontoligent.com/): cultural analytics and machine learning, digital humanities, text analysis
• Heman Shakeri (https://www.hemanshakeri.com/): structure and function of interconnected networks, often expressed via graphs that comprise a set of nodes and a set of connections between them
• Jonathan Kropko (https://facultydirectory.virginia.edu/faculty/jk8sd): methods to examine historical data, to test theories of voting in U.S. presidential elections, and to handle nonresponse in surveys
• Michael Porter (https://engineering.virginia.edu/faculty/michael-d-porter): event prediction, pattern and anomaly detection, and data linkage, with applications in criminology, transportation, terrorism, defense, security, forensics, and business
• Mohammad Fallahi-Sichani (new hire): designing and building new experimental and computational tools to enable the analysis, interpretation, and rational modulation of multi-scale processes that …
• Jack Van Horn (https://scholar.google.com/citations?user=i9bGqbgAAAAJ&hl=en): psychology and data science, cognitive neuroscience
• Pete Alonzi (https://github.com/alonzi)
• Vicente Ordonez (https://engineering.virginia.edu/faculty/vicente-ordonez-roman): computer vision, natural language processing, and machine learning
• Tim Clark (https://scholar.google.com/citations?user=k-iwlCUAAAAJ&hl=en): next-generation approaches for biomedical communications and data integration, including semantically integrated data repositories, claims and …
• Gerard Learmonth (https://www.researchgate.net/profile/Gerard_Learmonth): generation and testing of pseudorandom number generators; abstract database design; strategic applications of information systems and technology
• Hongning Wang (http://www.cs.virginia.edu/~hw5x/): data mining, machine learning, and information retrieval, with a special emphasis on computational user behavior modeling
• Stephen Adams (http://www.nsfcvdi.org/wordpress/cvdi_personnel/steven-adams-ph-d/): Adaptive Decision Systems Lab at UVA; research applied to several domains, including activity recognition and prognostics and health management for manufacturing
• Aidong Zhang (https://engineering.virginia.edu/faculty/aidong-zhang): machine learning, data mining, bioinformatics
• Jundong Li (http://people.virginia.edu/~jl6qk/): data mining, machine learning, social computing, and deep learning
• Brian Wright (https://www.linkedin.com/in/brian-wright-ph-d-90063027/)