Dr. ILKAY ALTINTAS
ialtintas@ucsd.edu
Creating a Data Science Ecosystem for Scientific,
Societal and Educational Impact
İlkay ALTINTAŞ, Ph.D.
Chief Data Science Officer, San Diego Supercomputer Center
Division Director, Cyberinfrastructure Research, Education and Development
Founder and Director, Workflows for Data Science Center of Excellence
SAN DIEGO SUPERCOMPUTER CENTER at UC San Diego
Providing Cyberinfrastructure for Research and Education
• Established as a national supercomputer resource center in 1985 by NSF
• A world leader in HPC, data-intensive computing, and scientific data management
• Current strategic focus on “Big Data”, “versatile computing”, and “life sciences applications”
Recent Innovative Architectures
• Gordon: First Flash-based Supercomputer for Data-intensive Apps
• Comet: Serving the Long Tail of Science
Data Science Today is Both a Big Data and a Big Compute Discipline
BIG DATA
COMPUTING AT
SCALE
Enables dynamic data-driven applications
Smart Manufacturing
Computer-Aided Drug Discovery
Personalized Precision Medicine
Smart Cities
Smart Grid and Energy Management
Disaster Resilience and Response
Requires:
• Data management
• Data-driven methods
• Scalable & dynamic process coordination
• Resource optimization
• Skilled interdisciplinary workforce
New era of data science!
What is Data Science?
Ultimate Goal
Big Data
Insight
Action
Data Science
How does successful data science happen?
Insight Data Product
“Big” Data
Question
Exploratory Analysis and Modeling
Insight
Customer
Demographic
Previous
Purchases
Book reviews
What kind of
books does this
customer like?
Book
recommendations
Example: Book Recommendations
Model of customer’s
book preferences
New book
information
Who is likely to
like this book?
Find Potential Audience for a New Book
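The book example above (customer data in, a preference model, then a question about a new book) could be sketched roughly as follows. This is only an illustrative toy, not the method behind the slides: the customer names, the genre labels, and the 0.5 threshold are all made-up assumptions, and the "model" is just each customer's share of past purchases per genre.

```python
from collections import Counter

# Hypothetical purchase history: customer -> genres of previously bought books.
purchases = {
    "ana": ["mystery", "mystery", "history"],
    "ben": ["science", "history", "science"],
    "cho": ["romance", "mystery", "romance"],
}

def preference_model(history):
    """Turn each customer's purchase history into genre shares (0..1)."""
    model = {}
    for customer, genres in history.items():
        counts = Counter(genres)
        total = sum(counts.values())
        model[customer] = {g: n / total for g, n in counts.items()}
    return model

def likely_audience(model, new_book_genre, threshold=0.5):
    """Return customers whose share of the new book's genre meets the threshold."""
    return sorted(
        customer
        for customer, shares in model.items()
        if shares.get(new_book_genre, 0.0) >= threshold
    )

model = preference_model(purchases)      # insight: who likes what
audience = likely_audience(model, "mystery")  # action input: whom to market to
```

A real system would also use the demographic and review data mentioned on the slide; they are left out here to keep the sketch short.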
Action to market
the book to the
right audience
Who is likely to
like this book?
Market a New Book
Action to market
the book to the
right audience
Who is likely to
like this book?
Insight Action
Market a New Book
Historical data Near real-time data
Prediction
Creating Actionable Information
Prediction
Action
Why the increased interest in Data Science?
+
Big Data
Scalable Computing
Anywhere Anytime
What Is Big Data, and How Much Data Is It?
204 Million emails
200,000 photos
1.8 Million likes
2.78 Million video views
72 hours of video uploads
Every minute…
Big Data Characteristics
• Volume: Scalable batch processing
• Velocity: Stream processing
• Variety: Extensible data storage, access and integration
Nearly every problem today is
transformed by big data.
Example: Geospatial Big Data
• Flood of new data sources and types
• Needs new data management, storage and analysis methods
• Too big for a single server, fast growing data volume
• Requires special database structures that can handle data variety
• Too continuous for analysis at a later time, with increasing streaming rate, i.e., velocity
• Varying degrees of uncertainty in measurements, and other veracity issues
• Provides opportunities for scientific understanding at different scales more than ever, i.e., potential high value
Real-time sensors
Weather forecast
Satellite imagery
Sea Surface Temperature
Measurements
Drone imagery
Example: Biomedical Big Data http://nbcr.ucsd.edu
Scientific Big Data By the Numbers…
• HPWREN: hpwren.ucsd.edu
  • 30 TB of data annually
• MODIS: modis.gsfc.nasa.gov
  • 219 TB of data annually
• Precision Medicine: Genome sequence
  • 4 EB (1 EB = 10^18 bytes) of data in 2016 (Ref: www.fastcompany.com)
• LIGO, Deep Space Network, Protein Data Bank, …
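To put the annual volumes above in perspective, they can be converted to an average sustained data rate. The short calculation below is a back-of-the-envelope sketch that assumes decimal units (1 TB = 10^12 bytes) and a 365-day year; real instruments produce data in bursts, so these are averages only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, ignoring leap years
TB = 10**12                         # decimal terabyte, in bytes (assumption)

def average_rate_mb_per_s(tb_per_year):
    """Average sustained rate in MB/s implied by an annual volume in TB."""
    return tb_per_year * TB / SECONDS_PER_YEAR / 10**6

hpwren_rate = average_rate_mb_per_s(30)    # about 0.95 MB/s
modis_rate = average_rate_mb_per_s(219)    # about 6.94 MB/s
print(f"HPWREN: {hpwren_rate:.2f} MB/s, MODIS: {modis_rate:.2f} MB/s")
```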
100 MB ≈ a couple of encyclopedia volumes
A DVD ≈ 5 GB
1 TB ≈ 300 hours of good-quality video
LHC ≈ 15 PB a year
Exponential
data growth!
10^21
How do we find the connections and answer questions that benefit society?
“We are drowning in information and starving for knowledge”
– John Naisbitt
Source: Megatrends, 1982
How do we amplify the value of Big Data?
Create an Ecosystem that Enables
Needs and Best Practices
• data-driven
• scalable
• dynamic
• process-driven
• collaborative
• accountable
• reproducible
• interactive
• heterogeneous
• includes many different kinds of expertise
A Typical Collaborative Data Science Ecosystem
ACQUIRE PREPARE ANALYZE REPORT ACT
Approach:
Focus on the Process and Team Work
to Answer a Question
…
Basic Steps in a Data Science Process
ACQUIRE PREPARE ANALYZE REPORT ACT
• Import raw dataset into your analytics platform
• Explore & Visualize
• Perform Data Cleaning
• Feature Selection
• Model Selection
• Analyze the results
• Present your findings
• Use them
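The five steps above can be sketched as a chain of small functions, one per step. This is a minimal illustration under stated assumptions, not a recommended architecture: the readings are hard-coded, hypothetical sensor values, the "model" is just a mean and standard deviation, and the 25.0 alert limit is invented for the example.

```python
import statistics

def acquire():
    """ACQUIRE: import a raw dataset (hard-coded, hypothetical sensor readings)."""
    return [21.5, 22.1, None, 23.4, 85.0, 22.8]

def prepare(raw):
    """PREPARE: explore and clean; drop missing values and implausible readings."""
    return [x for x in raw if x is not None and 0.0 <= x <= 60.0]

def analyze(clean):
    """ANALYZE: fit a trivial 'model' (mean and standard deviation)."""
    return {"mean": statistics.mean(clean), "stdev": statistics.stdev(clean)}

def report(result):
    """REPORT: present the findings in a readable form."""
    return f"mean={result['mean']:.2f}, stdev={result['stdev']:.2f}"

def act(result, limit=25.0):
    """ACT: use the insight to choose an action (hypothetical threshold)."""
    return "raise alert" if result["mean"] > limit else "no action"

raw = acquire()
clean = prepare(raw)
result = analyze(clean)
summary = report(result)
decision = act(result)
```

Keeping each step as its own function makes it possible to swap one step (for example, a different cleaning rule) without touching the others, which is the point of treating the work as a process.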
COORDINATION AND
WORKFLOW MANAGEMENT
DATA INTEGRATION
AND PROCESSING
DATA MANAGEMENT
AND STORAGE
Process-driven
Solution
Architectures
and the Role of
Workflows
…
COORDINATION AND
WORKFLOW MANAGEMENT
DATA INTEGRATION
AND PROCESSING
DATA MANAGEMENT
AND STORAGE
COMMUNICATION AND FEEDBACK
EXPLORATION
SCALABILITY
PROVENANCE
SECURITY
ACQUIRE PREPARE ANALYZE REPORT ACT
WORKFLOW MANAGEMENT
Application Integration, Coordination, Optimization,
Communication, Reporting
COMPOSABLE DATA SERVICES
Deep Learning, Analytics, HPC, Training, Notebooks
COMPOSABLE SYSTEMS
GPU, CPU, Big Data, Neuromorphic, Networks, Storage, …
PROVENANCE
SECURITY
RESOURCE MANAGEMENT
Kubernetes Container Cloud
SOLUTION ARCHITECTURE
DOMAIN KNOWLEDGE
Using dynamic workflows for data
science…
… requires methodology,
research and tool development.
Workflows for Data Science
Center of Excellence at SDSC
Goal: Methodology and tool
development to build automated
and operational workflow-driven
solution architectures on big data
and HPC platforms.
Focus on the question, not the technology!
Real-Time Hazards Management: wifire.ucsd.edu
Data-Parallel Bioinformatics: bioKepler.org
Scalable Automated Molecular Dynamics and Drug Discovery: nbcr.ucsd.edu
WorDS.sdsc.edu
• Access and query data
• Support exploratory design
• Scale computational analysis
• Increase reuse and reproducibility
• Save time, energy and money
• Formalize and standardize
• Train
Balance of:
• team building
• process management
• performance optimization
• provenance tracking
• training and education
While working with experts on…
• data modeling and integration
• data management services
• analytical methods
• communication and visualization
• domain expertise
How can I get smart people
to collaborate and
communicate?
…to utilize data and computing to
generate insights and solve a question.
Focus on the question, not the technology!
Team Building
Process Management
Process for Practice
of Data Science
Workflow
Design
Reporting
Workflow
Monitoring
Workflow
Execution
Workflow
Scheduling
and Execution
Planning
Execution
Review
Provenance
Analysis
Deploy
and
Publish
Programmability
Ease of use, iteration, interaction, re-use, re-purpose
Scalability
From local experiments to large-scale runs
Reproducibility
Ability to validate, re-run, re-play
BUILD
and
EXPLORE
SHARE SCALE
and
ITERATE
LEARN
and
REPORT
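The reproducibility goal named above (the ability to validate, re-run and re-play) can be illustrated with a very small provenance record. The sketch below is an assumption-laden toy, not how any particular workflow system records provenance: the function names `run_with_provenance` and `replay_matches` and the `scale` step are hypothetical, and it assumes step inputs and outputs are JSON-serializable.

```python
import hashlib
import json

def run_with_provenance(step_name, func, inputs, params):
    """Run one workflow step and record what is needed to re-run and validate it."""
    output = func(inputs, **params)
    record = {
        "step": step_name,
        "inputs": inputs,
        "params": params,
        # A hash of the output lets a later re-run be checked against this run.
        "output_sha256": hashlib.sha256(
            json.dumps(output, sort_keys=True).encode("utf-8")
        ).hexdigest(),
    }
    return output, record

def replay_matches(record, func):
    """Re-run the step from its record and check the output hash is unchanged."""
    _, new_record = run_with_provenance(
        record["step"], func, record["inputs"], record["params"]
    )
    return new_record["output_sha256"] == record["output_sha256"]

def scale(values, factor=1.0):
    """Hypothetical analysis step: multiply every value by a factor."""
    return [v * factor for v in values]

output, record = run_with_provenance("scale", scale, [1, 2, 3], {"factor": 2.0})
```

Calling `replay_matches(record, scale)` re-executes the step from the stored inputs and parameters and reports whether the result is byte-for-byte the same.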
Some P’s in PPoDS
Platforms
Process
People
Problem
or
Purpose
?
Programmability
Metrics for accountability should be
built into the process.
Timeline
Purpose
Expectations
Planning of deliverables
Cost
Treat Each Step in the Solution Process
as a Conceptual Pod
Pod → sub-process
Defined by:
• Purpose and goal
• Stakeholders
• Expectations
• Key questions to be answered, production/consumption relationships, needs, dependencies, limits, …
• Contracts
• Performance, economic, accuracy, policy, privacy, reproducibility, political, …
• Knowns
• Known unknowns
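One way to make the Pod description above concrete is to write it down as a small data structure. The sketch below is only one possible encoding: the field names mirror the bullets above, while the `is_fully_specified` check and all example values are hypothetical additions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Pod:
    """A conceptual sub-process, described by the attributes listed above."""
    purpose: str
    stakeholders: list[str]
    expectations: list[str] = field(default_factory=list)
    contracts: dict[str, str] = field(default_factory=dict)
    knowns: list[str] = field(default_factory=list)
    known_unknowns: list[str] = field(default_factory=list)

    def is_fully_specified(self) -> bool:
        """Hypothetical readiness check: the core descriptive fields are filled in."""
        return all([self.purpose, self.stakeholders, self.expectations, self.contracts])

# Example values are invented for illustration.
prepare_pod = Pod(
    purpose="Clean and integrate sensor data for analysis",
    stakeholders=["data engineer", "domain scientist"],
    expectations=["Which stations report reliably?"],
    contracts={"accuracy": "at most 1% missing values", "privacy": "no personal data"},
    knowns=["data arrives every 10 minutes"],
    known_unknowns=["sensor calibration drift"],
)
```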
Zooming into a simple example…
PREPARE ANALYZE
Data Exploration
Schema Integration
Query Processing
Machine Learning
…
The insights need to be evaluated to
turn them into action.
Platforms
Process
People
Purpose?
Programmability
Metrics Product
Insight
Action
?
Implementation of the actions needs
many things working together.
Process
Stakeholders
Automation
Action
GIS
Files
Sensor
NoSQL
Social
Database
Action
The impact of the
actions should be
monitored, measured
and evaluated.
Evaluation
Measure
Monitor
Evaluation will determine
the next steps.
Favorable
Results?
Revisit?
Further
Opportunities?
Action
Evaluation
Real-time Action?
COORDINATION AND WORKFLOW MANAGEMENT
…
http://kepler-project.org
National Resources: (Gordon) (Comet) (Stampede) (Lonestar)
Cloud Resources
Execution Platforms
Local Cluster Resources
ACQUIRE PREPARE ANALYZE REPORT ACT
Dynamic data-driven coordination
& resource optimization
Requires:
Ability to explore and scale on
multiple platforms
Workflows are increasingly becoming the dynamic operations research tool for science.
Where do we make use of such
capabilities?
Data Science for Social Good
Smart City and Hazards IoT Applications
• Many sensed and organizational open datasets
• Potential to improve public safety and quality of life
How do we Better Predict Wildfire Behavior?
• Wildfires are critical for ecology, but volatile
• Fuel load is high due to fire suppression over the last century
• Drought, higher temperatures
• Better prevention, prediction and maintenance of wildfires are needed
Photo of Harris Fire (2007) by former Fire Captain Bill Clayton
Disaster management of (ongoing) wildfires relies heavily on understanding their Direction and Rate of Spread (RoS).
Fire is Part of the Natural Ecology… but requires Monitoring, Prediction and Resilience
What was lacking was…
a dynamic system integration of real-time sensor networks, satellite imagery, near-real-time data management tools, wildfire simulation tools, and connectivity to emergency command centers
… before, during and after a firestorm.
Big Data Fire Modeling
Visualization
Monitoring
WIFIRE: A Scalable Data-Driven Monitoring, Dynamic
Prediction and Resilience Cyberinfrastructure for Wildfires
HPWREN: High Performance Wireless Research and Education Network
http://hpwren.ucsd.edu/cameras
>160 Meteorological Sensors and Growing
Major success in bringing internet to incident command in the field. Used in over 20 fires over time.
FARSITE: Most popular operational fire behavior modeling system.
Closing the Loop using Big Data
-- Wildfire Behavior Modeling and Data Assimilation --
• Computational costs for existing models too high for real-time analysis
• a priori -> a posteriori
• Parameter estimation to make adjustments to the (input) parameters
• State estimation to adjust the simulated fire front location with an a posteriori update/measurement of the actual fire front location
Figure: Conceptual Data Assimilation Workflow with Prediction and Update Steps using Sensor Data
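The prediction and update steps mentioned above can be illustrated with a one-dimensional Kalman filter, a standard data assimilation technique. The slides do not say which filter WIFIRE uses, so this is a stand-in chosen for brevity. It covers state estimation only (not parameter estimation), tracks the fire-front position along a single line, and every number in it (rate of spread, variances, observation) is invented.

```python
def predict(x, p, rate_of_spread, dt, q):
    """Prediction step: advance the fire-front position and grow its variance."""
    return x + rate_of_spread * dt, p + q

def update(x_prior, p_prior, z, r):
    """Update step: blend the a priori estimate with an observation z."""
    gain = p_prior / (p_prior + r)           # Kalman gain, between 0 and 1
    x_post = x_prior + gain * (z - x_prior)  # a posteriori position
    p_post = (1.0 - gain) * p_prior          # a posteriori variance
    return x_post, p_post

# Start at 0 m with variance 100 m^2; simulate 10 minutes at 5 m/min.
x, p = 0.0, 100.0
x_prior, p_prior = predict(x, p, rate_of_spread=5.0, dt=10.0, q=25.0)

# A sensor-derived observation places the front at 62 m (variance 125 m^2).
x_post, p_post = update(x_prior, p_prior, z=62.0, r=125.0)
# x_prior = 50.0, x_post = 56.0, p_post = 62.5
```

With equal model and observation variances the gain is 0.5, so the corrected position lands halfway between the simulated front (50 m) and the observed front (62 m), and the uncertainty is halved.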
Fire Modeling Workflows in WIFIRE
Real-time sensors
Weather forecast
Fire perimeter
Landscape data
Monitoring &
fire mapping
Firemap Tool
• A web-based GIS environment:
  • access information related to fire behavior
  • analyze what-if scenarios
  • model real-time fire behavior
  • generate reports
• Powered by WIFIRE
Firemap Web Interface
WIFIRE Data Interfaces
WIFIRE Workflows
Computing Infrastructure
http://firemap.sdsc.edu
Data-Driven Fire Progression
Prediction Over Three Hours
Collaboration with LA and
SD Fire Departments
http://firemap.sdsc.edu
August 2016 – Blue Cut Fire
Tahoe and Nevada Bureau of Land Management Cameras: 20 cameras added with field-of-view
Northern CA Fires 10/09/17 through now…
300K+ unique visitors and ~3M hits in 5 days
http://firemap.sdsc.edu
Some Machine Learning Case Studies
• Smoke and fire perimeter detection based on imagery
• Prediction of Santa Ana and fire conditions specific to location
• Prediction of fuel build-up based on fire and weather history
• NLP for understanding local conditions based on radio communications
• Deep learning on multispectral imagery for high-resolution fuel maps
• Classification project to generate more accurate fuel maps (using Planet Labs satellite data)
Classification project to generate more accurate fuel maps
• Accurate and up-to-date fuel maps are critical for modeling wildfire rate of spread and potential burn areas.
• Challenge:
  • USGS LANDFIRE provides the best available fuel maps every two years.
  • The WIFIRE system is limited by these inputs, which may be up to two years old. Fuel maps created at a higher temporal frequency are desired.
• Approach:
  • Using high-resolution satellite imagery and deep learning methods, produce surface fuel maps of San Diego County and other regions in Southern California.
  • Using LANDFIRE fuel maps as the target variable, the objective is to create a classification model that will provide fuel maps at greater frequency with a measure of uncertainty.
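The idea of a classification with "a measure of uncertainty" can be sketched as follows. This is an illustration only: the slides do not describe the actual model or uncertainty measure, so the example assumes a classifier that outputs one raw score per fuel class for each pixel, and uses normalized entropy of the class probabilities as one common uncertainty measure. The class list and scores are hypothetical.

```python
import math

FUEL_CLASSES = ["short grass", "brush", "timber litter"]  # hypothetical label set

def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_with_uncertainty(scores):
    """Return (predicted class, normalized entropy in [0, 1]) for one pixel."""
    probs = softmax(scores)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    label = FUEL_CLASSES[probs.index(max(probs))]
    return label, entropy / math.log(len(probs))

confident = classify_with_uncertainty([8.0, 0.0, 0.0])  # one class dominates
ambiguous = classify_with_uncertainty([1.0, 1.0, 1.0])  # all classes tie
```

An uncertainty near 0 means the model strongly prefers one fuel class for that pixel; a value near 1 means the classes are nearly indistinguishable there, which flags map regions that may need review.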
Cluster 1: Short Grass
WIFIRE Team: It takes a village!
• PhD level researchers
• Professional software developers
• 27 undergraduate students
  • UC San Diego
  • UC Merced
  • MURPA University
  • University of Queensland
• 1 high school student
• 5 MSc and 5 MAS students
• 2 PhD students (UMD)
• 1 postdoctoral researcher
UMD - Fire modeling
UCSD MAE - Data assimilation
SDSC - Cyberinfrastructure, Workflows, Data engineering, Machine Learning, Information Visualization, HPWREN
Calit2/QI - Cyberinfrastructure, GIS, Advanced Visualization, Machine Learning, Urban Sustainability, HPWREN
SIO - HPWREN
Process for Precision Education
Some Questions
• How are the students performing?
• When does a drop-out process really start? What are early signs?
• How many students do we expect for a subject next year? What are the trends?
• When will a student graduate?
• What are personalized learning paths?
• When is the best time to take a course to graduate on time?
• How does the curriculum serve the local economy and workforce?
Parts of the Solution
• Stakeholders
• Datasets
• Compliance	requirements
• Defined	actions
• Analytical	methods
• Technical	infrastructure
Bias
Transparency
Verification
Accuracy
Ethics
Reproducibility
Contact: Ilkay Altintas, Ph.D.
Email: ialtintas@ucsd.edu
Questions?
Parts of the presented work are funded by NSF, DOE, NIH, UC San Diego and various industry partners.


Creating a Data Science Ecosystem for Scientific, Societal and Educational Impact

  • 1. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Creating a Data Science Ecosystem for Scientific, Societal and Educational Impact İlkay ALTINTAŞ, Ph.D. Chief Data Science Officer, San Diego Supercomputer Center Division Director, Cyberinfrastructure Research, Education and Development Founder and Director, Workflows for Data Science Center of Excellence
  • 2. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu SAN DIEGO SUPERCOMPUTER CENTER at UC San Diego Providing Cyberinfrastructure for Research and Education • Established as a national supercomputer resource center in 1985 by NSF • A world leader in HPC, data-intensive computing, and scientific data management • Current strategic focus on “Big Data”, “versatile computing”, and “life sciences applications” Recent Innovative Architectures • Gordon: First Flash-based Supercomputer for Data-intensive Apps • Comet: Serving the Long Tail of Science
  • 3. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Data Science Today is Both a Big Data and a Big Compute Discipline BIG DATA COMPUTING AT SCALE Enables dynamic data-driven applications Smart Manufacturing Computer-Aided Drug Discovery Personalized Precision Medicine Smart Cities Smart Grid and Energy Management Disaster Resilience and Response Requires: • Data management • Data-driven methods • Scalable & dynamic process coordination • Resource optimization • Skilled interdisciplinary workforce New era of data science!
  • 4. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu What is Data Science?
  • 5. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Ultimate Goal BigData Insight Action Data Science
  • 6. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu How does successful data science happen? Insight Data Product “Big” Data Question Exploratory Analysis and Modeling Insight
  • 7. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Customer Demographic Previous Purchases Book reviews What kind of books does this customer like? Book recommendations Example: Book Recommendations
  • 8. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Model of customer’s book preferences New book information Who is likely to like this book? Find Potential Audience for a New Book
  • 9. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Action to market the book to the right audience Who is likely to like this book? Market a New Book
  • 10. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Action to market the book to the right audience Who is likely to like this book? Insight Action Market a New Book
  • 11. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Historical data Near real-time data Prediction Creating Actionable Information
  • 12. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Prediction Action
  • 13. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Why is the increased interest in Data Science?
  • 14. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu + Big Data Scalable Computing Anywhere Anytime
  • 15. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu What is and How Much Data Is Big Data?
  • 16. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu 204 Million emails 200,000 photos 1.8 Million likes 2.78 Million video views 72 hours of video uploads Every minute…
  • 17. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Velocity Variety Volume Scalable batch processing Stream processing Extensible data storage, access and integration Big Data Characteristics
  • 18. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Nearly every problem today is transformed by big data.
  • 19. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Example: Geospatial Big Data • Flood of new data sources and types • Needs new data management, storage and analysis methods • Too big for a single server, fast growing data volume • Requires special database structures that can handle data variety • Too continuous for analysis at a later time, with increasing streaming rate, i.e., velocity • Varying degrees of uncertainty in measurements, and other veracity issues • Provides opportunities for scientific understanding at different scales more than ever, i.e., potential high value Real-time sensors Weather forecast Satellite imagery Sea Surface Temperature Measurements Drone imagery
  • 20. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Example: Biomedical Big Data http://nbcr.ucsd.edu
  • 21. Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Dr. ILKAY ALTINTAS ialtintas@ucsd.edu Scientific Big Data By the Numbers… • HPWREN: hpwren.ucsd.edu • 30 TB of data annually • MODIS: modis.gsfc.nasa.gov • 219 TB of data annually • Precision Medicine: Genome sequence • 4 EB (1018 bytes) of data in 2016 (Ref: www.fastcompany.com) • LIGO, Deep Space Network, Protein Data Bank, …
  • 22. • 100 MB ≈ a couple of encyclopedia volumes • A DVD ≈ 5 GB • 1 TB ≈ 300 hours of good-quality video • LHC ≈ 15 PB a year
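The size comparisons on the two slides above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, consistent with the slide's 1 EB = 10^18 bytes; the helper names `to_bytes` and `convert` are illustrative and not from the talk.

```python
# Decimal (SI) byte units, matching the slide's definition 1 EB = 10**18 bytes.
UNITS = {
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
}


def to_bytes(value, unit):
    """Return the number of bytes in `value` units."""
    return value * UNITS[unit]


def convert(value, from_unit, to_unit):
    """Convert a quantity between two units in UNITS."""
    return to_bytes(value, from_unit) / UNITS[to_unit]


if __name__ == "__main__":
    # LHC: 15 PB a year, expressed in TB.
    print(convert(15, "PB", "TB"))
    # Genome sequence data: 4 EB, expressed in PB.
    print(convert(4, "EB", "PB"))
    # How many 5 GB DVDs fit in 1 TB.
    print(convert(1, "TB", "GB") / 5)
```

Under these assumptions, the LHC's yearly output is 15,000 TB, which is 500 times the 30 TB that HPWREN produces annually.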
  • 23. Exponential data growth!
  • 24. 10^21
  • 25. How do we find the connections and answer questions that benefit society? “We are drowning in information and starving for knowledge” – John Naisbitt (Source: Megatrends, 1982)
  • 26. How do we amplify the value of Big Data?
  • 27. Create an Ecosystem that Enables Needs and Best Practices • data-driven • scalable • dynamic • process-driven • collaborative • accountable • reproducible • interactive • heterogeneous • includes many different kinds of expertise
  • 28. A Typical Collaborative Data Science Ecosystem
  • 29. Approach: Focus on the Process and Teamwork to Answer a Question … ACQUIRE → PREPARE → ANALYZE → REPORT → ACT
  • 30. Basic Steps in a Data Science Process: ACQUIRE → PREPARE → ANALYZE → REPORT → ACT • Import raw dataset into your analytics platform • Explore & Visualize • Perform Data Cleaning • Feature Selection • Model Selection • Analyze the results • Present your findings • Use them
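The five steps on the slide above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the sensor records, the temperature field, the 30 °C threshold, and the "heat advisory" action are all hypothetical and do not come from the talk.

```python
from statistics import mean


def acquire():
    """ACQUIRE: import a raw dataset (hypothetical readings; None marks a gap)."""
    return [
        {"sensor": "a", "temp_c": 21.5},
        {"sensor": "b", "temp_c": None},
        {"sensor": "c", "temp_c": 35.0},
        {"sensor": "d", "temp_c": 33.5},
    ]


def prepare(records):
    """PREPARE: basic data cleaning, here dropping records with missing values."""
    return [r for r in records if r["temp_c"] is not None]


def analyze(records):
    """ANALYZE: compute simple summary statistics over the cleaned data."""
    temps = [r["temp_c"] for r in records]
    return {"n": len(temps), "mean_temp_c": mean(temps), "max_temp_c": max(temps)}


def report(summary):
    """REPORT: present the findings in a human-readable form."""
    return (
        f"{summary['n']} valid readings, "
        f"mean {summary['mean_temp_c']:.1f} C, max {summary['max_temp_c']:.1f} C"
    )


def act(summary, threshold_c=30.0):
    """ACT: turn the insight into a decision (hypothetical threshold rule)."""
    return "issue heat advisory" if summary["max_temp_c"] > threshold_c else "no action"


def run_pipeline():
    summary = analyze(prepare(acquire()))
    return report(summary), act(summary)


if __name__ == "__main__":
    findings, decision = run_pipeline()
    print(findings)
    print(decision)
```

Keeping each step as its own function mirrors the slide's emphasis on process: any step can be replaced, scaled, or reviewed without touching the others.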
  • 31. Process-driven Solution Architectures and the Role of Workflows • COORDINATION AND WORKFLOW MANAGEMENT • DATA INTEGRATION AND PROCESSING • DATA MANAGEMENT AND STORAGE
  • 32. … • COORDINATION AND WORKFLOW MANAGEMENT • DATA INTEGRATION AND PROCESSING • DATA MANAGEMENT AND STORAGE • COMMUNICATION AND FEEDBACK • EXPLORATION • SCALABILITY • PROVENANCE • SECURITY • ACQUIRE, PREPARE, ANALYZE, REPORT, ACT
  • 33. • WORKFLOW MANAGEMENT: Application Integration, Coordination, Optimization, Communication, Reporting • COMPOSABLE DATA SERVICES: Deep Learning, Analytics, HPC, Training, Notebooks • COMPOSABLE SYSTEMS: GPU, CPU, Big Data, Neuromorphic, Networks, Storage, … • PROVENANCE • SECURITY • RESOURCE MANAGEMENT: Kubernetes Container Cloud
  • 34. SOLUTION ARCHITECTURE DOMAIN KNOWLEDGE
  • 35. Using dynamic workflows for data science… … requires methodology, research and tool development.
  • 36. Workflows for Data Science Center of Excellence at SDSC (WorDS.sdsc.edu) Goal: Methodology and tool development to build automated and operational workflow-driven solution architectures on big data and HPC platforms. Focus on the question, not the technology! • Real-Time Hazards Management (wifire.ucsd.edu) • Data-Parallel Bioinformatics (bioKepler.org) • Scalable Automated Molecular Dynamics and Drug Discovery (nbcr.ucsd.edu) • Access and query data • Support exploratory design • Scale computational analysis • Increase reuse and reproducibility • Save time, energy and money • Formalize and standardize • Train
  • 37. Balance of: • team building • process management • performance optimization • provenance tracking • training and education
  • 38. While working with experts on… • data modeling and integration • data management services • analytical methods • communication and visualization • domain expertise
  • 39. Team Building: How can I get smart people to collaborate and communicate… to utilize data and computing to generate insights and answer a question? Focus on the question, not the technology!
  • 40. Process Management
  • 41. Process for Practice of Data Science • Workflow Design • Reporting • Workflow Monitoring • Workflow Execution • Workflow Scheduling and Execution Planning • Execution Review • Provenance Analysis • Deploy and Publish • Programmability: Ease of use, iteration, interaction, re-use, re-purpose • Scalability: From local experiments to large-scale runs • Reproducibility: Ability to validate, re-run, re-play • BUILD and EXPLORE • SHARE • SCALE and ITERATE • LEARN and REPORT
  • 42. Some P’s in PPoDS • Platforms • Process • People • Problem or Purpose? • Programmability
  • 43. Metrics for accountability should be built into the process. • Timeline • Purpose • Expectations • Planning of deliverables • Cost
  • 44. Treat Each Step in the Solution Process as a Conceptual Pod (Pod → sub-process) Defined by: • Purpose and goal • Stakeholders • Expectations: key questions to be answered, production/consumption relationships, needs, dependencies, limits, … • Contracts: performance, economic, accuracy, policy, privacy, reproducibility, political, … • Knowns • Known unknowns
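One way to make the pod idea concrete is to record each pod's defining attributes in a small data structure and derive an execution order from the declared dependencies. The sketch below is an assumption-laden illustration, not the PPoDS implementation: the `Pod` class, its field names, and `execution_order` are hypothetical, and the ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Pod:
    """A conceptual pod: one sub-process in the solution process."""
    name: str
    purpose: str
    stakeholders: list = field(default_factory=list)
    expectations: list = field(default_factory=list)    # key questions, needs, limits
    contracts: dict = field(default_factory=dict)       # e.g. accuracy or privacy terms
    knowns: list = field(default_factory=list)
    known_unknowns: list = field(default_factory=list)
    depends_on: list = field(default_factory=list)      # pods whose output is consumed


def execution_order(pods):
    """Order pod names so every pod runs after the pods it depends on."""
    graph = {pod.name: set(pod.depends_on) for pod in pods}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    pods = [
        Pod("analyze", "fit and evaluate a model", depends_on=["prepare"],
            contracts={"accuracy": "agreed with stakeholders"}),
        Pod("prepare", "clean and integrate data", depends_on=["acquire"],
            known_unknowns=["share of missing values"]),
        Pod("acquire", "import raw datasets", stakeholders=["data owners"]),
    ]
    print(execution_order(pods))
```

Writing the contracts and known unknowns down next to the dependencies keeps the accountability metrics attached to the step they apply to.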
  • 45. Zooming into a simple example… PREPARE, ANALYZE • Data Exploration • Schema Integration • Query Processing • Machine Learning • …
  • 46. The insights need to be evaluated to turn them into action. • Platforms • Process • People • Purpose? • Programmability • Metrics • Product • Insight • Action?
  • 47. Implementation of the actions needs many things working together. • Process • Stakeholders • Automation • Action
  • 48. The impact of the actions should be monitored, measured and evaluated. Diagram labels: GIS Files Sensor NoSQL Social Database Action Evaluation Measure Monitor
  • 49. Evaluation will determine the next steps. • Favorable Results? • Revisit? • Further Opportunities? • Real-time Action? Diagram labels: Action, Evaluation
  • 50. COORDINATION AND WORKFLOW MANAGEMENT … http://kepler-project.org • Execution Platforms: National Resources (Gordon) (Comet) (Stampede) (Lonestar), Cloud Resources, Local Cluster Resources • ACQUIRE, PREPARE, ANALYZE, REPORT, ACT
  • 51. Dynamic data-driven coordination & resource optimization. Requires: ability to explore and scale on multiple platforms. Workflows are increasingly becoming the dynamic operations research tool for science.
  • 52. Where do we make use of such capabilities?
  • 53. Data Science for Social Good
  • 54. Smart City and Hazards IoT Applications • Many sensed and organizational open datasets • Potential to improve public safety and quality of life
  • 55. How do we Better Predict Wildfire Behavior? Fire is Part of the Natural Ecology… but requires Monitoring, Prediction and Resilience • Wildfires are critical for ecology, but volatile • Fuel load is high due to fire suppression over the last century • Drought, higher temperatures • Better prevention, prediction and maintenance of wildfires are needed • Disaster management of (ongoing) wildfires relies heavily on understanding their Direction and Rate of Spread (RoS). Photo of Harris Fire (2007) by former Fire Captain Bill Clayton
  • 56. What was lacking is… a dynamic system integration of real-time sensor networks, satellite imagery, near-real-time data management tools, wildfire simulation tools, and connectivity to emergency command centers… before, during and after a firestorm.
  • 58. WIFIRE: A Scalable Data-Driven Monitoring, Dynamic Prediction and Resilience Cyberinfrastructure for Wildfires • Big Data • Fire Modeling • Visualization • Monitoring
  • 59. • High Performance Wireless Research and Education Network (http://hpwren.ucsd.edu/cameras): >160 meteorological sensors and growing. Major success in bringing internet to incident command in the field. Used in over 20 fires over time. • FARSITE: Most popular operational fire behavior modeling system.
  • 60. Closing the Loop using Big Data: Wildfire Behavior Modeling and Data Assimilation • Computational costs for existing models too high for real-time analysis • a priori → a posteriori • Parameter estimation to make adjustments to the (input) parameters • State estimation to adjust the simulated fire front location with an a posteriori update/measurement of the actual fire front location • Figure: Conceptual Data Assimilation Workflow with Prediction and Update Steps using Sensor Data
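The prediction and update steps named on the slide above can be illustrated with a one-dimensional Kalman filter that tracks the fire front position along a single transect. This is a deliberately simplified sketch, not the WIFIRE method: the constant rate of spread, the noise variances, and the observation values are all assumed for illustration, and it covers only state estimation, not parameter estimation.

```python
def predict(position_m, variance, rate_of_spread_m_per_min, dt_min, process_noise):
    """Prediction step (a priori): advance the front with the model.

    The model here is a constant rate of spread; uncertainty grows by the
    process-noise variance at every step.
    """
    prior_position = position_m + rate_of_spread_m_per_min * dt_min
    prior_variance = variance + process_noise
    return prior_position, prior_variance


def update(prior_position, prior_variance, observed_position_m, observation_noise):
    """Update step (a posteriori): blend the prediction with an observation."""
    gain = prior_variance / (prior_variance + observation_noise)
    posterior_position = prior_position + gain * (observed_position_m - prior_position)
    posterior_variance = (1.0 - gain) * prior_variance
    return posterior_position, posterior_variance


def assimilate(observations_m, rate_of_spread_m_per_min=2.0, dt_min=10.0,
               process_noise=25.0, observation_noise=100.0):
    """Run predict/update once per observation and return the state history."""
    position, variance = 0.0, 1.0  # assumed initial front location and variance
    history = []
    for observed in observations_m:
        position, variance = predict(
            position, variance, rate_of_spread_m_per_min, dt_min, process_noise)
        position, variance = update(
            position, variance, observed, observation_noise)
        history.append((position, variance))
    return history


if __name__ == "__main__":
    # Hypothetical front positions (metres) observed every 10 minutes.
    for step, (pos, var) in enumerate(assimilate([24.0, 47.0, 75.0]), start=1):
        print(f"step {step}: front at {pos:.1f} m (variance {var:.1f})")
```

The gain decides how far the estimate moves toward the observation: a noisy sensor (large `observation_noise`) gives a small gain, so the model prediction dominates, and the reverse holds when the model is the less certain of the two.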
  • 61. Fire Modeling Workflows in WIFIRE • Real-time sensors • Weather forecast • Fire perimeter • Landscape data • Monitoring & fire mapping
  • 62. Firemap Tool (http://firemap.sdsc.edu) • A web-based GIS environment: access information related to fire behavior; analyze what-if scenarios; model real-time fire behavior; generate reports • Powered by WIFIRE: Firemap Web Interface, WIFIRE Data Interfaces, WIFIRE Workflows, Computing Infrastructure
  • 63. Data-Driven Fire Progression Prediction Over Three Hours • Collaboration with LA and SD Fire Departments • http://firemap.sdsc.edu • August 2016 – Blue Cut Fire • Tahoe and Nevada Bureau of Land Management Cameras: 20 cameras added with field-of-view
  • 64. Northern CA Fires, 10/09/17 through now… 300K+ unique visitors and ~3M hits in 5 days http://firemap.sdsc.edu
  • 65. Some Machine Learning Case Studies • Smoke and fire perimeter detection based on imagery • Prediction of Santa Ana and fire conditions specific to location • Prediction of fuel build-up based on fire and weather history • NLP for understanding local conditions based on radio communications • Deep learning on multispectral imagery for high-resolution fuel maps • Classification project to generate more accurate fuel maps (using Planet Labs satellite data)
  • 66. Classification project to generate more accurate fuel maps • Accurate and up-to-date fuel maps are critical for modeling wildfire rate of spread and potential burn areas. • Challenge: USGS LANDFIRE provides the best available fuel maps every two years. The WIFIRE system is limited by these potentially 2-year-old inputs. Fuel maps created at a higher temporal frequency are desired. • Approach: Using high-resolution satellite imagery and deep learning methods, produce surface fuel maps of San Diego County and other regions in Southern California. Using LANDFIRE fuel maps as the target variable, the objective is to create a classification model that will provide fuel maps at greater frequency with a measure of uncertainty. • Figure label: Cluster 1: Short Grass
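The slide asks for fuel-class predictions "with a measure of uncertainty" but does not say which measure is used. One common choice, shown here purely as an assumption, is the normalized Shannon entropy of the classifier's per-class probabilities for a pixel: 0 means the model is certain, 1 means it cannot distinguish the classes. The class labels other than "short grass" and the probability values are hypothetical.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a probability vector, scaled to the range [0, 1]."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


def classify_with_uncertainty(probs, labels):
    """Return the most probable label and the normalized entropy of `probs`."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], normalized_entropy(probs)


if __name__ == "__main__":
    # Hypothetical fuel classes; only "short grass" appears on the slide.
    labels = ["short grass", "brush", "timber litter"]
    # Hypothetical per-pixel class probabilities from some trained classifier.
    pixels = {
        "confident pixel": [0.90, 0.07, 0.03],
        "ambiguous pixel": [0.40, 0.35, 0.25],
    }
    for name, probs in pixels.items():
        label, uncertainty = classify_with_uncertainty(probs, labels)
        print(f"{name}: {label} (uncertainty {uncertainty:.2f})")
```

A map of this score alongside the predicted classes would let a fire modeler see where the fuel map is least trustworthy before feeding it into a simulation.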
  • 67. WIFIRE Team: It takes a village! • PhD level researchers • Professional software developers • 27 undergraduate students (UC San Diego, UC Merced, MURPA University, University of Queensland) • 1 high school student • 5 MSc and 5 MAS students • 2 PhD students (UMD) • 1 postdoctoral researcher • UMD: Fire modeling • UCSD MAE: Data assimilation • SDSC: Cyberinfrastructure, Workflows, Data engineering, Machine Learning, Information Visualization, HPWREN • Calit2/QI: Cyberinfrastructure, GIS, Advanced Visualization, Machine Learning, Urban Sustainability, HPWREN • SIO: HPWREN
  • 68. Process for Precision Education: Some Questions • How are the students performing? • When does a dropout process really start? What are the early signs? • How many students do we expect for a subject next year? What are the trends? • When will a student graduate? • What are personalized learning paths? • When is the best time to take a course to graduate on time? • How does the curriculum serve the local economy and workforce?
  • 69. Parts of the Solution • Stakeholders • Datasets • Compliance requirements • Defined actions • Analytical methods • Technical infrastructure Diagram labels: Bias, Transparency, Verification, Accuracy, Ethics, Reproducibility
  • 70. Questions? Contact: Ilkay Altintas, Ph.D. Email: ialtintas@ucsd.edu Parts of the presented work are funded by NSF, DOE, NIH, UC San Diego and various industry partners.