Pistoia Alliance Demystifying AI & ML part 2

21 June, 2018
Demystifying AI – Part 2
An Introduction to AI in Life Sciences
Pistoia Alliance Centre of Excellence for AI in Life Sciences and Health
Prashant Natarajan (@BigDataCXO)
Moderator: Nick Lynch

This webinar is being recorded

Poll Question 1:
How much is your organisation planning to
increase investment into AI/ML* in the next
2-3 years? (tools/platform, people etc)
A. 0-25%
B. >25% - 50%
C. >50% - 75%
D. >75% - 100%
E. Not sure
(AI* = including machine learning/deep learning/chat bots)

©PistoiaAlliance
Webinars: AI in Life Sciences – Q2/Q3 2018
• Webinar 1 (23 May 2018) Prashant Natarajan
– A Brief History
– Big Data/ML/DL/AI - fundamentals and concepts
– Data Fidelity & NFR Framework
– Best Practices from the Trenches
– Q&A
• Webinar 2: 21 June 2018 Prashant Natarajan
– Big Data Analytics & AI - 2 sides of the same coin
– A guided tour of learning algorithms for Healthcare
– Real-life use cases in health & life sciences from the book
– AI Solutions - Going Beyond Algorithms
– Q & A
• Webinar 3: July 2018 – (panel)
– Real World Evidence, the Big Data Connection
– The 3 P’s of RWE: Persons, Providers, and Pharma
• Webinar 4
– State of the Art in AI with working examples
• Etc – monthly
Like to give a talk or panel?
Boston Community Workshop, Oct 2018

Poll Question 2:
Are you/is your organisation currently
looking to hire additional AI/ML* experts
or retrain existing staff?
A. Yes, now or soon
B. Yes in the next 12 months
C. Yes but later than 12 months
D. No
E. Don’t Know
(AI* = including machine learning/deep learning/chat bots, etc.)

Prashant Natarajan
• Senior Director of AI Applications at H2O.ai, Mountain View, CA, USA (www.h2o.ai)
• Undergraduate degree in Chemical Engineering; Master’s in Technical
Communications & Linguistics; PhD courses in Logic & Cognitive Psychology; AT&T-
Yahoo Chancellor’s Fellow
• 18+ years in health sciences industry – providers, pharma, payers, patients
• H2O.ai; Oracle Health Sciences; McKesson; Healthways; Siemens
• Lead author or contributor to books on big data analytics, business intelligence,
cancer, machine learning, AI (best-sellers in 2012, 2017, 2018)
• Co-Faculty Instructor, Stanford University School of Medicine, Palo Alto, CA
• Industry Advisor, CA Initiative to Advance Precision Medicine/San Francisco VA
@BigDataCXO | prashant.natarajan@gmail.com | www.BigDataCXO.com

Agenda
• Considerations for Life Sciences
• ML 102
• TIE – Interpretability & Explainability
• Conversational AI: Bot Basics
• Q & A

Considerations for Life Sciences
• Regulations and policy
• Innovation in a regulated environment
• TIE it up
• Organization and structural challenges in Life Sciences
• Resourcing
• Data fidelity and labeling
• MDM (master data management) is critical, as is data governance
• Ethics and privacy – human and machine morality are
not the same. Does a machine have morals?
• Clear demarcation or sharing of human & machine-
learning/CIA responsibilities when failure happens

Machine Learning 102
Mastering the Basics
Sources:
www.H2O.ai Driverless AI overview
“Demystifying Big Data and Machine Learning for Healthcare” (Taylor & Francis, 2017), Natarajan et al.
“Principles of Data Wrangling” (O’Reilly, 2017), Rattenbury et al.
AWS SageMaker Developer Guide
Prashant Natarajan

Typical Enterprise Machine Learning Workflow
[Figure: workflow diagram with elements labeled Data Integration, Data Quality & Transformation, Modeling Table, Features, Target, Model Building, Model, and Driverless AI]
Copyright 2018 H2O.ai Inc. All rights

ML Workflows: from Data to Deployment

Data Preparation & Wrangling
• Ingest data from RDBMS, files, distributed DBs, etc.;
describe the data and assess its utility
• Create & manage metadata
• Profile data – grain, structure, data fidelity, temporality,
scope
• Pre-visualization and outlier analysis
• Refine data – mastering, structuring (changing form or
schema), enriching (adding new info via joins, unions,
derived data), transforming (cleansing, addressing
missing/invalid values)
• Create production data for training, and use or build
automated ML systems that process it all the way to the
scoring pipeline or visualization
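
The profiling, outlier-analysis, and missing-value steps above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the tooling described in the cited sources. The records, column names, and helper functions are hypothetical, and Tukey's IQR rule and median imputation are just one common choice for each step.

```python
from statistics import median, quantiles

# Hypothetical records for illustration; None marks a missing value.
rows = [
    {"patient_id": "p1", "age": 34, "glucose": 5.4},
    {"patient_id": "p2", "age": None, "glucose": 5.9},
    {"patient_id": "p3", "age": 51, "glucose": 6.1},
    {"patient_id": "p4", "age": 47, "glucose": 5.6},
    {"patient_id": "p5", "age": 62, "glucose": 5.2},
    {"patient_id": "p6", "age": 29, "glucose": 48.0},  # suspicious entry
]


def profile_missing(rows, columns):
    """Profile step: count missing values per column."""
    return {c: sum(1 for r in rows if r[c] is None) for c in columns}


def iqr_outliers(values, k=1.5):
    """Outlier analysis: values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def impute_median(rows, column):
    """Transform step: fill missing values with the column median."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [
        {**r, column: fill if r[column] is None else r[column]} for r in rows
    ]


missing = profile_missing(rows, ["age", "glucose"])
outliers = iqr_outliers([r["glucose"] for r in rows])
cleaned = impute_median(rows, "age")
```

Flagged outliers are returned rather than dropped, so a person can decide whether a value is an entry error or a real signal before the data reaches training.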

Training & Scoring in H2O’s Driverless AI
"Confidential and property of H2O.ai. All rights reserved"
[Figure: pipeline diagram with stages labeled Data Processing, Model Tuning, Feature Engineering, Final Model Training, and Scoring Pipeline]

Deployment & Tracking
• Monitor Ongoing Performance - How will you monitor
the performance of your algorithm on an ongoing
basis? Data drifts and systems evolve.
• Look for the ability to connect to your existing visualization
tools; verify interpretability; make it easy for data
scientists, IT, and business users to collaborate via results
and code
• Keep Track Of Your Model Changes - Always track the
revision of your model and report it with your results.
As you improve different parts of your data analytics
pipeline, you will want to go back and re-analyze data.
Recording which model was used at which time helps
you understand what to recalculate.
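
One way to act on both points is to compare live score distributions against the training distribution and to stamp every result with the model revision that produced it. The sketch below uses the Population Stability Index (PSI) as the drift measure, which is an assumption of this example rather than something the slides prescribe. The bin edges, sample scores, version string, and function names are hypothetical, and the 0.25 alert threshold is a widely used rule of thumb, not a fixed standard.

```python
import math
from datetime import datetime, timezone


def psi(expected, actual, edges):
    """Population Stability Index between two samples over shared bins."""

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges at or below the value.
            counts[sum(1 for e in edges if v >= e)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def record_result(model_version, prediction):
    """Attach the model revision and a timestamp to every reported result."""
    return {
        "model_version": model_version,
        "scored_at": datetime.now(timezone.utc).isoformat(),
        "prediction": prediction,
    }


# Hypothetical score samples: training time versus live traffic.
train_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
live_scores = [0.5, 0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 0.9]
edges = [0.33, 0.66]

drift = psi(train_scores, live_scores, edges)
needs_review = drift > 0.25  # rule-of-thumb threshold (assumption)
record = record_result("v1.2.0", prediction=0.87)
```

Storing `model_version` with each result is what later lets you work out which outputs need recalculating after a pipeline change.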

Interpretability
Why/Why not?
Prashant Natarajan

Interpretability
*Source: https://christophm.github.io/interpretable-ml-book/interpretability-importance.html
TIE: Interpretability is the degree to which a human can
understand the cause of a decision (Miller 2017)*
• If the ML model performs well, can’t we just trust it?
• “The problem is that a single metric, such as classification
accuracy, is an incomplete description of most real-world
tasks” (Doshi-Velez & Kim)
• The “what” vs. the “why/how” of predictions: knowing the “why”
can help you understand more about the problem, the data,
biases, and leaks; debug and audit; and see why a model might fail
• Facilitate learning and satisfy human curiosity
• The model becomes the source of insights and
knowledge – not just the raw data. Hence,
interpretability becomes important
• Interpretability is not the same as explainability
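
A small worked example shows why a single metric can mislead. The labels and the "always predict the majority class" model below are made up for illustration. On an imbalanced dataset the model scores high on accuracy while finding none of the cases that matter.

```python
# Hypothetical imbalanced dataset: 5 positive cases out of 100.
labels = [1] * 5 + [0] * 95

# A trivial model that always predicts the majority class.
predictions = [0] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)
```

Accuracy here is 0.95 and recall is 0.0. The accuracy figure alone gives no hint that every positive case was missed, which is the kind of gap that asking "why" is meant to expose.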

Interpretability
Source: https://christophm.github.io/interpretable-ml-book/interpretability-importance.html
If the ML model is interpretable and explainable, we can
check for the following traits:
• Fairness: why was “x” denied a credit limit upgrade? Is
there a racial bias in the data?
• Privacy: ensuring that sensitive information in the data is
tracked and protected
• Robustness: testing that small changes in inputs don’t
lead to big changes in prediction
• Trust: humans trust a system that explains decisions
compared to a black box
When don’t we need interpretability?
• The problem is already well studied
• The model has no significant impact
• Interpretability would enable “gaming” of the ML system
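
The robustness trait listed above can be probed with a simple perturbation test: nudge each input slightly and measure how far the prediction moves. This is a rough sketch, not a complete robustness audit. Both models, the input vector, and the `max_prediction_shift` helper are hypothetical.

```python
def max_prediction_shift(predict, x, epsilon=0.01):
    """Largest change in output when any single input moves by +/- epsilon."""
    base = predict(x)
    shifts = []
    for i in range(len(x)):
        for delta in (-epsilon, epsilon):
            nudged = list(x)
            nudged[i] += delta
            shifts.append(abs(predict(nudged) - base))
    return max(shifts)


def smooth_model(x):
    """Hypothetical model whose output changes gradually with its inputs."""
    return 0.3 * x[0] + 0.1 * x[1]


def brittle_model(x):
    """Hypothetical model with a hard threshold on the first input."""
    return 1.0 if x[0] >= 0.5 else 0.0


x = [0.5, 0.2]
smooth_shift = max_prediction_shift(smooth_model, x)
brittle_shift = max_prediction_shift(brittle_model, x)
```

The smooth model moves by roughly 0.003, while the thresholded model flips its output completely for the same tiny change. A large shift like that is a cue to inspect the decision boundary near the input.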

Conversational AI
Examining Bot Basics
Sources:
“Demystifying Big Data & Machine Learning for Healthcare” (Natarajan et al, CRC Press, 2017)
“Designing Bots: Creating Conversational Experiences” (Amir Shevat, O’Reilly Press 2017)
Prashant Natarajan

AI & Bots: the Connections
Conversational AI & Bots
• Most bots are powered by ML/AI – though not all of
them
• Designing a great conversation is orthogonal, in most
cases, to the decision to use AI or another technology
• What can AI do for bots today?
– Natural Language Understanding (extracting & converting free
text to entities)
– Conversation management and context switching
– Computer vision and image recognition
– Prediction – finding patterns and predicting outcomes based on
past data
– Sentiment analysis – understanding emotional state
• Bot types: personal v team, super v domain-specific,
business v consumer, text v voice, Net New Service v
New Interfaces
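
To make the natural language understanding and sentiment items above concrete, here is a deliberately simple rule-based sketch. Production bots typically use trained models for both tasks. The drug lexicon, sentiment word lists, and function names here are hypothetical stand-ins.

```python
import re

# Hypothetical lexicons for illustration only.
DRUGS = {"ibuprofen", "metformin", "aspirin"}
POSITIVE = {"better", "great", "fine"}
NEGATIVE = {"terrible", "worse", "pain"}

# Matches an amount followed by a unit, e.g. "200 mg" or "2.5ml".
DOSE = re.compile(r"(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>mg|ml|g)\b", re.IGNORECASE)


def extract_entities(text):
    """Convert free text into structured entities (drug names and doses)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    drugs = [t for t in tokens if t in DRUGS]
    doses = [(float(m["amount"]), m["unit"].lower()) for m in DOSE.finditer(text)]
    return {"drugs": drugs, "doses": doses}


def sentiment(text):
    """Crude lexicon-based estimate of the user's emotional state."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


message = "I took 200 mg of ibuprofen this morning and still feel terrible."
entities = extract_entities(message)
mood = sentiment(message)
```

The structured output (`entities`, `mood`) is what a bot's conversation logic would act on, regardless of whether rules or a model produced it.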

Anatomy of a Bot
• Bot anatomies are important given that the primary
purpose of a bot is to recognize and help accomplish
human intent
• Anatomical features of a bot include
– Branding, personality and human involvement
– AI
– Conversation management: onboarding, flows, feedback/error
handling, help and support
– Rich interactions via files, audio, images, buttons, helpful links,
emojis, typing indicators, Web views
– Context and memory
– Engagement methods: notifications, user-led, subscriptions
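
Several of these anatomical features (onboarding, flows, error handling, help, and context and memory) can be sketched as a small state machine. This is a toy illustration with no AI component. The `Bot` class, its states, and its replies are hypothetical.

```python
class Bot:
    """Toy conversation manager with per-user context and memory."""

    def __init__(self):
        self.memory = {}  # user_id -> conversation context

    def reply(self, user_id, message):
        ctx = self.memory.setdefault(user_id, {"state": "new", "name": None})
        text = message.strip()

        # Onboarding: greet first-time users and ask for a name.
        if ctx["state"] == "new":
            ctx["state"] = "ask_name"
            return "Hi! I'm a demo bot. What should I call you?"

        if ctx["state"] == "ask_name":
            if not text:
                # Error handling: re-prompt instead of failing silently.
                return "Sorry, I didn't catch that. What should I call you?"
            ctx["name"] = text
            ctx["state"] = "ready"
            return f"Nice to meet you, {text}. Type 'help' to see what I can do."

        # Help and support.
        if text.lower() == "help":
            return "I can remember your name. Try asking 'who am I?'"

        # Context and memory: answer from what was stored earlier.
        if text.lower() == "who am i?":
            return f"You are {ctx['name']}."

        return "Sorry, I didn't understand that. Type 'help' for options."


bot = Bot()
greeting = bot.reply("u1", "hello")
welcome = bot.reply("u1", "Ada")
recalled = bot.reply("u1", "who am I?")
fallback = bot.reply("u1", "what's the weather?")
```

Keeping context per `user_id` is what lets the bot serve several users at once without mixing up their conversations.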

Some Use Cases
• Conversational commerce – FB, Alexa, etc
• Bots for business – Slackbot, GitHub ChatOps
• Productivity and coaching – Lark, AHA, etc
• Alerts and notifications
• Router between humans (Uber, Lyft, scheduling bots)
• Customer service and FAQs
• 3rd party integration bots (Slack and CRM)
• Games and entertainment
• Brand bots

Poll Question 3:
How important do you feel FAIR* data
principles are to ensuring successful
outputs from AI projects?
A. Very important
B. Important
C. Neutral
D. Not important
E. Not Very important
(FAIR* = Findable, Accessible, Interoperable & Reusable)

RWD and AI – how can they work
together?
The next Pistoia Alliance CoE AI Webinar:
Date: TBD July 2018
Check http://www.pistoiaalliance.org/events/ for the latest information

info@pistoiaalliance.org @pistoiaalliance www.pistoiaalliance.org
Thanks for your engagement
