https://itcd.hq.nasa.gov/data-science-day/plenary1
AGENDA
• Demo 1 – Use Amazon Rekognition API connected to a camera for near
real-time image analyses.
• Demo 2 – Install and extend Google TensorFlow to retrain a classifier to
classify custom images.
• Demo 3 – Use AutoML.
Applied ML - Harsh Prakash 2
WHAT IS ML?
• ML, a subset of AI and a superset of DL, enables a user to perform
specific tasks, like predicting outcomes and recognizing images, without
explicit instructions by analyzing and learning from data based on
patterns and inference, and with minimal human intervention.
• NLP, a subset of AI, helps a user read, analyze, interpret and understand
natural language data, and perform speech recognition.
• AI helps a user’s computer systems learn (acquire data and the rules
governing its use), reason (reach conclusions), problem-solve and
self-correct to inform their decisions.
• A neural network is at the heart of these systems, designed to recognize
patterns (variables that rise and fall together).
DEMO 1 – “I.C.U.”
• Web app on Apache uses AWS SDK for Rekognition API connected to a camera
for near real-time image analyses.
• ML assigns LABELS/TAGS, and the Model API returns a raw JSON response.
• The app can adjust MAXLABELS, MINCONFIDENCE, etc., can be ported to
Lambda/S3 Bucket, and can send alerts.
• Known issue: latency between capture and analysis.
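A minimal sketch of the Demo 1 call, assuming Python with boto3 (the deck does not say which AWS SDK language the web app used). `detect_labels` wraps the Rekognition DetectLabels operation with its MaxLabels and MinConfidence parameters; `summarize_labels` and the sample response are hypothetical helpers for illustration, not part of the original demo.

```python
import json


def detect_labels(image_bytes, max_labels=10, min_confidence=75.0):
    """Call Rekognition DetectLabels (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers below run without it

    client = boto3.client("rekognition")
    return client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )


def summarize_labels(response):
    """Return (name, confidence) pairs, highest confidence first."""
    labels = response.get("Labels", [])
    pairs = [(label["Name"], label["Confidence"]) for label in labels]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical, trimmed response in the DetectLabels response shape.
sample_response = {
    "Labels": [
        {"Name": "Building", "Confidence": 88.1},
        {"Name": "Urban", "Confidence": 94.0},
    ]
}

print(json.dumps(summarize_labels(sample_response)))
```

Raising `min_confidence` or lowering `max_labels` shrinks the response, which is one way to trade detail for speed when latency matters.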
BEST APPLICATION?
Image Classification in Remote Sensing
Image Tagging with ML
STEPS: Amazon Web Services (AWS) + Google Cloud Platform (GCP)
At GlobalMapAid.org, as volunteer Directors, our focus is on mapping poverty
hotspots. We used a cloud-based ML model with satellite images to detect
development indicators at the neighborhood level.
1. Opened account with GCP.
2. Enabled Google Maps API for project, and billing for project to fetch
more than 1 satellite image per day using API key.
3. Tuning model for known test areas. E.g. New York...
• Used satellite images from Google Maps API at their highest available
resolution (Zoom: 17, or 1x1 sq. mile).
• Used AWS-based ML model to classify satellite images by infrastructure
levels.
• Assumed correlation between infrastructure and visual indicators in
satellite images.
* https://www.globalmapaid.org/patron-directors/
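A sketch of how a satellite tile request could be built, assuming the Maps Static API was the Google Maps API in question (the slide does not name the specific product). The image size, the placeholder key and the approximate Cazenovia coordinates are assumptions; only zoom 17 and satellite imagery come from the slide.

```python
from urllib.parse import urlencode

STATIC_MAPS_ENDPOINT = "https://maps.googleapis.com/maps/api/staticmap"


def satellite_image_url(lat, lng, api_key, zoom=17, size="640x640"):
    """Build a Maps Static API URL for one satellite image."""
    params = {
        "center": f"{lat},{lng}",
        "zoom": zoom,            # zoom level quoted on the slide
        "size": size,            # assumed image size in pixels
        "maptype": "satellite",  # plain satellite imagery
        "key": api_key,
    }
    return f"{STATIC_MAPS_ENDPOINT}?{urlencode(params)}"


# Approximate coordinates for Cazenovia, NY; the key is a placeholder.
url = satellite_image_url(42.9301, -75.8527, api_key="YOUR_API_KEY")
print(url)
```

The function only builds the URL; fetching it needs a billing-enabled key, as step 2 notes.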
KNOWN TEST AREAS
Bridge – New York City, NY
ML assigns labels: Nature, Outdoors, Landscape, Scenery
City Center – New York City, NY
ML assigns labels: Outdoors, Nature, Landscape, Scenery, Urban, Building,
Neighborhood, Road, Housing, City, Town, Intersection
Rural Town of Cazenovia, NY
ML assigns labels: Landscape, Outdoors, Nature, Scenery, Aerial View, Land,
Urban, Road, Housing, Building, Yard, Neighborhood
FINDINGS FROM KNOWN TEST AREAS
• For the City Center in New York City, NY, ML assigns the label Urban with
94% confidence. For the rural Town of Cazenovia, NY, ML assigns the label
Urban with 76% confidence. That is an 18-point gap for this pair; the
typical gap between True Positive (TP) and False Positive (FP) is about 15
percentage points.
• Hybrid, Roadmap and Terrain images add noise.
• Real-world applicability – Helpful even if it only reinforces what people
on the ground already know.
CONFUSION MATRIX
                      Actual POSITIVE        Actual NEGATIVE
Predicted POSITIVE    True Positive (TP)     False Positive (FP)
Predicted NEGATIVE    False Negative (FN)    True Negative (TN)
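A small worked example of how a confidence threshold maps the two known test areas onto the confusion matrix. The 94% and 76% confidences for the label Urban come from the findings above; the thresholds of 70 and 85 are hypothetical choices for illustration.

```python
def confusion_counts(examples, threshold):
    """Count TP, FP, FN, TN for the label Urban at a confidence threshold."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for confidence, actually_urban in examples:
        predicted_urban = confidence >= threshold
        if predicted_urban and actually_urban:
            counts["TP"] += 1
        elif predicted_urban and not actually_urban:
            counts["FP"] += 1
        elif actually_urban:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# (confidence for Urban, is the area actually urban?)
examples = [(94.0, True), (76.0, False)]

low_threshold = confusion_counts(examples, threshold=70.0)
high_threshold = confusion_counts(examples, threshold=85.0)
print(low_threshold)   # the rural town is counted as a false positive
print(high_threshold)  # the rural town is counted as a true negative
```

A threshold placed inside the gap between the two confidences separates the urban and rural test areas; one placed below both does not.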
TODO
• Use the k-nearest neighbors algorithm (k-NN) for pattern recognition to
predict for blind spots.
• Transform labels into vectors.
• Identify patterns early and predict natural disasters using weather data,
food data and agricultural data.
• If ground volunteers or local mining companies confirm charcoal fires
and/or cooking burners on satellite images, then tune model further.
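A sketch of the first two TODO items, assuming a multi-hot encoding and Hamming distance (the deck does not specify either). The training labels are the ones listed for the City Center and Cazenovia test areas; the "unseen" label list is made up.

```python
from collections import Counter

# Labels from the two known test areas.
TRAINING = [
    ("urban", ["Outdoors", "Nature", "Landscape", "Scenery", "Urban",
               "Building", "Neighborhood", "Road", "Housing", "City",
               "Town", "Intersection"]),
    ("rural", ["Landscape", "Outdoors", "Nature", "Scenery", "Aerial View",
               "Land", "Urban", "Road", "Housing", "Building", "Yard",
               "Neighborhood"]),
]

VOCABULARY = sorted({label for _, labels in TRAINING for label in labels})


def to_vector(labels):
    """Multi-hot encode labels; labels outside the vocabulary are ignored."""
    present = set(labels)
    return [1 if word in present else 0 for word in VOCABULARY]


def hamming(a, b):
    """Number of positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(labels, k=1):
    """Majority class among the k training areas nearest to the query."""
    query = to_vector(labels)
    ranked = sorted(TRAINING,
                    key=lambda item: hamming(query, to_vector(item[1])))
    votes = Counter(area_class for area_class, _ in ranked[:k])
    return votes.most_common(1)[0][0]


unseen = ["Outdoors", "Urban", "Building", "City", "Intersection"]  # made up
prediction = knn_predict(unseen)
print(prediction)
```

With only two labeled areas, k must stay at 1; more confirmed areas from ground volunteers would allow a larger k.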
End Goal: Addis Ababa, Ethiopia (map classes: Urban, Rural)
POTENTIAL AT NASA -- DESCRIPTIVE, PREDICTIVE & PRESCRIPTIVE
• Explore LANDSAT data, and satellite and HELIOS images.
• Use Modeling, Analysis and Prediction (MAP) Program’s Black Marble maps of
night lights to gain insight into human activity.
• Use external datasets to augment data for BI apps. E.g. Census or USAID
data.
• Explore solar images.
• Auto-tag media.
• AIOps – Analyze logs and texts, towards actionable persuasive analytics.
E.g. Storage – S3 Intelligent Tiering.
Example of Data Augmentation:
Regression for website visitor profile using Census data.
Automatic clustering of popular searches on medlineplus.gov for May 2015,
using R STAT, PostGIS.
Earth’s magnetic field deflects cosmic radiation, keeping us safe.
NASA tests radiation shield materials to keep astronauts safe on missions
away from Earth’s protective bubble.
Detecting Solar Flares and Coronal Mass Ejections (CMEs)
sdo.gsfc.nasa.gov
Solar Storm
ML assigns labels: Nature, Flare, Light, Outdoors, Sun, Sky, Night,
Astronomy, Universe, Outer Space, Space, Moon, Sunrise, Mountain, Planet
Solar Storm
ML assigns labels: Night, Nature, Space, Outdoors, Universe, Moon,
Astronomy, Outer Space, Sun, Sky, Flare, Light, Mountain, Photo, Photography
DEMO 2 – “14 Commandments”
• Transfer Learning – 14 commands to install and extend TensorFlow to
retrain a classifier to classify your own images.
• TensorBoard @ localhost:6006
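The 14 commands themselves are not listed in this deck, so the following is only a sketch of the idea behind retraining a classifier: the pretrained feature extractor stays frozen and only a small final layer is trained on your own images. It uses plain Python rather than TensorFlow, and the 2-D feature vectors and two classes are made up.

```python
import math

# Hypothetical 2-D features, standing in for a frozen network's outputs.
FEATURES = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
LABELS = [0, 0, 1, 1]  # two made-up custom image classes


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    """Probability of class 1 under the logistic head."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def train_head(features, labels, epochs=200, learning_rate=0.5):
    """Train only the final logistic layer with batch gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = score(weights, bias, x) - y
            grad_w = [g + error * v for g, v in zip(grad_w, x)]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


weights, bias = train_head(FEATURES, LABELS)
predictions = [1 if score(weights, bias, x) >= 0.5 else 0 for x in FEATURES]
print(predictions)
```

In the real demo the features would come from a pretrained TensorFlow model, and training progress would be inspected in TensorBoard.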
Auto-tagging Media
• Tested images.nasa.gov’s REST API.
• Testing io.jsc.nasa.gov’s REST API.
• Auto-tagged images in corresponding JSON files, e.g. image.jpg and
image.jpg.label.json. Took 70 sec/100 images, or about 1 sec and
$0.001/image.
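A sketch of the sidecar-file convention described above (image.jpg paired with image.jpg.label.json). The JSON layout with a top-level "Labels" key and the sample label are assumptions; the slide gives only the file naming and the timing.

```python
import json
import tempfile
from pathlib import Path


def write_label_sidecar(image_path, labels):
    """Write labels next to the image as <image name>.label.json."""
    sidecar_path = Path(str(image_path) + ".label.json")
    sidecar_path.write_text(json.dumps({"Labels": labels}, indent=2))
    return sidecar_path


# 70 seconds per 100 images is 0.7 s each, which the slide rounds to 1 s.
seconds_per_image = 70 / 100

with tempfile.TemporaryDirectory() as folder:
    image_path = Path(folder) / "image.jpg"
    image_path.write_bytes(b"")  # empty stand-in for a real image
    labels = [{"Name": "Sun", "Confidence": 90.0}]  # hypothetical labels
    sidecar = write_label_sidecar(image_path, labels)
    sidecar_name = sidecar.name
    stored = json.loads(sidecar.read_text())

print(sidecar_name, seconds_per_image)
```

Keeping the labels in a separate file leaves the original image untouched and makes the tags easy to re-generate when the model changes.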
Natural Language Processing (NLP) using AWS Comprehend in Log Files

# Detect Entities:
In "[Wed Jul 11 14:32:52 2019] [error] [client 127.0.0.1] client denied by server configuration: /apache/htdocs/test"
Query:
"Entities[?Type=='DATE' && contains(Text, '2019')].Text"
Result:
"Wed Jul 11 14:32:52 2019"

# Detect Key Noun Phrases:
In "127.0.0.1 - - [07/Jul/2019:16:05:49 -0800] "GET /apache/htdocs/test?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846"
Query:
"KeyPhrases[?contains(Text, 'topicparent')].*"
Result:
"Score": 0.5314911603927612,
"Text": "topicparent",
"BeginOffset": 68,
"EndOffset": 79
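The queries on this slide look like JMESPath filters of the kind accepted by the AWS CLI `--query` option; that reading is an assumption. The sketch below does a roughly equivalent filter in plain Python on the key phrase result shown for the access-log line, and checks that the reported offsets point at the phrase. The helper name `phrases_containing` is hypothetical.

```python
log_line = (
    '127.0.0.1 - - [07/Jul/2019:16:05:49 -0800] '
    '"GET /apache/htdocs/test?topicparent=Main.ConfigurationVariables '
    'HTTP/1.1" 401 12846'
)

# Response trimmed to the one key phrase shown on the slide.
response = {
    "KeyPhrases": [
        {
            "Score": 0.5314911603927612,
            "Text": "topicparent",
            "BeginOffset": 68,
            "EndOffset": 79,
        }
    ]
}


def phrases_containing(response, needle):
    """Plain-Python stand-in for KeyPhrases[?contains(Text, needle)]."""
    return [p for p in response["KeyPhrases"] if needle in p["Text"]]


matches = phrases_containing(response, "topicparent")
for phrase in matches:
    # The offsets index into the original text, end offset exclusive.
    span = log_line[phrase["BeginOffset"]:phrase["EndOffset"]]
    print(phrase["Text"], span == phrase["Text"])
```

Because the offsets index into the original line, matched phrases can be highlighted or redacted in place without re-parsing the log.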
Natural Language Processing (NLP) using AWS Comprehend in
"NASA’s TESS Mission Scores ‘Hat Trick’ With 3 New Worlds"

# Detect Key Noun Phrases (for auto-tagging):
"KeyPhrases":
  "Score": 0.97,
  "Text": "NASA",
  "BeginOffset": 0,
  "EndOffset": 4

# Detect Syntax and Parts of Speech:
"SyntaxTokens":
  "TokenId": 1,
  "Text": "NASA’s",
  "BeginOffset": 0,
  "EndOffset": 6,
  "PartOfSpeech":
    "Tag": "PROPN",
    "Score": 0.59

# Detect Sentiment:
"Sentiment": "NEUTRAL",
"SentimentScore":
  "Positive": 0.03,
  "Negative": 0.00,
  "Neutral": 0.96,
  "Mixed": 0.00

# Detect Named Entities (real-world objects with proper names, for
auto-tagging):
"Entities":
  "Score": 0.99,
  "Type": "ORGANIZATION",
  "Text": "NASA",
  "BeginOffset": 0,
  "EndOffset": 4
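A sketch of how the entity output above could feed auto-tagging: keep entities above a score cutoff and read the tag text back from the headline using the offsets. The 0.9 cutoff and the helper name `auto_tags` are assumptions; the entity values are the ones shown on the slide.

```python
headline = "NASA’s TESS Mission Scores ‘Hat Trick’ With 3 New Worlds"

# Entity result as shown on the slide.
entities = [
    {"Score": 0.99, "Type": "ORGANIZATION", "Text": "NASA",
     "BeginOffset": 0, "EndOffset": 4},
]


def auto_tags(text, entities, min_score=0.9):
    """Return (type, text) tags for entities at or above min_score."""
    tags = []
    for entity in entities:
        if entity["Score"] >= min_score:
            span = text[entity["BeginOffset"]:entity["EndOffset"]]
            tags.append((entity["Type"], span))
    return tags


tags = auto_tags(headline, entities)
print(tags)
```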
DEMO 3 – “You Do You”
AutoML enables teams with limited ML expertise to build and train
high-quality data models.
(Screenshots: Sample Data File; AutoML steps 1 to 4.)
NASA Use Case
Describe the NASA problem you're trying to solve - Why is it important to NASA?
What is the potential benefit to NASA of solving this problem?
(What action will NASA take based on your model’s output?)
Modeling Dataset Characteristics
Will a Data Dictionary be available?
What is the meaning of a record in this dataset?
(What does a row in the dataset correspond to?)
What is the target variable?
What is the ideal model evaluation metric that you are considering (AUC, Gini, etc.)?
Machine Learning Solution
How have you approached solving this problem in the past?
Are there existing NASA models that are currently in use?
If so, what is the current model performance?
What Success Criteria will you use to evaluate results?
Model Deployment
Once the models are developed, how will they be used?
What will be the volume of prediction? Will real-time predictions be required?
How will these predictions be stored?
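For the evaluation-metric question, a short worked example of how AUC and Gini relate (Gini = 2 × AUC − 1). AUC is computed here as the share of positive/negative pairs that the model ranks correctly; the labels and scores are made up.

```python
def auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def gini(labels, scores):
    """Gini coefficient derived from AUC."""
    return 2.0 * auc(labels, scores) - 1.0


# Hypothetical target values and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.94, 0.80, 0.76, 0.60, 0.40, 0.20]

print(auc(labels, scores), gini(labels, scores))
```

Here 8 of the 9 positive/negative pairs are ordered correctly, so AUC is 8/9 and Gini is 7/9. Both metrics rank models identically, so the choice is mostly one of convention.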
NEXT STEPS AT NASA
ML Life Cycle – Basic Steps
• Define Project Objectives – Specify problem, acquire SME, define
target/unit of analysis, prioritize modeling criteria, define success
criteria/risks.
• Acquire/Explore Data – Find/format/explore data, remove target leakage,
perform feature engineering.
• Model Data – Select features, build/validate models.
• Interpret/Communicate – Assess model quality, determine important
features, identify relationships, explain predictions.
• Implement/Document – Select model, document process, monitor/maintain
model.
RORSCHACH TEST
“MIT researchers: Amazon’s Rekognition shows gender and ethnic bias”
RECAP
• ML Frameworks – AWS Rekognition, AWS Comprehend, AWS SageMaker, AWS
QuickSight, AWS Deep Learning AMI, Google Cloud Platform, Google Vision,
Microsoft Computer Vision, Microsoft Power BI, Salesforce Tableau,
TensorFlow, PyTorch, Jupyter Notebook, R STAT, Anaconda.
• Model as a Service – AutoML, Models on AWS Marketplace.
• Questions?
Shout Out! AAAI 2019 FALL SYMPOSIUM SERIES
REFERENCE: COLLEGE PROJECT *
Growth Study for Charlottesville VA, 2000-2030
Used satellite images and Census data to compute population growth
distribution –
• Divided study area of the county into 5,745 grid cells (250 meters x 250
meters).
• Traditional compute model assigned growth weights based on development
indicators at the neighborhood level.
* https://www.slideshare.net/gisblog/gis-growth-study-for-charlottesville-va-20002030-plan-885-vamlis-2001-38716260
Sample Development Indicator

NASA Data Science Day Plenary: Applied Machine Learning (ML)
