NASA Data Science Day Plenary: Applied Machine Learning (ML)

https://itcd.hq.nasa.gov/data-science-day/plenary1
AGENDA
• Demo 1: Use the Amazon Rekognition API, connected to a camera, for near real-time image analyses.
• Demo 2: Install and extend Google TensorFlow to retrain a classifier to classify custom images.
• Demo 3: Use AutoML.

Applied ML - Harsh Prakash
WHAT IS ML?
• ML, a subset of Artificial Intelligence (AI) and a superset of Deep Learning (DL), enables systems to perform specific tasks, like predicting outcomes and recognizing images, without explicit instructions. It does so by analyzing and learning from data through patterns and inference, with minimal human intervention.
• Natural Language Processing (NLP), a subset of AI, helps a user read, analyze, interpret and understand natural language data, and perform speech recognition.
• AI helps a user’s computer systems learn (acquire data and the rules governing its use), reason (reach conclusions), problem-solve and self-correct to inform their decisions.
• Neural networks are at the heart of DL: they are designed to recognize patterns (e.g., variables that rise and fall together).
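To make "learning from data without explicit instructions" concrete, here is a minimal sketch of a single artificial neuron (a perceptron), the simplest building block of a neural network. It learns the logical OR pattern from four labeled examples rather than being programmed with the rule. The data, learning rate, and epoch count are illustrative assumptions, not part of the demos.

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Learn weights for one neuron from ((x1, x2), target) pairs."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            predicted = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = target - predicted
            # Weights only move when the neuron gets an example wrong.
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b


def predict(w, b, x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0


# Labeled examples of logical OR; the rule itself is never written down.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(OR_SAMPLES)
print([predict(weights, bias, x1, x2) for (x1, x2), _ in OR_SAMPLES])  # [0, 1, 1, 1]
```

A single neuron can only separate classes with a straight line; stacking many of them in layers is what lets neural networks recognize richer patterns.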

DEMO 1 – “I.C.U.”
• A web app on Apache uses the AWS SDK to send images captured by a connected camera to the Rekognition API for near real-time image analyses.
• ML assigns labels (tags) and returns the raw JSON response from the model's API.
• Parameters such as MaxLabels and MinConfidence can be adjusted; the app can be ported to AWS Lambda with an S3 bucket, and can send alerts.
• Limitation: latency between image capture and analysis.
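The effect of the MaxLabels and MinConfidence knobs can be shown with a small offline sketch. It assumes the shape of a Rekognition DetectLabels response (a "Labels" list whose entries carry "Name" and "Confidence"); the confidence values below are hypothetical. In the live app the response would come from the AWS SDK, e.g. boto3's `detect_labels(Image={"Bytes": ...}, MaxLabels=..., MinConfidence=...)`, which applies these limits on the service side; the helper `top_labels` here is hypothetical and only mimics them.

```python
import json


def top_labels(response, max_labels=10, min_confidence=55.0):
    """Keep labels at or above min_confidence, highest first, capped at max_labels."""
    kept = [
        (label["Name"], label["Confidence"])
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_labels]


# Hypothetical raw JSON, shaped like a DetectLabels response.
raw = json.dumps({"Labels": [
    {"Name": "Urban", "Confidence": 94.0},
    {"Name": "Road", "Confidence": 88.5},
    {"Name": "Scenery", "Confidence": 61.2},
    {"Name": "Intersection", "Confidence": 52.3},
]})
print(top_labels(json.loads(raw), max_labels=2, min_confidence=60.0))
# [('Urban', 94.0), ('Road', 88.5)]
```

Raising MinConfidence trims noisy labels; lowering MaxLabels keeps alerts short. Both trade recall for precision.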

BEST APPLICATION?
Image Classification in Remote Sensing: Image Tagging with ML

STEPS: Amazon Web Services (AWS) + Google Cloud Platform (GCP)
At GlobalMapAid.org, where we serve as volunteer Directors, our focus is on mapping poverty hotspots.
We used a cloud-based ML model with satellite images to detect development indicators at the neighborhood level.
1. Opened an account with GCP.
2. Enabled the Google Maps API and billing for the project, to fetch more than one satellite image per day using an API key.
3. Tuning the model for known test areas, e.g., New York...
   • Used satellite images from the Google Maps API at their highest available resolution (zoom level 17, or roughly 1 mile x 1 mile).
   • Used an AWS-based ML model to classify satellite images by infrastructure level.
   • Assumed a correlation between infrastructure and visual indicators in satellite images.
* https://www.globalmapaid.org/patron-directors/
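Steps 2 and 3 can be sketched as a request builder. This assumes the Google Maps Static API endpoint and its `center`, `zoom`, `size`, `maptype`, and `key` parameters; the coordinates and image size are illustrative, and `YOUR_API_KEY` is a placeholder. Because the ground area one image covers depends on zoom, latitude, and pixel size, the sketch also applies the standard Web Mercator ground-resolution formula (assuming the default scale of 1).

```python
import math
from urllib.parse import urlencode

STATIC_MAPS_ENDPOINT = "https://maps.googleapis.com/maps/api/staticmap"


def satellite_image_url(lat, lon, api_key, zoom=17, size_px=640):
    """Build a Static Maps request for a square satellite image."""
    params = {
        "center": f"{lat},{lon}",
        "zoom": zoom,
        "size": f"{size_px}x{size_px}",
        "maptype": "satellite",  # other values: roadmap, hybrid, terrain
        "key": api_key,
    }
    return f"{STATIC_MAPS_ENDPOINT}?{urlencode(params)}"


def meters_per_pixel(lat, zoom):
    """Web Mercator ground resolution (256-px tiles) at a given latitude."""
    return 156543.03392 * math.cos(math.radians(lat)) / (2 ** zoom)


lat, lon = 40.7128, -74.0060  # illustrative point in New York City
url = satellite_image_url(lat, lon, "YOUR_API_KEY")
side_m = meters_per_pixel(lat, 17) * 640
print(url)
print(f"about {side_m:.0f} m per side")
```

Computing the footprint, rather than assuming it, makes it easier to compare images fetched at different latitudes or sizes.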

KNOWN TEST AREAS
• Bridge, New York City, NY. ML assigns labels: Nature, Outdoors, Landscape, Scenery.
• City Center, New York City, NY. ML assigns labels: Outdoors, Nature, Landscape, Scenery, Urban, Building, Neighborhood, Road, Housing, City, Town, Intersection.
• Rural Town of Cazenovia, NY. ML assigns labels: Landscape, Outdoors, Nature, Scenery, Aerial View, Land, Urban, Road, Housing, Building, Yard, Neighborhood.

FINDINGS FROM KNOWN TEST AREAS
• For the City Center in New York City, NY, ML assigns the label Urban with 94% confidence. For the rural Town of Cazenovia, NY, ML also assigns the label Urban, but with 76% confidence. That is a gap of 18 percentage points between a True Positive (TP) and a False Positive (FP); the typical gap was about 15 percentage points.
• Hybrid, Roadmap and Terrain map images add noise.
• Real-world applicability: helpful even if it only reinforces what people on the ground already know.
CONFUSION MATRIX
                         Actual: POSITIVE       Actual: NEGATIVE
Predicted: POSITIVE      True Positive (TP)     False Positive (FP)
Predicted: NEGATIVE      False Negative (FN)    True Negative (TN)
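A minimal sketch of filling in the matrix from the two findings above. It treats "Urban" as the positive class and compares "any Urban label counts" with a hypothetical confidence cutoff of 85 (any value between 76 and 94 behaves the same for these two areas). With only two areas this shows the mechanics, not model quality.

```python
def confusion_matrix(actual, predicted):
    """Count TP, FP, FN, TN for boolean labels (True = positive)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for is_actual, is_predicted in zip(actual, predicted):
        if is_predicted and is_actual:
            counts["TP"] += 1
        elif is_predicted:
            counts["FP"] += 1
        elif is_actual:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# "Urban" confidence from the known test areas; True = actually urban.
AREAS = [
    ("City Center, New York City, NY", 94.0, True),
    ("Town of Cazenovia, NY", 76.0, False),
]


def evaluate(threshold):
    actual = [is_urban for _, _, is_urban in AREAS]
    predicted = [conf >= threshold for _, conf, _ in AREAS]
    return confusion_matrix(actual, predicted)


print(evaluate(0))   # {'TP': 1, 'FP': 1, 'FN': 0, 'TN': 0}
print(evaluate(85))  # {'TP': 1, 'FP': 0, 'FN': 0, 'TN': 1}
```

The confidence gap between TP and FP is what makes such a cutoff possible; a smaller gap would force a trade-off between FPs and FNs.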

TODO
• Use the k-nearest neighbors (k-NN) algorithm for pattern recognition, to make predictions for blind spots.
• Transform labels into vectors.
• Identify patterns early and predict natural disasters using weather, food and agricultural data.
• If ground volunteers or local mining companies confirm the charcoal fires and/or cooking burners seen in satellite images, tune the model further.
End Goal: Addis Ababa, Ethiopia (sample images: Urban, Rural)
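The first two TODO items can be sketched together: encode each area's labels as a multi-hot vector over a shared vocabulary, then classify an unlabeled area by its nearest labeled neighbors under Hamming distance. The training labels are the ones returned for the known test areas, with City Center treated as urban and Cazenovia as rural per the findings; the query label sets are hypothetical, and k=1 only because there are two reference areas.

```python
from collections import Counter

# Label sets from the known test areas, with the class from the findings.
TRAINING = [
    ({"Outdoors", "Nature", "Landscape", "Scenery", "Urban", "Building",
      "Neighborhood", "Road", "Housing", "City", "Town", "Intersection"}, "urban"),
    ({"Landscape", "Outdoors", "Nature", "Scenery", "Aerial View", "Land",
      "Urban", "Road", "Housing", "Building", "Yard", "Neighborhood"}, "rural"),
]
VOCABULARY = sorted(set().union(*(labels for labels, _ in TRAINING)))


def to_vector(labels):
    """Multi-hot encoding: 1 if the vocabulary label was assigned, else 0."""
    return [1 if term in labels else 0 for term in VOCABULARY]


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def knn_predict(labels, k=1):
    """Majority class among the k nearest training areas."""
    query = to_vector(labels)
    ranked = sorted(TRAINING, key=lambda item: hamming(query, to_vector(item[0])))
    votes = Counter(cls for _, cls in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical label sets for areas without ground truth.
print(knn_predict({"Urban", "Building", "City", "Intersection", "Road"}))  # urban
print(knn_predict({"Aerial View", "Land", "Yard", "Road"}))               # rural
```

Labels outside the training vocabulary are ignored by this encoding; with more reference areas, a larger odd k would smooth out noisy labels.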

POTENTIAL AT NASA: DESCRIPTIVE, PREDICTIVE & PRESCRIPTIVE
• Explore LANDSAT data, and satellite and HELIOS images.
• Use the Modeling, Analysis and Prediction (MAP) Program’s Black Marble maps of night lights to gain insight into human activity.
• Use external datasets, e.g., Census or USAID data, to augment data for business intelligence (BI) apps.
• Explore solar images.
• Auto-tag media.
• AIOps: analyze logs and texts for actionable, prescriptive analytics, e.g., for storage, S3 Intelligent-Tiering.
Examples of data augmentation (figures): regression for website visitor profiles using Census data; automatic clustering of popular searches on medlineplus.gov for May 2015, using R STAT and PostGIS.
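The data-augmentation bullet comes down to joining internal records with an external dataset on a shared key. A minimal sketch, assuming a ZIP-code key and hypothetical values; with real Census or USAID data the join key (county, tract, country code, etc.) and the attributes would differ.

```python
# Hypothetical internal records (e.g., website visitors), keyed by ZIP code.
visitors = [
    {"visitor_id": 1, "zip": "00001"},
    {"visitor_id": 2, "zip": "00002"},
    {"visitor_id": 3, "zip": "00003"},
]
# Hypothetical external attributes (e.g., from a Census extract), same key.
external_by_zip = {
    "00001": {"median_age": 38.0, "households": 1200},
    "00002": {"median_age": 24.0, "households": 950},
}


def augment(records, external, key):
    """Left join: add external attributes to each record with a matching key;
    records without a match are kept unchanged."""
    return [{**record, **external.get(record[key], {})} for record in records]


augmented = augment(visitors, external_by_zip, "zip")
for row in augmented:
    print(row)
```

A left join keeps every internal record, so unmatched rows can be counted and investigated before any regression or clustering runs on the augmented data.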

Earth’s magnetic field deflects cosmic radiation, keeping us safe. NASA tests radiation shield materials to keep astronauts safe on missions away from Earth’s protective bubble.

Detecting Solar Flares and Coronal Mass Ejections (CMEs) (images: sdo.gsfc.nasa.gov)
• Solar Storm (image 1). ML assigns labels: Nature, Flare, Light, Outdoors, Sun, Sky, Night, Astronomy, Universe, Outer Space, Space, Moon, Sunrise, Mountain, Planet.
• Solar Storm (image 2). ML assigns labels: Night, Nature, Space, Outdoors, Universe, Moon, Astronomy, Outer Space, Sun, Sky, Flare, Light, Mountain, Photo, Photography.

DEMO 2 – “14 Commandments”
Transfer Learning: 14 commands to install and extend TensorFlow, and retrain a classifier to classify your own images.
(Screenshot: TensorBoard @ localhost:6006)
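The 14 commands themselves are not reproduced here. As a framework-free sketch of the idea behind them, the code below keeps a "pretrained" feature extractor frozen and retrains only a new final layer (logistic regression) on a handful of labeled images. The extractor, the 4-pixel "images", and the bright/dark classes are hypothetical stand-ins for a real pretrained network and your own image folders; this is not TensorFlow code.

```python
import math


def frozen_features(pixels):
    """Hypothetical stand-in for a pretrained network's frozen layers.
    Returns [mean brightness, contrast]; never updated during retraining."""
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]


def retrain_final_layer(images, labels, epochs=500, lr=0.5):
    """Train only a new logistic-regression head on the frozen features."""
    features = [frozen_features(img) for img in images]
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # derivative of log-loss with respect to z
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def classify(head, pixels):
    w, b = head
    z = sum(wi * xi for wi, xi in zip(w, frozen_features(pixels))) + b
    return "bright" if z > 0 else "dark"


# Tiny hypothetical "images" (4 grayscale pixels in [0, 1]) and labels.
images = [[0.9, 0.8, 0.85, 0.95], [0.7, 0.9, 0.8, 0.75],
          [0.1, 0.2, 0.15, 0.05], [0.3, 0.1, 0.2, 0.25]]
labels = [1, 1, 0, 0]  # 1 = bright, 0 = dark
head = retrain_final_layer(images, labels)
print(classify(head, [0.85, 0.9, 0.8, 0.95]), classify(head, [0.05, 0.1, 0.0, 0.1]))
```

Retraining only the head is why transfer learning needs few images and little compute: the expensive feature learning was done once, on a large dataset.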

Auto-tagging Media
• Tested images.nasa.gov’s REST API.
• Testing io.jsc.nasa.gov’s REST API.
• Auto-tagged images into corresponding JSON files, e.g., image.jpg and image.jpg.label.json. This took 70 sec per 100 images, i.e., about 0.7 sec (under 1 sec) and $0.001 per image.
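A minimal sketch of the sidecar convention described above (image.jpg gets image.jpg.label.json), plus the batch arithmetic implied by the measured 70 seconds per 100 images and $0.001 per image. The label payload is illustrative; in practice it would be the labels returned by the tagging API, and `write_label_sidecar` and `batch_estimate` are hypothetical helper names.

```python
import json
import tempfile
from pathlib import Path


def write_label_sidecar(image_path, labels):
    """Save labels next to the image as <image name>.label.json."""
    image_path = Path(image_path)
    sidecar = image_path.with_name(image_path.name + ".label.json")
    sidecar.write_text(json.dumps({"Labels": labels}, indent=2))
    return sidecar


def batch_estimate(n_images, seconds_per_100=70.0, dollars_per_image=0.001):
    """Scale the measured throughput and per-image cost to a batch."""
    return n_images * seconds_per_100 / 100, n_images * dollars_per_image


with tempfile.TemporaryDirectory() as folder:
    image = Path(folder) / "image.jpg"
    image.write_bytes(b"")  # placeholder file; a real run would use an actual image
    sidecar = write_label_sidecar(image, [{"Name": "Space", "Confidence": 90.0}])
    saved = json.loads(sidecar.read_text())
    print(sidecar.name, saved)

seconds, dollars = batch_estimate(10_000)
print(seconds, dollars)
```

Keeping labels in a sidecar file leaves the original image untouched and lets search tools index the JSON separately.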

Natural Language Processing (NLP) using AWS Comprehend in Log Files
# Detect Entities:
In "[Wed Jul 11 14:32:52 2019] [error] [client 127.0.0.1] client denied by server configuration: /apache/htdocs/test"
Query:
"Entities[?Type=='DATE' && contains(Text, '2019')].Text"
Result:
"Wed Jul 11 14:32:52 2019"
# Detect Key Noun Phrases:
In "127.0.0.1 - - [07/Jul/2019:16:05:49 -0800] "GET /apache/htdocs/test?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846"
Query:
"KeyPhrases[?contains(Text, 'topicparent')].*"
Result:
"Score": 0.5314911603927612,
"Text": "topicparent",
"BeginOffset": 68,
"EndOffset": 79
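To show what the JMESPath query does, here is a plain-Python sketch of the key-phrase filter applied to a DetectKeyPhrases-style response, trimmed to the single entry shown above. In practice the response would come from the AWS SDK or CLI (for example, `aws comprehend detect-key-phrases --language-code en --text ...` combined with the global `--query` option); the helper name `key_phrases_containing` is hypothetical.

```python
# Log line and key-phrase result from the example above.
LOG_LINE = ('127.0.0.1 - - [07/Jul/2019:16:05:49 -0800] "GET '
            '/apache/htdocs/test?topicparent=Main.ConfigurationVariables '
            'HTTP/1.1" 401 12846')
RESPONSE = {"KeyPhrases": [
    {"Score": 0.5314911603927612, "Text": "topicparent",
     "BeginOffset": 68, "EndOffset": 79},
]}


def key_phrases_containing(response, needle):
    """Python equivalent of KeyPhrases[?contains(Text, needle)]."""
    return [p for p in response.get("KeyPhrases", []) if needle in p["Text"]]


for phrase in key_phrases_containing(RESPONSE, "topicparent"):
    # Offsets index into the submitted text; EndOffset is exclusive.
    print(phrase["Text"], LOG_LINE[phrase["BeginOffset"]:phrase["EndOffset"]])
```

Because the offsets point back into the original log line, a match can be highlighted in place or used to extract the surrounding request.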

Natural Language Processing (NLP) using AWS Comprehend in
"NASA’s TESS Mission Scores ‘Hat Trick’ With 3 New Worlds"
# Detect Key Noun Phrases (used for auto-tagging):
"KeyPhrases":
"Score": 0.97,
"Text": "NASA",
"BeginOffset": 0,
"EndOffset": 4
# Detect Syntax and Parts of Speech:
"SyntaxTokens":
"TokenId": 1,
"Text": "NASA’s",
"BeginOffset": 0,
"EndOffset": 6,
"PartOfSpeech":
"Tag": "PROPN",
"Score": 0.59
# Detect Sentiment:
"Sentiment": "NEUTRAL",
"SentimentScore":
"Positive": 0.03,
"Negative": 0.00,
"Neutral": 0.96,
"Mixed": 0.00
# Detect Named Entities (real-world objects with proper names; used for auto-tagging):
"Entities":
"Score": 0.99,
"Type": "ORGANIZATION",
"Text": "NASA",
"BeginOffset": 0,
"EndOffset": 4
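The auto-tagging step can be sketched as merging confident key phrases and named entities into one tag list. The two responses below reuse the values shown above in the DetectKeyPhrases/DetectEntities shape; the 0.9 score cutoff and the helper name `auto_tags` are hypothetical choices.

```python
# DetectKeyPhrases / DetectEntities results for the headline, as shown above.
KEY_PHRASES = {"KeyPhrases": [
    {"Score": 0.97, "Text": "NASA", "BeginOffset": 0, "EndOffset": 4},
]}
ENTITIES = {"Entities": [
    {"Score": 0.99, "Type": "ORGANIZATION", "Text": "NASA",
     "BeginOffset": 0, "EndOffset": 4},
]}


def auto_tags(key_phrases, entities, min_score=0.9):
    """Merge confident key phrases and named entities into a sorted tag list."""
    tags = {p["Text"] for p in key_phrases.get("KeyPhrases", [])
            if p["Score"] >= min_score}
    tags |= {e["Text"] for e in entities.get("Entities", [])
             if e["Score"] >= min_score}
    return sorted(tags)


print(auto_tags(KEY_PHRASES, ENTITIES))  # ['NASA']
```

Using a set deduplicates tags that both operations report, as with "NASA" here.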

DEMO 3 – “You Do You”
AutoML enables teams with limited ML expertise to build and train high-quality models.
(Screenshots: a sample data file, and the AutoML workflow in numbered steps 1 to 4.)

NASA Use Case
Describe the NASA problem you're trying to solve. Why is it important to NASA?
What is the potential benefit to NASA of solving this problem?
(What action will NASA take based on your model’s output?)
Modeling Dataset Characteristics
Will a Data Dictionary be available?
What is the meaning of a record in this dataset?
(What does a row in the dataset correspond to?)
What is the target variable?
What is the ideal model evaluation metric you are considering (e.g., AUC, Gini)?
Machine Learning Solution
How have you approached solving this problem in the past?
Are there existing NASA models that are currently in use?
If so, what is the current model performance?
What Success Criteria will you use to evaluate results?
Model Deployment
Once the models are developed, how will they be used?
What will be the volume of prediction? Will real-time predictions be required?
How will these predictions be stored?
NEXT STEPS AT NASA
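Because the questionnaire names AUC and Gini, here is a minimal sketch of how the two relate. AUC is the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count half), and the Gini coefficient commonly used in model evaluation is 2 * AUC - 1. The scores and labels are hypothetical, and the pairwise loop is O(n^2), meant for illustration rather than large datasets.

```python
def auc(scores, labels):
    """ROC AUC via pairwise ranking of positives against negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(positives) * len(negatives))


def gini(scores, labels):
    """Gini coefficient for a classifier: 2 * AUC - 1."""
    return 2 * auc(scores, labels) - 1


# Hypothetical model scores and true labels.
scores = [0.9, 0.4, 0.6, 0.3]
labels = [1, 1, 0, 0]
print(auc(scores, labels), gini(scores, labels))  # 0.75 0.5
```

A Gini of 0 corresponds to random ranking (AUC 0.5) and 1 to perfect ranking, which makes it a convenient success criterion to state up front.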

ML Life Cycle – Basic Steps
• Define Project Objectives:
   • Specify the problem, acquire subject matter expertise (SME), define the target and unit of analysis, prioritize modeling criteria, define success criteria and risks.
• Acquire/Explore Data:
   • Find, format and explore data; remove target leakage; perform feature engineering.
• Model Data:
   • Select features; build and validate models.
• Interpret/Communicate:
   • Assess model quality, determine important features, identify relationships, explain predictions.
• Implement/Document:
   • Select a model, document the process, monitor and maintain the model.
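The "build/validate models" and "select model" steps can be sketched in a few lines: fit each candidate on training rows, score it on held-out rows it never saw, and keep the best. The two candidate models, the one-feature dataset, and the every-third-row holdout are hypothetical simplifications; real projects would typically use cross-validation and a library such as scikit-learn.

```python
def majority_model(train):
    """Baseline: always predict the most common label in the training rows."""
    ones = sum(y for _, y in train)
    guess = 1 if 2 * ones >= len(train) else 0
    return lambda x: guess


def threshold_model(train):
    """Predict 1 when x exceeds the midpoint between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > cut else 0


def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)


def select_model(rows, candidates, every=3):
    """Build each candidate on training rows; keep the best on held-out rows.
    Every `every`-th row is held out, so validation rows are never trained on."""
    valid = [r for i, r in enumerate(rows) if i % every == 0]
    train = [r for i, r in enumerate(rows) if i % every != 0]
    scores = {name: accuracy(build(train), valid) for name, build in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical one-feature dataset: label is 1 when x >= 10.
rows = [(x, int(x >= 10)) for x in range(20)]
best, scores = select_model(rows, {"majority": majority_model,
                                   "threshold": threshold_model})
print(best, scores)
```

Comparing against a trivial baseline guards against a model that only looks good because one class dominates.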

RORSCHACH TEST
“MIT researchers: Amazon’s Rekognition shows gender and ethnic bias”

RECAP
• ML Frameworks and Tools: AWS Rekognition, AWS Comprehend, AWS SageMaker, AWS QuickSight, AWS Deep Learning AMI, Google Cloud Platform, Google Vision, Microsoft Computer Vision, Microsoft Power BI, Salesforce Tableau, TensorFlow, PyTorch, Jupyter Notebook, R STAT, Anaconda.
• Model as a Service: AutoML, Models on AWS Marketplace.
• Questions?
Shout Out! AAAI 2019 FALL SYMPOSIUM SERIES

REFERENCE: COLLEGE PROJECT *
Growth Study for Charlottesville VA, 2000-2030
Used satellite images and Census data to compute the distribution of population growth:
• Divided the county study area into 5,745 grid cells (250 meters x 250 meters).
• A traditional computational model assigned growth weights based on development indicators at the neighborhood level.
* https://www.slideshare.net/gisblog/gis-growth-study-for-charlottesville-va-20002030-plan-885-vamlis-2001-38716260
Sample Development Indicator
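For scale, the grid arithmetic and one common way to distribute a projected total across cells by weight are sketched below. The proportional, largest-remainder allocation and the three cell weights are assumptions for illustration; the project's actual weighting scheme is not reproduced here.

```python
CELL_SIDE_M = 250
N_CELLS = 5_745

# Area covered by the grid: 5,745 cells x 0.0625 km^2 per cell.
area_km2 = N_CELLS * (CELL_SIDE_M / 1000) ** 2
print(f"{area_km2:.1f} km^2")  # 359.1 km^2


def allocate_growth(total, weights):
    """Split an integer total across cells in proportion to their weights,
    using largest-remainder rounding so the parts sum exactly to the total."""
    weight_sum = sum(weights.values())
    exact = {cell: total * w / weight_sum for cell, w in weights.items()}
    parts = {cell: int(v) for cell, v in exact.items()}
    leftover = total - sum(parts.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - parts[c], reverse=True)
    for cell in by_remainder[:leftover]:
        parts[cell] += 1
    return parts


# Hypothetical development-indicator weights for three cells.
allocation = allocate_growth(1000, {"cell_a": 3.0, "cell_b": 2.0, "cell_c": 1.0})
print(allocation)  # {'cell_a': 500, 'cell_b': 333, 'cell_c': 167}
```

Largest-remainder rounding avoids losing or inventing people when fractional shares are rounded per cell.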
