Mark Hamilton, Microsoft, marhamil@microsoft.com
Anand Raman, Microsoft, aram@microsoft.com
The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Services
#UnifiedAnalytics #SparkAISummit
Overview
• The Cognitive Services on Spark
– Basic Usage
– Fluent Design
• HTTP on Spark
– Architecture and Principles
• Clusters with Embedded Services
– Kubernetes, Databricks
• Examples
– GANs + the Metropolitan Museum of Art
Motivation
• Azure Cognitive Services provide high-quality pre-built intelligent services
• No need for time-intensive model training or deployment
• Can quickly create intelligent applications
• Leverage Microsoft Research and Azure ML
• http://www.seeingai.com
Vision
• Object, scene, and activity detection
• Face recognition and identification
• Celebrity and landmark recognition
• Emotion recognition
• Text and handwriting recognition (OCR)
• Customizable image recognition
• Video metadata, audio, and keyframe extraction and analysis
• Explicit or offensive content moderation
Speech
• Speech transcription (speech-to-text)
• Custom speech models for unique vocabularies or complex environment
• Text-to-speech
• Custom Voice
• Real-time speech translation
• Customizable speech transcription and translation
• Speaker identification and verification
Language
• Language detection
• Named entity recognition
• Key phrase extraction
• Text sentiment analysis
• Multilingual and contextual spell checking
• Explicit or offensive text content moderation
• PII detection for text moderation
• Text translation
• Customizable text translation
• Contextual language understanding
Knowledge
• Q&A extraction from unstructured text
• Knowledge base creation from collections of Q&As
• Semantic matching for knowledge bases
• Customizable content personalization learning
Search
• Ad-free web, news, image, and video search results
• Trends for video, news
• Image identification, classification and knowledge extraction
• Identification of similar images and products
• Named entity recognition and classification
• Knowledge acquisition for named entities
• Search query autosuggest
• Ad-free custom search engine creation
Azure Cognitive Services on Spark
• Easy to use integration between Spark and the Azure Cognitive Services
• Composable and pipelinable with all other SparkML models!
• Python, Scala, R (Beta)
val df = new TextSentiment()
  .setTextCol("text")
  .setOutputCol("sentiment")
  .transform(inputs)
http://www.seeingai.com
Fluent API for Advanced Orchestration
• Any parameter can be set with a dataframe column or with a single value

Get results for multiple search terms:
new BingImageSearch()
  .setQueryCol("queries")

queries
Cat
Dog
Antelope
Car
Bob Ross

Get the first N pages of Bing results for a specific term:
new BingImageSearch()
  .setQuery("cats")
  .setOffsetCol("offsets")

offsets
0
100
200
300
400

Get the first 200 results for many terms using several different accounts:
new BingImageSearch()
  .setQueryCol("queries")
  .setOffsetCol("offsets")
  .setKeyCol("keys")

offsets  queries  keys
0        Cat      17…
100      Cat      17…
0        Tree     3e…
100      Tree     4q…
0        Car      G1…
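A table with one row per (query, offset) pair, like the one used with setQueryCol, setOffsetCol and setKeyCol, can be generated programmatically. The following is a minimal Python sketch that does not depend on Spark. The helper name `build_request_rows`, the round-robin key assignment, and the placeholder keys are assumptions made for illustration and are not part of MMLSpark.

```python
from itertools import product


def build_request_rows(queries, pages, page_size, keys):
    """Return one row per (query, page) pair, assigning keys round-robin.

    Hypothetical helper: the column names mirror the slide's table
    (offsets, queries, keys); nothing here calls Bing or Spark.
    """
    rows = []
    for i, (query, page) in enumerate(product(queries, range(pages))):
        rows.append({
            "offsets": page * page_size,   # 0, 100, 200, ...
            "queries": query,
            "keys": keys[i % len(keys)],   # spread load over accounts
        })
    return rows


# Placeholder keys; real subscription keys would go here.
rows = build_request_rows(["Cat", "Tree"], pages=2, page_size=100,
                          keys=["key-a", "key-b", "key-c"])
for row in rows:
    print(row)
```

Each resulting row carries every parameter one request needs, which is what lets a single transformer serve many terms, pages, and accounts at once.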
High Performance Capabilities OOTB
• Asynchronous Parallelism (P)
• Automatic Batching (B)
• Automatic Retries
– Exponential Back-offs (EBO)
– Backpressure (BP)

Features      Time (s)   Errors (#)
None          30.8       18993
EBO+BP        1163.0     0
EBO+BP+B      57.1       0
EBO+BP+B+P    49.7       0
Setup: 10 nodes, 20k requests, service limited to 1k req/min
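The slides do not show how the retries are implemented. As an illustration of the exponential back-off idea only, here is a minimal Python sketch. It assumes the service signals throttling with HTTP status 429 (Too Many Requests); the function name `call_with_backoff` and its parameters are hypothetical, and the sleep function is injectable so the example runs instantly.

```python
import time


def call_with_backoff(send, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or retries run out.

    send is any zero-argument callable returning (status_code, body).
    The delay doubles after every throttled attempt: base, 2*base, 4*base, ...
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(base_delay * (2 ** attempt))
    return status, body  # still throttled after the last retry


# Simulated service: throttles twice, then succeeds.
responses = iter([(429, ""), (429, ""), (200, "ok")])
delays = []
result = call_with_backoff(lambda: next(responses),
                           base_delay=0.1, sleep=delays.append)
print(result, delays)
```

Backing off removes the errors at the cost of waiting, which fits the table: the EBO+BP row has zero errors but the longest run time, and batching plus parallelism then win the time back.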
• Full Integration between HTTP Protocol and Spark SQL
• Spark as a Microservice Orchestrator
• Spark + X
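# Parentheses let the chained calls span several lines in Python.
# `schema` is the Spark SQL type of the parsed response; the URL is elided on the slide.
df = (SimpleHTTPTransformer()
      .setInputParser(JSONInputParser())
      .setOutputParser(JSONOutputParser().setDataType(schema))
      .setOutputCol("results")
      .setUrl(…))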
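Conceptually, a transformer like this turns each row into an HTTP request and parses each response back into a column. The plain-Python sketch below illustrates that flow under stated assumptions and is not the library's implementation. The function `http_transform`, the `fake_service` stand-in, and the localhost URL are all hypothetical, and the transport is injected so the example runs without a network.

```python
import json


def http_transform(rows, url, send, output_col="results"):
    """Send each row as a JSON POST body and store the parsed JSON reply.

    send(request) -> response body string; it is injected so that the
    sketch runs offline. A real HTTP client would take its place.
    """
    out = []
    for row in rows:
        # "Input parser": row -> request
        request = {"method": "POST", "url": url, "body": json.dumps(row)}
        # "Output parser": response body -> structured value
        reply = json.loads(send(request))
        out.append({**row, output_col: reply})
    return out


def fake_service(request):
    """Stand-in web service: reports the length of the submitted text."""
    text = json.loads(request["body"])["text"]
    return json.dumps({"length": len(text)})


table = http_transform([{"text": "hello"}, {"text": "spark"}],
                       "http://localhost:8080/hypothetical", fake_service)
print(table)
```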
Diagram (HTTP on Spark architecture): each Spark Worker holds several Partitions, and each Partition has its own Client. Clients exchange HTTP Requests and Responses with a Web Service or with Local Services.
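The one-client-per-partition layout can be imitated in a few lines of Python. This is a sketch under assumptions, not MMLSpark code: `process_partition` and `process_dataset` are hypothetical names, a thread pool stands in for the asynchronous client, and partitions are processed one after another here, whereas Spark would run them in parallel across workers.

```python
from concurrent.futures import ThreadPoolExecutor


def process_partition(rows, send, concurrency=4):
    """Issue one request per row with up to `concurrency` in flight.

    Executor.map yields results in input order, so each response
    stays aligned with the row that produced it.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(send, rows))


def process_dataset(partitions, send, concurrency=4):
    """Give every partition its own client pool."""
    return [process_partition(p, send, concurrency) for p in partitions]


# str.upper stands in for a call to a web service.
partitions = [["a", "b"], ["c"], ["d", "e", "f"]]
responses = process_dataset(partitions, send=str.upper)
print(responses)
```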
Cognitive Service Containers
Now in Public Preview
• No app changes; compatible with the full Cognitive Services feature set
• Support for 6 key AI capabilities:
– Key Phrase Extraction
– Language Detection
– Sentiment Analysis
– Face & Emotion Detection
– OCR / Text Recognition
– Language Understanding
• Run & manage locally; try for free
• Connect to the billing service for report-back; unified billing across on-cloud and off-cloud transactions
• Additional capabilities coming soon (e.g. Speech)
Clusters with Embedded Services
• Deploy cognitive services directly onto cluster worker nodes
• Bring the compute to the data
• Use low latency in-machine networking
Diagram (Spark Worker): PySpark and the Spark Scala Process communicate over the PySpark protocol; the Spark Scala Process and the Local Cognitive Service communicate over HTTP.
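From the caller's side, switching between a cloud service and a service embedded on the worker can come down to choosing a different URL. The Python sketch below shows that idea. The helper `sentiment_url`, the default region, the port, and both URL shapes are assumptions made for illustration; check the service and container documentation for the real endpoints.

```python
def sentiment_url(use_local, region="eastus", port=5000):
    """Pick the endpoint for a text sentiment request.

    Both URL shapes are assumed for illustration: a regional cloud
    endpoint, or a container listening on the worker's own localhost.
    """
    path = "/text/analytics/v2.0/sentiment"
    if use_local:
        return f"http://localhost:{port}{path}"
    return f"https://{region}.api.cognitive.microsoft.com{path}"


local = sentiment_url(use_local=True)
cloud = sentiment_url(use_local=False)
print(local)
print(cloud)
```

With the local URL, requests never leave the machine, which is the low-latency path described on this slide.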
Azure Kubernetes Service + Helm
• Works on any k8s cluster
• Helm: Package Manager for Kubernetes
Diagram (Kubernetes: AKS, ACS, GKE, On-Prem, etc.):
• K8s workers, each running a Spark Worker alongside a Cognitive Service Container, connected through HTTP on Spark
• Spark Serving Load Balancer: REST requests from Users / Apps to deployed models (Spark Serving hot path)
• Jupyter, Zeppelin, Livy, or Spark Submit behind a load balancer (LB): submit jobs, run notebooks, manage the cluster, etc.
• Storage or other databases, reached through Spark Readers
• Cloud Cognitive Services, reached through HTTP on Spark
helm repo add mmlspark https://dbanda.github.io/charts
helm install mmlspark/spark --set localTextApi=true
Dalitso Banda, dbanda@microsoft.com
Microsoft AI Development Acceleration Program
Creating a Visual Search Engine for the Metropolitan Museum of Art
https://gen.studio
Intelligent Image Annotation
• The MET released 400k images under Open Access
• Pipe images through the Computer Vision API to annotate each image for searching
Figure: for each query image, the Describe Image output and its deep-feature nearest neighbors. Example Describe Image outputs: "A picture containing a person", "A picture containing a glass, cup", "A fish swimming underwater".
Reverse Image Search Architecture
Pipeline: Query Image → ResNet Featurizer (MMLSpark) → Deep Features → Fast Nearest Neighbor Lookup (SparkML LSH or Annoy) → Closest Match
Figure credit: filters from Zeiler + Fergus 2013
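The lookup stage can be illustrated with an exact, brute-force search. Note the swap: the slide names approximate methods (LSH or Annoy), while this Python sketch ranks every stored vector by cosine similarity, which is simpler but does not scale to large collections. The function names and the tiny three-dimensional "features" are hypothetical; real deep features would have far more dimensions, and vectors are assumed to be non-zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest_neighbors(query, features, k=1):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        features,
        key=lambda item_id: cosine_similarity(query, features[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Made-up feature vectors keyed by artwork id.
features = {
    "vase": [0.9, 0.1, 0.0],
    "portrait": [0.0, 1.0, 0.2],
    "goblet": [0.8, 0.2, 0.1],
}
matches = nearest_neighbors([1.0, 0.0, 0.0], features, k=2)
print(matches)
```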
Example Nearest Neighbors
Figure: query images alongside their nearest neighbors.
Spark x Azure Search
• Azure Search Sink for Spark
• Allows for pushing thousands of documents per second into Azure Search instances
• Built on HTTP on Spark
• Use to create search APIs on top of Spark Dataframe
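Pushing documents at that rate generally means grouping them into bulk requests. The Python sketch below shows one way to build such request bodies. The helper `to_index_batches` is hypothetical, and the payload shape (a "value" list whose entries carry an "@search.action" field) is an assumption based on the Azure Search documents-index REST API, so verify it against the current documentation before relying on it.

```python
import json


def to_index_batches(documents, batch_size):
    """Group documents into request bodies for a bulk upload.

    Each body has the assumed {"value": [...]} shape, with an
    "@search.action" field added to every document.
    """
    batches = []
    for start in range(0, len(documents), batch_size):
        chunk = documents[start:start + batch_size]
        batches.append({
            "value": [{"@search.action": "upload", **doc} for doc in chunk]
        })
    return batches


# Made-up documents standing in for DataFrame rows.
docs = [{"id": str(i), "caption": f"artwork {i}"} for i in range(5)]
batches = to_index_batches(docs, batch_size=2)
payloads = [json.dumps(batch) for batch in batches]
print(len(payloads))
```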
Microsoft Machine Learning for Apache Spark v0.16
Microsoft’s Open Source Contributions to Apache Spark
www.aka.ms/spark
Azure/mmlspark
• Cognitive Services
• Spark Serving
• Model Interpretability
• LightGBM Gradient Boosting
• Deep Networks with CNTK
• HTTP on Spark
Conclusions
• Can now embed Cognitive Services into Spark Workflows
• Can harness Spark Cluster for Microservices
• Get started now with interactive examples!
Help us advance Spark:
www.aka.ms/spark
Azure/mmlspark
Contact:
marhamil@microsoft.com
mmlspark-support@microsoft.com
Thanks To
• Sudarshan Raghunathan
• Ilya Matiach
• Microsoft NERD Garage Team + MIT Externship Program
• Microsoft Development Acceleration Team:
– Dalitso Banda, Casey Hong, Karthik Rajendran, Manon Knoertzer, Tayo Amuneke, Alejandro Buendia
• Pablo Castro, Chris Hoder, Ryan Gaspar, Henrik Neilsen, Joseph Sirosh, Andrew Schonhoffer, Daniel Ciborowski, Markus Cosowicz
• Azure CAT, AzureML, and Azure Search Teams
Backup Slides
Diagram (GAN training): Noise Vector → Generator → Generated Image; the Discriminator receives Training Data and Generated Images and predicts "Real or Generated?"
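Diagram (GAN inversion): Learned Noise Vector → Generator → Generated Image, compared with the Target Image using a Pretrained ResNet 50
Loss: $\mathrm{Loss}_{\mathrm{pixel}} + \mathrm{Loss}_{\mathrm{semantic}} \times \lambda$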
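The slide gives only the weighted sum of a pixel loss and a semantic loss. As a hedged illustration, the Python sketch below fills in the two terms with mean squared error, which is an assumption: the pixel term compares raw images, and the semantic term compares feature vectors that would come from the pretrained ResNet 50. All function and parameter names are hypothetical.

```python
def mean_squared_error(a, b):
    """Average squared difference between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def inversion_loss(generated_pixels, target_pixels,
                   generated_features, target_features, lam):
    """Loss_pixel + Loss_semantic * lambda, with MSE assumed for both terms."""
    loss_pixel = mean_squared_error(generated_pixels, target_pixels)
    loss_semantic = mean_squared_error(generated_features, target_features)
    return loss_pixel + loss_semantic * lam


# Tiny made-up vectors: pixel MSE = 0.5, semantic MSE = 2.0.
loss = inversion_loss([0.0, 1.0], [0.0, 0.0],
                      [2.0, 2.0], [0.0, 2.0], lam=0.5)
print(loss)
```

Minimizing this quantity with respect to the noise vector is what makes the vector "learned": the generator stays fixed while its input is adjusted to reproduce the target image.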
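Diagram (Code Space Interpolation): two images are inverted with $G^{-1}$ into Inverted Noise Vector 1 and Inverted Noise Vector 2; $G$ is then applied to six points along the path between them in code space.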
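The slide does not say how the intermediate codes are chosen. A minimal Python sketch, assuming straight-line (linear) interpolation between the two inverted noise vectors, is shown below; the function name `interpolate` is hypothetical, and other schemes such as spherical interpolation are equally plausible.

```python
def interpolate(z1, z2, steps):
    """Return `steps` evenly spaced vectors from z1 to z2 inclusive.

    Requires steps >= 2. Each vector would be passed through the
    generator G to render one frame of the transition.
    """
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at z1, 1.0 at z2
        path.append([a + (b - a) * t for a, b in zip(z1, z2)])
    return path


codes = interpolate([0.0, 0.0], [1.0, 2.0], steps=3)
print(codes)
```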
