Introducing the Feature Store in Hopsworks
Bay Area AI Meetup @ Mesosphere
March 5th, 2019
jim_dowling
CEO @ Logical Clocks
Assoc Prof @ KTH
Today’s Agenda
1. What is a Feature Store and why do you need one?
2. The Hopsworks’ Feature Store
3. Demo
Become a Data Scientist!
Eureka! This will give a 12% increase in the efficiency of this wind farm!
Data Scientists are not Data Engineers
HDFS, GCS, Storage, CosmosDB
How do I find features in this sea of data sources?
This tastes like dairy in my Latte!
What is a Feature?
A measurable property of a phenomenon under observation
•A raw word, a pixel, or a sensor value
•A column in a datastore
•An aggregate (mean, max, sum, min)
•A derived representation (embedding or cluster)
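As a minimal sketch of these feature kinds, the Python below takes a handful of sensor readings and derives one raw feature plus the four aggregates listed above. The readings and the names `readings` and `aggregate_features` are made up for illustration and do not come from the talk.

```python
from statistics import mean

# Hypothetical raw sensor values (e.g. wind speed in m/s); illustrative only.
readings = [4.0, 6.5, 5.5, 8.0]


def aggregate_features(values):
    """Return the aggregate features named on the slide for one window of values."""
    return {
        "mean": mean(values),
        "max": max(values),
        "sum": sum(values),
        "min": min(values),
    }


raw_feature = readings[-1]              # a raw sensor value used directly as a feature
features = aggregate_features(readings)
print(raw_feature, features)
```

A derived representation such as an embedding or a cluster id would be produced by a model rather than by a simple reduction, so it is left out of this sketch.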
©2018 Logical Clocks AB. All Rights Reserved
A More Complex Feature Pipeline
Data Science with the Feature Store
HDFS, GCS, Storage, CosmosDB
Feature Warehouse Store
Feature Pipelines (Select, Transform, Aggregate, ..)
Now, I can change the world - one click-through at a time.
Features need to be first-class entities
•Features should be discoverable and reused.
•Features should be access controlled, versioned, and governed.
- Enable reproducibility.
•Ability to pre-compute and automatically backfill features.
- Aggregates, embeddings - avoid expensive re-computation.
- On-demand computation of features should also be possible.
•The Feature Store should help “solve the data problem, so that Data Scientists don’t have to.” [uber]
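One way to picture versioned, pre-computed features with automatic backfill is sketched below. This is an assumption-laden toy, not the Hopsworks implementation: the `FeatureDefinition` class, its `backfill` method, and the click data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FeatureDefinition:
    """A versioned feature whose values are pre-computed per partition (hypothetical model)."""
    name: str
    version: int
    compute: Callable[[List[float]], float]
    values: Dict[str, float] = field(default_factory=dict)  # partition key -> value

    def backfill(self, raw_partitions: Dict[str, List[float]]) -> List[str]:
        """Compute the feature for every partition that has no stored value yet."""
        filled = []
        for key in sorted(raw_partitions):
            if key not in self.values:      # skip partitions already computed
                self.values[key] = self.compute(raw_partitions[key])
                filled.append(key)
        return filled


# Illustrative raw data: click counts keyed by date.
raw = {"2019-03-01": [3, 5], "2019-03-02": [4, 4, 4], "2019-03-03": [10]}

avg_clicks = FeatureDefinition("avg_clicks", version=1,
                               compute=lambda xs: sum(xs) / len(xs))
avg_clicks.values["2019-03-01"] = 4.0       # computed by an earlier run
newly_filled = avg_clicks.backfill(raw)
print(newly_filled, avg_clicks.values)
```

Only the two missing partitions are computed, which is the point of storing aggregates instead of recomputing them for every training run.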
Hopsworks’ Feature Store
- Reusability of features between models and teams
- Automatic backfilling of features
- Automatic feature documentation and analysis
- Feature versioning
- Standardized access of features between training and serving
- Feature discovery
- Access control for Feature Stores
There are other advantages to the Feature Store …
Prevent Duplicated Feature Engineering
[Diagram: the Marketing, Research, and Analytics teams each build their own copy of the same "Bert Features" (DUPLICATED)]
Prevent Inconsistent Features – Training/Serving
Feature implementations may not be consistent between training and serving – correctness problems!
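A small sketch of the remedy: define each feature once and call the same definition from both the training path and the serving path. The function names and the sample data are hypothetical; only the feature name `average_player_age` is borrowed from the code example later in the talk.

```python
def average_player_age(ages):
    """Single shared definition of the feature, used by training and serving alike."""
    return sum(ages) / len(ages) if ages else 0.0


def build_training_row(team):
    # Training path: batch job over historical records.
    return {"average_player_age": average_player_age(team["ages"]),
            "label": team["won"]}


def build_serving_vector(request):
    # Serving path: online request; reuses the same function, so the values agree.
    return {"average_player_age": average_player_age(request["ages"])}


team = {"ages": [22, 28, 31], "won": 1}
train_row = build_training_row(team)
serve_vec = build_serving_vector({"ages": [22, 28, 31]})
print(train_row["average_player_age"] == serve_vec["average_player_age"])
```

If the two paths each had their own implementation (say, one skipping missing ages and one treating them as zero), the model would see different values at serving time than it was trained on.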
Known Feature Stores in Production
•Logical Clocks – Hopsworks (open source)
•Uber Michelangelo
•Airbnb – Bighead/Zipline
•Comcast
•Twitter
•GO-JEK Feast (open source on GCE)
The API Between Data Science and Data Engineering
Data Engineer
Data Scientist
A Feature Store for Hopsworks
Short History of the Hops Project
Timeline, 2017 to 2019:
•Published the world’s fastest HDFS (HopsFS) at USENIX FAST, with Spotify
•Winner of the IEEE Scale Challenge 2017 for HopsFS – 1.2m ops/sec
•World’s first distributed filesystem to store small files in metadata on NVMe disks
•World’s first Hadoop platform to support GPUs-as-a-Resource
•World’s first open-source Feature Store for Machine Learning
“If you’re working with big data and Hadoop, this one paper could repay your investment in the Morning Paper many times over.... HopsFS is a huge win.”
Adrian Colyer, The Morning Paper
Hopsworks – Batch, Streaming, Deep Learning
[Architecture diagram. Components: Data Sources, HopsFS, Kafka, Airflow, Spark / Flink, Spark, Feature Store, Hive, Deep Learning, BI Tools & Reporting, Notebooks, Serving w/ Kubernetes, Elastic. Deployment: On-Premise, AWS, Azure, GCE. Legend: External Service, Hopsworks Service.]
[The same architecture diagram, annotated with three workload areas: BATCH ANALYTICS, STREAMING, ML & DEEP LEARNING]
Hopsworks: Multi-Tenancy with Projects
[Diagram: projects Proj-42, Proj-X, and Proj-All, with shared assets: Shared Topic, FeatureStore, /Projs/My/Data, CompanyDB, Models]
TLS Certificates
• User certs
• Application certs
• Service certs
Ismail et al., “Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata”, ICDCS 2017
Distributed Deep Learning in Hopsworks
[Diagram: a Driver and Executors 1 to N, each with its own conda_env, on top of HopsFS (HDFS), which holds TensorBoard, Models, Experiments, Training Data, and Logs]
Replicated Conda Environments
•Every project can create its own conda environment, replicated at all hosts in the cluster
-Base environments for Python2 and Python3 are mostly adequate
•Hopsworks ensures consistent conda command log replication to all hosts in the cluster using a local agent
[Diagram: Hopsworks sends conda commands to a Kagent on each host (Host A, Host B); each Kagent manages that host's envs]
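The slide does not spell out how the command log is replicated, so the following is only one plausible reading: an ordered log that each host agent replays from the last entry it applied. `HostAgent` is a hypothetical stand-in that records package names instead of running conda, and the log format is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class HostAgent:
    """Stand-in for a per-host agent; it records packages instead of running conda."""
    host: str
    applied: int = 0                        # number of log entries already applied
    packages: Set[str] = field(default_factory=set)

    def sync(self, log: List[Tuple[str, str]]) -> None:
        # Apply only the entries this host has not seen, in log order.
        for op, package in log[self.applied:]:
            if op == "install":
                self.packages.add(package)
            elif op == "uninstall":
                self.packages.discard(package)
        self.applied = len(log)


command_log = [("install", "numpy"), ("install", "tensorflow")]
host_a, host_b = HostAgent("Host A"), HostAgent("Host B")
host_a.sync(command_log)                    # Host A is up to date

command_log.append(("uninstall", "numpy"))
host_a.sync(command_log)                    # applies just the new entry
host_b.sync(command_log)                    # replays the whole log from the start
print(host_a.packages == host_b.packages)
```

Because every host applies the same entries in the same order, a host that was offline converges to the same environment once it catches up.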
ML Infrastructure in Hopsworks
[Diagram adapted from “technical debt of machine learning”. Labels: MODEL TRAINING, Feature Store, HopsML API & Airflow]
Feature Store Concepts
•The Feature Store API: For writing/reading to/from the feature store
•The Feature Registry: A user interface to share and discover features
•The Metadata Layer: For storing feature metadata (versioning, feature analysis, documentation, jobs)
•The Feature Engineering Jobs: For computing features
•The Storage Layer: For storing feature data in the feature store
Building Blocks of a Feature Store
[Diagram: Feature Storage, Feature Metadata, Jobs, Feature Registry, API]
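To make the five building blocks concrete, here is a toy in-memory sketch. It is not the Hopsworks API: `MiniFeatureStore`, its methods, and `compute_squares` are hypothetical names, and real storage would be files or tables instead of Python dicts.

```python
from collections import defaultdict


class MiniFeatureStore:
    """Toy in-memory stand-in for the building blocks; not the Hopsworks API."""

    def __init__(self):
        self._storage = {}                  # storage layer: (group, version) -> rows
        self._metadata = {}                 # metadata layer: (group, version) -> info
        self._latest = defaultdict(int)     # latest version per feature group

    def insert(self, group, rows, description=""):
        """API (write): store rows as a new version and record metadata."""
        self._latest[group] += 1
        key = (group, self._latest[group])
        self._storage[key] = list(rows)
        self._metadata[key] = {"description": description,
                               "features": sorted(rows[0]) if rows else []}
        return key[1]

    def get_features(self, group, names, version=None):
        """API (read): project the requested features from one version."""
        key = (group, version or self._latest[group])
        return [{n: row[n] for n in names} for row in self._storage[key]]

    def search(self, term):
        """Registry: discover feature groups by feature name or description."""
        return sorted({g for (g, _), m in self._metadata.items()
                       if term in m["description"]
                       or any(term in f for f in m["features"])})


def compute_squares(values):
    """Feature engineering job: derive features from raw values."""
    return [{"x": v, "x_squared": v ** 2} for v in values]


store = MiniFeatureStore()
version = store.insert("polynomial_featuregroup", compute_squares([1, 2, 3]),
                       description="squared inputs")
print(version, store.get_features("polynomial_featuregroup", ["x_squared"]))
print(store.search("squared"))
```

Keeping metadata separate from storage is what lets the registry answer discovery queries without scanning the feature data itself.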
Writing to the Feature Store (Data Engineer)
from hops import featurestore
from pyspark.sql import functions as F
# `spark` and `filename` are assumed to be provided by the notebook/job environment
raw_data = spark.read.parquet(filename)
# Square every column (assumes all columns are numeric)
polynomial_features = raw_data.select([(F.col(c) ** 2).alias(c) for c in raw_data.columns])
featurestore.insert_into_featuregroup(polynomial_features, "polynomial_featuregroup")
Reading from the Feature Store (Data Scientist)
from hops import featurestore
df = featurestore.get_features(["average_attendance", "average_player_age"])
featurestore.create_training_dataset(df, "players_td")
Training dataset formats: tfrecords, numpy, petastorm, hdf5, csv
Scala API also available
Feature Storage
[Diagram labels: Parquet, HDFS, GCS, Storage]
Feature Metadata
[Diagram labels: Parquet, HDFS, GCS, Storage]
HopsML Feature Store Pipelines
[Pipeline diagram. Stages: Ingest, Pre-Process, Store, Experiment/Train, Deploy, orchestrated with Airflow. Components: Raw Data, Event Data, HopsFS, Feature Store, Serving, Monitor, logs.]
Model Serving and Monitoring
[Diagram. Flow: Inference Request → Hopsworks (1. Access Control) → Model Server (Model Serving Images on Kubernetes) → Response; 2. Log Prediction/Result to the Data Lake, which the Monitor reads.]
Link Predictions with Outcomes to measure Model Performance
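The two numbered steps and the final join can be sketched in a few lines. Everything here is hypothetical: the in-memory `prediction_log` stands in for a data lake table, the request-id scheme is an assumption, and `model` is a made-up threshold rule.

```python
import uuid

prediction_log = {}   # request id -> logged prediction (stand-in for a data lake table)


def serve(model, features, user, allowed_users):
    """Check access, run inference, and log the prediction under a request id."""
    if user not in allowed_users:                       # 1. access control
        raise PermissionError(f"{user} may not call this model")
    prediction = model(features)
    request_id = str(uuid.uuid4())
    prediction_log[request_id] = {"features": features,  # 2. log prediction/result
                                  "prediction": prediction}
    return request_id, prediction


def accuracy(outcomes):
    """Join logged predictions with observed outcomes on request id."""
    matched = [(prediction_log[rid]["prediction"], actual)
               for rid, actual in outcomes.items() if rid in prediction_log]
    return sum(p == a for p, a in matched) / len(matched) if matched else None


def model(f):
    # Hypothetical model: predicts a click when the score is above 0.5.
    return int(f["score"] > 0.5)


ids = [serve(model, {"score": s}, "alice", {"alice"})[0] for s in (0.9, 0.2, 0.7)]
outcomes = dict(zip(ids, [1, 0, 0]))                    # outcomes observed later
print(accuracy(outcomes))
```

Logging a stable request id with each prediction is what makes the later join with outcomes possible.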
Feature Store Demo
Summary and Roadmap
•Hopsworks is a new Data Platform with first-class support for Python / Deep Learning / ML / Data Governance / GPUs
-Hopsworks has an open-source Feature Store
•Ongoing Work
-Online Feature Store
-Feature Transformation Library/DSL
-Automated Data Provenance
-Feature Store Incremental Updates with Hudi on Hive
Upcoming Hopsworks Events in the Bay Area:
-April 1st at Stanford, SysML
-April 23rd – Hopsworks Hands-on in Palo Alto
-April 25th in Moscone Center, Databricks Spark/AI Summit
Read More:
http://www.logicalclocks.com/feature-store by Kim Hammar
@logicalclocks
www.logicalclocks.com
Try it Out!
1. Register for an account at: www.hops.site
2. Enter your first name/last name here: https://bit.ly/2UEixTr
