WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
Kim Hammar, Logical Clocks AB KimHammar1
Jim Dowling, Logical Clocks AB jim_dowling
End-to-End ML Pipelines with Databricks Delta and Hopsworks Feature Store
#UnifiedDataAnalytics #SparkAISummit
Machine Learning in the Abstract
Where does the Data come from?
“Data is the hardest part of ML and the most important piece to get
right. Modelers spend most of their time selecting and transforming
features at training time and then building the pipelines to deliver
those features to production models.” [Uber on Michelangelo]
Data comes from the Feature Store
How do we feed the Feature Store?
Outline
1. Hopsworks
2. Databricks Delta
3. Hopsworks Feature Store
4. Demo
5. Summary
[Figure: Hopsworks platform architecture diagram]
Diagram endpoints: Datasources; Applications, API, Dashboards
Pipeline stages: Data Preparation & Ingestion; Experimentation & Model Training; Deploy & Productionalize
Components shown: Apache Beam, Apache Spark, Apache Flink (batch and streaming); Kafka + Spark Streaming; Pip, Conda, TensorFlow, scikit-learn, Keras, Jupyter Notebooks, TensorBoard; Distributed ML & DL; Kubernetes; Model Serving; Model Monitoring; Hopsworks Feature Store
Orchestration: Airflow
Filesystem and metadata storage: HopsFS
Next-Gen Data Lakes
Data Lakes are starting to resemble databases:
– Apache Hudi, Delta, and Apache Iceberg add:
• ACID transactional layers on top of the data lake
• Indexes to speed up queries (data skipping)
• Incremental Ingestion (late data, delete existing records)
• Time-travel queries
Problems: no incremental updates, no rollback on failure, no time travel, no isolation.
Solution: Incremental ETL with ACID Transactions
Upsert & Time Travel Example
Upsert == Insert or Update
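A minimal sketch of upsert semantics in plain Python, assuming an in-memory table keyed by a primary key. The function name `upsert`, the `id` key, and the sample rows are illustrative assumptions, not an API of Delta, Hudi, or Iceberg.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]


def upsert(table: Dict[object, Row], rows: Iterable[Row], key: str = "id") -> Tuple[int, int]:
    """Apply rows to the table in place and return (inserted, updated) counts."""
    inserted = updated = 0
    for row in rows:
        k = row[key]
        if k in table:
            # Update: the key already exists, so new values override old ones.
            table[k] = {**table[k], **row}
            updated += 1
        else:
            # Insert: the key has not been seen before.
            table[k] = dict(row)
            inserted += 1
    return inserted, updated


# Hypothetical sample data: one existing key (2) and one new key (3).
table = {1: {"id": 1, "amount": 10.0}, 2: {"id": 2, "amount": 5.0}}
counts = upsert(table, [{"id": 2, "amount": 7.5}, {"id": 3, "amount": 1.0}])
```

Row 2 is updated in place and row 3 is inserted, so a single batch can carry both new and changed records.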
Version Data By Commits
Delta Lake by Databricks
• Delta Lake is a transactional layer that sits on top of your data lake:
– ACID transactions with optimistic concurrency control
– Log-Structured Storage
– Open Format (Parquet-based storage)
– Time-travel
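How log-structured storage enables time travel can be sketched in a few lines, assuming a simplified commit log in which each commit adds or removes data files. The action layout and file names below are hypothetical and do not reproduce the actual Delta log format.

```python
from typing import Dict, List, Optional, Set

Action = Dict[str, str]

# Simplified commit log: the list index is the table version.
log: List[List[Action]] = [
    [{"op": "add", "file": "part-0.parquet"}],          # version 0
    [{"op": "add", "file": "part-1.parquet"}],          # version 1
    [{"op": "remove", "file": "part-0.parquet"},        # version 2 rewrites part-0
     {"op": "add", "file": "part-2.parquet"}],
]


def files_as_of(commits: List[List[Action]], version: Optional[int] = None) -> Set[str]:
    """Replay the log up to `version` (inclusive) and return the live data files."""
    if version is None:
        version = len(commits) - 1  # default to the latest version
    if not 0 <= version < len(commits):
        raise ValueError(f"unknown version {version}")
    live: Set[str] = set()
    for commit in commits[: version + 1]:
        for action in commit:
            if action["op"] == "add":
                live.add(action["file"])
            else:
                live.discard(action["file"])
    return live


latest = files_as_of(log)
as_of_v1 = files_as_of(log, version=1)
```

Because old files are only marked as removed in the log, an earlier version stays readable until cleanup deletes the underlying files.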
Delta Datasets
Optimistic Concurrency Control
Mutual Exclusion for Writers
Optimistic Retry
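A sketch of optimistic concurrency control with retry, assuming a single in-process table with a version counter. The class `VersionedTable` and the helper `commit_with_retry` are hypothetical names for illustration. Writers only take a lock for the short commit step (mutual exclusion), and a writer whose read version is stale re-reads and retries.

```python
import threading
from typing import Callable, List, Tuple


class ConflictError(Exception):
    """Raised when another writer committed after our snapshot was taken."""


class VersionedTable:
    def __init__(self) -> None:
        self._lock = threading.Lock()  # held only during snapshot and commit
        self.version = 0
        self.rows: List[int] = []

    def snapshot(self) -> Tuple[int, List[int]]:
        with self._lock:
            return self.version, list(self.rows)

    def try_commit(self, read_version: int, new_rows: List[int]) -> int:
        with self._lock:
            if read_version != self.version:
                raise ConflictError("table changed since snapshot")
            self.rows = new_rows
            self.version += 1
            return self.version


def commit_with_retry(table: VersionedTable,
                      transform: Callable[[List[int]], List[int]],
                      max_attempts: int = 5) -> int:
    """Read a snapshot, compute new rows, commit; retry on conflict."""
    for _ in range(max_attempts):
        read_version, rows = table.snapshot()
        try:
            return table.try_commit(read_version, transform(rows))
        except ConflictError:
            continue  # re-read the latest snapshot and try again
    raise ConflictError("gave up after repeated conflicts")


table = VersionedTable()
attempts: List[int] = []


def append_one(rows: List[int]) -> List[int]:
    attempts.append(len(attempts))
    if len(attempts) == 1:
        # Simulate a concurrent writer committing between our read and commit.
        v, current = table.snapshot()
        table.try_commit(v, current + [99])
    return rows + [1]


final_version = commit_with_retry(table, append_one)
```

The first attempt fails because the simulated writer moved the table to version 1; the retry reads the new snapshot and commits on top of it.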
Scalable Metadata Management
Other Frameworks: Apache Hudi, Apache Iceberg
• Hudi was developed by Uber for their Hadoop data lake (HDFS first, then S3 support)
• Iceberg was developed by Netflix with S3 as the target storage layer
• All three frameworks (Delta, Hudi, Iceberg) share the goals of adding ACID updates, incremental ingestion, and efficient queries
Next-Gen Data Lakes Compared
| | Delta | Hudi | Iceberg |
|---|---|---|---|
| Incremental Ingestion | Spark | Spark | Spark |
| ACID updates | HDFS, S3* | HDFS | S3, HDFS |
| File Formats | Parquet | Avro, Parquet | Parquet, ORC |
| Data Skipping (File-Level Indexes) | Min-Max Stats + Z-Order Clustering* | File-Level Max-Min stats + Bloom Filter | File-Level Max-Min Filtering |
| Concurrency Control | Optimistic | Optimistic | Optimistic |
| Data Validation | Expectations (coming soon) | In Hopsworks | N/A |
| Merge-on-Read | No | Yes (coming soon) | No |
| Schema Evolution | Yes | Yes | Yes |
| File I/O Cache | Yes* | No | No |
| Cleanup | Manual | Automatic, Manual | No |
| Compaction | Manual | Automatic | No |

*Databricks version only (not open-source)
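The file-level min-max statistics listed under data skipping can be sketched as follows, assuming each data file records the minimum and maximum of one column. The `FileStats` class and file names are hypothetical; real implementations keep these statistics in table metadata.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileStats:
    path: str
    min_value: int  # smallest value of the filtered column in this file
    max_value: int  # largest value of the filtered column in this file


def files_to_scan(files: List[FileStats], lo: int, hi: int) -> List[str]:
    """Keep only files whose [min, max] range overlaps the query range [lo, hi]."""
    return [f.path for f in files if f.max_value >= lo and f.min_value <= hi]


files = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
]

# A query for values between 150 and 210 never has to open part-0.
selected = files_to_scan(files, 150, 210)
```

Skipping works best when similar values are stored together, since narrow per-file ranges let more files be excluded.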
How can a Feature Store leverage Log-Structured Storage (e.g., Delta or Hudi or Iceberg)?
Hopsworks Feature Store
[Figure: Hopsworks Feature Store architecture. Column headings: Feature Mgmt, Storage, Access.]
Users shown: Data Engineer, Data Scientist, Online Apps, Batch Apps
Stores shown: MySQL Cluster (metadata, online features); Apache Hive columnar DB (offline features); Training Data (S3, HDFS); Models; HopsFS; External DB (feature definition: select ..)
Other labels: Statistics, Discovery, Online Features, Offline Features, Feature Data Ingestion, Feature CRUD, Access Control, Time Travel, Data Validation, JDBC (SAS, R, etc.), Pandas or PySpark DataFrame
Feature CRUD: add/remove features, access control, feature data validation.
Capabilities listed: discover features, create training data, save models, read online/offline/on-demand features, historical feature values.
AWS SageMaker and Databricks Integration
• Computation engine (Spark)
• Incremental ACID ingestion
• Time travel
• Data validation
• On-demand or cached features
• Online or offline features
Incremental Feature Engineering with Hudi
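A minimal sketch of the incremental idea, assuming every record carries the commit time at which it was written and that transactions are append-only. The helper names and sample data are hypothetical; this is not the Hudi API, only the pattern of reading what changed since a checkpoint and updating features from that delta.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def incremental_pull(records: List[Record], since_commit: int) -> Tuple[List[Record], int]:
    """Return records committed after `since_commit` and the new checkpoint."""
    changed = [r for r in records if r["commit_time"] > since_commit]
    checkpoint = max((r["commit_time"] for r in changed), default=since_commit)
    return changed, checkpoint


def update_totals(totals: Dict[str, float], changed: List[Record]) -> None:
    """Fold only the changed records into a per-customer total-spend feature."""
    for r in changed:
        totals[r["customer_id"]] = totals.get(r["customer_id"], 0.0) + r["amount"]


records: List[Record] = [
    {"customer_id": "a", "amount": 10.0, "commit_time": 1},
    {"customer_id": "b", "amount": 5.0, "commit_time": 1},
    {"customer_id": "a", "amount": 2.5, "commit_time": 2},
]
totals: Dict[str, float] = {}

# First run: everything after commit 0 is new.
changed, checkpoint = incremental_pull(records, since_commit=0)
update_totals(totals, changed)

# A later commit arrives; only that record is processed on the next run.
records.append({"customer_id": "b", "amount": 1.0, "commit_time": 3})
changed, checkpoint = incremental_pull(records, since_commit=checkpoint)
update_totals(totals, changed)
```

Each run touches only the records written since the last checkpoint instead of recomputing the feature over the full dataset.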
Point-in-Time Correct Feature Data
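Point-in-time correctness can be sketched as a lookup that, for each label event, uses the latest feature value known at or before that event's timestamp, so no future data leaks into training rows. The feature history, label events, and the helper `value_as_of` are hypothetical examples.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def value_as_of(history: List[Tuple[int, float]], ts: int) -> Optional[float]:
    """Return the latest value with timestamp <= ts, or None if none exists.

    `history` must be sorted by timestamp.
    """
    times = [t for t, _ in history]
    i = bisect_right(times, ts)  # number of entries with timestamp <= ts
    return history[i - 1][1] if i else None


# (timestamp, feature value) pairs, sorted by timestamp.
feature_history = [(10, 0.2), (20, 0.5), (30, 0.9)]

# (timestamp, label) pairs for which training rows are needed.
label_events = [(5, 0), (20, 1), (25, 0), (40, 1)]

training_rows = [
    (ts, value_as_of(feature_history, ts), label) for ts, label in label_events
]
```

The event at time 5 gets no feature value because nothing had been recorded yet, and the event at time 25 sees the value written at time 20 rather than the later one from time 30.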
Feature Time Travel with Hudi and Hopsworks Feature Store
Demo: Hopsworks Feature Store + Databricks Platform
Summary
• Delta, Hudi, and Iceberg bring reliability, upserts, and time travel to data lakes
– Functionality that is well suited to feature stores
• Hopsworks Feature Store builds on Hudi/Hive and is the world’s first open-source feature store (released 2018)
• The Hopsworks platform also supports end-to-end ML pipelines using the Feature Store and Spark/Beam/Flink, TensorFlow/PyTorch, and Airflow
Thank you!
470 Ramona St, Palo Alto
Kista, Stockholm
https://www.logicalclocks.com
Register for a free account at www.hops.site
Twitter: @logicalclocks, @hopsworks
GitHub:
https://github.com/logicalclocks/hopsworks
https://github.com/hopshadoop/hops
References
• Feature Store: the missing data layer in ML pipelines?
https://www.logicalclocks.com/feature-store/
• Python-First ML Pipelines with Hopsworks.
https://hops.readthedocs.io/en/latest/hopsml/hopsML.html
• Hopsworks white paper.
https://www.logicalclocks.com/whitepapers/hopsworks
• HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases.
https://www.usenix.org/conference/fast17/technical-sessions/presentation/niazi
• Open Source:
https://github.com/logicalclocks/hopsworks
https://github.com/hopshadoop/hops
• Thanks to the Logical Clocks team: Jim Dowling, Seif Haridi, Theo Kakantousis, Fabio Buso, Gautier Berthou, Ermias Gebremeskel, Mahmoud Ismail, Salman Niazi, Antonios Kouzoupis, Robin Andersson, Alex Ormenisan, Rasmus Toivonen, Steffen Grohsschmiedt, and Moritz Meister

Spark ai summit_oct_17_2019_kimhammar_jimdowling_v6
