How Mobile.de brings Data
Science to Production for a
Personalized Web Experience
Dr. Markus Schüler & Dr. Florian Wilhelm
2018-07-08, PyData 2018, Berlin
2
Introduction
@FlorianWilhelm
FlorianWilhelm
florianwilhelm.info
Dr. Florian Wilhelm
Data Scientist
inovex GmbH
Dr. Markus Schüler
Data Scientist & Team Lead
mobile.de GmbH
3
Agenda
• General Introduction
• Personalization Use Cases at mobile.de
• Predicting Car Buying Intent
• Python for Big Data Processing
• Optimizing Performance
4
5
MOBILE.DE
GERMAN MARKET
LEADER
13.5 MILLION
UNIQUE USERS
PER MONTH
1.6 MILLION
VEHICLES
290
EMPLOYEES
DREILINDEN /
FRIEDRICHSHAIN
BERLIN
HEADQUARTERS
Part of
eBay Tech
6
IT-project house for digital transformation:
‣ Agile Development & Management
‣ Web · UI/UX · Replatforming · Microservices
‣ Mobile · Apps · Smart Devices · Robotics
‣ Big Data & Business Intelligence Platforms
‣ Data Science · Data Products · Search · Deep Learning
‣ Data Center Automation · DevOps · Cloud · Hosting
‣ Trainings & Coachings
Using technology to inspire our
clients. And ourselves.
inovex offices in
Karlsruhe · Cologne · Munich ·
Pforzheim · Hamburg · Stuttgart.
www.inovex.de
7
Why Recommendations? Why Personalization?
Inspiration
Engagement
Memory of past interactions
You are unique!
8
Why Personalization?
Data-Driven Personalization Improves:
• User Experience
• User Engagement
Source: https://www.kleinerperkins.com/perspectives/internet-trends-report-2018
9
Personalization at mobile.de
User Event Tracking & Storage
Looking For: Used Car (100%)
Prefers (Make): BMW (50%), Audi (50%)
Prefers (Model): Audi A3 (25%), Audi A4 (25%),
BMW 318 (50%)
Searching In: lat 52.5206, lon 13.409
Search Radius: 300km
Preferred Price: 20 000€ ± 1500€
Preferred Mileage: 10 000km ± 5000km
User Profile
Buyer
Last Action: Yesterday
Frequent User
User 12345
Likelihood to buy: 88 %
Daily preference profiles
Daily activity profiles
Recommendations
Segmentation
User Car Preferences User Interactions
10
Looking For: Used Car (100%)
Prefers (Make): BMW (50%), Audi (50%)
Prefers (Model): Audi A3 (25%), Audi A4 (25%),
BMW 318 (50%)
Searching In: lat 52.5206, lon 13.409
Search Radius: 300km
Preferred Price: 20 000€ ± 1500€
Preferred Mileage: 10 000km ± 5000km
User Profile
Buyer
Marketing
Last Action: Yesterday
Frequent User
User 12345
User Preferences based on User’s interactions
User Car Preference Example
User Preferences
Anonymous
11
Uncertainty Quantification
Bayesian Approach
Posterior probability ∝ Likelihood × Prior probability
Prior (avg. user) + User profile → Posterior User Profile
Prior based on all users: 30% Volkswagen, 25% gray, 50% automatic, 8% SUV, 10,000 €
User Preferences → Posterior User Preferences
[Figure: impact of the prior (avg. user) plotted against the number of user events]
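The slide does not name the concrete model, so the following is only a minimal sketch of one common way to realize "posterior ∝ likelihood × prior" for a categorical preference such as the preferred make: a Dirichlet prior built from the shares of all users, updated with the user's own event counts. The function name, the `prior_strength` pseudo-count and the prior shares other than the 30% for Volkswagen are assumptions made for illustration.

```python
from collections import Counter


def posterior_preferences(observed, prior, prior_strength=20.0):
    """Posterior mean of a categorical preference under a Dirichlet prior.

    observed: category labels seen in the user's events.
    prior: dict mapping category -> share among all users (sums to 1).
    prior_strength: hypothetical pseudo-count weight given to the prior.
    """
    counts = Counter(observed)
    n_events = sum(counts.values())
    categories = set(prior) | set(counts)
    return {
        cat: (counts[cat] + prior_strength * prior.get(cat, 0.0))
        / (n_events + prior_strength)
        for cat in categories
    }


# Hypothetical prior over makes, standing in for the average user.
prior = {"Volkswagen": 0.30, "BMW": 0.20, "Audi": 0.15, "Other": 0.35}

# Two BMW events barely move the profile away from the prior ...
few_events = posterior_preferences(["BMW", "BMW"], prior)
# ... while 200 BMW events let the user's own data dominate.
many_events = posterior_preferences(["BMW"] * 200, prior)
```

In this sketch the weight of the prior is `prior_strength / (n_events + prior_strength)`, so it shrinks as the number of user events grows.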
12
Recommendation
All Listings
Content-based Information
(User Preferences)
Collaborative Information
Mobile.de Recommendation Engine
Features of vehicle
13
14
Different User Intents
“I have no idea about cars. I need basic information and guidance.”
“I’m a car expert. Lead me to the best deals in the fastest way.”
“I love to browse expensive cars, yet I have no buying intent.”
“As a dealer, I need detailed data to compare my own listings with my competitors’.”
15
Events of a Car Buying Journey
contacts
parkings
views
16
Analysing events of car buyers
                      control       buyers
events total          72,621,069    2,500,771
median events         153           188
median days active    22            15
17
User Events: Event counts
[Figure: four panels (contact, parking, search, view), each titled "Event count over user journey". x-axis: position in user journey (0.0 to 1.0); y-axis: average count; groups: Buyer and Control; fits: local mean, linear model, lowess]
Linear-model p-values per panel:
contact: Buyer slope p = 1.815e−22 ***, Control intercept diff p = 9.823e−02 ., Control slope diff p = 9.956e−04 ***
parking: Buyer slope p = 7.999e−06 ***, Control intercept diff p = 1.399e−21 ***, Control slope diff p = 6.702e−06 ***
search: Buyer slope p = 6.694e−51 ***, Control intercept diff p = 1.141e−01, Control slope diff p = 9.044e−07 ***
view: Buyer slope p = 1.824e−08 ***, Control intercept diff p = 2.506e−45 ***, Control slope diff p = 2.824e−02 *
18
User Events: Duplicated views
• Buyers look more often at cars they have already seen than the control group does, and their ratio of duplicated views increases faster (both significant)
[Figure: amount of duplicated views (0.2 to 0.6) over position in user journey (0.0 to 1.0), Buyer vs. Control]
19
When did buyers interact with the car they bought?
§ Buyers view “their” car most often about 4/5 of the way through their user journey
[Figure: "When do buyers view the car they buy?" x-axis: position in user journey (10% to 100%); y-axis: % of users (0 to 15)]
20
ML Model: How close to buy?
§ Aim: predict how likely a user is to make his buying decision today
§ Personalization
§ Highlight dealer contact details
§ Provide car buying assistance
21
Feature Generation
Features:
§ Event counts (view, search, contact, parking)
§ Share of each event type among all events (e.g. % of views among all events)
§ a = number of active days, b = max-diff of active days, and the ratio a/b
§ Additional features:
§ Views/(Search+View)
§ % of duplicated views among all views
[Figure: timeline of the 30 days before the buying date (day 0), split into windows of 0-2 days, 3-9 days and 10-30 days, with a "ratio" annotation]
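The feature list above can be sketched in plain Python. This is an illustration under assumptions, not the authors' pipeline: the `(day, event_type, listing_id)` tuple layout, the feature names, and the reading of "max-diff" as the distance between the first and last active day are all hypothetical. The function computes the features for the events of one user in one time window.

```python
from collections import Counter

EVENT_TYPES = ("view", "search", "contact", "parking")


def window_features(events):
    """Features for one user and one time window.

    events: list of (day, event_type, listing_id) tuples (assumed layout);
    listing_id may be None for searches.
    """
    counts = Counter(etype for _, etype, _ in events)
    total = len(events)
    features = {}
    for etype in EVENT_TYPES:
        features[f"n_{etype}"] = counts[etype]
        features[f"pct_{etype}"] = counts[etype] / total if total else 0.0

    # a = number of active days, b = span between first and last active day.
    days = {day for day, _, _ in events}
    active_days = len(days)
    max_diff = max(days) - min(days) if days else 0
    features["active_days"] = active_days
    features["max_diff_days"] = max_diff
    features["active_ratio"] = active_days / max_diff if max_diff else 0.0

    # Views / (Search + View)
    views_and_searches = counts["view"] + counts["search"]
    features["view_share"] = (
        counts["view"] / views_and_searches if views_and_searches else 0.0
    )

    # Share of views that hit a listing the user had already viewed.
    viewed = [lid for _, etype, lid in events if etype == "view"]
    duplicated = len(viewed) - len(set(viewed))
    features["pct_duplicated_views"] = duplicated / len(viewed) if viewed else 0.0
    return features


# Hypothetical events of one user.
events = [
    (0, "search", None),
    (0, "view", "a"),
    (0, "view", "b"),
    (2, "view", "a"),
    (2, "parking", "a"),
    (5, "contact", "a"),
]
features = window_features(events)
```

Calling the function once per window (for example 0-2, 3-9 and 10-30 days before the reference date) would give one feature set per window.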
22
Modelling
§ Logistic Regression
§ Automatic Feature Selection
§ start from different sub-selections of features (like “all”, “no ratios”, etc.)
§ allow addition and removal of features based on minimizing AIC (lower AIC is better)
§ needed to prevent overfitting
§ Window optimization
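The add-and-remove feature selection described above can be sketched as a greedy search that keeps taking the single change with the lowest AIC until nothing improves. The slides do not show the actual implementation, so the function name and the `aic_of` callback are hypothetical; in a real setting `aic_of` would fit a logistic regression on the given features and return its AIC (2k − 2 ln L). Here a toy scoring function with made-up log-likelihood gains stands in for the model fit.

```python
def stepwise_select(candidates, start, aic_of):
    """Greedy bidirectional feature selection driven by AIC.

    candidates: all available feature names.
    start: initial feature subset (e.g. "all" or "no ratios").
    aic_of: hypothetical callable mapping a frozenset of features to the
            AIC of a model fitted on exactly those features.
    """
    current = frozenset(start)
    best_aic = aic_of(current)
    while True:
        # All subsets reachable by adding or dropping exactly one feature.
        neighbours = [current | {f} for f in candidates if f not in current]
        neighbours += [current - {f} for f in current if len(current) > 1]
        if not neighbours:
            return current, best_aic
        scored = [(aic_of(s), sorted(s), s) for s in neighbours]
        aic, _, subset = min(scored, key=lambda t: (t[0], t[1]))
        if aic >= best_aic:
            # No single change lowers the AIC any further.
            return current, best_aic
        current, best_aic = subset, aic


# Toy stand-in for a model fit: made-up log-likelihood gain per feature.
GAINS = {"views": 30.0, "contacts": 25.0, "noise": 0.2}


def toy_aic(features):
    log_likelihood = -100.0 + sum(GAINS[f] for f in features)
    return 2 * len(features) - 2 * log_likelihood


selected, aic = stepwise_select(GAINS, start=GAINS, aic_of=toy_aic)
```

In the toy example the uninformative `noise` feature is dropped because its likelihood gain does not justify the penalty of one extra parameter.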
23
Window size optimization
§ Treated the size and number of time windows as parameters to optimize
Candidate windows over the 30 days before the buying date (day 0):
§ 0-2 days, 3-9 days, 10-30 days
§ 0 days, 1-9 days, 10-30 days
§ 0 days, 1-7 days, 8-30 days
§ 0-9 days, 10-19 days, 20-30 days
§ 0 days, 1-4 days, 5-9 days, 10-30 days
24
Modelling
§ Cross-Validation (15 fold, 70/30 train/test split)
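The slide gives only "15 fold, 70/30 train/test split". One plausible reading, and it is an assumption, is 15 repeated random hold-out splits with 70% of the users for training and 30% for testing. A minimal sketch of that reading, with hypothetical function and parameter names:

```python
import random


def repeated_splits(n_samples, n_repeats=15, train_fraction=0.7, seed=0):
    """Yield (train_idx, test_idx) for repeated random hold-out splits."""
    rng = random.Random(seed)
    n_train = int(round(n_samples * train_fraction))
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[:n_train], indices[n_train:]


# 15 independent 70/30 splits over 100 hypothetical users.
splits = list(repeated_splits(n_samples=100))
```

Averaging a metric over the 15 test sets gives a more stable estimate than a single split would.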
25
Results
Prediction: The user made his buying decision today
Best Model:
72% Accuracy / 68% Sensitivity / 76% Specificity
[Figure: "Modelling statistics: closeToBuy_now_cid", showing accuracy, sensitivity and specificity (axis range 0.65 to 0.80) for five models (Model1 to Model5): closeToBuy_now_0−1−10−30_cid, closeToBuy_now_0−1−7−30_cid, closeToBuy_now_0−10−20−30_cid, closeToBuy_now_0−3−10−30_cid, closeToBuy_now_0−5−10−30_cid]
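For reference, the three reported metrics follow from a confusion matrix as sketched below. The counts are hypothetical and were chosen, assuming a balanced test set, so that they reproduce the percentages of the best model; the slides do not give the actual counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all correct predictions
        "sensitivity": tp / (tp + fn),      # share of buyers recognized
        "specificity": tn / (tn + fp),      # share of non-buyers recognized
    }


# Hypothetical counts: 100 buyers and 100 non-buyers in the test set.
metrics = classification_metrics(tp=68, fp=24, tn=76, fn=32)
```

With balanced classes, accuracy is simply the mean of sensitivity and specificity, which is consistent with 72% lying halfway between 68% and 76%.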
26
Buys tomorrow, next week, next two weeks
[Figure: accuracy, sensitivity and specificity (y-axis 0% to 80%) for four prediction targets: Buy Today, Buy Tomorrow, Buy in a Week, Buy in two Weeks]
Considerably lower predictive power when predicting more distant future events
Still room for improvement
27
Python & Big Data
28
Hive for heavy lifting
• Apache project
• built on top of Hadoop
• SQL interface to your data
• essentially an abstraction layer over MapReduce
• robust and mature
• but slow and certainly not “interactive”
Data Team:
• used for batch-processing of user preferences,
user-segmentation etc.
• PyHive by Dropbox for Python support
• usage of Python-based UD(A)Fs
29
User Defined Functions (UDFs)
User defined (aggregation) functions:
§ needed when native functions aren’t sufficient
§ are always much slower than native functions
§ work on a column or multiple (grouped) columns
§ are vector-valued operations and/or aggregations
transform aggregate apply
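The three labels on this slide can be read as three shapes of user-defined operations. The following plain-Python analogy illustrates that reading only; it uses no Hive or Spark API, and the sample rows and helper names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: listing prices viewed by two users.
rows = [
    {"user": "u1", "price": 18000},
    {"user": "u1", "price": 22000},
    {"user": "u2", "price": 9000},
]


def grouped(rows, key):
    """Group rows by a column, similar to GROUP BY."""
    ordered = sorted(rows, key=itemgetter(key))
    return {k: list(g) for k, g in groupby(ordered, key=itemgetter(key))}


# transform: one output value per input row, row count unchanged.
price_in_k = [row["price"] / 1000 for row in rows]

# aggregate: one output value per group.
mean_price = {
    user: sum(r["price"] for r in grp) / len(grp)
    for user, grp in grouped(rows, "user").items()
}


# apply: arbitrary function per group, may return any number of rows.
def above_group_mean(grp):
    mean = sum(r["price"] for r in grp) / len(grp)
    return [r for r in grp if r["price"] > mean]


expensive = {user: above_group_mean(grp) for user, grp in grouped(rows, "user").items()}
```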
30
PySpark for fast analysis and machine learning
Spark: fast and general engine for large-scale data processing
Spark + Python = PySpark
31
Conversion Example of User Preferences
Hive:
• 2483 lines of code
• Jinja2 to generate SQL queries
• Temporary tables for performance
• Runtime 5-10h
• Logic hard to understand at times
Spark:
• 1745 lines of code
• programmatic definition of queries
• No temporary tables needed
• Runtime 1-2 h
• Quite easy to understand
32
How Spark works
e.g. Jupyter lab
Source: Spark documentation
33
How do Python UD(A)Fs work?
Source: Spark documentation
34
Apache Arrow
Source: Arrow documentation
35
PySpark & Pandas
Vectorized UDFs for Spark 2.3:
§ built on top of Apache Arrow,
§ avoid high serialization and invocation overhead,
§ allow row-at-a-time UDFs and cumulative UDAFs,
§ as flexible as Pandas’ apply
Source: databricks blog
36
Performance gains
Source: https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html
37
But what if Spark < 2.3?
It’s possible to write flexible UD(A)Fs by
• using RDD functionality, df.rdd.mapPartitions(my_func)
• converting low-level Row objects to a Pandas dataframe
• wrapping everything into a nice decorator
Detailed information under:
https://www.inovex.de/blog/efficient-udafs-with-pyspark/
38
Isolated environments with PySpark
39
Concept
§ create a local environment based on wheels,
§ upload the unpacked wheels to HDFS,
§ read and distribute these Python packages from the Spark
driver to the executors with sc.addFile,
§ use the packages on the executors, e.g. in a UDF.
Detailed information under:
https://www.inovex.de/blog/managing-isolated-environments-with-pyspark/
40
Architecture
41
Summary
PyData Stack
Interesting & Challenging Use Cases
Data Science
Data Engineering
Business Impact
42
Any Questions?
How mobile.de brings Data Science to Production for a Personalized Web Experience

  • 1. How Mobile.de brings Data Science to Production for a Personalized Web Experience Dr. Markus Schüler & Dr. Florian Wilhelm 2018-07-08, PyData 2018, Berlin
  • 2. 2 Introduction @FlorianWilhelm FlorianWilhelm florianwilhelm.info Dr. Florian Wilhelm Data Scientist inovex GmbH Dr. Markus Schüler Data Scientist & Team Lead mobile.de GmbH
  • 3. 3 Agenda • General Introduction • Personalization Use Cases at mobile.de • Predicting Car Buying Intent • Python for Big Data Processing • Optimizing Performance
  • 4. 4
  • 5. 5 MOBILE.DE GERMAN MARKET LEADER 13.5 MIO UNIQUE USER PER MONTH 1.6 MIO VEHICLES 290 EMPLOYEES DREILINDEN / FRIEDRICHSHAIN BERLIN HEADQUARTERS Part of ebay Tech
  • 6. 6 IT-project house for digital transformation: ‣ Agile Development & Management ‣ Web · UI/UX · Replatforming · Microservices ‣ Mobile · Apps · Smart Devices · Robotics ‣ Big Data & Business Intelligence Platforms ‣ Data Science · Data Products · Search · Deep Learning ‣ Data Center Automation · DevOps · Cloud · Hosting ‣ Trainings & Coachings Using technology to inspire our clients. And ourselves. inovex offices in Karlsruhe · Cologne · Munich · Pforzheim · Hamburg · Stuttgart. www.inovex.de
  • 9. 9 Personalization at mobile.de User Event Tracking & Storage Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Daily preference profiles Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Daily activity profiles Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Recommendations Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 
(50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Segmentation User Car Preferences User Interactions
  • 10. 10 Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Marketing Last Action: Yesterday Frequent User User 12345 User Preferences based on User’s interactions User Car Preference Example User Preferences Anonymous
  • 11. 11 Uncertainty Quantification Number of user events Impact of prior (avg. user) User profile à Posterior User Profile + Posterior probability∝Likelihood×Prior probability Bayesian Approach 30% Volkswagen25% gray 50% automatic8% SUV10,000 € Prior based on all users User Preferences Posterior User Preferences Impact of Prior (avg. user) Number of user events
  • 12. 12 Recommendation All Listings Content-based Information (User Preferences) Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Collaborative Information P P P P P Mobile.de Recommendation Engine Features of vehicle
  • 13. 13 Personalization at mobile.de User Event Tracking & Storage Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Daily preference profiles Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Daily activity profiles Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Recommendations Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 
(50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Looking For: Used Car (100%) Prefers (Make): BMW (50%), Audi (50%) Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) Searching In: lat 52.5206, lon 13.409 Search Radius: 300km Preferred Price: 20 000€ ± 1500€ Preferred Mileage: 10 000km ± 5000km User Profile Buyer Last Action: Yesterday Frequent User User 12345 Likelihood to buy: 88 % Segmentation User Car Preferences User Interactions
  • 14. 14 Different User Intents “I have no idea about cars. I need basic information and guidance.” “I’m a car expert. Lead me to the best deals in the fastest way.” “I love to browse expensive cars, yet I have no buying intent.” “As a dealer, I need detailed data to compare my own listings with my competitor’s”
  • 15. 15 Events of a Car Buying Journey contacts parkings views
  • 16. 16 control buyers events total 72,621,069 2,500,771 median events 153 188 median days active 22 15 Analysing events of car buyers
  • 17. 17 User Events: Event counts [Figure: four panels (contact, parking, search, view) showing average event count vs. position in user journey (0.0 to 1.0) for Buyer and Control, each with local mean, linear model and lowess fits] contact: Buyer slope p = 1.815e−22 ***, Control intercept diff p = 9.823e−02 ., Control slope diff p = 9.956e−04 ***. parking: Buyer slope p = 7.999e−06 ***, Control intercept diff p = 1.399e−21 ***, Control slope diff p = 6.702e−06 ***. search: Buyer slope p = 6.694e−51 ***, Control intercept diff p = 1.141e−01, Control slope diff p = 9.044e−07 ***. view: Buyer slope p = 1.824e−08 ***, Control intercept diff p = 2.506e−45 ***, Control slope diff p = 2.824e−02 *
  • 18. 18 User Events: Duplicated views • Buyers revisit cars they have already seen more often than the control group, and their share of duplicated views increases faster over the journey (both differences significant) [Figure: amount of duplicated views (0.2 to 0.6) vs. position in user journey (0.0 to 1.0), Buyer vs. Control]
  • 19. 19 When did buyers interact with the car they bought? § Buyers view “their” car most often about four-fifths of the way through their user journey [Figure: When do buyers view the car they buy? % of users (0 to 15) vs. position in user journey (10% to 100%)]
  • 20. 20 ML Model: How close to buy? § Aim: predict how likely a user is to make their buying decision today § Personalization § Highlight dealer contact details § Provide car buying assistance
  • 21. 21 Feature Generation Features: § Event counts (view, search, contact, parking) § Share of each event type among all events (e.g. % views among all events) § a = number of active days, b = max-diff active days, ratio a/b § Additional features: § Views/(Search+View) § % of duplicated views among all views [Diagram: features are computed per time window over the 30 days before the buying date (= day 0): 0-2 days, 3-9 days, 10-30 days, plus ratios]
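The feature list above can be sketched in plain Python for a single user. This is a minimal illustration, not the speakers' code: the `(day, event_type, listing_id)` record layout, the function name `user_features` and the feature names are hypothetical, and "max-diff active days" is read here as the span between the first and last active day, which is only one possible interpretation.

```python
from collections import Counter

EVENT_TYPES = ("view", "search", "contact", "parking")


def user_features(events):
    """Compute journey features for one user.

    `events` is a list of (day, event_type, listing_id) tuples, where `day`
    is an integer day index (hypothetical layout, not the real tracking data).
    """
    counts = Counter(etype for _, etype, _ in events)
    total = sum(counts.values())
    features = {}
    for etype in EVENT_TYPES:
        features[f"n_{etype}"] = counts[etype]
        features[f"share_{etype}"] = counts[etype] / total if total else 0.0

    # Views / (Search + View)
    denom = counts["search"] + counts["view"]
    features["view_ratio"] = counts["view"] / denom if denom else 0.0

    # Share of views that hit a listing the user had already viewed.
    viewed = [lid for _, etype, lid in events if etype == "view"]
    duplicated = len(viewed) - len(set(viewed))
    features["dup_view_share"] = duplicated / len(viewed) if viewed else 0.0

    # a = number of active days; b = span between first and last active day
    # (assumed reading of "max-diff active days").
    days = sorted({day for day, _, _ in events})
    a = len(days)
    b = days[-1] - days[0] if days else 0
    features["active_days"] = a
    features["active_span"] = b
    features["active_ratio"] = a / b if b else 0.0
    return features


example = [
    (0, "search", None),
    (0, "view", "car-1"),
    (2, "view", "car-2"),
    (2, "view", "car-1"),
    (5, "parking", "car-1"),
    (5, "contact", "car-1"),
]
feats = user_features(example)
```

In a windowed setup, the same function would be called once per time window, on the events that fall into that window.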
  • 22. 22 Modelling § Logistic Regression § Automatic Feature Selection § start from different sub-selections of features (like “all”, “no ratios”, etc.) § add or remove features stepwise, guided by AIC § needed to prevent overfitting § Window optimization
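For reference, the Akaike information criterion is commonly defined as follows, where $k$ is the number of estimated model parameters and $\hat{L}$ is the maximized likelihood of the fitted model. Under this usual convention a lower value indicates a better trade-off between fit and complexity; whether the talk used exactly this form is not stated on the slide.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

Each added feature increases $k$ by one, so a feature is only worth keeping if it raises the log-likelihood $\ln\hat{L}$ by more than 1.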
  • 23. 23 Window size optimization § The size and number of time windows were treated as parameters to optimize [Diagram: candidate window layouts over the 30 days before the buying date (= day 0): 0-2 / 3-9 / 10-30 days; 0 / 1-9 / 10-30 days; 0 / 1-7 / 8-30 days; 0-9 / 10-19 / 20-30 days; 0 / 1-4 / 5-9 / 10-30 days]
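One way to make the window layout a tunable parameter is to encode it as a tuple of boundaries and count events per window. The sketch below assumes a hypothetical encoding in which `(0, 3, 10, 30)` stands for the windows 0-2, 3-9 and 10-30 days; the function name and the encoding are illustrative and not taken from the talk.

```python
from bisect import bisect_right


def window_counts(days_before, boundaries):
    """Count events per time window.

    `days_before` holds, per event, the number of days before the reference
    date (0 = same day). `boundaries` lists the first day of each window
    followed by the last day covered overall (assumed encoding).
    """
    starts = list(boundaries[:-1])
    last_day = boundaries[-1]
    labels = []
    for i, start in enumerate(starts):
        # Each window ends the day before the next one starts;
        # the final window ends at `last_day`.
        end = starts[i + 1] - 1 if i + 1 < len(starts) else last_day
        labels.append(f"{start}-{end}")
    counts = {label: 0 for label in labels}
    for d in days_before:
        if 0 <= d <= last_day:  # events outside the horizon are ignored
            counts[labels[bisect_right(starts, d) - 1]] += 1
    return counts


days = [0, 0, 1, 4, 9, 12, 30, 45]
layout_a = window_counts(days, (0, 3, 10, 30))
layout_b = window_counts(days, (0, 1, 10, 30))
```

Trying several boundary tuples and comparing the resulting model quality then becomes a simple loop over candidate layouts.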
  • 24. 24 Modelling § Logistic Regression § Automatic Feature Selection § start from different sub-selections of features (like “all”, “no ratios”, etc.) § add or remove features stepwise, guided by AIC § needed to prevent overfitting § Window optimization § Cross-Validation (15 fold, 70/30 train/test split)
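A classic 15-fold cross-validation would hold out only about 7% of the data per fold, so "15 fold, 70/30" is read here as 15 repeated random 70/30 splits. That reading is an assumption, and the helper below is a hypothetical, dependency-free sketch of such a scheme rather than the setup used in the talk.

```python
import random


def repeated_splits(n_samples, n_repeats=15, test_fraction=0.3, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits."""
    rng = random.Random(seed)  # fixed seed keeps the splits reproducible
    n_test = round(n_samples * test_fraction)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]


splits = list(repeated_splits(n_samples=100))
```

Each split would be used to fit the model on the training indices and score it on the test indices; averaging the 15 scores gives the reported metric.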
  • 25. 25 Results Prediction: the user made their buying decision today. Best Model: 72% Accuracy / 68% Sensitivity / 76% Specificity [Figure: Modelling statistics for closeToBuy_now_cid, showing accuracy, sensitivity and specificity (0.65 to 0.80) for Model1 to Model5; model variants shown: closeToBuy_now_0−1−10−30_cid, closeToBuy_now_0−1−7−30_cid, closeToBuy_now_0−10−20−30_cid, closeToBuy_now_0−3−10−30_cid, closeToBuy_now_0−5−10−30_cid]
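The three reported metrics are related through the confusion matrix. The sketch below uses hypothetical counts (100 buyers and 100 non-buyers, i.e. balanced classes) chosen only so that they reproduce the percentages on the slide; the actual confusion matrix is not given in the talk.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    sensitivity = tp / (tp + fn)  # share of actual buyers that were detected
    specificity = tn / (tn + fp)  # share of non-buyers correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical balanced example: 100 buyers, 100 non-buyers.
accuracy, sensitivity, specificity = classification_metrics(
    tp=68, fn=32, tn=76, fp=24
)
```

With balanced classes, accuracy is the mean of sensitivity and specificity, which is consistent with 72% lying midway between 68% and 76%.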
  • 26. 26 Buys tomorrow, next week, next two weeks [Figure: accuracy, sensitivity and specificity (0% to 80%) for the targets Buy Today, Buy Tomorrow, Buy in a Week, Buy in two Weeks] Considerably lower predictive power when predicting more distant future events. Still room for improvement
  • 27. 27 Python & Big Data
  • 28. 28 Hive for heavy lifting • Apache project • built on top of Hadoop • SQL interface to your data • essentially an abstraction layer over MapReduce • robust and mature • but slow and certainly not “interactive” Data Team: • used for batch processing of user preferences, user segmentation etc. • PyHive by Dropbox for Python support • usage of Python-based UD(A)Fs
  • 29. 29 User Defined Functions (UDFs) User defined (aggregation) functions: § needed when native functions aren't sufficient § are always much slower than native functions § work on a column or multiple (grouped) columns § are vector-valued operations and/or aggregations [Diagram: transform, aggregate, apply]
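The three labels on this slide match the terminology of grouped operations as used, for example, in Pandas. The following plain-Python sketch illustrates the difference in output shape only; it does not use the Hive or Spark UDF APIs, and the sample rows are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (user, price) rows.
rows = [("u1", 10.0), ("u1", 30.0), ("u2", 5.0)]

# Group the price column by user.
groups = defaultdict(list)
for user, price in rows:
    groups[user].append(price)

# aggregate: exactly one value per group.
aggregated = {user: mean(prices) for user, prices in groups.items()}

# transform: one output value per input row (here: deviation from group mean).
transformed = [(user, price - aggregated[user]) for user, price in rows]

# apply: arbitrary output per group (here: the single highest price as a list).
applied = {
    user: sorted(prices, reverse=True)[:1] for user, prices in groups.items()
}
```

A UDF corresponds to the transform case, a UDAF to the aggregate case, and apply is the most flexible form because the result per group can have any length.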
  • 30. 30 PySpark for fast analysis and machine learning. Spark: fast and general engine for large-scale data processing. Python + Spark = PySpark
  • 31. 31 Conversion Example of User Preferences Hive: • 2483 lines of code • Jinja2 to generate SQL queries • Temporary tables for performance • Runtime 5-10 h • Logic hard to understand at times Spark: • 1745 lines of code • programmatic definition of queries • No temporary tables needed • Runtime 1-2 h • Quite easy to understand
  • 32. 32 How Spark works e.g. Jupyter lab Source: Spark documentation
  • 33. 33 How do Python UD(A)Fs work? Source: Spark documentation
  • 35. 35 PySpark & Pandas Vectorized UDFs for Spark 2.3: § built on top of Apache Arrow, § avoid high serialization and invocation overhead, § allow row-at-a-time UDFs and cumulative UDAFs, § as flexible as Pandas' apply Source: databricks blog
  • 37. 37 But what if Spark < 2.3? It's possible to write flexible UD(A)Fs by • using RDD functionality, df.rdd.mapPartitions(my_func) • converting low-level Row objects to a Pandas dataframe • wrapping everything into a nice decorator Detailed information at: https://www.inovex.de/blog/efficient-udafs-with-pyspark/
  • 39. 39 Concept § create a local environment based on wheels, § upload the unpacked wheels to HDFS, § read and distribute these Python packages from the Spark driver to the executors with sc.addFile, § use the packages on the executors, e.g. in a UDF. Detailed information at: https://www.inovex.de/blog/managing-isolated-environments-with-pyspark/
  • 41. 41 Summary PyData Stack Interesting & Challenging Use Cases Data Science Data Engineering Business Impact