Real Time Recommender System with Kiji
Jan 22, 2014

Daqing Zhao, Director of Advanced Analytics

Macy’s.com
Agenda

▪ Big data analytics versus traditional BI
▪ Macy’s Advanced Analytics Team
▪ Our analytics projects
▪ Example: site recommendations using Kiji
▪ High level architecture
▪ Kiji Schema table structure
▪ Model deployment using Kiji
▪ Key benefits of Kiji and WibiData team

Traditional BI process
[Figure: layered diagram of the traditional BI process, in slide order: Knowledge Discovery; Segmentation and Predictive Modeling; Multidimensional Report; Standard Report; Schema definition, ETL into RDBMS. Annotation: "Most companies stay in this area." Source: Baseline Consulting.]

▪ Data can be accessed and analyzed only after ETL
▪ Schema definition may not be optimal
Hadoop/NoSQL: paradigm shift

[Figure: layered diagram of the Hadoop/NoSQL approach. Labels, in slide order: Decisions, Insights, Models, Reports; Decision Agent; Segmentation and Predictive Modeling; Multidimensional Report; Standard Report; Hive, Mahout, Cascading, Scalding, Kiji, …; MapReduce; Raw data (Volume, Velocity, Variety; Write, Append, Read); Distributed storage; Computation near data; Hadoop, HBase, Avro, …]

▪ We can access raw data and analyze it using MapReduce
▪ With pros and cons
Macys.com’s Advanced Analytics Group
▪ We are at the frontiers of Big Data science:
• Using Big Data technology
• Machine learning and statistical algorithms

▪ We have predictive modeling, experimental design, and data science teams

▪ Our team members have very strong backgrounds in
• Quantitative fields: math, statistics, physics, bioinformatics, decision sciences, and computer science

▪ We collaborate with systems and IT teams internally, as well as third-party vendors such as WibiData, SAS Research, IBM Research…

▪ We use a wide range of tools
• Hadoop, SAS, R, Mahout, and others, as well as Kiji Models

▪ We are data scientists with a keen focus on domain problems

Customer acquisition and retention
▪ Targeting the right message to the right customer at the right time
• Build predictive models of purchase behavior and identify drivers

▪ Site recommendation algorithms
• Recommend products based on items that are added to bag, for cross-sell and up-sell
• We also look at market basket analysis
• Most work is in batch mode, expanding slowly into real time

▪ Rapid prototyping and testing of algorithms and policies
• All done in short development cycles

▪ Output of the team’s work supports other marketing teams in identifying and reaching the best customers
• Search, display, social network, affiliates, retention, customer services, …
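
The market basket analysis mentioned above can be illustrated with a minimal Python sketch. It is not the team's implementation: the baskets, the product names, and the `pair_rules` and `recommend` helpers are hypothetical, and it only scores item pairs by confidence and lift rather than mining longer rules.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets):
    """Return {(a, b): (confidence, lift)} for every ordered pair bought together."""
    n = len(baskets)
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        items = set(basket)
        item_count.update(items)
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1
            pair_count[(b, a)] += 1
    rules = {}
    for (a, b), together in pair_count.items():
        confidence = together / item_count[a]      # estimate of P(b | a)
        lift = confidence / (item_count[b] / n)    # P(b | a) / P(b)
        rules[(a, b)] = (confidence, lift)
    return rules


def recommend(bag, rules, top_n=3):
    """Rank items not yet in the bag by their best confidence given a bagged item."""
    scores = {}
    for (a, b), (confidence, _lift) in rules.items():
        if a in bag and b not in bag:
            scores[b] = max(scores.get(b, 0.0), confidence)
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical order history; product names are made up for illustration.
baskets = [
    ["dress", "heels", "clutch"],
    ["dress", "heels"],
    ["dress", "scarf"],
    ["jeans", "sneakers"],
    ["jeans", "scarf"],
]
rules = pair_rules(baskets)
suggestions = recommend({"dress"}, rules)
```

Here `rules` would typically be computed in batch, while `recommend` is cheap enough to call when an item is added to the bag, which matches the batch-first, real-time-later progression described on the slide.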

Some other projects
▪ Data organization or data munging
• Data collections, individual and event level, 360 degrees, …
• Segmentation of customers
• Customer value, revenue, costs
• Multiple channel attribution of marketing contacts
• Product attributes

▪ Experimentation platform
• Success of online marketing depends highly on testing, learning, and optimization
• Both for site layout and for contents and recommendations

▪ Forecast and optimization
• Prediction, simulation, and search and optimize

▪ Big data refinement and scalability
• New data sources, more efficient ways of accessing data, and organizing and processing data
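
One common way to read the outcome of a site experiment is a two-proportion z-test on conversion rates. The slides do not say which test the experimentation platform uses, so the Python sketch below is an assumption, with made-up visitor counts and a hypothetical `two_proportion_z_test` helper.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Assumes both groups are large and the pooled rate is strictly
    between 0 and 1, so the standard error is non-zero.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical test: layout A (control) versus layout B (variant).
z, p_value = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
```

A small p-value suggests the variant's conversion rate differs from the control's; the decision threshold and the required sample size are choices the experiment owner has to make up front.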

Example: similar and complementary products

Example: customer segmentation

Demographic
Socio-economic
Behavioral
Values and styles
Channels
Modality

Example: product social network

Demographic
Style
Size

Brand
Price range
Season

Example: site product recommendation
▪ Customer adds one or more products to bag

▪ We recommend similar/complementary products in real time
• Based on product associations and customer profile

▪ We use various machine learning algorithms
• Association rules
• Collaborative filtering
• Predictive modeling
• Business rules
• And others, …
• Models built offline

▪ Real-time data, real-time model scoring, and real-time decisions
▪ Champion/challenger tests; models evolve quickly over time
▪ Frequent model updates, adding new data
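
The flow above (offline-built associations, a customer profile, business rules, and a champion/challenger split) can be sketched in Python as follows. Everything named here is hypothetical: the `ASSOCIATIONS` tables, the `IN_STOCK` set, the affinity values, and the 10% challenger share are illustrative stand-ins, not details from the talk.

```python
import hashlib

# Hypothetical association weights produced by two offline model builds.
ASSOCIATIONS = {
    "champion": {"dress": {"heels": 0.6, "clutch": 0.3, "scarf": 0.2}},
    "challenger": {"dress": {"scarf": 0.5, "heels": 0.4, "belt": 0.3}},
}
# Hypothetical business rule input: only in-stock items may be recommended.
IN_STOCK = {"heels", "scarf", "belt"}


def assign_model(customer_id, challenger_share=0.1):
    """Deterministically route a fixed share of customers to the challenger."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "challenger" if bucket < challenger_share else "champion"


def recommend(customer_id, bag, affinity, top_n=2, model=None):
    """Score candidates for the items in the bag and return (model, ranked items).

    `affinity` maps a product to a profile-based boost for this customer.
    """
    model = model or assign_model(customer_id)
    table = ASSOCIATIONS[model]
    scores = {}
    for item in bag:
        for candidate, weight in table.get(item, {}).items():
            # Business rules: never suggest bagged or out-of-stock items.
            if candidate in bag or candidate not in IN_STOCK:
                continue
            boost = 1.0 + affinity.get(candidate, 0.0)
            scores[candidate] = scores.get(candidate, 0.0) + weight * boost
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return model, ranked[:top_n]


champion_result = recommend("c1", ["dress"], {"scarf": 0.5}, model="champion")
challenger_result = recommend("c1", ["dress"], {"scarf": 0.5}, model="challenger")
```

Hashing the customer id keeps each customer on the same model across visits, which makes the champion and challenger groups comparable over time.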

Architecture

[Figure: architecture diagram. Labels: Real Time (data access, scoring, decisions); data mining environments (Kiji Express, Mahout, R, SAS, others); products; Kiji Model; Kiji Scoring; Kiji REST; Hadoop; HBase.]

Kiji Schema table structure

Customer table: entity id, customer, email, metadata, order
Product table: entity id, product, category, metadata, inventory

▪ Schemas have column names and types, in contrast to the untyped bytes stored in HBase
▪ Group column families are structured, while Map column families are flexible
▪ Accessible as collections from Kiji Express
▪ Scala code focuses on model and business logic
▪ Scalding underneath takes care of generating MapReduce jobs
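
To make the group versus map distinction concrete, here is a conceptual Python model. It does not use the Kiji API and omits cell timestamps and versions; the `GroupFamily` and `MapFamily` classes, the family names, and the sample values are all hypothetical. It assumes a group family declares its columns and their types up front, while a map family accepts any qualifier as long as the values share one type.

```python
from dataclasses import dataclass, field


@dataclass
class GroupFamily:
    """Structured family: a fixed set of named, typed columns."""
    columns: dict  # column name -> expected Python type
    cells: dict = field(default_factory=dict)

    def put(self, column, value):
        expected = self.columns.get(column)
        if expected is None:
            raise KeyError(f"undeclared column: {column}")
        if not isinstance(value, expected):
            raise TypeError(f"{column} expects {expected.__name__}")
        self.cells[column] = value


@dataclass
class MapFamily:
    """Flexible family: arbitrary qualifiers, one shared value type."""
    value_type: type
    cells: dict = field(default_factory=dict)

    def put(self, qualifier, value):
        if not isinstance(value, self.value_type):
            raise TypeError(f"values must be {self.value_type.__name__}")
        self.cells[qualifier] = value


# Hypothetical customer row keyed by entity id.
customer_row = {
    "entity_id": "customer-42",
    "info": GroupFamily(columns={"email": str, "segment": str}),
    "order": MapFamily(value_type=float),
}
customer_row["info"].put("email", "shopper@example.com")
customer_row["order"].put("order-1001", 59.99)   # qualifiers are not declared
customer_row["order"].put("order-1002", 120.00)  # ahead of time
```

Writing to an undeclared group column fails fast, which is the kind of schema checking that raw byte storage does not give you, while the map family grows with each new order.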

Model Build and Deployment

[Figure: model build and deployment diagram. Labels: Offline (several model building boxes; Kiji Modeling; R, SAS, Mahout, …); Deployment (Kiji Express, Kiji Scoring, Kiji PMML, Kiji MR); Kiji Schema; HBase; Hadoop; Real time (data update, scoring, decisions).]
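
The split between offline model building and real-time scoring can be sketched in Python as below. This is an illustration under stated assumptions, not the Kiji deployment mechanism: JSON stands in for an exchange format such as PMML, the logistic model and its weights are made up, and `build_model_offline` and `Scorer` are hypothetical names.

```python
import json
import math
import threading


def build_model_offline(version, intercept):
    """Stand-in for an offline fit in R, SAS, or Mahout; weights are made up."""
    return json.dumps({
        "version": version,
        "intercept": intercept,
        "weights": {"items_in_bag": 0.5, "is_returning": 1.0},
    })


class Scorer:
    """Real-time side: holds the current model and scores incoming requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self._model = None

    def deploy(self, serialized):
        """Parse a serialized model and swap it in without stopping scoring."""
        model = json.loads(serialized)
        with self._lock:
            self._model = model

    def score(self, features):
        """Return (model version, purchase probability) for one request."""
        with self._lock:
            model = self._model
        if model is None:
            raise RuntimeError("no model deployed")
        z = model["intercept"] + sum(
            weight * features.get(name, 0.0)
            for name, weight in model["weights"].items()
        )
        return model["version"], 1.0 / (1.0 + math.exp(-z))


scorer = Scorer()
scorer.deploy(build_model_offline(version=1, intercept=-2.0))
first = scorer.score({"items_in_bag": 2, "is_returning": 1})

# A frequent model update: a new build replaces the old one in place.
scorer.deploy(build_model_offline(version=2, intercept=-1.0))
second = scorer.score({"items_in_bag": 2, "is_returning": 1})
```

Returning the model version with each score makes it possible to attribute a decision to the build that produced it, which helps when models are updated often.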

Key benefits of partnership with WibiData

▪ Open source Kiji suite, abstracted with a focus on modeling
• Kiji Schema, KijiMR, Kiji Model, Kiji Scoring, Kiji Express, Kiji REST
• Allows a quick development cycle

▪ Packages popular open source projects
• Hadoop, HBase, Avro, Cascading, Scalding, Scala

▪ Better organization
• Create tables, query by field name, flexibility, …, more DB-like than HBase

▪ WibiData professional services team helps develop, integrate, maintain, train the in-house team, consult, …
• Competence, knowledge
• Support infrastructure, so that we can focus on the science

▪ Real-time, scalable model deployment environment
• Interactive
• In milliseconds

Acknowledgement

▪ Macy’s teams
• Analytics team: Kerem Tomak, Albert Zhai
• Infrastructure team: Winslow Holmes, Rakesh Sharma, Cherry Peng

▪ WibiData team
• Professional Services team: Adam, Christophe, Renuka, Lynn

