Making Sense of Data

Lily goes shopping – real-time recommendations with HBase
HBaseCon, May 2012

Steven Noels – VP Product – @stevenn

WWW.NGDATA.COM

Lily Core: 2-minute recap
•  HBase-backed data repository, with batteries included
•  Data model:
    •  high-level data model on top of HBase’s byte[]’s
    •  schema
    •  versioning (schema and data)
    •  links, variants
•  Java & REST APIs
•  Indexing:
    •  through configuration, not implementation
    •  incremental and batch index maintenance
•  RowLog: distributed, durable queue for secondary actions
•  Open Source: www.lilyproject.org (Apache License)

[Diagram labels: client app; Lily; RowLog; HBase; Solr et al.]


Why HBase?
•  BigTable model
•  sparseness
•  atomic row updates aka consistency
•  auto-partitioning
•  Apache license
•  A great community led by a Saint :-)




Portfolio Overview

•  Real-time AI
•  Recommendations
•  Industry algorithms and rules

•  Trend Analytics
•  Pattern Detection
•  Profile Development
•  Context and Activity Tracking
•  Social Stream Ingestion

•  Schema and Data Management
•  Total Data Aggregation
•  Real-time Index and Retrieval
•  Security and Enterprise Connectors

[Diagram labels: commercial availability; open source]




Lily (=HBase) In Use
Some of the larger Lily deployments

•  media
    •  aggregation, database publishing and online archives
•  finance
    •  real-time identity fraud detection
•  retail banking
    •  contextualized (time+loc+person) mobile coupons
•  retail
    •  e-commerce platform: product catalog, consumer data store, real-time indexing




Collaborative Filtering?

  Recommend items similar to a user’s highly-preferred items




Collaborative Filtering is … Matrixes


   Sean likes “Scarface” a lot             (123,654,5.0)
   Robin likes “Scarface” somewhat         (789,654,3.0)
   Grant likes “The Notebook” not at all   (345,876,1.0)
   …                                       …

                                              (Magic)

   Grant may like “Scarface” quite a bit   (345,654,4.5)
   …                                       …
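
The "(Magic)" step can be illustrated with a minimal sketch, assuming the triples on the slide read as (user ID, item ID, preference). The deck does not say which algorithm produces the 4.5 estimate, so the item-based cosine approach below is a swapped-in textbook technique, not Lily's implementation. The function name `predict`, the item ID 111 and the two extra triples are hypothetical; they are needed because the three slide triples alone share no overlapping users.

```python
from collections import defaultdict
from math import sqrt


def predict(triples, user, item):
    """Estimate `user`'s preference for `item` with item-based collaborative filtering.

    Returns None when nothing the user has rated overlaps with `item`.
    """
    by_item = defaultdict(dict)  # item -> {user: preference}
    for u, i, pref in triples:
        by_item[i][u] = pref

    def cosine(a, b):
        # Cosine similarity between two item columns of the user-item matrix.
        common = by_item[a].keys() & by_item[b].keys()
        norm_a = sqrt(sum(p * p for p in by_item[a].values()))
        norm_b = sqrt(sum(p * p for p in by_item[b].values()))
        if not common or norm_a == 0 or norm_b == 0:
            return 0.0
        dot = sum(by_item[a][u] * by_item[b][u] for u in common)
        return dot / (norm_a * norm_b)

    weighted_sum = total_similarity = 0.0
    # list(...) because cosine() may add an empty column for an unseen item.
    for other, prefs in list(by_item.items()):
        if other == item or user not in prefs:
            continue
        similarity = cosine(item, other)
        weighted_sum += similarity * prefs[user]
        total_similarity += similarity
    return weighted_sum / total_similarity if total_similarity else None


# Triples taken from the slide.
slide_triples = [(123, 654, 5.0), (789, 654, 3.0), (345, 876, 1.0)]
# Hypothetical extra ratings: Sean and Grant both like a third item (111).
hypothetical_triples = [(123, 111, 5.0), (345, 111, 5.0)]

no_overlap = predict(slide_triples, 345, 654)
estimate = predict(slide_triples + hypothetical_triples, 345, 654)
```

With the slide data alone the sketch has nothing to go on for Grant and returns None; the estimate only appears once some other item links Grant to users who rated “Scarface”. This is why the matrix has to be fed continuously.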



Contextualized recommendations

[Diagram labels: Personalized offers; Profile; Activity; Item (shops & merchants, product families, offers/coupons); credit card statements]


Fitting Recommendations into the Lily Architecture

[Architecture diagram; recoverable labels:]
•  LILY CRUD API
•  read/write demultiplexer
•  Lily Core Repository: rowlog, indexes, data store, profile store, activity store
•  Lily/HBase Secondary Indexes
•  LILY recommender engine
    •  co-occurrence lookup matrix
    •  data, activity, profile scoring
    •  algorithm support: ALS, k-means, propensity, custom ...

Contact: Steven Noels, stevenn@ngdata.com, www.ngdata.com, telephone: +32 9 33 88 220, Gent (Belgium)



Preferencing aka Feeding the Matrix
•  Transaction-based preferencing
    •  Pluggable preference strategies, using Lily-based data (HBase & Solr) for decision making
        •  e.g. credit card statement = transactions between users and product families
    •  Preference weighting
    •  Ingest: REST API, bulk support
    •  Real-time updating of the recommendation model
•  Profile Store
    •  Profile activities can be preferenced
    •  Support for Profile behavior analysis


Making recommendations
•  Recommender
    •  Pluggable recommender strategies, using Lily-based data (HBase & Solr) for decision making
    •  Multi-model support: user-item & item-user recommendations
    •  Estimation of both preferenced and non-preferenced items
    •  Geolocation-based recommendations
    •  Re-scoring
    •  REST API
•  (Planned)
    •  Support for Classifications
       (scenario - Recommend me all (possible) coffee drinkers)
    •  Matrix / recommendation indexing
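
Re-scoring combined with geolocation can be pictured as a post-processing step over the recommender's raw scores. The sketch below is an assumption-laden illustration: the function names, the coupon IDs, the coordinates and the decay formula (the score halves at `half_distance_km`) are hypothetical and not taken from Lily.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def rescore_by_distance(candidates, user_location, half_distance_km=5.0):
    """Discount each (item, score, location) candidate by its distance from the user.

    Returns (item, rescored) pairs, best first.
    """
    rescored = [
        (item, score / (1.0 + haversine_km(user_location, location) / half_distance_km))
        for item, score, location in candidates
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical user position and coupon locations.
user_location = (51.05, 3.72)
candidates = [
    ("coupon-far", 4.5, (50.85, 4.35)),
    ("coupon-near", 3.0, (51.05, 3.72)),
]
ranked = rescore_by_distance(candidates, user_location)
```

Here the nearby coupon overtakes the one with the higher raw score, which is the kind of contextual adjustment (time, location, person) the mobile coupon use case calls for.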


Other upcoming Lily Features
•  Secondary indexes (= Lily Core!)
    •  indexes are defined through configuration
    •  single or multi-field indexes
    •  range queries and prefix queries
    •  asc or desc sorted results
    •  can read huge, sorted lists
    •  synchronously updated: index updates are applied by rowlog secondary actions
    •  online building of new indexes (no table locks)
    •  MapReduce integration
•  SolrCloud integration
    •  Index shards and configuration managed through ZooKeeper
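
Range queries, prefix queries and asc/desc results all follow from keeping index entries sorted by key, which is what an HBase table does with its row keys. The sketch below is an in-memory stand-in for that idea using Python's `bisect` module; the class `SortedIndex` and its methods are hypothetical and do not reflect the Lily secondary index API.

```python
import bisect


class SortedIndex:
    """Single-field index kept as a sorted list of (field_value, record_id) entries."""

    def __init__(self):
        self._entries = []

    def put(self, value, record_id):
        # insort keeps the list ordered, like rows sorted by key in HBase.
        bisect.insort(self._entries, (value, record_id))

    def range(self, low, high, descending=False):
        """Record ids whose value satisfies low <= value < high."""
        # A 1-tuple sorts before any 2-tuple with the same first element.
        start = bisect.bisect_left(self._entries, (low,))
        end = bisect.bisect_left(self._entries, (high,))
        ids = [record_id for _value, record_id in self._entries[start:end]]
        return ids[::-1] if descending else ids

    def prefix(self, prefix, descending=False):
        """Record ids whose string value starts with `prefix`.

        Implemented as a range scan up to prefix + the highest code point.
        """
        return self.range(prefix, prefix + chr(0x10FFFF), descending)


# Hypothetical product-name index.
index = SortedIndex()
for name, record_id in [("banana", "r3"), ("apple", "r1"), ("cherry", "r4"), ("apricot", "r2")]:
    index.put(name, record_id)

in_range = index.range("apricot", "cherry")
by_prefix = index.prefix("ap")
by_prefix_desc = index.prefix("ap", descending=True)
```

A prefix query is simply a range scan with a computed upper bound, so one sorted structure serves both query types. A multi-field index would concatenate the fields into a composite key in the same spirit.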



Making Sense of Data




Questions? Thank you!





HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata
