Developing Real Time Analytics
Applications Using HBase in the Cloud

                        May 22, 2012
                         Rick Tucker
                      tech@sproxil.com




   tech@sproxil.com        May 22,2012   © 2012 Sproxil, Inc.
About Sproxil
• Brand protection, specializing in anti-counterfeiting solutions

• Solution requires a scalable and high-throughput text message processing engine

• Supports a real-time analytics web interface

[Figure: three steps: 1 SCRATCH, 2 TEXT, 3 VERIFY]



Why HBase?

[Diagram: USER SENDS TEXT MESSAGE → TEXT MESSAGE IS PROCESSED → CALCULATE ANALYTICS; USER RECEIVES REPLY. Label: Amazon EC2 Cloud]




Real-Time Analytics Engine
 • MapReduce too slow to maintain data in true real time

 • As data arrives, analytical data is updated through
   counters

Text Message Arrives → Message Analyzed → Increment Counters

  Genuine Product Authentication → +1 Increment Counter for Genuine Authentications
  Repeat Customer → +1 Increment Counter for Repeat Customers
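
The counter-based update described on this slide can be sketched roughly as follows. This is a minimal in-memory illustration, not Sproxil's implementation: the `counters` object stands in for HBase counter cells, and the message fields `genuine` and `repeat_customer` and the function `process_message` are hypothetical names chosen for the example.

```python
from collections import Counter

# In-memory stand-in for a table of counters. In HBase each entry would be
# a counter cell updated by an atomic increment, not a Python dict entry.
counters = Counter()

def process_message(message):
    """Analyze one incoming text message and bump the matching counters."""
    if message["genuine"]:
        counters["genuine_authentications"] += 1
    if message["repeat_customer"]:
        counters["repeat_customers"] += 1

# Hypothetical, already-analyzed messages arriving one at a time.
messages = [
    {"genuine": True, "repeat_customer": False},
    {"genuine": True, "repeat_customer": True},
    {"genuine": False, "repeat_customer": True},
]
for m in messages:
    process_message(m)
```

Because each message only touches the counters it affects, the analytics are current as soon as the message is processed, with no batch job needed to recompute totals.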


Schema Design: Example 1

• Example: View log of text messages in
  chronological order
        • Rowkey: row prefix + timestamp

      Row
      transaction 2012-05-22 12:00:00
      transaction 2012-05-22 12:01:14
      transaction 2012-05-22 12:02:03

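Note: HBase sorts rowkeys lexicographically, so a scan over keys like these returns rows in ascending key order, i.e. chronological order (oldest first). Returning rows in reverse chronological order requires a key that sorts the other way, such as a reversed timestamp.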
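
As a rough sketch of how lexicographic ordering interacts with a "row prefix + timestamp" key, the example below sorts keys as plain strings, which for ASCII keys matches HBase's byte-wise ordering. The helper names `rowkey` and `reversed_rowkey`, and the `max_epoch` constant, are assumptions made for illustration. The reversed variant is a common technique for newest-first scans, not something the slides confirm Sproxil used.

```python
from datetime import datetime, timezone

PREFIX = "transaction"

def rowkey(ts):
    """Row prefix + timestamp, formatted so string order equals time order."""
    return f"{PREFIX} {ts:%Y-%m-%d %H:%M:%S}"

def reversed_rowkey(ts, max_epoch=9_999_999_999):
    """Hypothetical variant: subtract epoch seconds from a fixed maximum
    so that newer rows get smaller keys and therefore sort first."""
    epoch = int(ts.replace(tzinfo=timezone.utc).timestamp())
    return f"{PREFIX} {max_epoch - epoch:010d}"

times = [
    datetime(2012, 5, 22, 12, 1, 14),
    datetime(2012, 5, 22, 12, 0, 0),
    datetime(2012, 5, 22, 12, 2, 3),
]

# Sorting the plain keys yields oldest-first order.
ascending = sorted(rowkey(t) for t in times)

# Sorting by the reversed keys yields newest-first order.
newest_first = [t for _, t in sorted((reversed_rowkey(t), t) for t in times)]
```

The fixed-width, zero-padded formats matter: they are what make string comparison agree with numeric and chronological comparison.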
• Rowkey: row prefix + userID + timestamp


    Row
    transaction userID 1 2012-05-22 12:00:00
    transaction userID 1 2012-05-22 12:01:14
    transaction userID 2 2012-05-22 12:00:54
    transaction userID 2 2012-05-22 12:01:22
    transaction userID 2 2012-05-22 12:02:01
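Note: HBase sorts rows lexicographically, so a scan returns each user's rows together, in chronological order (oldest first) within that user. Returning them in reverse chronological order would require a key component that sorts the other way, such as a reversed timestamp.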
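
One way to picture the "row prefix + userID + timestamp" layout is the sketch below, which keeps the keys in a sorted Python list as a stand-in for an HBase table. The zero-padded userID width and the helpers `rowkey` and `scan_prefix` are assumptions for illustration. Padding is used because, under lexicographic ordering, an unpadded "10" would sort before "2".

```python
from bisect import bisect_left

def rowkey(user_id, timestamp, width=6):
    """Row prefix + zero-padded userID + timestamp."""
    return f"transaction {user_id:0{width}d} {timestamp}"

def scan_prefix(sorted_keys, prefix):
    """Return all keys that start with prefix, using binary search on the
    sorted list, similar in spirit to a scan bounded by start and stop rows."""
    start = bisect_left(sorted_keys, prefix)
    stop = bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[start:stop]

table = sorted([
    rowkey(2, "2012-05-22 12:01:22"),
    rowkey(1, "2012-05-22 12:01:14"),
    rowkey(2, "2012-05-22 12:00:54"),
    rowkey(1, "2012-05-22 12:00:00"),
    rowkey(2, "2012-05-22 12:02:01"),
    rowkey(10, "2012-05-22 12:00:30"),
])

# All rows for user 2, already in chronological order.
user2 = scan_prefix(table, "transaction 000002 ")
```

Putting the userID ahead of the timestamp is what keeps each user's messages contiguous, so a per-user log becomes a single bounded scan.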

Critical Findings
• Schema design is crucial to a successful HBase implementation
  – Pack as much information as possible into row keys


• Use caution with Filters
  – E.g. Regex filters can be costly
  – Alternatives:
     • Directly query for data you need
     • Use efficient filters when filtering large data sets
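
The cost difference between filtering and querying directly can be illustrated with a small simulation. This is a sketch over an in-memory sorted list, not HBase code: `regex_scan` and `range_scan` are hypothetical helpers, and the key layout (prefix + userID + sequence number) is an assumption. The point it models is that a regex filter has to be evaluated against every row the scan visits, whereas a scan bounded by start and stop rows only reads the rows in that range.

```python
import re
from bisect import bisect_left

def regex_scan(sorted_keys, pattern):
    """Filter-style scan: the regex is tested against every key in the table."""
    regex = re.compile(pattern)
    examined = 0
    matches = []
    for key in sorted_keys:
        examined += 1
        if regex.search(key):
            matches.append(key)
    return matches, examined

def range_scan(sorted_keys, start_row, stop_row):
    """Range scan: jump to start_row and read only up to stop_row (exclusive)."""
    start = bisect_left(sorted_keys, start_row)
    stop = bisect_left(sorted_keys, stop_row)
    matches = sorted_keys[start:stop]
    return matches, len(matches)

# 1,000 users with 10 messages each.
table = sorted(
    f"transaction {u:04d} {m:02d}" for u in range(1000) for m in range(10)
)

by_regex, regex_examined = regex_scan(table, r"^transaction 0042 ")
by_range, range_examined = range_scan(
    table, "transaction 0042 ", "transaction 0043 "
)
```

Both approaches return the same ten rows, but in this simulation the regex version examines all 10,000 keys while the range version reads only the ten it returns. This is why a well-designed row key often removes the need for a filter altogether.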




Thank You!

Making Counterfeiting Unprofitable™

Your global brand protection specialists, spanning 3 continents and speaking 9 languages

tech@sproxil.com
+1 617 682 9577
Sproxil.com
America | Asia | Africa




HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in the Cloud - Rick Tucker, Sproxil


Editor's Notes

  • #3: Processed a large volume of text messages; has even led to the arrest of counterfeiters
  • #4: High-speed transactional operations critical; handle large volumes of text messages quickly; large volume of data; millions of records; schema supports sparse data
  • #8: Explain why regex is costly