Hypercubes in HBase
 Fredrik Möllerstrand <fredrik@last.fm>
 Hadoop User Group UK, April 14 2009
Hello.

• Per Andersson, Fredrik Möllerstrand
• Chalmers University of Technology, Sweden
• Master thesis at last.fm
stats.last.fm
• Web statistics for in-house use.
• Served out of mysql.
stats.last.fm

• y-axis: pageviews.
• x-axis: time.
• also: countries.
stats.last.fm

SELECT country, SUM(pageviews)
FROM webstatistics
GROUP BY country;
SQL: Star schema
• Fact table & dimension tables.
• Joins!
Data Cube Foundations

• n-dimensional cube
• attribute => dimension
• attribute value => measurement
Data Cube Foundations


• dimensionality reductions
• projections
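A projection can be read as "keep some dimensions, sum over the rest". The sketch below illustrates that idea on a tiny in-memory cube; the `project` function and the sample facts are made up for illustration and are not taken from the talk.

```python
from collections import Counter

def project(facts, dims):
    """Reduce dimensionality: keep only `dims`, summing the measure
    over every dimension that is left out."""
    out = Counter()
    for point, count in facts:
        key = tuple(point[d] for d in dims)
        out[key] += count
    return out

# Hypothetical facts: (point in n-space, pageviews).
facts = [
    ({"country": "US", "useragent": "safari"}, 3),
    ({"country": "US", "useragent": "opera"}, 2),
    ({"country": "NO", "useragent": "opera"}, 5),
]

# Project the 2-dimensional cube onto the single dimension 'country'.
by_country = project(facts, ["country"])
```

Projecting onto all dimensions returns the original cube; projecting onto none collapses it to a grand total.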
Data Cube Foundations

• Aggregation: sum, count, average, &c.
• Data cubes are modeled:
    in RDBMSs, as star schemas.
    in HBase, with column-families.
Data Cubes in HBase

• Store projections,
  not distinct dimensions.
• Pre-compute *everything*.
Data Cubes in HBase

• Rowkey: unit + time
  e.g. ‘pageviews-20090414’
• One column-family for every projection.
  e.g. ‘country-useragent’
• One qualifier per point in n-space.
  e.g. ‘US-safari’, ‘NO-opera’, &c.
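The layout above can be sketched as a function that turns one data point into an HBase cell address. This is a minimal illustration of the naming scheme on the slide, not Zohmg's actual code; `cell_address` and its parameters are hypothetical names.

```python
def cell_address(unit, day, projection, point):
    """Build (rowkey, column-family, qualifier) for one data point.

    unit       -- what is measured, e.g. 'pageviews'
    day        -- time part of the rowkey, e.g. '20090414'
    projection -- ordered list of dimension names
    point      -- dict mapping dimension name to attribute value
    """
    rowkey = "%s-%s" % (unit, day)
    family = "-".join(projection)
    qualifier = "-".join(point[d] for d in projection)
    return rowkey, family, qualifier

addr = cell_address(
    "pageviews",
    "20090414",
    ["country", "useragent"],
    {"country": "US", "useragent": "safari", "domain": "last.fm"},
)
# addr == ('pageviews-20090414', 'country-useragent', 'US-safari')
```

Dimensions that are not part of the projection (here `domain`) simply do not appear in the qualifier. Note that a plain hyphen separator becomes ambiguous if an attribute value itself contains a hyphen.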
The SQL-DB Problem

• Too much data to keep in memory.
• Plenty of joins make queries complex.
• Can’t serve at mouse click rate.
The Solution;
  A data store that is:

• Distributed
• Multi-dimensional
• Magnetic(!)
• Just general enough
Enter: Zohmg.
Zohmg;
  A data store that is:
• Distributed
• Multi-dimensional
• Time-series-based
• Magnetic(!)
• Just general enough
Tech

• Rides on the back of Dumbo.
• Stores aggregates in HBase.
• Serves JSON.
Zohmg

• $> setup.py
  # create hbase database.
• $> import.py --mapper weblogs.py
  # run dumbo job.
• $> serve.py
  # start web server.
Developers, developers.

• Configuration - yaml.
• Mapper - python.
User’s configuration.
  project_name: webmetrics

  dimensions:
    - country
    - domain
    - useragent
    - usertype

  units:
    - pageviews

  projections:
      country:
        - country
      domain-usertype:
        - domain
        - usertype
      country-domain-useragent-usertype:
        - country
        - domain
        - useragent
        - usertype
User’s mapper.
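def map(key, value):
    # lfm.data.parse is last.fm's own log parsing module.
    from lfm.data.parse import web

    log = web.parse(value)

    # One value per dimension declared in the configuration.
    # geoip() and classify() are helpers not shown on the slide.
    dimensions = {'country'   : geoip(log.host),
                  'domain'    : log.domain,
                  'useragent' : classify(log.useragent),
                  'usertype'  : "anon" if log.userid is None else "user"
                 }
    # Each log line counts as one pageview.
    values = {'pageviews' : 1}

    yield log.timestamp, dimensions, values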
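The slides do not show what happens to the mapper's output. One plausible reading, sketched here under that assumption, is that each yielded record is fanned out to every configured projection and summed. The `aggregate` function, the nested dict standing in for an HBase table, and the day-string timestamps are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical subset of the projections from the configuration.
PROJECTIONS = {
    "country": ["country"],
    "domain-usertype": ["domain", "usertype"],
}

def aggregate(records, projections):
    """Sum every unit into one cell per (row, projection, point).

    The result maps rowkey -> 'family:qualifier' -> total, a plain
    dict stand-in for an HBase table.
    """
    table = defaultdict(lambda: defaultdict(int))
    for day, dimensions, values in records:
        for unit, amount in values.items():
            rowkey = "%s-%s" % (unit, day)
            for family, dims in projections.items():
                qualifier = "-".join(dimensions[d] for d in dims)
                table[rowkey]["%s:%s" % (family, qualifier)] += amount
    return table

# Records shaped like the mapper's output: (time, dimensions, values).
records = [
    ("20090414", {"country": "US", "domain": "last.fm", "usertype": "user"}, {"pageviews": 1}),
    ("20090414", {"country": "US", "domain": "last.fm", "usertype": "anon"}, {"pageviews": 1}),
    ("20090414", {"country": "NO", "domain": "last.fm", "usertype": "anon"}, {"pageviews": 1}),
]
table = aggregate(records, PROJECTIONS)
```

Because every projection is pre-computed at import time, answering "pageviews per country on a given day" becomes a single row read rather than a join.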
Example.
Dimensions in HBase
• Column-family:
  country-useragent-domain
• Qualifier:
  US-firefox-last.fm
Questions?
