HBase to Save the Planet

            Alex Newman
         posix4e@apache.org
      Architect, Drawn to Scale
      Strategic Advisor, Opower
My life with HBase


Factset    Cloudera    Drawn to Scale    Opower
About Opower

Opower is a customer engagement
 platform for the utility industry
About Opower

        Home energy reports
       Customized utility bills
Energy efficiency programs for utilities
About Opower

   Opower runs on analytics
Analytics run on Hadoop + HBase
Opower analysis relies on data
from a variety of sources

   » Electric utility usage data   » Thermostat data   » Weather data   » Gas utility usage data

[Diagram, with steps numbered 1 to 4: Data Storage & Processing, Shared Energy Signature Repository, Disaggregation Algorithms, OPOWER Platform]
Opower’s first architecture could
not support their analytic vision
                MySQL
             Scalability?
            Performance?
           Data integration?
Opower’s first architecture could
not support their analytic vision
       Analytic workflow instead of
               analytic apps:
    SQL -> CSV -> R -> too little, too slow
Problem #1
                 Data Lake Cost
[Chart; axis labels: Usage, AMI, Regional AMI, Sensor Data, Data Lake]
Problem #2
     Slower and slower queries
                Smart-grid-scale data
Lots of supporting data: weather, demographics, etc.
Problem #3
It was taking lots of “magic”
        Intense analytics
        Strange schemas
       Segmented queries
Hadoop + HBase at Opower

Opower determined that they needed
  an entirely new data architecture
NexGen Architecture @ Opower
Hadoop + HBase at Opower
      Early success:
       HBase AMI
What rocked

Endless, cheap scalability
What rocked

The analytics team loved it!
What sucked

Hard on the ops team – still trying to
              grok it
What sucked
  NoSchema p1.
    Creating Schema
  Managing MetaData
Schema <=> Performance
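
One way to picture the "Schema <=> Performance" point: in a store that keeps rows sorted by key, as HBase does, the row key layout decides which reads are cheap. The following is a minimal sketch, not Opower's actual schema. The function names, the 8-byte field widths, and the reversed-timestamp trick are all assumptions chosen for illustration, and the code only builds keys; it does not talk to HBase.

```python
import struct
from typing import Tuple

MAX_TS = 2**63 - 1  # assumed upper bound, used to reverse timestamps


def make_row_key(meter_id: int, epoch_seconds: int) -> bytes:
    """Build a fixed-width key: 8-byte meter id, then 8-byte reversed timestamp.

    Big-endian packing keeps byte order equal to numeric order, so a store
    that sorts keys lexicographically keeps each meter's reads contiguous,
    newest first.
    """
    return struct.pack(">QQ", meter_id, MAX_TS - epoch_seconds)


def parse_row_key(key: bytes) -> Tuple[int, int]:
    """Recover (meter_id, epoch_seconds) from a key built by make_row_key."""
    meter_id, reversed_ts = struct.unpack(">QQ", key)
    return meter_id, MAX_TS - reversed_ts
```

With this layout, "latest reads for one meter" is a short scan over adjacent keys, while "all meters at one point in time" touches every meter's key range. A different key order would reverse that trade-off, which is why the schema has to be designed around the queries.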
What sucked

     HA
   Failover
  Snapshots
What sucked
      No secondary index
Aggregation is slow (Rollup/OLAP)
    Poor Client Performance
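
A common workaround when aggregation over raw rows is slow is to maintain rollups at write time, so a daily total is a single lookup rather than a scan. The sketch below shows the idea with in-memory dictionaries standing in for tables. It is an illustration under assumptions, not what Opower built: `RollupStore`, its method names, and the UTC day bucketing are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


class RollupStore:
    """Toy stand-in for a raw table plus a daily rollup table."""

    def __init__(self):
        self.raw = {}                    # (meter_id, epoch_seconds) -> kWh
        self.daily = defaultdict(float)  # (meter_id, "YYYY-MM-DD") -> kWh

    def put(self, meter_id, epoch_seconds, kwh):
        key = (meter_id, epoch_seconds)
        # Subtract any earlier value so a re-sent read is not double counted.
        previous = self.raw.get(key, 0.0)
        self.raw[key] = kwh
        day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d")
        self.daily[(meter_id, day)] += kwh - previous

    def daily_total(self, meter_id, day):
        # One lookup instead of a scan over every interval read for the day.
        return self.daily.get((meter_id, day), 0.0)
```

The cost moves to the write path: every put updates two places, and in a real distributed store keeping them consistent is the hard part.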
It would be better if only …

Developers were not forced to know
how the data is stored, indexed, etc.
It would be better if only …

 There were nicer APIs and better
     query languages (SQL?)
It would be better if only …

  Version migrations were easy
       Hierarchical Tables
It would be better if only …

       Real-time tuning
It would be better if only …

       Did I mention HA?
In summary

HBase has helped Opower achieve their analytic
                      vision
     But they’ve still got a long way to go
       HBase still has a long way to go
Questions?

      Alex Newman
   posix4e@apache.org
Architect, Drawn to Scale
Strategic Advisor, Opower

HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environment - OPower


Editor's Notes

  • #2: Name Email Address Title
  • #3: WARNING: THESE ARE MY WORDS, not those of FDS, Cloudera or Opower.
    Factset, 2005:
    - Maybe I was crazy to use it.
    - Tens of databases, tens of query languages, VMS moving towards commodity servers. Running into issues with scaling on environments like MySQL.
    - They were used to code that crashed. In fact, I would say that while I was on call, a service from one of the sites was down at least once a week. Luckily they had redundancy across multiple sites, and multiple servers within those sites. The redundancy was added at a higher level, so generally, at least all of the times I remember, it was able to increase availability, and downtime wasn't actually an issue.
    - What was an issue was scale.
    - Interestingly enough, HBase, even at that time, was a pretty highly available database. So what did they use it for?
    - Time and Sales. This is the collection of all of the quotes and trades for different securities. To translate: you put out quotes to buy or sell stocks at a certain price. If they overlap, the exchange registers a trade, and you just bought or sold a security. Not just stocks, but options and extremely high-frequency data.
    - There was some value-add on top of that, for calculating more complicated statistics on the fly through a home-grown Web SASS thing.
    Cloudera:
    - Started off in kitchen, focusing on building the packages that y’all know and love. When I entered it was all manual; when I left it was all automated. One could think of this as sort of like dev-opsy, meets QA, meets release engineering, meets generic development.
    - Moved into our first management tools team as a developer, where we developed the Cloudera Manager. It was originally part of HUE and it became more springy.
    - Then I left Cloudera to be a founder in Drawn to Scale. We built a prototype and started pitching it for about 6 to 7 months.
    - While that was going on, I became the Lead Data Architect at Opower. And then more recently, after funding, I have returned to Drawn to Scale as a coder in the trenches, and have become an advisor to Opower. The reason I bring this up is that I have been working with HBase in production for about 5 years.
  • #4: Opower helps people use energy more efficiently and ultimately save money on their energy bills. It vastly improves the overall customer experience by making energy use personally relevant. - Behavioral Science (great marketing, understanding people, great HCI) - Data Science (Analytics, Data Infrastructure teams) - Lobbying (yep, we do lobbying)
  • #5: - How many of you get a bill? - Opower white-labeled websites. So this is the interface you probably use through your energy website to view how much power you use. Bill forecasting, etc. - Smart thermostats - Gas and electric - Social
  • #6: - Analytics is used to understand who we should be targeting - Answering questions that our customers want answered. We can help them improve customer service, improve their marketing, etc. - Justifying our own existence (compliance).
  • #7: - This is an old slide which doesn’t really include all the places we get data - Story about detecting broken thermostats
  • #8: - But it had its ups - Spring and MVC provided a very clear and systematic way for developers to develop systems. - It was very easy to manage from an operations perspective.
  • #9: - We did this at FDS as well. Of course not with R, but with specialized languages. - In fact our customers did as well, and they had a whole team of people to help customers do it.
  • #10: So here are the data sizes we have, along with the costs with traditional Hadoop systems. - We were a Cisco shop but we ended up going with Dell, mostly because of the 3.5 inch disks. It looks like Cisco is wising up to this whole Hadoop thing. - These numbers are for Dell. So I think this is priced out assuming a 710, then an 810 and then a 910 for the RDBMS, and 510's for Hadoop.
  • #11: - A lot of this data just doesn’t work well with traditional databases. - An unnamed utility takes 3 days to mysqldump the AMI data out. Subsampling, interpolation.
  • #14: - I should warn you, I drew almost all of my drawings in xfig, so if this isn’t clear I’m sorry. - Basically the utility data has to come in over a variety of different protocols, as we integrate into the utility pipeline. It then flows into HBase, it’s validated from HBase, and then imported into our existing workflow. - Some of that data, for instance information about users, is still stored in MySQL. - All of the data is in a Hive data lake
  • #15: All of our high-frequency time series data is being ported to HBase. Soon things like bill forecasting, and a bunch of other cool stuff I probably should mention, will be moved here as well. This includes data from the utilities, and data that users are entering themselves. In addition, thermostat data is moving here.
  • #16: - We still need to improve efficiency - We are doubling the size of the cluster this year - We have a ton of room to grow.
  • #17: - Having all of your data is a huge thing - Having a place to do m/r based R is great - No more running out of memory or being bounded to a single machine - Having a cheap scratch space
  • #18: At Cloudera I thought all we needed was cfengine, SNMP and syslog. Frankly that would have made ops happy. But more and more I think we made the right decision and that these tools really aren’t the right answer. JuJu looks interesting. - Cloudera of course built their own tool. - Access and auth