Real-time HBase:
Lessons from the Cloud
Bryan Beaudreault, @HubSpotDev
You’re doing it
WRONG!
Instance types
Network,
Neighbors,
Improve reads, limit impact
Over-provision, fail fast
c1.xlarge: CPU heavy workloads
  Reduce memory footprint
  Add more servers
i2.4xlarge: Excellent, but expensive
  Use data encoding to reduce disk
  Use Java7 and G1 to reduce GCs
m1.xlarge: Memory heavy workloads
  Beware compactions
  More regions
cpu
fsWriteLatency
fsReadLatency
load
callQueueLen
compactionQueueSize
iowait
steal
heap
locality
requests
memstoreSizeMB
blockCacheHitCachingRatio
flushQueueSize
Master HBase with us.
dev.hubspot.com

Editor's Notes

  • #2: Tech Lead on Data Ops at HubSpot Talking about running HBase with real-time APIs Specifically, what we’ve learned from running in EC2 for 3 years Quickly, what is HubSpot?
  • #3: Inbound marketing company. Most marketers cobble together: Google Analytics, MailChimp, WordPress. All-in-one marketing platform. Provide extra value due to the context that integrating tools provides. Use of HBase: sending emails, analytics data, customer’s leads & contacts, internal tools. 5 clusters, 10-30 nodes. 1 shared Hadoop cluster. 9 teams using HBase as their datastore. Each team owns Hadoop jobs, Kafka topics, and APIs hitting HBase. Through 3 years of HBase operations, most days I’m doing things like…
  • #4: Reading logs, looking at data, changing configs, and creating tools. Digging into HBase code. Making sure everything runs smooth and fast for our developers and customers. And, as I’m sure you all can understand … I HATE
  • #5: Being woken up at night. When we first started running HBase in the cloud… We saw this pretty often. Not always the same time, but: ruined dinner plans, all-day fire fighting sessions, Sleep matters to me, and many nights, instead of sleeping, I found myself awake…
  • #6: A lot of inter-dependencies. Contacts is used by everything. If that HBase goes down… There’s nothing fun about sitting in your cave in the dead of night, feverishly scrambling to get your entire product back online. Some weeks, after a few nights, you can feel a bit exasperated…
  • #7: No one to call. Should running HBase in EC2 be this hard? It’s a distributed system with lots of moving parts, running across multiple data centers. We’re trying to mix real-time APIs with constant hadoop jobs. Maybe this is the name of the game.
  • #8: Wrong. Bigger companies are using HBase for bigger applications, and I don’t see them complaining. They aren’t running in the cloud. Cloud isn’t the problem, but it’s obvious running in the cloud adds a whole set of variables. Servers degrade, HBase becomes unresponsive. HBase is not currently equipped to deal with all of these issues. We can’t rely just on what HBase provides for stability. How should it really be?
  • #9: This is really how it should be. Running in the cloud should be just like in any DC. Sit back and watch it run. Long road. We have gotten (mostly) there. No late night wake ups in months. Knock on wood. HBase runs itself. Performance is great: thousands of API r/s sustained, hundreds of thousands of hadoop jobs per month. How did we get there? There were a few challenges …
  • #10: Honestly: depending on your use case, hard to cheaply get multi-9 uptime A lot of respect for Pinterest: all writes to 2 clusters. Fail over as necessary. But we couldn’t follow their model. What can we do? Have to be proactive — augment HBase with your own automation. Limit issues, respond ASAP. Dedicate at least one person to this until stable; give him/her support. …
  • #11: A single EC2 availability zone is multiple data centers. Network is good, but can fluctuate. These fluctuations can cause a big problem, for reads and writes. Writes go to the memstore, and written locally when flushed. Though data is written locally, regions move all the time. When this happens.. Network is working against you, rather than for you. Network graph starts to become..
  • #12: Impossible to follow. Let’s say one node disappears. What happens? 1+ network hops per read. Scan crosses multiple HFiles? Even more network hops. Not great, but doable when you’re in a couple local racks. What happens when you’re across multiple data centers? Bottom line: your 99th percentile degrades, and the impact of 1 loss can be huge. …
  • #13: So… Region moves as a result of: RS dies; region splits; periodic balancer runs. Each region move is more entropy, slower requests, slower recovery. With this in mind, what can we do?
  • #14: Maintain 100% locality always. That is, make sure all region data is always written locally. When you lose locality (RS dies), heal ASAP. Always compact regions after moving them. Maintaining locality will…
  • #15: With short circuit reads that means straight to disk or memory. Loss of a region server will still require failover, but that RS no longer hosts data for other servers. So client requests to other RS will be mostly unaffected. Overall you’re in a much better place. How can we achieve this?
  • #16: Default balancer works to keep region server load even. Doesn’t compact regions post-move. Disable HBase balancer, and write your own. Use HBaseAdmin API to move and compact. Using cost functions, prioritize locality. Compact on move. Rate limit. Open-sourced. Graceful shutdown: hook into balancer. Compact on move. Disable splits. Track region moves and locality. Mention: 0.96.x: Stochastic load balancer. …
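The locality-first balancing idea in note #16 can be sketched roughly as follows. This is a minimal illustration, not the open-sourced HubSpot balancer: the `Region` class, `plan_moves`, and the `min_gain` and `max_moves` parameters are hypothetical names, and the per-server locality fractions are assumed to have been collected elsewhere. The planner only decides which regions to move; issuing the move and the follow-up compaction through the HBase admin API is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical view of one region for planning purposes."""
    name: str
    server: str  # region server currently hosting the region
    # server name -> fraction (0.0-1.0) of the region's data stored on that server
    locality: dict = field(default_factory=dict)


def plan_moves(regions, max_moves=2, min_gain=0.1):
    """Pick up to max_moves region moves, largest locality gain first.

    max_moves acts as the rate limit: every move costs a compaction,
    so only a few are planned per balancer run.
    """
    candidates = []
    for region in regions:
        current = region.locality.get(region.server, 0.0)
        # Server holding the largest share of this region's data.
        target, best = max(
            region.locality.items(),
            key=lambda item: item[1],
            default=(region.server, current),
        )
        gain = best - current
        if target != region.server and gain >= min_gain:
            candidates.append((gain, region.name, region.server, target))
    # Highest gain first; region name breaks ties deterministically.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [(name, src, dst) for _, name, src, dst in candidates[:max_moves]]
```

A real balancer would also weigh load per server so that locality gains do not pile every region onto one host; this sketch scores locality only.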
  • #17: You’re almost never alone. A single instance is part of a much larger neighborhood. Instances are virtualized on a physical host. Depending on your instance type, one physical host is shared with any number of neighbors. Those neighbors are all doing their own thing. CPU intensive calculations; saturating their disk or networks. And, despite virtualization…
  • #18: These neighbors can have significant impact on your instance. Disk slowness (unexplained iowait), cpu slowness (steal %), general server degradation HBase will continue running through most of these issues. Client calls build up, APIs start alerting. Impacting customers again. How can we avoid (or mitigate) this?
  • #19: HBase is good at this when a process or server just dies outright, because the ZK node will go away. Most EC2 failures don’t work like that though. We run with 10-30% more capacity than needed. The moment a server gives you issues, kill it. Try moving regions off; if it is slow, just kill -9. Using HBase’s stop command may be too slow if the host is having issues: it needs to flush the memstore, etc. Just relying on WAL replay will be faster. But maybe we can do even better…
  • #20: Two http endpoints: JMX as JSON, RS-status as JSON Region server status page can also print JSON. We can write a simple script to parse these. Look for callQueueLen >= 10x RPC handlers. Inspect the handlers from the RS-status output. Start logging. We take thread and heap dumps every few seconds, and log things like cpu load, iowait, steal, network io. This provides a lot of great data for debugging. Optionally add killing, by removing znode or kill -9 after some threshold. …
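The "simple script" from note #20 might look something like the sketch below. It assumes the JMX endpoint returns a JSON document with a top-level `beans` list and that one bean exposes a `callQueueLen` attribute; which bean carries it varies by HBase version, so the code scans all of them. The `Watchdog` class, its `kill_after` threshold, and the `"ok"`/`"log"`/`"kill"` verdicts are hypothetical names for illustration. Acting on a verdict (taking thread dumps, removing the znode, kill -9) is deliberately left out.

```python
import json
from urllib.request import urlopen


def fetch_jmx(url, timeout=2.0):
    """Fetch and parse a region server's JMX-as-JSON document."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def call_queue_len(jmx_doc):
    """Return the first callQueueLen found in any bean, or None if absent."""
    for bean in jmx_doc.get("beans", []):
        if "callQueueLen" in bean:
            return bean["callQueueLen"]
    return None


def is_backed_up(jmx_doc, rpc_handlers, factor=10):
    """True when the call queue is at least factor x the RPC handler count."""
    queue_len = call_queue_len(jmx_doc)
    return queue_len is not None and queue_len >= factor * rpc_handlers


class Watchdog:
    """Counts consecutive backed-up polls and escalates after kill_after of them."""

    def __init__(self, rpc_handlers, factor=10, kill_after=3):
        self.rpc_handlers = rpc_handlers
        self.factor = factor
        self.kill_after = kill_after
        self.strikes = 0

    def observe(self, jmx_doc):
        """Return "ok", "log" (start collecting diagnostics), or "kill"."""
        if is_backed_up(jmx_doc, self.rpc_handlers, self.factor):
            self.strikes += 1
            return "kill" if self.strikes >= self.kill_after else "log"
        self.strikes = 0  # one healthy poll resets the count
        return "ok"
```

In use, a loop would call `fetch_jmx` every few seconds against each region server and feed the result to that server's `Watchdog`, with `rpc_handlers` taken from the cluster's `hbase.regionserver.handler.count` setting.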
  • #21: It’s tempting to think of HBase as a catchall that can handle all your different use cases. It can do a lot, but it needs to be tuned accordingly. Initially had 1 cluster, and it was a nightmare. Heavy writes of Analytics conflicting with heavy reads of Contacts. Apps did not fit well together. Landscape constantly changing. Any time we made a change to accommodate one team, it impacted every team using the cluster. We tried to make it work for a while, and this actually caused us to write better, safer code. Eventually it became too much work.
  • #22: Broke them up. One of our best decisions. Partition your clusters, separate your concerns. Do it by usage pattern, optimize each accordingly. Systems like puppet make keeping these clusters similar easy. Libraries like fabric make customization easy as well. Use LDAP to give each server a cluster name; sync all of our configs to S3 so clients can read them. Easier to make decisions. Easier to operate. Easier to track down failures. This partitioning goes for hadoop too.
  • #23: Mentioned we run hundreds of thousands of hadoop jobs per month, most run against HBase. Keeping them in control is critical for our real-time APIs 1 region = 1 mapper. 30 regions per regionserver might mean 30 mappers all running at once. Wrote our own InputFormat and RecordReader, which groups all regions for a RegionServer onto 1 mapper (configurable). Reducers already have idea of Partitioner. Use the HTable interface getRegionLocations to return RS mappings, and do same. …
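The grouping step behind the custom InputFormat in note #23 can be illustrated in a few lines. This is only a sketch of the grouping logic, not the actual Hadoop InputFormat or RecordReader: `group_regions_by_server` and `max_regions_per_split` are hypothetical names, and the (region, server) pairs are assumed to come from a region-location lookup such as the one the note mentions.

```python
from collections import defaultdict


def group_regions_by_server(region_locations, max_regions_per_split=None):
    """Group regions so each region server gets one split (one mapper).

    region_locations: iterable of (region_name, server_name) pairs.
    max_regions_per_split: optional cap; a server with more regions than
    this gets several splits, which is the configurable part.
    Returns a list of (server_name, [region_name, ...]) tuples.
    """
    by_server = defaultdict(list)
    for region, server in region_locations:
        by_server[server].append(region)

    splits = []
    for server in sorted(by_server):
        regions = by_server[server]
        size = max_regions_per_split or len(regions)
        for start in range(0, len(regions), size):
            splits.append((server, regions[start:start + size]))
    return splits
```

With 30 regions on a region server, the default yields one split for that server instead of 30, so at most one mapper reads from it at a time.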
  • #24: HubSpot APIs sustain multiple thousands of requests per second. If a RegionServer is dying, or a Hadoop job is hammering it, requests will hang in the API. With high concurrency, even with low timeouts, threads can pile up. Starvation could bring down all API nodes, even though only a portion of data was really unavailable. We know quickly: we monitor threads very closely with codahale’s metrics library. But you shouldn’t respond manually, and there are patterns for this…
  • #25: That’s where Hystrix comes in. It’s a circuit breaker from Netflix. Modified HBase client to provide a circuit per-regionserver. So region server slows down, open circuit to fail those requests, allowing others to succeed. Hystrix will trickle requests to that RegionServer. Will close circuit when all is OK. Also provides a great dashboard to get a view of latencies and r/s per regionserver from the client’s perspective. …
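The per-regionserver circuit idea in note #25 can be sketched without Hystrix. The code below is a simplified stand-in, not Hystrix's API or the modified HBase client: it trips on consecutive failures rather than on a rolling error percentage, it is not thread-safe, and `Circuit`, `PerServerBreaker`, `failure_threshold`, and `reset_after` are hypothetical names.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a request is rejected because the server's circuit is open."""


class Circuit:
    """One circuit: opens after consecutive failures, retries after a cool-down."""

    def __init__(self, failure_threshold=5, reset_after=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Open: let trial requests through once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open, or re-open after a failed trial


class PerServerBreaker:
    """Keeps a separate circuit per region server."""

    def __init__(self, **circuit_options):
        self._options = circuit_options
        self._circuits = {}

    def call(self, server, fn):
        """Run fn() against `server`, failing fast if its circuit is open."""
        if server not in self._circuits:
            self._circuits[server] = Circuit(**self._options)
        circuit = self._circuits[server]
        if not circuit.allow():
            raise CircuitOpenError(server)
        try:
            result = fn()
        except Exception:
            circuit.record_failure()
            raise
        circuit.record_success()
        return result
```

Because each region server has its own circuit, a slow server only fails the requests routed to it; requests for regions on healthy servers keep flowing, which is the behavior the note describes.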
  • #26: Basically, there are no ideal instance types in EC2 Some have not enough CPU, some not enough memory, some not enough disk Old generation is underpowered in memory and CPU, New generation have extremely small SSD disks. HBase is meant to run on commodity hardware, but this hardware should be configured appropriately. Most instance types weren’t designed with HBase in mind. When you use the wrong size…
  • #27: It won’t look as adorable as this. You’re gonna have a bad time. HBase needs a certain amount of memory. Without, you face OOME and inefficient writes. It needs enough CPU to handle compactions. And your disk performance is critical for your overall read performance. But you can make it work…
  • #28: You just need to choose wisely, and realize it’s not hard to change. Is your data set very dense? Do you do a lot of writes, or just a lot of reads? Are you bulk loading to run over with hadoop? It all depends on your workload. Do some testing. We had our own progression at HubSpot…
  • #29: Started with m1.xlarge. Couldn’t handle the compactions. Moved to c1.xlarge. Struggled with memory: frequent small flushes, no page cache, OOM killer. Fixed: reduced regions, aggressive caching/batching. Recently released i2.4xlarge. Game changer, but expensive. Disk space issue. Replaced c1.xlarges, with a reasonable increase in cost. Was worth it for the stability. 25GB heap. 30-50% CPU. Low iowait. Talk about use cases (m1.xlarge == append only, etc). …
  • #30: Metrics and data make all of these decisions a little easier. HBase’s greatest strength, and one of its biggest weaknesses. Hundreds of metrics. Using them is like exploring a vast, uncharted territory. Mostly undocumented. Metrics for almost everything you could want. Per region, per table, etc. A bit much, and overwhelming. Still doesn’t give the full picture. Hard to visualize, hard to know what to look for to detect problems. A few that we found especially useful…
  • #31: I have mentioned some throughout. Biggest ones for us are callQueueLen and client metrics. fs latencies help to see problems in HDFS or disk. Keep your queue sizes down. Of course monitor OS-level metrics like steal, load, and free memory. We store all metrics in OpenTSDB. Great datastore. We found it helpful to be able to explore these more freely…
  • #32: Colleague wrote lead.js, named after Graphite, its first integration. Open-source. What is it? Frontend for time series data from systems like Graphite and OpenTSDB. Similar to IPython Notebook. Use CoffeeScript to explore data. Hover over graph to see values at a time. Hide and highlight series as needed. Explain example. Available on GitHub @ http://lead.github.io/ …
  • #33: We haven’t been afraid to get scrappy and hack together the tools we need. But we didn’t always have the answers, and at times learning HBase seemed insurmountable. It really isn’t though, and doesn’t have to seem that way. There has been a lot of development in the community since we started. Now: Great docs, very active user list, and lots of external resources On top of that, we at HubSpot are starting to open-source and talk about these things…
  • #34: So I’d like to invite you to reach out to us. Check out our blog, where we will be posting a lot more details in the coming days and weeks.
  • #35: So we can all sit back, relax, and watch HBase run.
  • #36: Thanks! Questions?