Got Problems?
Developers’ Most Frequent Headaches
and How to Address Them
Shevek
CTO
Session Agenda
• Introduction
• Problems Past
• Problems Present
• Problems Future
• Wrap Up

Who am I?
• Co-founder and CTO at Karmasphere
• Architect of Karmasphere’s solutions
• Have been working with Hadoop since …
• Written a few compilers
• Broken a few things:
  › computers, security systems, bosses, etc.

Survey of Questions
[Chart: share of 1164 questions by category, axis 0% to 100%; based on user questions and issues]
• Others
• How to maintain the cluster?
• Why does Hadoop do ….?
• How to know what the cluster is doing?
• How to use Hadoop?
• How to get stuff to/from Hadoop?
• How to setup Hadoop?
Source: Hadoop Users Mail-list (March 2009-June 2010)

Problems Past – Cluster as a Utility
• Getting a cluster – it’s a utility (like electricity)
  › Amazon EMR, Hadoop, Cloudera, IBM, Yahoo
• Cluster versions and protocols
  › Easy to switch between clusters
  › Staging for faster development
  › Easy to migrate data
  › Talk to remote clusters

Karmasphere Client
• Ensures Hadoop distribution and version independence
• Works from Windows (unlike Hadoop Client), Mac and Linux
• Supports any Hadoop environment: private, public or cloud service
• Provides:
  › Job portability
  › Operating system portability
  › Firewall hopping and tunnelling
  › Fault-tolerant API
  › Synchronous and asynchronous API
  › Clean object-oriented design
• Makes it easy and predictable to maintain a business operation reliant on Hadoop

Cluster Access
Problems Present – Interact with Cluster
• Getting data in
• Getting data out
• …
  › Can’t get data out: this is the problem. Have to extract information.

Writing a MapReduce Job
• Understanding MapReduce
• Boilerplate is boring
• Testing takes time
• Debugging is difficult
What Happened?
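
To make the map, shuffle and reduce steps concrete, here is a minimal sketch of a word count that runs entirely in one Python process. It is not Hadoop code and not a Karmasphere API: the names map_words, reduce_counts and run_local are hypothetical, chosen only for illustration. It shows why a local harness helps with the "testing takes time" point, since the same map and reduce functions can be exercised without a cluster.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word on the line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reduce phase: sum all counts emitted for one word."""
    return key, sum(values)


def run_local(
    lines: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Hypothetical single-process harness: map, shuffle, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: group values by key
    # reduce each key group, visiting keys in sorted order
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    sample = ["got problems", "problems past problems present"]
    print(run_local(sample, map_words, reduce_counts))
```

A real job would still need the framework's input/output and job configuration code; the sketch covers only the logic that is worth unit testing first.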
Karmasphere Job Developer
Present Continuous
• Why did my job fail?
  › Monitoring
  › Diagnostics
  › Debugging
• What do I need to know about my job?
  › Valgrind, lint, coverity, gprof, gdb, findbugs, sparse, JSR305, ....
• Why did my job do ….?
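
One way to approach "why did my job fail?" is to count problem records and keep a few samples instead of aborting on the first bad input. Hadoop provides job counters for this kind of bookkeeping; the sketch below only imitates the idea in plain Python. The record format (key, tab, integer) and the names parse_record and run_with_diagnostics are assumptions made for illustration, not part of any tool named in the slides.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def parse_record(line: str) -> Tuple[str, int]:
    """Parse an assumed 'key<TAB>integer' record; raises ValueError if malformed."""
    key, raw_value = line.split("\t")
    return key, int(raw_value)


def run_with_diagnostics(
    lines: Iterable[str], max_samples: int = 3
) -> Tuple[List[Tuple[str, int]], Counter, List[str]]:
    """Process every record, counting failures and keeping a few samples."""
    counters: Counter = Counter()
    good: List[Tuple[str, int]] = []
    bad_samples: List[str] = []
    for line_number, line in enumerate(lines, start=1):
        counters["records_read"] += 1
        try:
            good.append(parse_record(line))
        except ValueError as exc:
            counters["records_malformed"] += 1
            # keep only the first few failures so the report stays small
            if len(bad_samples) < max_samples:
                bad_samples.append(f"line {line_number}: {exc}")
    return good, counters, bad_samples


if __name__ == "__main__":
    records, counters, samples = run_with_diagnostics(["a\t1", "broken", "b\tx", "c\t3"])
    print(records, dict(counters), samples)
```

The counters answer the monitoring question (how much went wrong), while the samples give a starting point for debugging.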
Karmasphere Studio - Continuous
Problems Future
• Hive
• Pig
• Cascading
• Others ….
High Level Languages - Challenges
• Accessibility
• Integration
• Portability
• Diagnostics
Karmasphere Application Framework
Traditional Approach vs. Karmasphere Approach
[Diagram: two stacks from the user down to the cluster, with client side and server side marked]
• Traditional Approach
  › User → Hive JDBC Thrift Proxy → Thrift Server → Hive Engine → Hadoop Client → Job Tracker → Cluster (Hadoop)
  › Rich communications required for Hive
  › All communications ‘hampered’ through JDBC Thrift proxy
• Karmasphere Approach
  › User → Karmasphere Application Framework → Native Hadoop Protocol → Job Tracker → Cluster (Hadoop)
  › Rich communication supported within Karmasphere Application Framework
  › Debug/optimization information

Your time costs money
[Diagram labels: Theory, Experiment, Results]

Get Working Efficiently with Hadoop
• Karmasphere Studio: Community Edition (Free)
• Karmasphere Studio: Professional Edition
  › ($200 introductory discount for attendees)
• Karmasphere Client (Enterprise license)
• Karmasphere Studio: Analyst Edition
  › Coming sooner than you think!

Questions?


    shevek@karmasphere.com

Developer's Most Frequent Hadoop Headaches & How to Address Them (Hadoop Summit 2010)
