Quick Housekeeping Rules

• Q&A panel is available if you have any questions during the
 webinar
• There will be time for Q&A at the end
• We will record the webinar for future viewing
• All attendees will receive a copy of the slides and recording




                                                              Page 1
        © Hortonworks Inc. 2013
Introducing Hortonworks Data Platform for Windows

Enterprise Apache Hadoop for Windows Environments



March 2013




Our Speakers

John Kreisa
VP, Strategic Marketing

Saptak Sen
Sr. Product Manager

Rohit Bakshi
Product Manager
Agenda

• Why Hadoop on Windows?

• Hortonworks Data Platform for Windows

• Microsoft - Big Data and Apache Hadoop

• Hortonworks Data Platform under the covers

• Q&A




Polling Question
Where are you with Hadoop?

__ We are running it in production

__ We have it running in our labs

__ We are just investigating Hadoop

__ What is Hadoop?




Why Apache Hadoop on Windows?
• According to IDC, Windows Server held 73% market share in 2012
   – Hadoop was traditionally built for Linux servers, so a large number of
     organizations are underserved


• Apache Hadoop: de facto platform for processing
  massive amounts of unstructured data
   – Complementary to existing Microsoft technologies
   – There is a huge untapped community of Windows developers
     and ecosystem partners


• A strong Microsoft-Hortonworks partnership and 18 months of
  development make this a natural next step




What Makes Up Big Data?

[Figure: data sources plotted by volume (Megabytes, Gigabytes, Terabytes, Petabytes) against increasing data variety and complexity, in four tiers: ERP, CRM, WEB, BIG DATA]

Transactions + Interactions + Observations = BIG DATA

Examples shown: purchase detail, purchase record, payment record, segmentation, offer details, offer history, customer touches, support contacts, web logs, A/B testing, behavioral targeting, dynamic pricing, search marketing, affiliate networks, dynamic funnels, user click stream, mobile web, sentiment, SMS/MMS, speech to text, social interactions & feeds, spatial & GPS coordinates, sensors / RFID / devices, business data feeds, external demographics, user generated content, HD video, audio, images, product/service logs
Big Data: Big and Getting Bigger Fast!
• Unstructured data growth exceeds 80% year/year in most enterprises
    – Machine-generated data is a key driver in data growth


• IDC projects the digital universe will reach 40 zettabytes (ZB) by 2020
    – 1 ZB = 1,000,000,000,000 GB
    – Projected to increase 15x by 2020


• According to a 2012 Barclays CIO study, big data
  outranks virtualization as the #1 spending initiative
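
The unit conversion quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) prefixes, which is what the slide's figure of 1,000,000,000,000 GB per ZB implies; the helper name `zb_to_gb` is hypothetical and used only for illustration.

```python
# Decimal (SI) byte units; an assumption that matches the slide's "1 ZB = 1,000,000,000,000 GB".
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21


def zb_to_gb(zettabytes: int) -> int:
    """Convert a whole number of zettabytes to gigabytes (hypothetical helper)."""
    return zettabytes * BYTES_PER_ZB // BYTES_PER_GB


if __name__ == "__main__":
    # One zettabyte expressed in gigabytes.
    print(f"1 ZB  = {zb_to_gb(1):,} GB")
    # The 40 ZB projection for 2020 expressed in gigabytes.
    print(f"40 ZB = {zb_to_gb(40):,} GB")
```

Under these assumptions the 40 ZB projection corresponds to 40 trillion gigabytes.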




*2012 IDC Digital Universe Study
Enter Apache Hadoop
The core of the next generation data platform…




[Diagram: HADOOP CORE (HDFS, MAP REDUCE) on top of PLATFORM SERVICES (Enterprise Readiness)]

OSS that delivers high-scale storage & processing with enterprise-ready platform services

Hortonworkers are the original architects, operators, and builders of core Hadoop
Introducing HDP for Windows

[Diagram: HORTONWORKS DATA PLATFORM (HDP) For Windows. OPERATIONAL SERVICES (Manage & Operate at Scale) and DATA SERVICES (Store, Process and Access Data) on top of HADOOP CORE (Distributed Storage & Processing) and PLATFORM SERVICES (Enterprise Readiness)]

Hortonworks Data Platform (HDP) For Windows

• 100% Open Source Enterprise Hadoop

• Component and version compatible with Microsoft HDInsight

• Availability
   – Beta release available now
   – GA early 2Q 2013
Hortonworks Data Platform for Windows
HDP: the first and only distribution available on Windows & Linux
• Enterprise-grade Apache Hadoop on Windows
   – Enables the same experience for Hadoop on Windows & Linux


• More partners, more developers for Hadoop
   – Makes native Apache Hadoop available to the Windows ecosystem
   – More options for Windows-focused organizations


• Hortonworks focus: Enterprise Apache Hadoop for all platforms
   – Trusted, reliable, production-ready distribution for on-premises Hadoop
     deployments on Windows


• Built with joint investment and contributions from Microsoft
   – Deep engineering relationship ensures tight integration and maximum performance




Hortonworks: Best In Class Hadoop Support
• Experienced enterprise support team
   – Experience supporting enterprise clients in production
   – Core engineers have real operational
     experience: built and supported 44K+ nodes in production
   – Extensive experience in commercial big data offerings
     including HDP, MapR, Karmasphere




• Global 24x7 operation – support based in Sunnyvale, UK & India

• Stringent case management processes ensure high-quality customer
  service & responsiveness




Transferring Our Hadoop Expertise to You
                              The expert source for
                              Apache Hadoop training & certification

                              • World class training programs designed to
                                help you learn fast
                                 – Role-based hands-on classes with 50% lab time
                                 – New HDP on Windows course

                              • Expert consulting services
                                 – Programs designed to transfer knowledge


                              • Industry-leading Hadoop Sandbox program
                                 – Fastest way to learn Apache Hadoop
                                 – Multi-level tutorials for wide applicability
                                 – Customizable and updateable

Hortonworks Snapshot

We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution

Headquarters: Palo Alto, CA
Employees: 180+ and growing
Investors: Benchmark, Index, Yahoo

Develop
• We employ the core architects, builders and operators of Apache Hadoop
• We drive innovation within Apache Software Foundation projects

Distribute
• We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
• We engineer, test & certify HDP for enterprise usage

Support
• We are uniquely positioned to deliver the highest quality of Hadoop support
• We enable the ecosystem to work better with Hadoop

Endorsed by Strategic Partners
Microsoft Big Data
Microsoft Big Data:
   – Simplifies data management for IT
   – Enables IT and users to easily enrich their data with the world’s data, and
   – Delivers agility to end users through familiar tools like Excel




Microsoft End-To-End Big Data Platform




Enhancing the Core of Apache Hadoop
Deliver high-scale storage & processing with enterprise-ready platform services

[Diagram: HADOOP CORE (WEBHDFS, HDFS, MAP REDUCE) on top of PLATFORM SERVICES (Enterprise Readiness)]

Unique Focus Areas:
• Bigger, faster, more flexible
   – Continued focus on speed & scale and enabling near-real-time apps
• Tested & certified at scale
   – Run ~1300 system tests on large clusters for every release
• Enterprise-ready services
   – High availability, disaster recovery, snapshots, security, …

Hortonworkers are the architects, operators, and builders of core Hadoop
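
WebHDFS, listed alongside HDFS in the Hadoop core, exposes HDFS over HTTP as a REST API. The sketch below only builds a request URL and does not contact a cluster. The host name is hypothetical, and port 50070 is an assumption (the usual NameNode HTTP port in Hadoop 1.x), so adjust both for a real deployment. The helper `webhdfs_url` is illustrative and not part of any Hortonworks tooling.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, path: str, op: str, port: int = 50070,
                params: Optional[Dict[str, str]] = None) -> str:
    """Build a WebHDFS REST URL for an absolute HDFS path (hypothetical helper).

    WebHDFS URLs have the form http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = {"op": op}
    query.update(params or {})
    # quote() keeps "/" intact and escapes characters such as spaces.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


if __name__ == "__main__":
    # List a directory; "namenode.example.com" is a placeholder host.
    print(webhdfs_url("namenode.example.com", "/user/demo", "LISTSTATUS"))
    # Read a file as a given user via the user.name query parameter.
    print(webhdfs_url("namenode.example.com", "/user/demo/data.csv", "OPEN",
                      params={"user.name": "demo"}))
```

Because the interface is plain HTTP, any client that can issue a GET request, on Windows or Linux, can use the resulting URL.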
Data Services for Full Data Lifecycle

Provide data services to store, process & access data in many ways

[Diagram: DATA SERVICES (SQOOP, PIG, HIVE, HCATALOG) on top of HADOOP CORE (Distributed Storage & Processing) and PLATFORM SERVICES (Enterprise Readiness)]

Unique Focus Areas:
• Apache HCatalog
   – Metadata services for consistent table access to Hadoop data
• Apache Hive
   – Explore & process Hadoop data via SQL & ODBC-compliant BI tools

Hortonworks enables Hadoop data to be accessed via existing tools & systems
Operational Services for Ease of Use

Include complete operational services for productive operations & management

[Diagram: OPERATIONAL SERVICES (Oozie) and DATA SERVICES (Store, Process and Access Data) on top of HADOOP CORE (Distributed Storage & Processing) and PLATFORM SERVICES (Enterprise Readiness)]

• Apache Oozie:
   – Manage and schedule job execution for Hadoop jobs

Only Hortonworks provides a complete open source Hadoop management tool
Inside HDP for Windows

[Diagram: HORTONWORKS DATA PLATFORM (HDP) For Windows. OPERATIONAL SERVICES (Oozie) and DATA SERVICES (PIG, HIVE, SQOOP, HCATALOG) on top of HADOOP CORE (WEBHDFS, HDFS, MAP REDUCE) and PLATFORM SERVICES]

Hortonworks Data Platform (HDP) For Windows

• 100% Open Source Enterprise Hadoop

• Component and version compatible with Microsoft HDInsight

• Availability
   – Beta release available now
   – GA early 2Q 2013
Seamless Interoperability with Your Microsoft Tools

[Diagram: three layers. APPLICATIONS: Microsoft Applications. DATA SYSTEMS: HORTONWORKS DATA PLATFORM For Windows. DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP), POS systems, mobile data, and New Sources (web logs, email, sensor data, social media)]

• Integrated with Microsoft tools for native big data analysis
   – Bi-directional connectors for SQL Server and SQL Azure through SQOOP
   – Excel ODBC integration through Hive

• Addressing demand for Hadoop on Windows
   – Ideal for Windows customers with Hadoop operational experience

• Enables all common Hadoop workloads
   – Data refinement and ETL offload for high-volume data landing
   – Data exploration for discovery of new business opportunities
Demo Time!




                          Excel integration with HDP
                          • Interact with HDP through Excel
                          • Use Data Explorer to explore and turn raw data
                            into valuable information




Maximize Your Hadoop Deployment Choice
• Use HDP for Windows for on-premises deployment on Windows Server
   – Ideal for Windows users with Hadoop experience
   – Perfect next step for those who are ready to move from POC to production


• Use HDInsight for Microsoft tooling, management and provisioning
   – HDInsight Service offers the full benefit of Windows Azure (e.g. elasticity & low cost);
     available in Preview today
   – HDInsight Server offers full integration of Hadoop with Microsoft tools on premises;
     Developer Preview available today


• Full interoperability and deployment choice across platforms
   – Implement big data applications that run on premises & in the cloud
   – Open source HDP enables seamless interoperability across
     environments: Linux, Windows, Windows Azure




Next Steps
                             Download Hortonworks Sandbox
                             www.hortonworks.com/sandbox




                             Download Hortonworks Data Platform for
                             Windows (Beta)
                             www.hortonworks.com/download




                             Follow…
                             @hortonworks, @hortonworks_U




More Related Content

PDF
Apache Hadoop on the Open Cloud
PDF
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
PPT
Data Discovery, Visualization, and Apache Hadoop
PPTX
Introduction to Microsoft HDInsight and BI Tools
PDF
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
PDF
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
PDF
Introduction to Hortonworks Data Platform
PPTX
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...
Apache Hadoop on the Open Cloud
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Data Discovery, Visualization, and Apache Hadoop
Introduction to Microsoft HDInsight and BI Tools
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Enrich a 360-degree Customer View with Splunk and Apache Hadoop
Introduction to Hortonworks Data Platform
Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data...

What's hot (20)

PPTX
YARN Ready: Integrating to YARN with Tez
PPTX
Bigger Data For Your Budget
PDF
Hortonworks Presentation at Big Data London
PPTX
Big Data Analytics - Is Your Elephant Enterprise Ready?
PPTX
Introduction to the Hortonworks YARN Ready Program
PDF
Data Lake for the Cloud: Extending your Hadoop Implementation
PDF
Enterprise Hadoop with Hortonworks and Nimble Storage
PPTX
State of the Union with Shaun Connolly
PDF
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
PDF
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
PDF
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
PPTX
Hortonworks Yarn Code Walk Through January 2014
PDF
Cloudian 451-hortonworks - webinar
PDF
Hortonworks and Red Hat Webinar - Part 2
PDF
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
PDF
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
PDF
Hortonworks and Platfora in Financial Services - Webinar
PDF
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
PDF
Software Architecture and Predictive Models in R
PPTX
Enabling the Real Time Analytical Enterprise
YARN Ready: Integrating to YARN with Tez
Bigger Data For Your Budget
Hortonworks Presentation at Big Data London
Big Data Analytics - Is Your Elephant Enterprise Ready?
Introduction to the Hortonworks YARN Ready Program
Data Lake for the Cloud: Extending your Hadoop Implementation
Enterprise Hadoop with Hortonworks and Nimble Storage
State of the Union with Shaun Connolly
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Hortonworks Yarn Code Walk Through January 2014
Cloudian 451-hortonworks - webinar
Hortonworks and Red Hat Webinar - Part 2
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
Hortonworks and Platfora in Financial Services - Webinar
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Software Architecture and Predictive Models in R
Enabling the Real Time Analytical Enterprise
Ad

Viewers also liked (20)

PDF
Using Tableau with Hortonworks Data Platform
PPTX
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
PDF
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
PDF
Zementis hortonworks-webinar-2014-09
PDF
Hive Functions Cheat Sheet
PDF
Building a Big Data platform with the Hadoop ecosystem
PPTX
How Universities Use Big Data to Transform Education
PPTX
Hive - 1455: Cloud Storage
PPTX
Scaling real time streaming architectures with HDF and Dell EMC Isilon
PPTX
S3Guard: What's in your consistency model?
PPTX
Hortonworks Data Cloud for AWS
PPTX
Hadoop crash course workshop at Hadoop Summit
PPTX
How to Use Apache Zeppelin with HWX HDB
PPTX
Webinar Series Part 5 New Features of HDF 5
PPTX
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
PDF
SQL to Hive Cheat Sheet
PDF
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
PDF
The path to a Modern Data Architecture in Financial Services
PPT
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
PDF
Icons and Stencils for Hadoop
Using Tableau with Hortonworks Data Platform
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla...
Zementis hortonworks-webinar-2014-09
Hive Functions Cheat Sheet
Building a Big Data platform with the Hadoop ecosystem
How Universities Use Big Data to Transform Education
Hive - 1455: Cloud Storage
Scaling real time streaming architectures with HDF and Dell EMC Isilon
S3Guard: What's in your consistency model?
Hortonworks Data Cloud for AWS
Hadoop crash course workshop at Hadoop Summit
How to Use Apache Zeppelin with HWX HDB
Webinar Series Part 5 New Features of HDF 5
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
SQL to Hive Cheat Sheet
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
The path to a Modern Data Architecture in Financial Services
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Icons and Stencils for Hadoop
Ad

Introduction to Hortonworks Data Platform for Windows

  • 1. Quick House Keeping Rule • Q&A panel is available if you have any questions during the webinar • There will be time for Q&A at the end • We will record the webinar for future viewing • All attendees will receive a copy of the slides and recording Page 1 © Hortonworks Inc. 2013
  • 2. Introducing Hortonworks Data Platform for Windows Enterprise Apache Hadoop for Windows Environments March 2013 © Hortonworks Inc. 2013 Page 2
  • 3. Our Speakers John Kreisa VP, Strategic Marketing Saptak Sen Sr. Product Manager Rohit Bakshi Product Manager Page 3 © Hortonworks Inc. 2013
  • 4. Agenda • Why Hadoop on Windows? • Hortonworks Data Platform for Windows • Microsoft - Big Data and Apache Hadoop • Hortonworks Data Platform under the covers • Q&A Page 4 © Hortonworks Inc. 2013
  • 5. Polling Question Where are you with Hadoop? __ We are running it in production __ We have it running in our labs __ We are just investigating Hadoop __ What is Hadoop? Page 5 © Hortonworks Inc. 2013
  • 6. Agenda • Why Hadoop on Windows? • Hortonworks Data Platform for Windows • Microsoft - Big Data and Apache Hadoop • Hortonworks Data Platform under the covers • Q&A Page 6 © Hortonworks Inc. 2013
  • 7. Why Apache Hadoop on Windows? • According to IDC, Windows Server held 73% market share in 2012 – Hadoop was traditionally built for Linux servers, so a large number of organizations are underserved • Apache Hadoop: de facto platform for processing massive amounts of unstructured data – Complementary to existing Microsoft technologies – There is a huge untapped community of Windows developers and ecosystem partners • A strong Microsoft-Hortonworks partnership and 18 months of development make this a natural next step
  • 8. What Makes Up Big Data? Transactions + Interactions + Observations = BIG DATA [Diagram: data volume rises with increasing data variety and complexity. Tiers: ERP (megabytes), CRM (gigabytes), WEB (terabytes), BIG DATA (petabytes). Example data types shown: purchase detail, purchase record, payment record, offer details, offer history, segmentation, customer touches, support contacts, web logs, A/B testing, behavioral targeting, dynamic pricing, dynamic funnels, affiliate networks, search marketing, external demographics, business data feeds, user generated content, mobile web, sentiment, SMS/MMS, user click stream, speech to text, social interactions & feeds, spatial & GPS coordinates, sensors / RFID / devices, HD video, audio, images, product/service logs]
  • 9. Big Data: Big and Getting Bigger Fast! • Unstructured data growth exceeds 80% year/year in most enterprises – Machine-generated data is a key driver in data growth • IDC projects digital universe will reach 40 zettabytes (ZB) by 2020 – 1 ZB = 1,000,000,000,000 GBs! – Projected to increase 15x by 2020 • According to 2012 Barclays CIO study big data outranks virtualization as #1 spending initiative *2012 IDC Digital Universe Study Page 9 © Hortonworks Inc. 2013
  • 10. Enter Apache Hadoop: The core of the next generation data platform… OSS that delivers high-scale storage & processing with enterprise-ready platform services [Diagram: HADOOP CORE (HDFS, MAP REDUCE); PLATFORM SERVICES (Enterprise Readiness)] Hortonworkers are the original architects, operators, and builders of core Hadoop
  • 11. Agenda • Why Hadoop on Windows? • Hortonworks Data Platform for Windows • Microsoft - Big Data and Apache Hadoop • Hortonworks Data Platform under the covers • Q&A Page 11 © Hortonworks Inc. 2013
  • 12. Introducing HDP for Windows: Hortonworks Data Platform (HDP) For Windows • 100% Open Source Enterprise Hadoop • Component and version compatible with Microsoft HDInsight • Availability • Beta release available now • GA early 2Q 2013 [Diagram: OPERATIONAL SERVICES (Manage & Operate at Scale); DATA SERVICES (Store, Process and Access Data); HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness)]
  • 13. Hortonworks Data Platform for Windows HDP: the first and only distribution available on Windows & Linux • Enterprise-grade Apache Hadoop on Windows – Enables same experience for Hadoop on Windows & Linux • More partners, more developers for Hadoop – Makes native Apache Hadoop available to Windows ecosystem – More options for Windows focused organizations • Hortonworks focus: Enterprise Apache Hadoop for all platforms – Trusted reliable production-ready distribution for on-premise Hadoop on Windows deployments • Built with joint investment and contributions from Microsoft – Deep engineering relationship ensures tight integration and maximum performance Page 13 © Hortonworks Inc. 2013
  • 14. Hortonworks: Best In Class Hadoop Support • Experienced enterprise support team – Experience supporting enterprise clients in production – Core engineers have real operational experience: built and supported 44+K nodes in production – Extensive experience in commercial big data offerings including HDP, MapR, Karmasphere • Global 24x7 operation – support based in Sunnyvale, UK & India • Stringent case management processes ensure high quality customer service & responsiveness
  • 15. Transferring Our Hadoop Expertise to You The expert source for Apache Hadoop training & certification • World class training programs designed to help you learn fast – Role-based hands on classes with 50% lab time – New HDP on Windows course • Expert consulting services – Programs designed to transfer knowledge • Industry leading Hadoop Sandbox program – Fastest way to learn Apache Hadoop – Multi-level tutorials for wide applicability – Customizable and updateable Page 15 © Hortonworks Inc. 2013
  • 16. Hortonworks Snapshot: We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution. Headquarters: Palo Alto, CA. Employees: 180+ and growing. Investors: Benchmark, Index, Yahoo. Develop • We employ the core architects, builders and operators of Apache Hadoop • We drive innovation within Apache Software Foundation projects. Distribute • We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform • We engineer, test & certify HDP for enterprise usage. Support • We are uniquely positioned to deliver the highest quality of Hadoop support • We enable the ecosystem to work better with Hadoop. Endorsed by Strategic Partners
  • 17. Agenda • Why Hadoop on Windows? • Hortonworks Data Platform for Windows • Microsoft - Big Data and Apache Hadoop • Hortonworks Data Platform under the covers • Q&A Page 17 © Hortonworks Inc. 2013
  • 18. Microsoft Big Data Microsoft Big Data – Simplifies data management for IT – Enables IT and users to easily enrich their data with the world’s data, and – Delivers agility to end users through familiar tools like Excel Page 18 © Hortonworks Inc. 2013
  • 19. Microsoft End-To-End Big Data Platform Page 19 © Hortonworks Inc. 2013
  • 20. Agenda • Why Hadoop on Windows? • Hortonworks Data Platform for Windows • Microsoft - Big Data and Apache Hadoop • Hortonworks Data Platform under the covers • Q&A Page 20 © Hortonworks Inc. 2013
  • 21. Enhancing the Core of Apache Hadoop: Deliver high-scale storage & processing with enterprise-ready platform services. Unique Focus Areas: • Bigger, faster, more flexible: Continued focus on speed & scale and enabling near-real-time apps • Tested & certified at scale: Run ~1300 system tests on large clusters for every release • Enterprise-ready services: High availability, disaster recovery, snapshots, security, … [Diagram: HADOOP CORE (HDFS, MAP REDUCE, WEBHDFS); PLATFORM SERVICES (Enterprise Readiness)] Hortonworkers are the architects, operators, and builders of core Hadoop
  • 22. Data Services for Full Data Lifecycle: Provide data services to store, process & access data in many ways. Unique Focus Areas: • Apache HCatalog: Metadata services for consistent table access to Hadoop data • Apache Hive: Explore & process Hadoop data via SQL & ODBC-compliant BI tools [Diagram: DATA SERVICES (SQOOP, PIG, HIVE, HCATALOG); HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness)] Hortonworks enables Hadoop data to be accessed via existing tools & systems
  • 23. Data Services for Full Data Lifecycle (content identical to slide 22)
  • 24. Data Services for Full Data Lifecycle (content identical to slide 22)
  • 25. Operational Services for Ease of Use: Include complete operational services for productive operations & management • Apache Oozie: Manage and schedule job execution for Hadoop jobs [Diagram: OPERATIONAL SERVICES (Oozie); DATA SERVICES (Store, Process and Access Data); HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness)] Only Hortonworks provides a complete open source Hadoop management tool
  • 26. Inside HDP for Windows: Hortonworks Data Platform (HDP) For Windows • 100% Open Source Enterprise Hadoop • Component and version compatible with Microsoft HDInsight • Availability • Beta release available now • GA early 2Q 2013 [Diagram: OPERATIONAL SERVICES (Manage & Operate at Scale): Oozie; DATA SERVICES (Store, Process and Access Data): HIVE, PIG, SQOOP, HCATALOG; HADOOP CORE (Distributed Storage & Processing): HDFS, MAP REDUCE, WEBHDFS; PLATFORM SERVICES]
  • 27. Seamless Interoperability with Your Microsoft Tools • Integrated with Microsoft tools for native big data analysis – Bi-directional connectors for SQL Server and SQL Azure through SQOOP – Excel ODBC integration through Hive • Addressing demand for Hadoop on Windows – Ideal for Windows customers with Hadoop operational experience • Enables all common Hadoop workloads – Data refinement and ETL offload for high-volume data landing – Data exploration for discovery of new business opportunities [Diagram: APPLICATIONS (Microsoft Applications); DATA SYSTEMS (HORTONWORKS DATA PLATFORM For Windows); DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP), New Sources (web logs, email, sensor data, social media), OLTP, POS SYSTEMS, MOBILE DATA]
  • 28. Demo Time! Excel integration with HDP • Interact with HDP through Excel • Use Data Explorer to explore and turn raw data into valuable information Page 28 © Hortonworks Inc. 2013
  • 29. Maximize Your Hadoop Deployment Choice • Use HDP for Windows for on-premises deployment on Windows Server – Ideal for Windows users with Hadoop experience – Perfect next step for those who are ready to move from POC to production • Use HDInsight for Microsoft tooling and Management and Provisioning – HDInsight Service that offers full benefit of Windows Azure (e.g. elasticity & low cost) – available in Preview today – HDInsight Server for full integration of Hadoop with Microsoft tools on premises – Developer Preview available today • Full interoperability and deployment choice across platforms – Implement big data applications that run on-premise & cloud – By leveraging open source HDP, enables seamless interoperability across environments: Linux, Windows, Windows Azure Page 29 © Hortonworks Inc. 2013
  • 30. Next Steps Download Hortonworks Sandbox www.hortonworks.com/sandbox Download Hortonworks Data Platform for Windows (Beta) www.hortonworks.com/download Follow… @hortonworks, @hortonworks_U Page 30 © Hortonworks Inc. 2013
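
The unit arithmetic on slide 9 (1 ZB = 1,000,000,000,000 GB, and a projected 40 ZB digital universe by 2020) can be checked with a short sketch. It assumes decimal (SI) units, which the slide's figure implies; the function name `zb_to_gb` is purely illustrative and not part of the deck.

```python
# Decimal (SI) storage units expressed in bytes (assumption: the slide uses SI units).
GB = 10**9
ZB = 10**21


def zb_to_gb(zettabytes: int) -> int:
    """Convert a whole number of zettabytes to gigabytes."""
    return zettabytes * ZB // GB


if __name__ == "__main__":
    print(f"1 ZB  = {zb_to_gb(1):,} GB")
    print(f"40 ZB = {zb_to_gb(40):,} GB")
```

Integer arithmetic is used so the result is exact at these magnitudes.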

Editor's Notes

  • #9: For the visual thinkers out there, let’s expand our mathematical model to show some concrete examples. ERP, SCM, CRM, and transactional Web applications are classic examples of systems processing Transactions. Highly structured data in these systems is typically stored in SQL databases. Interactions are about how people and things interact with each other or with your business. Web Logs, User Click Streams, Social Interactions & Feeds, and User-Generated Content are classic places to find Interaction data. Observational data tends to come from the “Internet of Things”. Sensors for heat, motion, and pressure, and RFID and GPS chips within such things as mobile devices, ATMs, and even aircraft engines, are just some examples of “things” that output Observation data. Most folks would agree that video is “big” data. The analysis of what’s happening in that video (i.e., what you, I, and others are doing in the video) may not be “big”, but it is valuable and it does fit under our umbrella. Moreover, business data feeds and publicly available data sets are also “big data”, so we should not limit our thinking to just data that flows through an organization. For example, the mortgage-related data you may have could benefit from being blended with external data found in Zillow. The government has the Open Data Initiative, which means that more and more data is being made publicly available. One of the use cases I find interesting is the Predictive Policing use case, where state and local law enforcement use analytics applied to crime databases and other publicly available data to help predict where and when pockets of crime might spring up. These proactive analytics efforts have yielded real reductions in crime! Anyhow, this is what Big Data means to me… hopefully it makes sense to you.
  • #10: IDC projects the digital universe will reach 40 ZB by 2020, an amount that exceeds previous forecasts by 5 ZB, resulting in 50-fold growth from the beginning of 2010.
  • #11: At its core, Hadoop is about HDFS and MapReduce, two projects that are really about distributed storage and data processing, which are the underpinnings of Hadoop. In addition to Core Hadoop, we must identify and include the requisite “Platform Services” that are central to any piece of enterprise software. These include High Availability, Disaster Recovery, Security, etc., which enable use of the technology for a much broader (and mission-critical) problem set. This is accomplished not by introducing new open source projects, but rather by ensuring that these aspects are addressed within existing projects.
    HDFS: Self-healing, distributed file system for multi-structured data; breaks files into blocks & stores them redundantly across the cluster.
    MapReduce: Framework for running large data processing jobs in parallel across many nodes & combining results.
    YARN: New application management framework that enables Hadoop to go beyond MapReduce apps.
    Enterprise-ready services: High availability, disaster recovery, snapshots, security, …
  • #13: In summary, by addressing these elements, we can provide an Enterprise Hadoop distribution which includes the Core Services, Platform Services, Data Services, and Operational Services required by the Enterprise user. And all of this is done in 100% open source, and tested at scale by our team (together with our partner Yahoo) to bring Enterprise process to an open source approach. And finally, this is the distribution that is endorsed by the ecosystem to ensure interoperability in your environment.
  • #15: Not only is all of this backed by the architects, developers and operators of Hadoop, but it is also assisted by a world class support team. With backgrounds from IBM, Oracle, MySQL and more, the team enables 24X7 support together with very mature support processes to ensure high quality customer service and responsiveness
  • #16: Additionally, we are a leading provider of Hadoop support through our Hortonworks University, with courses for both development and operations. If required, we can also provide expert consulting services, either ourselves or through our System Integrator partners. And for anyone looking to get their hands on Hadoop, we have recently introduced the Hadoop Sandbox program, which enables users to download a full instance of HDP together with guided tutorials covering both development and administration topics.
  • #17: At Hortonworks today, our focus is very clear: we Develop, Distribute and Support a 100% open source distribution of Enterprise Apache Hadoop. We employ the core architects, builders and operators of Apache Hadoop and drive the innovation in the open source community. We distribute the only 100% open source Enterprise Hadoop distribution: the Hortonworks Data Platform. Given our operational expertise of running some of the largest Hadoop infrastructure in the world at Yahoo, our team is uniquely positioned to support you. Our approach is also uniquely endorsed by some of the biggest vendors in the IT market. Yahoo is both an investor and a customer, and most importantly, a development partner. We partner to develop Hadoop, and no distribution of HDP is released without first being tested on Yahoo’s infrastructure, using the same regression suite that they have used for years as they grew to have the largest production cluster in the world. Microsoft has partnered with Hortonworks to include HDP in both their off-premise offering on Azure and their on-premise offering under the product name HDInsight. This also includes integration with Visual Studio for application development and with System Center for operational management of the infrastructure. Teradata includes HDP in their products in order to provide the broadest possible range of options for their customers.
  • #19: HDP on Windows, HDP server on Windows, HD on Azure. MS customer that wants to leverage familiar Windows tools (System Center). Work with it like in Linux; bring your own scripts. What they will get and when they will get it. Integration with MS tooling. MS customer: gives them choice because the underlying infrastructure bits are the same. So get started today. Driver is an ISV app that is vertical in nature and needs a choice to deploy on Windows today. Field positioning.
  • #23: Beyond Core and Platform Services, we must add a set of Data Services that enable the full data lifecycle. This includes capabilities to store data, process data, and access data. For example: how do we maintain the consistent metadata information required to determine how best to query data stored in HDFS? The answer: a project called Apache HCatalog. Or how do we access data stored in Hadoop from SQL-oriented tools? The answer: with projects such as Hive, which is the de facto standard for accessing data stored in HDFS. All of these are broadly captured under the category of “data services”.
    Apache HCatalog: Metadata & Table Management. Metadata service that enables users to access Hadoop data as a set of tables without needing to be concerned with where or how their data is stored. Enables consistent data sharing and interoperability across data processing tools such as Pig, MapReduce and Hive. Enables deep interoperability and data access with systems such as Teradata, SQL Server, etc.
    Apache Hive: SQL Interface for Hadoop. The de facto SQL-like interface for Hadoop that enables data summarization, ad-hoc query, and analysis of large datasets. Connects to Excel, Microstrategy, PowerPivot, Tableau and other leading BI tools via the Hortonworks Hive ODBC Driver. Hive currently serves batch and non-interactive use cases; in 2013, Hortonworks is working with the Hive community to extend use cases to interactive query. Cloudera, on the other hand, has chosen to abandon Hive in favor of Cloudera Impala (a Cloudera-controlled technology aimed at the analytics market and solely focused on non-operational interactive query use cases).
    Apache HBase: NoSQL DB for Interactive Apps. Non-relational, columnar database that provides a way for developers to create, read, update, and delete data in Hadoop in a way that performs well for interactive applications. Commonly used for serving “intelligent applications” that predict user behavior, detect shifting usage patterns, or recommend ways for users to engage.
    WebHDFS: Web service interface for HDFS. Scalable REST API that enables easy and scalable access to HDFS. Move files in & out and delete from HDFS; leverages parallelism of the cluster. Perform file and directory functions. webhdfs://<HOST>:<HTTP PORT>/PATH. Included in versions 1.0 and 2.0 of Hadoop; created & driven by Hortonworkers.
    Talend Open Studio for Big Data: open source ETL tool available as an optional download with HDP. Intuitive graphical data integration tools for HDFS, Hive, HBase, HCatalog and Pig. Oozie scheduling allows you to manage and stage jobs. Connectors for any database, business application or system. Integrated HCatalog storage.
  • #26: Any data management platform operated at a reasonable scale requires a management technology: for example, SQL Server Management Studio for SQL Server, or Oracle Enterprise Manager for Oracle DB. Hadoop is no exception, and for Hadoop that means Apache Ambari, which is increasingly recognized as foundational to the operation of Hadoop infrastructures. It allows users to provision, manage and monitor a cluster, and provides a set of tools to visualize and diagnose operational issues. There are other projects in this category (such as Oozie), but Ambari is the most influential.
    Apache Ambari: Management & Monitoring
      – Makes Hadoop clusters easy to operate
      – Simplified cluster provisioning with a step-by-step install wizard
      – Pre-configured operational metrics for insight into the health of Hadoop services
      – Visualization of job and task execution for visibility into performance issues
      – Complete RESTful API for integrating with existing operational tools
      – Intuitive user interface that makes controlling a cluster easy and productive
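    To make the "RESTful API" point in note #26 a little more concrete, here is a minimal sketch of preparing a read-only request against Ambari's REST API, whose resources live under `/api/v1` (for example `/api/v1/clusters`) and are protected by HTTP Basic authentication. The host name, port 8080, credentials, and the helper name `ambari_request` are placeholders assumed for this example; the request is only constructed, not sent.

```python
import base64
import urllib.request


def ambari_request(host, port, user, password, resource="clusters"):
    """Prepare a GET request for an Ambari REST resource under /api/v1.

    All argument values used below are hypothetical placeholders.
    """
    url = f"http://{host}:{port}/api/v1/{resource}"
    # HTTP Basic auth: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")
    return req


if __name__ == "__main__":
    req = ambari_request("ambari.example.com", 8080, "admin", "admin")
    print(req.get_method(), req.full_url)
    # To actually query a running server, one would call:
    #   with urllib.request.urlopen(req) as resp:
    #       print(resp.read().decode("utf-8"))
```

    An existing monitoring or operations tool could poll such endpoints and parse the JSON responses, which is the kind of integration the note refers to.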
  • #27: In summary, by addressing these elements, we can provide an Enterprise Hadoop distribution that includes the:
      – Core Services
      – Platform Services
      – Data Services
      – Operational Services
    required by the Enterprise user. All of this is done in 100% open source, and tested at scale by our team (together with our partner Yahoo) to bring Enterprise process to an open source approach. Finally, this is the distribution that is endorsed by the ecosystem to ensure interoperability in your environment.
  • #28: It is for that reason that we focus on HDP interoperability across all of these categories:
      – Data systems: HDP is endorsed and embedded with SQL Server, Teradata and more
      – BI tools: HDP is certified for use with the packaged applications you already use, from Microsoft to Tableau, MicroStrategy, Business Objects and more
      – Development tools:
          For .NET developers: Visual Studio, used to build more than half the custom applications in the world, certifies with HDP to enable Microsoft app developers to build custom apps with Hadoop
          For Java developers: Spring for Apache Hadoop enables Java developers to quickly and easily build Hadoop-based applications with HDP
      – Operational tools: integration with System Center and with Teradata Viewpoint
  • #29: Rohit: Can you provide three bullet points of your demo?
  • #30:
      – HDP on Windows, HDP server on Windows, HD on Azure
      – MS customer that wants to leverage familiar Windows tools (System Center); work with it like in Linux, bring your own scripts
      – What they will get and when they will get it
      – Integration with MS tooling
      – Gives the MS customer choice, because the underlying infrastructure bits are the same
      – So get started today
      – Driver is an ISV app that is vertical in nature and needs a choice to deploy on Windows today
      – Field positioning