Sneak Peek into Self-Service, Cross-Enterprise Job
Scheduling with CA Workload Automation Advanced
Integration for Hadoop
Beeshmanth (B) Kotamreddy
DevOps: Continuous Delivery
CA Technologies
Principal Product Manager
DO4T42T
#CAWorld
2 © 2015 CA. ALL RIGHTS RESERVED.@CAWORLD #CAWORLD
For Informational Purposes Only
Terms of this Presentation
© 2015 CA. All rights reserved. All trademarks referenced herein belong to their respective companies. The presentation provided at CA
World 2015 is intended for information purposes only and does not form any type of warranty. Some of the specific slides with customer
references relate to customer's specific use and experience of CA products and solutions so actual results may vary.
Certain information in this presentation may outline CA’s general product direction. This presentation shall not serve to (i) affect the rights
and/or obligations of CA or its licensees under any existing or future license agreement or services agreement relating to any CA software
product; or (ii) amend any product documentation or specifications for any CA software product. This presentation is based on current
information and resource allocations as of November 18, 2015, and is subject to change or withdrawal by CA at any time without notice. The
development, release and timing of any features or functionality described in this presentation remain at CA’s sole discretion.
Notwithstanding anything in this presentation to the contrary, upon the general availability of any future CA product release referenced in
this presentation, CA may make such release available to new licensees in the form of a regularly scheduled major product release. Such
release may be made available to licensees of the product who are active subscribers to CA maintenance and support, on a when-and-if-available
basis. The information in this presentation is not deemed to be incorporated into any contract.
Agenda
1. BIG DATA AND CHANGING CUSTOMER NEEDS
2. HADOOP
3. BUSINESS CHALLENGES
4. CA WORKLOAD AUTOMATION ADVANCED INTEGRATION FOR HADOOP
5. DEMO
6. Q & A
Maximize the value of Big Data with the power of Workload Automation
• HDFS Operations
• Pig
• Hive
• Sqoop
• Oozie Workflows
Exciting, disruptive & evolving ecosystem
"80% of customer data will be wasted due to immature enterprise data 'value chains.'" (IDC)
What is Big Data?
Datasets whose volume, velocity, variety, and complexity exceed the ability of commonly
used software tools to capture, process, store, manage, and analyze them.
Information sources: mobile, transactional data, search, texts, CRM, SCM, ERP, $ € ¥, images, email, social media, IT Ops, audio, video
Big Data dimensions: velocity, volume, variety, complexity
Enterprises across all industries use Big Data
Enterprises need new capabilities for processing large amounts of data in a variety of formats.
BANKS
• Fraud Prevention
• Trading Risks
• Customer Risk Assessment
TELCO CARRIERS
• Call Detail Records
• Real-time bandwidth allocations
• Lifetime value and promotions
RETAILERS
• Customer Analytics
• Brand Sentiment Analytics
• Promotion Planning
HEALTH CARE PROVIDERS
• Genomic Analysis
• Medical Trial Analysis
• Hospital Diagnostics Analytics
UTILITY PROVIDERS
• IoT/Smart Meter Analytics
• Energy trading and pricing risk analytics
GOVERNMENT/PUBLIC SECTOR
• Crime Intelligence and Prevention
• Fraud Prevention
What is Hadoop?
Hadoop is open-source software designed to be highly scalable, fault tolerant, and highly distributed.
Key elements:
1. Distributed processing of Big Data (e.g. MapReduce)
2. Distributed storage (Hadoop Distributed File System or HDFS)
Figure: Hadoop 1.0 vs. Hadoop 2.0
Hadoop 1.0: HDFS (distributed reliable storage) + MapReduce (resource management & data processing)
Hadoop 2.0: HDFS (distributed reliable storage) + YARN (resource management), plus MapReduce (distributed programming), Spark (in memory), HBase (NoSQL store), Hive (query), Pig (scripting), and Oozie (workflow)
MapReduce: Core Hadoop 1
A distributed computing framework
• Hadoop's MapReduce framework involves two phases:
1. Map phase: spreads the work across multiple servers, each of which processes the portion of the dataset stored locally.
2. Reduce phase: recombines the partial results.
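The two phases can be illustrated with a single-process Python sketch of the classic word-count example. This is not Hadoop's Java API: the function names are illustrative, each "split" stands in for a block of data that would be mapped on the server storing it, and the explicit shuffle step models the grouping by key that the framework performs between the map and reduce phases.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Recombine the partial results for a single key."""
    return key, sum(values)


def word_count(splits):
    # In Hadoop each split would be mapped in parallel on its own node;
    # here the splits are simply mapped one after another.
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data big insights", "data lakes and big clusters"]))
```

Because each map call only sees its own split and each reduce call only sees one key, both phases can be distributed independently; only the shuffle has to move data between servers.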
Hadoop Distributed File System (HDFS)
Self-healing, high-bandwidth clustered storage
• Name Node: one of the core Hadoop services; maintains the namespace, knows where data is, and manages blocks on the Data Nodes.
• Data Node: stores the actual data blocks on its local disks and serves them on request.
• Secondary Name Node: performs periodic checkpoints of the primary Name Node's metadata (merging the namespace image with the edit log), which can help recovery after a failure; it is not a hot standby.
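The checkpoint idea can be sketched in a few lines of Python. The model is hypothetical and greatly simplified: real HDFS keeps an fsimage file and edit-log files on disk, while here the class name `NameNodeMetadata`, its fields, and the `checkpoint` helper are illustrative assumptions. It shows why checkpointing matters: namespace changes are appended to a log, and periodically replaying that log onto the last image produces a compact, up-to-date image and an empty log.

```python
from dataclasses import dataclass, field


@dataclass
class NameNodeMetadata:
    """Hypothetical, highly simplified model of Name Node metadata."""
    fsimage: dict = field(default_factory=dict)   # path -> block IDs as of the last checkpoint
    edit_log: list = field(default_factory=list)  # namespace changes recorded since then

    def create(self, path, block_ids):
        # Changes are appended to the edit log instead of rewriting the image.
        self.edit_log.append(("create", path, list(block_ids)))

    def delete(self, path):
        self.edit_log.append(("delete", path, None))


def checkpoint(meta):
    """Replay the edit log onto a copy of the image and return compacted metadata."""
    image = {path: list(blocks) for path, blocks in meta.fsimage.items()}
    for op, path, block_ids in meta.edit_log:
        if op == "create":
            image[path] = block_ids
        elif op == "delete":
            image.pop(path, None)
    # The merged image replaces the old one, and the edit log starts empty again.
    return NameNodeMetadata(fsimage=image, edit_log=[])


if __name__ == "__main__":
    nn = NameNodeMetadata(fsimage={"/sales/2015-11-17.csv": [1, 2]})
    nn.create("/sales/2015-11-18.csv", [3, 4, 5])
    nn.create("/tmp/scratch", [6])
    nn.delete("/tmp/scratch")
    print(checkpoint(nn).fsimage)
```

The original metadata object is left untouched; in this sketch the caller decides when to swap in the checkpointed copy.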
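Figure: a master node runs the primary and secondary Name Nodes, linked by a periodic checkpoint; five slave nodes run Data Nodes and Task Trackers, each holding three of blocks 1 to 5, so every block is stored on three different nodes.
HDFS breaks incoming files into blocks and stores them redundantly across the cluster.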
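Block splitting and redundant storage can be sketched as follows. This is an illustrative Python model, not HDFS code: the function names are hypothetical, and the round-robin placement is a simplification, since real HDFS uses a rack-aware placement policy. The defaults mirror common Hadoop 2.x settings (a 128 MB block size and a replication factor of 3), matching the figure's five nodes with three copies of each block.

```python
def split_into_blocks(file_size, block_size):
    """Return the byte size of each block a file of `file_size` bytes is split into."""
    if file_size <= 0:
        return []
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes


def place_replicas(num_blocks, data_nodes, replication=3):
    """Assign each block to `replication` distinct data nodes, rotating the start node."""
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds the number of data nodes")
    placement = {}
    for block_id in range(1, num_blocks + 1):
        start = (block_id - 1) % len(data_nodes)
        placement[block_id] = [
            data_nodes[(start + i) % len(data_nodes)] for i in range(replication)
        ]
    return placement


def surviving_blocks(placement, failed_node):
    """Blocks that still have at least one replica after `failed_node` is lost."""
    return {b for b, nodes in placement.items() if any(n != failed_node for n in nodes)}


if __name__ == "__main__":
    MB = 1024 * 1024
    blocks = split_into_blocks(300 * MB, 128 * MB)  # a hypothetical 300 MB file
    nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
    for block_id, replicas in place_replicas(len(blocks), nodes).items():
        print(block_id, replicas)
```

Because every block lives on three distinct nodes, losing any single Data Node still leaves copies of all blocks, which is what lets the file system re-replicate and "self-heal".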
SO, YOU HAVE DATA
And you want it to help you better
understand your business, customers and
marketplace.
THAT’S WHY YOU USE HADOOP
But extracting insights from that data may require you
to interface with systems outside of Hadoop.
And that isn't always easy…
Enterprises typically have multiple scheduling engines
to manage end-to-end business processes
Companies typically interface with multiple systems, such as
ERP (SAP, Oracle, etc.), databases, reporting tools, point-of-sale systems,
and social media files, in addition to Hadoop.
As a result, enterprises use multiple tools to manage
their workload automation needs.
Visualizing end-to-end business workflows
and managing dependencies across Hadoop
and non-Hadoop systems is not always easy.
Challenges
Multiple schedulers needed to run traditional jobs and Hadoop jobs
• Hadoop jobs may not integrate into existing workflows
Heterogeneous environment and tools
• Team productivity, experience, knowledge
• Placing workloads: "right place, right time"
Slow responsiveness to the business
• No central location to monitor end-to-end workflows
BIG DATA MADE EASY with CA Workload Automation Advanced Integration for Hadoop
• Drag and drop Hadoop jobs into existing workflows.
• Monitor traditional and Hadoop jobs from a single console.
• Detect problems early and resolve them quickly.
• Set up automatic alerts for critical events.

• Unified visibility into your heterogeneous and Hadoop environments
• Improved performance and uptime through proactive monitoring and alerts
• Lower costs by eliminating the complexity of disconnected monitoring tools
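Automatic alerts for critical events usually come down to rules evaluated against job status. The sketch below is hypothetical: the `JobRun` record, its fields, and the two rules (job failure and an exceeded maximum run time) are illustrative assumptions, not the CA Workload Automation alert configuration or API. It shows the benefit of a single console: one rule set covers Hadoop and non-Hadoop jobs alike.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobRun:
    """Hypothetical job status record collected from any platform."""
    name: str
    platform: str                # e.g. "hadoop" or "sap"
    status: str                  # "SUCCESS", "FAILURE", or "RUNNING"
    runtime_min: float
    max_runtime_min: Optional[float] = None  # no run-time limit when None


def critical_alerts(runs: List[JobRun]) -> List[str]:
    """Return alert messages for failed jobs and jobs exceeding their run-time limit."""
    alerts = []
    for run in runs:
        if run.status == "FAILURE":
            alerts.append(f"{run.platform}:{run.name} failed")
        elif run.max_runtime_min is not None and run.runtime_min > run.max_runtime_min:
            alerts.append(f"{run.platform}:{run.name} exceeded {run.max_runtime_min:g} min")
    return alerts


if __name__ == "__main__":
    runs = [
        JobRun("sqoop_import", "hadoop", "FAILURE", 5),
        JobRun("sap_extract", "sap", "RUNNING", 95, 60),
        JobRun("pig_analytics", "hadoop", "SUCCESS", 10, 30),
    ]
    for message in critical_alerts(runs):
        print(message)
```

In practice the alert messages would be routed to email, a ticketing system, or a monitoring console rather than printed.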
CA Workload Automation extends scheduling for Big Data
Retail Customer Analytics in the Application Economy
Workflow: Extract and move input files → Transform and process input → Run specialized ERP jobs to extract data → Ingest data in DW → Batch ingestion into Hadoop and batch analytics → Load results into NoSQL for interactive queries
Workloads:
• FTP job: receive POS data
• Database job: extract pricing data when a database trigger occurs
• Informatica job: parse POS files
• Informatica job: run SQL query to match customer data in SAP
• SAP job: run SAP extract job to extract inventory data
• Teradata job: run ETL to merge input files into the DW
• Sqoop job: run Sqoop jobs to copy data into the Hadoop cluster
• Pig job: run Pig jobs for operational analytics
• NoSQL job: interactive search job to run dynamic promotions
Use cases:
• Extract POS, inventory, and price data
• Mine customer information and inventory information from ERP
• Perform batch aggregation and machine learning for promotion analytics
• Load data into NoSQL and render dynamic discounting on demand
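The retail workflow is essentially a dependency graph that spans Hadoop and non-Hadoop jobs. The following Python sketch uses the standard-library `graphlib` module (Python 3.9+) to group the jobs into waves that could run in parallel. The job names and the exact dependency edges are assumptions made for illustration; the slide lists the jobs but not every dependency, and this is not how CA Workload Automation defines jobs.

```python
from graphlib import TopologicalSorter

# Hypothetical job names and dependencies modelled on the retail example;
# each key lists the jobs that must finish before it can start.
RETAIL_PIPELINE = {
    "ftp_receive_pos": set(),
    "db_extract_pricing": set(),
    "sap_extract_inventory": set(),
    "informatica_parse_pos": {"ftp_receive_pos"},
    "informatica_match_customers": {"sap_extract_inventory"},
    "teradata_merge_dw": {
        "informatica_parse_pos",
        "db_extract_pricing",
        "informatica_match_customers",
    },
    "sqoop_import_to_hadoop": {"teradata_merge_dw"},
    "pig_operational_analytics": {"sqoop_import_to_hadoop"},
    "nosql_dynamic_promotion": {"pig_operational_analytics"},
}


def execution_waves(graph):
    """Group jobs into waves; jobs in the same wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # assume every job in the wave succeeded
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(RETAIL_PIPELINE), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

Keeping every job in one graph is what allows a Hadoop job such as the Sqoop import to wait on a Teradata ETL job; with separate schedulers, that cross-system dependency would have to be coordinated by hand.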
Roadmap: CA WLA Advanced Integration for Hadoop (timeline as of November 2015)
DELIVERED: CA WLA Advanced Integration for Hadoop R12.0 (currently GA)
• Reduce silos and increase efficiency with centralized scheduling of Hadoop and non-Hadoop workloads through support for:
– HDFS operations
– Pig
– Hive
– Sqoop
– Oozie workflows
• Integrate Hadoop and heterogeneous environments through support for the Cloudera and Hortonworks distributions of Hadoop
PLANNED (R12.X) / UNDER CONSIDERATION (RX.X): CA WLA Advanced Integration for Hadoop
• Unify additional Hadoop environments through support for other distributions, such as IBM BigInsights, MapR, and Greenplum
• Extend security through native support for Kerberos authentication
• Centralize scheduling for the evolving Hadoop ecosystem with integrations to Spark, HBase, Flume, Kafka, Tajo, etc.
The Evolving Hadoop Ecosystem
• Mahout, R: data mining/machine learning tools used against Hadoop data to detect patterns and trends
• Pig: scripting language for analyzing large datasets; compiles to MapReduce jobs
• MapReduce, YARN: programming model for processing large data sets; YARN performs overall resource management
• Oozie: workflow scheduler for managing Hadoop MapReduce jobs
• Sqoop, Hive, Drill: enable SQL for Hadoop data. Sqoop transfers data between Hadoop and structured datastores; Hive is a data warehouse for Hadoop; Drill is an open-source, low-latency SQL query engine for Hadoop and NoSQL.
• ZooKeeper: coordination of configuration data, naming, and synchronization for Hadoop projects
• Bigtop: packaging services for Hadoop projects to ease testing and deployment
• HBase: a non-relational, distributed database that runs on top of HDFS
• Thrift, Avro: schema-based data serialization systems using RPC calls
• Solr, Nutch, Elasticsearch: indexing and search tools for data stored in HDFS
• Kafka, Flume: collect, aggregate, and move streaming data from multiple sources into Hadoop
• Spark: application development framework for Hadoop apps combining batch, streaming, and interactive analytics
• Ambari, Chukwa: monitoring and management of Hadoop clusters and nodes
Influencing Our Roadmap
Winning with CA
Take the opportunity to influence our product development. Help ensure that what we deliver is what you need and want.
CA Communities Ideation
• Submit your ideas on communities.ca.com
• Vote and comment on ideas that are important to you
• CA Product Management reviews ideas and updates their status as they move through the lifecycle
• "Currently Planned" idea status indicates inclusion in the Agile backlog or product roadmap
Customer Validation
• Register to participate in:
– Live demos/end-of-sprint reviews
– Private, members-only online community
– Pre-release onsite testing and support (beta)
– Upgrade support from the SWAT team
• How to register: https://validate.ca.com
Customer Validation
DEMO
CA Workload Automation Advanced Integration for Hadoop
Q & A
For More Information
To learn more, please visit:
http://cainc.to/Nv2VOe
CA World ’15
