DEEP DIVE
Building a Streaming-Enabled Architecture
www.dmradio.biz
Featured Speakers
Everything Is In Flux
• Hardware (network, storage, servers)
• Data Sources
• Data Staging
• Data Volumes
• Data Flow
• Data Governance
• Data Usage
• Data Structures
• Schema Definition
• Ingest Speeds
• Data Workloads
The Impact of Parallelism
We used to see roughly 10x performance improvement every six years; with parallelism we now regularly see gains closer to 1000x (an approximation, but the right order of magnitude).
You Can’t Build Skyscrapers with Bricks & Mortar
A Renaissance in Data Engineering Is Underway
- Web giants innovated to solve their own challenges
- Facebook, Google, LinkedIn, Yahoo! and others…
- By open-sourcing software, these companies
changed how the industry operates, how tech is built
- The result is a new world of scale-out software
- Innovations span the spectrum of functionality:
database, analytics, networking, data flow, security
- Paramount among these is streaming data and its supporting tools
Competition Demands Digital Transformation
Adopted by the EU, but affects the USA
Behemoths are straws in the wind
- They identified huge opportunities
- Upended entire industries
- Built bulletproof infrastructure
- Deconstructed business processes
- Re-architected processes at scale
- Instilled a data-driven approach
The Great Democratizer
Stream First Architecture
Technical Concepts
Streaming as Primary Method
• When you must: at or near real time, with partial updates and low cycle time
• Events: financial fraud, stock trading, high-frequency sensor-to-controller loops (autonomous vehicles)
• Many different items co-mingled (Internet of Things)
• Older examples: Internet packets; digital sensors; high-density computer disks
• Boosts business: faster awareness leads to faster response, which leads to improved business (consumer activity monitoring)
• Adds flexibility and lowers Total Cost of Ownership (only if several conditions hold)
• Avoids committee-itis
• Faster process-analyze-change cycles
• Allows personnel to address more topics in the same time period
[Diagram: multiple input streams flow in parallel through Processing Logic blocks; results go either straight through to end users or into an Accumulator first.]
Streams are processed in parallel across multiple inputs; streaming means processing occurs as the data flows through, either straight through or accumulated (see the sketch below).
• Timing: real-time, near real-time, non-time
• Iterative
• Correlation
• Referencing
• Coding
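To make the straight-through vs. accumulate distinction concrete, here is a minimal, framework-free Python sketch. The function and parameter names (straight_through, window_seconds) are illustrative, not from the slides: one path emits a result per record as it arrives; the other accumulates records into a tumbling window before emitting.

```python
import time
from collections import defaultdict

def straight_through(record):
    # Process and emit immediately: one input record, one output action.
    return {"user": record["user"], "amount": record["amount"]}

class TumblingAccumulator:
    """Accumulate records per key and emit totals every `window_seconds`."""
    def __init__(self, window_seconds=5):
        self.window_seconds = window_seconds
        self.window_start = time.monotonic()
        self.totals = defaultdict(float)

    def add(self, record):
        self.totals[record["user"]] += record["amount"]
        if time.monotonic() - self.window_start >= self.window_seconds:
            emitted = dict(self.totals)          # emit accumulated state
            self.totals.clear()
            self.window_start = time.monotonic()
            return emitted
        return None                              # still accumulating

# Usage: feed both paths from the same stream of records.
acc = TumblingAccumulator(window_seconds=5)
for rec in [{"user": "a", "amount": 3.0}, {"user": "b", "amount": 1.5}]:
    print(straight_through(rec))
    window = acc.add(rec)
    if window:
        print("window totals:", window)
```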
Technical Issues: High Level
• Latency: built-in delay caused by many factors. Are you able and willing to invest in finding and removing them? Doing so can carry significant cost, and your use case has inherent latency of its own.
• Order misalignment: data arrives or is produced out of order (see the sketch after this list)
• Errors: detect and correct (don't underestimate this). Embedded data quality problems. Processing logic flaws. How to apply surgical updates.
• Power usage: higher power use per compute period (mobile)
• Storage space: parallel processing and streaming look-ahead/look-back require multiple copies; content management procedures become very important.
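The order-misalignment bullet in practice: a plain-Python sketch (the allowed_lateness parameter and the event format are assumptions) that buffers events by event time and releases them in order once they are older than a lateness window, flagging anything that arrives later than that for separate handling.

```python
import heapq

def reorder(events, allowed_lateness=5):
    """Yield events in event-time order, tolerating out-of-order arrival
    within `allowed_lateness` time units; flag anything later than that."""
    buffer, watermark = [], None
    for ev in events:                       # ev = (event_time, payload)
        event_time = ev[0]
        watermark = event_time if watermark is None else max(watermark, event_time)
        if event_time < watermark - allowed_lateness:
            yield ("LATE", ev)              # arrived too late; handle separately
            continue
        heapq.heappush(buffer, ev)
        # Release everything old enough that no earlier event can still arrive.
        while buffer and buffer[0][0] <= watermark - allowed_lateness:
            yield ("OK", heapq.heappop(buffer))
    while buffer:                           # flush remaining events at end of stream
        yield ("OK", heapq.heappop(buffer))

for status, ev in reorder([(1, "a"), (3, "b"), (2, "c"), (10, "d"), (4, "e")]):
    print(status, ev)
```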
Technical Issues: Low Level
• Requires deep expertise (usually hard to find)
• Memory: must contain all data needed per calculation, including lookup codes, cross-references and accumulation arrays (see the sketch after this list)
• Memory management is usually weak in high-level languages like Java and C#
• Cross-stream data exchange: how to truly separate data for full independence (the age-old parallel computing challenge), or how to store and forward across streams (what is needed, how much, how long)
• Dependencies: state management, server health
• Latency: network slowdowns, security handshakes (e.g., TLS), database freezes, file contention, cluster I/O
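A minimal illustration of the memory bullet: reference data kept resident per stream task, bounded with an LRU policy so memory use stays predictable. All names and sizes are illustrative.

```python
from collections import OrderedDict

class BoundedLookup:
    """In-memory reference table with an LRU bound so a long-running
    stream task cannot grow its lookup state without limit."""
    def __init__(self, max_entries=100_000):
        self.max_entries = max_entries
        self.entries = OrderedDict()

    def put(self, code, description):
        self.entries[code] = description
        self.entries.move_to_end(code)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)   # evict least recently used

    def get(self, code):
        if code in self.entries:
            self.entries.move_to_end(code)
            return self.entries[code]
        return None                            # miss: fetch/refresh out of band

# Enrich a stream of records with the in-memory reference data.
lookup = BoundedLookup(max_entries=2)
lookup.put("US", "United States")
lookup.put("DE", "Germany")
for rec in [{"country": "US", "amount": 10}, {"country": "FR", "amount": 5}]:
    rec["country_name"] = lookup.get(rec["country"]) or "UNKNOWN"
    print(rec)
```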
Buyer Advice: Look for the Special Things
• In-memory: of course, but especially look for engineering around known state-of-the-art problems like heap space, garbage collection and swapping
• Integration: both done for you (framework) and well-documented APIs in standard programming languages
• Error reporting: this is critical. Without it your personnel cost will rise significantly, because you will (really!) experience many crashes. You need meaningful error messages that point to the actual problem.
• Demonstrated knowledge and work on the next generation: expanded memory spaces; integrated memory-compute.
• Shows end-to-end use cases more complicated than yours (since yours will quickly become more complicated than you expect).
September 2018
Kevin Petrie
Sr Director, Product Marketing
ADOPTING A STREAMING-ENABLED ARCHITECTURE
ATTUNITY
MODERN DATA ARCHITECTURES
DATA WAREHOUSE | BATCH | ON PREM → DATA LAKES | STREAMING | CLOUD
#1 provider of change
data capture (CDC)
Support most sources with
best performance and
least impact
#1 cloud database
migration technology
Already moved over
80,000 databases
to Cloud platforms
like AWS
#1 in ease of use
Designed to accelerate
deployments by data
architects and
DBAs instead of
developers
The leading platform for delivering data efficiently and in
real-time to data lake, streaming and cloud architectures
ATTUNITY
THE ATTUNITY PLATFORM
ATTUNITY REPLICATE: Automated, universal and real-time data delivery
ATTUNITY COMPOSE: Accelerate creation of analytics-ready data structures
ATTUNITY ENTERPRISE MANAGER: Intelligent management, metadata & control
DATA DELIVERY WITH ATTUNITY REPLICATE
Simplified: Pre-packaged automation of complex tasks | Modern user experience | Zero source footprint
Real-Time: Change data capture (CDC) | Stream live updates | Optimized for high-performance movement
Universal: All major platforms (DB | DW | Hadoop | Legacy | On Premises | Cloud | SAP | Mainframe)
ATTUNITY ENTERPRISE MANAGER
Operations: Configure, execute and monitor data flows | Multiple data centers | On premises and cloud | Scale to 1,000s of tasks
Analytics: Historical and real-time reporting | Capacity planning | Performance monitoring | Visually optimize operations
Control: Create and discover operational metadata | Microservices integration via .NET and REST APIs
WHAT IT IS
ESSENTIAL CHARACTERISTICS OF MODERN DATA STREAMING
Producers and consumers are independent
Communication is asynchronous
Records are persisted for future use
High throughput – e.g., records/second
Records are sent in re-playable, ordered sequence
Geo-distributed replication
Fault tolerance
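Two of these characteristics, persistence and a re-playable ordered sequence, can be seen directly from a client. A minimal sketch with the kafka-python client (broker address and topic name are assumptions): assign a partition explicitly and seek back to the beginning to re-read the same ordered records any new consumer would see.

```python
from kafka import KafkaConsumer, TopicPartition

# Hypothetical broker and topic; any persisted topic can be re-read this way.
consumer = KafkaConsumer(bootstrap_servers="localhost:9092",
                         enable_auto_commit=False,
                         consumer_timeout_ms=5000)   # stop if idle for 5 seconds
partition = TopicPartition("events", 0)
consumer.assign([partition])
consumer.seek_to_beginning(partition)   # replay from the first retained record

for message in consumer:
    # Offsets increase monotonically within a partition: an ordered replay.
    print(message.offset, message.value)
```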
WHAT IT IS
APACHE KAFKA
Open source distributed streaming platform for moving, storing and processing
high volumes of data in real time
Developed by Jay Kreps and colleagues to process many continuous data flows at
LinkedIn
More scalable, more fault-tolerant and higher-performance than traditional
message-oriented middleware
Used for building real-time streaming data pipelines and streaming applications
WHAT IT IS
KAFKA VS. ENTERPRISE SERVICE BUS PREDECESSORS
ActiveMQ, RabbitMQ and IBM MQSeries
Centralized, highly scalable cluster to serve all
applications across large enterprise environments
Persistent storage system for configurable time periods,
including forever
Abstracted stream processing to easily create derived
streams and datasets with minimal coding
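The point about persistent storage for configurable periods, including forever, maps directly onto topic configuration. A sketch using kafka-python's admin client (broker address, topic name and settings are assumptions); retention.ms set to -1 retains records indefinitely.

```python
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")   # hypothetical broker

# Create a topic whose records are never deleted by time-based retention.
audit_log = NewTopic(
    name="audit-log",
    num_partitions=3,
    replication_factor=3,
    topic_configs={"retention.ms": "-1"},   # -1 = retain forever
)
admin.create_topics([audit_log])
admin.close()
```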
WHY DATA STREAMING MATTERS
“LIFE DOESN’T HAPPEN IN BATCHES”
Data streams enable businesses to react to
events as they happen
Streaming data improves efficiency and
scalability
Can be used for multiple purposes, by
multiple users
HOW IT WORKS
HIGH LEVEL ARCHITECTURE
Producers send records to brokers to be
read by consumers
Broker persists records to file system on
disk for subsequent usage
Records are grouped into topics for
selected consumer use
Topics can be partitioned to improve
throughput and redundancy
[Diagram: three producers send records to a Kafka broker, which persists them to disk and serves three consumers.]
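A minimal kafka-python sketch of the flow in the diagram above: a producer sends records to the broker, which persists them and serves consumers subscribed to the topic. Broker address, topic and group names are assumptions.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

BROKER = "localhost:9092"   # hypothetical broker address
TOPIC = "orders"            # hypothetical topic

# Producer side: publish records to the broker.
producer = KafkaProducer(bootstrap_servers=BROKER,
                         value_serializer=lambda v: json.dumps(v).encode("utf-8"))
producer.send(TOPIC, {"order_id": 42, "status": "created"})
producer.flush()            # make sure the record reaches the broker

# Consumer side: read records the broker has persisted for this topic.
consumer = KafkaConsumer(TOPIC,
                         bootstrap_servers=BROKER,
                         group_id="order-readers",
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=5000,
                         value_deserializer=lambda b: json.loads(b.decode("utf-8")))
for message in consumer:
    print(message.topic, message.partition, message.offset, message.value)
```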
HOW IT WORKS
KEY COMPONENTS
RECORD (a.k.a. message): unit of data, similar to a DB row or record
PRODUCER (a.k.a. publisher/writer): process that creates and publishes records
CONSUMER (a.k.a. subscriber/reader): process that reads records
BROKER: Kafka instance that receives records from producers, persists them, and provides them to consumers
CLUSTER: group of two or more brokers that provide redundancy and scalability
TOPIC (a.k.a. stream): category of record to which a consumer subscribes
PARTITION: subset of a topic created to enable redundancy and parallel reading/writing for higher performance
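Cluster, topic and partition come together when a topic is created: partitions enable parallel reads and writes, and a replication factor greater than one spreads copies across brokers for redundancy. A sketch with kafka-python's admin client; the cluster address, topic name and counts are assumptions.

```python
from kafka.admin import KafkaAdminClient, NewTopic, NewPartitions

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")   # hypothetical cluster

# A topic split into 6 partitions, each replicated on 3 brokers:
# partitions give parallel reads/writes, replicas give redundancy.
admin.create_topics([NewTopic(name="clickstream",
                              num_partitions=6,
                              replication_factor=3)])

# Later, scale read/write parallelism by growing the partition count.
admin.create_partitions({"clickstream": NewPartitions(total_count=12)})
admin.close()
```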
HOW IT WORKS
DETAILED ARCHITECTURE
[Diagram: Producer(s) create records with a serialized key and value and identify the write partition for the topic; each record lands on the partition leader at one broker and is copied to a partition replica on another broker; Consumer(s) deserialize the record to recover the original key and value.]
*Adapted from Enabling Streaming Architectures for Continuous Data and Events with Kafka; Gartner; TJ Craig, Gary Oliffe, Soyeb Barot; 23 May 2018
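The serialize-on-produce, deserialize-on-consume step from the diagram, using keyed records so that all records sharing a key hash to the same partition and stay in order. Broker, topic and field names are assumptions.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

BROKER, TOPIC = "localhost:9092", "customer-updates"   # hypothetical names

# Producer serializes the key and value; records sharing a key hash to the
# same partition, so updates for one customer stay ordered.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, key="customer-17", value={"email": "new@example.com"})
producer.flush()

# Consumer deserializes each record back into the original key and value.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,
    key_deserializer=lambda k: k.decode("utf-8"),
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for msg in consumer:
    print(msg.partition, msg.key, msg.value)
```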
HOW IT WORKS
DETAILED ARCHITECTURE WITH CHANGE DATA CAPTURE
[Diagram: the same producer-to-broker-to-consumer flow as above (serialized key and value, partition leader and replica), with change data capture feeding the producer. CDC eliminates manual scripting to configure record creation from source database transactions.]
*Adapted from Enabling Streaming Architectures for Continuous Data and Events with Kafka; Gartner; TJ Craig, Gary Oliffe, Soyeb Barot; 23 May 2018
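What a CDC tool such as Attunity Replicate automates is exactly this record creation from source database transactions. The sketch below publishes a generic change-record shape (operation, before/after images, commit metadata) to a Kafka topic; the field names and topic are illustrative, not Attunity's actual output format.

```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                     # hypothetical broker
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# A generic change record derived from a source transaction log entry.
change = {
    "op": "UPDATE",                                         # INSERT | UPDATE | DELETE
    "table": "SALES.ORDERS",
    "before": {"order_id": 42, "status": "created"},
    "after":  {"order_id": 42, "status": "shipped"},
    "commit_ts": "2018-09-01T12:00:00Z",
    "transaction_id": "0000017-ab2",
}

# Key by primary key so all changes to one row land in one partition, in order.
producer.send("cdc.sales.orders", key="42", value=change)
producer.flush()
```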
ATTUNITY AND DATA STREAMING
DATABASE AS A STREAM: Generate real-time events | Multi topic, multi partition | One-to-many event publication
UNIVERSAL STREAMING: Amazon Kinesis | Azure Event Hub
METADATA: Schema evolution; easy schema registry integration
USE CASES
• Streaming ingestion
• Message/event broker
• Preprocessing for machine learning
• Event stream processing
• Data persistence
• Real-time analytic processing
• Microservices
KAFKA
STREAMING INGESTION AND MESSAGE BROKER
FORTUNE 100 FOOD PROCESSOR
PROBLEM: Needed real-time view of production capacity and customer orders. Nightly batch loads couldn't keep up => fulfilment delays, inaccurate reports.
SOLUTION: CDC to HDP data lake. Attunity Replicate feeds HDFS and HBase for timely reporting and product delivery.
SAP ECC (10 tables: purchase orders, production plans) → Attunity Replicate (log-based CDC) → HDP data lake
MICROSERVICES
LEADING ASSET MANAGEMENT FIRM ($1 trillion assets under management)
PROBLEM: Need to efficiently roll out an extensive cloud-based microservices platform. Must minimize latency and security risk while synchronizing massive transactional updates globally.
SOLUTION: Copies live transactions without touching production and securely transfers them for client usage on a global AWS microservices platform.
DB2 z/OS (on prem) → Attunity Replicate (CDC) → AWS cloud: Kinesis, DynamoDB and DynamoDB Streams, a microservice hub on RDS, and RDS instances serving EMEA and APJ customers
EVENT STREAM PROCESSING
FORTUNE 100 PHARMACEUTICAL FIRM
PROBLEM: Needed efficient, scalable delivery of clinical data for analytics. Lacked tools for low-impact data capture.
SOLUTION: CDC to Kafka to a Lambda architecture. Multi-pronged analysis of clinical data at scale. Minimal administrative burden; no PROD impact.
Clinical systems → Attunity Replicate (log-based CDC) → Kafka → Lambda architecture: batch historical data plus real-time updates feed stream processing, structured analysis, graph analysis, natural language processing and machine learning.
STREAMING INGESTION
FORTUNE 100 HEALTH BENEFITS FIRM ($100 billion Fortune 100 company)
PROBLEM: Needed a 360-degree customer view for CSAT initiatives. Inefficient OGG solution. Lack of data consistency and standardization.
ATTUNITY SOLUTION: Improved ease of use. Consistent, 100% automated processes across end points. Reduced impact on mainframe production. Improved performance to Kafka. Better TCO.
DB2 z/OS, SQL Server, Oracle → Attunity Replicate (log-based CDC) → Kafka → HDP data lake
MACHINE LEARNING PRE-PROCESSING
LEADING PAYMENT PROCESSOR
[Diagram: a bank/merchant gateway sends transactions to the application and database, and the openscoring.io decision service engine returns credit check and authentication decisions in under 100 ms (real-time decisions). Change data capture delivers transaction data in seconds; decisions are logged in real time, and data is delivered in minutes to machine learning, where it is analyzed over days; decision models are published with new insights, and decision performance is monitored for measurement and tuning.]
VENDOR OF CHOICE
Overall Rating | Product Capabilities | Ease of Deployment
4.5 out of 5   | 4.4                  | 4.2
4.1 out of 5   | 4.2                  | 3.7
4.2 out of 5   | 4.3                  | 4.0
4.1 out of 5   | 4.3                  | 4.0
"Replicate has been working great for several years; implementation was a breeze." - DBA, Retail Industry
"Great vendor to work with and an incredibly easy tool to use." - Senior Member of Technical Staff, Communications Industry
PARTNER OF CHOICE
• Trusted by Microsoft with 3 OEMs, bundled inside SQL Server
• Trusted by Amazon (AWS) with strategic partnership for cloud database migration
• Trusted by IBM and Oracle with respective OEMs of Attunity technology
• Trusted by Teradata and HP as resellers for data warehouse and analytics
• Trusted by global system integrators
• Trusted by over 2,000 customers for commitment, flexibility and speed
• Trusted by SAP as certified solution in use with over 200 SAP customers
• Trusted by big data leaders for data lake solutions
2000 CUSTOMERS AND HALF THE FORTUNE 100
Financial Services | Manufacturing/Industrial | Health Care | Government | Technology/Telecom | Retail | Other Industries
Thank you
attunity.com
Kevin Petrie
Kevin.Petrie@Attunity.com