Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs | Sheikh Araf and Ameya Panse, Goldman Sachs

Monitoring and Resiliency Testing our Apache Kafka Clusters
Ameya Panse, Associate, Goldman Sachs
Sheikh Araf, Associate, Goldman Sachs

We’re helping clients build a treasury of the future and powering software partners to enhance their offerings.
Differentiated Platform

Apache Kafka Backbone
[Figure: an Apache Kafka cluster at the center, connecting the Payment Service, Ledger, Reporting, Validation, Data Platform, Payment Rails, and CRM systems]

Two Sides to Monitoring Kafka Infrastructure
[Figure: the Kafka cluster on one side; on the other, its clients (producers, consumers, stream processors, and connectors) linking applications and databases to the cluster]

Kafka Brokers
• CPU Usage
• Disk Usage
• Network Tx Packets
• Network Rx Packets
• Leader Count
• Under-Replicated Partition Count
Kafka Clients
• Producer Error Rate
• Active Consumer Connections
• Producer Retry Rate
• Consumer Connection Close Rate
• Produce & Consume Latency
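The broker and client metrics above lend themselves to simple threshold rules. The following is a minimal sketch, assuming a snapshot of current values keyed by hypothetical metric names; the thresholds are illustrative only and are not the authors' alert settings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    metric: str                        # hypothetical metric key
    breached: Callable[[float], bool]  # True when the value should raise an alert
    description: str


# Illustrative thresholds for a subset of the metrics listed above.
RULES: List[Rule] = [
    Rule("broker.cpu_usage_pct", lambda v: v > 80.0, "broker CPU above 80%"),
    Rule("broker.disk_usage_pct", lambda v: v > 85.0, "broker disk above 85%"),
    Rule("broker.under_replicated_partitions", lambda v: v > 0, "partitions are under-replicated"),
    Rule("producer.record_error_rate", lambda v: v > 0, "producer is failing to send records"),
    Rule("producer.record_retry_rate", lambda v: v > 1.0, "producer retries above 1/s"),
    Rule("consumer.connection_close_rate", lambda v: v > 0.5, "consumers are dropping connections"),
]


def evaluate(snapshot: Dict[str, float], rules: List[Rule] = RULES) -> List[str]:
    """Return one message per rule whose metric is present in the snapshot and breached."""
    alerts = []
    for rule in rules:
        value = snapshot.get(rule.metric)
        if value is not None and rule.breached(value):
            alerts.append(f"{rule.metric}={value}: {rule.description}")
    return alerts
```

For example, `evaluate({"broker.cpu_usage_pct": 45.0, "broker.under_replicated_partitions": 3})` yields a single under-replication alert. Metrics such as leader count or network packets are usually more useful as trends or for balance checks than as fixed thresholds.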

Monitoring the Clients

Collecting Client Metrics
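[Figure: on containers, a JMX agent sidecar runs next to the application container; on virtual machines, a JMX agent process runs next to the application process. In both cases the agent sends metrics to the Datadog backend, which feeds Datadog dashboards and PagerDuty alerts]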
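What the agent does with each Kafka client MBean can be pictured as translating a JMX object name and attribute into a tagged metric point. The MBean names used here follow Kafka's client JMX naming (for example `kafka.producer:type=producer-metrics,client-id=...` with attributes such as `record-error-rate`); the output metric naming and `key:value` tag format are assumptions for illustration, not Datadog's actual convention.

```python
from typing import Dict, List, Tuple


def parse_object_name(object_name: str) -> Tuple[str, Dict[str, str]]:
    """Split a JMX ObjectName string into its domain and key properties.

    Simplified: assumes property values contain no quoted commas or colons.
    """
    domain, _, props = object_name.partition(":")
    properties: Dict[str, str] = {}
    for pair in props.split(","):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        properties[key.strip()] = value.strip()
    return domain, properties


def to_metric_point(object_name: str, attribute: str, value: float,
                    service: str) -> Tuple[str, float, List[str]]:
    """Turn one MBean attribute into (metric_name, value, tags).

    Hypothetical scheme: metric = domain + attribute with dashes replaced by
    underscores; every key property except 'type' becomes a tag.
    """
    domain, props = parse_object_name(object_name)
    metric = f"{domain}.{attribute}".replace("-", "_")
    tags = [f"service:{service}"]
    tags += [f"{k}:{v}" for k, v in sorted(props.items()) if k != "type"]
    return metric, value, tags
```

Tagging by `client-id` and service keeps per-application dashboards and alerts possible even when many clients report the same attribute names.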

Frictionless onboarding
module "my-app" {
  source = "..."
  ...
  kafka_app_name = "my-service"
}
main.tf
Behind the scenes:
• Add a JMX agent sidecar
• Configure the agent to collect Kafka client metrics
• Configure the agent to authenticate and push metrics to the dashboard
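The module's behind-the-scenes work can be thought of as rendering an agent configuration from the single `kafka_app_name` input. The sketch below is hypothetical: the function name, the dictionary layout (which only loosely resembles a JMX-agent check configuration), the default JMX port, and the credential reference are all illustrative assumptions, not the module's real output.

```python
from typing import Any, Dict

# Kafka client MBean domains the agent should collect (producer, consumer, Streams, Connect).
KAFKA_CLIENT_DOMAINS = ("kafka.producer", "kafka.consumer", "kafka.streams", "kafka.connect")


def render_sidecar_config(kafka_app_name: str, jmx_port: int = 9999) -> Dict[str, Any]:
    """Build a hypothetical JMX agent config for one onboarded application."""
    if not kafka_app_name:
        raise ValueError("kafka_app_name must be non-empty")
    return {
        # Collect every MBean in the Kafka client domains.
        "include": [{"domain": d} for d in KAFKA_CLIENT_DOMAINS],
        # Assumes a pod-style deployment where the sidecar reaches the app's JMX port on localhost.
        "instances": [{
            "host": "localhost",
            "port": jmx_port,
            "tags": [f"kafka_app_name:{kafka_app_name}"],
        }],
        # Hypothetical reference to the secret the agent uses to authenticate and push metrics.
        "api_key_secret_ref": f"{kafka_app_name}-metrics-api-key",
    }
```

The design point is that application teams supply one name and the platform derives everything else, which is what makes onboarding frictionless and keeps monitoring coverage high.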

Monitoring TLS Certificate Expiry
<dependency>
  <groupId>com.gs.txb.security</groupId>
  <artifactId>certificate-info-endpoint</artifactId>
  <version>1.0.0</version>
</dependency>
pom.xml
management.server.port=PORT
management.server.keystore=KeystoreLocation
application.properties
[Figure: in container deployments, the JMX agent sidecar polls the application container's /certs endpoint and reports the certificate expiry as a metric]
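The polling step could look roughly like this sketch. It assumes, hypothetically, that `/certs` returns a JSON list of entries with an `alias` and an OpenSSL-style `notAfter` string; the real response format of `certificate-info-endpoint` is not shown on the slide. Dates are parsed with the standard library's `ssl.cert_time_to_seconds`, and the days remaining per certificate is the value the sidecar would emit as a metric.

```python
import json
import ssl
import time
from typing import Dict, List, Optional

SECONDS_PER_DAY = 86_400


def days_until_expiry(payload: str, now: Optional[float] = None) -> Dict[str, float]:
    """Map each certificate alias to the days left before its notAfter date.

    `payload` is a hypothetical /certs response such as:
    [{"alias": "kafka-client", "notAfter": "Jun  1 12:00:00 2030 GMT"}]
    """
    now = time.time() if now is None else now
    result: Dict[str, float] = {}
    for entry in json.loads(payload):
        expires_at = ssl.cert_time_to_seconds(entry["notAfter"])  # epoch seconds, UTC
        result[entry["alias"]] = (expires_at - now) / SECONDS_PER_DAY
    return result


def expiring_soon(days_left: Dict[str, float], threshold_days: float = 30) -> List[str]:
    """Aliases whose certificates expire within the threshold (or have already expired)."""
    return sorted(alias for alias, days in days_left.items() if days < threshold_days)
```

Emitting days remaining rather than a boolean lets the dashboard show a trend and lets the alert threshold (30 days here, an arbitrary choice) be tuned without redeploying the sidecar.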

Monitoring the Cluster

Synthetic Monitoring with Heartbeat App
• Broker Availability
• Cross-Zone Connectivity
• Monitor Support Services
• Real-Time Alerting
• End-to-End Monitoring

Synthetic Monitoring Algorithm
[Figure: a Kafka cluster of three brokers, each the leader for one of Partitions 1, 2, and 3; Producers 1 to 3 and Consumers 1 to 3, each container running a JMX sidecar, are spread across Data Centers 1, 2, and 3 and report to a dashboard]
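One reading of the algorithm: a heartbeat producer in each data center writes a timestamped message to every partition, and therefore to every broker acting as a leader, while a consumer in each data center reads all partitions. Any (producer data center, partition, consumer data center) path without a fresh heartbeat points at a broker or a cross-zone link. The sketch below checks that coverage on already-collected records; the record layout and the 30-second staleness limit are assumptions, and the Kafka I/O itself is left out.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterable, List, Tuple

# (producer data center, partition, consumer data center)
HeartbeatPath = Tuple[str, int, str]


@dataclass(frozen=True)
class Received:
    producer_dc: str   # data center of the heartbeat producer (carried in the message)
    partition: int     # partition the heartbeat was read from
    consumer_dc: str   # data center of the consumer that read it
    sent_ms: int       # producer timestamp embedded in the heartbeat
    received_ms: int   # consumer wall-clock time on receipt


def check_heartbeats(records: Iterable[Received], dcs: List[str], partitions: List[int],
                     now_ms: int, max_age_ms: int = 30_000
                     ) -> Tuple[List[HeartbeatPath], Dict[HeartbeatPath, int]]:
    """Return (paths with no fresh heartbeat, end-to-end latency of the newest heartbeat per healthy path)."""
    latest: Dict[HeartbeatPath, Received] = {}
    for r in records:
        path = (r.producer_dc, r.partition, r.consumer_dc)
        if path not in latest or r.sent_ms > latest[path].sent_ms:
            latest[path] = r
    missing: List[HeartbeatPath] = []
    latency: Dict[HeartbeatPath, int] = {}
    for path in product(dcs, partitions, dcs):
        r = latest.get(path)
        if r is None or now_ms - r.sent_ms > max_age_ms:
            missing.append(path)  # candidate for a real-time alert
        else:
            latency[path] = r.received_ms - r.sent_ms
    return missing, latency
```

Because every partition leader sits on a different broker, a missing path that repeats for one partition across all data centers implicates that broker, while one confined to a single producer/consumer data-center pair implicates cross-zone connectivity.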

Consolidating the Metrics
Dashboards and Alerts

Alerts

Putting it all together
Culture of Resiliency Game Days

Failure Scenarios
[Figure: three copies of a Kafka cluster spanning Zones 1 to 4, with two brokers per zone]
• Loss of one broker in one zone
• Loss of two brokers in one zone
• Loss of two brokers in different zones
Topic: Replication factor = 3
Cluster: Min in-sync replicas = 2
Producer: Acks = all
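Why the three scenarios differ can be worked out by enumeration. With replication factor 3, min in-sync replicas 2, and acks=all, a partition keeps accepting writes only while at least two replicas are alive and in sync; otherwise producers get NotEnoughReplicas errors. The sketch assumes brokers have `broker.rack` set to their zone, so Kafka's rack-aware assignment places each partition's three replicas in three distinct zones, and it treats every live replica as in sync. Broker IDs 1 to 8 are illustrative.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Set

# Layout from the slide: four zones with two brokers each (broker IDs are illustrative).
BROKER_ZONE: Dict[int, str] = {1: "z1", 2: "z1", 3: "z2", 4: "z2",
                               5: "z3", 6: "z3", 7: "z4", 8: "z4"}
REPLICATION_FACTOR = 3
MIN_INSYNC_REPLICAS = 2


def rack_aware_replica_sets(broker_zone: Dict[int, str], rf: int) -> List[FrozenSet[int]]:
    """Every replica set that places `rf` replicas in `rf` distinct zones."""
    by_zone: Dict[str, List[int]] = {}
    for broker, zone in sorted(broker_zone.items()):
        by_zone.setdefault(zone, []).append(broker)
    sets: List[FrozenSet[int]] = []
    for zones in combinations(sorted(by_zone), rf):
        for brokers in product(*(by_zone[z] for z in zones)):
            sets.append(frozenset(brokers))
    return sets


def unwritable(replica_sets: List[FrozenSet[int]], failed: Set[int],
               min_isr: int = MIN_INSYNC_REPLICAS) -> int:
    """Count replica sets left with fewer live replicas than min.insync.replicas.

    Producers using acks=all receive NotEnoughReplicas errors for such partitions.
    """
    return sum(1 for rs in replica_sets if len(rs - failed) < min_isr)
```

Usage, with the three failure scenarios:

```python
sets = rack_aware_replica_sets(BROKER_ZONE, REPLICATION_FACTOR)  # 32 possible placements
unwritable(sets, {1})     # one broker in one zone          -> 0
unwritable(sets, {1, 2})  # two brokers in one zone         -> 0
unwritable(sets, {1, 3})  # two brokers in different zones  -> 4
```

Losing a whole zone is safe because no partition has two replicas there, but losing one broker in each of two zones can take out two replicas of the same partition, leaving it below the min in-sync replica count.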

Game Days
[Figure: a stream of payment messages flows into the business services of the Transaction Banking stack, whose apps communicate over the Apache Kafka backbone]
• Track system health via Kafka dashboard
• Assert all payments are successful
• Ensure all applications recover automatically
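The "assert all payments are successful" step amounts to reconciling what the game-day driver sent with what came out the other end of the stack. This is a minimal sketch, assuming each payment carries a unique ID and that final outcomes can be collected into a mapping from ID to status; the `SETTLED` status value and the report fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class GameDayReport:
    missing: List[str] = field(default_factory=list)     # sent but never completed
    failed: List[str] = field(default_factory=list)      # completed with a non-success status
    unexpected: List[str] = field(default_factory=list)  # completed but never sent

    @property
    def passed(self) -> bool:
        return not (self.missing or self.failed or self.unexpected)


def reconcile(sent_ids: Iterable[str], outcomes: Dict[str, str],
              success: str = "SETTLED") -> GameDayReport:
    """Compare the payments injected during a game day with their final outcomes."""
    sent = set(sent_ids)
    report = GameDayReport()
    for pid in sorted(sent):
        status = outcomes.get(pid)
        if status is None:
            report.missing.append(pid)
        elif status != success:
            report.failed.append(pid)
    report.unexpected = sorted(set(outcomes) - sent)
    return report
```

Running the reconciliation only after the injected failure has been healed also checks the third goal: if applications did not recover automatically, their in-flight payments show up as missing.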

In Summary…
• Apache Kafka as backbone for processing payments
• Monitor cluster health with synthetic traffic
• Monitor clients using a JMX agent sidecar
• Simplify the onboarding process to improve monitoring coverage
• One-stop view to monitor the health of the infrastructure
• A culture of regular resiliency testing

Thank You
Ameya Panse
ameya.panse@gs.com
Sheikh Araf
sheikh.araf@gs.com
