Novel Multi-region Clusters 
Cassandra Deployments Split Between Heterogeneous Data Centres 
with NAT & DNS-SD 
#CassandraSummit
Adam Zegelin 
Co-founder & VP of Engineering 
www.instaclustr.com
adam@instaclustr.com 
@adamzegelin
Instaclustr 
• Instaclustr provides Cassandra-as-a-service in the cloud 
(Currently only on AWS — Google Cloud in private beta) 
• We currently manage 50+ Cassandra nodes for various customers 
• We often get requests to do cool things, and we try to make them happen!
Multi-DC @ Instaclustr 
• Cloud ⇄ cloud, “classic” internet-facing data centre ⇄ cloud 
• Works out-of-the-box today. 
• Requires per-node public IP 
• Private network clusters ⇄ Cloud clusters 
• Easy if your private network allocates per-node public IP addresses 
• VPNs 
• Something else?
• Overview of multi-region/data centre clusters
• What is supported out-of-the-box 
• Alternative solutions 
• Supporting technology overview (NAT/PAT and DNS-SD) 
• Implementation
Single Node 
• What you get from running apt-get install cassandra and /usr/bin/cassandra
• Fragile (no redundancy) 
• Dev/test/sandbox only 
[Diagram: a single C* node]
Multi-node, Single Data Centre 
• Two or more servers running 
Cassandra within one DC 
• Replication of data 
(redundancy) 
• Increased capacity (storage + 
throughput) 
• Baseline for production 
clusters 
[Diagram: three C* nodes in one data centre]
Multi-node, Multi-DC 
• Cassandra running in two or 
more data centres 
• Global deployments 
• Data near your customers 
(reduced latency) 
• Supported out-of-the-box 
[Diagram: nine C* nodes spread across multiple data centres]
Snitches 
• Understand data centres and racks
• Implementation may automatically determine node DC and rack
(Ec2MultiRegionSnitch uses the AWS internal metadata service, GossipingPropertyFileSnitch loads a .properties file)
• Node DC and rack are advertised via Gossip
• Determine node proximity (estimated link latency) 
• Cluster may use a combination of Snitch implementations
Data Centres 
• Collection of Racks 
• Complete replications 
• Geographically separate 
• Possibly high-latency interconnects 
(e.g. East Coast US → Sydney, ~300ms round-trip)
Racks 
• Collection of nodes 
• May fail as a single unit 
• Modelled on the traditional DC rack/cage 
(n servers running off a UPS)
☁️ 
• Amazon Web Services 
(use Ec2MultiRegionSnitch)
• Data Centre ≡ AWS Region 
(e.g. US_East_1, AP_SOUTHEAST_2) 
• Rack ≡ Availability Zone 
(e.g. us-east-1a, ap-southeast-2b) 
• Google Cloud Platform 
(no out-of-the-box auto-configuring snitch; use GossipingPropertyFileSnitch, or roll your own!)
• Data Centre ≡ GCP Region 
(e.g. US, Europe) 
• Rack ≡ Zone 
(e.g. us-central1-a, europe-west1-a)
Data Centre Aware 
• Cassandra is data centre aware 
• Only fetch data from a remote DC if absolutely required 
(remote data is more “expensive”) 
• Clients can be made data centre aware 
• If your app knows its DC, client will talk to the closest DC
Cluster cluster = Cluster.builder() 
.addContactPoint(…) 
.withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("US_EAST_1"))
.build();
Multi DC Support 
• Per-node public (internet-facing) IP address 
• Optionally, per-node private IP address 
• Per-node public address is used for inter-data centre connectivity 
• Per-node private address is used for intra-data centre connectivity
Multi DC Support 
• Cloud ⇄ cloud, traditional ⇄ cloud, traditional ⇄ traditional 
• Easy to set up per-node public and private addresses
• Private network clusters ⇄ Cloud clusters 
• Private networks: n public addresses, shared by x private addresses. Not 1 ↔ 1
(where often x > n)
• Done via Network Address Translation
IPv4 Address Space Exhaustion 
Source: http://www.potaroo.net/tools/ipv4/
Multi-DC Support 
• IPv4 
• Address exhaustion 
• Over time, will become more expensive to purchase addresses 
• Wasteful 
(being a good internet citizen)
Alternatives 
• IPv6 
• Java supports it ∴ Cassandra probably supports it 
(untested by us) 
• Global IPv6 adoption is ~4% 
(according to Google — google.com/intl/en/ipv6/statistics.html) 
• IPv6/IPv4 hybrid 
(Teredo, 6over4, et al.)
• AWS EC2 does not support IPv6. End of story. 
(Elastic Load Balancer does support IPv6)
Alternatives 
• VPNs 
• tinc, OpenVPN, etc. 
• All private address space — no dual addressing 
• Requires multiple links — between every DC and per client 
• Address space overlaps between multiple VPNs 
• Connectivity to multiple clusters an issue 
(for multi-cluster apps, centralised monitoring, etc)
Data Centres   Links
3              3
5              10
7              21
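
The link counts above match a full mesh, where every pair of data centres needs its own link, so n data centres need n(n - 1)/2 links. A minimal Java sketch of that arithmetic (the class and method names are illustrative and not from the talk):

```java
/** Illustrative helper: number of point-to-point VPN links in a full mesh. */
final class VpnMesh {
    private VpnMesh() {}

    /** Every pair of data centres needs one link: n choose 2 = n(n - 1) / 2. */
    static long fullMeshLinks(int dataCentres) {
        if (dataCentres < 0) {
            throw new IllegalArgumentException("dataCentres must be >= 0");
        }
        return (long) dataCentres * (dataCentres - 1) / 2;
    }

    public static void main(String[] args) {
        // Reproduces the three rows of the table: 3 -> 3, 5 -> 10, 7 -> 21.
        for (int n : new int[] {3, 5, 7}) {
            System.out.println(n + " data centres -> " + fullMeshLinks(n) + " links");
        }
    }
}
```

Per-client links come on top of this count, which is why the VPN approach grows unwieldy as data centres and clients are added.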
Alternatives 
• Network Address Translation (NAT) 
(aka IP Masquerading or Port Address Translation (PAT)) 
• Deployed on most private networks 
• Connectivity between private network clusters ⇄ Cloud clusters 
• Supports client connectivity to multiple clusters
NAT Basics 
• Re-maps IP address spaces 
(e.g. Public 96.31.81.80 ↔ Private 192.168.*.*) 
• n public addresses, shared by x private addresses. Not 1 ↔ 1
(where often n = 1, x > n)
• Port Address Translation 
• Private port ↔ Public port 
• Outbound connections only without port forwarding or NAT traversal 
• Per DC gateway device — performs NAT and port forwarding
NAT with Inbound Connections 
• Static port forwarding 
(configured on the gateway) 
• Automatic port forwarding — UPnP, NAT-PMP/PCP 
(configured by the application, e.g. Cassandra) 
• NAT Traversal — STUN, ICE, etc.
NAT + C*
Situation: n Cassandra nodes, 1 public address per data centre
• Port forward different public ports for each node 
• Advertise assigned ports 
• Modify Cassandra and client applications to connect to 
advertised ports
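
One way to picture the per-node port forwarding is as a table kept by each data centre's gateway. The sketch below is an in-memory model only, under stated assumptions: the class name, the sequential port allocation, and the example hostname are all hypothetical, and it does not configure any real NAT device.

```java
import java.net.InetSocketAddress;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of one gateway's port-forwarding table (not a real NAT API). */
final class PortForwardTable {
    private final String gatewayHost; // public hostname of the gateway (assumed)
    private final Map<InetSocketAddress, Integer> publicPortByPrivate = new LinkedHashMap<>();
    private int nextPublicPort;       // sequential allocation, chosen for the sketch

    PortForwardTable(String gatewayHost, int firstPublicPort) {
        this.gatewayHost = gatewayHost;
        this.nextPublicPort = firstPublicPort;
    }

    /** Returns the public port forwarded to a private endpoint, allocating one on first use. */
    synchronized int forward(InetSocketAddress privateEndpoint) {
        Integer existing = publicPortByPrivate.get(privateEndpoint);
        if (existing != null) {
            return existing;
        }
        int allocated = nextPublicPort++;
        publicPortByPrivate.put(privateEndpoint, allocated);
        return allocated;
    }

    /** The address a remote node or client would dial instead of the private endpoint. */
    synchronized InetSocketAddress publicEndpoint(InetSocketAddress privateEndpoint) {
        Integer port = publicPortByPrivate.get(privateEndpoint);
        if (port == null) {
            throw new IllegalStateException("no mapping for " + privateEndpoint);
        }
        // Unresolved on purpose: the sketch never performs a DNS lookup.
        return InetSocketAddress.createUnresolved(gatewayHost, port);
    }
}
```

Each node behind the same gateway ends up with the same public host but a different public port, which is exactly the information that then has to be advertised to remote nodes and clients.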
Advertising Port Mappings 
• Extend Cassandra Gossip 
• Include port numbers in node address announcements 
• Allow seed node addresses to include port numbers 
• Allow multiple nodes to have identical public & private addresses 
(only port numbers differ per DC) 
• How to bootstrap? SIP? 
• Cassandra must be aware of the allocated ports in order to advertise 
• Hard if C* is not directly responsible for the port mapping 
(e.g. static port forwarding) 
• Too many modifications to internals
Advertising Port Mappings 
• DNS-SD — dns-sd.org 
(aka Bonjour/Zeroconf) 
• Reads — works with existing DNS implementations 
(it’s just a DNS query) 
• Even inside restrictive networks, DNS usually works 
• Combination of DNS TXT, SRV and PTR records. 
• Updates 
• via DNS Update & TSIG (supported by BIND)
• via API — e.g. for AWS Route 53
Advertising Port Mappings 
• DNS-SD cont’d. 
• SRV records contain hostname and port 
(i.e., hostname of the NAT gateway and public C* port) 
• TXT records contain key=value pairs 
(useful for additional connection & config details) 
• Modify C* connection code to look up foreign node port from DNS
• Modify client driver connection code to look up ports from DNS
• Can be queried & updated out-of-band 
(updated by the NAT device or central management server which knows which ports were mapped)
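
As a rough illustration of what the lookup code has to decode, the sketch below parses an SRV record in its usual text form ("priority weight port target") and a list of DNS-SD TXT strings. The class and method names are hypothetical, and the priority and weight values used in any example input are assumptions, since the slides only show the target and port.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helpers for decoding the SRV and TXT data of one DNS-SD service. */
final class DnsSdRecords {
    private DnsSdRecords() {}

    /** Target host and port taken from an SRV record. */
    static final class Srv {
        final String target;
        final int port;

        Srv(String target, int port) {
            this.target = target;
            this.port = port;
        }
    }

    /** Parses SRV data in presentation form: "priority weight port target". */
    static Srv parseSrv(String rdata) {
        String[] fields = rdata.trim().split("\\s+");
        if (fields.length != 4) {
            throw new IllegalArgumentException("expected 4 SRV fields: " + rdata);
        }
        String target = fields[3];
        if (target.endsWith(".")) {
            target = target.substring(0, target.length() - 1); // drop the root dot
        }
        return new Srv(target, Integer.parseInt(fields[2]));
    }

    /** Parses DNS-SD TXT strings of the form key=value; keys are lower-cased. */
    static Map<String, String> parseTxt(List<String> strings) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String s : strings) {
            int eq = s.indexOf('=');
            if (eq == 0) {
                continue; // a string with an empty key is ignored
            }
            String key = (eq < 0 ? s : s.substring(0, eq)).toLowerCase(Locale.ROOT);
            String value = eq < 0 ? "" : s.substring(eq + 1);
            pairs.putIfAbsent(key, value); // the first occurrence of a key wins
        }
        return pairs;
    }
}
```

With the values shown later in the dns-sd output, the SRV data would yield the gateway hostname and port 1236, and the TXT strings would yield version=2.0.7 and cqlport=1237.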
Advertised Details 
• Each cluster is its own browse domain
• Each NAT gateway device has an A record in the browse domain
• Each DNS-SD service is named based on the private IP address
• Requires unique private IP addresses across data centres
• SRV port is the C* Thrift port
• Additional ports are advertised via TXT
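
The naming scheme can be sketched as a small string transformation: dots in the private IPv4 address become hyphens, and the result is prefixed to the _cassandra._tcp service type and the cluster's browse domain. The class and method names below are hypothetical; the naming pattern itself is the one visible in the dns-sd output in this deck.

```java
/** Hypothetical helper that builds DNS-SD names from a node's private IPv4 address. */
final class ServiceNames {
    static final String SERVICE_TYPE = "_cassandra._tcp";

    private ServiceNames() {}

    /** "192.168.1.4" becomes the instance name "192-168-1-4". */
    static String instanceName(String privateIpv4) {
        if (!privateIpv4.matches("\\d{1,3}(\\.\\d{1,3}){3}")) {
            throw new IllegalArgumentException("not a dotted-quad IPv4 address: " + privateIpv4);
        }
        return privateIpv4.replace('.', '-');
    }

    /** Fully qualified service name: instance, service type, then browse domain. */
    static String serviceFqdn(String privateIpv4, String browseDomain) {
        String domain = browseDomain.endsWith(".") ? browseDomain : browseDomain + ".";
        return instanceName(privateIpv4) + "." + SERVICE_TYPE + "." + domain;
    }
}
```

Because the instance name is derived only from the private address, two nodes with the same private IP in different data centres would collide, which is why private addresses have to be unique across the cluster.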
Configuration 
• Cassandra is configured to only use private addresses 
• On cluster creation 
• Establish a new DNS-SD browse domain 
• Create A records for each gateway device 
• NAT gateway device is notified when a new C* node is started 
• Allocates random public ports for C* and configures Port Forwarding 
• Updates DNS-SD 
• New SRV and TXT record
Output of dns-sd 
(Can also use avahi-browse, dig, or any other DNS query tool) 
$ dns-sd -B _cassandra._tcp 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. 
Browsing for _cassandra._tcp 
A/R Flags if Domain Service Type Instance Name 
Add 3 0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-2-4 
Add 3 0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-1-2 
Add 3 0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-2-3 
Add 3 0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-2-2 
Add 3 0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-1-4 
Add 2 0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-1-3 
$ dns-sd -L 192-168-1-4 _cassandra._tcp 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. 
Lookup 192-168-1-4._cassandra._tcp.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. 

192-168-1-4._cassandra._tcp.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. can be reached at aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.:1236 (interface 0)
version=2.0.7 
cqlport=1237 
$ nslookup aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. 
Non-authoritative answer: 
Name: aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au 
Address: 54.209.123.195
Java Driver Modifications 
public interface AddressTranslater {
    public InetSocketAddress translate(InetSocketAddress address);
}
• This is usually a no-op 
(the default is IdentityTranslater) 
• Modify translate() to perform a DNS-SD lookup. 
• The address parameter is a node private IP address. 
• Locate a service with a name = private IP address to determine 
public IP/port.
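
A minimal sketch of such a translate() method follows. It is not the implementation from the talk: the class deliberately does not implement the driver's AddressTranslater so that it compiles without the driver on the classpath, the Resolver interface is a hypothetical abstraction, and preferring the TXT cqlport value over the SRV port is an assumption based on the slides stating that the SRV port is the Thrift port. The jndiSrv() helper uses the JDK's built-in JNDI DNS provider as one possible way to fetch SRV data.

```java
import java.net.InetSocketAddress;
import java.util.Hashtable;
import java.util.Map;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.InitialDirContext;

/** Sketch: maps a node's private address to its gateway host and public port via DNS-SD. */
final class DnsSdTranslator {
    /** Hypothetical lookup abstraction so the sketch can run without a DNS server. */
    interface Resolver {
        String srv(String serviceName);              // "priority weight port target", or null
        Map<String, String> txt(String serviceName); // parsed key=value pairs, never null
    }

    private final String browseDomain; // expected to end with a dot
    private final Resolver resolver;

    DnsSdTranslator(String browseDomain, Resolver resolver) {
        this.browseDomain = browseDomain;
        this.resolver = resolver;
    }

    InetSocketAddress translate(InetSocketAddress address) {
        if (address.getAddress() == null) {
            return address; // unresolved input: nothing to look up
        }
        String ip = address.getAddress().getHostAddress();
        String service = ip.replace('.', '-') + "._cassandra._tcp." + browseDomain;
        String rdata = resolver.srv(service);
        if (rdata == null) {
            return address; // no advertised mapping: behave like the identity translater
        }
        String[] fields = rdata.trim().split("\\s+");
        String host = fields[3].endsWith(".")
                ? fields[3].substring(0, fields[3].length() - 1) : fields[3];
        // Assumption: the native-protocol port is advertised in TXT as "cqlport".
        String cqlPort = resolver.txt(service).get("cqlport");
        int port = Integer.parseInt(cqlPort != null ? cqlPort : fields[2]);
        return InetSocketAddress.createUnresolved(host, port);
    }

    /** One possible SRV lookup, using the JDK's JNDI DNS provider and the system resolver. */
    static String jndiSrv(String serviceName) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        try {
            InitialDirContext ctx = new InitialDirContext(env);
            try {
                Attribute srv = ctx.getAttributes(serviceName, new String[] {"SRV"}).get("SRV");
                return srv == null ? null : (String) srv.get();
            } finally {
                ctx.close();
            }
        } catch (NamingException e) {
            return null; // treat lookup failures as "no mapping"
        }
    }
}
```

The sketch returns an unresolved address to avoid a second DNS lookup inside the example; a real translater would likely need to resolve the gateway hostname before handing the address back to the driver, and would probably cache results rather than query DNS on every call.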
Modifying Cassandra 
public class OutboundTcpConnectionPool
{
    ⋮
    public static Socket newSocket(InetAddress endpoint) throws IOException {…}
    ⋮
}
• Responsible for managing Socket connections. 
• Modify newSocket() to perform a DNS-SD lookup. 
• The endpoint parameter is a node private IP address. 
• Locate a service with a name = private IP address to determine 
public IP/port
[Diagram: two data centres, each with C* nodes behind a NAT gateway; a client application; and a DNS (+ DNS-SD) server (Route 53, self-hosted, etc.)]
Thanks! 
Questions? 
adam@instaclustr.com

Cassandra Summit 2014: Novel Multi-Region Clusters — Cassandra Deployments Split Between Heterogeneous Data Centre
