Cassandra and AWS Support on AWS/EC2
Cloudurable
Support around Cassandra and Kafka running in EC2
Cassandra / Kafka Support in EC2/AWS
Company Overview: How we got our start
Different companies, same challenges
❖ How to set up a cluster across multiple AZs
❖ Where does enhanced networking fit in?
❖ Should we use EBS or instance storage?
❖ Monitoring and logging that are actionable
❖ Integration with AWS services like CloudFormation and CloudWatch
❖ Best fit for images, VPC setup, peering, subnets, and firewalls
Services we provide
❖ Cassandra Training
❖ Cassandra Consulting
❖ Setting up Cassandra in AWS/EC2
❖ AWS CloudFormation templates
❖ Subscription support around Cassandra running in AWS/EC2
❖ AWS CloudWatch monitoring
❖ AWS CloudWatch logging
Cloudurable Cassandra AWS Support
AWS Review: review of key Amazon services and features
Advice and documents: Cassandra on AWS
❖ There is a lot of advice on how to configure a Cassandra cluster on AWS
❖ Not every configuration meets every use case
❖ The best way to know how to deploy Cassandra on AWS is to know the basics of AWS
❖ We start by covering AWS (as it applies to Cassandra)
❖ Later we go into detail on AWS specifics for Cassandra
AWS Key Concepts
❖ EC2 – compute services, virtual servers
❖ EC2 instance – a virtual server running in a VPC
❖ EBS – virtual disk drives
❖ VPC – software-defined networks
❖ Public subnets – have a route to an Internet Gateway (IGW)
❖ Private subnets – no route to an IGW
Amazon Regions and Availability Zones
❖ AWS supports regions around the world
❖ Regions are independent of each other
❖ Place services in a region close to your end consumers to lower latency and improve reliability
❖ Availability Zones (AZs) are isolated; multiple AZs live in a region
❖ AZs protect against outages
❖ Place your services and applications in different AZs
❖ AZs have independent power, backup generators, UPS units, etc.
❖ Where possible, each AZ is in a separate location within a metropolitan area
❖ AZs are redundantly connected with fast, low-latency links using multiple tier-1 transit providers
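Spreading a cluster across AZs can be sketched as a round-robin assignment of nodes to zones. This is a minimal illustration, not a provisioning tool: the helper assign_nodes_to_azs and the AZ names such as us-west-2a are hypothetical examples. With Cassandra's EC2-aware snitches an AZ is treated as a rack, so an even spread lets replicas land in different AZs.

```python
from collections import Counter
from typing import Dict, List


def assign_nodes_to_azs(node_names: List[str], azs: List[str]) -> Dict[str, str]:
    """Assign nodes to AZs in round-robin order (hypothetical helper)."""
    if not azs:
        raise ValueError("need at least one availability zone")
    return {node: azs[i % len(azs)] for i, node in enumerate(node_names)}


if __name__ == "__main__":
    nodes = [f"cassandra-{i}" for i in range(6)]
    zones = ["us-west-2a", "us-west-2b", "us-west-2c"]  # example AZ names
    placement = assign_nodes_to_azs(nodes, zones)
    for node, az in placement.items():
        print(node, "->", az)
    # Two nodes per AZ, so losing one AZ takes out only a third of the nodes.
    print(Counter(placement.values()))
```

Round-robin is the simplest choice that keeps AZ counts within one node of each other; adding nodes in multiples of the AZ count keeps them exactly equal.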
EC2 Compute: EC2, EC2 instances, instance types, networking speed
Amazon EC2
EC2 Compute
❖ Resizable compute capacity in the cloud
❖ Compute: computational power needed for your use case
❖ Add compute resources as needed (IaaS)
❖ EC2 allows you to launch instances
❖ An instance is a server
❖ Install whatever software you need: NGINX, Apache httpd, Cassandra, Kafka, etc.
❖ Pay for the compute power that you use
❖ Different instance types offer various ranges of CPU, RAM, I/O, and networking power
❖ Pay for compute resources by the hour or longer
EC2 Instance Types
❖ Defines the size/power of a virtual server
❖ Many types of EC2 instances, grouped into families of instance types
❖ Virtual CPUs (vCPUs)
❖ A vCPU is a hyperthread of an Intel Xeon core for M4, M3, C4, C3, R3, HS1, G2, I2, and D2
❖ Memory RAM (size and type)
❖ Network performance
Families of types – Part 1
❖ T2 - inexpensive and burstable (good for less expensive, more sporadic workloads)
❖ M4 - new generation of general-purpose instances (added clustering and placement groups to M3)
❖ C4 - compute optimized; like M4 but with less memory and more vCPUs (use this if you are not using all of your M4 memory)
❖ P2 - GPU-intensive applications (machine learning)
❖ G2 - graphics-intensive applications (server-side graphics workloads)
What are the two most likely of these families that you would use with Cassandra?
Families of types – Part 2
❖ X1 - memory optimized for in-memory computing (SAP HANA)
❖ R3 - memory-intensive databases and distributed caches (MongoDB)
❖ I2 - high IOPS at lower cost, SSD instance storage (MongoDB, RDBMS)
❖ D2 - high I/O throughput and large disks at lower cost, magnetic instance storage (MapReduce, Kafka)
What are the two most likely of these families that you would use with Cassandra? Why?
Amazon Elastic Block Store: virtual volumes, SSD and magnetic
Amazon EBS
Elastic Block Store (EBS)
❖ Amazon Web Services (AWS) provides Amazon Elastic Block Store (Amazon EBS) for EC2 instance storage
❖ EBS provides virtual hard drives and SSDs for your virtual servers (EC2 instances)
❖ EBS volumes are automatically replicated within the same AZ
❖ Easy to take snapshots of volumes (backups)
❖ Advantages: reliability, snapshotting, resizing
EBS Volume types
❖ Four EBS volume types
❖ Two hard disk drive (HDD, magnetic) types
❖ Two SSD types
❖ Volumes differ in price and performance
❖ An EC2 instance can have many EBS volumes attached
❖ An EBS volume can only be attached to one EC2 instance at a time
Magnetic Volumes - HDD
❖ Magnetic volumes can’t be used as boot volumes
❖ Lowest performance for random access
❖ Lowest cost per gigabyte
❖ Highest throughput (500 MB/s) for sequential access
❖ Magnetic volumes average 100 IOPS, but can burst to hundreds of IOPS
❖ Good for services like Kafka, which writes to a transaction log in long sequential streams
❖ Good for databases that use log-structured storage or log-structured merge (LSM) trees
❖ LevelDB, RocksDB, Cassandra
Good use cases for magnetic volumes
❖ Streaming workloads that require cost-effective, fast, consistent I/O
❖ Big data
❖ Data warehouses
❖ Log processing
❖ Databases that use log-structured merge trees
Two types of magnetic volumes
❖ st1 - Throughput Optimized HDD
❖ sc1 - Cold HDD, the most cost-effective
Which would be better for a Cassandra production system with low reads but large rows and frequent writes?
General-Purpose SSD (gp2) – Part 1
❖ Cost-effective and useful for many workloads
❖ The minivan of EBS
❖ Baseline performance of 3 IOPS per gigabyte provisioned
❖ A 250 GB volume can expect a baseline of 750 IOPS
❖ Peak capped at 10,000 IOPS
❖ Sizes range from 1 GB to 16 TB
❖ Use cases: databases that use some form of B-tree (MongoDB, MySQL, etc.)
❖ Geared to lower-volume databases, or ones that have peak load times but long periods at rest where IOPS credits can accumulate
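The 3-IOPS-per-GB rule can be written as a small function. This is a sketch under stated assumptions: the function name is hypothetical, the 10,000 IOPS cap is the value quoted on this slide (AWS has raised it since, so it is a parameter), and the 100 IOPS floor for very small volumes comes from the AWS gp2 documentation.

```python
def gp2_baseline_iops(size_gib: int, max_iops: int = 10_000, min_iops: int = 100) -> int:
    """Baseline IOPS of a gp2 volume: 3 IOPS per GiB, clamped to [min_iops, max_iops].

    max_iops defaults to the cap quoted on the slide; pass the current AWS
    limit if you need today's numbers.
    """
    if size_gib < 1:
        raise ValueError("gp2 volumes are at least 1 GiB")
    return max(min_iops, min(3 * size_gib, max_iops))


if __name__ == "__main__":
    for size in (20, 250, 1000, 5000):
        print(size, "GiB ->", gp2_baseline_iops(size), "IOPS")
    # 20 -> 100 (floor), 250 -> 750, 1000 -> 3000, 5000 -> 10000 (cap)
```

Because baseline IOPS scale with size, over-provisioning gp2 capacity is a common way to buy more sustained IOPS without moving to io1.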
General-Purpose SSD (gp2) – Part 2
❖ Can be used for boot volumes
❖ Volumes under 1 TB can burst to 3,000 IOPS for extended periods of time
❖ Unused IOPS accumulate as I/O credits, which are spent when bursting
❖ I/O credits are like a savings account
❖ Bursting above the baseline withdraws from the account
• Use cases
• A server that does periodic batch or cron jobs
• Low-latency interactive apps
• Medium-sized databases
Could you use this for Cassandra? If your Cassandra cluster had 12 nodes, with at most 12,000 reads per second and at most 120,000 writes per second across the cluster, what size gp2 volume would work per node, assuming the cluster grows 2 TB per year?
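The savings-account analogy can be made concrete with the burst-duration formula from the AWS gp2 documentation: a full bucket of 5.4 million I/O credits drains at the burst rate minus the baseline rate. A minimal sketch, assuming the 3,000 IOPS burst ceiling quoted above and a 100 IOPS baseline floor; the function name is hypothetical.

```python
import math

INITIAL_CREDITS = 5_400_000  # I/O credits in a full gp2 credit bucket
BURST_IOPS = 3_000           # burst ceiling for gp2 volumes under 1 TB


def gp2_burst_seconds(size_gib: int, credits: float = INITIAL_CREDITS) -> float:
    """Seconds a gp2 volume can sustain BURST_IOPS starting from `credits`.

    While bursting, credits are spent at BURST_IOPS and earned at the
    baseline rate, so the net drain is (BURST_IOPS - baseline) per second.
    A volume whose baseline already reaches BURST_IOPS never runs out.
    """
    baseline = max(100, 3 * size_gib)
    if baseline >= BURST_IOPS:
        return math.inf
    return credits / (BURST_IOPS - baseline)


if __name__ == "__main__":
    for size in (100, 250, 1000):
        print(size, "GiB can burst for", gp2_burst_seconds(size), "seconds")
```

For the question above, a sketch like this helps check whether a node's sustained load fits under its baseline or only under its burst, which matters for Cassandra because compaction keeps disks busy long after a traffic peak.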
Bursting
Credit: Amazon documentation (both images)
Provisioned IO (io1)
❖ For I/O intensive workloads
❖ Most expensive EBS option
❖ Up to 20,000 IOPS per volume; you purchase (provision) the IOPS you need
❖ Use cases
❖ Mission-critical business applications that require sustained IOPS performance
❖ Databases with large, high-volume workloads
❖ For developers bad at math
EBS Type Review
How can Cassandra use HDD and get 1,000 IOPS? (3 ways)
NOT EBS: Instance storage
❖ Don’t forget you don’t have to use EBS
❖ Instance storage is faster than EBS
❖ EC2 instance types with instance storage are expensive
❖ No storage area network (SAN), so no I/O over the network
EBS Optimized
❖ Newer EC2 instance types support EBS Optimized
❖ Higher throughput
❖ Less jitter
❖ More reliable
❖ Don’t use C3 or M3; use C4 and M4
❖ C4 and M4 are EBS-optimized by default
❖ New feature added in Feb 2017: Elastic Volumes!
EBS: Don’t just guess, measure
❖ Make an educated guess to pick the right EBS type based on workload
❖ If you deploy Kafka, Cassandra, or MongoDB, you must understand how to configure the tool
❖ Smaller nodes but more of them, or fewer nodes with larger EBS volumes
❖ JBOD, RAID 1, etc.
❖ Use Amazon CloudWatch
❖ Watch IOPS and I/O throughput
❖ while load testing or watching production workloads
❖ The quickest and best way to pick the right EBS volume type
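Turning CloudWatch numbers into IOPS and throughput is simple arithmetic: for the EBS metrics VolumeReadOps, VolumeWriteOps, VolumeReadBytes, and VolumeWriteBytes, take the Sum statistic for a period and divide by the period length in seconds. A minimal sketch; the dataclass, function names, and sample values are hypothetical, and fetching the sums from CloudWatch is left out.

```python
from dataclasses import dataclass


@dataclass
class EbsPeriodSums:
    """Sum statistics for one CloudWatch period of one EBS volume."""
    period_seconds: int
    volume_read_ops: float
    volume_write_ops: float
    volume_read_bytes: float
    volume_write_bytes: float


def average_iops(s: EbsPeriodSums) -> float:
    """Average read+write IOPS over the period: total operations / seconds."""
    return (s.volume_read_ops + s.volume_write_ops) / s.period_seconds


def average_throughput_mib_s(s: EbsPeriodSums) -> float:
    """Average read+write throughput over the period, in MiB/s."""
    return (s.volume_read_bytes + s.volume_write_bytes) / s.period_seconds / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical sums for a 5-minute (300 s) period during a load test.
    sample = EbsPeriodSums(300, 90_000, 210_000, 3 * 1024**3, 9 * 1024**3)
    print(f"{average_iops(sample):.0f} IOPS")
    print(f"{average_throughput_mib_s(sample):.2f} MiB/s")
```

Comparing these averages (and the peaks) against a volume type's baseline and throughput limits is one way to turn a load test into a volume choice.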
Snapshots - EBS backups
❖ Data safety with EBS - Backup/Recovery (Snapshots)
❖ Amazon EBS allows you to easily back up data by taking snapshots
❖ Snapshots are point-in-time backups
❖ Snapshots provide incremental backups of your data
❖ Snapshots save just the blocks that have changed
❖ Only blocks changed since the last snapshot are saved in a new snapshot
❖ Only the last snapshot is needed to restore the volume
EBS Snapshots
A Cassandra node goes down, its EBS volume is corrupt, and you have snapshots for this volume. Would it be faster to spin up a new instance with a volume created from the last snapshot, or to just let Cassandra repopulate the node?
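Incremental snapshots can be illustrated with a toy model where a volume is a map of block numbers to contents. This is not how EBS stores data internally; it only shows why each new snapshot holds just the changed blocks while the latest snapshot is still enough to restore (EBS keeps references to the unchanged blocks from earlier snapshots, modeled here by layering deltas). Block deletion is not modeled, and all names are hypothetical.

```python
from typing import Dict, List

Volume = Dict[int, bytes]  # block number -> block contents


def restore(snapshots: List[Volume]) -> Volume:
    """Rebuild a volume by layering snapshot deltas from oldest to newest."""
    volume: Volume = {}
    for delta in snapshots:
        volume.update(delta)
    return volume


def take_snapshot(volume: Volume, previous: List[Volume]) -> Volume:
    """Return only the blocks that differ from what earlier snapshots captured."""
    known = restore(previous)
    return {blk: data for blk, data in volume.items() if known.get(blk) != data}


if __name__ == "__main__":
    vol = {0: b"boot", 1: b"data-v1", 2: b"log-v1"}
    chain: List[Volume] = []
    chain.append(take_snapshot(vol, chain))  # first snapshot stores all blocks
    vol[1] = b"data-v2"
    chain.append(take_snapshot(vol, chain))  # second stores only block 1
    print([sorted(s) for s in chain])
    print(restore(chain) == vol)
```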
Taking Snapshots
❖ Snapshots are done with:
❖ AWS Management Console
❖ Scheduled snapshots
❖ AWS API - AWS CLI
❖ Snapshots are stored in S3, but you can’t see them in your buckets
❖ Snapshots are stored per region
❖ Use snapshots to create new EBS volumes
❖ Snapshots can be copied to other regions
Best Practices for Snapshots
❖ Test the process of recovering your instances from snapshots if the Amazon EBS volumes fail
❖ Use separate volumes for the operating system and your data
❖ Make sure that the data persists after instance termination
❖ Don’t use instance store for database storage unless you are using replication
You wrote a Chef or Ansible script to update the JDK and Cassandra.
Should you perform a snapshot before you run this?
VPC: software-defined networking
Amazon VPC
VPC: 1 public and two private subnets
Amazon VPC
❖ Software-defined networking
❖ Virtual Private Cloud
❖ Multiple VPCs can live in an AWS region
❖ A VPC can span multiple availability zones
❖ Isolated area to deploy Amazon EC2 instances
❖ Associated with a CIDR block
❖ DHCP option sets
CIDR Block
❖ /# denotes the size of the network
❖ how many bits of the address are used for the network prefix
❖ Example: 10.10.1.32/27 denotes a CIDR range (also known as a CIDR block)
❖ The first 27 bits of the 32-bit address are the network prefix
❖ 32 - 27 leaves five bits (00000-11111), or 32 addresses: 10.10.1.32 to 10.10.1.63
❖ AWS reserves the first four addresses in every subnet and the last address (broadcast)
❖ The example leaves 27 addresses for our servers (10.10.1.36 to 10.10.1.62)
❖ A VPC address range may be as large as /16 (32 - 16 = 16 bits, which allows for 65,536 addresses)
❖ or as small as /28 (32 - 28 = 4 bits, which is 16 addresses)
❖ Address ranges of two VPCs must not overlap if you plan on adding VPC peering
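The CIDR arithmetic above can be checked with Python's standard ipaddress module. A minimal sketch: usable_subnet_addresses and can_peer are hypothetical helper names, and the reserved-address rule (first four plus last) is the AWS subnet rule stated above.

```python
import ipaddress


def usable_subnet_addresses(cidr: str) -> list:
    """Addresses left for instances after AWS's five reserved addresses.

    AWS reserves the first four addresses (network, VPC router, DNS,
    future use) and the last address (broadcast) of every subnet.
    """
    addresses = list(ipaddress.ip_network(cidr))
    return addresses[4:-1]


def can_peer(vpc_a: str, vpc_b: str) -> bool:
    """VPC peering requires CIDR blocks that do not overlap."""
    return not ipaddress.ip_network(vpc_a).overlaps(ipaddress.ip_network(vpc_b))


if __name__ == "__main__":
    usable = usable_subnet_addresses("10.10.1.32/27")
    print(len(usable), usable[0], usable[-1])               # 27 10.10.1.36 10.10.1.62
    print(ipaddress.ip_network("10.0.0.0/16").num_addresses)  # 65536
    print(can_peer("10.0.0.0/16", "10.1.0.0/16"))           # True
    print(can_peer("10.0.0.0/16", "10.0.128.0/20"))         # False
```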
CIDR Block Diagram
Source: Wikipedia
Components of VPC
❖ Made up of subnets, route tables, DHCP option sets, security groups, and network ACLs
❖ Can also have Internet Gateways (IGWs), Virtual Private Gateways (VPGs), Elastic IP (EIP) addresses, Elastic Network Interfaces (ENIs), endpoints, peering connections, and NAT gateways
❖ A VPC has a router defined by its route tables
❖ a main (default) route table, plus optional per-subnet route tables
CloudFormation for VPC
VPC Subnets
❖ Part of a VPC’s IP address range
❖ Has a CIDR block
❖ Associated with a single availability zone
❖ Can be public or private
❖ A private subnet has no route to the IGW (Internet Gateway)
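Carving a VPC range into per-AZ subnets can be sketched with ipaddress.ip_network(...).subnets(new_prefix=...). This is a hypothetical layout (one public and one private /24 per AZ, in that order); the function name, the AZ names, and the /24 size are assumptions for illustration, not a recommended design.

```python
import ipaddress
from typing import Dict, List, Tuple


def plan_subnets(vpc_cidr: str, azs: List[str], new_prefix: int = 24) -> Dict[Tuple[str, str], str]:
    """Assign one public and one private subnet per AZ from the VPC's CIDR block."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan: Dict[Tuple[str, str], str] = {}
    for az in azs:
        for tier in ("public", "private"):
            try:
                plan[(az, tier)] = str(next(blocks))
            except StopIteration:
                raise ValueError("VPC CIDR block is too small for this layout") from None
    return plan


if __name__ == "__main__":
    for (az, tier), cidr in plan_subnets("10.0.0.0/16", ["us-west-2a", "us-west-2b"]).items():
        print(az, tier, cidr)
```

Each subnet lives in exactly one AZ, so spreading a cluster over AZs means one private subnet per AZ for the Cassandra nodes.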
CloudFormation VPC Subnet
Internet Gateway
Internet Gateway
❖ An Internet Gateway (IGW) connects your VPC to the public Internet, enabling inbound traffic from the Internet
❖ Public subnets have route tables that target the IGW
❖ The IGW does network address translation between the public IPs of EC2 instances and their private IPs
❖ When an EC2 instance in a public subnet sends IP traffic, the IGW acts as the NAT for the public subnet
❖ it translates the reply address to the EC2 instance’s public IP (or EIP)
❖ The IGW keeps track of the mappings between EC2 instances’ private IP addresses and their public IP addresses
❖ Highly available; handles horizontal scaling and redundancy as needed
Route Tables
Subnet Route Tables
❖ Contain a set of rules (aka routes)
❖ Route tables are associated with subnets
❖ Routes connect subnets within a VPC so they can communicate
❖ Routes direct network traffic
❖ Routes are specified by a destination CIDR and a target
❖ The most specific route that matches the traffic (longest prefix match) determines how it is routed
❖ If a subnet has a route to the Internet Gateway, it is public
❖ Each subnet is associated with a route table (the main route table by default)
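The "most specific route wins" rule is longest prefix match. A minimal sketch using the standard ipaddress module; select_target is a hypothetical helper and the targets igw-example and nat-example are placeholder names, not real resource IDs.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, str]  # (destination CIDR, target)


def select_target(routes: List[Route], destination_ip: str) -> Optional[str]:
    """Return the target of the most specific route containing destination_ip."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no matching route: the traffic is dropped
    _, target = max(matches, key=lambda m: m[0].prefixlen)
    return target


if __name__ == "__main__":
    public_routes = [
        ("10.0.0.0/16", "local"),      # traffic inside the VPC
        ("0.0.0.0/0", "igw-example"),  # everything else goes to the IGW
    ]
    private_routes = [
        ("10.0.0.0/16", "local"),
        ("0.0.0.0/0", "nat-example"),  # private subnets send Internet traffic to a NAT gateway
    ]
    print(select_target(public_routes, "10.0.3.7"))   # local
    print(select_target(public_routes, "8.8.8.8"))    # igw-example
    print(select_target(private_routes, "8.8.8.8"))   # nat-example
```

The only difference between the public and private route tables is the target of the 0.0.0.0/0 route, which is what makes a subnet public or private.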
CloudFormation: Route from Public Subnet to IGW
VPC VPN Access via VGW and CGW
❖ Use AWS to augment your existing IT infrastructure via VPN
❖ Connect an existing datacenter to a VPC using a VGW (Virtual Private Gateway, also called a VPG) and a CGW (Customer Gateway)
❖ A VGW is like the IGW, but it sends traffic to and from your corporate network instead of the public Internet
❖ The VGW is the Amazon side of the VPN connection
❖ The CGW is the customer side of the VPN connection
❖ CGWs are processes running on a server or network device
❖ Connect a VGW and a CGW with a VPN tunnel
❖ Uses IPsec to connect the VPC to the corporate network
❖ Use dynamic routing (BGP) or static routes
Which subnets in a given VPC would have access to the corporate network connected via the VPN?
NAT Gateway and EIP
Elastic IP (EIP)
❖ AWS pool of public IP addresses, available to rent per region
❖ Allocate EIPs to use and assign; an EIP lets you keep the same public IP
❖ Example: assign an EIP to an instance (an EIP is associated with only one instance at a time)
❖ Spin up a new, upgraded version of the instance from a snapshot or with Ansible, Chef, etc.
❖ Reassign the EIP to the new upgraded instance
❖ Allows public IPs to be reassigned to new underlying infrastructure
❖ Allocated for use in a VPC; can be moved to another VPC in the same region
❖ Assigned to resources like EC2 instances, NAT gateways, etc.
NAT Gateways
❖ Needed so Amazon EC2 instances launched in a private subnet can access the Internet
❖ NAT is a network address translator
❖ Why? Without one, yum install foo would fail, because instances in a private subnet by default have no route to the public Internet
❖ Similar to an IGW, but unlike IGWs they do not allow incoming connections
❖ Only allow responses to outgoing traffic from your Amazon EC2 instances
❖ To maximize availability, deploy a NAT gateway per AZ
❖ To set up
❖ Configure the private subnet’s route table to direct Internet traffic to the NAT gateway
❖ Associate the NAT gateway with an Elastic IP (EIP)
CloudFormation for NAT GW
Placement groups per AZ
❖ Use Amazon enhanced networking together with placement groups
❖ Instance types M4, C4, P2, G2, R3, X1, I2, and D2 support enhanced networking/placement groups
❖ Essential for high-speed server-to-server performance, which is important for clustering
❖ To achieve maximum throughput (10 Gbit/s), instances in a placement group must be in the same AZ
Why would this be important for Cassandra? Other systems?
Elastic Network Interface ENI
❖ An ENI is a virtual network interface (a network interface, in AWS terms)
❖ Can be attached to an EC2 instance in a VPC, then detached and attached to another EC2 instance
❖ Attributes: description, primary private IPv4 address, multiple secondary private IPv4 addresses, an EIP per private address, public IPv4 address, multiple IPv6 addresses, multiple security groups (at least one), MAC address, source/destination check flag
❖ Keeps its attributes no matter which EC2 instance it is attached to
❖ If an underlying instance fails, the ENI’s addresses (MAC address, private IPs, EIPs, etc.) are preserved
❖ Makes EC2 instances replaceable: low-budget, highly available solutions
What special Cassandra nodes might benefit from using an ENI to keep their private IP constant even if the instance goes down?
How are ENIs different from EIPs? How are they similar?
Security Groups
Security Groups (SG)
❖ Stateful firewall: controls inbound and outbound network traffic to EC2 instances and other AWS resources
❖ Stateful means that when inbound traffic is allowed, the instance (or resource) can respond to it with outbound traffic, regardless of outbound rules
❖ EC2 instances must be associated with at least one security group
❖ Rules are allow rules only
❖ Rules consist of the following attributes:
❖ Source (CIDR or SG id)
❖ Protocol (TCP, UDP, ICMP; console presets such as HTTP, HTTPS, and SSH map to TCP ports)
❖ Port range (8000-8080)
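Allow-only rule matching can be sketched in a few lines. This models only inbound rule evaluation; statefulness (automatically allowed return traffic) is not modeled. SgRule and inbound_allowed are hypothetical names whose fields loosely mirror the IpProtocol, FromPort, ToPort, and CidrIp properties of a CloudFormation ingress rule. The example opens 9042 (CQL native transport) and 7000-7001 (inter-node traffic) to a hypothetical VPC range.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SgRule:
    """One inbound allow rule."""
    source_cidr: str
    protocol: str  # "tcp", "udp", "icmp", or "-1" for all protocols
    from_port: int
    to_port: int


def inbound_allowed(rules: List[SgRule], src_ip: str, protocol: str, port: int) -> bool:
    """Security groups only have allow rules: traffic passes if any rule matches."""
    ip = ipaddress.ip_address(src_ip)
    for rule in rules:
        if ip not in ipaddress.ip_network(rule.source_cidr):
            continue
        if rule.protocol == "-1":
            return True
        if rule.protocol == protocol and rule.from_port <= port <= rule.to_port:
            return True
    return False  # nothing matched: implicitly denied


if __name__ == "__main__":
    cassandra_sg = [
        SgRule("10.0.0.0/16", "tcp", 9042, 9042),  # CQL clients inside the VPC
        SgRule("10.0.0.0/16", "tcp", 7000, 7001),  # inter-node traffic
    ]
    print(inbound_allowed(cassandra_sg, "10.0.5.20", "tcp", 9042))    # True
    print(inbound_allowed(cassandra_sg, "203.0.113.9", "tcp", 9042))  # False
    print(inbound_allowed(cassandra_sg, "10.0.5.20", "tcp", 7199))    # False
```

Narrowing source_cidr (or, in AWS, referencing another security group as the source) is how you limit which instances can reach the Cassandra nodes.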
CloudFormation SG Bastion
CloudFormation SG: Cassandra
The above allows all traffic from the VPC’s
CIDR to access this box.
-1 means all ports.
Private Subnet
CloudFormation SG: Cassandra
What is different about this SG for Cassandra compared to the last one?
How could we narrow which EC2 instances can access Cassandra nodes?
CloudFormation: NACL
Network ACL (NACL)
❖ Amazon Network Access Control List (NACL)
❖ Stateless firewall
❖ Provides a numbered, ordered list of rules
❖ The lowest-numbered rule is evaluated first
❖ The first rule that matches (allow or deny) wins
❖ Has both allow rules and deny rules
❖ Return traffic must be explicitly allowed (stateless)
❖ Applies to the whole subnet
How does this compare to Security Groups?
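NACL evaluation differs from security groups in two ways: rules are checked in ascending rule-number order, and the first match decides, even if it is a deny. A minimal sketch; NaclRule and nacl_decision are hypothetical names, protocol matching is omitted for brevity (assume all rules are TCP), and the final "deny" stands in for the default catch-all rule.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NaclRule:
    rule_number: int
    cidr: str
    from_port: int
    to_port: int
    action: str  # "allow" or "deny"


def nacl_decision(rules: List[NaclRule], ip: str, port: int) -> str:
    """Evaluate rules by ascending rule_number; the first match decides."""
    addr = ipaddress.ip_address(ip)
    for rule in sorted(rules, key=lambda r: r.rule_number):
        if addr in ipaddress.ip_network(rule.cidr) and rule.from_port <= port <= rule.to_port:
            return rule.action
    return "deny"  # default rule: anything unmatched is denied


if __name__ == "__main__":
    inbound = [
        NaclRule(110, "0.0.0.0/0", 22, 22, "allow"),
        NaclRule(100, "203.0.113.0/24", 0, 65535, "deny"),  # evaluated first
    ]
    print(nacl_decision(inbound, "203.0.113.5", 22))   # deny
    print(nacl_decision(inbound, "198.51.100.7", 22))  # allow
    print(nacl_decision(inbound, "198.51.100.7", 80))  # deny
```

Because NACLs are stateless, a matching outbound rule is also needed for the replies, typically covering an ephemeral port range such as 1024-65535.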
CloudFormation for NACL
Related Important AWS Concepts
Important AWS concepts helpful for DevOps of clustered software
AWS Important Concepts
❖ Auto Scaling
❖ Used to scale Amazon EC2 capacity up or down automatically
❖ Autoscale a group of instances based on workload
❖ Used to recover when instances go down by automatically spinning up an instance to take their place
❖ Amazon Route 53
❖ DNS as a service; Route 53 is highly available and scalable
❖ Easily assign DNS names instead of configuring with IP addresses (internal and external)
❖ AWS CloudFormation: allows developers, DevOps, and Ops to create and manage a collection of related AWS resources
Could you use Route 53 instead of an ENI for seed servers?
AWS CLI: CloudFormation, Route 53
Create a DNS name for the public IP address of an EC2 instance
Create a new CloudFormation stack (like the JSON file from earlier)
More important AWS concepts
❖ IAM
❖ AWS Identity and Access Management (IAM) enables you to securely control access to AWS Cloud services and resources for your users
❖ IAM defines users, groups, and roles, and allows you to apply them to EC2 instances as well as to users or groups of users (example in notes)
❖ KMS
❖ AWS Key Management Service (KMS) allows you to create and control encryption keys
❖ Uses Hardware Security Modules (HSMs) to protect the security of your keys
❖ Used to encrypt Amazon EBS volumes, Amazon S3 buckets, and other services
❖ S3
❖ Amazon Simple Storage Service (S3) stores your backups and big data
❖ Good for backups of Cassandra snapshots
Amazon CloudWatch
❖ Monitoring service for AWS Cloud resources and services
❖ Can also be used for your own services and applications
❖ Track key performance indicators (KPIs) and metrics
❖ Log aggregation; alarms are easy to create
❖ Trigger AWS Lambda functions based on KPI limits or on how often an item shows up in a log stream in a given period of time
❖ Provides system-wide visibility into resource utilization and operational health
❖ Not passive: action oriented
❖ Integration with the entire Amazon ecosystem
❖ Actionable: triggers and events keep everything running smoothly
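"How often an item shows up in a log stream in a given period" boils down to counting matches in a time window and comparing against a threshold. The sketch below only models that idea; in CloudWatch you would configure a log metric filter and an alarm instead of writing this code. The function names, the "ERROR" pattern, and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

LogEvent = Tuple[datetime, str]  # (timestamp, message)


def count_matches(events: List[LogEvent], pattern: str, end: datetime, period: timedelta) -> int:
    """Count events containing `pattern` in the window (end - period, end]."""
    start = end - period
    return sum(1 for ts, msg in events if start < ts <= end and pattern in msg)


def should_alarm(events: List[LogEvent], pattern: str, end: datetime,
                 period: timedelta, threshold: int) -> bool:
    """Alarm when the pattern appears at least `threshold` times in one period."""
    return count_matches(events, pattern, end, period) >= threshold


if __name__ == "__main__":
    t0 = datetime(2017, 3, 1, 12, 0)
    # One event every 10 seconds, alternating ERROR and INFO lines.
    events = [(t0 + timedelta(seconds=10 * i), "ERROR timeout" if i % 2 == 0 else "INFO ok")
              for i in range(12)]
    end = t0 + timedelta(minutes=2)
    print(count_matches(events, "ERROR", end, timedelta(minutes=5)))
    print(should_alarm(events, "ERROR", end, timedelta(minutes=1), threshold=5))
```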

Amazon AWS basics needed to run a Cassandra Cluster in AWS
Uploaded by Jean-Paul Azar

  • 1. ™ Cassandra and AWS Support on AWS/EC2 Cloudurable Amazon Cassandra Support around Cassandra and Kafka running in EC2
  • 2. ™ Cassandra / Kafka Support in EC2/AWS Company Overview How we got our start
  • 3. Cassandra / Kafka Support in EC2/AWS ™ Different companies same challenges ❖ How to setup a Cluster across multiple AZs ❖ Where does enhanced networking fit it ❖ Should we use EBS or instance storage ❖ Monitoring and logging that can be actionable ❖ Integration with AWS services like CloudFormation, and CloudWatch. ❖ Best fit for images, VPC setup, peering, subnets, firewalls
  • 4. Cassandra / Kafka Support in EC2/AWS ™ Services we provide ❖ Cassandra Training ❖ Cassandra Consulting ❖ Setting up Cassandra in AWS/EC2 ❖ AWS CloudFormations ❖ Subscription Support around Cassandra running in AWS/EC2 ❖ AWS CloudWatch monitoring ❖ AWS CloudWatch logging
  • 5. ™ Cloudurable Cassandra AWS Support AWS Review Review of key Amazon Services and features
  • 6. Cassandra / Kafka Support in EC2/AWS ™ Advice and documents AWS Cassandra ❖ There is a lot of advice on how to configure a Cassandra cluster on AWS ❖ Not every configuration meets every use case ❖ Best way to know how to deploy Cassandra on AWS is to know the basics of AWS ❖ We start covering AWS (as it applies to Cassandra) ❖ Later we go into detail with AWS Cassandra specifics
  • 7. Cassandra / Kafka Support in EC2/AWS ™ AWS Key Concepts ❖ EC2 – compute services, virtual servers ❖ EC2 instance a virtual server running in a VPC ❖ EBS – virtual disk drives ❖ VPC – software defined networks ❖ Public Subnets – have IGW ❖ Private Subnets – no route to IGW
  • 8. Cassandra / Kafka Support in EC2/AWS ™ Amazon Region and Availability Zones ❖ AWS supports regions around the world ❖ Regions are independent of each other ❖ place services in a region to be closer to your end consumer to lower latency and to improve reliability ❖ Availability Zone (AZ) are isolated - Multiple AZs live in a region ❖ AZ protects against outage ❖ Placing your services and application in different Azs ❖ AZs have independent power, backup generators, UPS units, etc. ❖ AZs if possible exists in a separate location of a metropolitan area ❖ AZs are redundantly connected together with fast connections that deliver low- latency using multiple tier-1 transit providers.
  • 9. EC2 Compute: EC2, EC2 instances, instance types, networking speed
  • 10. EC2 Compute ❖ Resizable compute capacity in the cloud ❖ Compute: the computational power needed for your use case ❖ Add compute resources as needed (IaaS) ❖ EC2 lets you launch instances ❖ An instance is a server ❖ Install whatever software you need: NGINX, Apache httpd, Cassandra, Kafka, etc. ❖ Pay for the compute power that you use ❖ Different instance types offer various ranges of CPU, RAM, IO, and networking power ❖ Pay for compute resources by the hour, or reserve them for longer
  • 11. EC2 Instance Types ❖ Define the size/power of the virtual server ❖ Many types of EC2 instances, grouped into families of instance types ❖ Virtual CPUs (vCPUs) ❖ A vCPU is a hyperthread of an Intel Xeon core for M4, M3, C4, C3, R3, HS1, G2, I2, and D2 ❖ Memory RAM (size and type) ❖ Network performance
  • 12. Families of types – Part 1 ❖ T2 - inexpensive and burstable (good for cheaper, more sporadic workloads) ❖ M4 - new generation of general-purpose instances (added clustering and placement groups to M3) ❖ C4 - compute optimized; like M4 but with less memory and more vCPUs (use this if you are not using all of your M4 memory) ❖ P2 - GPU-intensive applications (machine learning) ❖ G2 - graphics-intensive applications (server-side graphic workloads) Which two of these families are you most likely to use with Cassandra?
  • 13. Families of types – Part 2 ❖ X1 - memory optimized for in-memory computing (SAP HANA) ❖ R3 - memory-intensive databases and distributed caches (MongoDB) ❖ I2 - high IOPS at lower cost, SSD instance storage (MongoDB, RDBMS) ❖ D2 - high IO throughput and large disks at lower cost, magnetic instance storage (MapReduce, Kafka) Which two of these families are you most likely to use with Cassandra? Why?
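The family letter at the start of an instance type name tells you which of the families above you are looking at. A minimal Python sketch, assuming the common `<family><generation>[attributes].<size>` naming form (for example `i2.2xlarge`), that splits a name and looks up a note paraphrased from these two slides. The regex and the `describe_instance_type` helper are illustrative only, not an AWS API, and newer type names may not fit the pattern.

```python
import re

# Family notes paraphrased from the two "Families of types" slides.
FAMILY_NOTES = {
    "t": "burstable, inexpensive; sporadic workloads",
    "m": "general purpose",
    "c": "compute optimized: more vCPUs, less memory",
    "p": "GPU compute (machine learning)",
    "g": "graphics-intensive workloads",
    "x": "memory optimized for in-memory computing",
    "r": "memory-intensive databases and caches",
    "i": "high IOPS, SSD instance storage",
    "d": "high throughput, large magnetic instance storage",
}

# Assumed pattern: <family letters><generation>[<attribute letters>].<size>
INSTANCE_TYPE_RE = re.compile(r"([a-z]+?)(\d+)([a-z]*)\.([a-z0-9]+)")

def describe_instance_type(name):
    """Split an instance type name into its parts and attach the family note."""
    match = INSTANCE_TYPE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    family, generation, attrs, size = match.groups()
    return {
        "family": family,
        "generation": int(generation),
        "attributes": attrs,
        "size": size,
        "notes": FAMILY_NOTES.get(family, "family not covered on the slides"),
    }

print(describe_instance_type("i2.2xlarge"))
print(describe_instance_type("d2.8xlarge"))
```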
  • 14. Amazon Elastic Block Store (EBS): virtual volumes, SSD and magnetic
  • 15. Elastic Block Store (EBS) ❖ Amazon Web Services (AWS) provides Amazon Elastic Block Store (Amazon EBS) for EC2 instance storage ❖ EBS provides virtual hard drives and SSDs for your virtual servers (EC2 instances) ❖ EBS volumes are automatically replicated within the same AZ ❖ Easy to take snapshots of volumes (backups) ❖ Advantages: reliability, snapshotting, resizing
  • 16. EBS Volume Types ❖ Four EBS volume types ❖ Two hard disk drive (HDD, magnetic) types ❖ Two SSD types ❖ Volumes differ in price and performance ❖ An EC2 instance can have many EBS volumes attached ❖ An EBS volume can be attached to only one EC2 instance at a time
  • 17. Magnetic Volumes - HDD ❖ Magnetic volumes can’t be used as boot volumes ❖ Lowest performance for random access ❖ Lowest cost per gigabyte ❖ Highest throughput (500 MB/s) for sequential access ❖ Magnetic volumes average 100 IOPS, but can burst to hundreds of IOPS ❖ Good for services like Kafka, which writes to a transaction log in long streams ❖ Good for databases that use log-structured storage or log-structured merge trees ❖ LevelDB, RocksDB, Cassandra
  • 18. Good use cases for Magnetic Volumes ❖ Streaming workloads that require cost-effective, fast, consistent I/O ❖ Big data ❖ Data warehouses ❖ Log processing ❖ Databases that use log-structured merge trees
  • 19. Two types of Magnetic Volumes ❖ st1 - Throughput Optimized HDD ❖ sc1 - Cold HDD, the most cost-effective Which would be better for a Cassandra production system with few reads but large rows and frequent writes?
  • 20. General-Purpose SSD (gp2) 1 ❖ Cost effective and useful for many workloads ❖ The minivan of EBS ❖ Performance of 3 IOPS per gigabyte provisioned ❖ A 250 GB volume can expect a baseline of 750 IOPS ❖ Capped at 10,000 IOPS ❖ Sizes range from 1 GB to 16 TB ❖ Use cases: databases that use some form of B-tree (MongoDB, MySQL, etc.) ❖ Geared to lower-volume databases, or ones with peak load times but long periods at rest where IOPS credits can accumulate
  • 21. General-Purpose SSD (gp2) 2 ❖ Can be used for boot volumes ❖ Under 1 TB, these volumes can burst to 3,000 IOPS for extended periods of time ❖ Unused IOPS accumulate as IOPS credits, which are spent when bursting ❖ IOPS credits are like a savings account ❖ As you burst, the account is drawn down • Use cases • A server that does periodic batch or cron jobs • Low-latency interactive apps • Medium-sized databases Could you use this for Cassandra? If your Cassandra cluster had 12 nodes and peaked at 12,000 reads per second and 120,000 writes per second across the cluster, what gp2 size would work per node, assuming the cluster grows 2 TB per year?
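The gp2 rules on these two slides reduce to a little arithmetic. A minimal Python sketch, assuming the 3 IOPS/GB baseline, 3,000 IOPS burst ceiling, and 10,000 IOPS cap from the slides, plus two figures from AWS's gp2 documentation that the deck does not state: a 100 IOPS minimum baseline and a 5.4 million I/O credit bucket that starts full. AWS has changed the cap over time, so it is a parameter here; the helper names are hypothetical.

```python
GP2_IOPS_PER_GB = 3          # baseline: 3 IOPS per provisioned GB
GP2_MIN_BASELINE = 100       # smallest volumes still get 100 IOPS
GP2_BURST_IOPS = 3_000       # burst ceiling for volumes under 1 TB
GP2_MAX_CREDITS = 5_400_000  # size of the I/O credit bucket (starts full)

def gp2_baseline_iops(size_gb, cap=10_000):
    """Baseline IOPS for a gp2 volume; cap defaults to the deck's 10,000 figure."""
    return min(max(GP2_MIN_BASELINE, GP2_IOPS_PER_GB * size_gb), cap)

def gp2_burst_seconds(size_gb, credits=GP2_MAX_CREDITS):
    """How long a credit balance sustains 3,000 IOPS, or None if no burst is needed."""
    baseline = gp2_baseline_iops(size_gb)
    if baseline >= GP2_BURST_IOPS:
        return None  # 1 TB and up: the baseline already meets the burst ceiling
    # While bursting, credits drain at (burst - baseline) per second.
    return credits / (GP2_BURST_IOPS - baseline)

for size in (100, 250, 500, 1_000):
    print(size, gp2_baseline_iops(size), gp2_burst_seconds(size))
```

For the slide's 250 GB example this gives the 750 IOPS baseline and 2,400 seconds (40 minutes) of sustained 3,000 IOPS from a full bucket, a useful yardstick when sizing per-node gp2 volumes for the quiz above.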
  • 22. Bursting Credits (both images from Amazon documentation)
  • 23. Provisioned IOPS SSD (io1) ❖ For I/O-intensive workloads ❖ Most expensive EBS option ❖ Up to 20,000 IOPS; you purchase the IOPS you need ❖ Use cases ❖ Mission-critical business applications that require sustained IOPS performance ❖ Databases with large, high-volume workloads ❖ For developers bad at math
  • 24. EBS Type Review How can Cassandra use HDD and get 1,000 IOPS? 3 ways
  • 25. NOT EBS: Instance storage ❖ Don’t forget you don’t have to use EBS ❖ Instance storage is faster than EBS ❖ EC2 instance types with instance storage are expensive ❖ No storage area network (SAN) or IO over the network
  • 26. EBS Optimized ❖ Newer EC2 instance types support EBS optimization ❖ Higher throughput ❖ Less jitter ❖ More reliable ❖ Don’t use C3 or M3; use C4 and M4 ❖ They are EBS-optimized by default ❖ New feature added in Feb 2017: Elastic Volumes!
  • 27. EBS: Don’t just guess, measure ❖ Make an educated guess to pick the right EBS type for the workload ❖ If you deploy Kafka, Cassandra, or MongoDB, you must understand how to configure the tool ❖ Smaller nodes but more of them, or fewer nodes with larger EBS volumes ❖ JBOD, RAID 0, etc. ❖ Use Amazon CloudWatch ❖ Watch IOPS and IO throughput ❖ while load testing or watching production workloads ❖ The quickest and best way to pick the best EBS volume type
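Turning CloudWatch numbers into IOPS and throughput is simple division. A minimal sketch, assuming you have already fetched the `Sum` statistic of the `AWS/EBS` metrics `VolumeReadOps`, `VolumeWriteOps`, `VolumeReadBytes`, and `VolumeWriteBytes` for one volume over one period (for example with boto3's `get_metric_statistics`). The `ebs_rates` helper and the sample numbers are hypothetical.

```python
def ebs_rates(read_ops, write_ops, read_bytes, write_bytes, period_seconds):
    """Turn per-period Sum statistics for one EBS volume into IOPS and MB/s."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    iops = (read_ops + write_ops) / period_seconds
    mb_per_s = (read_bytes + write_bytes) / period_seconds / 1_000_000
    return iops, mb_per_s

# Hypothetical 5-minute (300 s) sums captured during a load test.
iops, mbps = ebs_rates(read_ops=90_000, write_ops=360_000,
                       read_bytes=9_000_000_000, write_bytes=36_000_000_000,
                       period_seconds=300)
print(f"{iops:.0f} IOPS, {mbps:.0f} MB/s")  # 1500 IOPS, 150 MB/s
```

Compare the result with the volume's baseline or provisioned limits: sustained readings near the limit during a load test point toward a larger gp2 volume, io1, or striping, while consistently low readings suggest you are overpaying.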
  • 28. Snapshots - EBS backups ❖ Data safety with EBS - backup/recovery (snapshots) ❖ Amazon EBS lets you easily back up data by taking snapshots ❖ Snapshots are point-in-time backups ❖ Snapshots provide incremental backups of your data ❖ Snapshots save only the blocks that have changed ❖ Only blocks changed since the last snapshot are saved in the new snapshot ❖ Only the latest snapshot is needed to restore the volume A Cassandra node goes down, its EBS volume is corrupt, and you have snapshots of this volume. Would it be faster to spin up a new instance with a volume created from the last snapshot, or to let Cassandra repopulate the node?
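A toy Python model (not how EBS stores data internally) of the two properties on this slide: a new snapshot copies only blocks that changed, yet the latest snapshot alone can restore the whole volume because it keeps references to the unchanged blocks it shares with earlier snapshots. The class and method names are hypothetical.

```python
class ToySnapshotStore:
    """Toy model of incremental snapshots; NOT how EBS stores data internally."""

    def __init__(self):
        self.snapshots = {}     # snapshot id -> {block index: data}
        self.latest_id = None
        self.blocks_stored = 0  # blocks actually copied into the store
        self._next_id = 0

    def take(self, volume):
        prev = self.snapshots.get(self.latest_id, {})
        snap = {}
        for idx, data in volume.items():
            if idx in prev and prev[idx] == data:
                snap[idx] = prev[idx]  # unchanged: reference the existing block
            else:
                snap[idx] = data       # new or changed: copy it into the store
                self.blocks_stored += 1
        snap_id = self._next_id
        self._next_id += 1
        self.snapshots[snap_id] = snap
        self.latest_id = snap_id
        return snap_id

    def delete(self, snap_id):
        # Later snapshots hold their own references, so deleting an older one loses nothing.
        del self.snapshots[snap_id]

    def restore_latest(self):
        return dict(self.snapshots[self.latest_id])

store = ToySnapshotStore()
volume = {0: "a", 1: "b", 2: "c"}
first = store.take(volume)  # copies 3 blocks
volume[1] = "B"
store.take(volume)          # copies only the 1 changed block
store.delete(first)         # the older snapshot is no longer needed
print(store.blocks_stored, store.restore_latest())  # 4 {0: 'a', 1: 'B', 2: 'c'}
```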
  • 29. Taking Snapshots ❖ Snapshots are taken with: ❖ AWS Management Console ❖ Scheduled snapshots ❖ AWS API / AWS CLI ❖ Snapshots are backed by S3, but you can’t see them in your S3 buckets ❖ Snapshots are stored per region ❖ Use snapshots to create new EBS volumes ❖ Snapshots can be copied to other regions
  • 30. Best Practices for Snapshots ❖ Test the process of recovering your instances from snapshots in case the Amazon EBS volumes fail ❖ Use separate volumes for the operating system and your data ❖ Make sure that the data persists after instance termination ❖ Don’t use instance store for database storage unless you are using replication You wrote a Chef or Ansible script to update the JDK and Cassandra. Should you take a snapshot before you run it?
  • 31. VPC: software-defined networking
  • 32. VPC: 1 public and two private subnets
  • 33. Amazon VPC ❖ Software-defined networking ❖ Virtual private cloud ❖ Multiple VPCs can live in an AWS region ❖ A VPC can span multiple availability zones ❖ Isolated area to deploy Amazon EC2 instances ❖ Associated with a CIDR block ❖ Associated with a DHCP option set
  • 34. CIDR Block ❖ /# denotes the size of the network ❖ how many bits of the address are used for the network ❖ Example: 10.10.1.32/27 denotes a CIDR range (also known as a CIDR block) ❖ The first 27 bits of the address are for the network (32 bits total) ❖ 32 - 27 leaves five bits for your servers: 00000-11111, or 32 addresses ❖ AWS reserves the first four addresses and the last address in every subnet ❖ The example leaves 27 addresses for our servers (10.10.1.36 to 10.10.1.62) ❖ A VPC address range may be as large as /16 (32 - 16 = 16 bits, which allows 65,536 addresses) ❖ or as small as /28 (32 - 28 = 4 bits, which is 16 addresses) ❖ The address ranges of two VPCs must not overlap if you plan on adding VPC peering
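Python's standard `ipaddress` module is a quick way to check subnet math like this. A minimal sketch, assuming AWS's documented reservations (the first four addresses and the last address of every subnet); the helper names are hypothetical.

```python
import ipaddress

RESERVED_AT_START = 4  # network address, VPC router, DNS, reserved for future use
RESERVED_AT_END = 1    # last address in the subnet

def usable_subnet_range(cidr):
    """Count and bounds of the addresses AWS leaves for instances in a subnet."""
    net = ipaddress.ip_network(cidr)
    count = net.num_addresses - RESERVED_AT_START - RESERVED_AT_END
    first = net[RESERVED_AT_START]
    last = net[net.num_addresses - 1 - RESERVED_AT_END]
    return count, str(first), str(last)

def can_peer(cidr_a, cidr_b):
    """VPC peering requires non-overlapping CIDR blocks."""
    return not ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))

print(usable_subnet_range("10.10.1.32/27"))      # (27, '10.10.1.36', '10.10.1.62')
print(can_peer("10.0.0.0/16", "10.1.0.0/16"))    # True
print(can_peer("10.0.0.0/16", "10.0.128.0/17"))  # False
```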
  • 35. CIDR Block Diagram (source: Wikipedia)
  • 36. Components of VPC ❖ Made up of subnets, route tables, DHCP option sets, security groups, and network ACLs ❖ Can also have Internet Gateways (IGWs), Virtual Private Gateways (VPGs), Elastic IP (EIP) addresses, Elastic Network Interfaces (ENIs), endpoints, peering, and NAT gateways ❖ A VPC has a router defined by its route tables (per-subnet and default)
  • 37. CloudFormation for VPC
  • 38. VPC Subnets ❖ Part of a VPC’s IP address range ❖ Has a CIDR block ❖ Each subnet lives in a single availability zone ❖ Can be public or private ❖ A private subnet has no route to the IGW (Internet Gateway)
  • 39. CloudFormation VPC Subnet
  • 40. Internet Gateway
  • 41. Internet Gateway ❖ An Internet Gateway (IGW) connects your VPC to the public Internet, enabling inbound traffic from it ❖ Public subnets have route tables that target the IGW ❖ The IGW performs network address translation between EC2 instances’ public IPs and their private IPs ❖ When an EC2 instance in a public subnet sends IP traffic, the IGW acts as the NAT for that subnet ❖ It translates the reply address to the EC2 instance’s public IP (or EIP) ❖ The IGW keeps track of the mapping between each EC2 instance’s private IP address and its public IP address ❖ Highly available; handles horizontal scaling and redundancy as needed
  • 42. Route Tables
  • 43. Subnet Route Tables ❖ Contain a set of rules (routes) ❖ Rules apply to the subnets associated with the table ❖ Connect subnets within a VPC so they can communicate ❖ Routes direct network traffic ❖ Each route is specified by a destination CIDR and a target ❖ The most specific route that matches the traffic determines where it goes ❖ If a subnet’s route table has a route to the Internet Gateway, the subnet is public ❖ Each subnet is associated with a route table (the VPC’s main, or default, route table unless you choose another)
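"The most specific route wins" is longest-prefix matching. A minimal Python sketch using the standard `ipaddress` module; the route table, the `igw-example` target name, and the `pick_route` helper are hypothetical. A private subnet's table would look the same, except its `0.0.0.0/0` route would point at a NAT gateway, or be absent.

```python
import ipaddress

def pick_route(route_table, destination):
    """Return the target of the most specific route (longest prefix) matching destination."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no matching route: the traffic has nowhere to go
    _, target = max(matches, key=lambda match: match[0].prefixlen)
    return target

# Hypothetical public-subnet route table in a 10.0.0.0/16 VPC.
public_routes = [
    ("10.0.0.0/16", "local"),      # traffic that stays inside the VPC
    ("0.0.0.0/0", "igw-example"),  # everything else goes to the Internet Gateway
]
print(pick_route(public_routes, "10.0.3.7"))      # local
print(pick_route(public_routes, "203.0.113.10"))  # igw-example
```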
  • 44. CloudFormation: Route from Public Subnet to IGW
  • 45. VPC VPN Access via VGW and CGW ❖ Use AWS to augment your existing IT infrastructure via VPN ❖ Connect an existing datacenter to a VPC using a VGW (Virtual Private Gateway, also called VPG) and a CGW (Customer Gateway) ❖ A VGW is like the IGW, but it sends traffic to and from your corporate network instead of the public Internet ❖ The VGW is the Amazon side of the VPN connection ❖ The CGW is the customer side of the VPN connection ❖ CGWs are processes running on a server or network device ❖ Connect a VGW and a CGW with a VPN tunnel ❖ Uses IPsec to connect the VPC to the corporate network ❖ Use dynamic routing or static routes Which subnets in a given VPC would have access to the corporate network connected via the VPN?
  • 46. NAT Gateway and EIP
  • 47. Elastic IP (EIP) ❖ AWS pool of public IP addresses, available to rent per region ❖ Allocate EIPs, then assign them; lets you keep the same public IP ❖ Example: assign an EIP to an instance (one instance at a time) ❖ Spin up a new, upgraded version of the instance from a snapshot or with Ansible, Chef, etc. ❖ Reassign the EIP to the new upgraded instance ❖ Allows public IPs to be reassigned to new underlying infrastructure ❖ Allocated for use in a VPC; can be moved to another VPC in the same region ❖ Assigned to resources like EC2 instances, NAT gateways, etc.
  • 48. NAT Gateways ❖ Needed so that Amazon EC2 instances launched in a private subnet can access the Internet ❖ NAT is a network address translator ❖ Why? Without one, yum install foo would fail, because instances in a private subnet have no route to the public Internet ❖ Similar to IGWs but, unlike IGWs, they do not allow incoming connections ❖ Only allow responses to outgoing traffic from your Amazon EC2 instances ❖ To maximize failover, deploy a NAT gateway per AZ ❖ To set up ❖ Create the NAT gateway in a public subnet ❖ Configure the private subnet’s route table to direct Internet-bound traffic to the NAT gateway ❖ Associate the NAT gateway with an Elastic IP (EIP)
  • 49. CloudFormation for NAT GW
  • 50. Placement groups per AZ ❖ Amazon enhanced networking, used with ❖ placement groups ❖ Instance types m4, c4, p2, g2, r3, x1, i2, and d2 support enhanced networking/placement groups ❖ Essential for high-speed server-to-server performance, which is important for clustering ❖ A placement group lives in a single AZ; instances in it get maximum throughput (10 Gbps) Why would this be important for Cassandra? Other systems?
  • 51. Elastic Network Interface (ENI) ❖ An ENI is a virtual network interface (a "network interface" in AWS speak) ❖ Can be attached to an EC2 instance in a VPC, then detached and attached to another EC2 instance ❖ Attributes: description, primary private IPv4 address, multiple secondary private IPv4 addresses, an EIP per private address, public IPv4 address, multiple IPv6 addresses, multiple security groups (at least one), MAC address, source/destination check flag ❖ Keeps its attributes no matter which EC2 instance it is attached to ❖ If an underlying instance fails, its addresses (MAC, public IP, EIPs, etc.) are preserved ❖ Makes EC2 instances replaceable: low-budget, highly available solutions What special Cassandra nodes might benefit from using an ENI to keep their private IP constant even if the instance goes down? How are ENIs different from EIPs? How are they similar?
  • 52. Security Groups
  • 53. Security Groups (SG) ❖ Stateful firewall: controls inbound and outbound network traffic to EC2 instances and other AWS resources ❖ Stateful means responses to allowed traffic are automatically allowed back, without a matching rule ❖ EC2 instances have to be associated with at least one security group ❖ Rules can only allow traffic (there are no deny rules) ❖ Rules consist of the following attributes: ❖ Source (CIDR or SG id) ❖ Protocol (TCP, ICMP, UDP, HTTP, HTTPS, SSH, etc.) ❖ Port range (e.g., 8000-8080)
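A toy Python model of the two security-group properties above: rules can only allow, and the group is stateful, so replies to connections the instance opened are let back in without a rule. This is a simplification for illustration, not how AWS implements connection tracking; the class, the rule tuple layout, and the addresses are hypothetical (9042 is Cassandra's CQL native transport port).

```python
import ipaddress

class ToySecurityGroup:
    """Toy stateful firewall: allow-only inbound rules plus connection tracking.

    Outbound traffic is always permitted here (like a default egress rule), and
    replies to connections the instance opened are let back in without a rule.
    """

    def __init__(self, inbound_rules):
        # Each rule: (source CIDR, protocol, from_port, to_port). Allow rules only.
        self.inbound_rules = [
            (ipaddress.ip_network(cidr), proto, lo, hi)
            for cidr, proto, lo, hi in inbound_rules
        ]
        self.tracked = set()  # (remote ip, remote port, protocol) of outbound flows

    def send(self, remote_ip, remote_port, protocol):
        """Record an outbound connection so its replies are recognized."""
        self.tracked.add((remote_ip, remote_port, protocol))

    def allows_inbound(self, src_ip, src_port, dst_port, protocol):
        if (src_ip, src_port, protocol) in self.tracked:
            return True  # reply to a flow we started: allowed because SGs are stateful
        addr = ipaddress.ip_address(src_ip)
        # No deny rules exist: anything not matched by an allow rule is dropped.
        return any(
            addr in net and protocol == proto and lo <= dst_port <= hi
            for net, proto, lo, hi in self.inbound_rules
        )

# Hypothetical Cassandra node SG: CQL (9042) only from inside a 10.0.0.0/16 VPC.
sg = ToySecurityGroup([("10.0.0.0/16", "tcp", 9042, 9042)])
print(sg.allows_inbound("10.0.5.20", 50000, 9042, "tcp"))    # True: allow rule matches
print(sg.allows_inbound("203.0.113.9", 50000, 9042, "tcp"))  # False: no rule matches
sg.send("203.0.113.9", 443, "tcp")                           # node calls out over HTTPS
print(sg.allows_inbound("203.0.113.9", 443, 40000, "tcp"))   # True: reply to tracked flow
```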
  • 54. CloudFormation SG Bastion
  • 55. CloudFormation SG Cassandra (private subnet) The above allows all traffic from the VPC’s CIDR to reach this box; -1 means all ports.
  • 56. CloudFormation SG Cassandra What is different about this SG for Cassandra compared with the last one? How could we narrow which EC2 instances can access the Cassandra nodes?
  • 57. CloudFormation: NACL
  • 58. Network ACL (NACL) ❖ Amazon Network Access Control List (NACL) ❖ Stateless firewall ❖ Provides a numbered, ordered list of rules ❖ The lowest-numbered rule is evaluated first ❖ The first rule that matches (allow or deny) wins ❖ Has both allow rules and deny rules ❖ Return traffic must be explicitly allowed (stateless) ❖ Applies to the whole subnet How does this compare to Security Groups?
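The NACL evaluation order above can be sketched in a few lines of Python. This is a simplified model, assuming rules of the form (rule number, action, source CIDR, port range) and ignoring protocol matching; the rules themselves are hypothetical. Note the ephemeral-port rule: because a NACL is stateless, return traffic must be allowed explicitly, unlike with a security group.

```python
import ipaddress

def nacl_decision(rules, src_ip, dst_port):
    """Evaluate stateless NACL rules in ascending rule-number order; first match wins.

    Each rule is (rule_number, action, source CIDR, from_port, to_port).
    Protocol matching is omitted to keep the sketch short.
    """
    addr = ipaddress.ip_address(src_ip)
    for _number, action, cidr, lo, hi in sorted(rules, key=lambda rule: rule[0]):
        if addr in ipaddress.ip_network(cidr) and lo <= dst_port <= hi:
            return action
    return "deny"  # the final '*' rule denies anything no numbered rule matched

# Hypothetical inbound NACL for a private Cassandra subnet.
inbound = [
    (100, "deny", "10.0.99.0/24", 0, 65535),   # block one troublesome subnet first
    (200, "allow", "10.0.0.0/16", 9042, 9042),  # CQL clients from the VPC
    (300, "allow", "0.0.0.0/0", 1024, 65535),   # return traffic on ephemeral ports
]
print(nacl_decision(inbound, "10.0.99.5", 9042))  # deny: rule 100 matches first
print(nacl_decision(inbound, "10.0.1.5", 9042))   # allow: rule 200
print(nacl_decision(inbound, "203.0.113.9", 22))  # deny: falls through to '*'
```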
  • 59. CloudFormation for NACL
  • 60. Related AWS Concepts: important AWS concepts for DevOps of clustered software
  • 61. AWS Important Concepts ❖ Auto Scaling ❖ Used to scale Amazon EC2 capacity up or down automatically ❖ Autoscale a group of instances based on workload ❖ Used to recover when instances go down, by automatically spinning up an instance to take its place ❖ Amazon Route 53 ❖ DNS as a service; Route 53 is highly available and scalable ❖ Easily assign DNS names instead of configuring with IP addresses (internal and external) ❖ AWS CloudFormation: lets developers, DevOps, and Ops create and manage a collection of related AWS resources Could you use Route 53 instead of an ENI for seed servers?
  • 62. AWS CLI: CloudFormation and Route 53 ❖ Create a DNS name for the public IP address of an EC2 instance ❖ Create a new CloudFormation stack (like the JSON file from earlier)
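The commands on this slide were images. As a hedged stand-in, here is a Python sketch that builds the JSON change batch Route 53 expects when creating or updating an A record; the host name, IP address, and `upsert_a_record` helper are placeholders. The resulting JSON could be saved to a file and passed to `aws route53 change-resource-record-sets --hosted-zone-id <zone> --change-batch file://batch.json`, or handed as `ChangeBatch` to boto3's `change_resource_record_sets`.

```python
import json

def upsert_a_record(name, ip, ttl=300, comment="Point a DNS name at an EC2 instance"):
    """Build a Route 53 ChangeBatch that creates or updates an A record."""
    return {
        "Comment": comment,
        "Changes": [{
            "Action": "UPSERT",  # create the record, or overwrite it if it exists
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip}],
            },
        }],
    }

# Placeholder name and address for a hypothetical Cassandra seed node.
batch = upsert_a_record("cassandra-seed1.example.com.", "203.0.113.25")
print(json.dumps(batch, indent=2))
```

Using `UPSERT` rather than `CREATE` keeps the script re-runnable: pointing the same name at a replacement instance just overwrites the old record.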
  • 63. More important AWS concepts ❖ IAM ❖ AWS Identity and Access Management (IAM) lets you securely control access to AWS Cloud services and resources for your users ❖ IAM defines users, groups, and roles, and lets you apply them to EC2 instances as well as to users or groups of users (example in notes) ❖ KMS ❖ AWS Key Management Service (KMS) lets you create and control encryption keys ❖ Uses Hardware Security Modules (HSMs) to protect the security of your keys ❖ Used to encrypt Amazon EBS volumes, Amazon S3 buckets, and other services ❖ S3 ❖ Amazon Simple Storage Service, to store your backups and big data ❖ Good for backups of Cassandra snapshots
  • 64. Amazon CloudWatch ❖ Monitoring service that AWS uses for its own cloud resources and services ❖ Can be used for your services and applications ❖ Track key performance indicators (KPIs) and metrics ❖ Log aggregation, and you can easily create alarms ❖ Trigger AWS Lambda functions based on a KPI’s limits or on how often an item shows up in a log stream in a given period of time ❖ Provides system-wide visibility into resource utilization and operational health ❖ Not passive: action-oriented ❖ Integrates with the entire Amazon ecosystem ❖ Actionable: triggers and events keep everything running smoothly
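The "how often an item shows up in a log stream in a given period" trigger boils down to a count per period plus an alarm rule. A simplified Python sketch of an "M out of N datapoints" alarm, assuming the per-minute counts come from a log metric filter; real CloudWatch alarms also offer other comparison operators and missing-data handling that this ignores. The function, the threshold, and the `WriteTimeoutException` log pattern are hypothetical.

```python
def alarm_state(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Simplified 'M out of N' alarm: ALARM if at least M of the last N
    datapoints are above the threshold, otherwise OK."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in recent if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"

# Hypothetical per-minute counts of "WriteTimeoutException" lines from a log metric filter.
timeouts_per_minute = [0, 1, 0, 7, 9, 2, 8]
print(alarm_state(timeouts_per_minute, threshold=5,
                  evaluation_periods=5, datapoints_to_alarm=3))  # ALARM
```

Requiring 3 of 5 breaching minutes, rather than a single spike, is one way to keep an alarm (and any Lambda it triggers) from firing on momentary noise.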

Editor's Notes

  • #9: AWS supports regions around the world and throughout the USA. A region is like a datacenter. Regions are independent of each other. You can place services in a region close to your end consumers to lower latency and improve reliability. An Availability Zone (AZ) is isolated, but multiple AZs live in a region. Placing your services and applications in separate Availability Zones protects you from outages. Each AZ in a region has independent power, backup generators, and UPS units, and AZs often use different utility companies when possible. AZs may exist in separate locations within a metropolitan area. AZs are redundantly connected with fast, low-latency links through multiple tier-1 transit providers. A VPC lives in a single region, and a VPC subnet must live in a single AZ.
  • #11: Amazon Elastic Compute Cloud (Amazon EC2): Amazon EC2 is AWS’s primary web service for resizable compute capacity in the cloud. EC2 Compute: compute is the computational power needed for your use case. Amazon EC2 lets you add compute resources through its web service API. EC2 lets you launch instances. An instance is a server, and you can install whatever software you need for your service or web application: NGINX, Apache httpd, Cassandra, Kafka, etc. When you launch a virtual server (an instance, in EC2 speak), you can use it as you like, just as you would a server in your datacenter. You pay for the compute power that you use. There are different instance types with various ranges of CPU, RAM, IO, and networking power. You pay for compute resources by the hour. You can add more instances, and you can reserve instances for longer periods of time for a price break.
  • #12: Instance types: the instance type defines the size of the virtual instance. There are many types of EC2 instances with different levels of virtual CPUs (vCPUs), memory (RAM size and type), and network performance. Instance types are grouped into families. Amazon used its own measure of compute power, the ECU, but has since moved to the more industry-standard vCPU. A vCPU is a hyperthread of an Intel Xeon core for M4, M3, C4, C3, R3, HS1, G2, I2, and D2.
  • #13: T2 - inexpensive and burstable (good for cheaper, more sporadic workloads). M4 - new generation of general-purpose instances (added clustering and placement groups to M3). M3 - old generation of general-purpose instances (don’t use this one; M4 is cheaper and better). C4 - compute optimized; like M4 but with less memory and more vCPUs (use this if you are not using all of your M4 memory). C3 - use C4 instead, as C3 does not provide clustering and placement groups; C3 is to M3 as C4 is to M4. P2 - GPU-intensive applications (machine learning). G2 - graphics-intensive applications (server-side graphic workloads).
  • #14: M4 - new generation of general-purpose instances (added clustering and placement groups to M3). C4 - compute optimized; like M4 but with less memory and more vCPUs (use this if you are not using all of your M4 memory). X1 - memory optimized for in-memory computing (SAP HANA). R3 - memory-intensive databases and distributed caches (MongoDB). I2 - high IOPS at lower cost, SSD storage (MongoDB, Cassandra). D2 - high IO throughput and large disks at lower cost, magnetic storage (MapReduce, Cassandra, Kafka).
  • #16: Getting the most bang for your buck with AWS Elastic Block Store (EBS). Understanding what AWS/EC2 provides for provisioning on-demand storage is critical for DevOps. Companies waste a lot of money by over-provisioning on AWS. Amazon Elastic Block Store: Amazon Web Services (AWS) provides Amazon Elastic Block Store (Amazon EBS) for EC2 instance storage. EBS provides the virtual hard drives and SSDs for your servers running in the cloud. Amazon EBS volumes are automatically replicated, and it is easy to take snapshots of volumes to back them up in a known state. The replication happens within an availability zone (AZ). AWS EBS has lots of advantages, like reliability, snapshotting, and resizing.
  • #17: EBS Volume Types: there are many types of volumes, and different types have different performance characteristics. The trick is to pick the most cost-efficient one for the workload of your service. AWS provides four volume types: two types of hard disk drives (HDD) and two types of SSDs. Volumes differ in price and performance. An EC2 instance can have many volumes attached to it, just like a server can have many drives. A volume can be attached to only one EC2 instance at a time. If you want to share files between EC2 instances, use Amazon Elastic File System or S3.
  • #18: Magnetic Volumes - Hard Disk Drives (HDD): magnetic volumes have the lowest performance for random access. However, they have the lowest cost per gigabyte, and they offer the highest throughput (500 MB/s) for sequential access. Magnetic volumes average 100 IOPS, but can burst to hundreds of IOPS. IOPS are input/output operations per second (pronounced eye-ops) and are used to characterize storage devices. Services like Kafka, which writes to a transaction log in long streams, and databases that use log-structured storage or an approximation of it using some sort of log-structured merge tree (for example LevelDB, RocksDB, Cassandra) might do well with HDD (magnetic) EBS volumes. Applications that stream data, or that perform fewer operations per second but larger writes, could actually benefit from HDD throughput performance.
  • #19: In general, magnetic volumes do best with sequential operations like: streaming workloads that require cost-effective, fast, consistent I/O; big data; data warehouses; log processing; and databases that employ log-structured merge trees.
  • #20: There are two types of HDD (magnetic) volumes: st1, Throughput Optimized HDD; and sc1, Cold HDD, the most cost-effective.
  • #21: General-Purpose SSD (gp2): gp2 volumes are cost effective and useful for many workloads. gp2 is the minivan of EBS: not sexy, but it works for a lot of applications and is common. Performance of gp2 is three IOPS per gigabyte provisioned, capped at 10,000 IOPS. Sizes range from 1 GB to 16 TB. Databases that use some form of B-tree (MongoDB, MySQL, etc.) can benefit from using SSD. But gp2 is more geared to a lower-volume database, or one that has peak load times but long periods at rest where IOPS credits can accumulate.
  • #22: Under 1 TB, these volumes can burst to 3,000 IOPS for extended periods of time. For example, a 250 GB volume has a baseline of 750 IOPS. When those 750 IOPS are not used, they accumulate as IOPS credits. Under heavy traffic, those IOPS credits are spent, which is how you can burst up to 3,000 IOPS. IOPS credits are like a savings account: you use your savings when you get hit hard by a user tornado, but as you spend, the balance goes down. Use cases: development and test environments; servers that do periodic batch or cron jobs; most workloads (recommended); boot volumes; low-latency interactive apps; medium-sized databases.
  • #24: Provisioned IOPS SSD (io1): Provisioned IOPS SSD volumes are for I/O-intensive workloads, especially random-access I/O. They are the most expensive Amazon EBS volume type per gigabyte, and they provide the highest random-access performance of any Amazon EBS volume. With this volume type you pick the number of IOPS (pay to play), up to 20,000. These volumes are great for high-volume databases, or for databases that need a constant level of performance. High-volume databases that use some form of B-tree (MongoDB, MySQL, etc.) can benefit from this SSD volume. The io1 IOPS can be ramped up. Provisioned IOPS SSD volumes are more predictable (you don’t have to store up IOPS as with gp2) and suit applications with higher performance needs, like mission-critical business applications that require sustained IOPS performance, and databases with large, high-volume workloads. Overcoming performance problems with Provisioned IOPS is expensive. Some companies have used RAID-0 striping with a 4-way stripe plus EnhanceIO to increase throughput by over 50% at no additional expense. “EnhanceIO driver is based on EnhanceIO SSD caching software product developed by STEC Inc. EnhanceIO was derived from Facebook’s open source Flashcache project. EnhanceIO uses SSDs as cache devices for traditional rotating hard disk drives (referred to as source volumes throughout this document).” – EnhanceIO. RAID-0 can also be used to work around EBS volume size limits and to increase throughput.
  • #27: “EBS–optimized instances deliver dedicated bandwidth to Amazon EBS, with options between 500 Mbps and 12,000 Mbps, depending on the instance type you use. When attached to an EBS–optimized instance, General Purpose SSD (gp2) volumes are designed to deliver within 10% of their baseline and burst performance 99% of the time in a given year, and Provisioned IOPS SSD (io1) volumes are designed to deliver within 10% of their provisioned performance 99.9% of the time in a given year. Both Throughput Optimized HDD (st1) and Cold HDD (sc1) guarantee performance consistency of 90% of burst throughput 99% of the time in a given year. Non-compliant periods are approximately uniformly distributed, targeting 99% of expected total throughput each hour. For more information, see Amazon EBS Volume Types.” http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html#enable-ebs-optimization
  • #28: Performance testing and monitoring: the best way to make an educated guess when picking the right EBS type is to know your tool. If you are deploying Kafka, Cassandra, or MongoDB, then you must understand how to configure the tool and EBS to get the most bang for the buck. When in doubt, test. You can make educated guesses about which EBS type will fit your application or service best. However, using Amazon CloudWatch to watch IOPS and IO throughput while load testing or watching production workloads could be the quickest way to pick the best EBS volume type and get the most bang for your buck. This can also help you decide whether to use RAID 0, HDDs (st1 or sc1), provisioned IOPS SSD (io1), or general-purpose SSD (gp2). There is no point in overpaying, and you do not want a laggy service or application that does not meet its SLAs.
  • #29: Understanding what AWS provides for backing up EBS volumes is an important concept for DevOps. Data safety with EBS - backup/recovery (snapshots): Amazon EBS lets you easily back up data by taking snapshots. Snapshots are point-in-time backups. You can periodically create a snapshot of the data written to an EBS volume. Snapshots provide incremental backups of your data: they save only the blocks that have changed since the last snapshot. Even though snapshots are saved incrementally, only the latest snapshot is needed to restore the volume. You can delete older snapshots and still use the latest one.
  • #30: Taking EBS Snapshots: snapshots are taken with the AWS Management Console, scheduled snapshots, the AWS API, or the AWS CLI. EBS snapshots are backed by S3 in AWS-controlled storage. You can’t see these snapshots in your account’s Amazon S3 buckets; they are hidden from you but backed by S3. You use the snapshot tools, not S3, to manage snapshots. Snapshots are stored per region. You use snapshots to create new EBS volumes, and snapshots can be copied to other regions. Using snapshots: when a new EBS volume is created from a snapshot, the data is transferred lazily into the volume, and data accessed before it has been transferred is restored on request. If you need to increase the size of a volume, create a snapshot, recreate a larger volume from the snapshot, and then replace the original volume with the new one. Tag snapshots to help manage them later, for example by describing the snapshot’s original volume with its device name (/dev/sdd). You can take snapshots from the AWS console or the command line. Restoring volumes from snapshots: Amazon EBS volumes persist beyond the EC2 instance lifecycle, so you can recover the data of an instance that fails. Before the instance is terminated, detach the volume; then attach it as a data volume to another instance and recover the data. Backing up root devices: to snapshot the EBS volume of a root device, stop the instance first, so the OS and applications flush any outstanding or cached blocks. For a data volume, unmount it before taking the snapshot. To unmount the volume in Linux, use the following command: umount -d /dev/sdc
  • #34: Understanding what AWS provides for setting up private networks, security groups, and more is important for anyone who calls themselves DevOps. AWS lets you define a software-defined network with Amazon Virtual Private Cloud (Amazon VPC). You can define subnets, ingress rules, security groups, NAT gateways, Internet gateways, and more. A VPC is an isolated area in which to deploy instances. You can create multiple VPCs within a region, and each VPC spans all of that region's Availability Zones. Each VPC is associated with a CIDR block. Amazon VPC DHCP option sets: A VPC is associated with a DHCP option set. The Dynamic Host Configuration Protocol (DHCP) provides a standard for configuring TCP/IP networks. DHCP options let you configure the following per VPC: domain name, domain name servers, and netbios-node-type. By default, AWS creates a DHCP option set and associates it with your VPC. The default DHCP option set sets domain-name-servers to AmazonProvidedDNS (the Amazon Domain Name System) and sets domain-name to the domain name for your region.
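To make slide #34's pieces concrete, the sketch below builds a CloudFormation template as a Python dictionary and prints it as JSON: a VPC plus a custom DHCP option set associated with it. The resource types (`AWS::EC2::VPC`, `AWS::EC2::DHCPOptions`, `AWS::EC2::VPCDHCPOptionsAssociation`) are real CloudFormation types. The logical name `VpcMain`, the CIDR, and the domain name `cassandra.example.internal` are illustrative assumptions.

```python
import json


def vpc_with_dhcp_options(cidr: str, domain_name: str) -> dict:
    """Build a CloudFormation template (as a dict) for a VPC with custom DHCP options."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "VpcMain": {
                "Type": "AWS::EC2::VPC",
                "Properties": {
                    "CidrBlock": cidr,
                    "EnableDnsSupport": True,
                    "EnableDnsHostnames": True,
                },
            },
            "DhcpOptions": {
                "Type": "AWS::EC2::DHCPOptions",
                "Properties": {
                    "DomainName": domain_name,
                    # Keep using the Amazon-provided DNS server.
                    "DomainNameServers": ["AmazonProvidedDNS"],
                },
            },
            "DhcpOptionsAssociation": {
                "Type": "AWS::EC2::VPCDHCPOptionsAssociation",
                "Properties": {
                    "VpcId": {"Ref": "VpcMain"},
                    "DhcpOptionsId": {"Ref": "DhcpOptions"},
                },
            },
        },
    }


template = vpc_with_dhcp_options("10.10.0.0/16", "cassandra.example.internal")
print(json.dumps(template, indent=2))
```

The printed JSON can be saved as a template file and submitted to create a stack. Without the association resource, the VPC would keep the default option set.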
  • #35: In CIDR block notation, the /# suffix denotes the size of the network, that is, how many bits of the address are used for the network prefix. For example, 10.10.1.32/27 denotes a CIDR range (also known as a CIDR block) in which the first 27 of the 32 address bits identify the network. 32 - 27 leaves five bits (00000 to 11111), or 32 addresses, for your servers. In every subnet, AWS reserves the first four addresses and the last address (the broadcast address). That leaves 27 addresses for servers: 10.10.1.36 through 10.10.1.62. There are tools to help build CIDR-based subnets. A VPC address range may be as large as /16 (32 - 16 = 16 bits, which allows 65,536 addresses) or as small as /28 (32 - 28 = 4 bits, which allows 16 addresses). The address ranges of two VPCs must not overlap if you plan to connect them with VPC peering.
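The arithmetic in slide #35 can be checked with Python's standard `ipaddress` module. This sketch hard-codes AWS's rule of reserving the first four addresses and the last address of each subnet. The function names `subnet_capacity` and `can_peer` are hypothetical helpers, not part of any AWS SDK.

```python
import ipaddress

# AWS reserves the first four addresses (network, VPC router, DNS, future use)
# and the last address of every subnet.
RESERVED_AT_START = 4
RESERVED_AT_END = 1


def subnet_capacity(cidr: str):
    """Return (total, first usable, last usable, usable count) for an AWS subnet."""
    net = ipaddress.ip_network(cidr)
    if net.version != 4 or not 16 <= net.prefixlen <= 28:
        raise ValueError("IPv4 subnets in a VPC must be between /16 and /28")
    first = net.network_address + RESERVED_AT_START
    last = net.broadcast_address - RESERVED_AT_END
    usable = net.num_addresses - RESERVED_AT_START - RESERVED_AT_END
    return net.num_addresses, first, last, usable


def can_peer(cidr_a: str, cidr_b: str) -> bool:
    """VPC peering requires the two VPC address ranges not to overlap."""
    return not ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


total, first, last, usable = subnet_capacity("10.10.1.32/27")
print(total, first, last, usable)  # 32 10.10.1.36 10.10.1.62 27
print(can_peer("10.10.0.0/16", "10.20.0.0/16"))  # True
print(can_peer("10.10.0.0/16", "10.10.128.0/17"))  # False
```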
  • #37: An Amazon VPC is made up of subnets, route tables, DHCP option sets, security groups, and network ACLs. An AWS VPC can also have Internet Gateways (IGWs), Virtual Private Gateways (VPGs), Elastic IP (EIP) addresses, Elastic Network Interfaces (ENIs), endpoints, peering connections, and NAT gateways. A VPC has a router whose behavior is defined by its route tables (the main route table plus any per-subnet route tables).
  • #38: A /16 VPC CIDR has 65,536 addresses. Avoid the first four addresses and the last address in any subnet, because AWS reserves them. Amazon CloudFormation: CloudFormation lets developers, DevOps, and Ops create and manage a collection of related AWS resources, creating and updating them in a predictable fashion. You do this by writing CloudFormation templates in JSON or YAML and then submitting the templates to create stacks, which can later be updated. We will use CloudFormation to cover the different components of the VPC in more detail.
  • #39: Amazon VPC subnets: A subnet is a part of a VPC's IP address range. Just like a VPC, a subnet needs a CIDR block. Each subnet is associated with a single Availability Zone (which has its own independent power and network). Subnets can be public or private. A private subnet is one whose route table has no route to the IGW.
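Slide #39 ties each subnet to a single Availability Zone. A common layout gives every AZ one public and one private subnet carved from the VPC CIDR. This layout is an assumption, not something taken from the slides. The helper `plan_subnets` and the AZ names are hypothetical. Whether a subnet is actually public depends on its route table, not on the label.

```python
import ipaddress
from typing import Dict, List


def plan_subnets(vpc_cidr: str, azs: List[str], new_prefix: int = 24) -> Dict[str, Dict[str, str]]:
    """Carve one public and one private subnet per AZ out of the VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    if new_prefix < vpc.prefixlen:
        raise ValueError("subnets cannot be larger than the VPC")
    available = 2 ** (new_prefix - vpc.prefixlen)
    if 2 * len(azs) > available:
        raise ValueError(f"{vpc_cidr} holds only {available} /{new_prefix} subnets")
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator of consecutive blocks
    plan: Dict[str, Dict[str, str]] = {}
    for az in azs:
        # Each subnet lives in exactly one AZ.
        plan[az] = {"public": str(next(blocks)), "private": str(next(blocks))}
    return plan


for az, subnets in plan_subnets("10.10.0.0/16", ["us-west-2a", "us-west-2b", "us-west-2c"]).items():
    print(az, subnets)
```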
  • #42: Internet gateways: An Internet Gateway (IGW) enables traffic between the public Internet and your VPC. Subnets whose route tables have a route targeting the IGW are public subnets. For incoming traffic, the IGW translates the public IP address of an EC2 instance to its private IP address. When an EC2 instance in a public subnet sends IP traffic to the Internet, the IGW acts as the NAT for that subnet and translates the instance's private source address to its public IP (or EIP). The IGW keeps track of the mapping between each EC2 instance's private IP address and its public IP address. AWS makes the IGW highly available and handles horizontal scaling and redundancy as needed.
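Slide #42's one-to-one translation between an instance's private and public addresses can be sketched as a pair of lookup tables. This is a simplified model that assumes every mapped instance has exactly one public IP. `InternetGatewayNat` is a hypothetical class, and the addresses come from documentation ranges.

```python
from typing import Dict, Tuple

Packet = Tuple[str, str]  # (source IP, destination IP)


class InternetGatewayNat:
    """Hypothetical model of the IGW's one-to-one private/public IP translation."""

    def __init__(self) -> None:
        self._to_public: Dict[str, str] = {}
        self._to_private: Dict[str, str] = {}

    def map(self, private_ip: str, public_ip: str) -> None:
        self._to_public[private_ip] = public_ip
        self._to_private[public_ip] = private_ip

    def outbound(self, packet: Packet) -> Packet:
        src, dst = packet
        # Rewrite the source; an instance without a public IP raises KeyError here,
        # mirroring that it cannot reach the Internet through the IGW directly.
        return self._to_public[src], dst

    def inbound(self, packet: Packet) -> Packet:
        src, dst = packet
        # Rewrite the destination back to the instance's private IP.
        return src, self._to_private[dst]


igw = InternetGatewayNat()
igw.map("10.10.0.25", "203.0.113.10")
print(igw.outbound(("10.10.0.25", "198.51.100.7")))  # ('203.0.113.10', '198.51.100.7')
print(igw.inbound(("198.51.100.7", "203.0.113.10")))  # ('198.51.100.7', '10.10.0.25')
```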
  • #43: We are covering route tables next.
  • #44: Amazon VPC subnet route tables: A route table contains a set of rules, called routes, that direct network traffic leaving the subnets associated with it. The route tables connect subnets within a VPC so they can communicate. Each route is specified by a destination CIDR and a target. The most specific route that matches the traffic (the longest prefix match) determines how the traffic is routed. Route tables determine which subnets are public and which are private (depending on whether or not the subnet has a route to the Internet Gateway). Each subnet is always associated with a route table that dictates the routes for that subnet. If no route table is specified for a subnet, the subnet uses the main route table (which is associated with the VPC).
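The "most specific route wins" rule from slide #44 is longest-prefix matching. The following is a minimal Python sketch that assumes routes are given as (destination CIDR, target) pairs. `select_target` and the target name `igw-example` are hypothetical.

```python
import ipaddress
from typing import List, Tuple

Route = Tuple[str, str]  # (destination CIDR, target)


def select_target(routes: List[Route], destination: str) -> str:
    """Return the target of the most specific (longest-prefix) matching route."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        raise LookupError(f"no route to {destination}")
    return max(matches, key=lambda m: m[0].prefixlen)[1]


public_routes = [("10.10.0.0/16", "local"), ("0.0.0.0/0", "igw-example")]
private_routes = [("10.10.0.0/16", "local")]
print(select_target(public_routes, "10.10.3.7"))  # local
print(select_target(public_routes, "198.51.100.7"))  # igw-example
```

The public table differs from the private one only by the 0.0.0.0/0 route to the IGW. That route is what makes the subnet public: an Internet address has no match in the private table.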
  • #45: The above shows the Amazon CloudFormation definition of the route from the public subnet defined earlier to the Internet Gateway.
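The template shown on slide #45 is not part of this transcript. The following is therefore an assumed reconstruction rather than the original: Python code that emits the standard CloudFormation resources for routing a public subnet to an Internet Gateway. The resource types and properties are real. The logical names, including `VpcMain` and `SubnetPublic`, are assumptions about how the VPC and subnet were declared.

```python
import json


def public_route_resources(vpc: str = "VpcMain", subnet: str = "SubnetPublic") -> dict:
    """CloudFormation resources routing a public subnet to an Internet Gateway."""
    return {
        "InternetGateway": {"Type": "AWS::EC2::InternetGateway"},
        "GatewayAttachment": {
            "Type": "AWS::EC2::VPCGatewayAttachment",
            "Properties": {"VpcId": {"Ref": vpc}, "InternetGatewayId": {"Ref": "InternetGateway"}},
        },
        "PublicRouteTable": {
            "Type": "AWS::EC2::RouteTable",
            "Properties": {"VpcId": {"Ref": vpc}},
        },
        "PublicDefaultRoute": {
            "Type": "AWS::EC2::Route",
            "DependsOn": "GatewayAttachment",  # the IGW must be attached first
            "Properties": {
                "RouteTableId": {"Ref": "PublicRouteTable"},
                "DestinationCidrBlock": "0.0.0.0/0",  # anything not matched by 'local'
                "GatewayId": {"Ref": "InternetGateway"},
            },
        },
        "PublicSubnetRouteTableAssociation": {
            "Type": "AWS::EC2::SubnetRouteTableAssociation",
            "Properties": {"SubnetId": {"Ref": subnet}, "RouteTableId": {"Ref": "PublicRouteTable"}},
        },
    }


print(json.dumps({"Resources": public_route_resources()}, indent=2))
```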
  • #46: Amazon VPGs, CGWs, and VPNs: Amazon lets you connect VPCs to your existing data center so that AWS can augment your existing IT infrastructure. You connect your existing data center to an Amazon VPC using a VPG (Virtual Private Gateway) and a CGW (Customer Gateway). Think of the VPG as being like the IGW, except that it sends traffic to your corporate network instead of the public Internet. The VPG is the Amazon side of the VPN connection, and the CGW is the customer side. CGWs are processes running on a server or network device in your company's network. You connect a VPG and a CGW with a VPN connection, which allows traffic between your corporate network and your Amazon VPCs. Each VPN connection uses two IPsec (Internet Protocol Security) tunnels for higher availability. You can set up the VPN connection to use dynamic routing via the Border Gateway Protocol (BGP) if the CGW supports it. If your CGW does not support dynamic routing, use static routes to decide which traffic is meant for the VPC. Routes are propagated to the VPC to allow traffic back to your corporate network via the VPG.
  • #47: The above shows that we are covering Elastic IPs and NAT Gateways next.
  • #48: Amazon Elastic IP (EIP): AWS has a pool of public IP addresses available to rent in each region. These public IP addresses are called Elastic IP addresses (EIPs). You check out an EIP like a library book: as long as you have the EIP checked out, no one else can use it. You can keep the EIP as long as you want, but you pay for it. Historically, unused EIPs have cost more than EIPs that are in use with an EC2 instance. An EIP can be associated with only one instance at a time. You could spin up a new, upgraded version of an instance from a snapshot and then reassign the EIP to the upgraded instance. EIPs let you use a set of fixed public IP addresses that can be reassigned as the underlying infrastructure changes over time. EIPs are allocated per region and can be associated with resources, such as EC2 instances, in any VPC in that region.
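Slide #48's library-book analogy maps onto a small allocation model. The sketch below is hypothetical: `ElasticIpPool` is not an AWS API, and the instance IDs are made up. It shows allocation, re-association of the same public IP to an upgraded instance, and the idle addresses you keep paying for.

```python
from typing import Dict, List, Optional


class ElasticIpPool:
    """Hypothetical 'library' model of EIP checkout within one region."""

    def __init__(self, public_ips: List[str]) -> None:
        self._shelf = list(public_ips)  # addresses nobody has checked out
        self._checked_out: Dict[str, Optional[str]] = {}  # EIP -> instance id

    def allocate(self) -> str:
        if not self._shelf:
            raise RuntimeError("no public addresses left in this region")
        eip = self._shelf.pop(0)
        self._checked_out[eip] = None  # allocated but not yet associated
        return eip

    def associate(self, eip: str, instance_id: str) -> None:
        if eip not in self._checked_out:
            raise KeyError(f"{eip} is not allocated to this account")
        # One instance at a time: associating again moves the address.
        self._checked_out[eip] = instance_id

    def owner(self, eip: str) -> Optional[str]:
        return self._checked_out[eip]

    def idle(self) -> List[str]:
        """Allocated but unassociated EIPs: still billed while you hold them."""
        return [eip for eip, inst in self._checked_out.items() if inst is None]

    def release(self, eip: str) -> None:
        del self._checked_out[eip]
        self._shelf.append(eip)


pool = ElasticIpPool(["203.0.113.10", "203.0.113.11"])
eip = pool.allocate()
pool.associate(eip, "i-old")
pool.associate(eip, "i-upgraded")  # clients keep using the same public IP
print(eip, pool.owner(eip))  # 203.0.113.10 i-upgraded
```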
  • #49: Amazon NAT gateways: Amazon EC2 instances launched in a private subnet cannot access the Internet unless there is a NAT (network address translator). Even if you just wanted to update your instances with yum install foo, you could not, because they have no route to the public Internet. AWS provides NAT gateways, which are similar to IGWs. Unlike IGWs, they do not allow incoming connections; they only allow responses to outgoing traffic from your Amazon EC2 instances. NAT gateways are simple to manage and highly available within their Availability Zone. Because a VPC subnet and a NAT gateway each live in a single Availability Zone (AZ), you will want to deploy a NAT gateway per AZ to maximize failover. To let Amazon EC2 instances in a private subnet reach Internet resources through the IGW using a NAT gateway, you must do the following: create the NAT gateway in a public subnet, associate the NAT gateway with an EIP, and set up the private subnet's route table to direct Internet traffic to the NAT gateway. AWS also added an egress-only Internet gateway, which plays a similar role to a NAT gateway but works only with IPv6.
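Slide #49 recommends one NAT gateway per AZ. The following sketch generates that pattern as CloudFormation resources: for each AZ, an EIP, a NAT gateway in that AZ's public subnet, and a default route in that AZ's private route table. The resource types and properties are real CloudFormation ones. The logical names `PublicSubnet{i}` and `PrivateRouteTable{i}` are assumptions and would need to be declared elsewhere in the template.

```python
import json
from typing import Dict


def nat_per_az_resources(az_count: int) -> Dict[str, dict]:
    """One EIP + NAT gateway per AZ, and a default route for that AZ's private subnet."""
    resources: Dict[str, dict] = {}
    for i in range(az_count):
        resources[f"NatEip{i}"] = {"Type": "AWS::EC2::EIP", "Properties": {"Domain": "vpc"}}
        resources[f"NatGateway{i}"] = {
            "Type": "AWS::EC2::NatGateway",
            "Properties": {
                "AllocationId": {"Fn::GetAtt": [f"NatEip{i}", "AllocationId"]},
                "SubnetId": {"Ref": f"PublicSubnet{i}"},  # NAT gateways sit in a public subnet
            },
        }
        resources[f"PrivateDefaultRoute{i}"] = {
            "Type": "AWS::EC2::Route",
            "Properties": {
                "RouteTableId": {"Ref": f"PrivateRouteTable{i}"},
                "DestinationCidrBlock": "0.0.0.0/0",
                # Same-AZ NAT gateway, so one AZ's outage does not cut off the others.
                "NatGatewayId": {"Ref": f"NatGateway{i}"},
            },
        }
    return resources


print(json.dumps({"Resources": nat_per_az_resources(3)}, indent=2))
```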
  • #51: Amazon enhanced networking: placement groups and networking speed. Instance types m4, c4, p2, g2, r3, x1, i2, and d2 support placement groups, which are essential for server-to-server performance and therefore important for clustering. To achieve maximum throughput, the instances in a placement group must be in the same AZ. Amazon EC2 instances can achieve speeds of up to 10 Gbps if both instances are in the same placement group and in the same AZ.
  • #52: Amazon Elastic Network Interfaces (ENIs): An ENI, often just called a network interface in AWS speak, is a virtual network interface that you can attach to an instance in a VPC. You can also detach an ENI and attach it to another EC2 instance. ENIs don't work with EC2-Classic (EC2 without a VPC). An ENI can have the following properties: a description, a primary private IPv4 address, optionally multiple secondary private IPv4 addresses, an EIP per private address, a public IPv4 address, multiple IPv6 addresses, one or more security groups (at least one), a MAC address, and a source/destination check flag. Again, what makes these ENIs elastic is that you can create an ENI, attach it to an EC2 instance, detach it, and attach it to another instance. The ENI keeps its properties no matter which instance it is attached to. If an underlying instance fails, its addresses (MAC address, private IPs, EIPs, etc.) are preserved by attaching the ENI to a replacement EC2 instance. ENIs can be used to create low-budget, highly available solutions.
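The failover idea in slide #52 rests on the ENI, not the instance, owning the addresses. The following is a toy Python model with a hypothetical `NetworkInterface` dataclass and `fail_over` helper. They stand in for the real attach and detach calls, and the IDs are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NetworkInterface:
    """Hypothetical model of the ENI properties that move with the interface."""

    eni_id: str
    mac_address: str
    primary_private_ip: str
    security_groups: List[str]
    secondary_private_ips: List[str] = field(default_factory=list)
    attached_to: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.security_groups:
            raise ValueError("an ENI needs at least one security group")

    def attach(self, instance_id: str) -> None:
        if self.attached_to is not None:
            raise RuntimeError(f"{self.eni_id} is attached to {self.attached_to}")
        self.attached_to = instance_id

    def detach(self) -> None:
        self.attached_to = None


def fail_over(eni: NetworkInterface, replacement_id: str) -> None:
    """Move the ENI (and with it the MAC and IP addresses) to a replacement."""
    eni.detach()
    eni.attach(replacement_id)


eni = NetworkInterface("eni-example", "0a:1b:2c:3d:4e:5f", "10.10.1.40", ["sg-cassandra"])
eni.attach("i-failed")
fail_over(eni, "i-replacement")
print(eni.attached_to, eni.primary_private_ip, eni.mac_address)
```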
  • #53: We are covering Amazon Security Groups next.
  • #54: Amazon VPC security groups: A security group (SG) is a stateful firewall that controls inbound and outbound network traffic to EC2 instances and to AWS resources like Elastic Load Balancers. Because security groups are stateful, an Amazon instance (or resource) is allowed to respond to inbound traffic with outbound traffic, regardless of the outbound rules. Every AWS EC2 instance must be associated with a security group; if none is specified, the instance is associated with the VPC's default security group. You can also change the security groups of EC2 instances that are already running. Each VPC can have up to 500 security groups. Security groups contain only allow rules. Each rule consists of the following attributes: source (a CIDR or an SG id), protocol (TCP, UDP, ICMP, or all; the console's HTTP, HTTPS, and SSH choices are TCP rules on well-known ports), and port range (for example, 8000-8080). A security group can have up to 50 inbound and 50 outbound rules using CIDRs or other security groups. AWS evaluates every rule before deciding whether to permit traffic.
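Slide #54's evaluation rule (allow rules only, and traffic is permitted if any rule matches) can be sketched as follows. This models only the inbound decision; statefulness, meaning that replies are allowed automatically, is not modeled. `SgRule`, `inbound_allowed`, and the group id `sg-cassandra` are hypothetical. The ports are Cassandra's defaults for CQL clients (9042) and inter-node traffic (7000/7001).

```python
import ipaddress
from dataclasses import dataclass
from typing import Collection, List, Optional


@dataclass(frozen=True)
class SgRule:
    protocol: str  # "tcp", "udp", "icmp", or "-1" for all protocols
    from_port: int = 0
    to_port: int = 65535
    cidr: Optional[str] = None  # source CIDR, or ...
    source_group: Optional[str] = None  # ... the id of another security group


def inbound_allowed(rules: List[SgRule], protocol: str, port: int,
                    src_ip: str, src_groups: Collection[str] = ()) -> bool:
    """Allow-only evaluation: permit if ANY rule matches, otherwise deny."""
    for rule in rules:
        if rule.protocol not in ("-1", protocol):
            continue
        if rule.protocol != "-1" and not rule.from_port <= port <= rule.to_port:
            continue
        if rule.cidr and ipaddress.ip_address(src_ip) in ipaddress.ip_network(rule.cidr):
            return True
        if rule.source_group and rule.source_group in src_groups:
            return True
    return False  # no matching allow rule: implicitly denied


cassandra_rules = [
    SgRule("tcp", 9042, 9042, cidr="10.10.0.0/16"),  # CQL clients inside the VPC
    SgRule("tcp", 7000, 7001, source_group="sg-cassandra"),  # node-to-node traffic
]
print(inbound_allowed(cassandra_rules, "tcp", 9042, "10.10.3.7"))  # True
print(inbound_allowed(cassandra_rules, "tcp", 9042, "198.51.100.7"))  # False
print(inbound_allowed(cassandra_rules, "tcp", 7000, "10.20.0.5", {"sg-cassandra"}))  # True
```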
  • #55: The source can be a CIDR or another security group in the same VPC. Notice that the SG is tied to VpcMain, defined before. Note that 0.0.0.0/0 matches every IPv4 address, that is, all public traffic.
  • #56: The above allows all traffic from the VPC's CIDR to access this box. -1 means all protocols (and therefore all ports). Ingress is incoming traffic; egress is outgoing traffic.
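Slide #56 describes a rule that admits all traffic from the VPC's CIDR. Expressed as a CloudFormation resource built in Python, it might look like the sketch below. The property names are real `AWS::EC2::SecurityGroup` properties; the logical name `SecurityGroupVpcAll` and the CIDR are assumptions.

```python
import json


def allow_all_from_vpc(vpc_cidr: str) -> dict:
    """A security group resource letting anything inside the VPC reach the instance."""
    return {
        "Type": "AWS::EC2::SecurityGroup",
        "Properties": {
            "GroupDescription": "Allow all traffic from inside the VPC",
            "VpcId": {"Ref": "VpcMain"},
            # Ingress (incoming): IpProtocol "-1" means every protocol, so no ports.
            "SecurityGroupIngress": [{"IpProtocol": "-1", "CidrIp": vpc_cidr}],
            # Egress (outgoing): allow all, which is also the default for a new SG.
            "SecurityGroupEgress": [{"IpProtocol": "-1", "CidrIp": "0.0.0.0/0"}],
        },
    }


print(json.dumps({"Resources": {"SecurityGroupVpcAll": allow_all_from_vpc("10.10.0.0/16")}}, indent=2))
```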
  • #57: This is taken from an example that uses Cassandra running in multiple regions. The Cassandra EC2 instances are running in a public subnet.
  • #58: We are covering Amazon Network Access Control Lists (NACLs) next.
  • #59: Amazon Network Access Control List (NACL): A network ACL (NACL) is a stateless layer of security that acts as a firewall for a whole subnet. A NACL is a numbered list of rules, evaluated in order starting with the lowest-numbered rule; the first rule that matches wins, whether it allows or denies. NACLs support both allow rules and deny rules. Because NACLs are stateless, return traffic must be explicitly allowed.
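Compared with security groups, the NACL rules in slide #59 are ordered, can deny, and are stateless. This hypothetical sketch (`NaclRule` and `nacl_decision` are made-up names) evaluates rules lowest number first and shows why reply traffic needs its own outbound rule. The ephemeral port range 1024-65535 is an assumption about the clients.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NaclRule:
    number: int
    action: str  # "allow" or "deny"
    protocol: str  # "tcp", "udp", "icmp", or "-1" for all
    from_port: int
    to_port: int
    cidr: str


def nacl_decision(rules: List[NaclRule], protocol: str, port: int, ip: str) -> str:
    """Evaluate rules lowest number first; the first match wins."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.protocol not in ("-1", protocol):
            continue
        if rule.protocol != "-1" and not rule.from_port <= port <= rule.to_port:
            continue
        if ipaddress.ip_address(ip) not in ipaddress.ip_network(rule.cidr):
            continue
        return rule.action
    return "deny"  # the final catch-all (*) rule denies unmatched traffic


inbound = [
    NaclRule(100, "deny", "tcp", 9042, 9042, "198.51.100.0/24"),
    NaclRule(110, "allow", "tcp", 9042, 9042, "0.0.0.0/0"),
]
# Stateless: replies to clients' ephemeral ports need their own outbound rule.
outbound = [NaclRule(100, "allow", "tcp", 1024, 65535, "0.0.0.0/0")]

print(nacl_decision(inbound, "tcp", 9042, "198.51.100.7"))  # deny (rule 100 wins)
print(nacl_decision(inbound, "tcp", 9042, "203.0.113.9"))  # allow (rule 110)
print(nacl_decision(outbound, "tcp", 50123, "203.0.113.9"))  # allow: the reply
print(nacl_decision([], "tcp", 50123, "203.0.113.9"))  # deny without that rule
```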
  • #62: Auto Scaling: Auto Scaling is used to scale Amazon EC2 capacity up or down automatically. You can autoscale a group of instances based on workload. It can also be used to recover when instances go down, by automatically spinning up an instance to take each failed one's place. Auto Scaling groups can span multiple AZs. Amazon Route 53: Amazon Route 53 is DNS as a service. Route 53 is highly available and scalable. You can easily assign DNS names to resources instead of configuring them with public IP addresses. Amazon CloudFormation: CloudFormation lets developers, DevOps, and Ops create and manage a collection of related AWS resources, creating and updating them in a predictable fashion. You do this by writing CloudFormation templates in JSON or YAML and then submitting the templates to create stacks, which can later be updated.
  • #63: Amazon makes it easy to create immutable infrastructure with the AWS command line tools and CloudFormation. Get into the habit of using CloudFormation and the AWS command line, instead of the console, to launch instances.
  • #64: IAM: AWS Identity and Access Management (IAM) lets you securely control access to AWS Cloud services and resources for your users. IAM defines users, groups, and roles, and lets you apply permissions to EC2 instances as well as to users or groups of users. To use CloudWatch logging or metrics from an application, you would grant rights to a role and then associate that IAM role with your EC2 instance. KMS: AWS Key Management Service (KMS) lets you create and control encryption keys. KMS uses Hardware Security Modules (HSMs) to protect the security of your keys. KMS can be used to encrypt Amazon EBS volumes, Amazon S3 buckets, and other services. KMS can be used for encryption operations that must comply with SOC 1, SOC 2, SOC 3, PCI DSS, ISO 27017/27018, and FIPS 140-2. KMS also provides a REST API to encrypt data on an application basis. AWS Certificate Manager: AWS Certificate Manager removes the time-consuming manual process of purchasing, uploading, and renewing SSL/TLS certificates. It lets you provision, manage, and deploy Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificates for use with AWS Cloud services like ELB or CloudFront (Amazon's CDN). You no longer have to purchase, upload, and manually renew SSL/TLS certificates for the ELB or CDN. Amazon CloudFront: Amazon's CDN puts resources closer to the end users of your applications and services. Amazon S3: Amazon Simple Storage Service stores your backups and big data; it is a good fit for backups.
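Slide #64 notes that an application needs an IAM role on its EC2 instance to publish CloudWatch metrics and logs. The following is a minimal sketch of that role as CloudFormation resources. It assumes the application only calls `cloudwatch:PutMetricData` and the CloudWatch Logs write actions. The logical names are hypothetical, and `"Resource": "*"` is deliberately broad for brevity; a real policy would narrow it.

```python
import json


def cloudwatch_instance_role() -> dict:
    """CloudFormation resources: a role EC2 can assume, plus its instance profile."""
    return {
        "CloudWatchRole": {
            "Type": "AWS::IAM::Role",
            "Properties": {
                # Trust policy: lets EC2 instances assume this role.
                "AssumeRolePolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [{
                        "Effect": "Allow",
                        "Principal": {"Service": "ec2.amazonaws.com"},
                        "Action": "sts:AssumeRole",
                    }],
                },
                # Permissions: publish metrics and write log events.
                "Policies": [{
                    "PolicyName": "cloudwatch-metrics-and-logs",
                    "PolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [{
                            "Effect": "Allow",
                            "Action": [
                                "cloudwatch:PutMetricData",
                                "logs:CreateLogGroup",
                                "logs:CreateLogStream",
                                "logs:PutLogEvents",
                            ],
                            "Resource": "*",
                        }],
                    },
                }],
            },
        },
        "CloudWatchInstanceProfile": {
            "Type": "AWS::IAM::InstanceProfile",
            "Properties": {"Roles": [{"Ref": "CloudWatchRole"}]},
        },
    }


print(json.dumps({"Resources": cloudwatch_instance_role()}, indent=2))
```

An EC2 instance would then reference the instance profile through its `IamInstanceProfile` property.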
  • #65: Amazon CloudWatch: Amazon CloudWatch is the monitoring service that AWS uses for its own cloud resources and services, and you can also use CloudWatch for your own services and applications. CloudWatch can track key performance indicators (KPIs) and metrics, aggregate logs, and easily create alarms. You can even trigger AWS Lambda functions based on limits of a KPI or on how often an item shows up in a log stream in a given period of time. Amazon CloudWatch provides system-wide visibility into resource utilization and operational health. Unlike many monitoring systems, CloudWatch integrates with the entire Amazon ecosystem, so you not only gain insight into your systems but can also react to triggers and events to keep everything running smoothly.
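Slide #65 mentions alarms that fire when a KPI crosses a limit within a period. CloudWatch alarms can be configured to fire when M of the last N datapoints breach a threshold (the `DatapointsToAlarm` and `EvaluationPeriods` settings). The sketch below is a simplified, hypothetical re-implementation of that check for a greater-than comparison. It ignores CloudWatch's missing-data handling and other options, and the CPU values are made up.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int, datapoints_to_alarm: int) -> str:
    """Simplified 'M out of N' check using a greater-than-threshold comparison."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    window = datapoints[-evaluation_periods:]  # the most recent N periods
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


cpu = [40.0, 85.0, 90.0, 50.0, 95.0]  # hypothetical per-period CPU averages
print(alarm_state(cpu, 80.0, evaluation_periods=3, datapoints_to_alarm=2))  # ALARM
print(alarm_state(cpu[:2], 80.0, evaluation_periods=3, datapoints_to_alarm=2))  # INSUFFICIENT_DATA
```

Requiring M of N breaching datapoints, rather than a single one, keeps one noisy spike from triggering an alarm (and any Lambda function wired to it).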