By Sergey Sverchkov
Software Architect at Altoros
sergey.sverchkov@altoros.com
Taking Cloud to Extremes: Scaled-down, Highly
Available, and Mission-critical Architecture
www.altoros.com
@altoros
Requirements
Solution Requirements
● An IoT healthcare solution:
○ Connect devices and users located at customer sites
○ Thousands of devices
○ Hundreds of customers
○ Collect, process, and visualize device data
Solution Requirements
● Available as a private regional cloud:
○ Operated by a third party
○ Addressing region-specific regulations
○ Serving clients and providing region proximity
● A “scaled-down” version for on-site deployments:
○ Cost-effective
○ Easy remote maintenance
○ Data backup to the regional cloud
[Diagram: the regional cloud linked to local clouds at Customer Facility 1 and Customer Facility 2]
Solution Requirements
● Consider implementation restrictions:
○ Limited resources for on-site deployment
● Review and approval by government agencies:
○ Open source technologies and products
○ Unified architecture for regional and local clouds
Solution Requirements
● High availability and scalability:
○ A hardware and infrastructure platform
○ Cloud services and applications
● Security is essential:
○ VPN connectivity
○ Non-VPN connections should be supported
○ WebSocket, TCP, and HTTP protocols
Implementation
Infrastructure: OpenStack vs. VMware
● VMware vSphere is about virtualization:
○ ESXi is the only supported hypervisor
○ vCenter for management
● OpenStack is about cloud:
○ Storage, network, and compute services
○ Security groups and access control
○ Projects and quotas
○ Supports KVM, ESXi, and QEMU
Infrastructure: OpenStack vs. VMware
● Cost estimation for 5 nodes

VMware component                  License cost, USD
VMware vSphere Standard, 1 CPU    $995
VMware vCenter Server Standard    $4,995

Server                    CPU                     Cost per node, USD
SuperMicro 5038MR-H8TRF   Intel Xeon E5-2620 v2   $1,800

OpenStack              Cost, USD
5 compute nodes        5 * $1,800
3 controller nodes     3 * $1,800
Total                  $14,400

VMware                   Cost, USD
5 ESXi (compute) nodes   5 * $1,800 + 5 * $995
1 vCenter appliance      1 * $4,995
Total                    $18,970
Platform Deployment View
OpenStack Deployment Considerations
● Availability zones:
○ Identical zones for compute and storage services
● Support for VM migration:
○ Use Ceph for volumes and ephemeral disks
○ Keep spare capacity of about one compute node in each zone
● Increase default values in nova.conf:
○ security_groups = 100
○ security_group_rules = 300
○ volumes = 500
○ cpu_overcommit = 4
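A minimal sketch of how that shorthand might map onto real option names (assuming an OpenStack release of this era; quota_security_groups, quota_security_group_rules, and cpu_allocation_ratio are the nova.conf names, while volume quotas are assumed to sit in cinder.conf; verify against your release before applying):

[DEFAULT]
# raised quota defaults
quota_security_groups = 100
quota_security_group_rules = 300
# scheduler CPU overcommit (allocation) ratio
cpu_allocation_ratio = 4.0

# in cinder.conf:
# quota_volumes = 500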
Cloud Services
● Cloud Services—HA support:
○ Cassandra
○ MariaDB Galera
○ RabbitMQ
○ Elasticsearch, Logstash, and Kibana (ELK)
The Application Platform: Cloud Foundry
● For microservices architectures
● Runtime automation
● Organizations, users, spaces, and security groups
● Health checks, load balancing, and scaling
● Runs on AWS, OpenStack, and VMware
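As an illustration of the org/space/security-group model and scaling, a hypothetical CF CLI session (the org, space, group, and app names are made up):

$ cf create-org healthcare
$ cf create-space devices -o healthcare
$ cf target -o healthcare -s devices
$ cf create-security-group device-net asg-rules.json    # asg-rules.json lists allowed egress rules
$ cf bind-security-group device-net healthcare devices
$ cf scale device-api -i 3                              # run three instances of an app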
The Cloud Platform: HA Deployment
Cloud Foundry Planning

Job                  z1   z2   z3   CPU/inst   RAM/inst, GB   RAM total, GB   CPU total
etcd                 1    1    1    1          2              6               3
UAA + CC DB          1    -    -    1          2              2               1
Cloud Controller     1    1    -    1          4              8               2
Doppler              1    1    1    1          1              3               3
Traffic Controller   1    1    -    1          1               2               2
Runners              2    2    2    16         64             384             96

Total for CF jobs: 33 instances, 447 GB RAM, 133 CPUs
(z1-z3: instances per availability zone; totals include CF jobs not shown here)
Cloud Foundry HA Deployment Issues
● CC and UAA databases?
✓ Use BOSH Resurrector
✓ Use external MariaDB Galera
● BOSH Director?
✓ Plan BOSH VM recovery
● Blob store?
✓ Store blobs in OpenStack Swift
BOSH Director Recovery
● You will need:
○ bosh-state.json
○ bosh.yml manifest
○ BOSH persistent disk
● Edit bosh-state.json, keeping only these properties:
○ installation_id
○ current_disk_id
● Re-deploy BOSH and attach the persistent disk:
bosh-init deploy bosh.yml
Total time: around 25 min
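For illustration, the trimmed bosh-state.json keeps only the two properties above (the IDs here are hypothetical):

{
  "installation_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "current_disk_id": "42"
}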
Blob Storage in OpenStack Swift
● Set OpenStack as the provider in the deployment manifest:
properties:
cc:
packages:
app_package_directory_key: cc-packages
fog_connection: &fog_connection
provider: 'OpenStack'
openstack_username: 'cfdeployer'
openstack_api_key: 'ddd3dd23'
openstack_auth_url: 'http://172.30.0.3:5000/v2.0/tokens'
openstack_temp_url_key: '1328d0212'
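The temp URL key must be unique per Cloud Foundry installation and also has to be registered on the Swift account itself; a minimal sketch with the swift CLI (assuming the OS_* auth environment variables point at the same account):

$ swift post -m "Temp-URL-Key:1328d0212"    # sets X-Account-Meta-Temp-URL-Key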
BOSH Resurrection
● Configure resurrection for the database VM:
$ bosh vm resurrection pg_data/0 on
● Measure the approximate time for restoring a VM:
○ 60 sec: agent health-check interval
○ 60 sec: to mark the agent as unresponsive
○ 120 sec: to recreate the VM on OpenStack
○ 60 sec: to initialize
Total: around 5 min.
● When a physical machine is down:
○ The Resurrector recreates all of its VMs in the same AZ
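The Resurrector itself is switched on through the Health Monitor in the BOSH Director manifest; a minimal sketch (property path from the bosh release):

properties:
  hm:
    resurrector_enabled: true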
Cassandra in OpenStack Ceph
Cassandra in OpenStack Ceph: Pros and Cons
● Pros:
○ Automation—all cloud services are in OpenStack.
○ Ceph is distributed and replicated storage.
○ Low cost compared to hardware SAN.
● Cons:
○ The effective replication factor is 6: 2 in Ceph * 3 in Cassandra.
○ Cassandra performance is impacted by network performance.
Testing Cassandra in OpenStack Ceph
● OpenStack configuration:
○ 1 Gb network
○ 1 CPU per node — E5-2630 v3 2.40 GHz
○ 2.0 TB SATA 6.0 Gb/s 7200RPM for Ceph
● Cassandra configuration:
○ Node: 8 vCPUs, 32 GB of RAM
○ 6 nodes in 3 AZs; 2 nodes per AZ
○ SimpleStrategy with a replication factor of 3
○ The cassandra-stress tool (see the sketch below)
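A sketch of the cassandra-stress invocations behind the three workloads on the next slide (the node address, thread count, and operation count are assumptions, not the actual test parameters):

$ cassandra-stress write n=1000000 -schema "replication(strategy=SimpleStrategy,factor=3)" -rate threads=100 -node 10.0.0.11
$ cassandra-stress read n=1000000 -rate threads=100 -node 10.0.0.11
$ cassandra-stress mixed "ratio(write=1,read=1)" n=1000000 -rate threads=100 -node 10.0.0.11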
Testing Cassandra in OpenStack Ceph

Workload                Operations / sec   Avg. latency, ms   Latency 99%, ms   Max. latency, ms
100% writes             47,700             2.8                10.1              3,851.7
100% reads              65,250             2.1                5.5               50.8
50% writes, 50% reads   54,150             2.5                7.1               2,062.1
Cassandra Recommendations
● Cluster and node sizing:
○ Effective data size per node: 3–5 TB
○ Tables in all keyspaces: 500–1,000
○ 30–50% of free space for the compaction process
● DataStax storage recommendations:
○ Use local SSD drives in JBOD mode
Contributions
Altoros’s Contributions to Cloud Foundry
● Cassandra Service Broker for CF (registration sketch after this list):
https://github.com/Altoros/cf-cassandra-broker-release.git
● Improvements to the ELK BOSH release and CF integration:
○ RabbitMQ input, Cassandra output for Logstash
○ Logstash filters
https://github.com/logsearch/logsearch-boshrelease/commits?author=axelaris
https://github.com/cloudfoundry-community/logsearch-for-cloudfoundry/
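For context, registering such a broker follows the standard service-broker flow; a hypothetical session (the broker URL, credentials, and plan name are made up):

$ cf create-service-broker cassandra broker-admin broker-secret https://cassandra-broker.example.com
$ cf enable-service-access cassandra
$ cf create-service cassandra default my-keyspace    # provisions a keyspace through the broker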
Altoros’s Contributions to Other Projects
● Cassandra Web Tool for Developers—run CQL
○ Coming soon in open source!
Questions?
sergey.sverchkov@altoros.com
Thank you!
For more:
altoros.com
altoros.com/research-papers
blog.altoros.com
twitter.com/altoros

Editor's Notes

  • #2: Hello, colleagues. My name is Sergey, and I’m glad to see you at this session. I work as a project manager and software architect at Altoros. Today, I’m going to share the experience our team gained while working on an ongoing healthcare project. The project is about building a highly available solution for customers who operate various medical devices.
  • #3: Let’s take a look at some of the requirements.
  • #4: First of all, what are the business requirements for the solution? We call this system the “Internet of Things for healthcare.” The main idea is to create a Software-as-a-Service solution that lets clients connect medical devices and users in a secure way. The service collects, stores, and visualizes device data. Users will have various dashboards to view data in near real time, and they will also be able to locate and manage devices. Some of the customers are large organizations that operate many facilities and thousands of devices. This IoT solution is expected to dramatically simplify device and user connectivity, making it transparent and unified. The cloud solution should reduce the time to deliver, upgrade, and support healthcare applications for clients.
  • #5: The new solution should serve customers in different geographical regions, providing region proximity. It also makes it possible to address the specific regulations of each region; the rules for the healthcare industry differ across North and South America, Europe, and Asia. Besides the regional cloud, there is a plan to create a scaled-down version of the cloud for on-site deployments, so that the biggest customers who are sensitive to data locality can install the solution and keep all the data inside their data center. This scaled-down version needs to be cost-effective and support remote maintenance in the same way as planned for the regional cloud. As an additional feature, the data stored in the local cloud can be backed up to a regional cloud.
  • #6: If we talk about the two versions of the cloud, the regional cloud and the cloud for on-site deployment, we understand that their architectures must be very similar or identical. First of all, one implementation covering different scales of deployment reduces the time to deliver the solution to the market. You also need to consider a whole range of implementation restrictions; for example, the cloud for on-site deployment has limited resources and a limited budget. It is clear that a healthcare solution must be reviewed and approved by government agencies, so the platform must be based on open source products that can be tested and examined for vulnerabilities. Open components also make it possible to easily extend the functionality of the platform and the products used in the solution, and they make it easier to review all the components and get the necessary approvals.
  • #7: Another set of requirements is related to the availability and security of the solution. High availability is extremely important in healthcare. In our case, it means that all apps and services, as well as the hardware and the infrastructure platform, must be available all the time. The platform deals with very sensitive data, so security is essential. In most cases, customers are connected to the cloud through secure VPN tunnels, but for small customers, the cloud needs to provide connectivity without a firewall. As for communication with the cloud, devices operate over Internet protocols, and support for TCP devices is planned for the near future.
  • #8: OK, now let’s take a look at how the platform is implemented. I won’t go into all the technical details. Instead, I will focus on the infrastructure platform and the cloud services we’ve selected, as well as some of the high availability and scalability aspects. I’ll also share which parts of this project we contributed to the community.
  • #9: Speaking about infrastructure, we were choosing between VMware vSphere and OpenStack, because we had to build a private cloud. We chose OpenStack, because VMware vSphere is about virtualization and management of virtual resources, and all VMware products are licensed and proprietary. In contrast to VMware, OpenStack is open source and includes components for building storage, network, and compute services in the cloud. OpenStack supports multi-tenancy for cloud resources through projects, and it has fine-grained security and access controls. It also supports several hypervisors and can be integrated with the ESXi hypervisor, too.
  • #10: Let’s take a look at this rough cost estimation for an infrastructure platform running VMware and OpenStack. As an example, we are calculating the effective cost for five nodes. We’re using a blade chassis with five compute nodes for virtual machines and storage. The cost is estimated for a SuperMicro chassis with 6-core Intel Xeon CPUs. If we use VMware vSphere, we have to buy licenses for five ESXi hypervisors and vCenter management, so the initial cost will be around $19K. With OpenStack, we need three additional nodes for the OpenStack management services. As you can see, even though OpenStack uses three additional machines, the total cost is less than with VMware. You can use this example for cost estimation when selecting a private infrastructure platform.
  • #11: On the next slide, you can see a high-level deployment view of our OpenStack cloud. It is protected by a firewall that supports VPN tunnels and non-VPN HTTPS connections. At the hardware level, we are using a blade chassis to build a highly available OpenStack. At least three nodes are used for OpenStack management components, and compute services are distributed across three availability zones. This creates redundancy for virtual machines launched by OpenStack. One reason we create three zones is that some components and cloud services require three or more virtual nodes for availability. The OpenStack storage may be distributed across compute nodes, or we can set up separate storage nodes. Additional management services, like DNS, NTP, and the OpenStack deployment tools, run on additional chassis nodes. With this deployment approach, we can scale OpenStack’s computing and storage capacity simply by adding new blades or nodes.
  • #12: What are some of the important OpenStack deployment considerations? First, it’s required to create identical availability zones for compute and storage services. Second, to enable live VM migration, we need to configure OpenStack Ceph for persistent volumes and ephemeral disks, and there should be spare capacity of around one physical node in every availability zone. Third, we recommend increasing the default limits for the number of security groups, security rules, and volumes. It is also important to evaluate the CPU overcommit ratio: the recommended value is 1.5 to 2, but in our tests, we were able to reach a CPU overcommit ratio of 4.
  • #13: Besides the OpenStack platform, we are using a number of other services in the cloud platform. Cassandra is a scalable, redundant, masterless data store; this is where we keep all the device data. MariaDB Galera is our relational database cluster for structured data with low velocity. RabbitMQ provides queueing and messaging for different applications. And Elasticsearch, Logstash, and Kibana serve for application log aggregation and indexing.
  • #14: What about running applications? The solution we are building is based on a microservices architecture, so we need an application platform that will manage microservices effectively. When it comes to microservices, we think Cloud Foundry is by far the best option. It automates up to 90% of all routine work related to application lifecycle management. It is a complete platform that supports traditional application runtime automation as well as Docker containers. And the most important advantage, at least for our customer, was that, with Cloud Foundry, new features and apps can be released a lot faster.
  • #15: So what does it take to distribute the components of the Cloud Foundry platform and the cloud services inside the OpenStack deployment? As I have already said, there are three availability zones, which are actually three groups of physical nodes in the chassis. If we distribute our service instances across the availability zones, we can ensure redundancy at the service level. For example, the MariaDB cluster requires at least three nodes, so for redundancy we place one node in every availability zone. The same approach is applied to RabbitMQ and Cassandra. As for Cloud Foundry, we need to place the components that support HA in at least two zones. We can expect that most of the platform resources in Cloud Foundry are allocated to application runners (the DEA and Diego cells). The runners are deployed to three availability zones, so that the application workload can be distributed evenly across all hardware nodes. Management services can be replicated as well; the approach to replication depends on the specific service. Some services, like DNS and NTP, are treated as mission-critical, so they have two instances on two physical nodes, while other services may have more relaxed HA requirements. OK, let’s see a more detailed plan of resources for a CF deployment.
  • #16: On this slide, you can see how we distributed the configuration of Cloud Foundry across three OpenStack availability zones. Of course, this slide shows only some of the CF jobs, to give the idea of how the planning is done. This planning page helps us calculate the usage of memory and CPU by OpenStack zone and the number of virtual machines for Cloud Foundry. The values in the “Total” row represent the total number of instances, memory, and virtual CPUs; there are also totals calculated for each availability zone. The cells highlighted in yellow are the jobs that we recommend placing in all three availability zones: the service registry, etcd, which should have three instances; the Loggregator Traffic Controller, which is recommended to have at least one instance in every zone; and, as I mentioned, the runners for application containers. Runners are the major resource consumers in any Cloud Foundry deployment. At the same time, there are CF jobs that don’t support a high availability configuration by default, and we need to decide how to recover them in case they fail, or find workarounds. Let’s move on to the next slide and see what can be done.
  • #17: One non-HA piece is the CC and UAA databases. For the databases, we can configure BOSH resurrection or use an external MariaDB Galera cluster. Another non-HA component in the deployment is the BOSH Director, which provides CF and cloud services automation. BOSH is not directly related to the availability of Cloud Foundry and applications, but we need to plan how to recover the virtual machine with the BOSH Director. The last non-HA component is the blobstore: the default NFS blobstore is a single instance, so we can use object storage, for example OpenStack Swift, instead. Let’s take a look at some of the details for these points.
  • #18: So, what does a plan for recovering the BOSH Director virtual machine look like? The approach is quite straightforward. To recover the BOSH Director, we need the BOSH state file, the deployment manifest, and the persistent disk of the BOSH VM. First, we edit the BOSH state file, leaving only several properties. Then we can re-deploy BOSH and attach the persistent disk. In our tests, recovering the BOSH Director VM with this scenario took around 25 minutes. As an alternative, we can use OpenStack’s VM migration functionality if the ephemeral drives are located in OpenStack Ceph storage and can be attached to the new VM in the same way as the persistent disk. In addition, the OpenStack Ceph option for ephemeral drives enables live migration of VMs in OpenStack.
  • #19: To set OpenStack Swift as the blobstore, we need to define the credentials and the URL for connecting to OpenStack, and also set a temporary URL key in the Cloud Foundry deployment manifest. It is very important that the temporary key is unique for every Cloud Foundry installation on OpenStack, if you have, for example, two installations in one OpenStack. And it should work!
  • #20: OK, let’s see the effect of BOSH resurrection. What is important about BOSH resurrection is that it takes around two minutes to mark the agent as unresponsive; after that, the VM is recreated. The total time in our tests ranged from 4 to 6 minutes for a Postgres database instance. This timeframe can be acceptable, because applications that have already been deployed will continue to work, provided we don’t install new applications during this downtime. In our case, we decided to go with this approach. But take the side effect into account: when you stop a physical machine intentionally, the BOSH Resurrector tries to recreate all the VMs hosted on this physical machine in the same OpenStack availability zone, and you should have enough resources in that zone for this process. As an alternative to BOSH resurrection, you can configure an external MariaDB cluster for the CF databases.
  • #21: Let’s take a look at Cassandra storage. In our case, we are using OpenStack Ceph with replication, and the data blocks are distributed among all storage nodes. This means a single data read request triggers several network operations. First, the application calls the Cassandra coordinator node, which is the virtual machine the application is connected to. Second, the Cassandra coordinator contacts the Cassandra data node that stores the requested data row; this Cassandra node runs on a specific compute node in OpenStack. Then the compute node talks to the OpenStack Ceph controller. And, finally, the Ceph controller reads the data blocks from the OpenStack storage nodes.
  • #22: So, what are the pros and cons of running Cassandra in OpenStack Ceph? On the good side: with Ceph, all cloud services are in OpenStack, which simplifies deployment automation and management, because services can be deployed and managed, for example, by BOSH. Ceph is scalable, replicated storage, so the failure of one drive or storage node should not affect the availability of data volumes. And, last but not least, the price of this storage is quite low compared to dedicated hardware SAN systems. Speaking about the cons: in Ceph storage, the additional replication factor of 2 results in a total of 6 replicas of the Cassandra data, if we use the recommended replication factor of 3 in Cassandra. And Cassandra performance depends directly on network performance, so it is recommended to use a 10G or faster network for connecting the OpenStack storage nodes.
  • #23: In our case, we decided to benchmark Cassandra in OpenStack to understand whether it can satisfy our requirements. We used the Cassandra stress test tool on a cluster of six nodes with a simple replication strategy and a factor of 3. The network was 1 Gb. Every Cassandra node was configured with 8 vCPUs and 32 GB of RAM, which is the recommended ratio between CPU and memory for one Cassandra node. The test was conducted with one table, and the approximate test duration was 300 seconds.
  • #24: On this slide, you can see the results of the benchmark. The Cassandra stress test tool measures throughput as the number of operations per second, plus several latencies that show the distribution of response times during the test. We report the number of operations per second and the average, 99th-percentile, and maximum latencies, measured in milliseconds. In terms of deviation, the 99th-percentile and maximum latencies are the interesting figures; they can give you an idea of what should be examined in more detail. This type of test can be executed very quickly after you’ve installed the cluster, and it gives you an insight into what kind of performance you can expect. For example, if your requirement is to serve 10,000 operations per second with an average latency below 10 ms, this Cassandra deployment in OpenStack can meet it. But also remember that Cassandra’s data model and access patterns influence application performance, too.
  • #25: Other recommendations for Cassandra cluster planning: the effective data size per Cassandra node is 3–5 TB; the number of tables in all keyspaces should be less than 1,000 to keep the compaction process effective; and 30 to 50% of disk space should be free for compaction. As for recommended storage options, DataStax recommends running Cassandra on bare metal with local SSD drives.
  • #26: These are some of the technical details from the project that I decided to share with you within our short time frame. In the last part of my presentation, I would like to say a few words about what Altoros contributed to the community from this project. Don’t be surprised: even though we work in an area as restricted as healthcare, we can find ways to spread ideas and experience.
  • #27: During the project, we created a CF service broker for a Cassandra cluster that supports authentication and keyspace provisioning. We update it regularly to accommodate changes in the latest Cassandra versions. We are also continuously improving the ELK stack; specifically, we have added a number of inputs and outputs to Logstash, such as RabbitMQ and Cassandra. In this project, ELK serves as the main storage for all log events. Our team has developed an approach and some Logstash filters to merge the multiple lines of exceptions and stack traces into one message object in Elasticsearch (a sketch follows below). This helps to find and view the full context of any application error in Kibana.
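As an illustration of that merging approach, a minimal Logstash multiline filter of the kind used for stack traces (the pattern is a hypothetical sketch, not the project’s actual filter):

filter {
  multiline {
    # glue indented lines and "Caused by:" lines onto the previous event
    pattern => "^(\s+at |\s+\.\.\.|Caused by:)"
    what => "previous"
  }
}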
  • #28: We also developed a web tool that allows developers who work with Cassandra to view keyspaces and objects, run any valid Cassandra CQL statements, and store them in history. This tool is extremely useful if you need to interact with a Cassandra cluster in a private cloud without access to any of the Cassandra nodes. We were inspired by DataStax DevCenter, a desktop tool for working with a Cassandra cluster over direct connectivity to the cluster nodes. But for a private cloud in OpenStack behind a firewall, DataStax DevCenter doesn’t work, and we needed a web-based tool. Moreover, it’s a Cloud Foundry-ready application.
  • #29: That’s all for this short presentation. I’ll be glad to answer your questions.