Zenko: Enabling Data Control in a Multi-cloud World

Thursday, August 3, 2017
Zenko Webinar: Enabling Data Control in a Multi-Cloud World
Giorgio Regni, Scality Co-founder & CTO
Laure Vergeron, Software Engineer

We’ll shed some light on a few questions:
• What does multi-cloud mean?
• How can you leverage the efficiency of both public and private clouds?
• How can this multi-cloud data controller do search and discovery across clouds?
• How can you get involved with Zenko?
Zenko Webinar: Multi-cloud, hybrid stores, open-source

Freedom to leverage multiple, different cloud infrastructures, private or public
Acknowledge that each application has its own infrastructure requirements that will evolve over time
Acknowledge that each cloud service has its own domain of expertise, and leverage the native services it offers
What does multi-cloud mean?

To you, what does multi-cloud enable?
(here is one of our answers)
Examples of Use-Cases For Multi-Cloud?

▪ Content Distribution
▪ Media companies have tens of thousands of movies, which they store on a private cloud for control. When it is time to publish a movie, it makes sense to copy it to a public cloud to use its transcoding and CDN services.
▪ Compute Bursting
▪ Banks run risk analysis every night using thousands of CPUs. These intense computations only run for a few hours. Rather than leaving servers idle for the rest of the day, it makes sense to use public cloud services for the computation.
▪ Analytics
▪ E-commerce companies do more and more machine learning on their very large data lakes. Rather than setting up Hadoop infrastructure in-house, a company can copy just a data set to a Hadoop cloud, run the appropriate algorithm, get back the result, and destroy the cloud copy of the data to save on storage cost.
▪ Long-term Archival / cold storage
▪ While storing regularly accessed data is cheaper in a private cloud, long-term archival of never-accessed data is cheaper in a long-term archive cloud offering. Automatically archiving never-accessed data would save a lot of money.
Examples of Use-Cases for Multi-Cloud

The Zenko Vision
Control and Freedom for data in a Multi-Cloud world.

• Single Interface to any Cloud
▪ S3 API as a single API set to any cloud
• Allow reuse in the Cloud
▪ Maintain the native cloud format
• Always know your data and where it is
▪ Metadata search
• Trigger actions based on data
▪ Data Workflow to manage replication, location
The Zenko Multi-Cloud Data Controller
[Diagram: One S3 API for any cloud, Native format, Data Management, Data Insight]

• Allow reuse in the Cloud
▪ Maintain the native cloud format
• Full compatibility with S3
▪ IAM, policies, request syntax...
The Zenko Multi-Cloud Data Controller

What was the previous name of Zenko’s CloudServer?
(check your answer on DockerHub)
Zenko is not Scality’s first open-source project...

Open Source Scality CloudServer Adoption
▪ Launched June 2016
▪ Open-source implementation of AWS S3 API
▪ Code available on GitHub under Apache 2.0 license
▪ Packaged in Docker container for easy deployment
▪ Seamless upgrade to S3 Connector for the RING
Now Over 700,000

"Scality provides our backend storage and gives us a single interface for
developers to code within any cloud on a common API set. With Scality,
we can write an application once and deploy anywhere on any cloud.”
Mathias Herberts, co-founder and CTO at Cityzen Data
…Developers Are Seeing the Benefits
“We are big users of Docker in our production environment and implemented
the Docker version of Scality S3 Server. It is efficient, secure with encryption
and S3 authentication, and very easy to maintain.”
Christian Patry, System Engineer, BlueSolutions by Polyconseil

• Amazon S3 API is the de facto standard
▪ Extended to support multiple cloud backends
▪ Provides a full-featured S3 interface independent of the backend cloud store's abilities
• Native Cloud Data Format
▪ Data stored in the cloud must be accessible in a standard format
▪ Access via standard keys and methods, which enables use by cloud services without change
• Highly Available Cloud Service
▪ Integrated in HA services through Docker
▪ If a higher level of availability is required, extensible to the RING
CloudServer – Amazon S3 & Native Format
[Diagram: S3 API, cloud gateway, opaque data]
Many gateway products store data in a closed "black box" format
that is unreadable by native cloud services and apps in the cloud.
Gateways store "opaque data" in the cloud as a black box. Cloud
apps try to access it, but cannot due to the proprietary format.

What is the default location constraint for AWS S3?
(AWS S3 documentation answers...)
Do you know your AWS basics?

Zenko Manages Bucket Metadata Namespace
• Decoupled from the underlying data location
• Native APIs used to store data
Control of Buckets location
• Provides the default storage location for objects stored (PUT) into that Bucket
• Buckets can be managed across multiple RINGs & Public Cloud regions
• Location Mapping is managed through a configuration file
Native Cloud Data & Extended Location Control
[Diagram: Extended location control across clouds]
METADATA: Zenko Namespace
▪ PUT Bucket1, LocationConstraint: “RING-West” → Location: RING-West (REST/Sproxyd)
▪ PUT Bucket2, LocationConstraint: “S3-US-East-1” → Location: us-east-1 (S3 API)
▪ PUT Bucket3, LocationConstraint: “Azure-US” → Location: Windows.AzureStorage.US (Blob Storage API)
DATA: via native drivers/APIs
Innovation: Extended Location Control across clouds
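To make the location mechanism more concrete, here is a minimal Python sketch of the request body an S3 client sends with a PUT Bucket call to pick a location. The `CreateBucketConfiguration` / `LocationConstraint` XML is the standard S3 form; the `LOCATIONS` table and the helper names are hypothetical and only mirror the three locations in the diagram. They are not Zenko's actual configuration file format.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"

# Hypothetical location map mirroring the slide's diagram.
# This is NOT Zenko's real configuration schema.
LOCATIONS = {
    "RING-West": {"api": "REST/Sproxyd", "location": "RING-West"},
    "S3-US-East-1": {"api": "S3 API", "location": "us-east-1"},
    "Azure-US": {"api": "Blob Storage API", "location": "Windows.AzureStorage.US"},
}


def create_bucket_body(location_constraint: str) -> bytes:
    """Build the XML body of a PUT Bucket request for a known location."""
    if location_constraint not in LOCATIONS:
        raise ValueError(f"unknown location constraint: {location_constraint}")
    root = ET.Element("CreateBucketConfiguration", xmlns=S3_NS)
    ET.SubElement(root, "LocationConstraint").text = location_constraint
    return ET.tostring(root)


def resolve_backend(location_constraint: str) -> dict:
    """Return the (hypothetical) backend that stores this bucket's data."""
    return LOCATIONS[location_constraint]


if __name__ == "__main__":
    print(create_bucket_body("Azure-US").decode())
    print(resolve_backend("Azure-US"))
```

The point of the sketch is that the client-side call stays a plain S3 PUT Bucket; only the constraint name changes between a RING, an AWS region, and Azure.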

• Replication supports copy of objects across RINGs
▪ Externalized via the AWS S3 API for Cross-Region Replication (CRR)
▪ Configure replication on a source Bucket and assign the Target Bucket in any cloud
▪ Objects are asynchronously replicated to target – with native key/data format
Backbeat - Policy-Based Replication Across Clouds
[Diagram: Zenko Namespace (RING1, OBJ1, PUT) → Backbeat (Async Replication) via S3 BUCKET CRR API → AWS S3 Namespace (S3-us-east-1, OBJ1), next to Amazon CloudFront, Amazon EC2, Amazon EMR]
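Since replication is exposed through the AWS S3 Cross-Region Replication API, a replication rule could be sketched as below. The dictionary follows the shape of the AWS replication configuration (the one accepted, for instance, by boto3's `put_bucket_replication`). The role ARN, the bucket name and the helper name are placeholders, and how Zenko maps a target bucket onto another cloud is not shown in the slides, so treat this as an assumption-laden illustration.

```python
def replication_config(role_arn: str, target_bucket: str, prefix: str = "") -> dict:
    """Build a CRR-style replication configuration for one target bucket.

    All argument values are placeholders chosen for illustration.
    """
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": f"replicate-to-{target_bucket}",
                "Prefix": prefix,      # empty prefix: replicate every object
                "Status": "Enabled",
                "Destination": {"Bucket": f"arn:aws:s3:::{target_bucket}"},
            }
        ],
    }


if __name__ == "__main__":
    config = replication_config(
        role_arn="arn:aws:iam::123456789012:role/replication-example",
        target_bucket="target-bucket-example",
    )
    print(config)
```

On AWS itself, versioning must be enabled on the source and target buckets before such a configuration is accepted.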

Backbeat Architecture (Details)

Clueso - Metadata Search
Federated Search on Metadata
• Search across cloud namespaces independent of location
• Applications can attach extended metadata as key/value pairs through optional S3 “x-amz-meta-” headers
• Search on one or multiple attributes, fuzzy searches
Programmatic Access to Search
• Accessed through RESTful API with attribute parameters
• Natural fit with S3 semantics, e.g.
GET /bucketName?search&attributeKey=’attributeValue’
Clueso Search Engine
Examples:
“SELECT key where ContentType = ‘PDF’”
“SELECT key where Title=‘Mathematics%’”
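As a rough illustration of the two client-side pieces above, the Python sketch below turns a dictionary into `x-amz-meta-` headers and builds a search path in the `GET /bucketName?search&attributeKey='attributeValue'` shape shown on the slide. The function names are hypothetical, and the exact query syntax Clueso accepts is assumed from the slide only.

```python
from urllib.parse import quote


def user_metadata_headers(metadata: dict) -> dict:
    """Map user metadata to S3 'x-amz-meta-' request headers."""
    return {f"x-amz-meta-{key.lower()}": str(value) for key, value in metadata.items()}


def search_path(bucket: str, **attributes: str) -> str:
    """Build a search path in the shape shown on the slide (assumed syntax)."""
    params = "&".join(
        # Values are wrapped in single quotes as on the slide; other
        # reserved characters (such as '%') are percent-encoded.
        f"{quote(key, safe='')}={quote(repr_value(value), safe=chr(39))}"
        for key, value in attributes.items()
    )
    return f"/{quote(bucket)}?search&{params}"


def repr_value(value: str) -> str:
    """Wrap an attribute value in single quotes."""
    return f"'{value}'"


if __name__ == "__main__":
    print(user_metadata_headers({"Title": "Mathematics 101"}))
    print(search_path("docs", ContentType="PDF"))
```

A PUT Object request carrying these headers stores the attributes with the object, which is what makes them searchable later.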

• Goals are to allow users to retrieve keys:
▪ By user metadata (x-amz-meta headers and tags) which are not predefined
▪ By object owner
▪ Created before or after some date, or within a range of dates
▪ Using multiple attributes combined with AND and OR conditions
▪ Fuzzy search on attribute values, such as partial strings and wildcards, as with SQL “LIKE” statements
▪ Programmatic access to search functionality via an API
Clueso - Overview and Goals
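The search goals listed above can be tried out with plain SQL. The sketch below uses Python's built-in `sqlite3` as a stand-in engine (Clueso itself is described as Spark-based, so this is not its implementation) to show owner, date, AND/OR and `LIKE` conditions on object metadata. The table layout, column names and sample objects are invented for the example, and the fixed columns are a simplification of non-predefined user metadata.

```python
import sqlite3

# Invented sample metadata; not taken from the presentation.
SAMPLE_OBJECTS = [
    {"key": "a.pdf", "owner": "juliet", "content_type": "PDF",
     "title": "Mathematics I", "created": "2017-06-01"},
    {"key": "b.pdf", "owner": "romeo", "content_type": "PDF",
     "title": "Physics", "created": "2017-07-15"},
    {"key": "c.txt", "owner": "juliet", "content_type": "TXT",
     "title": "Mathematics notes", "created": "2017-08-01"},
]


def build_index(objects):
    """Load object metadata into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata "
        "(key TEXT, owner TEXT, content_type TEXT, title TEXT, created TEXT)"
    )
    conn.executemany(
        "INSERT INTO metadata VALUES "
        "(:key, :owner, :content_type, :title, :created)",
        objects,
    )
    return conn


def search(conn, where, params=()):
    """Return matching keys; 'where' is a trusted SQL fragment for this demo."""
    rows = conn.execute(
        f"SELECT key FROM metadata WHERE {where} ORDER BY key", params
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    index = build_index(SAMPLE_OBJECTS)
    print(search(index, "title LIKE ? AND content_type = ?", ("Mathematics%", "PDF")))
    print(search(index, "owner = ? OR created >= ?", ("romeo", "2017-08-01")))
```

ISO-formatted dates are stored as text here so that "created before or after" reduces to ordinary string comparison.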

Why Spark?
• Distributed Search by design
• SQL Semantics
▪ Not a separate database
▪ No need to pre-index (Juliet can create new attributes on the fly)
▪ No need to store separate indexes
• Fast Processing - 100x faster than Hadoop
• Flexible Engine
▪ Can use same Spark cluster to do Athena-like search on the actual data
• Largest Open Source community in big data - over 1000 contributors
▪ Ecosystem of contributors and companies

Open Source Code on GitHub under Apache 2.0 License

Zenko Open Source: Features & Capabilities
[Diagram labels: APP, S3 CALLS, S3 API, METADATA, DATA, CLUESO Metadata Search, BACKBEAT Data Policy Engine, Bucket LOCATION, CRR; DATA STORAGE: DMD, REST/Sproxyd, AWS S3 API, AZURE BLOB API, Shared Local Storage]
Zenko Open Source
S3 API: Single API set and 360° access to any cloud
Native format: Data written through Zenko is stored in the native format of the target cloud storage and can be read directly, without going through Zenko.
Project Backbeat for data workflow: Policy-based data management engine
Project Clueso for metadata search: Apache Spark-based metadata search tool for optimal data insight
HA/Failover: Deployed as dual containers managed by Docker Swarm for HA, but not full scale-out
Simple Security: single-tenant credentials managed locally

Data Storage Back-ends
- Existing interest in integration of NAS filers
- Other public clouds: Oracle, Backblaze and OpenStack based
Clueso Search Plugins
- GDPR discovery (find data that is not compliant)
- Data analytics
- eDiscovery for legal documents
Backbeat plugins for Data Management & Mobility
- Migration
- Compliance
Ecosystem Extensions: Community & Partner Driven

Community Meetups
• Initiated prior to our S3 Server launch
• Participating at open source events
for Docker, Node.js, etc.
Developer “Hackathons”
• Paris and San Francisco in 2015 & 2016
• Co-sponsoring with partners, focused on a specific project goal (e.g., IP Drives, S3 API)
• Great for building visibility & community participation
Next hackathon to develop creative extensions to Zenko!
• 42 Silicon Valley (free coding university)
• August 14-18 in Fremont, CA: https://www.zenko.io/hackathon
Building a Developer community

Zenko Installation & Portal Demo

• Open Source Community Edition – July 11th
▪ Available through GitHub and Docker
▪ Common API through S3 API
▪ Dual Server HA configuration (non scale-out)
▪ Backend store as volumes, AWS S3
• Open Source Community Edition – September
▪ Clueso Metadata Search
▪ Backbeat Data Workflow
▪ and MS Azure
• Enterprise Edition (EE) – target beginning 2018
▪ Scale-out solution
▪ Enterprise support
[Diagram labels: S3, Search Engine, File, Management UI, Search, BACKBEAT Data Management Engine, Location Control]
Release Plans

Zenko EE: Enterprise Security, File & Scale-Out
[Diagram labels: APP, Legacy App, Enterprise Apps, S3 CALLS, NFS/SMB, S3 API (Scale Out), Identity & Access Management (IAM): SAML 2.0/SSO with AD/LDAP, CLUESO Metadata Search, BACKBEAT Data Policy Engine, METADATA: HA/Consistency Cluster, CRR, LOCATION; DATA STORAGE: DMD, REST/Sproxyd, AWS S3 API, AZURE BLOB API, Google CS API, Shared Local Storage]
Zenko Enterprise Edition
Multi-tenancy & Enterprise Security: Full IAM support of Multi Accounts, Users, Groups, Policies & Single-Sign On (SSO) to AD & LDAP security servers
Scale-Out: N-Way scale-out to any number of servers to deliver capacity AND performance for massive workloads; leverages the Metadata engine cluster from S3 Connector
File & S3 Shared Access: bi-directional file & object sharing with NFS v4/v3 & SMB for legacy apps
Enables full Scale-Out for all key Zenko Services:
• Native Cloud Storage: Support for multiple public clouds and Scality RING in native data format
• Backbeat for data workflow: Policy-based data management engine
• Clueso for metadata search: Apache Spark-based metadata search tool for optimal data insight

Getting involved with Zenko

How can I get involved with Zenko?
• Let us know what you do with the Zenko stack!
▪ zenko@scality.com
▪ Get your project/company featured on the website in a quote
• Contribute tutorials
▪ Get a blog post featuring an introduction to your tutorial
▪ Become part of our Read the Docs-hosted documentation
• Contribute code
▪ It’s an opportunity to drive the roadmap with us!
▪ Join the team and be part of the Zenko craze!
▪ We have Contributing Guidelines on the GitHub repos, and we’ll answer your questions via GitHub issues or our forum, forum.scality.com
• Meet us at Microsoft Ignite, AWS re:Invent, Meetups...
▪ All info is on www.zenko.io

Email: zenko@scality.com
Thank You
