Zenko Webinar: Enabling Data Control in a Multi-Cloud World
Thursday, August 3, 2017
Giorgio Regni, Scality Co-founder & CTO
Laure Vergeron, Software Engineer
We’ll shed some light on a few questions:
• What does multicloud mean?
• How to leverage the efficiency of both public and private clouds?
• How can this multi-cloud data controller do search and discovery across
clouds?
• How can you get involved with Zenko?
Zenko Webinar: Multi-cloud, hybrid stores, open-source
Freedom to leverage multiple, different cloud infrastructures, private or
public
Acknowledge that each application has its own infrastructure requirements
that will evolve over time
Acknowledge that each cloud service has its own domain of expertise
and leverage the native services they offer
What does multicloud mean?
To you, what does multi-cloud enable?
(here is one of our answers)
Examples of Use-Cases For Multi-Cloud?
▪ Content Distribution
▪ Media companies have tens of thousands of movies, which they store on a private cloud for
control. When it is time to publish a movie, it makes sense to copy it to a public cloud to use
its transcoding and CDN services.
▪ Compute Bursting
▪ Banks run risk analysis on thousands of CPUs every night. These intense
computations run for only a few hours. Rather than leaving servers idle for the rest of the day, it
makes sense to use public cloud services for the computation.
▪ Analytics
▪ E-commerce companies do more and more machine learning on their very large data lakes.
Rather than setting up Hadoop infrastructure in-house, a company can copy just one data set
to a Hadoop cloud, run the appropriate algorithm, get back the result, and destroy
the cloud copy of the data to save on storage cost.
▪ Long-term Archival / cold storage
▪ While storing regularly accessed data is cheaper in a private cloud, long-term archival
of data that is never accessed is cheaper in a long-term archive cloud offering. Automatically
archiving such data would save a lot of money.
Examples of Use-Cases for Multi-Cloud
The Zenko Vision
Control and Freedom for data in a
Multi-Cloud world.
• Single Interface to any Cloud
▪ S3 API as a single API set to any cloud
• Allow reuse in the Cloud
▪ Maintain the native cloud format
• Always know your data and where it is
▪ Metadata search
• Trigger actions based on data
▪ Data Workflow to manage replication, location
The Zenko Multi-Cloud Data Controller
[Figure: the four pillars of the controller: One S3 API for any cloud, Native format, Data Management, Data Insight]
• Allow reuse in the Cloud
▪ Maintain the native cloud format
• Full compatibility with S3
▪ IAM, policies, request syntax...
The Zenko Multi-Cloud Data Controller
What was the previous name of Zenko’s CloudServer?
(check your answer on DockerHub)
Zenko is not Scality’s first open-source project...
Open Source Scality CloudServer Adoption
▪ Launched June 2016
▪ Open-source implementation of AWS S3 API
▪ Code available on GitHub under Apache 2.0 license
▪ Packaged in Docker container for easy deployment
▪ Seamless upgrade to S3 Connector for the RING
Now Over
700,000
"Scality provides our backend storage and gives us a single interface for
developers to code within any cloud on a common API set. With Scality,
we can write an application once and deploy anywhere on any cloud.”
Mathias Herberts, co-founder and CTO at Cityzen Data
…Developers Are Seeing the Benefits
“We are big users of Docker in our production environment and implemented
the Docker version of Scality S3 Server. It is efficient, secure with encryption
and S3 authentication, and very easy to maintain.”
Christian Patry, System Engineer, BlueSolutions by Polyconseil
• Amazon S3 API is the de facto standard
▪ Extended to support multiple cloud backends
▪ Provides a full-featured S3 interface independent of the backend cloud store's capabilities
• Native Cloud Data Format
▪ Data stored in the cloud must be accessible in its standard format
▪ Access via standard keys and methods enables use by cloud services without change
• Highly Available Cloud Service
▪ Integrated in HA services through Docker
▪ If a higher level of availability is required, extensible to the RING
CloudServer – Amazon S3 & Native Format
[Figure: an S3 API cloud gateway writing opaque data to the cloud]
Many gateway products store data in the cloud in a closed, proprietary "black box" format.
Native cloud services and apps in the cloud try to access this opaque data but cannot read it.
What is the default location constraint for AWS S3?
(AWS S3 documentation answers...)
Do you know your AWS basics?
Zenko Manages Bucket
Metadata Namespace
• Decoupled from the underlying data
location
• Native APIs used to store data
Control of Bucket location
• Provides the default storage location for
objects stored (PUT) into that Bucket
• Buckets can be managed across
multiple RINGs & Public Cloud regions
• Location Mapping is managed through
a configuration file
Native Cloud Data & Extended Location Control
[Figure: Extended location control across clouds. METADATA: Zenko Namespace. DATA: via native drivers/APIs.]
• PUT Bucket1, LocationConstraint: “RING-West” → REST/Sproxyd → Location: RING-West
• PUT Bucket2, LocationConstraint: “S3-US-East-1” → S3 API → Location: us-east-1
• PUT Bucket3, LocationConstraint: “Azure-US” → Blob Storage API → Location: Windows.AzureStorage.US
Innovation: Extended Location Control across clouds
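As a rough illustration of the location control described on this slide, the sketch below builds the XML body that an S3 PUT Bucket request uses to carry a LocationConstraint. Only the `CreateBucketConfiguration` and `LocationConstraint` element names come from the S3 API. The `LOCATIONS` table and its `backend` labels are hypothetical stand-ins for Zenko's location mapping file, whose real schema is not shown in this deck.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"

# Hypothetical location map mirroring the three buckets on the slide.
# The "backend" labels are illustrative only, not Zenko's real config schema.
LOCATIONS = {
    "RING-West": {"backend": "sproxyd"},
    "S3-US-East-1": {"backend": "aws_s3"},
    "Azure-US": {"backend": "azure_blob"},
}


def create_bucket_body(location: str) -> str:
    """Return the PUT Bucket XML body for a known location constraint."""
    if location not in LOCATIONS:
        raise ValueError(f"unknown location constraint: {location}")
    root = ET.Element("CreateBucketConfiguration", xmlns=S3_NS)
    ET.SubElement(root, "LocationConstraint").text = location
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for name in LOCATIONS:
        print(name, "->", create_bucket_body(name))
```

The point of the sketch is that the client speaks plain S3 in every case. Only the constraint name changes, and the controller decides which native API receives the data.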
• Replication supports copying objects across RINGs
▪ Externalized via the AWS S3 API for Cross-Region Replication (CRR)
▪ Configure replication on a source Bucket and assign the Target Bucket in any cloud
▪ Objects are asynchronously replicated to the target, with native key/data format
Backbeat - Policy-Based Replication Across Clouds
[Figure: Backbeat asynchronous replication. OBJ1 is PUT into RING1 in the Zenko Namespace and replicated by Backbeat, via the S3 BUCKET CRR API, to S3-us-east-1 in the AWS S3 Namespace, where Amazon CloudFront, Amazon EC2 and Amazon EMR can use it.]
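Since replication is exposed through the S3 Cross-Region Replication API, the request body a client sends could look like the sketch below. The element names (`ReplicationConfiguration`, `Role`, `Rule`, `Prefix`, `Status`, `Destination`, `Bucket`) are those of the S3 replication API. The role ARN and bucket name are made-up placeholders, and how Backbeat maps the target bucket to another cloud is not covered by the slide, so that part is left out.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def replication_config(role_arn: str, target_bucket: str, prefix: str = "") -> str:
    """Build a PUT Bucket replication body with a single enabled rule."""
    root = ET.Element("ReplicationConfiguration", xmlns=S3_NS)
    ET.SubElement(root, "Role").text = role_arn
    rule = ET.SubElement(root, "Rule")
    # An empty prefix means the rule applies to every object in the bucket.
    ET.SubElement(rule, "Prefix").text = prefix
    ET.SubElement(rule, "Status").text = "Enabled"
    destination = ET.SubElement(rule, "Destination")
    ET.SubElement(destination, "Bucket").text = f"arn:aws:s3:::{target_bucket}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(replication_config("arn:aws:iam::123456789012:role/replication",
                             "target-bucket"))
```

On AWS itself, versioning has to be enabled on both buckets before a replication configuration is accepted; whether the same precondition applies here is an assumption worth checking against the Zenko documentation.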
Backbeat Architecture (Details)
Clueso - Metadata Search
Federated Search on Metadata
Search across cloud namespaces independent of
location
Applications can attach extended metadata as
key/value pairs through optional S3 “x-amz-meta-”
headers
Search on one or multiple attributes, fuzzy
searches
Programmatic Access to Search
Accessed through RESTful API with attribute
parameters
Natural fit with S3 semantics, e.g.
GET /bucketName?search&attributeKey='attributeValue'
Clueso Search Engine
Examples:
"SELECT key WHERE ContentType = 'PDF'"
"SELECT key WHERE Title LIKE 'Mathematics%'"
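To make the two halves of this slide concrete, here is a minimal sketch of how a client might attach extended metadata and then form a search request. The `x-amz-meta-` prefix is the standard S3 convention for user metadata, and the query shape follows the `GET /bucketName?search&attributeKey='attributeValue'` form shown above. The function names are hypothetical, and the exact encoding rules of the search endpoint are an assumption.

```python
from typing import Dict
from urllib.parse import quote


def user_metadata_headers(metadata: Dict[str, str]) -> Dict[str, str]:
    """Turn key/value pairs into S3 user-metadata request headers."""
    return {f"x-amz-meta-{key.lower()}": value for key, value in metadata.items()}


def search_path(bucket: str, attributes: Dict[str, str]) -> str:
    """Build a request path of the form /bucket?search&key='value'."""
    parts = ["search"]
    for key, value in attributes.items():
        # Keep the single quotes literal, percent-encode everything else unsafe.
        parts.append(f"{quote(key, safe='')}={quote(repr_value(value), safe=chr(39))}")
    return f"/{quote(bucket, safe='')}?" + "&".join(parts)


def repr_value(value: str) -> str:
    """Wrap a value in single quotes, as in the slide's example."""
    return "'" + value + "'"


if __name__ == "__main__":
    print(user_metadata_headers({"Title": "Mathematics 101"}))
    print(search_path("bucketName", {"ContentType": "PDF"}))
```

Headers built this way would be sent with the object PUT, and the path would be sent as a signed GET like any other S3 request.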
• The goal is to let users retrieve keys:
▪ By user metadata (x-amz-meta headers and tags), which is not
predefined
▪ By object owner
▪ By creation date: before or after some date, or within a range of dates
▪ By multiple attributes combined with AND and OR conditions
▪ By fuzzy search on attribute values, such as partial strings and wildcards,
as with SQL “LIKE” statements
▪ Through programmatic access to search functionality via an API
Clueso - Overview and Goals
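The query semantics listed in these goals can be illustrated with a small in-memory sketch. This is not how Clueso is implemented (the deck says it runs on Apache Spark); it only shows what matching by owner, date range, AND/OR combinations and SQL-style LIKE wildcards means. All names and sample records are hypothetical.

```python
import re
from datetime import datetime
from typing import Callable, Dict, List

Record = Dict[str, object]
Predicate = Callable[[Record], bool]


def equals(attribute: str, value: object) -> Predicate:
    return lambda rec: rec.get(attribute) == value


def like(attribute: str, pattern: str) -> Predicate:
    """SQL LIKE: '%' matches any run of characters, '_' exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    compiled = re.compile(regex, re.DOTALL)

    def matches(rec: Record) -> bool:
        value = rec.get(attribute)
        return isinstance(value, str) and compiled.fullmatch(value) is not None

    return matches


def created_between(start: datetime, end: datetime) -> Predicate:
    return lambda rec: start <= rec["created"] <= end


def all_of(*predicates: Predicate) -> Predicate:
    return lambda rec: all(p(rec) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    return lambda rec: any(p(rec) for p in predicates)


def search(records: List[Record], predicate: Predicate) -> List[str]:
    """Return the keys of all records that satisfy the predicate."""
    return [str(rec["key"]) for rec in records if predicate(rec)]


# Made-up object metadata for illustration.
OBJECTS: List[Record] = [
    {"key": "algebra.pdf", "owner": "juliet", "created": datetime(2017, 7, 1),
     "x-amz-meta-title": "Mathematics 101"},
    {"key": "notes.txt", "owner": "romeo", "created": datetime(2017, 8, 1),
     "x-amz-meta-title": "Shopping list"},
]

if __name__ == "__main__":
    print(search(OBJECTS, like("x-amz-meta-title", "Mathematics%")))
```

Because the attribute names are plain strings looked up at query time, new user metadata keys can be searched without declaring them in advance, which matches the "not predefined" goal.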
Why Spark?
• Distributed Search by design
• SQL Semantics
▪ Not a separate database
▪ No need to pre-index (Juliet can create new attributes on the fly)
▪ No need to store separate indexes
• Fast Processing - up to 100x faster than Hadoop MapReduce for in-memory workloads, according to the Apache Spark project
• Flexible Engine
▪ Can use same Spark cluster to do Athena-like search on the actual data
• Largest Open Source community in big data - over 1000 contributors
▪ Ecosystem of contributors and companies
Open Source Code on Github under Apache 2.0 License
Zenko Open Source: Features & Capabilities
[Figure: Zenko Open Source architecture. Apps send S3 calls to the S3 API; metadata and data are handled separately. Data storage labels: DMD, REST/Sproxyd, AWS S3 API, Azure Blob API, shared local storage.]
Zenko Open Source
S3 API: Single API set and 360° access to any cloud
Native format: Data written through Zenko is stored
in the native format of the target cloud storage and
can be read directly, without going through Zenko.
Project Backbeat for data workflow: Policy-based
data management engine
Project Clueso for metadata search: Apache
Spark-based metadata search tool for optimal data
insight
HA/Failover: Deployed as dual containers managed
by Docker Swarm for HA, but not full scale-out
Simple Security: single-tenant credentials managed
locally
[Figure labels: S3 API and S3 calls; metadata and data; Clueso (Metadata Search); Backbeat (Data Policy Engine); bucket location; CRR]
Data Storage Back-ends
- Existing interest in integration of NAS filers
- Other public clouds: Oracle, Backblaze and OpenStack based
Clueso Search Plugins
- GDPR discovery (find data that is not compliant)
- Data analytics
- eDiscovery for legal documents
Backbeat plugins for Data Management & Mobility
- Migration
- Compliance
Ecosystem Extensions: Community & Partner Driven
Community Meetups
• Initiated prior to our S3 Server launch
• Participating in open source events
for Docker, Node.js, etc.
Developer “Hackathons”
• Paris and San Francisco in 2015 & 2016
• Co-sponsoring with partners –
focused on a specific project goal (e.g., IP Drives, S3 API)
• Great for building visibility & community participation
Next hackathon to develop creative extensions to Zenko!
• 42 Silicon Valley (free coding university)
• August 14-18 in Fremont, CA: https://www.zenko.io/hackathon
Building a Developer community
Zenko Installation & Portal
Demo
• Open Source Community Edition – July 11th
▪ Available through GitHub and Docker
▪ Common API through S3 API
▪ Dual Server HA configuration (non scale-out)
▪ Backend stores: volumes, AWS S3
• Open Source Community Edition – September
▪ Clueso Metadata Search
▪ Backbeat Data Workflow
▪ MS Azure as a backend store
• Enterprise Edition (EE) – target beginning 2018
▪ Scale-out solution
▪ Enterprise support
[Figure labels: S3; Search Engine; File; Management UI; Search; Backbeat Data Management Engine; Location Control]
Release Plans
Zenko EE: Enterprise Security, File & Scale-Out
[Figure: Zenko Enterprise Edition architecture. Apps send S3 calls to the S3 API; metadata is held in an HA/Consistency Cluster; Clueso provides Metadata Search. Data storage labels: DMD, REST/Sproxyd, AWS S3 API, Azure Blob API, shared local storage.]
Zenko Enterprise Edition
Multi-tenancy & Enterprise Security: Full IAM support for
multiple accounts, users, groups, policies & Single Sign-On
(SSO) to AD & LDAP security servers
Scale-Out: N-way scale-out to any number of servers to
deliver capacity AND performance for massive workloads;
leverages the metadata engine cluster from S3 Connector
File & S3 Shared Access: bi-directional file & object
sharing with NFS v4/v3 & SMB for legacy apps
Enables full scale-out for all key Zenko services:
• Native Cloud Storage: Support for multiple public
clouds and Scality RING in native data format
• Backbeat for data workflow: Policy-based data
management engine
• Clueso for metadata search: Apache Spark-based
metadata search tool for optimal data insight
[Figure labels: S3 API (scale out); S3 calls (scale out); NFS/SMB for legacy and enterprise apps; Google CS API; Identity & Access Management (IAM): SAML 2.0/SSO with AD/LDAP; Backbeat Data Policy Engine; CRR; metadata, data and location]
Getting involved with Zenko
How can I get involved with Zenko?
• Let us know what you do with the Zenko stack!
▪ zenko@scality.com
▪ Get your project/company featured on the website in a quote
• Contribute tutorials
▪ Get a blog post introducing your tutorial
▪ Become part of our documentation hosted on Read the Docs
• Contribute code
▪ It’s an opportunity to drive the roadmap with us!
▪ Join the team and be part of the Zenko craze!
▪ We have Contributing Guidelines on the GitHub repos, and we’ll answer your
questions via GitHub issues or our forum at forum.scality.com
• Meet us at Microsoft Ignite, AWS re:Invent, Meetups...
▪ All info is on www.zenko.io
Email: zenko@scality.com
Thank You