TECHNOLOGY IN BRIEF




                                      THE OBJECT EVOLUTION
           EMC OBJECT-BASED STORAGE FOR ACTIVE ARCHIVING AND
                       APPLICATION DEVELOPMENT
                                               NOVEMBER 2012

 A few years ago, object-based storage made a huge splash on-premise with the promise of
 meaningful data relationships, information accessibility and strong compliance. It remains an
 important component for information management based on compliance and single-tenant
 architectures. However, the evolution of object-based storage has big implications for the cloud
 and unstructured data: new approaches to active archiving, web/mobile application development
 and a changing model for cloud storage service providers.
 Object storage is optimal for the web. It has a very different architecture from file systems, which
 are frankly overkill for most cloud storage. On-premise can be a different story; having data close to
 hand under single-tenant access control is right for some data storage. But on-premise stored data
 requires that the enterprise maintain a primary data center, a cold data center for DR, replication,
 continuous data protection, and so on. Given the right set of needs this is a fine trade-off of course
 and we certainly do not counsel people to get rid of their internal data centers and redundant
 systems.
 However, cloud-based object architecture offers big benefits for storing unstructured data for
 active archiving, global access to data, fast application development and much lower cost compared
 to the high computing and data protection costs of on-premise NAS. EMC has engineered Atmos to
 provide these capabilities and many more as a massively scalable, distributed cloud-based system.
 In this Technology in Brief we will examine the fast-changing world of archiving and development
 on the web, and how object-based storage is the best way to go for these monumental tasks.

 When Object Trumps File
 The go-to architecture for unstructured data has traditionally been an application-centric system
 containing the operating system, the application, and a NAS filer using hierarchical file architecture.
 This infrastructure works acceptably well in a slow-growth, consistent workload setting; although
 even then it is far too easy to add complexity along with additional systems and filers.
 However, business needs have evolved far beyond this sleepy storage model. Unstructured data
 now comprises a massive portion of large data growth, and hierarchical file systems are difficult to
 optimize and scale. For example, file system-based storage requires near-constant provisioning. As
 storage requests grow (which they inevitably do), IT administrators must manually provision
 storage to meet the expanded requirements. Meanwhile, large volume and spiky workloads make
 provisioning both “up” and “down” an expensive and time-consuming proposition.
 And difficult provisioning is hardly the only problem: siloed data protection with individual backup,
 replication and archiving applications steadily raises OPEX. Scaling is an issue as well. Large critical
 big data applications may warrant scale-out or scale-up file systems (which are challenges in and of


Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.                                      1 of 12
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557   www.tanejagroup.com



themselves). Most applications do not merit this architecture, and instead reside on poorly scalable
systems. The number of these systems grows as applications come online, making it even harder for
IT and application owners to administer them and for users to get the value they need from the
application. This already difficult scenario gets even worse when NAS storage is used for what is
essentially a cloud use case, such as extending existing assets over the cloud.




                                      Figure: Traditional NAS infrastructure

In contrast to hierarchical file system-based storage silos, object-based storage opens up a whole
new range of dynamic functionality. Object-based storage assigns unique object IDs to access data
across all federated locations. This goes a long way towards eliminating traditional, time-
consuming storage management tasks like LUN creation and RAID groups. Active archives and
applications needing fast global access particularly benefit from global namespaces and location
transparency. The flat, universal namespace allows global access to stored content from anywhere
the distributed application runs. Applications can also efficiently associate metadata with stored
objects without using a dedicated database. Sharing vast storage resources means application
administrators do not need to modify application files. Object-based storage usually has elements of
file systems in order to handle processes like file archiving, but it is not founded on that
architecture and its drawbacks.
Object-based storage originally developed as a type of specialized NAS storage where the
hierarchical system was replaced with an object-oriented system that made file storage far more
secure and scalable. One of its most popular incarnations is still going strong today: Content-
Addressable Storage (CAS). A subset of object-oriented storage, CAS ensures there is only one ID for
any object. When the CAS object is retrieved, it can be hashed again and checked against its ID to
verify identity. CAS de-dupes at the object level for copy control.
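
The content-addressing behavior described above can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 as the hash function (the choice of algorithm is our assumption, not a statement about any CAS product); the `ContentStore` class and its method names are hypothetical.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store (illustrative only)."""

    def __init__(self):
        self._objects = {}  # content address -> bytes

    def put(self, data: bytes) -> str:
        # The object ID is derived from the content itself, so identical
        # content always maps to the same ID and is stored only once.
        address = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Re-hash on retrieval and compare against the ID to verify identity.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._objects)


cas = ContentStore()
first = cas.put(b"quarterly report")
second = cas.put(b"quarterly report")  # duplicate content, same address
```

Storing the same bytes twice returns the same ID and leaves a single stored copy, which is the object-level de-duplication the text refers to.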




Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.                                       2 of 12
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557    www.tanejagroup.com
Technology in Brief



TABLE: CONTRASTING FILE SYSTEMS WITH OBJECT STORAGE
Characteristic: Metadata
  File:   File systems implement a centralized file-layer metadata service that tracks directory
          structures, permissions, and on-disk locations of files. All file requests must access
          metadata first for permission and file information.
  Object: Object metadata is stored along with the object data to avoid metadata service
          bottlenecks. The object ID may also be used to uniquely verify and validate the data
          being stored.

Characteristic: Namespace
  File:   File systems have built-in namespace constraints for the files and directories they can
          store and manage. Hierarchical directory structures can become unwieldy, performing
          poorly when navigating large numbers of users or files.
  Object: Object storage provides a single flat namespace for objects. Replacing paths and
          filenames with object identifiers makes the address space practically infinite, with
          very fast performance for users and applications.

Characteristic: Interaction
  File:   File systems are designed to offer in-place editing and updating of files using
          sophisticated, yet highly complex, locking and synchronization mechanisms. These
          methods make it difficult to distribute or extend file systems across multiple locations.
  Object: Objects are inherently immutable once stored under a unique ID, and can be easily
          replicated and accessed globally. Programming for object storage leads to simpler,
          more supportable, and more reliable programs.

Characteristic: Cloud Applications
  File:   File systems present a real challenge for cloud-based archival management and mobile
          application delivery. Poor scalability, lagging performance, and complex application
          development make traditional file systems a poor choice for compelling new cloud usages.
  Object: Object stores are simple, clean and quick to access. Since objects are easily
          distributed, replicated, and globally accessible in the cloud, they are ideal for active
          global archives and distributed mobile applications.


Object-based storage both on-premise and in the cloud requires certain key capabilities. On-premise
object storage has great benefits for local file storage, including multiple application access, massive
scaling, high availability and, in some architectures, information governance.
   Multiple application access. Applications simultaneously leverage the same centralized
    object-based storage infrastructure. This enables local object-based storage to execute
    application-specific archiving management attributes for a complete chain of information
    custody.
   Massive scaling. Massive scaling is problematic with file-based archive solutions. As the file
    system reaches its maximum capacity, administrators must expand the entire system’s
    operating system, file system and application in order to scale the archive. By contrast,
    object-based storage can expand in an open fashion into multiple petabytes thanks to its flat
    address space.
   High availability. Object storage often archives data that has heavy retention and government
    requirements. In this environment, 5 9’s or higher availability (99.999%) is a necessity.
    Mirroring and parity help to protect availability; other beneficial features include self-healing,
    detecting and fixing soft corruptions in the background, and addressing hardware failures
    before they impact data availability.
   Information governance. A subset of object-based storage, Content-Addressable Storage
    (CAS) is purpose-built for long-term defensible retention of fixed files and data. As opposed to
    other archival storage methods like tape or monolithic “tar” files that bundle data up and/or
    move it offline, CAS stores data as objects that can be strictly and individually managed for
    governance and compliance and yet remain actively accessible on-line.
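
To put the availability figure in concrete terms, the arithmetic below converts an availability percentage into a yearly downtime budget. It is a simple sketch; the function name is ours, and the calculation assumes an average year of 365.25 days.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap days


def max_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {max_downtime_minutes(target):.2f} minutes/year")
```

At 99.999% the budget works out to roughly five and a quarter minutes of unavailability per year, which is why background self-healing matters more than manual repair at this service level.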

Best Practices: Object and the Cloud
We strongly support on-premise object storage such as CAS for local space savings, performance
and information governance. However, we find that object storage is roaring to life in the cloud,
where cloud-based active archiving and application development require highly distributed and
single namespace storage for unstructured content. These critical usage cases benefit far more from
object-based storage than they do from traditional file systems. Let’s look at best-practice
architectural features for object-based storage in the cloud.

DATA AND METADATA
When data is stored as an object, a unique object identifier is created out of a single universal global
namespace. The object ID is retained by the client application and used to subsequently retrieve
that object. Objects can effectively live anywhere in the cloud-wide system without the storage
client needing to know about actual data locations, file system structures or LUN details. This
provides complete location transparency, which reduces hands-on storage management
and inherently supports globally distributed access by web and mobile applications.
Because of the location transparency provided by the object storage layer, objects can be
automatically load-balanced across nodes, and replicated within and across sites without
disrupting applications or users. Wide data distribution and federation can be managed through
systematic policies to meet various service level goals for access, high availability, protection, cost
and performance.
The object layer abstraction also provides a great benefit to applications that previously might have
had to be intimately storage aware to avoid running out of space or had to otherwise actively
manage data locations. Because applications written to leverage object storage don’t have to embed
rules or code specific knowledge of storage infrastructure details, they avoid having to be re-
written or re-architected for “changing” storage assignments as users spread, features expand, and
data sets grow.
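
Location transparency can be illustrated with a toy object layer in which clients hold only object IDs. This is a sketch under our own assumptions: the `ObjectStore` class, the node names, and the least-loaded placement rule are hypothetical stand-ins, not a description of any vendor's implementation.

```python
import uuid


class ObjectStore:
    """Toy object layer: clients hold only object IDs, never locations."""

    def __init__(self, nodes):
        self._nodes = {name: {} for name in nodes}  # node name -> {object ID: bytes}
        self._locations = {}                        # object ID -> set of node names

    def put(self, data: bytes, replicas: int = 2) -> str:
        object_id = uuid.uuid4().hex
        # Place replicas on the least-loaded nodes (a stand-in for load balancing).
        targets = sorted(self._nodes, key=lambda n: len(self._nodes[n]))[:replicas]
        for node in targets:
            self._nodes[node][object_id] = data
        self._locations[object_id] = set(targets)
        return object_id

    def get(self, object_id: str) -> bytes:
        node = next(iter(self._locations[object_id]))
        return self._nodes[node][object_id]

    def migrate(self, object_id: str, source: str, target: str) -> None:
        # Moving data changes the internal location map, not the object ID.
        self._nodes[target][object_id] = self._nodes[source].pop(object_id)
        self._locations[object_id].discard(source)
        self._locations[object_id].add(target)

    def locations(self, object_id: str) -> set:
        return set(self._locations[object_id])


sites = ["boston", "london", "tokyo"]
object_layer = ObjectStore(sites)
oid = object_layer.put(b"scan-0001", replicas=2)
source = sorted(object_layer.locations(oid))[0]
target = (set(sites) - object_layer.locations(oid)).pop()
object_layer.migrate(oid, source, target)
```

After the migration the application still retrieves the object with the same ID, which is why it does not need to be rewritten when storage assignments change.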

MULTI-TENANCY
Secure multi-tenancy is a key requirement of cloud object storage, which should support two levels
of multi-tenancy: tenants and sub-tenants. Tenants are top-level entities that each have their own
access points, security controls and master storage policies. Tenants share nothing with other
tenants and are fully isolated. Every node gets assigned to a specific tenant; tenants do not share
nodes and therefore each tenant has its own dedicated access points and storage. Within a large
company, a tenant could be set up for independently managed divisions or subsidiaries. In a service
provider implementation, the tenant might be mapped to a broad storage service offering.


Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.                                       4 of 12
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557    www.tanejagroup.com
Technology in Brief



Sub-tenants are then created within each tenant with security controls and defined management
policies assigned by the tenant. Each sub-tenancy defines a distinct storage environment with
isolated management for its own users, object namespace, and defined shares. A sub-tenant within
a company might correspond to a department, while a storage provider's sub-tenant might track to
a specific client account.
This highly functional multi-tenancy capability makes it easy to create private sandboxes or
implement a global content delivery scheme. With some planning, this scheme could enable large
corporations to facilitate aggregating “big data” distributed across the enterprise.
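
The two-level tenancy model can be sketched as a small data structure. The class and field names below are hypothetical; the sketch only encodes the two rules stated above, namely that tenants never share nodes and that each sub-tenant has its own object namespace.

```python
from dataclasses import dataclass, field


@dataclass
class SubTenant:
    name: str
    objects: dict = field(default_factory=dict)  # private object namespace


@dataclass
class Tenant:
    name: str
    nodes: frozenset  # nodes dedicated to this tenant
    subtenants: dict = field(default_factory=dict)

    def add_subtenant(self, name: str) -> SubTenant:
        self.subtenants[name] = SubTenant(name)
        return self.subtenants[name]


class Cloud:
    def __init__(self):
        self.tenants = {}

    def add_tenant(self, name: str, nodes) -> Tenant:
        nodes = frozenset(nodes)
        # Enforce the rule that tenants never share nodes.
        for other in self.tenants.values():
            overlap = other.nodes & nodes
            if overlap:
                raise ValueError(
                    f"nodes already assigned to {other.name}: {sorted(overlap)}")
        self.tenants[name] = Tenant(name, nodes)
        return self.tenants[name]


cloud = Cloud()
retail = cloud.add_tenant("retail-division", {"node-1", "node-2"})
marketing = retail.add_subtenant("marketing")
finance = retail.add_subtenant("finance")
marketing.objects["campaign"] = b"..."
```

An object stored by one sub-tenant is invisible to its sibling, and an attempt to create a second tenant on an already assigned node is rejected.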

ACCESS FROM ANYWHERE
As a cloud object storage service with a flat global namespace, an object can be accessed through
any site (although for performance, policies might strive to replicate objects to sites closer to where
they will be read). In addition, object storage for the cloud must present a broad range of access
methods including both web services and traditional file services.
REST (and SOAP) web services are key APIs. REST is the most common cloud storage access
method for browser and custom mobile applications. REST is an architectural style, typically used
over HTTP, that was designed for web-style remote access to “resources”. It is an ideal match for
object storage, where each object can easily be treated as a REST resource.
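
The mapping from objects to REST resources can be sketched by building (but not sending) HTTP requests with the Python standard library. The endpoint URL and path layout below are hypothetical and do not represent the API of any particular product.

```python
from urllib.request import Request

BASE_URL = "https://storage.example.com/objects"  # hypothetical endpoint


def create_request(data: bytes) -> Request:
    # POST to the collection creates a new object; the service would return its ID.
    return Request(BASE_URL, data=data, method="POST",
                   headers={"Content-Type": "application/octet-stream"})


def read_request(object_id: str) -> Request:
    # Each object is addressed directly by its ID.
    return Request(f"{BASE_URL}/{object_id}", method="GET")


def metadata_request(object_id: str) -> Request:
    # HEAD asks for the resource's headers only, without the object body.
    return Request(f"{BASE_URL}/{object_id}", method="HEAD")


def delete_request(object_id: str) -> Request:
    return Request(f"{BASE_URL}/{object_id}", method="DELETE")


req = read_request("4ef1a2")
```

Because every operation is a standard HTTP verb applied to a directly addressed resource, any browser or mobile client that can issue HTTP requests can work with the store.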




                             Figure: Typical cloud-based object storage deployment

POLICY DRIVEN MANAGEMENT
A key benefit of object storage is the ability to use metadata to drive automatic data management
policies. Policies should support service levels, and should be triggered when data objects are
created, objects hit certain ages, or upon metadata updates. Policies can control data protection
operations including the number, type and target locations for replicas, inherent storage features
for striping, compression and de-duplication, retention locks and automatic deletion, and shifting
objects into different policies over time.


Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.                                       5 of 12
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557    www.tanejagroup.com
Technology in Brief



The policy mechanism should be highly flexible, targeting policies to any group of objects based on
both system and user defined metadata. Policies can be used to build service levels by defining the
amount of replication, implement archive rules for compliance, and optimize capacity and
performance as items age.
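
A metadata-driven policy mechanism of this kind can be sketched as a list of selectors evaluated against each object's metadata. The `Policy` fields, policy names and metadata keys below are hypothetical examples, and first-match-wins is our assumption about evaluation order.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    selector: dict  # metadata key/value pairs an object must carry
    replicas: int
    retention_days: int

    def matches(self, metadata: dict) -> bool:
        return all(metadata.get(k) == v for k, v in self.selector.items())


def select_policy(metadata: dict, policies: list, default: Policy) -> Policy:
    """Return the first policy whose selector matches, else the default."""
    for policy in policies:
        if policy.matches(metadata):
            return policy
    return default


DEFAULT = Policy("standard", {}, replicas=2, retention_days=0)
POLICIES = [
    Policy("medical-archive", {"type": "dicom"}, replicas=3, retention_days=7 * 365),
    Policy("web-assets", {"type": "image", "tier": "public"}, replicas=2, retention_days=0),
]

chosen = select_policy({"type": "dicom", "site": "boston"}, POLICIES, DEFAULT)
```

In a real system a function like `select_policy` would be re-run at the trigger points named above: when an object is created, when it reaches a certain age, or when its metadata is updated.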

Primary Object Use Cases in the Cloud
Cloud-based archiving, particularly medical and file archiving, forms the primary use case for
object-based storage. Web application development is surging forward, and Archive-as-a-Service
and its providers round out the fastest-growing use cases.

PRIMARY USE CASE: ACTIVE ARCHIVES
Archived information is playing a more strategic role in workflows and business processes. On-
premise archiving is essentially static and used to reduce storage costs, improve operational
efficiency, retention and compliance, and enable the business to use archived data to make better
business decisions. Cloud-based archiving retains elements of these features but adds new dynamic
ones: instant access from any device, archive as a service and federating to private or public cloud.
Atmos provides both the static and dynamic features that massive active archives require.
   Federate to public or private clouds. Federation enables companies to treat on-premise and
    cloud object storage as a single efficient infrastructure. Companies may pool distributed storage
    assets including data, applications and policies to take full advantage of the cloud’s massive
    scalability and global access features. Federation also lowers cost and risk: application
    workloads run on cloud resources with a low execution cost, and if a cloud-based storage
    system goes down the distributed workload remains protected. Federation extends internal
    policies to cloud-based storage environments by applying existing policies and settings to
    cloud-based storage.
   Use metadata to drive business and storage decisions. We expect the use of metadata to
    expand quickly to directly feed business exploitation processes, as well as support more
    automatic and intelligent storage management decisions. A singly managed distributed system
    that maintains directly accessible object metadata yields rich support for business decisions.
    Object-based storage also enables IT to automate information lifecycle management across the
    entire distributed data store, not just by storage silo. Policies should be flexible enough to be set
    at the object, tenant or system levels, to automate archive decisions, set and manage retention,
    expiration, and disposition.
   Multi-tenancy for secure shared storage. Multiple applications can safely co-exist as separate
    tenants. Isolation by tenant protects security while enabling the sharing of system-wide
    resources and capacity. Multi-tenancy is also efficient since it is subscribed to a highly scalable
    pool of storage, which can flexibly up-scale and down-scale on demand.
   Massive scalability. Unstructured data storage is growing so fast that traditional storage
    systems are straining purchase, maintenance and management resources to the brink.
    Distributed object-based architecture yields near-limitless scale. Object also allows for
    automatic load balancing whenever new objects are stored, which protects high performance
    across the entire distributed system.
   Multi-site active/active. Multi-site active/active architecture is an important component of
    object-based storage, especially in the cloud. Cloud object storage systems span multiple sites
    and provide for multi-site direct access to objects through both synchronous and asynchronous
    replications. This model replicates between multiple storage nodes and sites, which not only
    increases distributed availability and content distribution, but also supports disaster recovery.



Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.                                       6 of 12
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557    www.tanejagroup.com
Technology in Brief



   Archive-as-a-service. The most agile and flexible way for IT to deliver archive services is with
    the cloud model of self-service portals. This model manages and meters utilization and
    bandwidth and supports third-party chargeback. Within an enterprise this flexibility and
    instant storage relieves users of the temptation of using commercial cloud services simply
    because they can get the storage they need fast – even though security might not be in place.
    This approach also enables ISVs and MSPs to extend archive requirements and offerings.
   Reduce manual tasks and provisioning across multiple archives. Cloud-based archives
    must be easy to set up and, for reliability and consistency, must not require long or deep manual
    configuration. They should also automate underlying complexities including security, audit,
    retention, performance, and capacity growth. Atmos provides these features and more,
    relieving the cloud administrator of enormous burdens. Distributed systems may be managed
    as a single entity with policies to automate hundreds of management and data protection tasks.
    Perhaps most important of all, object-based systems like Atmos offer massive
    scalability of capacity and performance thanks to their unique architecture.
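
The metering and chargeback behind an archive-as-a-service offering can be sketched as a per-sub-tenant usage ledger. The rates, the `Meter` class and the billing formula below are hypothetical; real pricing and metering granularity are set by the service owner.

```python
from collections import defaultdict

# Hypothetical rates, for illustration only.
RATE_PER_GB_STORED = 0.05
RATE_PER_GB_TRANSFERRED = 0.02


class Meter:
    def __init__(self):
        self.stored_gb = defaultdict(float)
        self.transferred_gb = defaultdict(float)

    def record_store(self, subtenant: str, gigabytes: float) -> None:
        self.stored_gb[subtenant] += gigabytes
        self.transferred_gb[subtenant] += gigabytes  # uploads consume bandwidth too

    def record_read(self, subtenant: str, gigabytes: float) -> None:
        self.transferred_gb[subtenant] += gigabytes

    def invoice(self, subtenant: str) -> float:
        return round(self.stored_gb[subtenant] * RATE_PER_GB_STORED
                     + self.transferred_gb[subtenant] * RATE_PER_GB_TRANSFERRED, 2)


meter = Meter()
meter.record_store("legal", 400)
meter.record_read("legal", 50)
```

Keeping capacity and bandwidth as separate counters lets the same ledger feed either internal showback reports or third-party chargeback.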

FAST-GROWING USE CASE: WEB AND MOBILE APPLICATION DEVELOPMENT
Web and mobile applications development using unstructured data also has driving needs that
object-based cloud storage meets. Web application development requires quick access to storage
resources, test/dev environments capable of storing multiple copies of large data sets, and the
ability to test web applications in real-time online environments. These requirements are
understandably hard to achieve using traditional file-based storage systems.
Applications written to leverage object storage won’t need to be rewritten or even taken offline as
the object storage seamlessly (or elastically) expands over time. Atmos provides the key
capabilities that web application development requires, including location transparency, self-
managing storage and REST APIs.
   Enable instant access to data from any device. Web and mobile applications are inherently
    geographically distributed, yet file systems are usually limited in both effective access points
    (location) and number of files that they can manage. Object-based storage abstracts its storage
    from physical locations, providing a secure access point in place of device-specific mount points.
    Web services APIs and file-based access allow approved users to easily access their archives
    from computers and a broad array of mobile devices. Integrated web services over REST and
    SOAP are key to this instant access. Other support components are file-based access (CIFS / NFS
    / IFS / CAS), and expanded access via ISV applications.
   Self-managing storage. In traditional development, applications have often been hard-coded
    to specific data stores through pointers to identified LUNs or file system navigation paths. In
    contrast, object storage provides a clean mapping from application to data through a simple
    REST API with an immutable unique object ID to the stored object. This goes a long way
    towards eliminating traditional, time-consuming storage management tasks like LUN creation
    and RAID groups. Cloud owners may choose to extend self-management options to customers,
    making it simple for users to grow storage capacity on demand.
   Broad API support. Cloud object storage is basically shared storage accessed through web-
    based services. Atmos’ architecture supports rapid web application development with a broad
    API set including REST and S3. The REST API leverages HTTP operations on objects that are directly
    addressed, which reduces code complexity and provides the kind of easy, automatically
    distributed, protected, persistent storage the developer needs. In addition to the REST API,
    EMC Atmos also natively supports the Amazon S3 API. This provides customers with the ability
    to simply point S3 applications to Atmos and seamlessly migrate their applications to any of the
    more than 40 Atmos powered public clouds around the globe.


Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.                                       7 of 12
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557    www.tanejagroup.com
Technology in Brief




EMC and Object-Based Storage
EMC first introduced Centera CAS for archiving in 2002. Centera offers 5 9’s data availability with
its redundant array of independent nodes (RAIN), which is interconnected via cube switches and protects
data across independent nodes in a cube. Mirroring and parity provide additional protection and
availability.
Centera’s CAS architecture keeps the retained data from being compromised or deleted before the
end of its retention period. Centera assigns unique hash-code identifiers specific to each unique
object including content elements, metadata, and data/metadata relationships. This inextricably
links content elements with their metadata, which are stored within a flat address space – no need
for a separate database. This architecture ensures authenticity of the archived objects. Centera
abstracts the unique objects from their generating applications and operating systems, which
enables Centera to flexibly act as the single, highly optimized data store for previously siloed
archives.
Centera retains single instances of archived objects. In the case of multiple users of the same file –
such as a PowerPoint file sent over a distribution list – Centera retains metadata with information
about each user’s interaction with the file, but points to the single instance of the object. By cutting
down on data copies, this results in dramatic reductions in the quantity of archive storage.
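
Single-instance retention can be sketched as one stored copy plus a list of per-user reference records. The class below is a hypothetical illustration that uses SHA-256 as an assumed content hash; it is not a description of Centera internals.

```python
import hashlib
from collections import defaultdict


class SingleInstanceArchive:
    def __init__(self):
        self._blobs = {}                      # content address -> bytes
        self._references = defaultdict(list)  # content address -> per-user records

    def archive(self, user: str, filename: str, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(address, data)  # content is kept only once
        # Each user's interaction is recorded as metadata pointing at that copy.
        self._references[address].append({"user": user, "filename": filename})
        return address

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())

    def references(self, address: str) -> list:
        return list(self._references[address])


archive = SingleInstanceArchive()
deck = b"x" * 1_000_000  # stand-in for a 1 MB presentation
for recipient in ("ana", "ben", "chi"):
    address = archive.archive(recipient, "roadmap.pptx", deck)
```

Three recipients produce three metadata records but only one megabyte of stored content, which is where the reduction in archive capacity comes from.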
Centera searches using metadata, rather than opening up the content objects on application-specific
storage. This results in much faster and more efficient searches without using application cycles.
This is possible because content and metadata stored on Centera are application, file and operating
system independent, and Centera offers a search engine right in its repository.
Centera’s content-based addressing integrates directly with application environments via APIs,
with no need for kernel level dependencies. This means that multiple applications can
simultaneously use Centera, and that specific archiving management attributes – such as data aging
and data protection – can be executed per application. These capabilities create a complete chain of
custody once the data leaves the primary application to be archived on Centera. Media
independence also leverages Centera’s application support. Centera objects are independent of
specific storage media and protocols, which means that the storage system can migrate to new
storage media over time without disturbing the integrity of the archived objects. For long term
disk-based archiving, this represents significant risk mitigation and investment protection.
Centera architecture is highly scalable and self-managing. Traditional file systems scale based on
the amount of stored data versus remaining available address space – which may not be much. As
the file system reaches its maximum capacity, administrators must expand the entire file system
including operating system, file system, and application in order to scale the archive. In contrast,
Centera expands to petabyte-scale capacities thanks to its flat address space. It also leverages its
architecture to distribute management controls across the entire archive infrastructure. For
example, if a Centera disk or node fails, the archive cluster knows how to self-heal without manual
intervention. This distributed management structure extends to cover the deployment, scaling,
recovery and protection of all the archival objects being stored by Centera.
Centera optimizes archiving, information governance and compliance. Users may choose from 300
native, integrated archiving applications to manage archival needs for email, files, medical imaging,
content management, video, voice, and more on the single Centera archiving platform. In addition,
Centera offers Compliance Edition Plus for compliance and eDiscovery, and Governance Edition for
data retention management.



Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.                                       8 of 12
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557    www.tanejagroup.com
Technology in Brief



Centera Compliance Edition Plus captures and preserves original content, protecting data and
proving chain of custody for legal eDiscovery and litigation. Retention classes assign a logical
reference to each electronic record object; policies enforce data retention and safe disposition.
Centera Governance Edition enforces internal policies for data retention and disposition. Policies
may be organizational or application-specific, which improves corporate accountability, reduces the
cost of eDiscovery and compliance, and proves the integrity of governance controls.
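
Retention enforcement through retention classes can be sketched as follows. The class names, periods and the `RetainedArchive` interface are hypothetical; the sketch shows only the general idea of refusing deletion until a record's retention period has elapsed.

```python
from datetime import date, timedelta


class RetentionError(Exception):
    pass


class RetainedArchive:
    def __init__(self, retention_classes: dict):
        self._classes = retention_classes  # class name -> retention period in days
        self._records = {}                 # record ID -> (stored_on, class name)

    def store(self, record_id: str, retention_class: str, stored_on: date) -> None:
        # Records reference a class by name rather than carrying their own period.
        self._records[record_id] = (stored_on, retention_class)

    def expires_on(self, record_id: str) -> date:
        stored_on, retention_class = self._records[record_id]
        return stored_on + timedelta(days=self._classes[retention_class])

    def delete(self, record_id: str, today: date) -> None:
        if today < self.expires_on(record_id):
            raise RetentionError(
                f"{record_id} is under retention until {self.expires_on(record_id)}")
        del self._records[record_id]

    def __contains__(self, record_id: str) -> bool:
        return record_id in self._records


vault = RetainedArchive({"email": 365, "medical-image": 7 * 365})
vault.store("msg-1", "email", stored_on=date(2012, 1, 1))
```

Because each record points to a class by name, adjusting the period of a class changes the effective retention of every record that references it, without touching the records themselves.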

To the Cloud: Atmos Architecture
EMC’s Atmos supports the same CAS API as Centera for seamless migration, and brings object
storage into the cloud with massive scalability and geographic federation supported with multi-
tenancy, cloud provisioning and global access features. While Atmos is readily leveraged to extend
active global archives, it also offers an exceptional platform for web and mobile application
development. Atmos even enables new opportunities for global “big” data aggregation and
distribution.
Atmos is at heart a software storage system for building private and public cloud storage. Atmos
implementations are available from EMC either already integrated into pre-packaged physical
building blocks or as a virtual machine solution for VMware vSphere that can leverage other EMC or
3rd party storage resources. Additionally, there is a rich ecosystem of service providers providing
Atmos as cloud Storage-as-a-Service directly. Any and all of these options can be federated together
as needed within and across a given organization.
EMC uses REST and SOAP web services, and has also implemented file services on top of Atmos to
serve underlying objects through the lens of either an NFS or CIFS file server. When NFS or CIFS
shares are defined, they are assigned to specific Atmos nodes (or dedicated pairs for HA) and utilize
the Atmos node’s inherent Linux capabilities (leveraging an Installable File System with the FUSE
extension). Layering a file system over Atmos imposes some constraints regarding universal access,
but also enables both traditional and transitional applications and file system type usage.
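
Layering file-style access over a flat object store can be pictured as a path index that resolves names to object IDs. The sketch below is a generic, hypothetical illustration and does not reflect how Atmos implements its NFS or CIFS services.

```python
import posixpath
import uuid


class NamespaceLayer:
    """Toy path-to-object-ID index layered over a flat object store."""

    def __init__(self):
        self._objects = {}  # object ID -> bytes (the flat store)
        self._paths = {}    # absolute path -> object ID

    def write(self, path: str, data: bytes) -> str:
        path = posixpath.normpath(path)
        object_id = uuid.uuid4().hex
        self._objects[object_id] = data
        self._paths[path] = object_id
        return object_id

    def read(self, path: str) -> bytes:
        return self._objects[self._paths[posixpath.normpath(path)]]

    def listdir(self, directory: str) -> list:
        directory = posixpath.normpath(directory)
        return sorted(posixpath.basename(p) for p in self._paths
                      if posixpath.dirname(p) == directory)

    def read_by_id(self, object_id: str) -> bytes:
        # The same content stays reachable through the object interface.
        return self._objects[object_id]


shares = NamespaceLayer()
chest_id = shares.write("/scans/2012/chest.dcm", b"...")
shares.write("/scans/2012/knee.dcm", b"...")
```

Traditional applications see directories and filenames, while object-aware applications can keep using IDs against the same underlying data.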
EMC Atmos Windows and Linux users can also leverage the EMC GeoDrive add-on that installs on a
single user workstation or server to provide remote virtual NFS/CIFS style access (over REST) to
Atmos object storage. GeoDrive supports local caching of files for offline use and eventual
synchronization on reconnection. One of the major benefits of GeoDrive is enabling a user to access
large amounts of protected storage from anywhere. It can also be used for the disaster recovery of
files pushed or mirrored into Atmos.
Atmos technically maintains a given piece of data as an object with associated metadata that
includes the object ID, system and user-defined metadata fields and the internal object layout
information (and parent/child information for objects saved through a file system “namespace”
interface). Applications and users can store arbitrary metadata with each object that can be
leveraged by group management policies. Policies can be created at the tenant level as a design
scheme to provide various service levels of performance, access, and data protection based on some
awareness of the multi-site architecture of the cloud implementation. They are then assigned to
subtenants, who need not be aware of the underlying implementation, to apply as target service
levels to their objects. For example, the power to explicitly enforce compression of image files (e.g.
jpegs) after a number of days would present a significant capacity optimization for a web-based
application dealing with millions of images.
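The image-compression example above can be sketched in a few lines. This is an illustration only, not the Atmos policy engine: the `StoredObject` and `CompressionPolicy` classes, the `content-type` metadata key and the 30-day threshold are all hypothetical names and values chosen for the sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class StoredObject:
    """Hypothetical stand-in for an object plus its metadata."""
    object_id: str
    created: datetime
    user_metadata: dict = field(default_factory=dict)
    compressed: bool = False


@dataclass
class CompressionPolicy:
    """Hypothetical policy: compress objects of one type after N days."""
    content_type: str
    min_age_days: int

    def applies_to(self, obj: StoredObject, now: datetime) -> bool:
        old_enough = now - obj.created >= timedelta(days=self.min_age_days)
        type_matches = obj.user_metadata.get("content-type") == self.content_type
        return old_enough and type_matches and not obj.compressed


def apply_policy(policy, objects, now):
    """Mark matching objects as compressed and return their IDs."""
    touched = []
    for obj in objects:
        if policy.applies_to(obj, now):
            obj.compressed = True  # a real system would re-encode the data here
            touched.append(obj.object_id)
    return touched


now = datetime(2012, 11, 30)
objects = [
    StoredObject("obj-1", datetime(2012, 9, 1), {"content-type": "image/jpeg"}),
    StoredObject("obj-2", datetime(2012, 11, 25), {"content-type": "image/jpeg"}),
    StoredObject("obj-3", datetime(2012, 9, 1), {"content-type": "text/plain"}),
]
policy = CompressionPolicy(content_type="image/jpeg", min_age_days=30)
compressed_ids = apply_policy(policy, objects, now)
print(compressed_ids)  # only the old JPEG qualifies
```

The point of the sketch is that the application only tags objects with metadata; the decision about when and what to compress lives in the policy, so it can change without touching application code.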
In addition to supporting compliance and retention policies, metadata can be used to drive
automated file distribution, access control and data protection activities optimizing for the
appropriate level of data resiliency, performance and availability. For most applications, thoughtful
use of user metadata can remove any need to implement a separate management tracking database
for stored objects.


Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.                                       9 of 12
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557    www.tanejagroup.com
Technology in Brief



Replication is controlled by automated policies which can mirror data objects at many points in an
object’s lifecycle both within and across multiple sites. Within a data center site, replication might,
for example, be set to happen synchronously upon ingestion, while replication between
sites might be set asynchronously and launched with an arbitrary delay to allow for data settling.
Replications can be targeted to specific locations, or abstractly sent to “other” sites as the system
decides.
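As a rough illustration of the replication choices described above, the following sketch separates copies made during ingestion from copies deferred until later. The `ReplicaRule` structure, the site labels and the 60-minute delay are assumptions made for this example and do not reflect actual Atmos policy syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaRule:
    """Hypothetical rule describing one replica of an object."""
    target_site: str        # a named site, or "other" to let the system choose
    synchronous: bool       # True: copy during ingestion; False: copy later
    delay_minutes: int = 0  # only meaningful for asynchronous copies


def plan_replication(rules):
    """Split rules into copies made at ingest and copies scheduled for later."""
    at_ingest = [r.target_site for r in rules if r.synchronous]
    deferred = sorted(
        ((r.delay_minutes, r.target_site) for r in rules if not r.synchronous)
    )
    return at_ingest, deferred


rules = [
    ReplicaRule(target_site="local", synchronous=True),
    ReplicaRule(target_site="other", synchronous=False, delay_minutes=60),
]
at_ingest, deferred = plan_replication(rules)
print("during ingestion:", at_ingest)
print("scheduled later:", deferred)
```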
For performance and availability, replicas are all active for read access (objects are inherently
immutable so there is no issue with having to manage distributed locking mechanisms). Because it
is “multi-site active/active”, any site can fulfill new object write requests when the local primary
site is unavailable.
In addition to full replication, EMC also provides an erasure coding option called GeoParity. Instead
of keeping two or more full 100% copies, “9/12” erasure coding enables storing an “expanded”
object containing only 33% additional encoded “redundant” data broken up into 12 segments. By
using erasure coding, the original data can be reconstructed dynamically from any 9 of the
segments. These segments are cleverly distributed so that the object can survive (and even be
accessed during) multiple failures. For greater protection there is also a “10/16” coding with a 60%
capacity overhead. Erasure coding does impact access performance, especially at ingestion, but
provides great fault tolerance with much lower capacity utilization. Of course, policies can be
written to convert replicated objects to erasure coded schemes as they age appropriately.
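The overhead figures quoted above follow directly from the segment counts. The short calculation below is a worked check rather than anything Atmos-specific; the function name `erasure_profile` is made up for the example, and it assumes equally sized segments.

```python
def erasure_profile(data_segments, total_segments):
    """Return (capacity overhead, segments that can be lost) for a k/n code.

    With a k/n code the object is rebuilt from any k of n equally sized
    segments, so n - k segments are redundant.
    """
    redundant = total_segments - data_segments
    overhead = redundant / data_segments
    return overhead, redundant


for k, n in [(9, 12), (10, 16)]:
    overhead, tolerated = erasure_profile(k, n)
    print(f"{k}/{n}: {overhead:.0%} overhead, tolerates loss of {tolerated} segments")

# For comparison, keeping two full copies costs 100% extra capacity
# and tolerates the loss of one copy.
```

So 9/12 coding stores 12 segments where 9 would hold the data (3/9, about 33% extra), and 10/16 stores 16 where 10 would do (6/10, or 60% extra).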
With object stores there is generally no need for low-level RAID or disk level protection and Atmos
is no exception. Upon hardware failures, replications and/or GeoParity across nodes (RAIN)
combined with built-in node auto-healing features suffice to provide the full data protection as
determined by the service level “policies” implemented for each type of data object. Atmos can
withstand the loss of any disk, node, rack, or even site.

Atmos Pre-built Hardware Configurations
EMC Atmos pre-configured hardware “appliances” consist of a rack/cabinet containing from 4 to
16 Atmos nodes in various configurations and disk capacities. Flexible configurations enable
smooth scalability, and allow for mixes of capacity and performance in and across Atmos sites. An
Atmos storage node consists of a 1GbE server front-end running the Atmos storage services,
connected to one or more SAS-attached disk array enclosures (DAEs), each containing fifteen 1-3TB
7200RPM disks. Every node runs all object storage services (the first two nodes in each site also
run the site metadata locator service that indexes which node contains which objects) supporting
tremendous horizontal system scalability.
EMC has also introduced their new Atmos G3 series for new levels of density and energy efficiency.
G3-Dense-480 is the first in the Atmos G3 series and consists of 4, 6, or 8 nodes with 480 3TB
disks in 40U.
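A quick back-of-the-envelope calculation shows what these figures imply. It covers raw capacity only, before any replication or GeoParity overhead, uses decimal terabytes, and assumes (purely for illustration) that the disks are divided evenly among the nodes.

```python
DISKS = 480
DRIVE_TB = 3
RACK_UNITS = 40
NODE_OPTIONS = (4, 6, 8)

raw_tb = DISKS * DRIVE_TB        # raw capacity of the full configuration
tb_per_u = raw_tb / RACK_UNITS   # density per rack unit
# Assumption for illustration: disks are spread evenly across nodes.
disks_per_node = {nodes: DISKS // nodes for nodes in NODE_OPTIONS}

print(f"raw capacity: {raw_tb} TB ({raw_tb / 1000:.2f} PB)")
print(f"density: {tb_per_u:.0f} TB per rack unit")
for nodes, disks in disks_per_node.items():
    print(f"{nodes} nodes -> {disks} disks per node")
```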

TABLE: ALIGNING TOP CLOUD USE CASES WITH EMC ATMOS
     Use case                          Challenge                                        Benefits

 Medical               Over 800 million medical imaging                Vendor Neutral Archive (VNA) on Atmos:
 Archiving             procedures a year require huge                  integrates with EMR/EHR and improves
                       storage scalability; collaboration and          PACs for better patient care and
                       compliance increase complexity.                 collaboration, improves data lifecycle
                                                                       management, reduces IT costs, and
                                                                       preserves HIPAA compliance.
 File Archiving        Corporate file sharing is popular with          With EMC Sync & Share, users can securely
                       employees but syncing and sharing               share Atmos files across mobile devices,
                       are hard to manage. Employees will              Linux and Windows. GeoDrive creates a
                       frequently share files anyway over              Dropbox-like service that is secure and
                       mobile devices, leaving corporations            manageable, powered by Atmos’ fast
                       accountable for risky behavior.                 performance. Atmos policies monitor
                                                                       changes to data and provide access control,
                                                                       benefitting regulated verticals like finance.
 Archive as a          Both the enterprise and storage                 The Atmos Cloud Delivery Platform enables
 Service               service providers struggle to provide           corporations and service providers to meter
                       IT services to their respective                 capacity, bandwidth, and usage across
                       customers. Provisioning,                        tenants. Provisioning is automated by
                       maintenance, and security are all               tenant, and Atmos allows tenants to safely
                       difficult issues in traditional storage         self-manage and access their own storage.
                       offerings.
 Managed               Many MSPs suffer from narrow profit             Atmos lets MSPs efficiently offer storage as
 Service               margins because of the expense of               a service and better monetize new service
 Providers             delivering storage to customers.                offerings. MSPs can monitor capacity and
                       Managing multiple tenants, manual               usage for chargeback, reduce provisioning
                       provisioning and maintaining service            costs, and replace multiple tenant manage-
                       level agreements all cut into revenue           ment systems with a single system. Dynamic
                       and make it too expensive to add                scaling, high availability and security cost-
                       new storage services.                           effectively meet service level requirements.
 Content-Rich          Traditional storage is a poor                   Atmos provides location transparency for
 Web                   environment for Web application                 global applications and a highly mobile user
 Applications          development, which needs highly                 base. The single namespace means that
                       scalable capacity for multiple large            application developers never need to recode
                       data sets, a secure environment for             pathnames and locations, and do not need
                       test/dev and application testing in             to code for limited storage environments.
                       real-time environments.                         Self-management options make it easy for
                                                                       customers to provision their own storage,
                                                                       and REST APIs reduce application
                                                                       complexity.


Taneja Group Opinion
When on-premise archive solutions smoothly integrate with federated storage, then public and
private clouds provide extensive scalability and global availability. Yet we see too many end-users
treating the cloud as just another storage tier for low value retained data. This is a huge waste of
cloud possibilities but we understand why it happens: cloud platforms with poor performance and
delivery mechanisms can make cloud-based storage more trouble than it’s worth.
But when we talk about EMC Atmos we are not talking about a low-cost storage tier, far from it. We
are describing the heart of business innovation based on highly secure and highly accessible global
data stores. EMC’s long expertise with object-based storage has kept Centera relevant and has
extended dynamic data management to the cloud with Atmos. The Atmos-fueled cloud replaces
hierarchical file storage while allowing the secure flow of information between the data center, the
distributed cloud, and global access points. Customers profit from greatly improved application and
data delivery, and the deep business value inherent in their valuable data.


Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.                                                  11 of 12
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557               www.tanejagroup.com
Technology in Brief



When a company is dealing with geographic reach and large growing volumes of rich content, then
they should look to object-based storage in the cloud. We fully support EMC in its push to scale
capacity, performance, availability and management far beyond what traditional file systems are
capable of, and more massively than ever before.


NOTICE: The information and product recommendations made by Taneja Group are based upon public
information and sources and may also include personal opinions both of Taneja Group and others, all of which we
believe to be accurate and reliable. However, as market conditions change and are not within our control, the
information and recommendations are made without warranty of any kind. All product names used and
mentioned herein are the trademarks of their respective owners. Taneja Group, Inc. assumes no responsibility or
liability for any damages whatsoever (including incidental, consequential or otherwise), caused by your use of, or
reliance upon, the information and recommendations presented herein, nor for any inadvertent errors that may
appear in this document.




Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.                                              12 of 12
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557           www.tanejagroup.com

More Related Content

PDF
Silverton cleversafe-object-based-dispersed-storage
PDF
Novell File Management Suite Use Cases
PDF
Dell - Storage 12sept2012
PDF
NAS Systems Scale Out to Meet Growing Storage Demands
PDF
Security and Compliance for Scale-Out Hadoop Data Lakes
 
PDF
Flevy.com - Feasibility Study for an eMail Archiving solution
PDF
Leveraging Swift Storage Policies using Scality RING
PDF
M.Sc. Research Proposal
Silverton cleversafe-object-based-dispersed-storage
Novell File Management Suite Use Cases
Dell - Storage 12sept2012
NAS Systems Scale Out to Meet Growing Storage Demands
Security and Compliance for Scale-Out Hadoop Data Lakes
 
Flevy.com - Feasibility Study for an eMail Archiving solution
Leveraging Swift Storage Policies using Scality RING
M.Sc. Research Proposal

What's hot (12)

PDF
gfs-sosp2003
PDF
Panzura & Scality - Cloud Storage made seamless - Cloud Expo New York City 2012
PDF
Cidr11 paper32
PPT
Digital preservation geoscinfo
PDF
Msc Proposal Presentation
PDF
Storage Virtualization: Towards an Efficient and Scalable Framework
PPTX
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides
PDF
Mcae brief
PDF
Accel Partners New Data Workshop 7-14-10
PDF
Research on big data
PPTX
Scality presentation cloud Computing Expo NY 2012 v1.0
PDF
Filename intelvmwaresolutionbrief asset4
gfs-sosp2003
Panzura & Scality - Cloud Storage made seamless - Cloud Expo New York City 2012
Cidr11 paper32
Digital preservation geoscinfo
Msc Proposal Presentation
Storage Virtualization: Towards an Efficient and Scalable Framework
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides
Mcae brief
Accel Partners New Data Workshop 7-14-10
Research on big data
Scality presentation cloud Computing Expo NY 2012 v1.0
Filename intelvmwaresolutionbrief asset4
Ad

Viewers also liked (20)

PPTX
Data Explosion in Medical Imaging
PDF
Sub formulario2
PPTX
DOC
发音要领
PPS
10 roses for_u
PPTX
PPTX
Substitutes income effect
PPTX
Finland
PPTX
Personality test
PPTX
Changes to SRAD
PPTX
It’s a Jungle Out There - Improving Communications with Your Volunteers
PPT
Tuesday marginal analysis
PPT
Wed demand consumer surplus
PPTX
Atlassian Crowd
PPT
Automatic Annotation in UniProtKB
 
PDF
2012 key financial numbers report
PDF
International Conference on Cloud and Big Data Analytics ICCBDA 2013
 
PDF
vCloud Air Network Has Arrived
 
PPS
Amarnath darshan
PPTX
20121025cafesemi
Data Explosion in Medical Imaging
Sub formulario2
发音要领
10 roses for_u
Substitutes income effect
Finland
Personality test
Changes to SRAD
It’s a Jungle Out There - Improving Communications with Your Volunteers
Tuesday marginal analysis
Wed demand consumer surplus
Atlassian Crowd
Automatic Annotation in UniProtKB
 
2012 key financial numbers report
International Conference on Cloud and Big Data Analytics ICCBDA 2013
 
vCloud Air Network Has Arrived
 
Amarnath darshan
20121025cafesemi
Ad

Similar to The Object Evolution - EMC Object-Based Storage for Active Archiving and Application Development (20)

PPTX
Survey of distributed storage system
PPTX
What is Object storage ?
PDF
GPFS Solution Brief
PDF
Dynamic Metadata Management in Semantic File Systems
PDF
Cloud Storage Adoption, Practice, and Deployment
PDF
Object-Based Storage is the Future of Unstructured Data Management
PDF
Cloud File System and Cloud Data Management Interface (CDMI)
PDF
Software Developer Conference 2012 - Paper Presentation - Cloud File Systems
PDF
A FILE STORAGE SERVICE ON A CLOUD COMPUTING ENVIRONMENT FOR DIGITAL.pdf
PDF
Novell File Management Suite Use Cases
PDF
IBM SONAS Brochure
PPTX
Storage Systems Overview of unit 1
PDF
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
PPTX
storage system, iscsi,file storage, NAS, SAS
PDF
[IC Manage] Workspace Acceleration & Network Storage Reduction
PDF
Different Storage Models in Big Data Analytics
PDF
Understanding Object-Based Storage: Efficient, Scalable, and Cost-Effective
PDF
Building a Resilient, Scalable, Storage System with OpenStack
PPTX
Advanced Operating Systems- Distributed file system and caching
ODP
Liberate Your Files with a Private Cloud Storage Solution powered by Open Source
Survey of distributed storage system
What is Object storage ?
GPFS Solution Brief
Dynamic Metadata Management in Semantic File Systems
Cloud Storage Adoption, Practice, and Deployment
Object-Based Storage is the Future of Unstructured Data Management
Cloud File System and Cloud Data Management Interface (CDMI)
Software Developer Conference 2012 - Paper Presentation - Cloud File Systems
A FILE STORAGE SERVICE ON A CLOUD COMPUTING ENVIRONMENT FOR DIGITAL.pdf
Novell File Management Suite Use Cases
IBM SONAS Brochure
Storage Systems Overview of unit 1
Maginatics @ SDC 2013: Architecting An Enterprise Storage Platform Using Obje...
storage system, iscsi,file storage, NAS, SAS
[IC Manage] Workspace Acceleration & Network Storage Reduction
Different Storage Models in Big Data Analytics
Understanding Object-Based Storage: Efficient, Scalable, and Cost-Effective
Building a Resilient, Scalable, Storage System with OpenStack
Advanced Operating Systems- Distributed file system and caching
Liberate Your Files with a Private Cloud Storage Solution powered by Open Source

More from EMC (20)

PPTX
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
 
PDF
Cloud Foundry Summit Berlin Keynote
 
PPTX
EMC GLOBAL DATA PROTECTION INDEX
 
PDF
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
 
PDF
Citrix ready-webinar-xtremio
 
PDF
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
 
PPTX
EMC with Mirantis Openstack
 
PPTX
Modern infrastructure for business data lake
 
PDF
Force Cyber Criminals to Shop Elsewhere
 
PDF
Pivotal : Moments in Container History
 
PDF
Data Lake Protection - A Technical Review
 
PDF
Mobile E-commerce: Friend or Foe
 
PDF
Virtualization Myths Infographic
 
PDF
Intelligence-Driven GRC for Security
 
PDF
The Trust Paradox: Access Management and Trust in an Insecure Age
 
PDF
EMC Technology Day - SRM University 2015
 
PDF
EMC Academic Summit 2015
 
PDF
Data Science and Big Data Analytics Book from EMC Education Services
 
PDF
Using EMC Symmetrix Storage in VMware vSphere Environments
 
PDF
Using EMC VNX storage with VMware vSphereTechBook
 
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
 
Cloud Foundry Summit Berlin Keynote
 
EMC GLOBAL DATA PROTECTION INDEX
 
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
 
Citrix ready-webinar-xtremio
 
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
 
EMC with Mirantis Openstack
 
Modern infrastructure for business data lake
 
Force Cyber Criminals to Shop Elsewhere
 
Pivotal : Moments in Container History
 
Data Lake Protection - A Technical Review
 
Mobile E-commerce: Friend or Foe
 
Virtualization Myths Infographic
 
Intelligence-Driven GRC for Security
 
The Trust Paradox: Access Management and Trust in an Insecure Age
 
EMC Technology Day - SRM University 2015
 
EMC Academic Summit 2015
 
Data Science and Big Data Analytics Book from EMC Education Services
 
Using EMC Symmetrix Storage in VMware vSphere Environments
 
Using EMC VNX storage with VMware vSphereTechBook
 

Recently uploaded (20)

PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Unlocking AI with Model Context Protocol (MCP)
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
KodekX | Application Modernization Development
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PPTX
MYSQL Presentation for SQL database connectivity
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
PPTX
Big Data Technologies - Introduction.pptx
PDF
cuic standard and advanced reporting.pdf
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
DOCX
The AUB Centre for AI in Media Proposal.docx
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
Approach and Philosophy of On baking technology
PPTX
Cloud computing and distributed systems.
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
Spectral efficient network and resource selection model in 5G networks
Unlocking AI with Model Context Protocol (MCP)
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Understanding_Digital_Forensics_Presentation.pptx
KodekX | Application Modernization Development
Per capita expenditure prediction using model stacking based on satellite ima...
MYSQL Presentation for SQL database connectivity
Encapsulation_ Review paper, used for researhc scholars
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
Big Data Technologies - Introduction.pptx
cuic standard and advanced reporting.pdf
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
The AUB Centre for AI in Media Proposal.docx
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Approach and Philosophy of On baking technology
Cloud computing and distributed systems.
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Reach Out and Touch Someone: Haptics and Empathic Computing
Advanced methodologies resolving dimensionality complications for autism neur...

The Object Evolution - EMC Object-Based Storage for Active Archiving and Application Development

  • 1. TECHNOLOGY IN BRIEF THE OBJECT EVOLUTION EMC OBJECT-BASED STORAGE FOR ACTIVE ARCHIVING AND APPLICATION DEVELOPMENT NOVEMBER 2012 A few years ago, object-based storage made a huge splash on-premise with the promise of meaningful data relationships, information accessibility and strong compliance. It remains an important component for information management based on compliance and single-tenant architectures. However, the evolution of object-based storage has big implications for the cloud and unstructured data: new approaches to active archiving, web/mobile application development and a changing model for cloud storage service providers. Object storage is optimal for the web. It has a very different architecture from file systems, which are frankly overkill for most cloud storage. On-premise can be a different story; having data close to hand under single-tenant access control is right for some data storage. But on-premise stored data requires that the enterprise maintain a primary data center, a cold data center for DR, replication, continuous data protection, and so on. Given the right set of needs this is a fine trade-off of course and we certainly do not counsel people to get rid of their internal data centers and redundant systems. However, cloud-based object architecture offers big benefits for storing unstructured data for active archiving, global access to data, fast application development and much lower cost compared to the high computing and data protection costs of on-premise NAS. EMC has engineered Atmos to provide these capabilities and many more as a massively scalable, distributed cloud-based system. In this Technology in Brief we will examine the fast-changing world of archiving and development on the web, and how object-based storage is the best way to go for these monumental tasks. 
When Object Trumps File The go-to architecture for unstructured data has traditionally been an application-centric system containing the operating system, the application, and a NAS filer using hierarchical file architecture. This infrastructure works acceptably well in a slow-growth, consistent workload setting; although even then it is far too easy to add complexity along with additional systems and filers. However, business needs have evolved far beyond this sleepy storage model. Unstructured data now comprises a massive portion of large data growth, and hierarchical file systems are difficult to optimize and scale. For example, file system-based storage requires near-constant provisioning. As storage requests grow (which they inevitably do), IT administrators must manually provision storage to meet the expanded requirements. Meanwhile, large volume and spiky workloads make provisioning both “up” and “down” an expensive and time-consuming proposition. And difficult provisioning is hardly the only problem: siloed data protection with individual backup, replication and archiving applications steadily raises OPEX. Scaling is an issue as well. Large critical big data applications may warrant scale-out or scale-up file systems (which are challenges in and of Copyright The TANEJA Group, Inc. 2012. All Rights Reserved. 1 of 12 87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557 www.tanejagroup.com
  • 2. Technology in Brief themselves). Most do not rate this architecture, and instead reside on poorly scalable systems. The number of these systems grows as applications come online, making it even harder for IT and application owners to administrate and for users to get the value from the application that they need. This already difficult scenario gets even worse when NAS storage is used for what is essentially a cloud use case, such as extending existing assets over the cloud. Figure: Traditional NAS infrastructure 3 In contrast to hierarchical file system-based storage silos, object-based storage opens up a whole new range of dynamic functionality. Object-based storage assigns unique object IDs to access data across all federated locations. This goes a long way towards eliminating traditional, time- consuming storage management tasks like LUN creation and RAID groups. Active archives and applications needing fast global access particularly benefit from global namespaces and location transparency. The flat, universal namespace allows global access to stored content from anywhere the distributed application runs. Applications can also efficiently associate metadata with stored objects without using a dedicated database. Sharing vast storage resources means application administrators do not need to modify application files. Object-based storage usually has elements of file systems in order to handle processes like file archiving, but it is not founded on that architecture and its drawbacks. Object-based storage originally developed as a type of specialized NAS storage where the hierarchical system was replaced with an object-oriented system that made file storage far more secure and scalable. One of its most popular incarnations is still going strong today: Content- Addressable Storage (CAS). A subset of object-oriented storage, CAS ensures there is only one ID for any object. 
When the CAS object is retrieved, it can be hashed again and checked against its ID to verify identity. CAS de-dupes at the object level for copy control. Copyright The TANEJA Group, Inc. 2012. All Rights Reserved. 2 of 12 87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557 www.tanejagroup.com
  • 3. Technology in Brief TABLE: CONTRASTING FILE SYSTEMS WITH OBJECT STORAGE Characteristic File Object File systems implement a Object metadata is stored along centralized file layer metadata with the object data to avoid service that tracks directory metadata service bottlenecks. This Metadata structures, permissions, and on-disk ID may be used to also uniquely locations of files. All file requests verify and validate the data being must access metadata first for stored. permission and file information. File systems have built-in Object storage provides a single namespace constraints for files and flat namespace for objects. directories they can store and Replacing path and filenames with Namespace manage. Hierarchical directory object identifiers makes the structures can become unwieldy, address space practically infinite performing poorly at navigating with very fast performance for large numbers of users or files. users and applications. File systems are designed to offer Objects are inherently immutable in-place editing and updating of once stored under a unique ID, files using sophisticated, yet highly and can be easily replicated and complex, locking and accessed globally. Programming Interaction synchronization mechanisms. These for object storage leads to simpler, methods make it difficult to supportable, and more reliable distribute or extend file systems programs. across multiple locations. File systems present a real Object stores are simple, clean and challenge for cloud-based archival quick to access. Since objects are management and mobile easily distributed, replicated, and application delivery. Poor globally accessible in the cloud, Cloud Applications scalability, lagging performance, they are ideal for active global and complex application archives and distributed mobile development make traditional file applications. systems a poor choice for compelling new cloud usages. 
Object-based storage both on-premise and in the cloud require certain key capabilities. On-premise object storage has great benefits for local file storage including multiple application access, massive scaling, high availability; and in some architectures, information governance as well.  Multiple application access. Applications simultaneously leverage the same centralized object-based storage infrastructure. This enables local object-based storage to execute application-specific archiving management attributes for a complete chain of information custody.  Massive scaling. Massive scaling is problematical with file-based archive solutions. As the file system reaches its maximum capacity, administrators must expand the entire system’s operating system, file system and application in order to scale the archive. By contrast, object- Copyright The TANEJA Group, Inc. 2012. All Rights Reserved. 3 of 12 87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557 www.tanejagroup.com
  • 4. Technology in Brief based storage can expand in an open fashion into multiple petabytes due to their flat address space.  High availability. Object storage often archives data that has heavy retention and government requirements. In this environment, 5 9’s or higher availability (99.999%) is a necessity. Mirroring and parity help to protect availability; other beneficial features include self-healing, detecting and fixing soft corruptions in the background, and addressing hardware failures before they impact data availability.  Information governance. A subset of object-based storage, Content-Addressable Storage (CAS) is purpose-built for long-term defensible retention of fixed files and data. As opposed to other archival storage methods like tape or monolithic “tar” files that bundle data up and/or move it offline, CAS stores data as objects that can be strictly and individually managed for governance and compliance and yet remain actively accessible on-line. Best Practices: Object and the Cloud We strongly support on-premise object storage such as CAS for local space savings, performance and information governance. However, we find that object storage is roaring to life in the cloud, where cloud-based active archiving and application development require highly distributed and single namespace storage for unstructured content. These critical usage cases benefit far more from object-based storage than they do from traditional file systems. Let’s look at best practices architectural features for object-based storage in the cloud. DATA AND METADATA When data is stored as an object, a unique object identifier is created out of a single universal global namespace. The object ID is retained by the client application and used to subsequently retrieve that object. Objects can effectively live anywhere in the cloud-wide system without the storage client needing to know about actual data locations, file system structures or LUN details. 
This provides complete location transparency, which reduces hands-on storage management and inherently supports globally distributed access by web and mobile applications. Because of the location transparency provided by the object storage layer, objects can be automatically load-balanced across nodes, and replicated within and across sites, without disrupting applications or users. Wide data distribution and federation can be managed through systematic policies to meet various service level goals for access, high availability, protection, cost and performance.

The object layer abstraction also provides a great benefit to applications that previously might have had to be intimately storage-aware to avoid running out of space, or had to otherwise actively manage data locations. Because applications written to leverage object storage don’t have to embed rules or code specific knowledge of storage infrastructure details, they avoid having to be rewritten or re-architected for “changing” storage assignments as users spread, features expand, and data sets grow.

MULTI-TENANCY

Secure multi-tenancy is a key requirement of cloud object storage, which should support two levels of multi-tenancy: tenants and sub-tenants. Tenants are top-level entities, each of which has its own access points, security controls and master storage policies. Tenants share nothing with other tenants and are fully isolated. Every node is assigned to a specific tenant; tenants do not share nodes, and therefore each tenant has its own dedicated access points and storage. Within a large company, a tenant could be set up for independently managed divisions or subsidiaries. In a service provider implementation, the tenant might be mapped to a broad storage service offering.
Sub-tenants are then created within each tenant, with security controls and defined management policies assigned by the tenant. Each sub-tenancy defines a distinct storage environment with isolated management for its own users, object namespace, and defined shares. A sub-tenant within a company might correspond to a department, while a storage provider’s sub-tenant might track to a specific client account. This highly functional multi-tenancy capability makes it easy to create private sandboxes or implement a global content delivery scheme. With some planning, this scheme could enable large corporations to aggregate “big data” distributed across the enterprise.

ACCESS FROM ANYWHERE

Because cloud object storage presents a flat global namespace, an object can be accessed through any site (although for performance, policies might strive to replicate objects to sites closer to where they will be read). In addition, object storage for the cloud must present a broad range of access methods, including both web services and traditional file services. REST (and SOAP) web services are key APIs. REST is the most common cloud storage access method for browser and custom mobile applications. REST, an architectural style applied over HTTP, was designed for web-style remote access to “resources”, and is an ideal match for object storage, where each object can be easily treated as a REST resource.

Figure: Typical cloud-based object storage deployment

POLICY DRIVEN MANAGEMENT

A key benefit of object storage is the ability to use metadata to drive automatic data management policies. Policies should support service levels, and should be triggered when data objects are created, when objects reach certain ages, or upon metadata updates.
Policies can control data protection operations, including the number, type and target locations of replicas; inherent storage features such as striping, compression and de-duplication; retention locks and automatic deletion; and the shifting of objects into different policies over time.
The policy mechanism should be highly flexible, targeting policies to any group of objects based on both system and user-defined metadata. Policies can be used to build service levels by defining the amount of replication, implement archive rules for compliance, and optimize capacity and performance as items age.

Primary Object Use Cases in the Cloud

Cloud-based archiving, particularly medical and file archiving, forms the primary use case for object-based storage. Web application development is surging forward, and Archive-as-a-Service and its providers round out the fastest-growing use cases.

PRIMARY USE CASE: ACTIVE ARCHIVES

Archived information is playing a more strategic role in workflows and business processes. On-premise archiving is essentially static and is used to reduce storage costs, improve operational efficiency, retention and compliance, and enable the business to use archived data to make better business decisions. Cloud-based archiving retains elements of these features but adds new dynamic ones: instant access from any device, archive as a service, and federation to private or public clouds. Atmos provides both the static and dynamic features that massive active archives require.

• Federate to public or private clouds. Federation enables companies to treat on-premise and cloud object storage as a single efficient infrastructure. Companies may pool distributed storage assets including data, applications and policies to take full advantage of the cloud’s massive scalability and global access features. Federation also lowers cost and risk: application workloads run on cloud resources with a low execution cost, and if a cloud-based storage system goes down the distributed workload remains protected. Federation extends internal policies to cloud-based storage environments by applying existing policies and settings to cloud-based storage.

• Use metadata to drive business and storage decisions. We expect the use of metadata to expand quickly to directly feed business exploitation processes, as well as to support more automatic and intelligent storage management decisions. A singly managed distributed system that maintains directly accessible object metadata yields rich support for business decisions. Object-based storage also enables IT to automate information lifecycle management across the entire distributed data store, not just by storage silo. Policies should be flexible enough to be set at the object, tenant or system level, to automate archive decisions and to set and manage retention, expiration, and disposition.

• Multi-tenancy for secure shared storage. Multiple applications can safely co-exist as separate tenants. Isolation by tenant protects security while enabling the sharing of system-wide resources and capacity. Multi-tenancy is also efficient, since tenants subscribe to a highly scalable pool of storage that can flexibly scale up and down on demand.

• Massive scalability. Unstructured data storage is growing so fast that traditional storage systems are straining purchase, maintenance and management resources to the brink. Distributed object-based architecture yields near-limitless scale. Object storage also allows for automatic load balancing whenever new objects are stored, which protects high performance across the entire distributed system.

• Multi-site active/active. Multi-site active/active architecture is an important component of object-based storage, especially in the cloud. Cloud object storage systems span multiple sites and provide multi-site direct access to objects through both synchronous and asynchronous replication. This model replicates between multiple storage nodes and sites, which not only increases distributed availability and content distribution, but also supports disaster recovery.
• Archive-as-a-service. The most agile and flexible way for IT to deliver archive services is with the cloud model of self-service portals. This model manages and meters utilization and bandwidth and supports third-party chargeback. Within an enterprise, this flexibility and instant storage relieves users of the temptation to use commercial cloud services simply because they can get the storage they need fast, even though security might not be in place. This approach also enables ISVs and MSPs to extend archive requirements and offerings.

• Reduce manual tasks and provisioning across multiple archives. Cloud-based archives must be easy to set up and, for reliability and consistency, must not require long or deep manual configuration. They should also automate underlying complexities including security, audit, retention, performance, and capacity growth.

Atmos provides these features and more, relieving the cloud administrator of enormous burdens. Distributed systems may be managed as a single entity with policies to automate hundreds of management and data protection tasks. And, perhaps most important of all, object-based systems like Atmos offer massive scalability of capacity and performance thanks to their unique architecture.

FAST-GROWING USE CASE: WEB AND MOBILE APPLICATION DEVELOPMENT

Web and mobile application development using unstructured data also has driving needs that object-based cloud storage meets. Web application development requires quick access to storage resources, test/dev environments capable of storing multiple copies of large data sets, and the ability to test web applications in real-time online environments. These requirements are understandably hard to achieve using traditional file-based storage systems. Applications written to leverage object storage won’t need to be rewritten or even taken offline as the object storage seamlessly (or elastically) expands over time.
Atmos provides the key capabilities that web application development requires, including location transparency, self-managing storage and REST APIs.

• Enable instant access to data from any device. Web and mobile applications are inherently geographically distributed, yet file systems are usually limited in both effective access points (location) and the number of files that they can manage. Object-based storage abstracts its storage from physical locations, providing a secure access point in place of device-specific mount points. Web services APIs and file-based access allow approved users to easily access their archives from computers and a broad array of mobile devices. Integrated web services over REST and SOAP are key to this instant access. Other supporting components are file-based access (CIFS / NFS / IFS / CAS) and expanded access via ISV applications.

• Self-managing storage. In traditional development, applications have often been hard-coded to specific data stores through pointers to identified LUNs or file system navigation paths. In contrast, object storage provides a clean mapping from application to data through a simple REST API that uses an immutable unique object ID to reach the stored object. This goes a long way towards eliminating traditional, time-consuming storage management tasks like creating LUNs and RAID groups. Cloud owners may choose to extend self-management options to customers, making it simple for users to grow storage capacity on demand.

• Broad API support. Cloud object storage is basically shared storage accessed through web-based services. Atmos’ architecture supports rapid web application development with a broad API set including REST and S3. The REST API leverages HTTP operations on objects that are directly addressed, which reduces code complexity and provides the kind of easy, automatically distributed, protected, persistent storage the developer needs. In addition to the REST API, EMC Atmos also natively supports the Amazon S3 API.
This lets customers simply point S3 applications at Atmos and migrate them seamlessly to any of the more than 40 Atmos-powered public clouds around the globe.
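To make the idea of directly addressed objects concrete, here is a minimal Python sketch of a flat-namespace store fronted by REST-style verbs. It is our illustration only: the class `FlatObjectStore`, the function `handle` and the `/objects/<id>` path layout are hypothetical names chosen for this example and are not the Atmos or S3 API.

```python
import uuid
from typing import Dict, Optional, Tuple


class FlatObjectStore:
    """Toy in-memory object store with a flat namespace (illustrative only)."""

    def __init__(self) -> None:
        # One flat map from object ID to (data, metadata): no directories or LUNs.
        self._objects: Dict[str, Tuple[bytes, dict]] = {}

    def create(self, data: bytes, metadata: Optional[dict] = None) -> str:
        # The store, not the client, mints the unique object ID.
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (data, dict(metadata or {}))
        return object_id

    def read(self, object_id: str) -> bytes:
        return self._objects[object_id][0]

    def delete(self, object_id: str) -> None:
        del self._objects[object_id]

    def __contains__(self, object_id: str) -> bool:
        return object_id in self._objects


def handle(store: FlatObjectStore, method: str, path: str,
           body: bytes = b"") -> Tuple[int, bytes]:
    """Map an HTTP-style verb and path onto a store operation.

    Returns (status code, payload). The path layout is a hypothetical
    convention for this sketch.
    """
    parts = path.strip("/").split("/")
    if parts == ["objects"]:
        if method == "POST":
            return 201, store.create(body).encode()
        return 405, b""
    if len(parts) == 2 and parts[0] == "objects":
        object_id = parts[1]
        if object_id not in store:
            return 404, b""
        if method == "GET":
            return 200, store.read(object_id)
        if method == "DELETE":
            store.delete(object_id)
            return 204, b""
        return 405, b""
    return 404, b""
```

The application keeps only the returned object ID; nothing in the calling code refers to a mount point, path hierarchy or LUN, which is the property the bullets above describe.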
EMC and Object-Based Storage

EMC first introduced Centera CAS for archiving in 2002. Centera offers five-nines data availability: its redundant array of independent nodes (RAIN) is interconnected via cube switches, protecting data across independent nodes in a cube. Mirroring and parity provide additional protection and availability.

Centera’s CAS architecture keeps the retained data from being compromised or deleted before the end of its retention period. Centera assigns unique hash-code identifiers specific to each unique object, including content elements, metadata, and data/metadata relationships. This inextricably links content elements with their metadata, which are stored within a flat address space, with no need for a separate database. This architecture ensures the authenticity of the archived objects. Centera abstracts the unique objects from their generating applications and operating systems, which enables Centera to act flexibly as the single, highly optimized data store for previously siloed archives.

Centera retains single instances of archived objects. In the case of multiple users of the same file, such as a PowerPoint file sent over a distribution list, Centera retains metadata with information about each user’s interaction with the file, but points to the single instance of the object. By cutting down on data copies, this results in dramatic reductions in the quantity of archive storage.

Centera searches using metadata, rather than opening up the content objects on application-specific storage. This results in much faster and more efficient searches without using application cycles. This is possible because content and metadata stored on Centera are application, file and operating system independent, and because Centera offers a search engine right in its repository.

Centera’s content-based addressing integrates directly with application environments via APIs, with no need for kernel-level dependencies.
This means that multiple applications can simultaneously use Centera, and that specific archiving management attributes, such as data aging and data protection, can be executed per application. These capabilities create a complete chain of custody once the data leaves the primary application to be archived on Centera.

Media independence also leverages Centera’s application support. Centera objects are independent of specific storage media and protocols, which means that the storage system can migrate to new storage media over time without disturbing the integrity of the archived objects. For long-term disk-based archiving, this represents significant risk mitigation and investment protection.

Centera architecture is highly scalable and self-managing. Traditional file systems scale based on the amount of stored data versus remaining available address space, which may not be much. As the file system reaches its maximum capacity, administrators must expand the entire file system including operating system, file system, and application in order to scale the archive. In contrast, Centera expands to petabyte-scale capacities due to its flat address space. It also leverages its architecture to distribute management controls across the entire archive infrastructure. For example, if a Centera disk or node fails, the archive cluster knows how to self-heal without manual intervention. This distributed management structure extends to cover the deployment, scaling, recovery and protection of all the archival objects being stored by Centera.

Centera optimizes archiving, information governance and compliance. Users may choose from 300 native, integrated archiving applications to manage archival needs for email, files, medical imaging, content management, video, voice, and more on the single Centera archiving platform. In addition, Centera offers Compliance Edition Plus for compliance and eDiscovery, and Governance Edition for data retention management.
Copyright The TANEJA Group, Inc. 2012. All Rights Reserved. 8 of 12 87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557 www.tanejagroup.com
Centera Compliance Edition Plus captures and preserves original content, protecting data and proving chain of custody for legal eDiscovery and litigation. Retention classes assign a logical reference to each electronic record object; policies enforce data retention and safe disposition.

Centera Governance Edition enforces internal policies for data retention and disposition. Policies may be organizational or application-specific, which improves corporate accountability, reduces the cost of eDiscovery and compliance, and proves the integrity of governance controls.

To the Cloud: Atmos Architecture

EMC’s Atmos supports the same CAS API as Centera for seamless migration, and brings object storage into the cloud with massive scalability and geographic federation, supported by multi-tenancy, cloud provisioning and global access features. While Atmos is readily leveraged to extend active global archives, it also offers an exceptional platform for web and mobile application development. Atmos even enables new opportunities for global “big” data aggregation and distribution.

Atmos is at heart a software storage system for building private and public cloud storage. Atmos implementations are available from EMC either already integrated into pre-packaged physical building blocks or as a virtual machine solution for VMware vSphere that can leverage other EMC or third-party storage resources. Additionally, there is a rich ecosystem of service providers offering Atmos directly as cloud Storage-as-a-Service. Any and all of these options can be federated together as needed within and across a given organization.

EMC uses REST and SOAP web services, and has also implemented file services on top of Atmos to serve underlying objects through the lens of either an NFS or CIFS file server.
When NFS or CIFS shares are defined, they are assigned to specific Atmos nodes (or dedicated pairs for HA) and utilize the Atmos node’s inherent Linux capabilities (leveraging an Installable File System with the FUSE extension). Layering a file system over Atmos imposes some constraints regarding universal access, but also enables both traditional and transitional applications and file-system-style usage.

EMC Atmos Windows and Linux users can also leverage the EMC GeoDrive add-on, which installs on a single user workstation or server to provide remote virtual NFS/CIFS-style access (over REST) to Atmos object storage. GeoDrive supports local caching of files for offline use and eventual synchronization on reconnection. One of the major benefits of GeoDrive is enabling a user to access large amounts of protected storage from anywhere. It can also be used for the disaster recovery of files pushed or mirrored into Atmos.

Atmos technically maintains a given piece of data as an object with associated metadata that includes the object ID, system and user-defined metadata fields and the internal object layout information (and parent/child information for objects saved through a file system “namespace” interface). Applications and users can store arbitrary metadata with each object that can be leveraged by group management policies.

Policies can be created at the tenant level as a design scheme to provide various service levels of performance, access and data protection, based on some awareness of the multi-site architecture of the cloud implementation. They are then assigned to sub-tenants, who need not be aware of the underlying implementation, to apply as target service levels to their objects. For example, the power to explicitly enforce compression of image files (e.g. JPEGs) after a number of days would present a significant capacity optimization for a web-based application dealing with millions of images.
In addition to supporting compliance and retention policies, metadata can be used to drive automated file distribution, access control and data protection activities optimizing for the appropriate level of data resiliency, performance and availability. For most applications, thoughtful use of user metadata can remove any need to implement a separate management tracking database for stored objects.
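As an illustration of metadata-driven policy, the following Python sketch models the image compression example mentioned above: a policy selects objects by user metadata and age, then applies an action. All names here (`StoredObject`, `Policy`, `apply_policies`, `compress_old_images`, the `content-type` metadata key) are hypothetical and chosen for this example; the sketch does not reproduce Atmos policy syntax, and the "compression" is a flag standing in for a real storage operation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, List, Tuple


@dataclass
class StoredObject:
    object_id: str
    created: datetime
    metadata: dict = field(default_factory=dict)
    compressed: bool = False


@dataclass
class Policy:
    name: str
    matches: Callable[[StoredObject, datetime], bool]  # selection by metadata and age
    action: Callable[[StoredObject], None]             # what to do with a match


def apply_policies(objects: List[StoredObject], policies: List[Policy],
                   now: datetime) -> List[Tuple[str, str]]:
    """Run every policy over every object; return (object_id, policy name) pairs applied."""
    applied = []
    for obj in objects:
        for policy in policies:
            if policy.matches(obj, now):
                policy.action(obj)
                applied.append((obj.object_id, policy.name))
    return applied


def compress_old_images(min_age_days: int) -> Policy:
    """Build a policy that compresses JPEG objects once they reach a given age."""
    def matches(obj: StoredObject, now: datetime) -> bool:
        return (obj.metadata.get("content-type") == "image/jpeg"
                and not obj.compressed
                and now - obj.created >= timedelta(days=min_age_days))

    def action(obj: StoredObject) -> None:
        obj.compressed = True  # stand-in for the real compression step

    return Policy(f"compress-jpeg-after-{min_age_days}d", matches, action)
```

Because selection relies only on metadata carried with each object, the application needs no separate tracking database to decide which objects a policy should touch.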
Replication is controlled by automated policies, which can mirror data objects at many points in an object’s lifecycle both within and across multiple sites. Within a data center site, replication might for example be set to happen synchronously upon ingestion, while replication between sites might be set to run asynchronously and be launched with an arbitrary delay to allow for data settling. Replications can be targeted to specific locations, or abstractly sent to “other” sites as the system decides. For performance and availability, replicas are all active for read access (objects are inherently immutable, so there is no need to manage distributed locking mechanisms). Because it is “multi-site active/active”, any site can fulfill new object write requests when the local primary site is unavailable.

In addition to full replication, EMC also provides an erasure coding option called GeoParity. Instead of keeping two or more full 100% copies, “9/12” erasure coding enables storing an “expanded” object containing only 33% additional encoded “redundant” data, broken up into 12 segments. By using erasure coding, the original data can be reconstructed dynamically from any 9 of the segments. These segments are cleverly distributed so that the object can survive (and even be accessed during) multiple failures. For greater protection there is also a “10/16” coding with a 60% capacity overhead. Erasure coding does impact access performance, especially at ingestion, but provides great fault tolerance with much lower capacity utilization. Of course, policies can be written to convert replicated objects to erasure-coded schemes as they age appropriately.

With object stores there is generally no need for low-level RAID or disk-level protection, and Atmos is no exception.
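The capacity overheads quoted for GeoParity follow directly from the segment counts. The short Python sketch below works through the arithmetic; the function names are ours, and the only inputs are the “9/12” and “10/16” figures given above.

```python
from fractions import Fraction


def erasure_overhead(data_segments: int, total_segments: int) -> Fraction:
    """Extra capacity, relative to the original object, for a k-of-n erasure code."""
    return Fraction(total_segments - data_segments, data_segments)


def tolerated_segment_losses(data_segments: int, total_segments: int) -> int:
    """Number of segments that can be lost while the object stays reconstructable."""
    return total_segments - data_segments


def replication_overhead(copies: int) -> Fraction:
    """Extra capacity for keeping `copies` full copies of an object."""
    return Fraction(copies - 1, 1)


# "9/12": any 9 of 12 segments rebuild the object.
print(erasure_overhead(9, 12))           # 1/3, i.e. about 33% extra capacity
print(tolerated_segment_losses(9, 12))   # 3

# "10/16": any 10 of 16 segments rebuild the object.
print(erasure_overhead(10, 16))          # 3/5, i.e. 60% extra capacity
print(tolerated_segment_losses(10, 16))  # 6

# Two full copies, for comparison.
print(replication_overhead(2))           # 1, i.e. 100% extra capacity
```

For comparison, two full copies cost 100% extra capacity and protect against the loss of one copy, whereas the 9/12 scheme costs about a third extra and tolerates the loss of any three segments.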
Upon hardware failures, replication and/or GeoParity across nodes (RAIN), combined with built-in node auto-healing features, suffice to provide the full data protection determined by the service level “policies” implemented for each type of data object. Atmos can withstand the loss of any disk, node, rack, or even site.

Atmos Pre-built Hardware Configurations

EMC Atmos pre-configured hardware “appliances” consist of a rack/cabinet containing from 4 to 16 Atmos nodes in various configurations and disk capacities. Flexible configurations enable smooth scalability, and allow for mixes of capacity and performance in and across Atmos sites.

An Atmos storage node consists of a 1GbE server front-end running the Atmos storage services, connected to one or more SAS-attached disk array enclosures (DAE), each containing 15 7200 RPM disks of 1 to 3 TB each. Every node runs all object storage services (the first two nodes in each site also run the site metadata locator service that indexes which node contains which objects), supporting tremendous horizontal system scalability.

EMC has also introduced its new Atmos G3 series for new levels of density and energy efficiency. G3-Dense-480 is the first in the Atmos G3 series and consists of 4, 6, or 8 nodes with 480 disks of 3 TB each in 40U.

TABLE: ALIGNING TOP CLOUD USE CASES WITH EMC ATMOS

Use case: Medical Archiving
Challenge: Over 800 million medical imaging procedures a year require huge storage scalability; collaboration and compliance increase complexity.
Benefits: Vendor Neutral Archive (VNA) on Atmos: integrates with EMR/EHR and improves PACS for better patient care and collaboration, improves data lifecycle management, reduces IT costs, and preserves HIPAA compliance.

Use case: File Archiving
Challenge: Corporate file sharing is popular with employees but syncing and sharing are hard to manage. Employees will frequently share files anyway over mobile devices, leaving corporations accountable for risky behavior.
Benefits: With EMC Sync & Share, users can securely share Atmos files across mobile devices, Linux and Windows. GeoDrive creates a Dropbox-like service that is secure and manageable, powered by Atmos’ fast performance. Atmos policies monitor changes to data and provide access control, benefitting regulated verticals like finance.

Use case: Archive as a Service
Challenge: Both the enterprise and storage service providers struggle to provide IT services to their respective customers. Provisioning, maintenance, and security are all difficult issues in traditional storage offerings.
Benefits: The Atmos Cloud Delivery Platform enables corporations and service providers to meter capacity, bandwidth, and usage across tenants. Provisioning is automated by tenant, and Atmos allows tenants to safely self-manage and access their own storage.

Use case: Managed Service Providers
Challenge: Many MSPs suffer from narrow profit margins because of the expense of delivering storage to customers. Managing multiple tenants, manual provisioning and maintaining service level agreements all cut into revenue and make it too expensive to add new storage services.
Benefits: Atmos lets MSPs efficiently offer storage as a service and better monetize new service offerings. MSPs can monitor capacity and usage for chargeback, reduce provisioning costs, and replace multiple tenant management systems with a single system. Dynamic scaling, high availability and security cost-effectively meet service level requirements.

Use case: Content-Rich Web Applications
Challenge: Traditional storage is a poor environment for Web application development, which needs highly scalable capacity for multiple large data sets, a secure environment for test/dev and application testing in real-time environments.
Benefits: Atmos provides location transparency for global applications and a highly mobile user base. The single namespace means that application developers never need to recode pathnames and locations, and do not need to code for limited storage environments. Self-management options make it easy for customers to provision their own storage, and REST APIs reduce application complexity.

Taneja Group Opinion

When on-premise archive solutions smoothly integrate with federated storage, public and private clouds provide extensive scalability and global availability. Yet we see too many end-users treating the cloud as just another storage tier for low-value retained data. This is a huge waste of cloud possibilities, but we understand why it happens: cloud platforms with poor performance and delivery mechanisms can make cloud-based storage more trouble than it’s worth.

But when we talk about EMC Atmos we are not talking about a low-cost storage tier, far from it. We are describing the heart of business innovation based on highly secure and highly accessible global data stores. EMC’s long expertise with object-based storage has kept Centera relevant and has extended dynamic data management to the cloud with Atmos. The Atmos-fueled cloud replaces hierarchical file storage while allowing the secure flow of information between the data center, the distributed cloud, and global access points. Customers profit from greatly improved application and data delivery, and from the deep business value inherent in their data.
When a company is dealing with geographic reach and large, growing volumes of rich content, it should look to object-based storage in the cloud. We fully support EMC in its push to scale capacity, performance, availability and management far beyond what traditional file systems are capable of, and more massively than ever before.

NOTICE: The information and product recommendations made by Taneja Group are based upon public information and sources and may also include personal opinions both of Taneja Group and others, all of which we believe to be accurate and reliable. However, as market conditions change and are not within our control, the information and recommendations are made without warranty of any kind. All product names used and mentioned herein are the trademarks of their respective owners. Taneja Group, Inc. assumes no responsibility or liability for any damages whatsoever (including incidental, consequential or otherwise), caused by your use of, or reliance upon, the information and recommendations presented herein, nor for any inadvertent errors that may appear in this document.

Copyright The TANEJA Group, Inc. 2012. All Rights Reserved. 87 Elm Street, Suite 900, Hopkinton, MA 01748. T: 508.435.2556 F: 508.435.2557 www.tanejagroup.com