Welcome

NoSQL for Data Services, Data Virtualization & Big Data
Guido Schmutz
25.9.2012




BASEL   BERN   LAUSANNE     ZÜRICH         DÜSSELDORF          FRANKFURT A.M.   FREIBURG I.BR.   HAMBURG   MÜNCHEN   STUTTGART   WIEN




                      2012 © Trivadis
 1
                      NoSQL for Data Services, Data Virtualization & Big Data
                      25.9.2012
Guido Schmutz

•   Working for Trivadis for more than 15 years
•   Oracle ACE Director for Fusion Middleware and SOA
•   Co-author of several books
•   Consultant, trainer and software architect for Java, Oracle, SOA and EDA
•   Member of the Trivadis Architecture Board
•   Technology Manager @ Trivadis

•   More than 20 years of software development experience

•   Contact: guido.schmutz@trivadis.com
•   Blog: http://guidoschmutz.wordpress.com
•   Twitter: gschmutz

Agenda


1. Why NoSQL and what is it?
2. NoSQL Database Types
3. Polyglot Persistence
4. Data Virtualization Layer
5. Summary




History of Databases

1960s         File-based, network (CODASYL) and hierarchical databases
1970s         Relational databases
1980s         SQL became the standard query language
Early 1990s   Object databases
Late 1990s    XML databases
2004          NoSQL databases




What's wrong with Relational Databases? They are great …


•   SQL provides a rich, declarative query language
•   Databases enforce referential integrity
•   ACID semantics
•   Well understood by developers and database administrators
•   Well supported by different languages, frameworks and tools
    • Hibernate, JPA, JDBC, iBATIS, Entity Framework
•   Well understood and accepted by operations people (DBAs)
    •   Configuration
    •   Monitoring
    •   Backup and Recovery
    •   Tuning
    •   Design

Relational Databases are great ... But!

Problem: Complex object graphs
•   Object/relational impedance mismatch
•   Complicated to map a rich domain model to a relational schema
•   Performance issues
    - Many rows in many tables
    - Many joins
    - Eager vs. lazy loading

Problem: Schema evolution
•   Adding attributes to an object => columns have to be added to the table
•   Expensive if there is a lot of data in that table
    - Holding locks on the tables for a long time
    - Application downtime …

[Figure: an order (ID 1001, order date 15.9.2012) with its customer (Peter Sample), billing address (Somestreet 10, 55901 Somewhere) and line items (iPod Touch, quantity 1, price 220.95; Monster Beat, quantity 2, price 190.00; Apple Mouse, quantity 1, price 69.90), mapped onto the tables ORDER, CUSTOMER, ADDRESS and ORDER_LINES]
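To make the "many rows in many tables / many joins" point concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The schema is a hypothetical simplification loosely modeled on the order in the figure; because ORDER is a reserved word in SQL, the order table is called `orders` here. Loading a single order already needs a four-table join, and the flat result rows must then be folded back into a nested object, which is the work an O/R mapping layer does.

```python
import sqlite3

# Hypothetical schema loosely modeled on the order in the figure.
SCHEMA = """
CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT, city TEXT, postal_code TEXT);
CREATE TABLE customer (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
                       billing_address_id INTEGER REFERENCES address(id));
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT,
                     customer_id INTEGER REFERENCES customer(id));
CREATE TABLE order_lines (order_id INTEGER REFERENCES orders(id), line_no INTEGER,
                          name TEXT, quantity INTEGER, price REAL,
                          PRIMARY KEY (order_id, line_no));
"""

def load_order(conn, order_id):
    """Rebuild one order object graph from four tables."""
    rows = conn.execute(
        """
        SELECT o.id, o.order_date, c.first_name, c.last_name,
               a.street, a.city, a.postal_code,
               l.name, l.quantity, l.price
        FROM orders o
        JOIN customer c    ON c.id = o.customer_id
        JOIN address a     ON a.id = c.billing_address_id
        JOIN order_lines l ON l.order_id = o.id
        WHERE o.id = ?
        ORDER BY l.line_no
        """,
        (order_id,),
    ).fetchall()
    if not rows:
        return None
    first = rows[0]
    # The order header is repeated on every joined row; fold the flat rows
    # back into one nested structure.
    return {
        "id": first[0],
        "orderDate": first[1],
        "customer": {
            "firstName": first[2],
            "lastName": first[3],
            "billingAddress": {"street": first[4], "city": first[5], "postalCode": first[6]},
        },
        "lineItems": [{"name": r[7], "quantity": r[8], "price": r[9]} for r in rows],
    }

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO address VALUES (1, 'Somestreet 10', 'Somewhere', '55901')")
conn.execute("INSERT INTO customer VALUES (1, 'Peter', 'Sample', 1)")
conn.execute("INSERT INTO orders VALUES (1001, '2012-09-15', 1)")
conn.executemany(
    "INSERT INTO order_lines VALUES (1001, ?, ?, ?, ?)",
    [(1, "iPod Touch", 1, 220.95), (2, "Monster Beat", 2, 190.00), (3, "Apple Mouse", 1, 69.90)],
)
order = load_order(conn, 1001)
print(order["customer"]["lastName"], len(order["lineItems"]))  # Sample 3
```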

Relational Databases are great ... But!

[Figure: the same order mapped onto the tables ORDER, CUSTOMER, ADDRESS and ORDER_LINES, accessed through the layers Consumer → (REST/SOAP) → Service → Repository/DAO → O/R Mapping → (SQL) → RDBMS]

Relational Databases are great ... But!


Problem: Semi-structured data
•   A relational schema doesn't easily handle semi-structured data
•   Common solutions
    - Name/value table
      - Poor performance
      - Lack of constraints
    - Serialize as a BLOB
      - Fewer joins, but no query capabilities

Problem: Scaling
•   Scaling writes is difficult, expensive or impossible => Big Data
•   Vertical scaling is limited and expensive
•   Horizontal scaling is limited and expensive
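The name/value workaround is often called entity-attribute-value (EAV). A minimal sketch with sqlite3 (the table name `product_attr` and the attributes are hypothetical) shows both drawbacks listed above: reading an entity back requires pivoting rows in application code, filtering on several attributes needs one self-join per attribute, and the database cannot enforce types because every value is stored as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One generic table instead of one column per attribute.
conn.execute("CREATE TABLE product_attr (product_id INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO product_attr VALUES (?, ?, ?)",
    [
        (1, "title", "iPod Touch"),
        (1, "storage_gb", "32"),
        (1, "color", "black"),
        (2, "title", "Monster Beat"),
        (2, "storage_gb", "lots"),  # no constraint prevents a non-numeric value
    ],
)

def load_product(product_id):
    # One row per attribute, pivoted into a dict by the application.
    rows = conn.execute(
        "SELECT name, value FROM product_attr WHERE product_id = ? ORDER BY name",
        (product_id,),
    ).fetchall()
    return dict(rows)

# Filtering on two attributes needs a self-join and a cast on text values.
big_black = conn.execute(
    """
    SELECT a.product_id
    FROM product_attr a
    JOIN product_attr b ON b.product_id = a.product_id
    WHERE a.name = 'color' AND a.value = 'black'
      AND b.name = 'storage_gb' AND CAST(b.value AS INTEGER) >= 16
    """
).fetchall()

print(load_product(1))  # {'color': 'black', 'storage_gb': '32', 'title': 'iPod Touch'}
print(big_black)        # [(1,)]
```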



Solution: NoSQL?


There is no standard definition of what NoSQL means
•   Not Only SQL
The term emerged from a workshop organized in 2009,
but NoSQL databases share some common characteristics:
•   They don't use the relational data model and thus don't use SQL
•   They tend to be designed to run on clusters
•   They tend to be open source
•   Schema-less: they don't have a fixed schema, so any data can be stored in any record
•   Different APIs

[Figure: RDBMS vs. NoSQL across the tiers. Presentation tier: a user interface in both cases. RDBMS: object-relational/relational-object mapping in the middle tier, while the database tier provides search, blobs, transactions, batch, caching and triggers. NoSQL: the database tier (e.g. key-value stores) holds only the data, while caching, search (Lucene), transactions and batch processing (MapReduce) are provided by services in the middle tier]

Central vs. Application Databases

Central Database
•   Uses SQL as the integration mechanism between applications
•   Applications store data in a common database
•   Improves communication: all applications operate on a consistent set of data
•   The structure ends up being more complex
•   Changes need to be coordinated with all other applications using the database
•   Side effects (e.g. adding a database index)

Application Database
•   Accessed only by a single application
•   Only the application using the database needs to know about its structure
•   Easier to maintain and evolve the schema
•   More freedom to choose the database
•   Applicable to SOA (e.g. Data Service/Entity Service) with good service autonomy
•   Ready for the cloud

[Figure: left, Applications 1, 2 and 3 sharing one central DB; right, Applications 1, 2 and 3 each with its own DB]


Relational vs. Aggregate Data Models

Relational model
•   The relational model takes the information and divides it into tuples (rows)
•   A tuple is a limited data structure
    - No nesting of tuples
    - No lists of values

Aggregate model
•   Aggregate is a term that comes from Domain-Driven Design (Evans)
•   An aggregate is a collection of related objects that should be treated as a unit
    - Unit for data manipulation and management of consistency




Relational vs. Aggregate Data Model


Relational Instance

CUSTOMER                    PRODUCT
ID   NAME                   ID     NAME
1    Guido                  1000   iPod Touch
                            1020   Monster Beat

BILLING_ADDRESS
ID   CUSTOMER_ID   ADDRESS_ID
1    1             55

ADDRESS
ID   STREET        CITY      POST_CODE
55   Chaumontweg   Spiegel   3095

ORDER
ID   CUSTOMER_ID   SHIPPING_ADDRESS_ID
90   1             55

ORDER_ITEM
ID   ORDER_ID   PRODUCT_ID   PRICE
1    90         1000         250.55
2    90         1020         199.55

Aggregate Instance

{
  "id": 1,
  "name": "Guido",
  "billingAddress": [{"street": "Chaumontweg", "city": "Spiegel", "postCode": "3095"}]
}

{
  "id": 90,
  "customerId": 1,
  "orderItems": [
    {"productId": 1000, "price": 250.55, "productName": "iPod Touch"},
    {"productId": 1020, "price": 199.55, "productName": "Monster Beat"}
  ],
  "shippingAddress": [{"street": "Chaumontweg", "city": "Spiegel", "postCode": "3095"}]
}
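The step from the relational instance to the order aggregate can be sketched as follows. The rows of the tables above are held as plain Python tuples (with the ORDER_ITEM IDs made unique), the foreign keys are followed once, and everything the order needs is nested into one document. The helper names are hypothetical, and the dict-based `store` only stands in for an aggregate-oriented database that reads and writes the whole aggregate as one unit.

```python
import json

# Rows from the relational instance above.
PRODUCT = {1000: "iPod Touch", 1020: "Monster Beat"}          # ID -> NAME
ADDRESS = {55: ("Chaumontweg", "Spiegel", "3095")}            # ID -> (STREET, CITY, POST_CODE)
ORDERS = [(90, 1, 55)]                                        # (ID, CUSTOMER_ID, SHIPPING_ADDRESS_ID)
ORDER_ITEM = [(1, 90, 1000, 250.55), (2, 90, 1020, 199.55)]   # (ID, ORDER_ID, PRODUCT_ID, PRICE)

def address_doc(address_id):
    street, city, post_code = ADDRESS[address_id]
    return {"street": street, "city": city, "postCode": post_code}

def build_order_aggregate(order_id):
    """Follow the foreign keys once and nest everything the order needs."""
    oid, customer_id, shipping_id = next(o for o in ORDERS if o[0] == order_id)
    items = [
        # productName is copied into the aggregate (denormalized), so reading
        # the order no longer needs the PRODUCT table.
        {"productId": pid, "price": price, "productName": PRODUCT[pid]}
        for _, item_order, pid, price in ORDER_ITEM
        if item_order == oid
    ]
    return {
        "id": oid,
        "customerId": customer_id,
        "orderItems": items,
        "shippingAddress": [address_doc(shipping_id)],
    }

# Stand-in for an aggregate-oriented store: one key, one complete document.
store = {"order:90": json.dumps(build_order_aggregate(90))}
order = json.loads(store["order:90"])
print(order["orderItems"][1]["productName"])  # Monster Beat
```

Note that the customer stays a separate aggregate referenced by `customerId`; choosing aggregate boundaries is a design decision driven by how the data is accessed.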








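NoSQL Database Types

•   Key/Value Stores
•   Ordered Key-Value Stores
•   Big Table Stores (map-of-maps-of-maps)
•   Document Stores
•   Graph Databases

Key/Value
•   Design: collections of key/value pairs
•   Scalability/performance: +++
•   Aggregate-oriented: yes
•   Complexity: +
•   Inspiration and relation: Berkeley DB, Memcached, distributed hashmaps
•   NoSQL products: Voldemort, Redis, Riak

Column Family
•   Design: columns and column families; directly accesses the column values
•   Scalability/performance: +++
•   Aggregate-oriented: yes
•   Complexity: ++
•   Inspiration and relation: SAP Sybase IQ, BigTable
•   NoSQL products: HBase, Cassandra, Hypertable, Amazon SimpleDB

Document
•   Design: key/value pairs, but the value is interpreted by the database
•   Scalability/performance: ++
•   Aggregate-oriented: yes
•   Complexity: ++
•   Inspiration and relation: Lotus Notes
•   NoSQL products: CouchDB, MongoDB, OrientDB, RavenDB

Graph
•   Design: focus on the connections between data and fast navigation
•   Scalability/performance: ++
•   Aggregate-oriented: no
•   Complexity: +++
•   Inspiration and relation: graph theory
•   NoSQL products: Sones, Neo4J, InfoGrid, FlockDB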
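NoSQL Database Types

[Figure: NoSQL database types plotted by size (vertical axis) against complexity (horizontal axis). Key-value stores sit at the largest size and lowest complexity, followed by column family, document and graph databases; relational databases are shown for comparison]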

Key Value Databases

•   A key-value store is a simple hash table
•   Primarily used when all access to the database is via the primary key
•   Simplest NoSQL data stores to use (from an API perspective)
    - PUT, GET, DELETE (matches REST)
•   The value is a blob: the data store neither knows nor cares what is inside
•   Aggregate-oriented


Suitable Use Cases
•        Storing Session Information
•        User Profiles, Preferences
•        Shopping Cart Data
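The key-value API boils down to the three operations listed above. The following minimal in-memory sketch is not the API of Voldemort, Redis or Riak: the class and method names are hypothetical. It stores opaque byte values, here session information serialized as JSON by the application, matching the first suitable use case.

```python
import json

class KeyValueStore:
    """Minimal in-memory key-value store; values are opaque blobs."""

    def __init__(self):
        self._data = {}  # a simple hash table

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str):
        return self._data.get(key)  # None if the key is absent

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

# Session information, serialized by the application. The store only knows
# the key; it cannot query anything inside the value.
kv = KeyValueStore()
kv.put("session:4711", json.dumps({"user": "peter", "cart": [1000, 1020]}).encode())
session = json.loads(kv.get("session:4711"))
kv.delete("session:4711")
print(session["cart"], kv.get("session:4711"))  # [1000, 1020] None
```

The three methods map naturally onto HTTP PUT, GET and DELETE on a per-key resource, which is why the slide notes that the API matches REST.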


Column-Family Stores

•   Store data in column families as rows that have many columns associated with a row key
•   Column families are groups of related data that are often accessed together
•   Aggregate-oriented


Suitable Use Cases
•        Event Logging
•        Content Management Systems
•        Counters
•        Expiring Usage

(Figure source: NoSQL Distilled)
Document Databases

•   Documents are the main concept
•   Stores and retrieves documents, which can be XML, JSON, BSON, …
•   Documents are self-describing, hierarchical tree data structures which can consist of maps, collections and scalar values
•   Documents stored are similar to each other but do not have to be exactly the same
•   Aggregate-oriented

Suitable Use Cases
•        Event Logging
•        Content Management Systems
•        Web Analytics or Real-Time Analytics
•        Product Catalog
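Unlike a key-value store, a document database interprets the value, so it can query inside documents. The following minimal in-memory sketch is loosely inspired by MongoDB-style dotted field paths. The class and method names are hypothetical, and only equality matching is supported. The product catalog documents are similar but do not share exactly the same fields.

```python
import copy

class DocumentCollection:
    """Sketch of a document collection: values are structured, not opaque."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        self._docs[doc["_id"]] = copy.deepcopy(doc)

    @staticmethod
    def _lookup(doc, path):
        # Resolve a dotted path such as "details.brand" inside nested maps.
        value = doc
        for part in path.split("."):
            if not isinstance(value, dict) or part not in value:
                return None
            value = value[part]
        return value

    def find(self, query):
        """Return copies of documents whose fields equal all values in `query`."""
        return [
            copy.deepcopy(d)
            for d in self._docs.values()
            if all(self._lookup(d, path) == expected for path, expected in query.items())
        ]

catalog = DocumentCollection()
# Similar documents, but each can have its own set of fields.
catalog.insert({"_id": 1000, "name": "iPod Touch", "price": 250.55,
                "details": {"brand": "Apple", "storageGb": 32}})
catalog.insert({"_id": 1020, "name": "Monster Beat", "price": 199.55,
                "details": {"brand": "Monster", "wireless": False}})
catalog.insert({"_id": 1030, "name": "Apple Mouse", "price": 69.90,
                "details": {"brand": "Apple"}})

print([d["name"] for d in catalog.find({"details.brand": "Apple"})])
# ['iPod Touch', 'Apple Mouse']
```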
Document Database - MongoDB




Graph Databases

•   Allow you to store entities and the relationships between these entities
•   Entities are known as nodes, which have properties
•   Relationships are known as edges, which also have properties
•   A query on the graph is also known as traversing the graph
•   Traversing the relationships is very fast


Suitable Use Cases
•        Connected Data
•        Routing, Dispatch and Location-Based Services
•        Recommendation Engines

[Figure: example graph with the nodes Customer, Order, Product, Tag, Address and Country, connected by the relationships RATED, TAG, LINE_ITEM, ADDRESS, BILLING_ADDRESS, DELIVERY_ADDRESS and COUNTRY]
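Traversal means following edges outward from a start node instead of joining tables. A minimal sketch of the recommendation-engine use case over an in-memory adjacency list follows. The customer names, products and the ORDERED relationship are hypothetical, and edge properties are left out. The traversal walks customer → product → other customer → other product and counts how often each candidate product is reached.

```python
from collections import Counter, defaultdict

# Directed edges (from, relationship, to); the data is illustrative.
EDGES = [
    ("peter", "ORDERED", "iPod Touch"),
    ("peter", "ORDERED", "Monster Beat"),
    ("anna", "ORDERED", "iPod Touch"),
    ("anna", "ORDERED", "Apple Mouse"),
    ("tom", "ORDERED", "Monster Beat"),
    ("tom", "ORDERED", "Apple Mouse"),
    ("tom", "ORDERED", "Charger"),
]

# Index edges in both directions so each hop is a direct lookup.
out_edges = defaultdict(list)
in_edges = defaultdict(list)
for src, rel, dst in EDGES:
    out_edges[(src, rel)].append(dst)
    in_edges[(dst, rel)].append(src)

def recommend(customer, limit=3):
    """Products bought by customers who share at least one purchase with `customer`."""
    mine = set(out_edges[(customer, "ORDERED")])
    scores = Counter()
    for product in mine:                                      # hop 1: my products
        for other in in_edges[(product, "ORDERED")]:          # hop 2: who else bought them
            if other == customer:
                continue
            for candidate in out_edges[(other, "ORDERED")]:   # hop 3: what else they bought
                if candidate not in mine:
                    scores[candidate] += 1
    return scores.most_common(limit)

print(recommend("peter"))  # [('Apple Mouse', 2), ('Charger', 1)]
```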




Graph Database – Neo4J




Query through Cypher:

    START    customer=node:Customer(email = "david@dmband.com")
    MATCH    customer-[:ORDERED]->order-[item:LINEITEM]->product
    WHERE    order.date > 20120101
    RETURN   product.name, sum(item.amount) AS products
    ORDER BY products DESC
    LIMIT    20
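To spell out what the Cypher query computes, here is the same logic written as a plain Python function over a small in-memory graph. Apart from the e-mail address from the query, the sample nodes, dates and amounts are hypothetical. The sketch only mirrors the START, MATCH, WHERE, RETURN, ORDER BY and LIMIT steps and says nothing about how Neo4j actually executes Cypher.

```python
from collections import defaultdict

# Hypothetical sample graph.
customers = {"c1": {"email": "david@dmband.com"}, "c2": {"email": "other@example.com"}}
orders = {  # ORDERED relationships, stored on the order node for brevity
    "o1": {"customer": "c1", "date": 20120315},
    "o2": {"customer": "c1", "date": 20111120},   # removed by the WHERE clause
    "o3": {"customer": "c2", "date": 20120610},   # belongs to another customer
}
products = {"p1": {"name": "iPod Touch"}, "p2": {"name": "Monster Beat"}}
line_items = [  # (order, product, amount): the [item:LINEITEM] relationships
    ("o1", "p1", 1), ("o1", "p2", 2), ("o2", "p1", 5), ("o3", "p2", 7),
]

def top_products(email, since, limit=20):
    # START: find the start node by its email property
    start = [cid for cid, c in customers.items() if c["email"] == email]
    totals = defaultdict(int)
    for cid in start:
        # MATCH customer-[:ORDERED]->order  WHERE order.date > since
        my_orders = {oid for oid, o in orders.items()
                     if o["customer"] == cid and o["date"] > since}
        # MATCH order-[item:LINEITEM]->product; RETURN product.name, sum(item.amount)
        for oid, pid, amount in line_items:
            if oid in my_orders:
                totals[products[pid]["name"]] += amount
    # ORDER BY products DESC LIMIT 20
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]

print(top_products("david@dmband.com", 20120101))
# [('Monster Beat', 2), ('iPod Touch', 1)]
```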





Polyglot Persistence


In 2006, Neal Ford coined the term Polyglot Programming
•   Applications should be written in a mix of languages to take advantage of the fact that different languages are suitable for tackling different problems
Polyglot Persistence defines a hybrid approach to persistence
•   Using multiple data storage technologies
•   Selected based on the way data is being used by individual applications
    - Why store binary images in relational databases when there are better storage systems?
•   Can be applied across the enterprise as well as within a single application



Polyglot Persistence

•   Today we use the same database for all kinds of data
    - Business transactions, session management data, reporting, logging information, content information, ...
•   Not all of this data needs the same availability, consistency or backup properties
•   Polyglot data storage usage allows you to mix and match relational and NoSQL data stores

[Figure: "Traditional" persistence model: an e-commerce application keeps shopping cart data, user sessions, completed orders, the product catalog and recommendations all in one RDBMS. Polyglot persistence model: shopping cart data and user sessions go to a key-value store, completed orders to an RDBMS, the product catalog to a document store and recommendations to a graph database]




Polyglot Persistence – Challenges

•   Decisions
    •   Have to decide which data storage technology to use
    •   Today it's easier to go with relational
•   New data access APIs
    •   Each data store has its own mechanisms for accessing the data
    •   Different APIs
    •   Solution: wrap the data access code into services (Data/Entity Services) exposed to applications
    •   This enforces a contract/schema on top of a schemaless database

[Figure: service-oriented polyglot persistence model. The e-commerce application accesses shopping cart data, user sessions, completed orders, the product catalog and recommendations through a Shopping Cart Service, User Session Service, Order Service, Product Catalog Service and Recommendation Service, backed by a key-value store, an RDBMS, a document store and a graph database]
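Wrapping a data store in a service can be sketched like this. A hypothetical `ShoppingCartService` validates every cart against a small contract before it reaches a schemaless key-value store, here a plain dict, and consumers only see the service's operations, never the store's API. The field names and the single validation rule are illustrative assumptions.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class CartItem:
    product_id: int
    quantity: int


@dataclass
class ShoppingCart:
    customer_id: int
    items: List[CartItem] = field(default_factory=list)


class ShoppingCartService:
    """Data/entity service: the contract lives here, not in the store."""

    def __init__(self, store):
        self._store = store  # any dict-like schemaless key-value store

    def save(self, cart: ShoppingCart) -> None:
        # Enforce the contract before anything is written.
        for item in cart.items:
            if item.quantity < 1:
                raise ValueError(f"invalid quantity for product {item.product_id}")
        self._store[f"cart:{cart.customer_id}"] = json.dumps(asdict(cart))

    def load(self, customer_id: int) -> ShoppingCart:
        raw = self._store.get(f"cart:{customer_id}")
        if raw is None:
            return ShoppingCart(customer_id)  # empty cart
        data = json.loads(raw)
        return ShoppingCart(data["customer_id"], [CartItem(**i) for i in data["items"]])


service = ShoppingCartService(store={})
service.save(ShoppingCart(1, [CartItem(1000, 1), CartItem(1020, 2)]))
print(service.load(1).items[1])  # CartItem(product_id=1020, quantity=2)
```

Because consumers depend only on the service, the key-value store behind it could later be replaced without changing them.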




Polyglot Persistence – Challenges


•    Immaturity
     • NoSQL tools are still young and have the rough edges that new tools have
     • Not much experience yet; we don't know how to use them well
     • No established patterns and best practices exist yet


•    Organizational change
     • How will the different data groups in an enterprise react to this new technology?


•    Dealing with the eventual consistency paradigm
     • Reaction of different stakeholders to the fact that data could be stale
     • How to enforce rules to sync data across systems
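The stale-data concern can be illustrated with a minimal simulation of asynchronous replication. The classes and the explicit `replicate()` step are hypothetical simplifications, since real stores replicate in the background. A write is acknowledged by the primary right away, a read from a replica may still return the old value until replication catches up, and after that all copies converge.

```python
from collections import deque

class Replica:
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """The primary acknowledges writes immediately; replicas are updated later."""

    def __init__(self, replica_count=2):
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = deque()  # replication log not yet applied to the replicas

    def write(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))  # shipped asynchronously

    def read(self, key, replica=0):
        # Reads are served by a replica and may be stale.
        return self.replicas[replica].data.get(key)

    def replicate(self):
        # Apply the backlog in order; afterwards all copies agree.
        while self.pending:
            key, value = self.pending.popleft()
            for r in self.replicas:
                r.data[key] = value

store = EventuallyConsistentStore()
store.write("product:1000:price", 250.55)
store.replicate()
store.write("product:1000:price", 240.00)
stale = store.read("product:1000:price")   # the replica has not caught up yet
store.replicate()
fresh = store.read("product:1000:price")
print(stale, fresh)  # 250.55 240.0
```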





Data Access Architecture for Polyglot Persistence


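•   Well-known design patterns are still valid!
•   Some best practices we know in data access are still valid!

[Figure: three access paths. Left: Consumer → (REST/SOAP) → Service → Repository/DAO → O/R Mapping → (SQL) → RDBMS. Middle: a consumer accessing a NoSQL store directly through the store's REST API. Right: Consumer → (REST/SOAP) → Service → Repository/DAO → ??? → NoSQL]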
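One way to keep the familiar patterns is to use one repository interface with one implementation per data store, so the service layer does not change when the store does. The following minimal sketch assumes a hypothetical `CustomerRepository` with two implementations. The relational one uses sqlite3 and stands in for the O/R mapping path. The document-style one keeps JSON strings in a dict and stands in for a NoSQL store.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    id: int
    name: str


class CustomerRepository(ABC):
    @abstractmethod
    def save(self, customer: Customer) -> None: ...

    @abstractmethod
    def find_by_id(self, customer_id: int) -> Optional[Customer]: ...


class SqlCustomerRepository(CustomerRepository):
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT)")

    def save(self, customer):
        # Map the object onto a row (the job an O/R mapper would do).
        self.conn.execute("INSERT OR REPLACE INTO customer VALUES (?, ?)",
                          (customer.id, customer.name))

    def find_by_id(self, customer_id):
        row = self.conn.execute("SELECT id, name FROM customer WHERE id = ?",
                                (customer_id,)).fetchone()
        return Customer(*row) if row else None


class DocumentCustomerRepository(CustomerRepository):
    def __init__(self):
        self.docs = {}  # stands in for a document store keyed by id

    def save(self, customer):
        self.docs[customer.id] = json.dumps({"id": customer.id, "name": customer.name})

    def find_by_id(self, customer_id):
        raw = self.docs.get(customer_id)
        return Customer(**json.loads(raw)) if raw else None


class CustomerService:
    """Depends only on the repository interface, not on a data store."""

    def __init__(self, repo: CustomerRepository):
        self.repo = repo

    def rename(self, customer_id, new_name):
        customer = self.repo.find_by_id(customer_id)
        if customer is None:
            raise KeyError(customer_id)
        customer.name = new_name
        self.repo.save(customer)
        return customer


for repo in (SqlCustomerRepository(sqlite3.connect(":memory:")), DocumentCustomerRepository()):
    repo.save(Customer(1, "Guido"))
    print(type(repo).__name__, CustomerService(repo).rename(1, "Peter"))
```

The "???" box in the figure is where a store-specific mapping such as `DocumentCustomerRepository` would sit.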



Middle Tier Architecture for Polyglot Persistence




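[Figure: layered architecture with the columns Consumer, Integration, Service, Application and Domain (middle tier) and Integration (resource tier). A composite application (consumer) calls the middle tier via REST or SOAP through a web service exporter; an application service bean uses the domain layer (domain service bean, factory bean, domain objects forming an aggregate, repository bean, DAO bean), which reaches the data stores through O/R mapping, an SQL API or a NoSQL API. Data transfer objects (DTOs) are also shown]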




Polyglot Persistence with Spring Data


•   Makes it easier to build Spring-powered applications that use new data access technologies
•   Provides improved support for relational database technologies
•   The Spring Data Commons project supports polyglot persistence
•   Currently supports:
    - JPA and JDBC (relational)
    - Apache Hadoop
    - GemFire
    - REST
    - Redis
    - MongoDB
    - Neo4J
    - HBase

[Figure: layered stack Consumer → (REST/SOAP) → Service → Repository/DAO → ??? → NoSQL]




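Polyglot persistence means that one application uses several storage technologies side by side, each for the data it suits. The plain-Java sketch below makes that concrete without Spring Data: two hypothetical in-memory classes stand in for a key-value store (shopping carts) and a document store (orders), and a hypothetical `CheckoutService` writes to both. In a real system, stores such as Redis or MongoDB would sit behind these classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PolyglotDemo {
    public static void main(String[] args) {
        KeyValueCartStore carts = new KeyValueCartStore();
        DocumentOrderStore orders = new DocumentOrderStore();
        carts.addItem("session-42", "iPod Touch");
        carts.addItem("session-42", "Monster Beat");
        new CheckoutService(carts, orders).checkout("session-42", 1001L, 1L);
        System.out.println(orders.find(1001L));
    }
}

// Stand-in for a key-value store (hypothetical): fits short-lived, session-scoped state like a cart.
class KeyValueCartStore {
    private final Map<String, List<String>> carts = new HashMap<>();

    void addItem(String sessionId, String productName) {
        carts.computeIfAbsent(sessionId, key -> new ArrayList<>()).add(productName);
    }

    List<String> items(String sessionId) { return carts.getOrDefault(sessionId, List.of()); }

    void remove(String sessionId) { carts.remove(sessionId); }
}

// Stand-in for a document store (hypothetical): keeps a whole order aggregate as one document.
class DocumentOrderStore {
    private final Map<Long, Map<String, Object>> orders = new HashMap<>();

    void save(long orderId, Map<String, Object> document) { orders.put(orderId, document); }

    Map<String, Object> find(long orderId) { return orders.get(orderId); }
}

// One service, two persistence technologies: each piece of data goes where it fits best.
class CheckoutService {
    private final KeyValueCartStore carts;
    private final DocumentOrderStore orders;

    CheckoutService(KeyValueCartStore carts, DocumentOrderStore orders) {
        this.carts = carts;
        this.orders = orders;
    }

    void checkout(String sessionId, long orderId, long customerId) {
        Map<String, Object> order = new HashMap<>();
        order.put("id", orderId);
        order.put("customerId", customerId);
        order.put("orderItems", List.copyOf(carts.items(sessionId))); // copy before the cart is removed
        orders.save(orderId, order);
        carts.remove(sessionId);
    }
}
```

In a Spring application each store would typically get its own repository abstraction, so the service code stays unaware of which technology holds which data.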
Spring Data – Mapping to Relational Database (using JPA)

Annotations define the mapping: @Entity, @Id, @Column, @OneToOne, @OneToMany, @JoinColumn, etc.

[Diagram: layers from top to bottom: Consumer, REST/SOAP, Service, Repository/DAO, ???, NoSQL]
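What the JPA annotations express can be approximated by hand: an aggregate is split into one row per table, and nested objects become separate rows linked by foreign keys. The sketch below performs this mapping manually with plain maps; the table and column names (`CUSTOMER`, `ADDRESS`, `BILLING_ADDRESS_ID`) and the `RelationalMapper` class are hypothetical. A JPA provider would derive the equivalent SQL from `@Entity`, `@OneToOne` and `@JoinColumn` instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RelationalMappingDemo {
    public static void main(String[] args) {
        Customer customer = new Customer(1L, "Peter", "Sample",
                new Address(55L, "Somestreet 10", "Somewhere", "55901"));
        Map<String, Map<String, Object>> rows = RelationalMapper.toRows(customer);
        rows.forEach((table, row) -> System.out.println(table + " " + row));
    }
}

// Hypothetical aggregate: a customer holding a billing address object.
record Address(long id, String street, String city, String postalCode) {}
record Customer(long id, String firstName, String lastName, Address billingAddress) {}

final class RelationalMapper {
    // Flattens the aggregate into one row per table (table name -> column values).
    // The nested Address becomes its own row; CUSTOMER refers to it via a foreign key column.
    static Map<String, Map<String, Object>> toRows(Customer c) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("ID", c.billingAddress().id());
        address.put("STREET", c.billingAddress().street());
        address.put("CITY", c.billingAddress().city());
        address.put("POST_CODE", c.billingAddress().postalCode());

        Map<String, Object> customer = new LinkedHashMap<>();
        customer.put("ID", c.id());
        customer.put("FIRST_NAME", c.firstName());
        customer.put("LAST_NAME", c.lastName());
        customer.put("BILLING_ADDRESS_ID", c.billingAddress().id()); // foreign key, cf. @JoinColumn

        Map<String, Map<String, Object>> rows = new LinkedHashMap<>();
        rows.put("ADDRESS", address);   // ADDRESS first: CUSTOMER's foreign key must point to an existing row
        rows.put("CUSTOMER", customer);
        return rows;
    }
}
```

Reading the customer back reverses this and requires a join across both tables.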
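Spring Data – Mapping to Relational Database

[Diagram: layers from top to bottom: Consumer, REST/SOAP, Service, Repository/DAO, ???, NoSQL]

public interface CustomerRepository extends Repository<Customer, Long> {

   Customer findByEmailAddress(EmailAddress emailAddress);

   // declared so that repository.save(...) in the usage example compiles
   Customer save(Customer customer);
}

@Repository
@Profile("jpa")
class JpaCustomerRepository implements CustomerRepository {

   @PersistenceContext
   private EntityManager em;

   @Override
   public Customer findByEmailAddress(EmailAddress emailAddress) {
      TypedQuery<Customer> query = em.createQuery(
            "select c from Customer c where c.emailAddress = :email", Customer.class);
      query.setParameter("email", emailAddress);
      // throws NoResultException if no customer matches
      return query.getSingleResult();
   }

   @Override
   @Transactional
   public Customer save(Customer customer) {
      // merge() inserts a new entity or updates a detached one and returns the managed instance
      return em.merge(customer);
   }
}

<jpa:repositories base-package="com.oreilly.springdata.jpa" />

<bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager">
    <property name="entityManagerFactory" ref="entityManagerFactory" />
</bean>

<bean id="entityManagerFactory" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
   <property name="dataSource" ref="dataSource" />
   <property name="packagesToScan" value="com.oreilly.springdata.jpa" />
   <!-- assumes Hibernate as the JPA provider -->
   <property name="jpaVendorAdapter">
      <bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter" />
   </property>
</bean>

Usage:

Customer guido = repository.findByEmailAddress(new EmailAddress("guido@hotmail.com"));

Customer anotherCust = new Customer("Peter", "Sample");
anotherCust.setEmailAddress(guido.getEmailAddress());

repository.save(anotherCust);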

Spring Data – Mapping to MongoDB

Annotations define the mapping: @Document, @Id, @Indexed, @PersistenceConstructor, @CompoundIndex, @DBRef, @GeoSpatialIndexed, @Value

[Diagram: layers from top to bottom: Consumer, REST/SOAP, Service, Repository/DAO, ???, NoSQL]
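By contrast with the relational mapping, a document mapping keeps the aggregate together. The sketch below builds, with plain Java maps, roughly the nested document that a `@Document`-annotated `Customer` would correspond to; it does not use the Spring Data MongoDB mapping converter, and the field names and the `DocumentMapper` class are hypothetical. The addresses stay inside the customer document, so loading the aggregate needs no join.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DocumentMappingDemo {
    public static void main(String[] args) {
        Customer customer = new Customer(1L, "Peter", "Sample", "peter@example.com",
                List.of(new Address("Somestreet 10", "Somewhere", "55901")));
        System.out.println(DocumentMapper.toDocument(customer));
    }
}

record Address(String street, String city, String postalCode) {}
record Customer(long id, String firstName, String lastName, String emailAddress, List<Address> addresses) {}

final class DocumentMapper {
    // Maps the whole aggregate to one nested document: value objects become sub-documents,
    // collections become arrays, and the id goes into "_id", the field MongoDB uses as primary key.
    static Map<String, Object> toDocument(Customer c) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", c.id());
        doc.put("firstName", c.firstName());
        doc.put("lastName", c.lastName());
        doc.put("emailAddress", c.emailAddress()); // a field one might mark @Indexed(unique = true)
        doc.put("addresses", c.addresses().stream().map(a -> {
            Map<String, Object> sub = new LinkedHashMap<>();
            sub.put("street", a.street());
            sub.put("city", a.city());
            sub.put("postalCode", a.postalCode());
            return sub;
        }).toList());
        return doc;
    }
}
```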
Spring Data – Generic Repositories for MongoDB

[Diagram: layers from top to bottom: Consumer, REST/SOAP, Service, Repository/DAO, ???, NoSQL]

public interface CustomerRepository extends Repository<Customer, Long> {

   Customer findByEmailAddress(EmailAddress emailAddress);

   // declared so that repository.save(...) in the usage example compiles
   Customer save(Customer customer);
}

import static org.springframework.data.mongodb.core.query.Criteria.where;
import static org.springframework.data.mongodb.core.query.Query.query;

@Repository
@Profile("mongodb")
class MongoDbCustomerRepository implements CustomerRepository {

   private final MongoOperations operations;

   @Autowired
   MongoDbCustomerRepository(MongoOperations operations) {
      this.operations = operations;
   }

   @Override
   public Customer findByEmailAddress(EmailAddress emailAddress) {
      Query query = query(where("emailAddress").is(emailAddress));
      return operations.findOne(query, Customer.class);
   }

   @Override
   public Customer save(Customer customer) {
      operations.save(customer);
      return customer;
   }
}

<mongo:db-factory id="mongoDbFactory" dbname="e-store" />

<mongo:mapping-converter id="mongoConverter" base-package="com.oreilly.springdata.mongodb">
   <mongo:custom-converters base-package="com.oreilly.springdata.mongodb" />
</mongo:mapping-converter>

<bean id="mongoTemplate" class="org.springframework.data.mongodb.core.MongoTemplate">
   <constructor-arg ref="mongoDbFactory" />
   <constructor-arg ref="mongoConverter" />
   <property name="writeConcern" value="SAFE" />
</bean>

<mongo:repositories base-package="com.oreilly.springdata.mongodb" />

Usage:

Customer guido = repository.findByEmailAddress(new EmailAddress("guido@hotmail.com"));

Customer anotherCust = new Customer("Peter", "Sample");
anotherCust.setEmailAddress(guido.getEmailAddress());

repository.save(anotherCust);
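`JpaCustomerRepository` and `MongoDbCustomerRepository` implement the same interface, and the active Spring profile decides which of the `@Profile`-annotated beans is created. That selection can be mimicked in plain Java. This is a sketch only: the `RepositoryRegistry` map and the two in-memory implementations are hypothetical stand-ins, not Spring's profile mechanism, and email addresses are plain strings here instead of `EmailAddress`. Callers depend only on `CustomerRepository`, which is what turns a database switch into a configuration change.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

class ProfileSwitchDemo {
    public static void main(String[] args) {
        // In Spring the active profile comes from configuration; here it is a plain argument.
        String activeProfile = args.length > 0 ? args[0] : "mongodb";
        CustomerRepository repository = RepositoryRegistry.forProfile(activeProfile);
        repository.save(new Customer("Guido", "Schmutz", "guido@hotmail.com"));
        System.out.println(repository.getClass().getSimpleName() + " -> "
                + repository.findByEmailAddress("guido@hotmail.com").orElseThrow());
    }
}

record Customer(String firstName, String lastName, String emailAddress) {}

interface CustomerRepository {
    Optional<Customer> findByEmailAddress(String emailAddress);
    void save(Customer customer);
}

// Shared in-memory behaviour; the subclasses only stand in for the JPA and MongoDB variants.
abstract class InMemoryRepository implements CustomerRepository {
    private final Map<String, Customer> byEmail = new HashMap<>();

    @Override
    public Optional<Customer> findByEmailAddress(String email) { return Optional.ofNullable(byEmail.get(email)); }

    @Override
    public void save(Customer c) { byEmail.put(c.emailAddress(), c); }
}

class JpaStyleCustomerRepository extends InMemoryRepository {}
class MongoStyleCustomerRepository extends InMemoryRepository {}

final class RepositoryRegistry {
    // Profile name -> factory, playing the role of @Profile("jpa") / @Profile("mongodb").
    private static final Map<String, Supplier<CustomerRepository>> FACTORIES = Map.of(
            "jpa", JpaStyleCustomerRepository::new,
            "mongodb", MongoStyleCustomerRepository::new);

    static CustomerRepository forProfile(String profile) {
        Supplier<CustomerRepository> factory = FACTORIES.get(profile);
        if (factory == null) {
            throw new IllegalArgumentException("No repository for profile: " + profile);
        }
        return factory.get();
    }
}
```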
Spring Data – Mapping to Neo4J

Annotations define the mapping: @NodeEntity, @RelationshipEntity, @GraphId, @RelatedTo, @RelatedToVia, @EndNode, @Fetch, etc.

[Diagram: graph data model with the nodes Customer, Address, Country, Order, Product and Tag, connected by the relationships ADDRESS, COUNTRY, BILLING_ADDRESS, DELIVERY_ADDRESS, LINE_ITEM, RATED and TAG.]
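A graph model like this one consists of nodes plus typed, directed relationships. The plain-Java sketch below represents it as an adjacency list, without Spring Data Neo4j or Neo4j itself; the `Graph` class is hypothetical, the relationship directions (for example Customer -RATED-> Product) are assumptions read off the diagram, and the node values are illustrative. Following relationships from node to node, rather than joining tables, is the kind of access a graph database is built for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GraphModelDemo {
    public static void main(String[] args) {
        Graph g = new Graph();
        g.relate("Customer:Peter", "ADDRESS", "Address:Somestreet 10");
        g.relate("Address:Somestreet 10", "COUNTRY", "Country:CH");
        g.relate("Order:1001", "BILLING_ADDRESS", "Address:Somestreet 10");
        g.relate("Order:1001", "LINE_ITEM", "Product:iPod Touch");
        g.relate("Customer:Peter", "RATED", "Product:iPod Touch");
        g.relate("Product:iPod Touch", "TAG", "Tag:music");
        // Two hops: which tags are attached to the products Peter rated?
        for (String product : g.related("Customer:Peter", "RATED")) {
            System.out.println(product + " " + g.related(product, "TAG"));
        }
    }
}

// A relationship edge: type plus target node, comparable to a @RelatedTo / @RelationshipEntity link.
record Relationship(String type, String target) {}

class Graph {
    private final Map<String, List<Relationship>> outgoing = new HashMap<>();

    void relate(String from, String type, String to) {
        outgoing.computeIfAbsent(from, k -> new ArrayList<>()).add(new Relationship(type, to));
    }

    // Follows the outgoing relationships of one type from a node.
    List<String> related(String node, String type) {
        List<String> result = new ArrayList<>();
        for (Relationship r : outgoing.getOrDefault(node, List.of())) {
            if (r.type().equals(type)) {
                result.add(r.target());
            }
        }
        return result;
    }
}
```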
Spring Data – Generic Repositories for Neo4J

[Diagram: layers from top to bottom: Consumer, REST/SOAP, Service, Repository/DAO, ???, NoSQL]

public interface CustomerRepository extends GraphRepository<Customer> {

   Customer findByEmailAddress(EmailAddress emailAddress);
}

<neo4j:config graphDatabaseService="graphDatabaseService" />
<neo4j:repositories base-package="com.oreilly.springdata.neo4j" />

<bean id="graphDatabaseService" class="org.neo4j.kernel.EmbeddedGraphDatabase" destroy-method="shutdown">
  <constructor-arg value="target/graph.db" />
</bean>

Usage:

Customer guido = repository.findByEmailAddress(new EmailAddress("guido@hotmail.com"));

Customer anotherCust = new Customer("Peter", "Sample");
anotherCust.setEmailAddress(guido.getEmailAddress());

repository.save(anotherCust);
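With Spring Data, declaring `findByEmailAddress` is enough: the repository infrastructure derives the query from the method name, so no implementation class is needed. The toy sketch below shows the idea for the simplest case only, a single `findBy<Property>` equality lookup resolved via reflection over an in-memory list. `NaiveDerivedQueryRepository` is hypothetical and is not how Spring Data parses method names; the real mechanism also understands keywords such as `And`, `Or` and `OrderBy`.

```java
import java.lang.reflect.Method;
import java.util.List;
import java.util.Optional;

class DerivedQueryDemo {
    public static void main(String[] args) throws Exception {
        var repository = new NaiveDerivedQueryRepository<>(Customer.class, List.of(
                new Customer("Guido", "Schmutz", "guido@hotmail.com"),
                new Customer("Peter", "Sample", "peter@example.com")));
        System.out.println(repository.execute("findByEmailAddress", "guido@hotmail.com"));
    }
}

record Customer(String firstName, String lastName, String emailAddress) {}

// Toy query derivation: "findBy" + property name -> equality filter on that record component.
class NaiveDerivedQueryRepository<T> {
    private final Class<T> type;
    private final List<T> entities;

    NaiveDerivedQueryRepository(Class<T> type, List<T> entities) {
        this.type = type;
        this.entities = entities;
    }

    Optional<T> execute(String methodName, Object value) throws ReflectiveOperationException {
        String prefix = "findBy";
        if (!methodName.startsWith(prefix) || methodName.length() == prefix.length()) {
            throw new IllegalArgumentException("Unsupported method name: " + methodName);
        }
        // "EmailAddress" -> "emailAddress", which is also the record accessor's name
        String suffix = methodName.substring(prefix.length());
        String property = Character.toLowerCase(suffix.charAt(0)) + suffix.substring(1);
        Method accessor = type.getMethod(property);
        for (T entity : entities) {
            if (value.equals(accessor.invoke(entity))) {
                return Optional.of(entity);
            }
        }
        return Optional.empty();
    }
}
```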
Expose a contract-first Web service

[Diagram: layers from top to bottom: Consumer, REST/SOAP, Service, Repository/DAO, ???, NoSQL]

•    Use any Java Web Service framework that supports the contract-first approach
•    Can be SOAP or REST
•    Maps the data contract to the schemaless database
•    Uses the different Repository implementations
•    Must handle data migration issues together with the Repository
Schemaless – We still have to migrate the data!

•    With an RDBMS we are used to keeping DDL scripts together with DML scripts for each single data model change
     •   These have to be in sync with the data access code
•    The RDBMS has to be changed before the application is changed => possible application downtime
     •   This is what the schemaless approach of most NoSQL databases tries to avoid
•    Schemaless databases still need careful migration, because any data access code contains an implicit schema
•    But a more "on-demand" approach is possible
     •   Code can read data in a way that is tolerant to changes in the data's implicit schema, and migrate the data on the next update
     •   Similar to service versioning => gradual change

[Diagram: the same Customer record (Billing Address: Street: Somestreet 10, City: Somewhere, PostalCode: 55901) in three states: Version 1.0 with Name: Peter Sample; Transition Version 1.0 => 2.0 holding both Name: Peter Sample and FirstName: Peter / LastName: Sample; Version 2.0 with only First Name: Peter and Last Name: Sample.]
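The "on-demand" migration can be sketched as a tolerant reader paired with a writer that always emits the newest shape, so a document is upgraded the next time it is saved. The sketch assumes, as the version diagram suggests, that version 1.0 stores a single `name` field and version 2.0 stores `firstName` and `lastName`; the field names, the `schemaVersion` marker and the `CustomerDocuments` class are hypothetical, and documents are plain Java maps rather than a real NoSQL API.

```java
import java.util.HashMap;
import java.util.Map;

class OnDemandMigrationDemo {
    public static void main(String[] args) {
        // A version 1.0 document as it might still sit in the store: one "name" field.
        Map<String, Object> v1 = new HashMap<>(Map.of("name", "Peter Sample"));
        Customer customer = CustomerDocuments.read(v1);
        System.out.println(customer);
        System.out.println(CustomerDocuments.write(customer)); // written back in the version 2.0 shape
    }
}

record Customer(String firstName, String lastName) {}

final class CustomerDocuments {
    static final int CURRENT_VERSION = 2;

    // Tolerant reader: accepts the old shape ("name") as well as the new one ("firstName"/"lastName").
    static Customer read(Map<String, Object> doc) {
        if (doc.containsKey("firstName")) {
            return new Customer((String) doc.get("firstName"), (String) doc.get("lastName"));
        }
        // Version 1.0: split the single name; assumes "First Last" with the last name after the final space.
        String name = (String) doc.getOrDefault("name", "");
        int split = name.lastIndexOf(' ');
        return split < 0
                ? new Customer(name, "")
                : new Customer(name.substring(0, split), name.substring(split + 1));
    }

    // Writer: always emits the current shape, so each document is migrated on its next update.
    static Map<String, Object> write(Customer c) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("schemaVersion", CURRENT_VERSION);
        doc.put("firstName", c.firstName());
        doc.put("lastName", c.lastName());
        return doc;
    }
}
```

Old and new documents can coexist in the store; no bulk migration is needed before the new application version goes live, which is the gradual change the slide compares to service versioning.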
Agenda


1. What is NoSQL and Big Data
2. NoSQL Database Types
3. Polyglot Persistence
4. Data Virtualization Layer and Data Architecture
5. Summary




Pros & Cons of NoSQL compared to RDBMS

Pros
•    No O/R impedance mismatch
•    Can easily evolve schemas
•    Can represent semi-structured info
•    Can represent graphs/networks (with performance)

Cons
•    Lacks tool and framework support
•    Few other implementations => potential lock-in
•    Little or no support for ad-hoc queries
•    Another (new) database in production to take care of
Summary


•    Relational databases are here to stay, but NoSQL offers a new persistence model
•    Polyglot Persistence will be the future
•    Schemaless does not mean there is no data migration! => but a more on-demand model might be possible
•    Encapsulate data access code to be able to switch databases
•    Service-orientation provides the data contract to a NoSQL database => to make information reusable
•    Don't commit to a NoSQL database until you have done a significant PoC
•    Make sure that Operations people (DBAs) are on board early enough
•    Non-relational is not new in an enterprise (OLTP vs. OLAP)

Possible Use Cases

•    NoSQL for parallel ETL?
•    NoSQL for modern BI
•    NoSQL for a stateful middle tier (e.g. shopping cart)
•    NoSQL for aggregated master data (e.g. exposed through REST for Web apps)
•    NoSQL as a CMS store, directly accessible through REST
•    NoSQL as a local store for mobile applications
•    NoSQL for Event Sourcing and CQRS architectures




Further Information




THANK YOU.

Trivadis

Guido Schmutz
guido.schmutz@trivadis.com

info@trivadis.com
www.trivadis.com




BASEL    BERN   LAUSANNE     ZÜRICH         DÜSSELDORF          FRANKFURT A.M.   FREIBURG I.BR.   HAMBURG   MÜNCHEN   STUTTGART   WIEN




                       2012 © Trivadis
 46
                       NoSQL for Data Services, Data Virtualization & Big Data
                       25.9.2012

More Related Content

PPTX
NoSQL Databases for Implementing Data Services – Should I Care?
PPTX
NoSQL and SOA
PDF
Aod Print Portfolio
PDF
IGC Microsoft SharePoint Solutions
PDF
Oren Jacob w/ Toy Talk @ MamaBear Conference, Mt. View 4/20
PDF
IT Go to market transformation - Playing a new ball game
PDF
Enterprise intelligence apr2012 load - romania - 30 min
PPT
Legal status at the epo latipat
NoSQL Databases for Implementing Data Services – Should I Care?
NoSQL and SOA
Aod Print Portfolio
IGC Microsoft SharePoint Solutions
Oren Jacob w/ Toy Talk @ MamaBear Conference, Mt. View 4/20
IT Go to market transformation - Playing a new ball game
Enterprise intelligence apr2012 load - romania - 30 min
Legal status at the epo latipat

What's hot (8)

PDF
UPA Arizona Presentation: Designing web content to engage customers and incre...
PDF
Next generation MDM
PPTX
Meet XO
PPTX
Meet Xo Core Presentation 2012
PDF
Meet XO
PDF
Lolland kommune
PDF
1to1 messenger 26
PDF
E-commerce Technology for Safe money transaction over the net
UPA Arizona Presentation: Designing web content to engage customers and incre...
Next generation MDM
Meet XO
Meet Xo Core Presentation 2012
Meet XO
Lolland kommune
1to1 messenger 26
E-commerce Technology for Safe money transaction over the net
Ad

Viewers also liked (20)

PPT
Why Data Virtualization? An Introduction by Denodo
PDF
Unix Automation using centralized configuration management tool
PPTX
Hitchhiker's Guide to Open Source Cloud Computing
PPT
NoSQL Seminer
PPTX
PDF
Combine Spring Data Neo4j and Spring Boot to quickl
PDF
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
ODP
Writing and testing high frequency trading engines in java
PPTX
Access control attacks by nor liyana binti azman
PPT
Debs 2011 tutorial on non functional properties of event processing
PDF
Comparative Analysis of Personal Firewalls
PDF
Installing Complex Event Processing On Linux
PPTX
Reactconf 2014 - Event Stream Processing
PPTX
Session hijacking
PDF
Tutorial in DEBS 2008 - Event Processing Patterns
PPT
Complex Event Processing with Esper and WSO2 ESB
PPT
Chapter 12
PDF
Ceh v8 labs module 03 scanning networks
PPSX
CyberLab CCEH Session - 3 Scanning Networks
PDF
Nmap scripting engine
Why Data Virtualization? An Introduction by Denodo
Unix Automation using centralized configuration management tool
Hitchhiker's Guide to Open Source Cloud Computing
NoSQL Seminer
Combine Spring Data Neo4j and Spring Boot to quickl
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
Writing and testing high frequency trading engines in java
Access control attacks by nor liyana binti azman
Debs 2011 tutorial on non functional properties of event processing
Comparative Analysis of Personal Firewalls
Installing Complex Event Processing On Linux
Reactconf 2014 - Event Stream Processing
Session hijacking
Tutorial in DEBS 2008 - Event Processing Patterns
Complex Event Processing with Esper and WSO2 ESB
Chapter 12
Ceh v8 labs module 03 scanning networks
CyberLab CCEH Session - 3 Scanning Networks
Nmap scripting engine
Ad

Similar to NoSQL for Data Services, Data Virtualization & Big Data (20)

PDF
Enabling Supplier Communities
PDF
Ebs architecture con9036_pdf_9036_0001
PDF
The Modern Web Part 4: Cloud Computing
PDF
PromptCloud Nasscom Emerge 50 Presentation
PDF
Architecting Cloud Solutions
 
PDF
Logical Data Fabric and Data Mesh – Driving Business Outcomes
PDF
Employing Enterprise Application Integration (EAI)
PDF
DDN Accelerating-Decisions-Through-Enterprise-Hadoop-final
PPTX
Kevin jackson cloud service brokerage for datacenter service providers for we...
PDF
Expert Panel The Future of NoSQL Databases
PPTX
Big Data i CSC's optik, CSC Representative
PDF
Tear down this wall PESGB
PPSX
Power pointshow
PDF
DYN MassTLC go-to-market strategy
PDF
Scaling MySQL: Benefits of Automatic Data Distribution
PPTX
Tech Talk SQL Server 2012 Business Intelligence
PPSX
Brac Delta Housing Finance Limited
PPTX
Tera stream for datastreams
PDF
SISO LSA AND OMG DDS
PPTX
ADO.NET Data Services
Enabling Supplier Communities
Ebs architecture con9036_pdf_9036_0001
The Modern Web Part 4: Cloud Computing
PromptCloud Nasscom Emerge 50 Presentation
Architecting Cloud Solutions
 
Logical Data Fabric and Data Mesh – Driving Business Outcomes
Employing Enterprise Application Integration (EAI)
DDN Accelerating-Decisions-Through-Enterprise-Hadoop-final
Kevin jackson cloud service brokerage for datacenter service providers for we...
Expert Panel The Future of NoSQL Databases
Big Data i CSC's optik, CSC Representative
Tear down this wall PESGB
Power pointshow
DYN MassTLC go-to-market strategy
Scaling MySQL: Benefits of Automatic Data Distribution
Tech Talk SQL Server 2012 Business Intelligence
Brac Delta Housing Finance Limited
Tera stream for datastreams
SISO LSA AND OMG DDS
ADO.NET Data Services

More from Guido Schmutz (20)

PDF
30 Minutes to the Analytics Platform with Infrastructure as Code
PDF
Event Broker (Kafka) in a Modern Data Architecture
PDF
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
PDF
ksqlDB - Stream Processing simplified!
PDF
Kafka as your Data Lake - is it Feasible?
PDF
Event Hub (i.e. Kafka) in Modern Data Architecture
PDF
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
PDF
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
PDF
Building Event Driven (Micro)services with Apache Kafka
PDF
Location Analytics - Real-Time Geofencing using Apache Kafka
PDF
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
PDF
What is Apache Kafka? Why is it so popular? Should I use it?
PDF
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
PDF
Location Analytics Real-Time Geofencing using Kafka
PDF
Streaming Visualisation
PDF
Kafka as an event store - is it good enough?
PDF
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
PDF
Fundamentals Big Data and AI Architecture
PDF
Location Analytics - Real-Time Geofencing using Kafka
PDF
Streaming Visualization
30 Minutes to the Analytics Platform with Infrastructure as Code
Event Broker (Kafka) in a Modern Data Architecture
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
ksqlDB - Stream Processing simplified!
Kafka as your Data Lake - is it Feasible?
Event Hub (i.e. Kafka) in Modern Data Architecture
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Building Event Driven (Micro)services with Apache Kafka
Location Analytics - Real-Time Geofencing using Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
What is Apache Kafka? Why is it so popular? Should I use it?
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Location Analytics Real-Time Geofencing using Kafka
Streaming Visualisation
Kafka as an event store - is it good enough?
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Fundamentals Big Data and AI Architecture
Location Analytics - Real-Time Geofencing using Kafka
Streaming Visualization

Recently uploaded (20)

PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PPTX
Spectroscopy.pptx food analysis technology
PDF
Electronic commerce courselecture one. Pdf
PDF
Unlocking AI with Model Context Protocol (MCP)
PPTX
Programs and apps: productivity, graphics, security and other tools
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Review of recent advances in non-invasive hemoglobin estimation
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PPTX
MYSQL Presentation for SQL database connectivity
PDF
KodekX | Application Modernization Development
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PPTX
sap open course for s4hana steps from ECC to s4
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Spectroscopy.pptx food analysis technology
Electronic commerce courselecture one. Pdf
Unlocking AI with Model Context Protocol (MCP)
Programs and apps: productivity, graphics, security and other tools
Understanding_Digital_Forensics_Presentation.pptx
Per capita expenditure prediction using model stacking based on satellite ima...
The Rise and Fall of 3GPP – Time for a Sabbatical?
NewMind AI Weekly Chronicles - August'25 Week I
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
“AI and Expert System Decision Support & Business Intelligence Systems”
Review of recent advances in non-invasive hemoglobin estimation
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
MYSQL Presentation for SQL database connectivity
KodekX | Application Modernization Development
Diabetes mellitus diagnosis method based random forest with bat algorithm
Dropbox Q2 2025 Financial Results & Investor Presentation
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
sap open course for s4hana steps from ECC to s4

NoSQL for Data Services, Data Virtualization & Big Data

  • 1. Welcome NoSQL for Data Services, Data Virtualization & Big Data Guido Schmutz 25.9.2012 BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN 2012 © Trivadis 1 NoSQL for Data Services, Data Virtualization & Big Data 25.9.2012
  • 2. Guido Schmutz • Working for Trivadis for more than 15 years • Oracle ACE Director for Fusion Middleware and SOA • Co-Author of different books • Consultant, Trainer Software Architect for Java, Oracle, SOA and EDA • Member of Trivadis Architecture Board • Technology Manager @ Trivadis • More than 20 years of software development experience • Contact: guido.schmutz@trivadis.com • Blog: http://guidoschmutz.wordpress.com • Twitter: gschmutz 2012 © Trivadis 2 NoSQL for Data Services, Data Virtualization & Big Data 25.9.2012
  • 3. Agenda 1. Why NoSQL and what is it? 2. NoSQL Database Types 3. Polyglot Persistence 4. Data Virtualization Layer 5. Summary 2012 © Trivadis 3 NoSQL for Data Services, Data Virtualization & Big Data 25.9.2012
  • 4. History of Database 1960s File-based, Network (CODASYL) and Hierarchical Databases 1970s Relational Database 1980 SQL became the standard query language Early 1990 Object-Databases Late 1990 XML Databases 2004 NoSQL Databases 2012 © Trivadis 4 NoSQL for Data Services, Data Virtualization & Big Data 25.9.2012
  • 5. What„s wrong with Relational Databases ? They are great …. • SQL provides a rich, declarative query language • Database enforce referential integrity • ACID semantics • Well understood by developers, database administrators • Well supported by different languages, frameworks and tools • Hibernate, JPA, JDBC, iBATIS, Entity Framework • Well understood and accepted by operations people (DBAs) • Configuration • Monitoring • Backup and Recovery • Tuning • Design 2012 © Trivadis 5 NoSQL for Data Services, Data Virtualization & Big Data 25.9.2012
  • 6. Relational Databases are great ... But! ORDER Order ID: 1001 Problem: Complex Object graphs Order Date: 15.9.2012 Customer  Object/Relational impedance mismatch CUSTOMER First Name: Peter Last Name: Sample  Complicated to map rich domain model Billing Address Street: Somestreet 10 to relational schema City: Somewhere Postal Code: 55901 ADDRESS  Performance issues Line Items  Many rows in many tables Name Ipod Touch Quantity 1 Price 220.95 ORDER_LINES  Many joins Monster Beat 2 190.00  Eager vs. lazy loading Apple Mouse 1 69.90 Problem: Schema evolution  Adding attributes to an object => have to add columns to table  Expensive, if lots of data in that table - Holding locks on the tables for long time - Application downtime … 2012 © Trivadis 6 NoSQL for Data Services, Data Virtualization & Big Data 25.9.2012
  • 7. ORDER Order Relational Databases are great ... But! ID: 1001 Order Date: 15.9.2012 Customer CUSTOMER First Name: Peter Last Name: Sample Billing Address Street: Somestreet 10 City: Somewhere Postal Code: 55901 ADDRESS Line Items Name Quantity Price ORDER_LINES Ipod Touch 1 220.95 Monster Beat 2 190.00 Apple Mouse 1 69.90 Consumer REST/SOAP Service Repository/DAO O/R Mapping SQL RDBMS 2012 © Trivadis 7 NoSQL for Data Services, Data Virtualization & Big Data 25.9.2012
  • 8. Relational Databases are great ... But! Problem: Semi-structured data  Relational schema doesn„t easily handle semi-structured data  Common solutions - Name/Value table - Poor performance - Lack of constraint - Serialize as Blob - Fewer joins, but no query capabilities Problem: Scaling  Scaling writes difficult/expensive/impossible => BigData  Vertical scaling is limited and is expensive  Horizontal scaling is limited and is expensive 2012 © Trivadis 8 NoSQL for Data Services, Data Virtualization & Big Data 25.9.2012
  • 9. Solution: NoSQL ? No standard definition of what NoSQL means • Not Only SQL Term began in a workshop organized in 2009 but some common characteristics of NoSQL databases • They don„t use the relational data model and thus don„t use SQL • Tend to be designed to run on cluster RDBMS NoSQL • Tend to be Open Source Presentation Tier User Interface User Interface • Schema-Less - Don„t have a fixed Key Value Stores schema, allowing to store any Services Caching Search Middle Tier Object-Relational Relational-Object Lucene Transactions Batch data in any record MapReduce • Different APIs Search Blobs Database Tier Transactions Batch Data Caching Triggers 2012 © Trivadis 9 NoSQL for Data Services, Data Virtualization & Big Data 25.9.2012
  • 10. Central vs. Application Databases
    Central database
      - SQL as the integration mechanism between applications
      - Applications store their data in a common DB
      - Improves communication: all applications operate on a consistent set of data
      - The structure ends up more complex
      - Changes need to be coordinated with all other applications using the database
      - Side effects (e.g. adding a database index)
    Application database
      - Only accessed by a single application
      - Only the application using the database needs to know about its structure
      - Easier to maintain and evolve the schema
      - More freedom to choose the database
      - Applicable to SOA (e.g. Data Service/Entity Service) with good service autonomy
      - Ready for the cloud
    [Figure: Applications 1-3 sharing one central DB vs. Applications 1-3 each with its own DB]
  • 11. Relational vs. Aggregate Data Models
    Relational model
      - Takes the information and divides it into tuples (rows)
      - A tuple is a limited data structure:
        - no nesting of tuples
        - no lists of values
    Aggregate model
      - "Aggregate" is a term from Domain-Driven Design (Evans)
      - An aggregate is a collection of related objects that should be treated as a unit
      - The unit for data manipulation and for managing consistency
  • 12. Relational vs. Aggregate Data Model
    Relational instance
      CUSTOMER (ID, NAME): (1, Guido)
      PRODUCT (ID, NAME): (1000, iPod Touch), (1020, Monster Beat)
      ADDRESS (ID, STREET, CITY, POST_CODE): (55, Chaumontweg, Spiegel, 3095)
      BILLING_ADDRESS (ID, CUSTOMER_ID, ADDRESS_ID): (1, 1, 55)
      ORDER (ID, CUSTOMER_ID, SHIPPING_ADDRESS_ID): (90, 1, 55)
      ORDER_ITEM (ID, ORDER_ID, PRODUCT_ID, PRICE): (1, 90, 1000, 250.55), (2, 90, 1020, 199.55)
    Aggregate instance
      { "id": 1, "name": "Guido",
        "billingAddress": [ { "street": "Chaumontweg", "city": "Spiegel", "postCode": "3095" } ] }
      { "id": 90, "customerId": 1,
        "orderItems": [
          { "productId": 1000, "price": 250.55, "productName": "iPod Touch" },
          { "productId": 1020, "price": 199.55, "productName": "Monster Beat" } ],
        "shippingAddress": [ { "street": "Chaumontweg", "city": "Spiegel", "postCode": "3095" } ] }
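To make the contrast concrete, here is a minimal sketch, assuming plain Java records as stand-ins for the ORDER, ORDER_ITEM and PRODUCT tables and for the order aggregate (all record names are hypothetical). `assemble` performs the join work an O/R mapper would have to do to rebuild the order from its rows; once built, the `Order` aggregate carries its items with it and can be stored or loaded as one unit.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Relational instance: flat tuples (rows), no nesting and no lists of values.
record OrderRow(long id, long customerId, long shippingAddressId) {}
record OrderItemRow(long id, long orderId, long productId, double price) {}
record ProductRow(long id, String name) {}

// Aggregate instance: the order owns its items and is handled as one unit.
// (double keeps the sketch short; real code would use BigDecimal for money.)
record OrderItem(long productId, String productName, double price) {}
record Order(long id, long customerId, List<OrderItem> orderItems) {
    double total() {
        return orderItems.stream().mapToDouble(OrderItem::price).sum();
    }
}

// Hypothetical helper doing the join work an O/R mapper would do to rebuild the aggregate.
class AggregateAssembler {
    static Order assemble(OrderRow order, List<OrderItemRow> items, Map<Long, ProductRow> products) {
        List<OrderItem> orderItems = items.stream()
                .filter(item -> item.orderId() == order.id())   // ORDER_ITEM.ORDER_ID = ORDER.ID
                .map(item -> new OrderItem(
                        item.productId(),
                        products.get(item.productId()).name(),  // join with PRODUCT
                        item.price()))
                .collect(Collectors.toList());
        return new Order(order.id(), order.customerId(), orderItems);
    }

    public static void main(String[] args) {
        OrderRow order = new OrderRow(90, 1, 55);
        List<OrderItemRow> items = List.of(
                new OrderItemRow(1, 90, 1000, 250.55),
                new OrderItemRow(2, 90, 1020, 199.55));
        Map<Long, ProductRow> products = Map.of(
                1000L, new ProductRow(1000, "iPod Touch"),
                1020L, new ProductRow(1020, "Monster Beat"));

        Order aggregate = assemble(order, items, products);
        System.out.println(aggregate.orderItems().size()); // 2
    }
}
```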
  • 13. Agenda
    1. Why NoSQL and what is it?
    2. NoSQL Database Types
    3. Polyglot Persistence
    4. Data Virtualization Layer
    5. Summary
  • 14. NoSQL Database Types
    Families: key/value stores, ordered key-value stores, Big Table stores (map-of-maps-of-maps), document stores, graph databases
    Comparison (Key/Value | Column Family | Document | Graph):
      - Design: Key/value pairs | Collections of columns and column families; accesses the column values directly | Focus on key/value pairs, but the value is interpreted by the database | Focus on the connections between data and fast navigation
      - Scalability/Performance: +++ | +++ | ++ | ++
      - Aggregate-oriented: Yes | Yes | Yes | No
      - Complexity: + | ++ | ++ | +++
      - Inspiration and relation: Berkeley DB, Memcached, distributed hashmaps | SAP Sybase IQ, BigTable | Lotus Notes | Graph theory
      - Products: Voldemort, Redis, Riak, Amazon SimpleDB | HBase, Cassandra, Hypertable | CouchDB, MongoDB, OrientDB, RavenDB | Sones, Neo4J, InfoGrid, FlockDB
  • 15. NoSQL Database Types
    [Chart: data size vs. complexity, positioning key-value stores, column family, document, graph and relational databases]
  • 16. Key Value Databases
      - A key-value store is a simple hash table
      - Primarily used when all access to the database is via the primary key
      - Simplest NoSQL data stores to use (from an API perspective)
        - PUT, GET, DELETE (matches REST)
      - The value is a blob; the data store neither knows nor cares what is inside
      - Aggregate-oriented
    Suitable use cases
      - Storing session information
      - User profiles, preferences
      - Shopping cart data
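A minimal in-memory sketch of the PUT/GET/DELETE interface described above. `KeyValueStore` is a hypothetical class backed by a `ConcurrentHashMap`, not the client API of Redis, Riak or any other product; values are opaque byte arrays because the store does not interpret them. The session key format `session:4711` is an illustrative assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory key-value store: essentially a hash table keyed by primary key.
class KeyValueStore {
    private final ConcurrentHashMap<String, byte[]> data = new ConcurrentHashMap<>();

    // PUT: the value is an opaque blob; the store neither knows nor cares what is inside.
    void put(String key, byte[] value) {
        data.put(key, value.clone());
    }

    // GET: lookup by key only; there is no way to query on the value's contents.
    Optional<byte[]> get(String key) {
        return Optional.ofNullable(data.get(key)).map(byte[]::clone);
    }

    // DELETE
    void delete(String key) {
        data.remove(key);
    }
}

class SessionDemo {
    public static void main(String[] args) {
        KeyValueStore store = new KeyValueStore();

        // Use case "storing session information"; the key format is an illustrative assumption.
        String sessionKey = "session:4711";
        store.put(sessionKey, "{\"user\":\"guido\",\"cartItems\":3}".getBytes(StandardCharsets.UTF_8));

        String session = store.get(sessionKey)
                .map(bytes -> new String(bytes, StandardCharsets.UTF_8))
                .orElse("<expired>");
        System.out.println(session); // {"user":"guido","cartItems":3}

        store.delete(sessionKey);
        System.out.println(store.get(sessionKey).isPresent()); // false
    }
}
```

Because the store never looks inside the value, serializing and deserializing the session (here a JSON string) is entirely the application's job.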
  • 17. Column-Family Stores
      - Store data in column families as rows that have many columns associated with a row key
      - Column families are groups of related data that are often accessed together
      - Aggregate-oriented
    Suitable use cases
      - Event logging
      - Content management systems
      - Counters
      - Expiring usage
    [Figure source: NoSQL Distilled]
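The column-family model can be sketched as nested sorted maps (row key -> column family -> column -> value), the "map-of-maps-of-maps" shape of Big Table stores. This is an in-memory illustration under simplifying assumptions: no timestamps or versions, no distribution, no persistence, and `ColumnFamilyStore` is a hypothetical name rather than the API of HBase or Cassandra. The `increment` method hints at the "Counters" use case.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory column-family store as nested sorted maps:
// row key -> column family -> column name -> value.
// Simplifications: no timestamps/versions, no distribution, not thread-safe.
class ColumnFamilyStore {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, String>>> rows =
            new TreeMap<>();

    void put(String rowKey, String familyName, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(familyName, f -> new TreeMap<>())
            .put(column, value);
    }

    // Reads the related columns of one family of one row, which are typically accessed together.
    Map<String, String> family(String rowKey, String familyName) {
        NavigableMap<String, NavigableMap<String, String>> row = rows.get(rowKey);
        if (row == null || !row.containsKey(familyName)) {
            return Map.of();
        }
        return new TreeMap<>(row.get(familyName));
    }

    String get(String rowKey, String familyName, String column) {
        return family(rowKey, familyName).get(column);
    }

    // "Counters" use case: increment a numeric column.
    long increment(String rowKey, String familyName, String column) {
        String current = get(rowKey, familyName, column);
        long next = (current == null ? 0 : Long.parseLong(current)) + 1;
        put(rowKey, familyName, column, Long.toString(next));
        return next;
    }
}

class ColumnFamilyDemo {
    public static void main(String[] args) {
        ColumnFamilyStore store = new ColumnFamilyStore();
        store.put("customer:1", "profile", "name", "Guido");
        store.put("customer:1", "profile", "city", "Spiegel");
        store.put("customer:2", "profile", "name", "Peter"); // rows may have different columns
        store.increment("customer:1", "counters", "pageViews");
        store.increment("customer:1", "counters", "pageViews");

        System.out.println(store.family("customer:1", "profile"));            // {city=Spiegel, name=Guido}
        System.out.println(store.get("customer:1", "counters", "pageViews")); // 2
    }
}
```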
  • 18. Document Databases
      - Documents are the main concept
      - Stores and retrieves documents, which can be XML, JSON, BSON, …
      - Documents are self-describing, hierarchical tree data structures which can consist of maps, collections and scalar values
      - Stored documents are similar to each other but do not have to be exactly the same
      - Aggregate-oriented
    Suitable use cases
      - Event logging
      - Content management systems
      - Web analytics or real-time analytics
      - Product catalog
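A sketch of the document model in plain Java, assuming documents are represented as nested `Map`/`List` values, roughly what a parsed JSON document looks like. `DocumentCollection` and its dotted-path `findBy` are hypothetical and far simpler than the query languages of MongoDB or CouchDB. The two customer documents are similar but not identical, and the query can reach into a nested field because the store interprets the value, unlike a key-value store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory collection of self-describing documents made of maps, lists and scalars.
class DocumentCollection {
    private final List<Map<String, Object>> documents = new ArrayList<>();

    void insert(Map<String, Object> document) {
        documents.add(document);
    }

    // Unlike a key-value store, the database interprets the value, so it can filter on a
    // (possibly nested) field such as "billingAddress.city". Simplified: equality only.
    List<Map<String, Object>> findBy(String path, Object expected) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> document : documents) {
            if (Objects.equals(resolve(document, path), expected)) {
                result.add(document);
            }
        }
        return result;
    }

    // Follows a dotted path; returns null when a segment is missing (documents may differ).
    private static Object resolve(Map<String, Object> document, String path) {
        Object current = document;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map<?, ?> map)) {
                return null;
            }
            current = map.get(segment);
        }
        return current;
    }
}

class DocumentDemo {
    public static void main(String[] args) {
        DocumentCollection customers = new DocumentCollection();
        customers.insert(Map.of(
                "id", 1,
                "name", "Guido",
                "billingAddress", Map.of("street", "Chaumontweg", "city", "Spiegel", "postCode", "3095")));
        // Similar but not identical: no billing address, an extra list of tags (illustrative).
        customers.insert(Map.of(
                "id", 2,
                "name", "Peter",
                "tags", List.of("newsletter")));

        System.out.println(customers.findBy("billingAddress.city", "Spiegel").size()); // 1
        System.out.println(customers.findBy("name", "Peter").size());                 // 1
    }
}
```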
  • 19. Document Database - MongoDB
  • 20. Graph Databases
      - Store entities and the relationships between these entities
      - Entities are known as nodes, which have properties
      - Relationships are known as edges, which also have properties
      - A query on the graph is also known as traversing the graph
      - Traversing the relationships is very fast
    Suitable use cases
      - Connected data
      - Routing, dispatch and location-based services
      - Recommendation engines
    [Figure: example graph with the nodes Customer, Order, Product, Address, Country and Tag, connected by the relationships LINE_ITEM, BILLING_ADDRESS, DELIVERY_ADDRESS, ADDRESS, COUNTRY, RATED and TAG]
  • 21. Graph Database – Neo4J
    Query through Cypher:
      START    customer=node:Customer(email = "david@dmband.com")
      MATCH    customer-[:ORDERED]->order-[item:LINEITEM]->product
      WHERE    order.date > 20120101
      RETURN   product.name, sum(item.amount) AS products
      ORDER BY products DESC
      LIMIT    20
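To show what the Cypher query computes, here is a plain-Java sketch over a tiny hypothetical in-memory graph in which both nodes and relationships carry properties. `Graph.topProducts` follows `ORDERED` and then `LINEITEM` relationships from the customer node, keeps orders newer than the given date, sums `amount` per product name, sorts descending and applies the limit. It is not the Neo4j API; dates are encoded as `yyyymmdd` integers as an assumption matching the literal `20120101` in the query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical property-graph model: nodes and relationships both carry properties.
record Node(String label, Map<String, Object> props) {}
record Rel(String type, Node from, Node to, Map<String, Object> props) {}

class Graph {
    private final List<Rel> rels = new ArrayList<>();

    void connect(Node from, String type, Node to, Map<String, Object> props) {
        rels.add(new Rel(type, from, to, props));
    }

    // One traversal step: follow the outgoing relationships of one type (identity comparison on nodes).
    List<Rel> outgoing(Node node, String type) {
        return rels.stream()
                .filter(r -> r.from() == node && r.type().equals(type))
                .collect(Collectors.toList());
    }

    // Mirrors the query: customer-[:ORDERED]->order-[item:LINEITEM]->product,
    // WHERE order.date > minDate, RETURN product.name, sum(item.amount), ORDER BY sum DESC, LIMIT limit.
    Map<String, Integer> topProducts(Node customer, int minDate, int limit) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Rel ordered : outgoing(customer, "ORDERED")) {
            Node order = ordered.to();
            if ((Integer) order.props().get("date") <= minDate) {
                continue; // WHERE order.date > minDate
            }
            for (Rel item : outgoing(order, "LINEITEM")) {
                String productName = (String) item.to().props().get("name");
                sums.merge(productName, (Integer) item.props().get("amount"), Integer::sum);
            }
        }
        return sums.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()))
                .limit(limit)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }
}

class GraphDemo {
    public static void main(String[] args) {
        Graph graph = new Graph();
        Node david = new Node("Customer", Map.of("email", "david@dmband.com"));
        Node ipod = new Node("Product", Map.of("name", "iPod Touch"));
        Node beat = new Node("Product", Map.of("name", "Monster Beat"));
        Node recent = new Node("Order", Map.of("date", 20120915));
        Node old = new Node("Order", Map.of("date", 20111201));

        graph.connect(david, "ORDERED", recent, Map.of());
        graph.connect(david, "ORDERED", old, Map.of());
        graph.connect(recent, "LINEITEM", ipod, Map.of("amount", 1));
        graph.connect(recent, "LINEITEM", beat, Map.of("amount", 2));
        graph.connect(old, "LINEITEM", ipod, Map.of("amount", 5)); // excluded by the date filter

        System.out.println(graph.topProducts(david, 20120101, 20)); // {Monster Beat=2, iPod Touch=1}
    }
}
```

In a graph database the `outgoing` step follows stored relationship pointers instead of scanning a list, which is why traversals stay fast as the graph grows.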
  • 22. Agenda
    1. Why NoSQL and what is it?
    2. NoSQL Database Types
    3. Polyglot Persistence
    4. Data Virtualization Layer
    5. Summary
  • 23. Polyglot Persistence
    In 2006, Neal Ford coined the term Polyglot Programming
      - Applications should be written in a mix of languages to take advantage of the fact that different languages are suitable for tackling different problems
    Polyglot Persistence defines a hybrid approach to persistence
      - Using multiple data storage technologies
      - Selected based on the way data is being used by individual applications
      - Why store binary images in relational databases when there are better storage systems?
      - Can occur both across the enterprise and within a single application
  • 24. Polyglot Persistence
      - Today we use the same database for all kinds of data
        - Business transactions, session management data, reporting, logging information, content information, ...
      - Different kinds of data don't need the same availability, consistency or backup properties
      - Polyglot data storage usage allows mixing and matching relational and NoSQL data stores
    [Figure: "traditional" persistence model: an e-commerce application keeps shopping cart data, user sessions, completed orders, product catalog and recommendations in one RDBMS. Polyglot persistence model: shopping cart data and user sessions in a key-value store, completed orders in an RDBMS, product catalog in a document store, recommendations in a graph database]
  • 25. Polyglot Persistence – Challenges
      - Decisions
        - Have to decide which data storage technology to use
        - Today it's easier to go with relational
      - New data access APIs
        - Each data store has its own mechanisms for accessing the data
        - Different APIs
      - Solution: wrap the data access code into services (Data/Entity Services) exposed to applications
        - Will enforce a contract/schema on a schemaless database
    [Figure: service-oriented polyglot persistence model: the e-commerce application calls a Shopping Cart Service and a User Session Service (key-value), an Order Service (RDBMS), a Product Catalog Service (document) and a Recommendation Service (graph)]
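A minimal sketch of the "wrap the data access code into services" idea, assuming a hypothetical `ProductCatalogService` whose backing store is an in-memory map of schemaless documents. Consumers only see the typed `Product` record; the service translates between that contract and the document shape and rejects data that violates the contract. A real implementation would sit on a document database client instead of the `HashMap`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// The data contract the (hypothetical) service publishes to its consumers.
record Product(String id, String name, double price) {
    Product {
        Objects.requireNonNull(id, "id");
        Objects.requireNonNull(name, "name");
        if (price < 0) {
            throw new IllegalArgumentException("price must not be negative");
        }
    }
}

// Data/entity service: the only component that knows how products are stored.
// The HashMap is an in-memory stand-in for a schemaless document database.
class ProductCatalogService {
    private final Map<String, Map<String, Object>> documentStore = new HashMap<>();

    void save(Product product) {
        // Contract -> document: the service owns the implicit schema of the stored data.
        documentStore.put(product.id(), Map.of(
                "name", product.name(),
                "price", product.price()));
    }

    Optional<Product> findById(String id) {
        // Document -> contract: the record constructor re-validates whatever comes back.
        return Optional.ofNullable(documentStore.get(id))
                .map(doc -> new Product(id, (String) doc.get("name"), (Double) doc.get("price")));
    }
}

class CatalogDemo {
    public static void main(String[] args) {
        ProductCatalogService catalog = new ProductCatalogService();
        catalog.save(new Product("1000", "iPod Touch", 220.95));

        System.out.println(catalog.findById("1000").map(Product::name).orElse("?")); // iPod Touch
        System.out.println(catalog.findById("9999").isPresent());                    // false
    }
}
```

Because only the service touches the store, replacing the document store with another technology changes one class, not every consumer.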
  • 26. Polyglot Persistence – Challenges
      - Immaturity
        - NoSQL tools are still young, with the rough edges that new tools have
        - Not much experience: we don't know how to use them well yet
        - No established patterns and best practices exist yet
      - Organizational change
        - How will the different data groups in an enterprise react to this new technology?
      - Dealing with the eventual consistency paradigm
        - Reaction of different stakeholders to the fact that data could be stale
        - How to enforce rules to sync data across systems
  • 27. Agenda
    1. What is NoSQL and Big Data
    2. NoSQL Database Types
    3. Polyglot Persistence
    4. Data Virtualization Layer
    5. Summary
  • 28. Data Access Architecture for Polyglot Persistence
      - Well-known design patterns are still valid!
      - Some best practices we know from data access are still valid!
    [Figure: three data access stacks: Consumer -> REST/SOAP -> Service -> Repository/DAO -> O/R Mapping -> SQL -> RDBMS; Consumer -> REST/SOAP -> Service -> Repository/DAO -> ??? -> NoSQL; Consumer -> REST -> REST API of the NoSQL store]
  • 29. Middle Tier Architecture for Polyglot Persistence
    [Figure: middle tier architecture between consumer and resource tier, with the components Integration Service, Web Service Exporter (REST, SOAP), Integration Domain Service Bean, Data Transfer Objects (DTO), Application Domain, Application Service Bean, Composite Application, Factory Bean, Domain Objects, Aggregate, Repository Bean, DAO Bean, O/R Mapping, SQL API and NoSQL API]
  • 30. Polyglot Persistence with Spring Data
      - Makes it easier to build Spring-powered applications that use new data access technologies
      - Provides improved support for relational database technologies
      - The Commons project supports polyglot persistence
      - Currently supports:
        - JPA and JDBC (relational)
        - Apache Hadoop
        - GemFire
        - REST
        - Redis
        - MongoDB
        - Neo4J
        - HBase
    [Figure: Consumer -> REST/SOAP -> Service -> Repository/DAO -> ??? -> NoSQL]
  • 31. Spring Data – Mapping to Relational Database (using JPA)
    Annotations define the mapping: @Entity, @Id, @Column, @OneToOne, @OneToMany, @JoinColumn, …
    [Figure: Consumer -> REST/SOAP -> Service -> Repository/DAO -> ??? -> NoSQL]
  • 32. Spring Data – Mapping to Relational Database
    Repository interface and a hand-written JPA implementation, active under the "jpa" profile:

      public interface CustomerRepository extends Repository<Customer, Long> {
          Customer findByEmailAddress(EmailAddress emailAddress);
          Customer save(Customer customer);
      }

      @Repository
      @Profile("jpa")
      class JpaCustomerRepository implements CustomerRepository {

          @PersistenceContext
          private EntityManager em;

          @Override
          public Customer findByEmailAddress(EmailAddress emailAddress) {
              TypedQuery<Customer> query = em.createQuery(
                      "select c from Customer c where c.emailAddress = :email", Customer.class);
              query.setParameter("email", emailAddress);
              return query.getSingleResult();
          }

          @Override
          @Transactional
          public Customer save(Customer customer) {
              return em.merge(customer);
          }
      }

    Spring configuration:

      <jpa:repositories base-package="com.oreilly.springdata.jpa" />
      <tx:annotation-driven />

      <bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager">
          <property name="entityManagerFactory" ref="entityManagerFactory" />
      </bean>

      <bean id="entityManagerFactory"
            class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
          <property name="dataSource" ref="dataSource" />
          <property name="packagesToScan" value="com.oreilly.springdata.jpa" />
      </bean>

    Usage:

      Customer guido = repository.findByEmailAddress(new EmailAddress("guido@hotmail.com"));
      Customer anotherCust = new Customer("Peter", "Sample");
      anotherCust.setEmailAddress(guido.getEmailAddress());
      repository.save(anotherCust);
  • 33. Spring Data – Mapping to MongoDB
    Annotations define the mapping: @Document, @Id, @Indexed, @PersistenceConstructor, @CompoundIndex, @DBRef, @GeoSpatialIndexed, @Value
    [Figure: Consumer -> REST/SOAP -> Service -> Repository/DAO -> ??? -> NoSQL]
  • 34. Spring Data – Generic Repositories for MongoDB
    Repository interface and a hand-written MongoDB implementation, active under the "mongodb" profile:

      import static org.springframework.data.mongodb.core.query.Criteria.where;
      import static org.springframework.data.mongodb.core.query.Query.query;

      public interface CustomerRepository extends Repository<Customer, Long> {
          Customer findByEmailAddress(EmailAddress emailAddress);
          Customer save(Customer customer);
      }

      @Repository
      @Profile("mongodb")
      class MongoDbCustomerRepository implements CustomerRepository {

          private final MongoOperations operations;

          @Autowired
          public MongoDbCustomerRepository(MongoOperations operations) {
              this.operations = operations;
          }

          @Override
          public Customer findByEmailAddress(EmailAddress emailAddress) {
              Query query = query(where("emailAddress").is(emailAddress));
              return operations.findOne(query, Customer.class);
          }

          @Override
          public Customer save(Customer customer) {
              operations.save(customer);
              return customer;
          }
      }

    Spring configuration:

      <mongo:db-factory id="mongoDbFactory" dbname="e-store" />
      <mongo:mapping-converter id="mongoConverter" base-package="com.oreilly.springdata.mongodb">
          <mongo:custom-converters base-package="com.oreilly.springdata.mongodb" />
      </mongo:mapping-converter>
      <bean id="mongoTemplate" class="org.springframework.data.mongodb.core.MongoTemplate">
          <constructor-arg ref="mongoDbFactory" />
          <constructor-arg ref="mongoConverter" />
          <property name="writeConcern" value="SAFE" />
      </bean>
      <mongo:repositories base-package="com.oreilly.springdata.mongodb" />

    Usage:

      Customer guido = repository.findByEmailAddress(new EmailAddress("guido@hotmail.com"));
      Customer anotherCust = new Customer("Peter", "Sample");
      anotherCust.setEmailAddress(guido.getEmailAddress());
      repository.save(anotherCust);
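The slides select one `CustomerRepository` implementation per backend with Spring's `@Profile`. Without depending on Spring, the same idea can be sketched in plain Java: callers depend only on the interface, and a hypothetical `RepositoryFactory` picks an implementation from a profile name. Both implementations are in-memory stand-ins, not JPA or MongoDB code, and `Customer` is simplified to an immutable record (an assumption), so the slides' `setEmailAddress` call becomes a constructor argument.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Simplified, immutable versions of the slides' domain types (assumption for this sketch).
record EmailAddress(String value) {}
record Customer(String firstName, String lastName, EmailAddress emailAddress) {}

// The contract the service layer codes against, independent of the data store.
interface CustomerRepository {
    Optional<Customer> findByEmailAddress(EmailAddress emailAddress);
    Customer save(Customer customer);
}

// In-memory stand-in for the "jpa" profile: rows keyed by a generated primary key.
class TableCustomerRepository implements CustomerRepository {
    private final Map<Long, Customer> rows = new HashMap<>();
    private long nextId = 1;

    public Optional<Customer> findByEmailAddress(EmailAddress emailAddress) {
        return rows.values().stream().filter(c -> c.emailAddress().equals(emailAddress)).findFirst();
    }

    public Customer save(Customer customer) {
        rows.put(nextId++, customer);
        return customer;
    }
}

// In-memory stand-in for the "mongodb" profile: a collection of whole customer documents.
class DocumentCustomerRepository implements CustomerRepository {
    private final List<Customer> collection = new ArrayList<>();

    public Optional<Customer> findByEmailAddress(EmailAddress emailAddress) {
        return collection.stream().filter(c -> c.emailAddress().equals(emailAddress)).findFirst();
    }

    public Customer save(Customer customer) {
        collection.add(customer);
        return customer;
    }
}

// Hypothetical factory playing the role of Spring's @Profile: one implementation per profile.
class RepositoryFactory {
    static CustomerRepository forProfile(String profile) {
        return switch (profile) {
            case "jpa" -> new TableCustomerRepository();
            case "mongodb" -> new DocumentCustomerRepository();
            default -> throw new IllegalArgumentException("Unknown profile: " + profile);
        };
    }
}

class RepositoryDemo {
    public static void main(String[] args) {
        CustomerRepository repository = RepositoryFactory.forProfile(args.length > 0 ? args[0] : "jpa");
        repository.save(new Customer("Guido", "Schmutz", new EmailAddress("guido@hotmail.com")));

        // Same calls as on the slides, whichever store is active.
        Customer guido = repository.findByEmailAddress(new EmailAddress("guido@hotmail.com")).orElseThrow();
        Customer anotherCust = new Customer("Peter", "Sample", guido.emailAddress());
        repository.save(anotherCust);
        System.out.println(anotherCust.firstName() + " -> " + anotherCust.emailAddress().value());
    }
}
```

Switching the profile from "jpa" to "mongodb" changes nothing in the calling code, which is the property the Repository/DAO layer is meant to preserve when moving between relational and NoSQL stores.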
  • 35. Spring Data – Mapping to Neo4J
    Annotations define the mapping: @NodeEntity, @RelationshipEntity, @GraphId, @RelatedTo, @RelatedToVia, @EndNode, @Fetch, …
    [Figure: example graph with Customer, Order, Product, Address, Country and Tag nodes connected by LINE_ITEM, BILLING_ADDRESS, DELIVERY_ADDRESS, ADDRESS, COUNTRY, RATED and TAG relationships]
  • 36. Spring Data – Generic Repositories for Neo4J

      public interface CustomerRepository extends GraphRepository<Customer> {
          Customer findByEmailAddress(EmailAddress emailAddress);
      }

    Spring configuration:

      <neo4j:config graphDatabaseService="graphDatabaseService" />
      <neo4j:repositories base-package="com.oreilly.springdata.neo4j" />
      <bean id="graphDatabaseService" class="org.neo4j.kernel.EmbeddedGraphDatabase"
            destroy-method="shutdown">
          <constructor-arg value="target/graph.db" />
      </bean>

    Usage:

      Customer guido = repository.findByEmailAddress(new EmailAddress("guido@hotmail.com"));
      Customer anotherCust = new Customer("Peter", "Sample");
      anotherCust.setEmailAddress(guido.getEmailAddress());
      repository.save(anotherCust);

    [Figure: Consumer -> REST/SOAP -> Service -> Repository/DAO -> ??? -> NoSQL]
  • 37. Expose a Contract-First Web Service
      - Use any Java web service framework that supports the contract-first approach
      - Can be SOAP or REST
      - Maps the data contract to the schemaless database
      - Uses the different Repository implementations
      - Must handle data migration issues together with the Repository
    [Figure: Consumer -> REST/SOAP -> Service -> Repository/DAO -> ??? -> NoSQL]
  • 38. Schemaless – We Still Have to Migrate the Data!
      - With an RDBMS we are used to keeping DDL scripts together with DML scripts for every single data model change
        - These have to be in sync with the data access code
      - The RDBMS has to be changed before the application is changed => possible application downtime
        - This is what the schemaless approach of most NoSQL DBs tries to avoid
      - Schemaless DBs still need careful migration, due to the implicit schema in any data access code
      - But a more "on-demand" approach is possible
        - Code can read data in a way that is tolerant to changes in the data's implicit schema, and migrate the data on the next update
        - Similar to service versioning => gradual change
    [Figure: a Customer in three versions, each with a billing address (Somestreet 10, Somewhere, 55901). Version 1.0: Name: Peter Sample. Transition 1.0 => 2.0: Name: Peter Sample plus First Name: Peter and Last Name: Sample. Version 2.0: First Name: Peter, Last Name: Sample]
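The on-demand migration can be sketched as a tolerant reader plus migrate-on-write, assuming documents are plain `Map`s with field names modeled on the slide (`name` in version 1.0, `firstName`/`lastName` in version 2.0). `CustomerDocumentMapper` is a hypothetical name, and splitting `name` at the first space is an illustrative assumption that would not hold for every real name.

```java
import java.util.HashMap;
import java.util.Map;

// Customer as seen by version 2.0 of the data access code.
record CustomerV2(String firstName, String lastName) {}

// Hypothetical mapper: reads both document shapes, always writes the new one.
class CustomerDocumentMapper {
    // Tolerant read: accepts the 1.0 shape ("name") and the 2.0 shape ("firstName"/"lastName").
    static CustomerV2 read(Map<String, Object> doc) {
        if (doc.containsKey("firstName")) {
            return new CustomerV2((String) doc.get("firstName"), (String) doc.get("lastName"));
        }
        // Version 1.0 document; assumption: the first token of "name" is the first name.
        String name = (String) doc.get("name");
        int space = name.indexOf(' ');
        return space < 0
                ? new CustomerV2(name, "")
                : new CustomerV2(name.substring(0, space), name.substring(space + 1));
    }

    // Migrate on update: write the 2.0 shape, drop the old field, keep everything else.
    static Map<String, Object> write(CustomerV2 customer, Map<String, Object> existing) {
        Map<String, Object> doc = new HashMap<>(existing);
        doc.remove("name");
        doc.put("firstName", customer.firstName());
        doc.put("lastName", customer.lastName());
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> v1 = Map.of(
                "name", "Peter Sample",
                "billingAddress", Map.of("street", "Somestreet 10", "city", "Somewhere", "postalCode", "55901"));

        CustomerV2 customer = read(v1);               // works on old data without a migration script
        Map<String, Object> v2 = write(customer, v1); // the next update stores the new shape
        System.out.println(v2.containsKey("name"));   // false
        System.out.println(read(v2));                 // CustomerV2[firstName=Peter, lastName=Sample]
    }
}
```

Documents that are never updated stay in the 1.0 shape, so the tolerant read path has to remain until a background job or the natural update traffic has converted them all.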
  • 39. Agenda
    1. What is NoSQL and Big Data
    2. NoSQL Database Types
    3. Polyglot Persistence
    4. Data Virtualization Layer and Data Architecture
    5. Summary
  • 40. Pros & Cons of NoSQL Compared to RDBMS
    Pros
      - No O/R impedance mismatch
      - Can easily evolve schemas
      - Can represent semi-structured info
      - Can represent graphs/networks (with performance)
    Cons
      - Lacks tool and framework support
      - Few other implementations => potential lock-in
      - No support for ad-hoc queries
      - Another (new) database in production to take care of
  • 41. Summary
      - Relational databases are here to stay, but NoSQL offers new persistence models
      - Polyglot persistence will be the future
      - Schemaless does not mean there is no data migration! => but a more on-demand model might be possible
      - Encapsulate data access code to be able to switch databases
      - Service orientation provides the data contract to a NoSQL database => to make information reusable
      - Don't commit to a NoSQL database until you have done a significant PoC
      - Make sure that operations people (DBAs) are on board early enough
      - Non-relational is not new in an enterprise (OLTP vs. OLAP)
  • 42. Possible Use Cases
      - NoSQL for parallel ETL?
      - NoSQL for modern BI
      - NoSQL for a stateful middle tier (e.g. shopping cart)
      - NoSQL for aggregated master data (e.g. through REST for web apps)
      - NoSQL for a CMS store, directly accessible through REST
      - NoSQL as a local store for mobile applications
      - NoSQL for Event Sourcing and CQRS architectures
  • 43. Further Information
  • 44. THANK YOU.
    Trivadis
    Guido Schmutz
    guido.schmutz@trivadis.com
    info@trivadis.com
    www.trivadis.com