Padova, InfoCamere
  JBoss User Group
     12 April 2012
Who am I?
Sanne Grinovero
Italian, Dutch, Newcastle
Red Hat: JBoss, Engineering
•   Hibernate team
– Hibernate Search
– Hibernate OGM

•   Infinispan
– Infinispan Core
– Infinispan Query
– JGroups

•   Apache Lucene
Infinispan
•   Distributed cache
•   Scalable, transactional data grid:
    extreme performance and cloud
•   NoSQL “database”: key-value store
– How do you query a data grid?

      SELECT * FROM GRID
Infinispan, Lucene, Hibernate OGM
Querying a “Grid”

Object v = cache.get("c7");
Without the key, you cannot
get the value.
Is key-only access
practical?
A test on my bookshelf
•   Where is Hibernate Search in Action?
•   Can you pass me ISBN 978-1-933988-17-7?
•   Can you fetch the books on Gaudí?
How do we implement these functions on a Key/Value store?
•   Where is Hibernate Search in Action?
•   Can you pass me ISBN 978-1-933988-17-7?
•   Can you find the books on Gaudí?
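To make the problem concrete, here is a minimal sketch in plain Java. A `LinkedHashMap` stands in for the grid and the class and method names are hypothetical; this is not the Infinispan API. It shows that answering "where is this title?" without the key means visiting every entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a key/value grid: a plain in-memory map.
class KeyOnlyAccessSketch {

    // Returns the keys of all entries whose value equals the given title.
    // Without an index this is a full scan over the whole store.
    static List<String> keysForTitle(Map<String, String> store, String title) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> entry : store.entrySet()) {
            if (entry.getValue().equals(title)) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, String> store = new LinkedHashMap<>();
        store.put("c7", "Hibernate Search in Action");
        store.put("c8", "Lucene in Action");

        // Lookup by key is direct.
        System.out.println(store.get("c7"));
        // Lookup by value has to visit every entry.
        System.out.println(keysForTitle(store, "Lucene in Action"));
    }
}
```

On a distributed grid the same scan would also have to run on every node, which is what motivates the Map/Reduce and indexing approaches discussed next.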
Document-based NoSQL:
        Map/Reduce
Infinispan is not strictly document-based, but it
offers Map/Reduce.
Still, nothing rules out using JSON, XML, YAML, or Java:
     public class Book implements Serializable {

         final String title;
         final String author;
         final String editor;

         public Book(String title, String author, String editor) {
            this.title = title;
            this.author = author;
            this.editor = editor;
         }

     }
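Iterate & collect
import java.util.Iterator;
import org.infinispan.distexec.mapreduce.Collector;
import org.infinispan.distexec.mapreduce.Mapper;
import org.infinispan.distexec.mapreduce.Reducer;

// Emits every book whose title matches exactly
class TitleBookSearcher implements
                        Mapper<String, Book, String, Book> {
  final String title;
  public TitleBookSearcher(String t) { title = t; }
  public void map(String key, Book value,
                  Collector<String, Book> collector) {
     if ( title.equals( value.title ) )
        collector.emit( key, value );
  }
}

// Keeps the first book emitted for each key
class BookReducer implements
                        Reducer<String, Book> {
  public Book reduce(String reducedKey, Iterator<Book> iter) {
         return iter.next();
  }
}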
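The flow behind a mapper and a reducer can be emulated in a single JVM. The sketch below is only an illustration in plain Java: all names are hypothetical, books are reduced to `{title, author}` string pairs, and nothing here uses the Infinispan Map/Reduce API. The map phase emits (author, title) pairs for matching titles; the reduce phase collapses them into a count per author.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-JVM emulation of a map/reduce pass; every name here is hypothetical.
class LocalMapReduceSketch {

    // Map phase: for each book {title, author} whose title contains the word,
    // emit the pair (author, title). Emitted values are grouped by key.
    static Map<String, List<String>> mapPhase(Map<String, String[]> books, String word) {
        Map<String, List<String>> emitted = new HashMap<>();
        for (String[] book : books.values()) {
            if (book[0].contains(word)) {
                emitted.computeIfAbsent(book[1], k -> new ArrayList<>()).add(book[0]);
            }
        }
        return emitted;
    }

    // Reduce phase: collapse the titles collected for each author into a count.
    static Map<String, Integer> reducePhase(Map<String, List<String>> emitted) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : emitted.entrySet()) {
            counts.put(entry.getKey(), entry.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String[]> books = new HashMap<>();
        books.put("k1", new String[] {"Search in Action", "author-a"});
        books.put("k2", new String[] {"Caching in Action", "author-a"});
        books.put("k3", new String[] {"Cooking Basics", "author-b"});
        System.out.println(reducePhase(mapPhase(books, "Action")));
    }
}
```

In a real grid the map phase would run on each node against its local entries, and only the emitted pairs would travel over the network.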
Implementing these simple functions:
✔ Find “Hibernate Search in Action”?
✔ Find by code “ISBN 978-1-933988-17-7”?
✗ How many books about “Shakespeare”?
  •   Correct scoring in full-text searches
      requires the frequencies of text fragments
      relative to the corpus.
  •   Pre-tagging is impractical and limiting
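Why scoring needs corpus-wide frequencies can be seen with textbook TF-IDF. The sketch below is an assumption-laden illustration in plain Java (hypothetical class and method names, pre-tokenized documents); it is not Lucene's actual similarity formula. A term that appears in every document scores zero, while a rare term scores high, and neither can be computed by looking at one entry in isolation.

```java
import java.util.Arrays;
import java.util.List;

// Textbook TF-IDF, used only to illustrate corpus-relative scoring.
class TfIdfSketch {

    // Occurrences of the term within one tokenized document.
    static int termFrequency(List<String> doc, String term) {
        int count = 0;
        for (String token : doc) {
            if (token.equals(term)) count++;
        }
        return count;
    }

    // Number of documents in the corpus that contain the term.
    static int documentFrequency(List<List<String>> corpus, String term) {
        int count = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) count++;
        }
        return count;
    }

    // score = tf * ln(N / df); 0 when the term is absent from the corpus.
    static double score(List<List<String>> corpus, List<String> doc, String term) {
        int df = documentFrequency(corpus, term);
        if (df == 0) return 0.0;
        return termFrequency(doc, term) * Math.log((double) corpus.size() / df);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
            Arrays.asList("the", "sonnets", "of", "shakespeare"),
            Arrays.asList("the", "art", "of", "gaudi"),
            Arrays.asList("the", "history", "of", "padova"));
        List<String> first = corpus.get(0);
        System.out.println(score(corpus, first, "shakespeare")); // rare term
        System.out.println(score(corpus, first, "the"));         // appears everywhere
    }
}
```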
Apache Lucene
•   Apache™ open source project

•   Integrated in countless projects

•   ... including Hibernate, via Hibernate Search

•   Can be clustered via Infinispan
– Performance
– Real time
– High availability
What does Lucene offer?
•   Searches ranked by Similarity score

•   Text analysis
– Synonyms, Stopwords, Stemming, ...

•   Reusable declarative Filters

•   TermVectors

•   MoreLikeThis

•   Faceted Search

•   Fast!
Lucene: Stopwords
a, able, about, across, after, all, almost, also, am,
among, an, and, any, are, as, at, be, because, been,
but, by, can, cannot, could, dear, did, do, does,
either, else, ever, every, for, from, get, got, had,
has, have, he, her, hers, him, his, how, however, i, if,
in, into, is, it, its, just, least, let, like, likely,
may, me, might, most, must, my, neither, no, nor, not,
of, off, often, on, only, or, other, our, own, rather,
said, say, says, she, should, since, so, some, than,
that, the, their, them, then, there, these, they, this,
tis, to, too, twas, us, wants, was, we, were, what,
when, where, which, while, who, whom, why, will, with,
would, yet, you, your
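As a rough illustration of what text analysis does with such a list, the sketch below lowercases a string, splits it into tokens and drops stopwords. It is plain Java with hypothetical names and only a small subset of the list above; a real Lucene Analyzer does considerably more (stemming, synonyms, and so on).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical mini-analyzer: lowercase, tokenize, remove stopwords.
class StopwordAnalyzerSketch {

    // A small subset of the stopword list shown above.
    static final Set<String> STOPWORDS = new HashSet<>(
        Arrays.asList("a", "an", "and", "in", "of", "on", "the", "to"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !STOPWORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Hibernate Search in Action"));
    }
}
```

The same analysis has to be applied at indexing time and at query time, otherwise the query terms will not match the indexed terms.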
Filters
Faceted Search
Shall we build a nice search
engine that returns its
results in alphabetical
order?
Who uses Lucene?
Nexus
What's the catch?
•   It needs an index: physical and administrative
    resources.
– in memory
– on filesystem
– in Infinispan

•   Essentially immutable segments
– Optimized for data mining / queries, not for
  updates.

•   A world of strings and frequency vectors
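The "strings and frequency vectors" remark can be pictured with a toy inverted index: each term maps to the documents containing it, with a per-document frequency. This is a simplified sketch in plain Java with hypothetical names; it ignores segments, scoring and persistence, and is not how Lucene lays out its index files.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy inverted index: term -> (document id -> occurrences of the term).
class InvertedIndexSketch {

    private final Map<String, Map<String, Integer>> postings = new TreeMap<>();

    // Tokenizes the text and records one occurrence per token.
    void add(String docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(term, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Ids of the documents containing the term: one map lookup, no scan.
    Set<String> search(String term) {
        Map<String, Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<String>emptySet() : hits.keySet();
    }

    // How often the term occurs in the given document.
    int frequency(String term, String docId) {
        Map<String, Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        if (hits == null) return 0;
        Integer count = hits.get(docId);
        return count == null ? 0 : count;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("b1", "Hibernate Search in Action");
        index.add("b2", "Lucene in Action");
        System.out.println(index.search("action"));
        System.out.println(index.frequency("lucene", "b2"));
    }
}
```

Everything in the index is a string or a count, which is why typed Java values need a mapping step before they can be indexed.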
Infinispan Query quickstart
  • Enable indexing=true in the
       configuration
   •   Add the infinispan-query.jar module
       to the classpath
   •   Annotate the POJOs stored in the cache
       to define how they are indexed
 <dependency>
     <groupId>org.infinispan</groupId>
     <artifactId>infinispan-query</artifactId>
     <version>5.1.3.FINAL</version>
 </dependency>
Programmatic configuration
Configuration c = new Configuration()
   .fluent()
      .indexing()
      .addProperty(
"hibernate.search.default.directory_provider",
"ram")
      .build();

CacheManager manager = new DefaultCacheManager(c);
Configuration / XML
<?xml version="1.0" encoding="UTF-8"?>
<infinispan
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:infinispan:config:5.0
http://www.infinispan.org/schemas/infinispan-config-5.0.xsd"
            xmlns="urn:infinispan:config:5.0">
<default>
    <indexing enabled="true" indexLocalOnly="true">
        <properties>
             <property name="hibernate.search.option1" value="..." />
             <property name="hibernate.search.option2" value="..." />
        </properties>
    </indexing>
</default>
</infinispan>
Annotations on the model
@ProvidedId @Indexed
public class Book implements Serializable {

    @Field String title;
    @Field String author;
    @Field String editor;

    public Book(String title, String author, String editor) {
       this.title = title;
       this.author = author;
       this.editor = editor;
    }

}
Running queries
SearchManager sm = Search.getSearchManager(cache);

Query query = sm.buildQueryBuilderForClass(Book.class)
    .get()
        .phrase()
            .onField("title")
            .sentence("in action")
    .createQuery();

List<Object> list = sm.getQuery(query).list();
Architecture
•    Integrates Hibernate Search (engine)
– Listens to Hibernate events &
  transactions
    • Infinispan events & transactions
– Maps Java types and model graphs to
  Lucene Documents
– Thin-layer design
Index mapping
Tests for Infinispan Query
https://github.com/infinispan/infinispan
org.apache.lucene.search.Query luceneQuery =
    queryBuilder.phrase()
          .onField( "description" )
          .andField( "title" )
          .sentence( "a book on highly scalable query engines" )
          .createQuery();

CacheQuery cacheQuery =
            searchManager.getQuery( luceneQuery, Book.class );
// Filters are enabled on the executable query, not on the query builder
cacheQuery.enableFullTextFilter( "ready-for-shipping" );
List<Object> objectList = cacheQuery.list();
Architecture: Infinispan Query
Scalability problems
•   Global writer locks
•   Sharing over NFS is
    very problematic
Queue-based clustering
     (filesystem)
Index stored in
   Infinispan
Quickstart Hibernate Search
 •    Add the hibernate-search dependency:
<dependency>
   <groupId>org.hibernate</groupId>
   <artifactId>hibernate-search-orm</artifactId>
   <version>4.1.0.Final</version>
</dependency>
Quickstart Hibernate Search
•    Everything else is optional:
– How the indexes are managed
– Extension modules, custom Analyzers
– Performance tuning
– Custom type mappings
– Clustering
    • JGroups
    • Infinispan
    • JMS
Quickstart Hibernate Search
@Entity
public class Essay {
   @Id
   public Long getId() { return id; }

   public String getSummary() { return summary; }
   @Lob
   public String getText() { return text; }
   @ManyToOne
   public Author getAuthor() { return author; }
...
Quickstart Hibernate Search
@Entity @Indexed
public class Essay {
   @Id
   public Long getId() { return id; }

   public String getSummary() { return summary; }
   @Lob
   public String getText() { return text; }
   @ManyToOne
   public Author getAuthor() { return author; }
...
Quickstart Hibernate Search
@Entity @Indexed
public class Essay {
   @Id
   public Long getId() { return id; }
   @Field
   public String getSummary() { return summary; }
   @Lob
   public String getText() { return text; }
   @ManyToOne
   public Author getAuthor() { return author; }
...
Quickstart Hibernate Search
@Entity @Indexed
public class Essay {
   @Id
   public Long getId() { return id; }
   @Field
   public String getSummary() { return summary; }
   // @Boost takes a float, hence the "f" suffix
   @Lob @Field @Boost(0.8f)
   public String getText() { return text; }
   @ManyToOne
   public Author getAuthor() { return author; }
...
Quickstart Hibernate Search
@Entity @Indexed
public class Essay {
   @Id
   public Long getId() { return id; }
   @Field
   public String getSummary() { return summary; }
   // @Boost takes a float, hence the "f" suffix
   @Lob @Field @Boost(0.8f)
   public String getText() { return text; }
   @ManyToOne @IndexedEmbedded
   public Author getAuthor() { return author; }
...
A second example
@Entity
public class Author {
        @Id @GeneratedValue
        private Integer id;
        private String name;
        @OneToMany
        private Set<Book> books;
}

@Entity
public class Book {
        private Integer id;
        private String title;
}
Index structure
@Entity @Indexed
public class Author {
        @Id @GeneratedValue
        private Integer id;
        @Field(store=Store.YES)
        private String name;
        @OneToMany
        @IndexedEmbedded
        private Set<Book> books;
}

@Entity
public class Book {
        private Integer id;
        @Field(store=Store.YES)
        private String title;
}
Query
String[] productFields = {"summary", "author.name"};

Query luceneQuery = ...; // built with the query builder, or any Lucene Query

FullTextEntityManager ftEm =
   Search.getFullTextEntityManager(entityManager);

FullTextQuery query =
   ftEm.createFullTextQuery( luceneQuery, Product.class );

List<Product> items =
   query.setMaxResults(100).getResultList();

int totalNbrOfResults = query.getResultSize();

totalNbrOfResults = 8,320,000 (0.002 seconds)
Using the DSL
About the results:
•   Managed POJOs: changes to the entities are applied to
    both Lucene and the database

•   Familiar (standard) JPA pagination:
– .setMaxResults( 20 ).setFirstResult( 100 );

•   Type restrictions, polymorphic full-text queries:
– .createQuery( luceneQuery, A.class, B.class, ..);

•   Projection

•   Result mapping
Filters
FullTextQuery ftQuery = s // s is a FullTextSession
   .createFullTextQuery( query, Product.class );

// enableFullTextFilter returns the filter, so parameters are set per filter
ftQuery.enableFullTextFilter( "filtroMinori" );
ftQuery.enableFullTextFilter( "offertaDelGiorno" )
   .setParameter( "day", "20120412" );
ftQuery.enableFullTextFilter( "inStockA" )
   .setParameter( "location", "Padova" );

List<Product> results = ftQuery.list();
Using Infinispan to
distribute the indexes
Clustering “direct”
      Lucene usage

•   Using org.apache.lucene
– Traditionally hard to distribute
  across multiple nodes
– On any cloud
Single node: a rough idea of performance
[Figure: bar charts of Write ops/sec (axis 0 to 400) and Queries/sec (axis 0 to 25000) for RAMDirectory, Infinispan 0, Infinispan D4, Infinispan D40, FSDirectory and Infinispan Local; bar values not recoverable]
Multiple nodes: a rough idea of performance
[Figure: bar charts of Write ops/sec (axis 0 to 400) and Queries/sec (axis 0 to 25000) for RAMDirectory, Infinispan 0, Infinispan D4, Infinispan D40, FSDirectory and Infinispan Local; bar values not recoverable]
Writes don't scale?
Tips for optimal performance
•   Tune chunk_size to the actual usage
    of your index (avoiding fragmentation
    avoids read locks)
•   Check the size of network packets:
    blob size, JGroups packets,
    network interface and hardware.
•   Choose and configure a suitable
    CacheLoader
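To see why chunk_size matters, it helps to picture how a file is spread over grid entries. The sketch below assumes an index file is split into consecutive chunks of at most chunk_size bytes, each stored under its own key; the key format and all names are hypothetical and are not the actual Infinispan Lucene Directory implementation.

```java
// Hypothetical illustration of splitting an index file into fixed-size chunks.
class ChunkingSketch {

    // Number of chunks needed to hold a file of the given length.
    static long chunkCount(long fileLength, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return (fileLength + chunkSize - 1) / chunkSize; // ceiling division
    }

    // Index of the chunk holding the byte at the given position.
    static long chunkIndexFor(long position, int chunkSize) {
        return position / chunkSize;
    }

    // Made-up key format: one grid entry per (file, chunk) pair.
    static String chunkKey(String fileName, long chunkIndex) {
        return fileName + "|" + chunkIndex;
    }

    public static void main(String[] args) {
        int chunkSize = 16 * 1024;
        long fileLength = 100_000L;
        long chunks = chunkCount(fileLength, chunkSize);
        for (long i = 0; i < chunks; i++) {
            System.out.println(chunkKey("segment.dat", i));
        }
    }
}
```

Under this assumption, a chunk size that is too small turns each file read into many grid lookups, while one that is too large produces values that are expensive to move across the network.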
Memory requirements
•   RAMDirectory: the whole index (and more) in RAM.

•   FSDirectory: a good OS does an excellent
    job of caching IO, often better than
    RAMDirectory.

•   Infinispan: configurable, up to memory
    shared across nodes
– Flexible
– Fast
– Network vs. disk
Modules for scalable
 cloud deployments
   One Infinispan to rule them all
– Store Lucene indexes
– Hibernate second level cache
– Application managed cache
– Datagrid
– EJB, session replication in AS7
– As a JPA “store” via Hibernate OGM
Ingredients for the cloud
• JGroups DISCOVERY protocol
– MPING
– TCP_PING
– JDBC_PING
– S3_PING

•   Choose a CacheLoader
– Database based, Jclouds,
  Cassandra, ...
Near future
•   Simplify write scalability
•   Auto-tuning of clustering
    parameters: ergonomics!

•   Parallel searching: multi/core +
    multi/node

•   A component of
– http://www.cloudtm.eu
JPA for NoSQL
NoSQL:
   flexibility comes at a cost
• Programming model
  • one per product :-(
• no schema => app driven schema
• query (Map Reduce, specific DSL, ...)
• data structure transpires
• Transaction
• durability / consistency
Example: Infinispan
Distributed Key/Value store
      •   (or Replicated, local only efficient cache,
          invalidating cache)
Each node is equal
      •   Just start more nodes, or kill some
No bottlenecks
      •   by design
Cloud-network friendly
      •   JGroups
      •   And “cloud storage” friendly too!
The ABC of Infinispan

map.put( "user-34", userInstance );

map.get( "user-34" );

map.remove( "user-34" );
It's a ConcurrentMap!
map.put( "user-34", userInstance );

map.get( "user-34" );

map.remove( "user-34" );

map.putIfAbsent( "user-38", another );
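The atomic operations are what distinguish a ConcurrentMap from a plain Map. The sketch below uses the JDK's `ConcurrentHashMap` as a stand-in for an Infinispan cache (an assumption made so the example runs without a grid); the `register` helper and its names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// ConcurrentHashMap stands in for a cache; only java.util.concurrent is used.
class ConcurrentMapSketch {

    // Stores the value only if the key is free; returns true when it was stored.
    // putIfAbsent is atomic: two concurrent callers cannot both succeed.
    static boolean register(ConcurrentMap<String, String> map, String key, String value) {
        return map.putIfAbsent(key, value) == null;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();

        map.put("user-34", "userInstance");
        System.out.println(map.get("user-34"));

        System.out.println(register(map, "user-38", "another"));   // key was free
        System.out.println(register(map, "user-38", "too late"));  // key already taken

        // Conditional replace: succeeds only if the current value matches.
        System.out.println(map.replace("user-38", "another", "updated"));

        map.remove("user-34");
        System.out.println(map.containsKey("user-34"));
    }
}
```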
A few more details about
       Infinispan
 ●   Support for Transactions (XA)
 ●   CacheLoaders
     ●   Cassandra, JDBC, Amazon S3 (jclouds),...
 ● Tree API for JBossCache compatibility
 ● Lucene integration

   ● Two-fold

 ● Some Hibernate integrations

   ● Second level cache

   ● Hibernate Search indexing backend
Goals of Hibernate OGM

  •   Encourage new data usage patterns
  •   Familiar environment
  •   Ease of use
  •   easy to jump in
  •   easy to jump out
  •   Push NoSQL exploration in enterprises
  •   “PaaS for existing API” initiative
What is it?

• JPA front end to key/value stores
   • Object CRUD (incl polymorphism and
      associations)
   • OO queries (JP-QL)
• Reuses
   • Hibernate Core
   • Hibernate Search (and Lucene)
   • Infinispan
• Is not a silver bullet
   • not for all NoSQL use cases
Entities as serialized blobs?
• Serialize objects into the (key) value
  • store the whole graph?

• maintain consistency with duplicated
  objects
  • guaranteed identity a == b
  • concurrency / latency
  • structure change and (de)serialization,
     class definition changes
OGM’s approach to
      schema
• Keep what’s best from relational model
  • as much as possible
  • tables / columns / pks
• Decorrelate object structure from data
  structure
• Data stored as (self-described) tuples
• Core types limited
  • portability
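One way to picture "data stored as self-described tuples" is a map of column names to values, stored under a key derived from the table name and primary key. The sketch below is only an illustration in plain Java: the key format, class and method names are hypothetical and do not reflect OGM's actual storage layout.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: an entity stored as a tuple instead of a serialized blob.
class TupleStoreSketch {

    // Made-up key format: table name plus primary key.
    static String rowKey(String table, Object primaryKey) {
        return table + "#" + primaryKey;
    }

    // The tuple describes itself: column name -> value, restricted to core types.
    static Map<String, Object> authorTuple(int id, String name) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", id);
        tuple.put("name", name);
        return tuple;
    }

    public static void main(String[] args) {
        // A plain map stands in for the key/value store.
        Map<String, Map<String, Object>> store = new HashMap<>();
        store.put(rowKey("Author", 1), authorTuple(1, "Ada"));

        // Reading a column needs no class definition, only the column name.
        System.out.println(store.get(rowKey("Author", 1)).get("name"));
    }
}
```

Because the stored value does not depend on the Java class layout, a class can be refactored without rewriting the stored data, which is the decorrelation the slide refers to.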
Query

• Hibernate Search indexes entities
• Store Lucene indexes in Infinispan
• JP-QL to Lucene query transformation

• Works for simple queries
  • Lucene is not a relational SQL engine
What's next?


•   MongoDB
•   EHCache / Terracotta
•   Redis
•   Voldemort
•   Neo4J
•   Dynamo
•   ... Git? Spreadsheet? ...CapeDwarf?
Q&A



http://infinispan.org   @Infinispan
http://in.relation.to   @Hibernate
http://jboss.org        @SanneGrinovero

Infinispan,Lucene,Hibername OGM

  • 1. Padova, InfoCamere JBoss User Group 12 Aprile 2012
  • 2. Chi sono? • Team Hibernate Sanne Grinovero – Hibernate Search Italiano, Olandese, Newcastle Red Hat: JBoss, Engineering – Hibernate OGM • Infinispan – Infinispan Core – Infinispan Query – JGroups • Apache Lucene
  • 3. Infinispan • Cache distribuita • Datagrid scalabile e transazionale: performance estreme e cloud • NoSQL “DataBase”: key-value store – Come si interroga un data grid ? SELECT * FROM GRID
  • 5. Interrogare una “Grid” Object v = cache.get(“c7”);
  • 6. Senza chiave, non puoi ottenere il valore.
  • 7. É pratico il solo accesso per chiave?
  • 8. Test sulla mia libreria • Dov'é Hibernate Search in Action? • Mi passi ISBN 978-1- 933988-17-7 ? • Prendi i libri su Gaudí ?
  • 12. Come implementare queste funzioni su un Key/Value store? • Dov'é Hibernate Search in Action? • Mi passi ISBN 978-1-933988-17-7 ? • Trovi i libri su Gaudí ?
  • 13. document based NoSQL: Map/Reduce Infinispan non é propriamente document based ma offre Map/Reduce. Eppure non é escluso l'uso di JSON, XML, YAML, Java: public class Book implements Serializable { final String title; final String author; final String editor; public Book(String title, String author, String editor) { this.title = title; this.author = author; this.editor = editor; } }
  • 14. Iterate & collect class TitleBookSearcher implements Mapper<String, Book, String, Book> { final String title; public TitleBookSearcher(String t) { title = t; } public void map(String key, Book value, Collector collector){ if ( title.equals( value.title ) ) collector.emit( key, value ); } class BookReducer implements Reducer<String, Book> { public Book reduce(String reducedKey, Iterator<Book> iter) { return iter.next(); } }
  • 15. Implementare queste semplici funzioni: ✔ Trova “Hibernate Search in Action”? ✔ Trova per codice “ISBN 978-1-933988-17-7” ? ✗ Quanti libri a proposito di “Shakespeare” ? • Per uno score corretto in ricerche fulltext servono le frequenze dei frammenti di testo relative al corpus. • Il Pre-tagging é poco pratico e limitante
  • 16. Apache Lucene • Progetto open source Apache™ • Integrato in innumerevoli progetti • .. tra cui Hibernate via Hibernate Search • Clusterizzabile via Infinispan – Performance – Real time – High availability
  • 17. Cosa offre Lucene? • Ricerche per Similarity score • Analisi del testo – Sinonyms, Stopwords, Stemming, ... • Reusable declarative Filters • TermVectors • MoreLikeThis • Faceted Search • Veloce!
  • 18. Lucene: Stopwords a, able, about, across, after, all, almost, also, am, among, an, and, any, are, as, at, be, because, been, but, by, can, cannot, could, dear, did, do, does, either, else, ever, every, for, from, get, got, had, has, have, he, her, hers, him, his, how, however, i, if, in, into, is, it, its, just, least, let, like, likely, may, me, might, most, must, my, neither, no, nor, not, of, off, often, on, only, or, other, our, own, rather, said, say, says, she, should, since, so, some, than, that, the, their, them, then, there, these, they, this, tis, to, too, twas, us, wants, was, we, were, what, when, where, which, while, who, whom, why, will, with, would, yet, you, your
  • 21. Facciamo un bel motore di ricerca che restituisce i risultati in ordine alfabetico?
  • 23. Dov'é la fregatura? • Necessita di un indice: risorse fisiche e di amministrazione. – in memory – on filesystem – in Infinispan • Sostanzialmente immutable segments – Ottimizzato per data mining / query, non per updates. • Un mondo di stringhe e vettori di frequenze
  • 24. Infinispan Query quickstart • Abilita indexing=true nella configurazione • Aggiungi il modulo infinispan- query.jar al classpath • Annota i POJO inseriti nella cache per le modalitá di indicizzazione <dependency> <groupId>org.infinispan</groupId> <artifactId>infinispan-query</artifactId> <version>5.1.3.FINAL</version> </dependency>
  • 25. Configurazione tramite codice Configuration c = new Configuration() .fluent() .indexing() .addProperty( "hibernate.search.default.directory_provider", "ram") .build(); CacheManager manager = new DefaultCacheManager(c);
  • 26. Configurazione / XML <?xml version="1.0" encoding="UTF-8"?> <infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:infinispan:config:5.0 http://www.infinispan.org/schemas/infinispan-config-5.0.xsd" xmlns="urn:infinispan:config:5.0"> <default> <indexing enabled="true" indexLocalOnly="true"> <properties> <property name="hibernate.search.option1" value="..." /> <property name="hibernate.search.option2" value="..." /> </properties> </indexing> </default>
  • 27. Annotazioni sul modello @ProvidedId @Indexed public class Book implements Serializable { @Field String title; @Field String author; @Field String editor; public Book(String title, String author, String editor) { this.title = title; this.author = author; this.editor = editor; } }
  • 28. Esecuzione di Query SearchManager sm = Search.getSearchManager(cache); Query query = sm.buildQueryBuilderForClass(Book.class) .get() .phrase() .onField("title") .sentence("in action") .createQuery(); List<Object> list = sm.getQuery(query).list();
  • 29. Architettura • Integra Hibernate Search (engine) – Listener a eventi Hibernate & transazioni • Eventi Infinispan & transazioni – Mappa tipi Java e grafi del modello a Documents di Lucene – Thin-layer design
  • 31. Tests per Infinispan Query https://github.com/infinispan/infinispan
  • 32. org.apache.lucene.search.Query luceneQuery = queryBuilder.phrase() .onField( "description" ) .andField( "title" ) .sentence( "a book on highly scalable query engines" ) .enableFullTextFilter( “ready-for-shipping” ) .createQuery(); CacheQuery cacheQuery = searchManager.getQuery( luceneQuery, Book.class); List<Book> objectList = cacheQuery.list();
  • 34. Problemi di scalabilitá • Writer locks globali • Sharing su NFS molto problematico
  • 35. Queue-based clustering (filesystem)
  • 36. Index stored in Infinispan
  • 38. Quickstart Hibernate Search • Add the hibernate-search dependency: <dependency> <groupId>org.hibernate</groupId> <artifactId>hibernate-search-orm</artifactId> <version>4.1.0.Final</version> </dependency>
  • 39. Quickstart Hibernate Search • Everything else is optional: – How indexes are managed – Extension modules, custom Analyzers – Performance tuning – Custom type mapping – Clustering • JGroups • Infinispan • JMS
  • 40. Quickstart Hibernate Search @Entity public class Essay { @Id public Long getId() { return id; } public String getSummary() { return summary; } @Lob public String getText() { return text; } @ManyToOne public Author getAuthor() { return author; } ...
  • 41. Quickstart Hibernate Search @Entity @Indexed public class Essay { @Id public Long getId() { return id; } public String getSummary() { return summary; } @Lob public String getText() { return text; } @ManyToOne public Author getAuthor() { return author; } ...
  • 42. Quickstart Hibernate Search @Entity @Indexed public class Essay { @Id public Long getId() { return id; } @Field public String getSummary() { return summary; } @Lob public String getText() { return text; } @ManyToOne public Author getAuthor() { return author; } ...
  • 43. Quickstart Hibernate Search @Entity @Indexed public class Essay { @Id public Long getId() { return id; } @Field public String getSummary() { return summary; } @Lob @Field @Boost(0.8f) public String getText() { return text; } @ManyToOne public Author getAuthor() { return author; } ...
  • 44. Quickstart Hibernate Search @Entity @Indexed public class Essay { @Id public Long getId() { return id; } @Field public String getSummary() { return summary; } @Lob @Field @Boost(0.8f) public String getText() { return text; } @ManyToOne @IndexedEmbedded public Author getAuthor() { return author; } ...
  • 45. A second example @Entity public class Author { @Id @GeneratedValue private Integer id; private String name; @OneToMany private Set<Book> books; } @Entity public class Book { private Integer id; private String title; }
  • 46. Index structure @Entity @Indexed public class Author { @Id @GeneratedValue private Integer id; @Field(store=Store.YES) private String name; @OneToMany @IndexedEmbedded private Set<Book> books; } @Entity public class Book { private Integer id; @Field(store=Store.YES) private String title; }
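The effect of @IndexedEmbedded on the index structure can be sketched with plain collections. This is only an illustration, not the Lucene or Hibernate Search API: the class `IndexStructureSketch` and the method `toDocument` are hypothetical, and the `books.` prefix assumes the default naming of embedded fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an index document: field name -> stored values.
class IndexStructureSketch {

    // Flattens one author and the titles of that author's books into one document.
    static Map<String, List<String>> toDocument(String authorName, List<String> bookTitles) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        List<String> names = new ArrayList<>();
        names.add(authorName);
        doc.put("name", names);
        // Assumption: embedded fields are prefixed with the association name.
        doc.put("books.title", new ArrayList<>(bookTitles));
        return doc;
    }

    public static void main(String[] args) {
        List<String> titles = new ArrayList<>();
        titles.add("Hibernate Search in Action");
        System.out.println(toDocument("Some Author", titles));
    }
}
```

The point of the sketch is that the author and the embedded book titles end up in the same document, so a query on `books.title` can return authors.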
  • 49. About the results: • Managed POJOs: changes to entities are applied to both Lucene and the database • Familiar (standard) JPA pagination: – .setMaxResults( 20 ).setFirstResult( 100 ); • Type restrictions, polymorphic full-text queries: – .createQuery( luceneQuery, A.class, B.class, ..); • Projection • Result mapping
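What the pagination pair selects can be sketched on an in-memory list. This is a minimal illustration of the semantics only; `PaginationSketch` and `page` are hypothetical names, and a real query would apply the window inside the search engine rather than on a materialized list.

```java
import java.util.ArrayList;
import java.util.List;

class PaginationSketch {

    // Skips the first `firstResult` hits, then returns at most `maxResults` hits.
    static <T> List<T> page(List<T> hits, int firstResult, int maxResults) {
        int from = Math.min(Math.max(firstResult, 0), hits.size());
        int to = Math.min(from + Math.max(maxResults, 0), hits.size());
        return new ArrayList<>(hits.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < 130; i++) {
            hits.add(i);
        }
        // Same window as .setMaxResults( 20 ).setFirstResult( 100 )
        List<Integer> window = page(hits, 100, 20);
        System.out.println(window.size() + " hits, starting at " + window.get(0));
    }
}
```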
  • 51. Using Infinispan to distribute the indexes
  • 52. Clustering “direct” Lucene usage • Using org.apache.lucene – Traditionally hard to distribute across multiple nodes – On any cloud
  • 53. Single node: a rough idea of performance [charts: write ops/sec and queries/sec for RAMDirectory, Infinispan 0, Infinispan D4, Infinispan D40, FSDirectory, Infinispan Local]
  • 54. Multiple nodes: a rough idea of performance [charts: write ops/sec and queries/sec for RAMDirectory, Infinispan 0, Infinispan D4, Infinispan D40, FSDirectory, Infinispan Local]
  • 55. Writes don't scale?
  • 56. Tips for optimal performance • Tune chunk_size to how your index is actually used (avoiding fragmentation avoids read locks) • Check the size of network packets: blob size, JGroups packets, network interface and hardware. • Choose and configure a suitable CacheLoader
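Why chunk_size matters can be sketched with simple arithmetic, assuming an index file larger than the chunk size is stored as several fixed-size entries. The class `ChunkSizeSketch`, the method `chunkCount`, and the sizes used are hypothetical and are not Infinispan defaults.

```java
class ChunkSizeSketch {

    // Number of fixed-size chunks needed to hold a file of the given length.
    static long chunkCount(long fileLength, int chunkSize) {
        if (chunkSize <= 0 || fileLength < 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division: a partially filled last chunk still counts.
        return (fileLength + chunkSize - 1) / chunkSize;
    }

    public static void main(String[] args) {
        long segment = 10L * 1024 * 1024; // hypothetical 10 MB index segment
        System.out.println("16 KB chunks: " + chunkCount(segment, 16 * 1024));
        System.out.println("10 MB chunks: " + chunkCount(segment, 10 * 1024 * 1024));
    }
}
```

A chunk size close to the typical segment size keeps each file in a single entry, which is the fragmentation the tip refers to; very large chunks in turn produce larger network transfers, so the value is a trade-off.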
  • 57. Memory requirements • RAMDirectory: the whole index (and more) in RAM. • FSDirectory: a good OS does an excellent job of caching I/O, often better than RAMDirectory. • Infinispan: configurable, up to memory shared across nodes – Flexible – Fast – Network vs. disk
  • 58. Modules for scalable cloud deployments One Infinispan to rule them all – Store Lucene indexes – Hibernate second level cache – Application managed cache – Datagrid – EJB, session replication in AS7 – As a JPA “store” via Hibernate OGM
  • 59. Ingredients for the cloud • JGroups DISCOVERY protocol – MPING – TCP_PING – JDBC_PING – S3_PING • Choose a CacheLoader – Database based, JClouds, Cassandra, ...
  • 60. Near future • Simplify write scalability • Auto-tuning of clustering parameters – ergonomics! • Parallel searching: multi-core + multi-node • A component of – http://www.cloudtm.eu
  • 63. NoSQL: flexibility has a cost • Programming model • one per product :-( • no schema => app driven schema • query (Map Reduce, specific DSL, ...) • data structure transpires • Transaction • durability / consistency
  • 64. Example: Infinispan • Distributed Key/Value store (or Replicated, local only efficient cache, invalidating cache) • Each node is equal – Just start more nodes, or kill some • No bottlenecks by design • Cloud-network friendly – JGroups – And “cloud storage” friendly too!
  • 65. The ABC of Infinispan map.put( "user-34", userInstance ); map.get( "user-34" ); map.remove( "user-34" );
  • 66. It's a ConcurrentMap! map.put( "user-34", userInstance ); map.get( "user-34" ); map.remove( "user-34" ); map.putIfAbsent( "user-38", another );
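The ConcurrentMap contract itself can be tried with the JDK alone. The sketch below uses `java.util.concurrent.ConcurrentHashMap` as a local stand-in for a cache, so it shows the atomic operations of the interface and nothing about distribution; the class name `ConcurrentMapSketch` and the user values are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConcurrentMapSketch {

    static ConcurrentMap<String, String> demo() {
        // Stand-in for a cache: any ConcurrentMap offers the same operations.
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
        map.put("user-34", "alice");
        // Key already present: the existing value is kept.
        map.putIfAbsent("user-34", "bob");
        // Key absent: the value is stored.
        map.putIfAbsent("user-38", "carol");
        // Conditional replace: succeeds only if the current value matches.
        map.replace("user-38", "carol", "dave");
        // Conditional remove: no effect, because the current value is "alice".
        map.remove("user-34", "bob");
        return map;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```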
  • 67. A few more details about Infinispan ● Support for Transactions (XA) ● CacheLoaders ● Cassandra, JDBC, Amazon S3 (jclouds),... ● Tree API for JBossCache compatibility ● Lucene integration ● Two-fold ● Some Hibernate integrations ● Second level cache ● Hibernate Search indexing backend
  • 68. Goals of Hibernate OGM • Encourage new data usage patterns • Familiar environment • Ease of use – easy to jump in – easy to jump out • Push NoSQL exploration in enterprises • “PaaS for existing API” initiative
  • 69. What it is • JPA front end to key/value stores • Object CRUD (incl polymorphism and associations) • OO queries (JP-QL) • Reuses • Hibernate Core • Hibernate Search (and Lucene) • Infinispan • Is not a silver bullet • not for all NoSQL use cases
  • 70. Entities as serialized blobs? • Serialize objects into the (key) value • store the whole graph? • maintain consistency with duplicated objects • guaranteed identity a == b • concurrency / latency • structure change and (de)serialization, class definition changes
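The identity concern (a == b) is easy to reproduce with standard Java serialization: reading the same blob twice yields two distinct instances. This is a minimal sketch using only the JDK; `BlobIdentitySketch`, its nested `Book` class, and `roundTrip` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class BlobIdentitySketch {

    static class Book implements Serializable {
        private static final long serialVersionUID = 1L;
        final String title;
        Book(String title) { this.title = title; }
    }

    // Serializes the book to a byte[] blob and reads it back as a new object.
    static Book roundTrip(Book book) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(book);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Book) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Book original = new Book("Hibernate Search in Action");
        Book a = roundTrip(original);
        Book b = roundTrip(original);
        System.out.println("same title: " + a.title.equals(b.title));
        System.out.println("same instance: " + (a == b));
    }
}
```

Each read returns an equal but separate copy, so anything that relies on reference identity, or on one shared copy being updated, has to be handled by the layer above the store.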
  • 71. OGM’s approach to schema • Keep what’s best from relational model • as much as possible • tables / columns / pks • Decorrelate object structure from data structure • Data stored as (self-described) tuples • Core types limited • portability
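The idea of storing data as self-described tuples, rather than as serialized objects, can be sketched with a map of column names to core-type values. This is an illustration only and not Hibernate OGM's internal format; `TupleSketch`, `toTuple`, and the column names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TupleSketch {

    // Represents one Book row as column name -> value, using only core types.
    static Map<String, Object> toTuple(int id, String title, int authorId) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", id);
        tuple.put("title", title);
        // The association is a foreign-key style column, not a nested object.
        tuple.put("author_id", authorId);
        return tuple;
    }

    public static void main(String[] args) {
        System.out.println(toTuple(1, "Hibernate Search in Action", 42));
    }
}
```

Because the tuple names its own columns and holds only basic types, the stored data does not depend on the Java class definition, which is the decoupling of object structure from data structure described above.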
  • 73. Query • Hibernate Search indexes entities • Store Lucene indexes in Infinispan • JP-QL to Lucene query transformation • Works for simple queries • Lucene is not a relational SQL engine
  • 74. What's next? • MongoDB • EHCache / Terracotta • Redis • Voldemort • Neo4J • Dynamo • ... Git? Spreadsheet? ...CapeDwarf?
  • 75. Q&A http://infinispan.org @Infinispan http://in.relation.to @Hibernate http://jboss.org @SanneGrinovero