Multi-criteria Queries on a Cassandra Application
Jérôme Mainaud
Ippon Technologies © 2015#CassandraSummit
Who am I
Jérôme Mainaud
➔ @jxerome
➔ Software Architect at Ippon Technologies, Paris
➔ DataStax Solution Architect Certified
Ippon Technologies
● 200 software engineers in France and the US
➔ Paris, Nantes, Bordeaux
➔ Richmond (Virginia), Washington (DC)
● Expertise
➔ Digital, Big Data and Cloud
➔ Java & Agile
● Open-source projects:
➔ JHipster
➔ Tatami …
● @ipponusa
Agenda
1. Context
2. Technical Stack
3. Data Modeling
4. Implementation
5. Results
Warning
The following slideshow features data patterns and
code performed by professionals.
Accordingly, Ippon and conference organisers must
insist that no one attempt to recreate any data pattern
and code performed in this slideshow.
Once Upon a time an app …
SaaS invoicing application
➔ A single database for all users
➔ Data isolation for each user
High data volume
➔ 1 year
➔ 500 million invoices
➔ 2 billion invoice lines
Back-end evolution
Technical Stack
JHipster
➔ Spring Boot + AngularJS Application Generator
➔ Supports JPA and MongoDB
➔ and now Cassandra!
Let us generate the first version very quickly
➔ Application skeleton ready in 5 minutes
➔ Add entities: tables, objects and mapping
➔ Configuration, build, log management, etc.
➔ Gatling tests ready to use
http://jhipster.github.io
Spring Boot
➔ Built on Spring
➔ Convention over configuration
➔ Many “starters” ready to use
Web Services
➔ CXF instead of Spring MVC REST
Cassandra
➔ DataStax Enterprise
Java 8
JHipster — Code generator
● But
➔ Cassandra was not yet supported
➔ No AngularJS nor frontend
➔ CXF instead of Spring MVC
● JHipster alpha generator
➔ Secret generator used to validate concepts before writing the Yeoman generator
JHipster — Code generator
Julien Dubois
Code Generator
Cassandra Driver Configuration
Spring Boot Configuration
➔ No integration of the DataStax Java Driver in Spring Boot
➔ Created a Spring Boot autoconfiguration for the DataStax Java Driver
➔ Uses the standard YAML file
Proposed for Spring Boot 1.3
➔ GitHub ticket #2064 « Add a spring-boot-starter-data-cassandra »
➔ Still open
Improved by the Community
➔ The JHipster version was improved through pull requests
➔ Authentication, Load-Balancer config
Data Model
Conceptual Model
Physical Model
Table
create table invoice (
    invoice_id       timeuuid,
    user_id          uuid static,
    firstname        text static,
    lastname         text static,
    invoice_date     timestamp static,
    payment_date     timestamp static,
    total_amount     decimal static,
    delivery_address text static,
    delivery_city    text static,
    delivery_zipcode text static,
    item_id          timeuuid,
    item_label       text,
    item_price       decimal,
    item_qty         int,
    item_total       decimal,
    primary key (invoice_id, item_id)
);
Multi-criteria Search
Mandatory Criteria
➔ User (implicit)
➔ Invoice date (range of dates)
Additional Criteria
➔ Client lastname
➔ Client firstname
➔ City
➔ Zipcode
Paginated Result
Shall we use Solr?
● Integrated in DataStax Enterprise
● Atomic and Automatic Index update
● Full-Text Search
● We search on static columns
➔ Solr doesn’t support them
● We search for partitions
➔ Solr searches rows
Shall we use secondary indexes?
● Only one index is used per query
● Hard to get good performance with them
Index Table
Use index tables
➔ Partition key: mandatory criteria and one additional criterion
○ user_id
○ invoice day (truncated invoice date)
○ additional criterion
➔ Clustering column: invoice UUID
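The index-table layout described above can be pictured with a small in-memory model. This is only a sketch built on JDK collections, not Cassandra code: the class `InMemoryIndexTable` and its methods are hypothetical names, the partition key is modelled as the list (user id, invoice day, value), and a `TreeSet` in reverse natural order stands in for the clustering column (Cassandra orders `timeuuid` values by timestamp, which this sketch does not reproduce).

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.UUID;

/** Hypothetical in-memory stand-in for one index table such as invoice_by_lastname. */
final class InMemoryIndexTable {
    // Partition key (user_id, invoice_day, value) -> clustering column (invoice ids, descending).
    private final Map<List<Object>, NavigableSet<UUID>> partitions = new HashMap<>();

    /** Adds one index entry; entries with the same key land in the same partition. */
    void insert(UUID userId, LocalDate invoiceDay, String value, UUID invoiceId) {
        List<Object> key = Arrays.<Object>asList(userId, invoiceDay, value);
        partitions
            .computeIfAbsent(key, k -> new TreeSet<UUID>(Collections.reverseOrder()))
            .add(invoiceId);
    }

    /** Exact-match lookup: one call reads exactly one partition. */
    NavigableSet<UUID> find(UUID userId, LocalDate invoiceDay, String value) {
        List<Object> key = Arrays.<Object>asList(userId, invoiceDay, value);
        return partitions.getOrDefault(key, Collections.emptyNavigableSet());
    }
}
```

Because the additional criterion is part of the partition key, only exact matches on that value can be looked up, and a date range has to be walked one day at a time.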
Materialized view (new in Cassandra 3.0)
CREATE MATERIALIZED VIEW invoice_by_firstname
AS
    SELECT invoice_id
    FROM invoice
    WHERE firstname IS NOT NULL
    PRIMARY KEY ((user_id, invoice_day, firstname), invoice_id)
    WITH CLUSTERING ORDER BY (invoice_id DESC)
Parallel search on indexes
➔ Results merged in memory by the application
Parallel item detail queries
Result Page (id)
Search
Search on a date range
➔ Loop over every day in the range and stop when there are enough results for a page
Search Complexity
Query count
➔ For each day in the date range
○ 1 query per additional criterion filled in (one partition per query)
➔ 1 query per item in the result page (one partition per query)
Search Complexity
➔ partitions by query
Example: 3 criteria, 7 days, 100 items per page
➔ query count ≤ 3 × 7 + 100 = 121
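The arithmetic behind this bound can be written as a small helper. This is a sketch only: `QueryCountBound` and `upperBound` are hypothetical names, and the rule that a search without any additional criterion still costs one index query per day is an assumption drawn from the by-day repository fallback in the search service code. The result is an upper bound because the day loop stops as soon as a page is full.

```java
/** Hypothetical helper reproducing the query-count arithmetic from the slide. */
final class QueryCountBound {
    private QueryCountBound() {}

    /**
     * Upper bound on the number of single-partition queries for one result page.
     *
     * @param filledCriteria number of additional criteria filled in by the user
     * @param daysInRange    number of days covered by the mandatory date range
     * @param pageSize       number of invoices per result page
     */
    static long upperBound(int filledCriteria, int daysInRange, int pageSize) {
        // Assumption: with no additional criterion, one by-day index is queried per day.
        long indexQueriesPerDay = Math.max(1, filledCriteria);
        long indexQueries = indexQueriesPerDay * daysInRange;
        // One detail query per invoice shown on the page.
        return indexQueries + pageSize;
    }

    public static void main(String[] args) {
        System.out.println(upperBound(3, 7, 100)); // 3 × 7 + 100 = 121
    }
}
```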
JAVA
Indexes
Index — Instances
@Repository
public class InvoiceByLastNameRepository extends IndexRepository<String> {
    public InvoiceByLastNameRepository() {
        super("invoice_by_lastname", "lastname", Invoice::getLastName, Criteria::getLastName);
    }
}

@Repository
public class InvoiceByFirstNameRepository extends IndexRepository<String> {
    public InvoiceByFirstNameRepository() {
        super("invoice_by_firstname", "firstname", Invoice::getFirstName, Criteria::getFirstName);
    }
}
Index — Parent Class
public class IndexRepository<T> {
    @Inject
    private Session session;

    private final String tableName;                          // index table name
    private final String valueName;                          // indexed column name
    private final Function<Invoice, T> valueGetter;          // reads the indexed value from an invoice
    private final Function<Criteria, T> criteriumGetter;     // reads the searched value from the criteria

    private PreparedStatement insertStmt;
    private PreparedStatement findStmt;
    private PreparedStatement findWithOffsetStmt;

    @PostConstruct
    public void init() { /* initialize PreparedStatements */ }
Index — Insert
@Override
public void insert(Invoice invoice) {
    T value = valueGetter.apply(invoice);
    // Invoices without a value for this column are simply not indexed.
    if (value != null) {
        session.execute(
            insertStmt.bind(
                invoice.getUserId(),
                Dates.toDate(invoice.getInvoiceDay()),
                value,
                invoice.getId()));
    }
}
Index — Insert — Prepare Statement
insertStmt = session.prepare(
    QueryBuilder.insertInto(tableName)
        .value("user_id", bindMarker())
        .value("invoice_day", bindMarker())
        .value(valueName, bindMarker())
        .value("invoice_id", bindMarker())
);
Index — Insert — Date conversion
public static Date toDate(LocalDate date) {
    // Midnight at the start of the day, in the JVM's default time zone.
    return date == null ? null :
        Date.from(date.atStartOfDay(ZoneId.systemDefault()).toInstant());
}
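Because the invoice day is part of the partition key, the conversion has to yield the same instant on every node that writes or reads the index. The following sketch (an illustration, not the deck's code) passes the zone explicitly instead of relying on the JVM default; `InvoiceDays` and the two-argument `toDate` are hypothetical names.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.Date;

/** Hypothetical variant of the date conversion with an explicit time zone. */
final class InvoiceDays {
    private InvoiceDays() {}

    /** Returns midnight at the start of the given day in the given zone, or null for a null day. */
    static Date toDate(LocalDate day, ZoneId zone) {
        return day == null ? null : Date.from(day.atStartOfDay(zone).toInstant());
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2015, 9, 22);
        Date utc = toDate(day, ZoneOffset.UTC);
        Date paris = toDate(day, ZoneId.of("Europe/Paris"));
        // Paris was at UTC+2 on that day, so the two keys differ by two hours.
        System.out.println(utc.getTime() - paris.getTime()); // 7200000
    }
}
```

Two application servers configured with different default zones would therefore compute different partition keys for the same calendar day.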
Index — Search
@Override
public CompletableFuture<Iterator<UUID>> find(Criteria criteria, LocalDate day, UUID offset) {
    T criterium = criteriumGetter.apply(criteria);
    if (criterium == null) {
        // Criterion not filled in: this index does not take part in the search.
        return CompletableFuture.completedFuture(null);
    }
    BoundStatement stmt;
    if (offset == null) {
        stmt = findStmt.bind(criteria.getUserId(), Dates.toDate(day), criterium);
    } else {
        // Resume from the invoice id where the previous page stopped.
        stmt = findWithOffsetStmt.bind(criteria.getUserId(), Dates.toDate(day), criterium, offset);
    }
    return Jdk8.completableFuture(session.executeAsync(stmt))
        .thenApply(rs -> Iterators.transform(rs.iterator(), row -> row.getUUID(0)));
}
Index — Search — Prepare Statement
findWithOffsetStmt = session.prepare(
    QueryBuilder.select()
        .column("invoice_id")
        .from(tableName)
        .where(eq("user_id", bindMarker()))
        .and(eq("invoice_day", bindMarker()))
        .and(eq(valueName, bindMarker()))
        .and(lte("invoice_id", bindMarker()))
);
Index — Search (Guava to Java 8)
public static <T> CompletableFuture<T> completableFuture(ListenableFuture<T> guavaFuture) {
    CompletableFuture<T> future = new CompletableFuture<>();
    Futures.addCallback(guavaFuture, new FutureCallback<T>() {
        @Override
        public void onSuccess(T result) {
            future.complete(result);
        }

        @Override
        public void onFailure(Throwable t) {
            future.completeExceptionally(t);
        }
    });
    return future;
}
JAVA
Search Service
Service — Class
@Service
public class InvoiceSearchService {
    @Inject
    private InvoiceRepository invoiceRepository;
    @Inject
    private InvoiceByDayRepository byDayRepository;
    @Inject
    private InvoiceByLastNameRepository byLastNameRepository;
    @Inject
    private InvoiceByFirstNameRepository byFirstNameRepository;
    @Inject
    private InvoiceByCityRepository byCityRepository;
    @Inject
    private InvoiceByZipCodeRepository byZipCodeRepository;
Service — Search
public ResultPage findByCriteria(Criteria criteria) {
    return byDateInteval(criteria, (crit, day, offset) -> {
        CompletableFuture<Iterator<UUID>> futureUuidIt;
        if (crit.hasIndexedCriteria()) {
            /*
             * ... Doing multi-criteria search; see next slide ...
             */
        } else {
            futureUuidIt = byDayRepository.find(crit.getUserId(), day, offset);
        }
        return futureUuidIt;
    });
}
Service — Search
// Query every index in parallel; indexes whose criterion is not filled in complete with null.
CompletableFuture<Iterator<UUID>>[] futures = Stream.<IndexRepository>of(
        byLastNameRepository, byFirstNameRepository, byCityRepository, byZipCodeRepository)
    .map(repo -> repo.find(crit, day, offset))
    .toArray(CompletableFuture[]::new);
// Once all queries are done, keep only the invoice ids present in every non-null result.
futureUuidIt = CompletableFuture.allOf(futures).thenApply(v ->
    Iterators.intersection(TimeUUIDComparator.desc,
        Stream.of(futures)
            .map(CompletableFuture::join)
            .filter(Objects::nonNull)
            .collect(Collectors.toList())));
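The deck does not show `Iterators.intersection`; a comparator-based intersection like this is presumably a project helper rather than a library call. The sketch below shows one way such a lazy merge could work with JDK types only, assuming every source iterator is already sorted according to the same comparator. The class and method names (`SortedIntersection`, `intersection`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical lazy intersection of iterators that are all sorted by the same comparator. */
final class SortedIntersection {
    private SortedIntersection() {}

    static <T> Iterator<T> intersection(Comparator<? super T> order,
                                        List<? extends Iterator<T>> sources) {
        final int n = sources.size();
        // Current (most recently read) element of each source.
        final List<T> heads = new ArrayList<>(Collections.<T>nCopies(n, null));
        return new Iterator<T>() {
            private T pending;
            private boolean hasPending;
            private boolean exhausted = (n == 0);

            @Override
            public boolean hasNext() {
                if (!hasPending && !exhausted) {
                    computeNext();
                }
                return hasPending;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                hasPending = false;
                return pending;
            }

            private void computeNext() {
                // Read one fresh element from every source.
                for (int i = 0; i < n; i++) {
                    if (!pull(i)) return;
                }
                while (true) {
                    // The head that sorts last is the only possible common element.
                    T last = heads.get(0);
                    for (T head : heads) {
                        if (order.compare(head, last) > 0) last = head;
                    }
                    boolean aligned = true;
                    for (int i = 0; i < n; i++) {
                        // Skip elements that sort before the candidate.
                        while (order.compare(heads.get(i), last) < 0) {
                            if (!pull(i)) return;
                        }
                        if (order.compare(heads.get(i), last) > 0) aligned = false;
                    }
                    if (aligned) {
                        pending = last;
                        hasPending = true;
                        return;
                    }
                }
            }

            /** Advances source i; marks the intersection as finished when it runs dry. */
            private boolean pull(int i) {
                Iterator<T> source = sources.get(i);
                if (!source.hasNext()) {
                    exhausted = true;
                    return false;
                }
                heads.set(i, source.next());
                return true;
            }
        };
    }
}
```

The merge stops as soon as any source is exhausted, so it never reads more rows than it needs from the other result sets.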
Service — UUIDs Comparator
/**
 * TimeUUID Comparator equivalent to Cassandra’s Comparator:
 * @see org.apache.cassandra.db.marshal.TimeUUIDType#compare()
 */
public enum TimeUUIDComparator implements Comparator<UUID> {
    desc {
        @Override
        public int compare(UUID o1, UUID o2) {
            // Most recent timestamp first.
            long delta = o2.timestamp() - o1.timestamp();
            if (delta != 0)
                return Ints.saturatedCast(delta);
            return o2.compareTo(o1);
        }
    };
}
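A timestamp-based comparator is needed because the natural order of `java.util.UUID` compares the raw most-significant bits, where the low part of the timestamp comes first. The JDK-only sketch below mirrors the comparator above without Guava and shows one case where the two orders disagree. `TimeUuids`, `DESC` and `withTimestamp` are hypothetical names, and the tie-break on equal timestamps simply follows the slide; it is not checked here against Cassandra's own implementation.

```java
import java.util.Comparator;
import java.util.UUID;

/** Hypothetical JDK-only counterpart of the descending time-UUID comparator. */
final class TimeUuids {
    private TimeUuids() {}

    /** Newest first: by embedded timestamp, then by reversed natural UUID order. */
    static final Comparator<UUID> DESC = (a, b) -> {
        int byTime = Long.compare(b.timestamp(), a.timestamp());
        return byTime != 0 ? byTime : b.compareTo(a);
    };

    /** Test helper: builds a version 1 UUID carrying the given 60-bit timestamp. */
    static UUID withTimestamp(long timestamp, long leastSigBits) {
        long timeLow = timestamp & 0xFFFFFFFFL;          // bits 0-31 of the timestamp
        long timeMid = (timestamp >>> 32) & 0xFFFFL;     // bits 32-47
        long timeHigh = (timestamp >>> 48) & 0x0FFFL;    // bits 48-59
        long mostSigBits = (timeLow << 32) | (timeMid << 16) | (1L << 12) | timeHigh;
        return new UUID(mostSigBits, leastSigBits);
    }

    public static void main(String[] args) {
        long node = 0x8000000000000001L;
        UUID newer = withTimestamp(0x100000000L, node);
        UUID older = withTimestamp(0x7FFFFFFFL, node);
        // Natural order ranks "newer" below "older"; the time-based order does not.
        System.out.println(newer.compareTo(older) < 0);
        System.out.println(DESC.compare(newer, older) < 0);
    }
}
```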
Service — Days Loop
@FunctionalInterface
private static interface DayQuery {
    CompletableFuture<Iterator<UUID>> find(Criteria criteria, LocalDate day, UUID invoiceIdOffset);
}

private ResultPage byDateInteval(Criteria criteria, DayQuery dayQuery) {
    int limit = criteria.getLimit();
    List<Invoice> resultList = new ArrayList<>(limit);
    LocalDate dayOffset = criteria.getDayOffset();
    UUID invoiceIdOffset = criteria.getInvoiceIdOffset();
    /* ... Loop on days ; to be seen in next slide ... */
    return new ResultPage(resultList);
}
Service — Days Loop
// Walk the date range backwards, most recent day first.
LocalDate day = criteria.getLastDay();
do {
    Iterator<UUID> uuidIt = dayQuery.find(criteria, day, invoiceIdOffset).join();
    limit -= loadInvoices(resultList, uuidIt, limit);
    if (uuidIt.hasNext()) {
        // Page is full: remember where the next page has to start.
        return new ResultPage(resultList, day, uuidIt.next());
    }
    day = day.minusDays(1);
    invoiceIdOffset = null;
} while (!day.isBefore(criteria.getFirstDay()));
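The paging behaviour of this loop can be tried out with an in-memory stand-in for the index. In the sketch below, integers replace invoice UUIDs and a map replaces the per-day index partitions; `DayLoopPaging`, `Page` and `page` are hypothetical names. Starting from the day offset of the previous page is an assumption, since the slide elides how `dayOffset` is used.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory illustration of day-by-day paging with a (day, id) offset. */
final class DayLoopPaging {
    private DayLoopPaging() {}

    /** One page of ids plus the position where the next page starts (null when finished). */
    static final class Page {
        final List<Integer> ids;
        final LocalDate nextDay;
        final Integer nextId;

        Page(List<Integer> ids, LocalDate nextDay, Integer nextId) {
            this.ids = ids;
            this.nextDay = nextDay;
            this.nextId = nextId;
        }
    }

    /** idsByDay maps each day to its ids in descending order, like the index clustering order. */
    static Page page(Map<LocalDate, List<Integer>> idsByDay, LocalDate firstDay, LocalDate lastDay,
                     LocalDate dayOffset, Integer idOffset, int limit) {
        List<Integer> result = new ArrayList<>();
        // Assumption: a follow-up page resumes on the day where the previous one stopped.
        LocalDate day = dayOffset != null ? dayOffset : lastDay;
        Integer offset = idOffset;
        while (!day.isBefore(firstDay)) {
            List<Integer> dayIds = idsByDay.getOrDefault(day, Collections.emptyList());
            Iterator<Integer> it = fromOffset(dayIds, offset);
            while (result.size() < limit && it.hasNext()) {
                result.add(it.next());
            }
            if (it.hasNext()) {
                // Page is full and this day still has entries: the next page starts here.
                return new Page(result, day, it.next());
            }
            day = day.minusDays(1);
            offset = null; // the id offset only applies to the first day of a page
        }
        return new Page(result, null, null);
    }

    /** Keeps ids less than or equal to the offset, like the "lte" clause on invoice_id. */
    private static Iterator<Integer> fromOffset(List<Integer> descendingIds, Integer offset) {
        if (offset == null) {
            return descendingIds.iterator();
        }
        List<Integer> kept = new ArrayList<>();
        for (Integer id : descendingIds) {
            if (id <= offset) kept.add(id);
        }
        return kept.iterator();
    }
}
```

Each page hands back the day and id at which the following page begins, which is why the approach cannot report a total item count without extra queries.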
Service — Invoices Loading
private int loadInvoices(List<Invoice> resultList, Iterator<UUID> uuidIt, int limit) {
    List<CompletableFuture<Invoice>> futureList = new ArrayList<>(limit);
    // Start all detail queries first so that they run in parallel...
    for (int i = 0; i < limit && uuidIt.hasNext(); ++i) {
        futureList.add(invoiceRepository.findOne(uuidIt.next()));
    }
    // ...then wait for them in order.
    futureList.stream()
        .map(CompletableFuture::join)
        .forEach(resultList::add);
    return futureList.size();
}
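The same "start everything, then join in order" pattern can be run without Cassandra. In this sketch the asynchronous lookup is passed in as a function, so a plain `CompletableFuture.supplyAsync` call can stand in for the repository's `findOne`; `ParallelLoad` and `loadPage` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical generic version of the parallel detail loading step. */
final class ParallelLoad {
    private ParallelLoad() {}

    /**
     * Loads at most {@code limit} items: all lookups are started before any result is awaited,
     * and results are appended in the order of the ids.
     *
     * @return the number of items loaded
     */
    static <K, V> int loadPage(List<V> out, Iterator<K> ids, int limit,
                               Function<K, CompletableFuture<V>> findOne) {
        List<CompletableFuture<V>> futures = new ArrayList<>();
        for (int i = 0; i < limit && ids.hasNext(); i++) {
            futures.add(findOne.apply(ids.next()));
        }
        for (CompletableFuture<V> future : futures) {
            out.add(future.join());
        }
        return futures.size();
    }
}
```

Ids beyond the limit are left unread in the iterator, which is what lets the day loop detect that another page exists.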
Results
Limits
● We get exact-match search only
➔ No full-text search
➔ No "starts with" search
➔ No pattern-based search
● Requires highly discriminating mandatory criteria
➔ user_id & invoice_day
● Pagination doesn’t give the total item count
➔ Could be done at the cost of additional queries
● No sorting available
Hardware
● Hosted by Ippon Hosting
● 8 nodes
➔ 16 GB RAM
➔ Two SSD drives with 256 GB in RAID 0
● 6 nodes dedicated to Cassandra cluster
● 2 nodes dedicated to the application
Application
● 5,000 concurrent users
● 9 months of data loaded
➔ Legacy system: stores 1 year; searches the last 3 months only
➔ Target: 3 years of history
● Real-time search results
➔ Data is immediately available
➔ Legacy system: data available the next day
● Cost Killer
Q & A
PARIS
BORDEAUX
NANTES
WASHINGTON
NEW-YORK
RICHMOND
contact@ippon.fr
www.ippon.fr - www.ippon-hosting.com - www.ippon-digital.fr
@ippontech
-
01 46 12 48 48

Multi criteria queries on a cassandra application

  • 1. Multi-criteria Queries on a Cassandra Application Jérôme Mainaud
  • 2. Ippon Technologies © 2015#CassandraSummit Who am I Jérôme Mainaud ➔ @jxerome ➔ Software Architect at Ippon Technologies, Paris ➔ DataStax Solution Architect Certified
  • 3. Ippon Technologies © 2015#CassandraSummit Ippon Technologies ● 200 software engineers in France and the US ➔ Paris, Nantes, Bordeaux ➔ Richmond (Virginia), Washington (DC) ● Expertise ➔ Digital, Big Data and Cloud ➔ Java & Agile ● Open-source Projects : ➔ JHipster, ➔ Tatami … ● @ipponusa
  • 4. Agenda 1. Context 2. Technical Stack 3. Modelisation 4. Implementation 5. Results
  • 5. Ippon Technologies © 2015 Warning The following slideshow features data patterns and code performed by professionals. Accordingly, Ippon and conference organisers must insist that no one attempt to recreate any data pattern and code performed in this slideshow.
  • 6. Once Upon a time an app …
  • 7. Ippon Technologies © 2015#CassandraSummit Once Upon a time an app … Invoice application in SAAS ➔ A single database for all users ➔ Data isolation for each user High volume data ➔ 1 year ➔ 500 millions invoices ➔ 2 billions invoice lines
  • 8. Ippon Technologies © 2015#CassandraSummit Once Upon a time an app …
  • 9. Ippon Technologies © 2015#CassandraSummit Once Upon a time an app …
  • 10. Ippon Technologies © 2015#CassandraSummit Back-end evolution
  • 12. Ippon Technologies © 2015#CassandraSummit Technical Stack JHipster ➔ Spring Boot + AngularJS Application Generator ➔ Support JPA, MongoDB ➔ and now Cassandra! Made us generate first version very fast ➔ Application skeleton ready in 5 minutes ➔ Add entities tables, objets and mapping ➔ Configuration, build, logs management, etc. ➔ Gatling Tests ready to use http://jhipster.github.io
  • 13. Ippon Technologies © 2015#CassandraSummit Technical Stack Spring Boot ➔ Build on Spring ➔ Convention over configuration ➔ Many “starters” ready to use Services Web ➔ CXF instead of Spring MVC REST Cassandra ➔ DataStax Enterprise Java 8
  • 14. Ippon Technologies © 2015#CassandraSummit JHipster — Code generator ● But ➔ Cassandra was not yet supported ➔ No AngularJS nor frontend ➔ CXF instead of Spring MVC
  • 15. Ippon Technologies © 2015#CassandraSummit JHipster — Code generator ● But ➔ Cassandra was not yet supported ➔ No AngularJS nor frontend ➔ CXF instead of Spring MVC ● JHipster alpha generator ➔ Secret Generator secret used to validate concepts before writing Yeoman generator
  • 16. Ippon Technologies © 2015#CassandraSummit JHipster — Code generator Julien Dubois Code Generator
  • 17. Ippon Technologies © 2015#CassandraSummit Cassandra Driver Configuration Spring Boot Configuration ➔ No integration of driver DataStax Java Driver in Spring Boot ➔ Created Spring Boot autoconfiguration of DataStax Java Driver ➔ Use the standard YAML File Offered to Spring Boot 1.3 ➔ Github ticket #2064 « Add a spring-boot-starter-data-cassandra » ➔ Still opened Improved by the Community ➔ JHipster version was improved by pull-request ➔ Authentication, Load-Balancer config
  • 19. Ippon Technologies © 2015#CassandraSummit Conceptual Model
  • 20. Ippon Technologies © 2015#CassandraSummit Physical Model
  • 21. Ippon Technologies © 2015#CassandraSummit create table invoice ( invoice_id timeuuid, user_id uuid static, firstname text static, lastname text static, invoice_date timestamp static, payment_date timestamp static, total_amount decimal static, delivery_address text static, delivery_city text static, delivery_zipcode text static, item_id timeuuid, item_label text, item_price decimal, item_qty int, item_total decimal, primary key (invoice_id, item_id) ); Table
  • 23. Ippon Technologies © 2015#CassandraSummit Multi-criteria Search Mandatory Criteria ➔ User (implicit) ➔ Invoice date (range of dates) Additional Criteria ➔ Client lastname ➔ Client firstname ➔ City ➔ Zipcode Paginated Result
  • 24. Ippon Technologies © 2015#CassandraSummit Shall we use Solr ?
  • 25. Ippon Technologies © 2015#CassandraSummit Shall we use Solr ? ● Integrated in DataStax Enterprise ● Atomic and Automatic Index update ● Full-Text Search
  • 26. Ippon Technologies © 2015#CassandraSummit Shall we use Solr ? ● We search on static columns ➔ Solr don’t support them ● We search partitions ➔ Solr search lines
  • 27. Ippon Technologies © 2015#CassandraSummit Shall we use Solr ? ● We search on static columns ➔ Solr don’t support them ● We search partitions ➔ Solr search lines
  • 28. Ippon Technologies © 2015#CassandraSummit Shall we use secondary indexes ? ● Only one index used for a query ● Hard to get good performance with them
  • 29. Ippon Technologies © 2015#CassandraSummit Index Table Use index tables ➔ Partition Key : Mandatory criteria and one additional criterium ○ user_id ○ invoice day (truncated invoice date) ○ additional criterium ➔ Clustering columns : Invoice UUID
  • 30. Ippon Technologies © 2015#CassandraSummit Index Table
  • 31. Ippon Technologies © 2015#CassandraSummit Materialized view CREATE MATERIALIZED VIEW invoice_by_firstname AS SELECT invoice_id FROM invoice WHERE firstname IS NOT NULL PRIMARY KEY ((user_id, invoice_day, firstname), invoice_id) WITH CLUSTERING ORDER BY (invoice_id DESC) new in 3.0
  • 32. Ippon Technologies © 2015#CassandraSummit Parallel Search on indexes in memory merge by application
  • 33. Ippon Technologies © 2015#CassandraSummit Parallel item detail queries Result Page (id)
  • 34. Ippon Technologies © 2015#CassandraSummit Search Search on date range ➔ loop an every days in the range and stop when there is enough result for a page
  • 35. Ippon Technologies © 2015#CassandraSummit Search Complexity Query count ➔ For each day in date range ○ 1 query per additional criterium filled (partition by query) ➔ 1 query per item in result page (partition by query) Search Complexity ➔ partitions by query Example: 3 criteria, 7 days, 100 items per page ➔ query count ≤ 3 × 7 + 100 = 121
  • 37. Ippon Technologies © 2015#CassandraSummit Index — Instances @Repository public class InvoiceByLastNameRepository extends IndexRepository<String> { public InvoiceByLastNameRepository() { super("invoice_by_lastname", "lastname", Invoice::getLastName, Criteria::getLastName); } } @Repository public class InvoiceByFirstNameRepository extends IndexRepository<String> { public InvoiceByFirstNameRepository() { super("invoice_by_firstname", "firstname", Invoice::getFirstName, Criteria::getFirstName); } }
  • 38. Ippon Technologies © 2015#CassandraSummit Index — Parent Class public class IndexRepository<T> { @Inject private Session session; private final String tableName; private final String valueName; private final Function<Invoice, T> valueGetter; private final Function<Criteria, T> criteriumGetter; private PreparedStatement insertStmt; private PreparedStatement findStmt; private PreparedStatement findWithOffsetStmt; @PostConstruct public void init() { /* initialize PreparedStatements */ }
  • 39. Ippon Technologies © 2015#CassandraSummit Index — Insert @Override public void insert(Invoice invoice) { T value = valueGetter.apply(invoice); if (value != null) { session.execute( insertStmt.bind( invoice.getUserId(), Dates.toDate(invoice.getInvoiceDay()), value, invoice.getId())); } }
  • 40. Ippon Technologies © 2015#CassandraSummit Index — Insert — Prepare Statement insertStmt = session.prepare( QueryBuilder.insertInto(tableName) .value("user_id", bindMarker()) .value("invoice_day", bindMarker()) .value(valueName, bindMarker()) .value("invoice_id", bindMarker()) );
  • 41. Ippon Technologies © 2015#CassandraSummit Index — Insert — Date conversion public static Date toDate(LocalDate date) { return date == null ? null : Date.from(date.atStartOfDay().atZone(ZoneOffset.systemDefault()).toInstant()); }
  • 42. Ippon Technologies © 2015#CassandraSummit Index — Search @Override public CompletableFuture<Iterator<UUID>> find(Criteria criteria, LocalDate day, UUID offset) { T criterium = criteriumGetter.apply(criteria); if (criterium == null) { return CompletableFuture.completedFuture(null); } BoundStatement stmt; if (invoiceIdOffset == null) { stmt = findStmt.bind(criteria.getUserId(), Dates.toDate(day), criterium); } else { stmt = findWithOffsetStmt.bind(criteria.getUserId(), Dates.toDate(day), criterium, offset); } return Jdk8.completableFuture(session.executeAsync(stmt)) .thenApply(rs -> Iterators.transform(rs.iterator(), row -> row.getUUID(0))); }
  • 43. Ippon Technologies © 2015#CassandraSummit Index — Search — Prepare Statement findWithOffsetStmt = session.prepare( QueryBuilder.select() .column("invoice_id") .from(tableName) .where(eq("user_id", bindMarker())) .and(eq("invoice_day", bindMarker())) .and(eq(valueName, bindMarker())) .and(lte("invoice_id", bindMarker())) );
  • 44. Index — Search (Guava to Java 8)
      // Adapts a Guava ListenableFuture to a Java 8 CompletableFuture
      public static <T> CompletableFuture<T> completableFuture(ListenableFuture<T> guavaFuture) {
          CompletableFuture<T> future = new CompletableFuture<>();
          Futures.addCallback(guavaFuture, new FutureCallback<T>() {
              @Override public void onSuccess(T result) { future.complete(result); }
              @Override public void onFailure(Throwable t) { future.completeExceptionally(t); }
          });
          return future;
      }
  • 46. Service — Class
      @Service
      public class InvoiceSearchService {
          @Inject private InvoiceRepository invoiceRepository;
          @Inject private InvoiceByDayRepository byDayRepository;
          @Inject private InvoiceByLastNameRepository byLastNameRepository;
          @Inject private InvoiceByFirstNameRepository byFirstNameRepository;
          @Inject private InvoiceByCityRepository byCityRepository;
          @Inject private InvoiceByZipCodeRepository byZipCodeRepository;
  • 47. Service — Search
      public ResultPage findByCriteria(Criteria criteria) {
          return byDateInteval(criteria, (crit, day, offset) -> {
              CompletableFuture<Iterator<UUID>> futureUuidIt;
              if (crit.hasIndexedCriteria()) {
                  /* ... Doing multi-criteria search; see next slide ... */
              } else {
                  // No indexed criterion: list every invoice of the day
                  futureUuidIt = byDayRepository.find(crit.getUserId(), day, offset);
              }
              return futureUuidIt;
          });
      }
  • 48. Service — Search
      // Query every index in parallel
      CompletableFuture<Iterator<UUID>>[] futures = Stream.<IndexRepository> of(
              byLastNameRepository, byFirstNameRepository, byCityRepository, byZipCodeRepository)
          .map(repo -> repo.find(crit, day, offset))
          .toArray(CompletableFuture[]::new);
      // Once all have answered, intersect the sorted id iterators (null = criterion not used)
      futureUuidIt = CompletableFuture.allOf(futures).thenApply(v ->
          Iterators.intersection(TimeUUIDComparator.desc,
              Stream.of(futures)
                  .map(CompletableFuture::join)
                  .filter(Objects::nonNull)
                  .collect(Collectors.toList())));
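The slide calls `Iterators.intersection` without showing it, and its implementation is not part of the deck. The following is only a sketch of how such a lazy intersection of sorted iterators could work, assuming every source is sorted by the same comparator; `SortedIntersection` and its members are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the intersection helper used on the slide. */
final class SortedIntersection {
    private SortedIntersection() {}

    /** Lazily yields the elements found in every source; each source must be sorted by order. */
    static <T> Iterator<T> intersection(Comparator<? super T> order, List<Iterator<T>> sources) {
        if (sources.isEmpty()) {
            return Collections.emptyIterator();
        }
        return new Iterator<T>() {
            // Current element of each source
            private final List<T> heads = new ArrayList<>(Collections.<T>nCopies(sources.size(), null));
            private T nextValue;
            private boolean ready;
            private boolean done;

            @Override public boolean hasNext() {
                if (!ready && !done) {
                    computeNext();
                }
                return ready;
            }

            @Override public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                return nextValue;
            }

            /** Moves source i to its next element; returns false when it is exhausted. */
            private boolean advance(int i) {
                Iterator<T> it = sources.get(i);
                if (!it.hasNext()) {
                    return false;
                }
                heads.set(i, it.next());
                return true;
            }

            private void computeNext() {
                // Step every source once (first call, or just after a match was returned)
                for (int i = 0; i < sources.size(); i++) {
                    if (!advance(i)) { done = true; return; }
                }
                while (true) {
                    // The head that sorts last is the only possible common element
                    T target = heads.get(0);
                    for (T head : heads) {
                        if (order.compare(head, target) > 0) target = head;
                    }
                    boolean allEqual = true;
                    for (int i = 0; i < sources.size(); i++) {
                        // Skip elements that sort before the target
                        while (order.compare(heads.get(i), target) < 0) {
                            if (!advance(i)) { done = true; return; }
                        }
                        if (order.compare(heads.get(i), target) > 0) allEqual = false;
                    }
                    if (allEqual) { nextValue = target; ready = true; return; }
                }
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Integer> it = intersection(Comparator.<Integer>reverseOrder(), Arrays.asList(
            Arrays.asList(9, 7, 5, 3, 1).iterator(),
            Arrays.asList(8, 7, 5, 2, 1).iterator()));
        while (it.hasNext()) System.out.println(it.next());
    }
}
```

Because the iterator is lazy, only as many index rows are consumed as are needed to fill a page, and the element after the last one used can serve as the offset of the next page.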
  • 49. Service — UUIDs Comparator
      /**
       * TimeUUID Comparator equivalent to Cassandra’s Comparator:
       * @see org.apache.cassandra.db.marshal.TimeUUIDType#compare()
       */
      public enum TimeUUIDComparator implements Comparator<UUID> {
          desc {
              @Override
              public int compare(UUID o1, UUID o2) {
                  // Most recent timestamp first
                  long delta = o2.timestamp() - o1.timestamp();
                  if (delta != 0)
                      return Ints.saturatedCast(delta);
                  return o2.compareTo(o1);
              }
          };
      }
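A dedicated comparator is needed because `java.util.UUID.compareTo` compares the raw bits, where the low part of the timestamp comes first, so the natural order does not follow time. A small sketch that shows the difference, assuming version 1 UUIDs built by hand; `TimeUuidOrdering`, `DESC` and `withTimestamp` are illustrative names, and `Long.compare` replaces the Guava call only to keep the example free of dependencies:

```java
import java.util.Comparator;
import java.util.UUID;

/** Illustrative demo (hypothetical names) of newest-first ordering for time-based UUIDs. */
final class TimeUuidOrdering {
    private TimeUuidOrdering() {}

    /** Newest first: compares the embedded timestamps, then falls back to the reversed UUID order. */
    static final Comparator<UUID> DESC = (a, b) -> {
        int byTime = Long.compare(b.timestamp(), a.timestamp());
        return byTime != 0 ? byTime : b.compareTo(a);
    };

    /** Builds a version 1 UUID carrying the given 60-bit timestamp; for demonstration only. */
    static UUID withTimestamp(long timestamp, long nodeBits) {
        long timeLow = timestamp & 0xFFFFFFFFL;
        long timeMid = (timestamp >>> 32) & 0xFFFFL;
        long timeHigh = (timestamp >>> 48) & 0x0FFFL;
        long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHigh; // 0x1000 = version 1
        long lsb = 0x8000000000000000L | (nodeBits & 0x3FFFFFFFFFFFFFFFL); // RFC 4122 variant
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID later = withTimestamp(0x100000000L, 1L);
        UUID earlier = withTimestamp(0x7FFFFFFFL, 1L);
        System.out.println("natural order: " + later.compareTo(earlier));
        System.out.println("time order, newest first: " + DESC.compare(later, earlier));
    }
}
```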
  • 50. Service — Days Loop
      @FunctionalInterface
      private static interface DayQuery {
          CompletableFuture<Iterator<UUID>> find(Criteria criteria, LocalDate day, UUID invoiceIdOffset);
      }

      private ResultPage byDateInteval(Criteria criteria, DayQuery dayQuery) {
          int limit = criteria.getLimit();
          List<Invoice> resultList = new ArrayList<>(limit);
          LocalDate dayOffset = criteria.getDayOffset();
          UUID invoiceIdOffset = criteria.getInvoiceIdOffset();
          /* ... Loop on days ; to be seen in next slide ... */
          return new ResultPage(resultList);
      }
  • 51. Service — Days Loop
      LocalDate day = criteria.getLastDay();
      do {
          Iterator<UUID> uuidIt = dayQuery.find(criteria, day, invoiceIdOffset).join();
          limit -= loadInvoices(resultList, uuidIt, limit);
          if (uuidIt.hasNext()) {
              // Page is full: the next unread id becomes the offset of the next page
              return new ResultPage(resultList, day, uuidIt.next());
          }
          day = day.minusDays(1);
          invoiceIdOffset = null; // the offset only applies to the first day queried
      } while (!day.isBefore(criteria.getFirstDay()));
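The day loop can be tried without Cassandra. The sketch below replays the same paging idea over an in-memory map; it assumes that a page resumes at the day offset returned by the previous page (the slide declares `dayOffset` but does not show how it is used), and `DayPager`, `Page`, `DayQuery` and `inMemory` are hypothetical names:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Illustrative paging over days, newest day first (all names are hypothetical). */
final class DayPager {
    private DayPager() {}

    /** One page of ids plus the position where the next page starts (nulls when finished). */
    static final class Page {
        final List<Long> ids;
        final LocalDate nextDay;
        final Long nextId;

        Page(List<Long> ids, LocalDate nextDay, Long nextId) {
            this.ids = ids;
            this.nextDay = nextDay;
            this.nextId = nextId;
        }
    }

    /** Ids of one day, newest first, starting at offset (inclusive) when it is not null. */
    interface DayQuery {
        Iterator<Long> find(LocalDate day, Long offset);
    }

    /** In-memory index standing in for the Cassandra tables; each list is sorted descending. */
    static DayQuery inMemory(Map<LocalDate, List<Long>> index) {
        return (day, offset) -> index.getOrDefault(day, Collections.<Long>emptyList()).stream()
            .filter(id -> offset == null || id <= offset)
            .iterator();
    }

    static Page page(LocalDate firstDay, LocalDate lastDay,
                     LocalDate dayOffset, Long idOffset, int limit, DayQuery query) {
        List<Long> ids = new ArrayList<>();
        // Assumption: resume at the day where the previous page stopped
        LocalDate day = dayOffset != null ? dayOffset : lastDay;
        Long offset = idOffset;
        while (!day.isBefore(firstDay)) {
            Iterator<Long> it = query.find(day, offset);
            while (ids.size() < limit && it.hasNext()) {
                ids.add(it.next());
            }
            if (it.hasNext()) {
                // Page is full: the next unread id is where the next page starts
                return new Page(ids, day, it.next());
            }
            day = day.minusDays(1);
            offset = null; // the id offset only applies to the first day queried
        }
        return new Page(ids, null, null);
    }
}
```

Calling `page` again with the `nextDay` and `nextId` of the previous result walks the whole interval one page at a time, without ever counting the total number of matches.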
  • 52. Service — Invoices Loading
      private int loadInvoices(List<Invoice> resultList, Iterator<UUID> uuidIt, int limit) {
          // Start all lookups first so that they run concurrently
          List<CompletableFuture<Invoice>> futureList = new ArrayList<>(limit);
          for (int i = 0; i < limit && uuidIt.hasNext(); ++i) {
              futureList.add(invoiceRepository.findOne(uuidIt.next()));
          }
          // Then wait for each result, keeping the index order
          futureList.stream()
              .map(CompletableFuture::join)
              .forEach(resultList::add);
          return futureList.size();
      }
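The loading step starts every lookup before waiting on any of them, which is what lets the reads overlap. A self-contained sketch of that pattern, assuming a generic asynchronous lookup function in place of `invoiceRepository.findOne`; `BatchLoader` and `load` are illustrative names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Illustrative batch loading: fire all lookups, then join them in order (hypothetical names). */
final class BatchLoader {
    private BatchLoader() {}

    /** Loads at most limit values into out and returns how many keys were consumed. */
    static <K, V> int load(List<V> out, Iterator<K> keys, int limit,
                           Function<K, CompletableFuture<V>> findOne) {
        List<CompletableFuture<V>> pending = new ArrayList<>(Math.max(limit, 0));
        // Phase 1: start every lookup without waiting
        for (int i = 0; i < limit && keys.hasNext(); i++) {
            pending.add(findOne.apply(keys.next()));
        }
        // Phase 2: collect results in the order the keys were read
        for (CompletableFuture<V> future : pending) {
            out.add(future.join());
        }
        return pending.size();
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        Iterator<Integer> keys = Arrays.asList(1, 2, 3, 4, 5).iterator();
        int loaded = BatchLoader.<Integer, String>load(out, keys, 3,
            id -> CompletableFuture.supplyAsync(() -> "invoice-" + id));
        System.out.println(loaded + " loaded: " + out);
    }
}
```

Keys beyond the limit stay in the iterator, so the caller can still read the next one as a paging offset.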
  • 54. Limits
      ● We got an exact-match search
      ➔ No full-text search
      ➔ No "starts with" search
      ➔ No pattern-based search
      ● Requires highly discriminating mandatory criteria
      ➔ user_id & invoice_day
      ● Pagination doesn’t give the total item count
      ➔ Could be done at the cost of an additional query
      ● No sort available
  • 55. Hardware
      ● Hosted by Ippon Hosting
      ● 8 nodes
      ➔ 16 GB RAM
      ➔ Two 256 GB SSD drives in RAID 0
      ● 6 nodes dedicated to the Cassandra cluster
      ● 2 nodes dedicated to the application
  • 56. Application
      ● 5,000 concurrent users
      ● 9 months of data loaded
      ➔ Legacy system: stores 1 year; searches the last 3 months
      ➔ Target: 3 years of history
      ● Real-time search Result
      ➔ Data are immediately available
      ➔ Legacy system: data available the next day
      ● Cost Killer
  • 57. Q & A