INTERACTIVELY QUERY AND SEARCH YOUR BIG DATA
Romain Rigaux
GOALS

Build a Web app
Quickly explore data
… with Solr
Make Solr / Hadoop easier to use
ARCHITECTURE

“Just a view” on top of the standard Solr API
REST
HISTORY

V1 USER
HISTORY

V1 ADMIN
ARCHITECTURE

NEXT!
Lots of learning, UX boost needed
Simple: users don’t know it is Solr
HISTORY

V2 USER
HISTORY

V2 ADMIN
HISTORY

V2 BETTER UX
ARCHITECTURE
/select
/admin/collections
/get
/luke...
/add_widget
/zoom_in
/select_facet
/select_range...
REST AJAX
Templates + JS Model
www….
ARCHITECTURE

UI FOR FACETS
Layout: all the 2D positioning (cell ids), visual, drag & drop
Collection: dashboard, fields, template, widgets (ids)
Query: search terms, selected facets (q, fqs)
ADDING A WIDGET

LIFECYCLE
Load the initial page
Edit mode and Drag & Drop
/solr/zookeeper/clusterstate.json	
  
/solr/admin/luke…
/get_collection
ADDING A WIDGET

LIFECYCLE
/solr/select?stats=true /new_facet
Select the field
Guess ranges (number or dates)
Rounding (number or dates)
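The "guess ranges" and "rounding" steps could look roughly like the sketch below for a numeric field. It assumes the min and max come from a `stats=true` call; the function name `guess_range` and the `slots` parameter are hypothetical and this is not Hue's actual code. Dates would need a similar rounding to calendar units.

```python
import math

def guess_range(stats_min, stats_max, slots=10):
    """Return (start, end, gap) for a range facet with roughly `slots` buckets.

    Hypothetical helper: the gap is rounded up to 1, 2, 5 or 10 times a
    power of ten, and start/end are aligned on multiples of that gap.
    """
    if stats_max <= stats_min:
        # Degenerate case: a single value, so use one bucket of width 1.
        return stats_min, stats_min + 1, 1

    raw_gap = (stats_max - stats_min) / slots
    magnitude = 10 ** math.floor(math.log10(raw_gap))
    for step in (1, 2, 5, 10):
        gap = step * magnitude
        if gap >= raw_gap:
            break

    start = math.floor(stats_min / gap) * gap
    end = math.ceil(stats_max / gap) * gap
    return start, end, gap
```

Rounding the gap first and then aligning the bounds keeps bucket labels readable, at the cost of sometimes producing fewer than `slots` buckets.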
ADDING A WIDGET

LIFECYCLE
Query part 1
Query part 2
Augment Solr response
facet.range={!ex=bytes}bytes&f.bytes.facet.range.start=0&f.bytes.facet.range.end=9000000&f.bytes.facet.range.gap=900000&f.bytes.facet.mincount=0&f.bytes.facet.limit=10
q=Chrome&fq={!tag=bytes}bytes:[900000+TO+1800000]
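The two query parts above can be assembled as in this sketch. The `{!tag=...}` on the filter and the `{!ex=...}` on the facet are standard Solr local params that keep the selected range from filtering its own facet counts. The function name is hypothetical, and `facet=true` is added here because Solr needs it to enable faceting even though the slide does not show it.

```python
from urllib.parse import urlencode

def range_facet_params(query, field, start, end, gap, selected=None):
    """Build Solr request parameters for one range facet (illustrative helper)."""
    params = [
        ("q", query),
        ("facet", "true"),  # not on the slide; required to turn faceting on
        # Exclude the filter tagged with the field name when counting this facet.
        ("facet.range", "{!ex=%s}%s" % (field, field)),
        ("f.%s.facet.range.start" % field, start),
        ("f.%s.facet.range.end" % field, end),
        ("f.%s.facet.range.gap" % field, gap),
        ("f.%s.facet.mincount" % field, 0),
        ("f.%s.facet.limit" % field, 10),
    ]
    if selected is not None:
        low, high = selected
        # Tag the filter so the facet above can exclude it.
        params.append(("fq", "{!tag=%s}%s:[%s TO %s]" % (field, field, low, high)))
    return params

params = range_facet_params("Chrome", "bytes", 0, 9000000, 900000,
                            selected=(900000, 1800000))
query_string = urlencode(params)
```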
{
  'facet_counts': {
    'facet_ranges': {
      'bytes': {
        'start': 10000,
        'counts': [
          '900000', 3423,
          '1800000', 339,
          ...
        ]
      }
    }
  }
}
{
  ...,
  'normalized_facets': [
    {
      'extraSeries': [],
      'label': 'bytes',
      'field': 'bytes',
      'counts': [
        {
          'from': '900000',
          'to': '1800000',
          'selected': True,
          'value': 3423,
          'field': 'bytes',
          'exclude': False
        }
      ], ...
    }
  ]
}
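A minimal sketch of the "augment" step, assuming the flat `[value, count, value, count, ...]` list that Solr returns for range facets. The function and its parameters are hypothetical and only mirror the field names shown on the slide; this is not the actual Hue implementation.

```python
def normalize_range_facet(field, facet, gap, selected_from=None):
    """Turn Solr's flat range counts into one dict per bucket.

    `selected_from` is the lower bound of the range the user clicked, if any.
    """
    flat = facet["counts"]
    buckets = []
    # Pair up the alternating values and counts.
    for value, count in zip(flat[0::2], flat[1::2]):
        buckets.append({
            "from": value,
            "to": str(int(value) + gap),
            "selected": value == selected_from,
            "value": count,
            "field": field,
            "exclude": False,
        })
    return {"label": field, "field": field, "extraSeries": [], "counts": buckets}

solr_facet = {"start": 0, "counts": ["900000", 3423, "1800000", 339]}
normalized = normalize_range_facet("bytes", solr_facet, gap=900000,
                                   selected_from="900000")
```

Precomputing `from`, `to` and `selected` on the server lets the JS model bind each bucket to a widget directly.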
JSON TO WIDGET

{
  "field": "rate_code",
  "counts": [
    {
      "count": 97797,
      "exclude": true,
      "selected": false,
      "value": "1",
      "cat": "rate_code"
    } ...
{
  "field": "medallion",
  "counts": [
    {
      "count": 159,
      "exclude": true,
      "selected": false,
      "value": "6CA28FC49A4C49A9A96",
      "cat": "medallion"
    } ...
{
  "extraSeries": [],
  "label": "trip_time_in_secs",
  "field": "trip_time_in_secs",
  "counts": [
    {
      "from": "0",
      "to": "10",
      "selected": false,
      "value": 527,
      "field": "trip_time_in_secs",
      "exclude": true
    } ...
{
  "field": "passenger_count",
  "counts": [
    {
      "count": 74766,
      "exclude": true,
      "selected": false,
      "value": "1",
      "cat": "passenger_count"
    } ...
REPEAT

UNTIL…
GAME CHANGER!
Possibilities
5.1 / 5.2
Analytic Facets
FACET

FUNCTIONS
Count
Sum
Avg
Percentile
Max
...
Count(id)
Sum(bytes)
Avg(mul(price, quantity))
Percentile(salary, 50, 90)
Max(temperature)
...
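As a rough illustration, the functions listed above can be sent through Solr's JSON Facet API as a `json.facet` parameter. The Solr documentation writes the function names in lowercase. The output keys (`total_bytes`, `avg_order`, ...) and the `q` and `rows` values are made up for this sketch.

```python
import json

# Each key is an arbitrary label for the result; each value is a Solr
# aggregation function applied to the documents matching the query.
facet_functions = {
    "total_bytes": "sum(bytes)",
    "avg_order": "avg(mul(price,quantity))",
    "salary_pct": "percentile(salary,50,90)",
    "max_temp": "max(temperature)",
}

# rows=0 asks only for the aggregates, not for the matching documents.
params = {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet_functions)}
```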
FACET

FUNCTIONS
SUB “NESTED”

FACETS
top_os {
  type: term,
  field: os,
  limit: 5
}
top_os {
  type: term,
  field: os,
  limit: 5,
  facet : {
    by_country: {
      type: term,
      field: country
    }
  }
}
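The nested request above can be built programmatically, as in this sketch. The helper name `terms_facet` is hypothetical. The Solr reference guide spells the facet type `terms`, so the code uses that spelling rather than the slide's `term`.

```python
import json

def terms_facet(field, limit=None, sub_facets=None):
    """Build one terms facet, optionally with nested sub-facets (illustrative)."""
    facet = {"type": "terms", "field": field}
    if limit is not None:
        facet["limit"] = limit
    if sub_facets:
        # Sub-facets are computed inside every bucket of the parent facet.
        facet["facet"] = sub_facets
    return facet

request = {
    "top_os": terms_facet(
        "os", limit=5,
        sub_facets={"by_country": terms_facet("country")},
    )
}
json_facet = json.dumps(request)
```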
FUNCTION + NESTED =

ANALYTICS
states {
  type: term,
  field: state,
  facet : {
    by_month : {
      type: range,
      field: time,
      start: "TODAY-6MONTHS",
      end: "TODAY",
      gap: "MONTH",
      facet : {
        avg_sal: "avg(salary)"
      }
    }
  }
}
states {
  type: term,
  field: state,
  facet : {
    avg_sal: "avg(salary)"
  }
}
OPERATIONS ON

BUCKETS OF DATA
Counts → Functions
OPERATIONS ON

BUCKETS OF DATA
Nested → nD functions
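To make "nested → nD functions" concrete, the sketch below flattens a nested facet response into one row per leaf bucket, for example (state, month) → avg_sal. It assumes the `buckets` / `val` / `count` response layout of Solr's JSON Facet API. The sample values are invented and `flatten_buckets` is a hypothetical helper.

```python
def flatten_buckets(facets, path=()):
    """Return a list of (bucket_path, metrics) for every leaf bucket."""
    rows = []
    for facet in facets.values():
        if not isinstance(facet, dict) or "buckets" not in facet:
            continue  # skip plain values such as the top-level "count"
        for bucket in facet["buckets"]:
            here = path + (bucket["val"],)
            nested = {k: v for k, v in bucket.items()
                      if isinstance(v, dict) and "buckets" in v}
            metrics = {k: v for k, v in bucket.items()
                       if k != "val" and k not in nested}
            if nested:
                rows.extend(flatten_buckets(nested, here))
            else:
                rows.append((here, metrics))
    return rows

# Invented sample shaped like the "facets" part of a response.
sample = {
    "count": 3,
    "states": {"buckets": [
        {"val": "CA", "count": 2, "by_month": {"buckets": [
            {"val": "2015-01-01T00:00:00Z", "count": 1, "avg_sal": 100.0},
            {"val": "2015-02-01T00:00:00Z", "count": 1, "avg_sal": 120.0},
        ]}},
        {"val": "NY", "count": 1, "by_month": {"buckets": [
            {"val": "2015-01-01T00:00:00Z", "count": 1, "avg_sal": 90.0},
        ]}},
    ]},
}
rows = flatten_buckets(sample)
```

Each row carries the full bucket path, so a two-level facet maps directly onto a 2D chart such as a heatmap or stacked bars.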
ENTERPRISE

FEATURES
- Access to Search App configurable, LDAP/SAML auths
- Share by link
- Solr Cloud (or non Cloud)
- Proxy user
   /solr/jobs_demo/select?user.name=hue&doAs=romain&q=
- Security
   Kerberos
- Sentry
   Collection level, Solr calls like /admin, /query, Solr UI, ZooKeeper
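The proxy-user call in the list above combines the service account (`user.name`) with the end user being impersonated (`doAs`). A small sketch of building that URL follows; the helper name is hypothetical.

```python
from urllib.parse import urlencode

def proxied_select_url(collection, service_user, end_user, query=""):
    """Build a /select path where `service_user` acts on behalf of `end_user`."""
    params = urlencode({
        "user.name": service_user,  # the account the web app authenticates as
        "doAs": end_user,           # the logged-in user being impersonated
        "q": query,
    })
    return "/solr/%s/select?%s" % (collection, params)

url = proxied_select_url("jobs_demo", "hue", "romain")
```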
SEARCH AS ONLY

APP IN HUE
gethue.com/solr-search-ui-only/
• Spark in your browser
• Notebooks
• New REST Server
SPARK

INDEXING
WHAT
• Open source REST for Spark Shell
• Runs locally or inside YARN
• Spark Scala, PySpark and jar/py submission
SPARK

INDEXING
WHAT
https://github.com/cloudera/hue/tree/master/apps/spark/java
SPARK STREAMING
Real time!
Spark Solr
• Python
• Scala
• Charts
NOTEBOOKS / SHELL
WHAT
DEMO
TIME

• Analyze Bay Area bike share
• Visualize one year of data
• Know your users, predict behavior
MISSED

SOMETHING?
demo.gethue.com
• Full Analytics
• Easier indexing
• Geo
• Export/Share results
• “More like this”
• Solr Joins, Solr SQL
• Spark, SQL... integration, Hue 4
WHAT’S NEXT
NEW FEATURES
TWITTER
@gethue
USER GROUP
hue-user@
WEBSITE
http://gethue.com
LEARN
http://learn.gethue.com
THANKS!

