INTERACTIVELY
SEARCH AND
VISUALIZE YOUR DATA
WITH SOLR AND SPARK
Romain Rigaux
GOALS

Build a Web app
Quickly explore data
… with Solr
Make Solr / Hadoop easier to use
ARCHITECTURE

“Just a view” on top of the standard Solr API
REST
HISTORY

V1 USER
HISTORY

V1 ADMIN
ARCHITECTURE

NEXT!
Lots of learning, UX boost needed
Simple: users don't need to know it is Solr
HISTORY

V2 USER
HISTORY

V2 ADMIN
HISTORY

V2 BETTER UX
ARCHITECTURE
/select, /admin/collections, /get, /luke...
/add_widget, /zoom_in, /select_facet, /select_range...
REST AJAX
Templates + JS Model
www….
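Since the app is "just a view" on the standard Solr API, an endpoint such as /select_facet mostly has to translate UI state into a regular Solr /select request. A minimal Python sketch of that translation; the helper name build_select_url, the collection name "logs" and the Solr base URL are assumptions for illustration, not Hue's code:

```python
from urllib.parse import urlencode


def build_select_url(solr_base, collection, q="*:*", fqs=(), rows=10):
    """Translate UI state into a standard Solr /select URL (hypothetical helper)."""
    params = [("q", q), ("wt", "json"), ("rows", str(rows))]
    # Each selected facet becomes its own filter query (fq) parameter.
    params.extend(("fq", fq) for fq in fqs)
    return "%s/%s/select?%s" % (solr_base.rstrip("/"), collection, urlencode(params))


url = build_select_url("http://localhost:8983/solr", "logs", q="Chrome",
                       fqs=["bytes:[900000 TO 1800000]"])
print(url)
```

Keeping each selected facet as a separate fq parameter lets Solr cache every filter independently.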
ARCHITECTURE

UI FOR FACETS
Query: search terms, selected facets (q, fqs)
Collection: dashboard, fields, template, widgets (ids)
Layout: all the 2D positioning (cell ids), visual, drag&drop
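The three-way split of the page state can be sketched as plain Python data classes. The class names follow the slide; every field name, the cell id format and the move() helper are hypothetical (the slide places this model in JavaScript on the client side).

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Search terms and selected facets (q, fqs)."""
    q: str = "*:*"
    fqs: list = field(default_factory=list)

    def to_solr_params(self):
        # One fq parameter per selected facet value.
        return [("q", self.q)] + [("fq", fq) for fq in self.fqs]


@dataclass
class Collection:
    """Dashboard definition: collection name, fields, template, widgets by id."""
    name: str
    fields: list = field(default_factory=list)
    template: str = ""
    widgets: dict = field(default_factory=dict)  # widget id -> widget settings


@dataclass
class Layout:
    """2D positioning: which widget ids sit in which cell ids."""
    cells: dict = field(default_factory=dict)  # cell id -> list of widget ids

    def move(self, widget_id, to_cell):
        # Drag & drop: take the widget out of its current cell, append it to the target.
        for widget_ids in self.cells.values():
            if widget_id in widget_ids:
                widget_ids.remove(widget_id)
        self.cells.setdefault(to_cell, []).append(widget_id)


dashboard = Collection(name="logs", fields=["bytes", "os"],
                       widgets={"w1": {"type": "range", "field": "bytes"}})
query = Query(q="Chrome", fqs=["bytes:[900000 TO 1800000]"])
layout = Layout(cells={"row1-col1": ["w1", "w2"]})
layout.move("w2", "row1-col2")
```

Only Query is sent to Solr; Collection and Layout are saved with the dashboard and never affect the search itself.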
ADDING A WIDGET

LIFECYCLE
Load the initial page
Edit mode and Drag&Drop
/solr/zookeeper/clusterstate.json
/solr/admin/luke…
/get_collection
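When edit mode opens, the field list from the Luke handler tells the UI which widgets each field can feed. A sketch over an abbreviated, made-up Luke response; the function widget_candidates and the mapping from Solr field type names to widget kinds are assumptions, since type names are defined per schema:

```python
# Abbreviated, made-up sample of a Luke handler response: only "fields" is used here.
SAMPLE_LUKE_RESPONSE = {
    "fields": {
        "id": {"type": "string"},
        "bytes": {"type": "long"},
        "time": {"type": "tdate"},
        "user_agent": {"type": "string"},
    }
}

# Assumed type-name mapping; real schemas may name their field types differently.
NUMERIC_TYPES = {"int", "long", "float", "double", "tint", "tlong", "tfloat", "tdouble"}
DATE_TYPES = {"date", "tdate"}


def widget_candidates(luke_response):
    """Group field names by the kind of facet widget they could feed."""
    groups = {"range": [], "date": [], "terms": []}
    for name, info in sorted(luke_response.get("fields", {}).items()):
        field_type = info.get("type", "")
        if field_type in NUMERIC_TYPES:
            groups["range"].append(name)
        elif field_type in DATE_TYPES:
            groups["date"].append(name)
        else:
            groups["terms"].append(name)
    return groups


print(widget_candidates(SAMPLE_LUKE_RESPONSE))
```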
ADDING A WIDGET

LIFECYCLE
/solr/select?stats=true
/new_facet
Select the field
Guess ranges (number or dates)
Rounding (number or dates)
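Guessing and rounding a numeric range from the StatsComponent min/max can be sketched as below. This is one plausible heuristic (fixed bucket count, gap rounded up to a whole number, bounds aligned on the gap), not necessarily the rounding Hue applies; guess_numeric_range and the stats values are hypothetical. With min 0 and max 9000000 it yields the start, end and gap used in the bytes range facet query on the following slide.

```python
import math


def guess_numeric_range(stats_min, stats_max, buckets=10):
    """Pick start/end/gap for a range facet from stats min/max (assumed heuristic)."""
    span = stats_max - stats_min
    if span <= 0:
        # All documents share one value: a single bucket of width 1.
        return stats_min, stats_min + 1, 1
    gap = math.ceil(span / buckets)            # round the gap up to a whole number
    start = math.floor(stats_min / gap) * gap  # round start down to a multiple of gap
    end = math.ceil(stats_max / gap) * gap     # round end up to a multiple of gap
    return start, end, gap


# stats_fields section of a /select?stats=true&stats.field=bytes response (made-up values).
stats = {"stats_fields": {"bytes": {"min": 0, "max": 9000000}}}
bytes_stats = stats["stats_fields"]["bytes"]
print(guess_numeric_range(bytes_stats["min"], bytes_stats["max"]))  # (0, 9000000, 900000)
```

For date fields the same idea applies with Solr date math gaps such as +1DAY or +1MONTH; that variant is left out here.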
ADDING A WIDGET

LIFECYCLE
Query part 1
Query part 2
Augment Solr response
facet.range={!ex=bytes}bytes&f.bytes.facet.range.start=0&f.bytes.facet.range.end=9000000&f.bytes.facet.range.gap=900000&f.bytes.facet.mincount=0&f.bytes.facet.limit=10
q=Chrome&fq={!tag=bytes}bytes:[900000+TO+1800000]
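The two query parts can be generated together. The sketch below, with a hypothetical helper range_facet_params, tags the selected range filter and excludes that tag from the range facet, so the histogram keeps showing counts for all buckets instead of only the selected one. It adds facet=true, which Solr needs to compute any facet, and leaves out facet.limit, which applies to field facets rather than range facets:

```python
from urllib.parse import urlencode


def range_facet_params(field, start, end, gap, selected=None, q="*:*"):
    """Build Solr params for a range facet whose own selection does not shrink its counts."""
    params = [
        ("q", q),
        ("facet", "true"),
        # {!ex=...} tells Solr to ignore the fq tagged with the same name when counting.
        ("facet.range", "{!ex=%s}%s" % (field, field)),
        ("f.%s.facet.range.start" % field, str(start)),
        ("f.%s.facet.range.end" % field, str(end)),
        ("f.%s.facet.range.gap" % field, str(gap)),
        ("f.%s.facet.mincount" % field, "0"),
    ]
    if selected is not None:
        low, high = selected
        # {!tag=...} names the filter so the exclusion above can find it.
        params.append(("fq", "{!tag=%s}%s:[%s TO %s]" % (field, field, low, high)))
    return params


params = range_facet_params("bytes", 0, 9000000, 900000,
                            selected=(900000, 1800000), q="Chrome")
query_string = urlencode(params)
```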
{
  'facet_counts':{
    'facet_ranges':{
      'bytes':{
        'start':10000,
        'counts':[
          '900000',
          3423,
          '1800000',
          339,
          ...
        ]
      }
    }
  }
}
{
  ...,
  'normalized_facets':[
    {
      'extraSeries':[
      ],
      'label':'bytes',
      'field':'bytes',
      'counts':[
        {
          'from':'900000',
          'to':'1800000',
          'selected':True,
          'value':3423,
          'field':'bytes',
          'exclude':False
        }
      ], ...
    }
  ]
}
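The augmentation step can be sketched as a function that walks Solr's flat [bucket start, count, ...] list and emits one dict per bucket in the normalized shape above. The function name normalize_range_facet and the (from, to) tuple used to mark the selected bucket are assumptions; the sketch handles numeric buckets only:

```python
def normalize_range_facet(field, counts, gap, selected=None):
    """Turn Solr's flat [start, count, start, count, ...] list into widget-ready dicts.

    `selected` is the (from, to) pair of the active filter, if any (assumed shape).
    """
    normalized = []
    # Solr returns range buckets as alternating bucket-start / count entries.
    for start, count in zip(counts[0::2], counts[1::2]):
        bucket_from = int(start)
        bucket_to = bucket_from + gap
        normalized.append({
            "from": str(bucket_from),
            "to": str(bucket_to),
            "value": count,
            "field": field,
            "selected": selected == (bucket_from, bucket_to),
            "exclude": False,
        })
    return {"label": field, "field": field, "extraSeries": [], "counts": normalized}


facet = normalize_range_facet("bytes", ["900000", 3423, "1800000", 339], 900000,
                              selected=(900000, 1800000))
```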
JSON TO WIDGET

{
  "field":"rate_code",
  "counts":[
    {
      "count":97797,
      "exclude":true,
      "selected":false,
      "value":"1",
      "cat":"rate_code"
    } ...
{
  "field":"medallion",
  "counts":[
    {
      "count":159,
      "exclude":true,
      "selected":false,
      "value":"6CA28FC49A4C49A9A96",
      "cat":"medallion"
    } ...
{
  "extraSeries":[
  ],
  "label":"trip_time_in_secs",
  "field":"trip_time_in_secs",
  "counts":[
    {
      "from":"0",
      "to":"10",
      "selected":false,
      "value":527,
      "field":"trip_time_in_secs",
      "exclude":true
    } ...
{
  "field":"passenger_count",
  "counts":[
    {
      "count":74766,
      "exclude":true,
      "selected":false,
      "value":"1",
      "cat":"passenger_count"
    } ...
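In these samples, "value" holds the term for field facets but the count for range facets, so the widget layer has to know which shape it is looking at. A minimal sketch of turning either shape into chart-ready (label, count) pairs; to_series is a hypothetical name and the inputs are trimmed copies of the samples above:

```python
def to_series(facet):
    """Return (label, count) pairs for a chart from either normalized facet shape."""
    series = []
    for bucket in facet["counts"]:
        if "from" in bucket:
            # Range facets: the bucket bounds form the label and "value" holds the count.
            series.append(("%s-%s" % (bucket["from"], bucket["to"]), bucket["value"]))
        else:
            # Field facets: "value" is the term and "count" holds the count.
            series.append((bucket["value"], bucket["count"]))
    return series


rate_code = {"field": "rate_code", "counts": [
    {"count": 97797, "exclude": True, "selected": False, "value": "1", "cat": "rate_code"}]}
trip_time = {"field": "trip_time_in_secs", "counts": [
    {"from": "0", "to": "10", "selected": False, "value": 527,
     "field": "trip_time_in_secs", "exclude": True}]}
```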
REPEAT

UNTIL…
GAME CHANGER!
Possibilities
5.1 / 5.2
Analytic Facets
FACET

FUNCTIONS
Count: Count(id)
Sum: Sum(bytes)
Avg: Avg(mul(price, quantity))
Percentile: Percentile(salary, 50, 90)
Max: Max(temperature)
...
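To make the semantics concrete, here is what each function on the slide computes over the documents of one bucket, written as plain Python over a few made-up documents. It only illustrates the math: Solr evaluates these server-side for every bucket, and its percentile is an approximation, whereas the nearest-rank version below is exact.

```python
import math

# Made-up documents standing in for one facet bucket.
bucket = [
    {"id": "a", "bytes": 1200, "price": 2.0, "quantity": 3, "salary": 50000, "temperature": 21.5},
    {"id": "b", "bytes": 800, "price": 5.0, "quantity": 1, "salary": 70000, "temperature": 25.0},
    {"id": "c", "bytes": 400, "price": 1.0, "quantity": 4, "salary": 90000, "temperature": 18.0},
]


def percentile(values, p):
    """Exact nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


salaries = [doc["salary"] for doc in bucket]
stats = {
    # Count(id): documents that have an id.
    "count": sum(1 for doc in bucket if "id" in doc),
    # Sum(bytes)
    "sum_bytes": sum(doc["bytes"] for doc in bucket),
    # Avg(mul(price, quantity)): the function is applied per document, then averaged.
    "avg_revenue": sum(doc["price"] * doc["quantity"] for doc in bucket) / len(bucket),
    # Percentile(salary, 50, 90)
    "salary_p50_p90": [percentile(salaries, p) for p in (50, 90)],
    # Max(temperature)
    "max_temperature": max(doc["temperature"] for doc in bucket),
}
```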
FACET

FUNCTIONS
SUB “NESTED”

FACETS
top_os {
  type: terms,
  field: os,
  limit: 5
}
top_os {
  type: terms,
  field: os,
  limit: 5,
  facet: {
    by_country: {
      type: terms,
      field: country
    }
  }
}
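From a client, the nested facet is sent as the json.facet request parameter. A sketch in Python, written with "terms", the type name Solr's JSON Facet API uses for field facets; q=*:* and rows=0 (facets only, no documents) are assumptions:

```python
import json
from urllib.parse import urlencode

# Nested facet from the slide, written as a Python dict.
top_os = {
    "type": "terms",
    "field": "os",
    "limit": 5,
    "facet": {
        # Computed separately inside each of the top 5 "os" buckets.
        "by_country": {"type": "terms", "field": "country"},
    },
}

params = {"q": "*:*", "rows": 0, "json.facet": json.dumps({"top_os": top_os})}
query_string = urlencode(params)  # ready to append to /select?
```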
FUNCTION + NESTED = ANALYTICS
states {
  type: terms,
  field: state,
  facet: {
    by_month: {
      type: range,
      field: time,
      start: "NOW/DAY-6MONTHS",
      end: "NOW/DAY",
      gap: "+1MONTH",
      facet: {
        avg_sal: "avg(salary)"
      }
    }
  }
}
states {
  type: terms,
  field: state,
  facet: {
    avg_sal: "avg(salary)"
  }
}
OPERATIONS ON

BUCKETS OF DATA
Counts → Functions
OPERATIONS ON

BUCKETS OF DATA
Nested → nD functions
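Each level of nesting adds one dimension to the result: Solr answers the states/by_month request with buckets inside buckets. A sketch that flattens such a response into (path, metrics) rows ready for a table or chart; the response values are made up and flatten_buckets is a hypothetical helper:

```python
# Made-up JSON Facet response for the states -> by_month -> avg_sal request.
response = {
    "facets": {
        "count": 7,
        "states": {"buckets": [
            {"val": "CA", "count": 4, "by_month": {"buckets": [
                {"val": "2015-01-01T00:00:00Z", "count": 3, "avg_sal": 100000.0},
                {"val": "2015-02-01T00:00:00Z", "count": 1, "avg_sal": 80000.0}]}},
            {"val": "NY", "count": 3, "by_month": {"buckets": [
                {"val": "2015-01-01T00:00:00Z", "count": 3, "avg_sal": 90000.0}]}},
        ]},
    }
}


def flatten_buckets(facet, prefix=()):
    """Yield one row per leaf bucket: the bucket values along the path plus its metrics."""
    for bucket in facet["buckets"]:
        path = prefix + (bucket["val"],)
        nested = [v for v in bucket.values() if isinstance(v, dict) and "buckets" in v]
        if nested:
            # Intermediate level: descend into each sub-facet.
            for sub_facet in nested:
                yield from flatten_buckets(sub_facet, path)
        else:
            # Leaf level: everything except the bucket value is a metric.
            yield path, {k: v for k, v in bucket.items() if k != "val"}


rows = list(flatten_buckets(response["facets"]["states"]))
```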
SEARCH AS ONLY

APP IN HUE
gethue.com/solr-search-ui-only/
• Spark in your browser
• Notebooks
• New REST Server
SPARK

INDEXING
WHAT
• Open source REST server for the Spark shell
• Runs locally or inside YARN
• Spark Scala, PySpark and jar/py submission
SPARK

INDEXING
WHAT
https://github.com/cloudera/hue/tree/master/apps/spark/java
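A session-then-statements flow over REST is the core of the Livy API. A sketch that builds (without sending) the two main requests with Python's standard library; it assumes the endpoint layout of the later Apache Livy releases (POST /sessions, POST /sessions/{id}/statements) and a server on localhost at Livy's default port 8998, so the 2015 version in the Hue repository may differ:

```python
import json
from urllib.request import Request

LIVY_URL = "http://localhost:8998"  # assumed host; 8998 is Livy's default port


def create_session_request(kind="pyspark"):
    """POST /sessions starts an interactive Spark session of the given kind."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return Request(LIVY_URL + "/sessions", data=body, method="POST",
                   headers={"Content-Type": "application/json"})


def run_statement_request(session_id, code):
    """POST /sessions/{id}/statements runs a snippet of code in that session's REPL."""
    body = json.dumps({"code": code}).encode("utf-8")
    return Request("%s/sessions/%d/statements" % (LIVY_URL, session_id), data=body,
                   method="POST", headers={"Content-Type": "application/json"})


req = run_statement_request(0, "sc.parallelize(range(100)).count()")
# urllib.request.urlopen(req) would send it; the result is then polled with
# GET /sessions/{id}/statements/{statement id}.
```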
LIVY ARCH
[Figure: Livy architecture in local and YARN modes. Local: Livy Server, Livy REPL, Spark Contexts, Spark Worker. YARN: Livy Server, YARN Master, a YARN Node running the Livy REPL and Spark Context / PySpark, and YARN Nodes running Spark Workers (steps 1 to 4).]
SPARK STREAMING
Real time! Spark and Solr
• Python
• Scala
• Charts
NOTEBOOKS / SHELL
WHAT
DEMO
TIME

• Analyze Bay Area bike share
• Visualize one year of data
• Know your users, predict behavior
MISSED

SOMETHING?
demo.gethue.com
• Full Analytics
• Easier indexing
• Geo
• Export/Share results
• Solr Joins, Solr SQL
• Spark, SQL... integration, Hue 4
WHAT’S NEXT
NEW FEATURES
TWITTER
@gethue
USER GROUP
hue-user@
WEBSITE
http://gethue.com
LEARN
http://learn.gethue.com
THANKS!


SF Solr Meetup - Interactively Search and Visualize Your Big Data