The solr Power

Tareque Hossain
Sr. Software Engineer
  
What about it?

•  We always associate solr with searching
•  solr can also serve as your non-relational data layer
  
NoSQL? solr?

The solr power

Why solr?

•  Hey, solr is already part of my stack
•  I love solr
•  It’s fast, scalable and there are some great python interfaces out there
  
When would you consider it?

•  You have a DB that’s frequently read and infrequently written
•  You want robust search & filtering on your data
•  You want to leverage the faceting feature
•  You want a decently scalable data layer
  
What’s not so cool?

•  Doesn’t support transactions
•  Not all SQL queries can be translated into solr queries
•  Generating indices can take a long time
•  Searching and indexing at the same time brings down performance
  
But..

•  You don’t have to give up your relational data layer
•  Create a non-relational layer on top of your relational data layer
•  Get the best of both worlds
  
So what’s the use case?

•  We deal with medical survey data
•  Say:
    –  About 300 multiple choice questions
    –  Responses can be multi-dimensional
    –  7000+ different answer choices per question
    –  2000+ respondents per survey
    –  15+ surveys and growing
  
What a survey question looks like

When were you diagnosed with the following types of Arthritis?

                      | Osteoarthritis | Rheumatoid Arthritis | Traumatic Arthritis | Psoriatic Arthritis | Other
Less than a year ago  | ☑              | ☐                    | ☐                   | ☐                   | ☐
More than a year ago  | ☐              | ☐                    | ☑                   | ☐                   | ☐
  
Storing a single response

When were you diagnosed with the following types of Arthritis?

                      | Osteoarthritis | Rheumatoid Arthritis | Traumatic Arthritis | Psoriatic Arthritis | Other
Less than a year ago  | 1              | 0                    | 0                   | 0                   | 0
More than a year ago  | 0              | 0                    | 1                   | 0                   | 0
  
Aggregating over 2000 responses

When were you diagnosed with the following types of Arthritis?

                      | Osteoarthritis | Rheumatoid Arthritis | Traumatic Arthritis | Psoriatic Arthritis | Other
Less than a year ago  | 63             | 155                  | 19                  | 27                  | 268
More than a year ago  | 190            | 46                   | 8                   | 213                 | 325
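
Aggregating means counting, for each cell of the grid, how many responses have a 1 in that cell. Here is a minimal plain-Python sketch of that counting, assuming hypothetical field names of the form row__column (the naming is not from the talk):

```python
from collections import Counter

# Each response lists only the cells it checked.
# Field names are hypothetical: one field per (row, column) cell of the grid.
responses = [
    {"less_than_year__osteoarthritis": 1, "more_than_year__traumatic_arthritis": 1},
    {"less_than_year__osteoarthritis": 1},
    {"more_than_year__other": 1},
]


def aggregate(responses):
    """Count, per field, how many responses have that answer checked."""
    totals = Counter()
    for response in responses:
        for field, checked in response.items():
            if checked:
                totals[field] += 1
    return totals


totals = aggregate(responses)
print(totals)
```

Doing this in application code over thousands of fields is the slow path; the point of the talk is to let solr's faceting produce the same counts.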
  
The Document Structure

•  Each survey response = solr document
•  Up to 3000 boolean variables per document indicating chosen answers
•  Added meta information: age, profession, interests
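
One way to picture such a document is a flat dictionary: an id, the meta fields, and one boolean per answer choice. This is only a sketch; the function name, field names and sample values are assumptions made for illustration, not the schema used in the talk.

```python
def build_document(response_id, checked, answer_fields, meta):
    """Flatten one survey response into a single flat, solr-style document.

    All names here are illustrative and not taken from the talk.
    """
    unknown = set(checked) - set(answer_fields)
    if unknown:
        raise ValueError("unknown answer fields: %s" % sorted(unknown))
    doc = {"id": response_id}
    doc.update(meta)  # age, profession, interests
    for field in answer_fields:
        doc[field] = field in checked  # one boolean per answer choice
    return doc


# Hypothetical answer fields, one per cell of the question grid.
ANSWER_FIELDS = [
    "less_than_year__osteoarthritis",
    "less_than_year__rheumatoid_arthritis",
    "more_than_year__osteoarthritis",
    "more_than_year__traumatic_arthritis",
]

doc = build_document(
    "survey1-respondent42",
    checked={"less_than_year__osteoarthritis", "more_than_year__traumatic_arthritis"},
    answer_fields=ANSWER_FIELDS,
    meta={"age": 54, "profession": "nurse", "interests": ["running", "cooking"]},
)
print(doc)
```

Storing every answer choice explicitly, including the unchecked ones, keeps all documents the same shape, which is what makes faceting across the boolean fields straightforward.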
  
Querying

•  Filter by age, interest, profession
•  Facet across boolean field
•  Result: what group of people chose what group of answers
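
The filter-then-facet pattern maps onto standard solr request parameters: fq for each filter, facet=true plus one facet.field per boolean field, and rows=0 when only the counts are wanted. The sketch below only builds the query string with the standard library; the helper name, the field names and the filter values are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_facet_query(filters, facet_fields):
    """Build a solr query string that filters documents and facets on fields.

    filters: solr filter queries, e.g. "profession:nurse" (hypothetical fields)
    facet_fields: names of the boolean fields to facet on
    """
    params = [
        ("q", "*:*"),       # match everything, then narrow with fq
        ("rows", "0"),      # return facet counts only, no documents
        ("wt", "json"),
        ("facet", "true"),
    ]
    params += [("fq", f) for f in filters]
    params += [("facet.field", name) for name in facet_fields]
    return urlencode(params)


query_string = build_facet_query(
    filters=["age:[40 TO 60]", "profession:nurse"],
    facet_fields=["less_than_year__osteoarthritis", "more_than_year__osteoarthritis"],
)
print(query_string)
```

The resulting string would be appended to the select handler URL of whichever solr core holds the survey documents.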
  
	
  
Why solr is awesome..

•  Faceting across boolean field uses very little memory
•  Combining 3000 fields for 2000 documents takes 1 ~ 2 ms
•  Allowed us to reduce API response time from a variable of 2 ~ 15 seconds (sucked!) to an almost constant ~50 ms
  
Good to know..

•  sunburnt: Awesome python solr interface
   github.com/tow/sunburnt
•  Programmatic querying as well as raw queries
•  Supports most advanced solr options
•  If you only require facets, specify rows=0
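
With rows=0 the response carries no documents, only the facet counts. The sketch below uses plain standard-library Python rather than sunburnt, and reads a hand-written sample payload in solr's default JSON layout, where each facet field is a flat list of value, count, value, count. The field names and the sample numbers are illustrative assumptions.

```python
import json

# Hand-written sample of a solr JSON response for a rows=0 facet query.
# Field names are hypothetical; counts are illustrative.
SAMPLE_RESPONSE = """
{
  "response": {"numFound": 2000, "start": 0, "docs": []},
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {
      "less_than_year__osteoarthritis": ["false", 1937, "true", 63],
      "more_than_year__osteoarthritis": ["false", 1810, "true", 190]
    }
  }
}
"""


def true_counts(solr_response):
    """Return {field: number of documents where the boolean field is true}."""
    counts = {}
    for field, flat in solr_response["facet_counts"]["facet_fields"].items():
        # Flat list layout: [value, count, value, count, ...]
        pairs = dict(zip(flat[0::2], flat[1::2]))
        counts[field] = pairs.get("true", 0)
    return counts


counts = true_counts(json.loads(SAMPLE_RESPONSE))
print(counts)
```

Each "true" count corresponds to one cell of the aggregated grid, so a single request can fill in the whole table for a filtered group of respondents.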
  
Questions?

•  wisertogether.com
•  slideshare.net/tarequeh/the-solr-power
•  @tarequeh

Editor's Notes

  • #2: Good afternoon everyone! Welcome to my lightning talk: The Solr Power. My name is Tareque and I work for a small health industry startup named wisertogether. As you may have noticed from this corny title, my talk is about solr.
  • #3: This could be turned into a most interesting man joke.
  • #4: As you might have already guessed, I’m talking about using solr as a NoSQL backend. This approach is not novel in any way, but I wanted to discuss the use case that brought it about. First of all… NoSQL.
  • #5: We got to a point where retrieving data from a SQL layer just wasn’t an option. The arrow came in the form of a performance hit from querying a complex relational model.
  • #6: Well, why not? Now on to more specific reasons for using solr as a NoSQL backend.
  • #7: I emphasize the word infrequently.
  • #10: So there are a lot of answer options.
  • #11: What you were diagnosed with previously and what you got diagnosed with recently.
  • #13: When you start combining all the survey responses, you start getting some really useful information because it exposes common trends, idiosyncrasies etc. We use these numbers to generate pretty graphs.
  • #14: Solr stores everything in the form of a document.
  • #17: We used sunburnt to interface with solr. If you only need the facets, there is no reason to retrieve the documents, and skipping them saves a lot of memory.