tl;dr: Solr




     
Dumbledore: "I use the Pensieve. One simply siphons the excess thoughts from one's mind, 
                          pours them into the basin, and examines them at one's leisure. It becomes 
                          easier to spot patterns and links, you understand, when they are in this form."
    Harry:           "You mean... that stuff's your thoughts?"
    Dumbledore: "Certainly."




                                   




                                   
Solr is Lucene-based
    
        Lucene = text search engine library written in Java
    
        All kinds of crazy goodies:
        
          Ranked search
        
          Multiple indexing
        
          Simultaneous read & write
        
          Date-range search
        
          ...the list goes on
    
        Platform-independent (thanks, Java!)
    
        Fast & efficient
          
             Index size ~= 20-30% of the size of the indexed data
          
             Very high-throughput indexing (95 GB/hour)




                             
Solr is NoSQL
    
        NoSQL == Non-relational database
    
        RDBMS metaphor:
        
          One database
        
          One table
        
          Denormalized data
        
          Query parameters instead of SQL
        
          “Documents” instead of rows
    
        Bottom line: it's a persistent datastore, and we use it to store data 
        persistently.
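
As a rough illustration of "query parameters instead of SQL", the sketch below builds the URL for a Solr select request. The host, port, core name (jobs), and field names (title, location) are hypothetical and not taken from the talk; q, fq, fl, rows, and wt are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr location and core name; adjust for a real deployment.
SOLR_BASE = "http://localhost:8983/solr/jobs"


def build_select_url(query, filters=(), fields=("id",), rows=10):
    """Build a Solr /select URL, roughly the counterpart of a SQL SELECT."""
    params = [
        ("q", query),              # roughly the WHERE clause
        ("fl", ",".join(fields)),  # roughly the column list
        ("rows", str(rows)),       # roughly LIMIT
        ("wt", "json"),            # response format
    ]
    # Each filter query narrows the result set without affecting ranking.
    params.extend(("fq", f) for f in filters)
    return SOLR_BASE + "/select?" + urlencode(params)


url = build_select_url(
    "title:engineer",
    filters=["location:Indianapolis"],
    fields=("id", "title"),
)
print(url)
```

Building the parameters as a list of pairs, rather than a dict, keeps it possible to repeat fq several times in one request.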




                              
Vocabulary
    
      Master
    
      Slave
    
      Replication
    
      Document
    
      API




                     
Master
    
      There can be only one
    
      Read & write operations
    
      Must be secure
    
      Younger, stronger brother of production DB
    
      Home base for Solr slaves




                    
Slave
    
      There are many copies
    
      They have a plan: replication
    
      Read-only
    
      Gets a copy of the index from the Solr master every k 
      minutes
    
      Responds to queries  




                    
Replication
    
      Slaves --HTTP GET--> Master
    
      Replication is differential
    
      Configuration is set in solrconfig.xml
    
      http://tinyurl.com/DESolrRepl
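
The polling half of this can be sketched as follows, assuming the master exposes Solr's replication handler at /replication under a core URL. The host and core name are hypothetical, and the version comparison is a deliberately simplified model; the real handler also tracks index generations and decides which files to transfer.

```python
from urllib.parse import urlencode

# Hypothetical master location and core name.
MASTER_CORE_URL = "http://solr-master.example.com:8983/solr/jobs"


def replication_url(core_url, command):
    """URL for a replication handler command such as 'indexversion' or 'details'."""
    return core_url + "/replication?" + urlencode({"command": command})


def needs_fetch(master_version, slave_version):
    """Simplified check: pull only when the master reports a different index version."""
    return master_version != slave_version


# The URL a slave would GET to ask the master for its current index version.
print(replication_url(MASTER_CORE_URL, "indexversion"))
print(needs_fetch(master_version=2, slave_version=1))
```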




                     
Document
    
      RDBMS = row; Solr = document
    
      Denormalized relational data
    
      Flatten a bunch of related RDBMS rows into a 
      single Solr document
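
A minimal sketch of that flattening step, using plain dicts to stand in for database rows. The table and field names (job, company, locations, city) are hypothetical and only serve the illustration.

```python
def flatten_to_document(job, company, locations):
    """Merge one 'job' row with its related rows into a single flat dict."""
    return {
        "id": str(job["id"]),
        "title": job["title"],
        "company_name": company["name"],
        # A one-to-many relation becomes a multi-valued field.
        "location": [loc["city"] for loc in locations],
    }


# Three related "tables", one row or row set each.
job = {"id": 42, "title": "Search Engineer", "company_id": 7}
company = {"id": 7, "name": "Example Corp"}
locations = [
    {"job_id": 42, "city": "Indianapolis"},
    {"job_id": 42, "city": "Chicago"},
]

doc = flatten_to_document(job, company, locations)
print(doc)
```

The joins happen once, at indexing time, so a query never has to follow foreign keys.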
                   
API
    
      Application programming interface
    
      Primary means of communicating with Solr is an 
      HTTP API
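
To make the HTTP API concrete, the sketch below prepares (without sending) a POST request that would add one document to the index. It assumes a Solr version whose /update handler accepts JSON; the host, core name, and document fields are hypothetical.

```python
import json
from urllib.request import Request

# Hypothetical Solr location and core name.
UPDATE_URL = "http://localhost:8983/solr/jobs/update?commit=true"


def build_update_request(documents):
    """Prepare (but do not send) an HTTP POST that adds documents to the index."""
    body = json.dumps(documents).encode("utf-8")
    return Request(
        UPDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_update_request([{"id": "42", "title": "Search Engineer"}])
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```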




                    
The Good Stuff:
                    Unix & Diagnostics
                       “This is the Unix philosophy: Write programs that 
                       do one thing and do it well. Write programs to 
                       work together. Write programs to handle text 
                       streams, because that is a universal interface.” 
                                                               - Doug McIlroy


    
        Examples of things beyond the scope of this talk:
        
          cat
        
          awk
        
          grep
        
          sed
        
          cut
        
          wc
        
          sort
        
          tail
        
          head
    
        Great read: http://matt.might.net/articles/sql-in-the-shell/


                                
The Good Stuff:
                      Unix & Diagnostics
    
        You cannot effectively troubleshoot without parsing logs
    
        You cannot effectively parse logs without good text-parsing tools:
        
          cat
        
          awk
        
          grep
        
          sed
        
          cut
        
          wc
        
          sort
        
          tail
        
          head
    
        No *nix OS? PowerShell!




                                
The Good Stuff:
                   Unix & Diagnostics
    
        Example commands:
        
          tail -f /var/log/celery/project.log
          
            Output the Celery log to stdout, in real time
        
          cat /ebs2/log/celery/project.log|grep -oE 'BUID:([0-9]{0,5})'|grep -oE '[0-9]{0,5}'|sort --unique
          
            Parse the Celery log, printing a list of unique BUIDs
        
          cat /ebs2/log/celery/project.log|grep -B 15 "DocumentInvalid"|grep -E 'Download complete for BUID ([0-9]{1,5})'|awk '{sub(/\[/, "");print $1 " " $2 " " $7 ":" $8}'
          
            Parse the Celery log, listing the BUIDs whose feed file
            failed for some reason
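
For readers without a *nix shell, the unique-BUID pipeline can be approximated in Python. This is only a sketch: the sample log lines are hypothetical (the talk does not show the real Celery log format), and the pattern requires at least one digit after "BUID:".

```python
import re

# Same idea as the grep pattern: "BUID:" followed by up to five digits.
BUID_PATTERN = re.compile(r"BUID:([0-9]{1,5})")


def unique_buids(lines):
    """Return the sorted, de-duplicated BUIDs mentioned in the given log lines."""
    found = set()
    for line in lines:
        found.update(BUID_PATTERN.findall(line))
    # Sorted as strings, like the default behavior of sort --unique.
    return sorted(found)


# Hypothetical log lines, for illustration only.
sample = [
    "[2013-01-01 12:00:00] task started BUID:123",
    "[2013-01-01 12:00:05] task finished BUID:123",
    "[2013-01-01 12:01:00] task started BUID:45",
]
print(unique_buids(sample))
```

On a real file, the list could be replaced by an open file handle, since the function only iterates over lines.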




                            
Conclusion
    
        RTFreakingM
        
           http://wiki.apache.org/solr/SolrQuerySyntax
        
           http://wiki.apache.org/solr/SolrCaching
        
           http://wiki.apache.org/solr/SchemaXml
        
           http://django-haystack.readthedocs.org/en/latest/
    
        Experiment & tinker & reinvent the wheel
    
        Get comfortable with the command line – you can't effectively administer Solr 
         (or any sufficiently complex system) with a web GUI
    
        Read the logs
    
        Connect Solr behavior to application operations




                                
     
