Dumbledore explains to Harry that the Pensieve allows one to extract thoughts and examine them in the basin, making it easier to spot patterns and links. Dumbledore confirms that the contents of the Pensieve are indeed his thoughts.
4. Solr is Lucene-based
Lucene = text search engine library written in Java
All kinds of crazy goodies:
Ranked search
Multiple indexing
Simultaneous read & write
Date-range search
...the list goes on
Platform-independent (thanks, Java!)
Fast & efficient
Index size ~= 20-30% size of indexed data
Very high throughput indexing (95GB/hour)
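To make the two figures above concrete, here is a rough back-of-the-envelope sketch. The corpus size (190 GB) is a made-up input; the ratios and the throughput are the slide's numbers, and real results will depend on hardware and schema.

```python
# Rough sizing from the slide's figures. corpus_gb is a hypothetical input.
INDEX_RATIO_LOW = 0.20          # index is roughly 20% of the indexed data...
INDEX_RATIO_HIGH = 0.30         # ...up to roughly 30%
THROUGHPUT_GB_PER_HOUR = 95     # quoted indexing throughput


def estimate(corpus_gb):
    """Return (min index GB, max index GB, hours to index) for a corpus."""
    return (
        corpus_gb * INDEX_RATIO_LOW,
        corpus_gb * INDEX_RATIO_HIGH,
        corpus_gb / THROUGHPUT_GB_PER_HOUR,
    )


low, high, hours = estimate(190)
print(f"index: {low:.0f}-{high:.0f} GB, indexing time: {hours:.1f} h")
```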
5. Solr is NoSQL
NoSQL == Non-relational database
RDBMS metaphor:
One database
One table
Denormalized data
Query parameters instead of SQL
“Documents” instead of rows
Bottom line: it's a persistent datastore, and we use it to store data persistently.
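A minimal sketch of "query parameters instead of SQL". The host, path, and field names (title, country, date_added) are assumptions for illustration; q, fq, fl, sort, rows, and wt are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port, and core for a real setup.
SOLR_SELECT = "http://localhost:8983/solr/select"

# Roughly the Solr counterpart of:
#   SELECT id, title FROM docs
#   WHERE title matches 'engineer' AND country = 'US'
#   ORDER BY date_added DESC LIMIT 10
params = {
    "q": "title:engineer",      # main query
    "fq": "country:US",         # filter query (narrows results, no scoring)
    "fl": "id,title",           # fields to return
    "sort": "date_added desc",  # sort order
    "rows": 10,                 # page size
    "wt": "json",               # response format
}

url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```

Each result comes back as a "document": a flat set of field/value pairs rather than a row tied to a table schema.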
7. Master
There can be only one
Read & write operations
Must be secure
Younger, stronger brother of production DB
Home base for Solr slaves
8. Slave
There are many copies
They have a plan: replication
Read-only
Gets copy of index from the Solr master every k minutes
Responds to queries
9. Replication
Slaves --HTTP GET--> Master
Replication is differential
Configuration is set in solrconfig.xml
http://tinyurl.com/DESolrRepl
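"Differential" means a slave only pulls what it does not already have. The sketch below is a simplified model of that idea, not Solr's actual algorithm: the file names and sizes are made up, and comparing by name and size is an assumption chosen to keep the example short.

```python
def files_to_fetch(master_files, slave_files):
    """Return names of index files the slave lacks or holds with a different size.

    Both arguments map file name -> size in bytes (hypothetical data).
    """
    return sorted(
        name
        for name, size in master_files.items()
        if slave_files.get(name) != size
    )


# Made-up index listings: the master has one new segment and a new segments file.
master = {"_0.fdt": 1200, "_0.tis": 340, "_1.fdt": 800, "segments_3": 60}
slave = {"_0.fdt": 1200, "_0.tis": 340, "segments_2": 55}

print(files_to_fetch(master, slave))  # ['_1.fdt', 'segments_3']
```

The unchanged segment files are skipped, which is why a poll after a small update is cheap compared with copying the whole index.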
10. Document
RDBMS = row; Solr = document
Denormalized relational data
my friend,
Flatten a bunch of related RDBMS rows into a
single Solr document
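A minimal sketch of flattening, assuming three hypothetical tables (job, company, location) that are not taken from the talk. The related rows collapse into one dict, with the one-to-many relation becoming a multi-valued field.

```python
def flatten(job, company, locations):
    """Merge one job row, its company row, and its location rows into one flat document."""
    return {
        "id": job["id"],
        "title": job["title"],
        "company_name": company["name"],              # joined in from the company table
        "city": [loc["city"] for loc in locations],   # one-to-many -> multi-valued field
    }


# Hypothetical rows as they might come out of an RDBMS.
job_row = {"id": 7, "title": "Engineer", "company_id": 3}
company_row = {"id": 3, "name": "Acme"}
location_rows = [{"job_id": 7, "city": "Austin"}, {"job_id": 7, "city": "Boston"}]

doc = flatten(job_row, company_row, location_rows)
print(doc)
```

No join is needed at query time: everything a search result has to show already lives in the document.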
11. API
Application programming interface
Primary means of communicating with Solr is an
HTTP API
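As a sketch of what talking to Solr over HTTP looks like on the write side, the snippet below builds (but does not send) a request that posts documents as JSON. The base URL and the document are assumptions; the `/update` path with a JSON body applies to reasonably recent Solr versions, and older installs may expose a different update path.

```python
import json
from urllib.request import Request


def build_update_request(base_url, docs):
    """Build a POST request that adds docs to Solr and commits them."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        base_url + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical Solr base URL and document.
req = build_update_request(
    "http://localhost:8983/solr",
    [{"id": "7", "title": "Engineer"}],
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it. On the setup described in this talk, that write should go to the master only.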
12. The Good Stuff:
Unix & Diagnostics
“This is the Unix philosophy: Write programs that
do one thing and do it well. Write programs to
work together. Write programs to handle text
streams, because that is a universal interface.”
Doug McIlroy
Examples of things beyond the scope of this talk:
Cat
Awk
Grep
Sed
Cut
Wc
Sort
Tail
Head
Great read: http://matt.might.net/articles/sql-in-the-shell/
13. The Good Stuff:
Unix & Diagnostics
You cannot effectively troubleshoot without parsing logs
You cannot effectively parse logs without good text-parsing tools:
Cat
Awk
Grep
Sed
Cut
Wc
Sort
Tail
Head
No *nix OS? PowerShell!
14. The Good Stuff:
Unix & Diagnostics
Example commands:
tail -f /var/log/celery/project.log
Output the Celery log to stdout, in real time
cat /ebs2/log/celery/project.log|grep -oE 'BUID:([0-9]{0,5})'|grep -oE '[0-9]{0,5}'|sort --unique
Parse the Celery log, printing a list of unique BUIDs
cat /ebs2/log/celery/project.log|grep -B 15 "DocumentInvalid"|grep -E 'Download complete for BUID ([0-9]{1,5})'|awk '{sub(/\[/, "");print $1 " " $2 " " $7 ":" $8}'
Parse the Celery log, outputting a list of BUIDs whose feed file failed for some reason
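For machines without the Unix toolchain, the unique-BUID pipeline above can be approximated in Python. This is a sketch under assumptions: the sample log lines are invented, and the only thing taken from the slide is the `BUID:<digits>` pattern.

```python
import re

# Same pattern the grep command looks for, but requiring at least one digit.
BUID_RE = re.compile(r"BUID:([0-9]{1,5})")


def unique_buids(lines):
    """Return the distinct BUIDs found in lines, sorted as strings (like sort --unique)."""
    found = set()
    for line in lines:
        found.update(BUID_RE.findall(line))
    return sorted(found)


# Invented sample lines; a real run would iterate over the open log file instead.
sample = [
    "[2012-05-01 10:00:01] task started BUID:1234",
    "[2012-05-01 10:00:05] task finished BUID:1234",
    "[2012-05-01 10:01:00] task started BUID:42",
    "[2012-05-01 10:01:02] heartbeat",
]

print(unique_buids(sample))  # ['1234', '42']
```

To run it against a real log, pass an open file object, for example `unique_buids(open(path))`, since iterating a file yields its lines.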
15. Conclusion
RTFreakingM
http://wiki.apache.org/solr/SolrQuerySyntax
http://wiki.apache.org/solr/SolrCaching
http://wiki.apache.org/solr/SchemaXml
http://django-haystack.readthedocs.org/en/latest/
Experiment & tinker & reinvent the wheel
Get comfortable with the command line – you can't effectively administer Solr
(or any sufficiently complex system) with a web GUI
Read the logs
Connect Solr behavior to application operations