Tldr solr-courseload

Dumbledore: "I use the Pensieve. One simply siphons the excess thoughts from one's mind,
                      pours them into the basin, and examines them at one's leisure. It becomes
                      easier to spot patterns and links, you understand, when they are in this form."
Harry:           "You mean... that stuff's your thoughts?"
Dumbledore: "Certainly."

Solr is Lucene-based

Lucene = text search engine library written in Java

All kinds of crazy goodies:

Ranked search

Multiple indexing

Simultaneous read & write

Date-range search

...the list goes on

Platform-independent (thanks, Java!)

Fast & efficient

Index size ~= 20-30% of the size of the indexed data

Very high throughput indexing (95GB/hour)

Solr is NoSQL

NoSQL == Non-relational database

RDBMS metaphor:

One database

One table

Denormalized data

Query parameters instead of SQL

“Documents” instead of rows

Bottom line: it's a persistent datastore, and we use it to store data
persistently.
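
To make "query parameters instead of SQL" concrete, here is a minimal Python sketch that builds the URL for a Solr select request. The host, the core name (jobs) and the field names (title, country, date_new) are hypothetical; q, fq, fl, sort, rows and wt are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr location and core name, for illustration only.
SOLR_SELECT_URL = "http://localhost:8983/solr/jobs/select"


def build_select_url(base_url, **params):
    """Return a Solr select URL with the given query parameters encoded."""
    return base_url + "?" + urlencode(params)


# Rough SQL analogue (table and field names are made up):
#   SELECT id, title FROM jobs
#   WHERE title LIKE '%engineer%' AND country = 'USA'
#   ORDER BY date_new DESC LIMIT 10
url = build_select_url(
    SOLR_SELECT_URL,
    q="title:engineer",    # main query: what to search for
    fq="country:USA",      # filter query: narrows results without affecting ranking
    fl="id,title",         # field list: which fields to return
    sort="date_new desc",  # sort order
    rows=10,               # maximum number of documents to return
    wt="json",             # response format
)
print(url)
```

The URL is only built here, not requested; fetching it needs a running Solr instance.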

Vocabulary

Master

Slave

Replication

Document

API

Master

There can be only one

Read & write operations

Must be secure

Younger, stronger brother of production DB

Home base for Solr slaves

Slave

There are many copies

They have a plan: replication

Read-only

Gets copy of index from the Solr master every k
minutes

Responds to queries

Replication

Slaves --HTTP GET--> Master

Replication is differential

Configuration is set in solrconfig.xml

http://tinyurl.com/DESolrRepl
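
As a rough mental model of "differential" replication (a simplified sketch, not Solr's actual implementation), a slave can compare its index files with the master's and fetch only what is missing or changed. The file names and sizes below are invented, and comparing by size alone is an assumption made for brevity.

```python
def files_to_fetch(master_files, slave_files):
    """Return names of index files the slave must download from the master.

    Both arguments map file name -> size in bytes. A file is fetched when the
    slave lacks it or holds a copy of a different size.
    """
    return sorted(
        name
        for name, size in master_files.items()
        if slave_files.get(name) != size
    )


def files_to_delete(master_files, slave_files):
    """Return names of files the slave holds that the master no longer has."""
    return sorted(set(slave_files) - set(master_files))


# Invented example: the master has added one file set and dropped another.
master = {"segments_5": 120, "_a.cfs": 4096, "_c.cfs": 2048}
slave = {"segments_4": 118, "_a.cfs": 4096, "_b.cfs": 1024}

print(files_to_fetch(master, slave))   # ['_c.cfs', 'segments_5']
print(files_to_delete(master, slave))  # ['_b.cfs', 'segments_4']
```

The unchanged file (_a.cfs) is never transferred, which is the point of replicating differentially rather than copying the whole index on every poll.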

Document

RDBMS = row; Solr = document

Denormalized relational data

my friend,


Flatten a bunch of related RDBMS rows into a
single Solr document
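
A minimal Python sketch of that flattening step, assuming three hypothetical tables (jobs, companies, locations) whose rows arrive as dictionaries. All table and field names are made up for illustration.

```python
def flatten_job(job_row, company_row, location_rows):
    """Combine one job row with its related rows into a single flat document."""
    return {
        "id": job_row["id"],
        "title": job_row["title"],
        # Copy the related company's name instead of storing a foreign key.
        "company_name": company_row["name"],
        # A one-to-many relation becomes a multi-valued field.
        "location": [loc["city"] for loc in location_rows],
    }


# Hypothetical rows as they might come back from three relational tables.
job = {"id": 42, "title": "Search Engineer", "company_id": 7}
company = {"id": 7, "name": "Acme"}
locations = [
    {"job_id": 42, "city": "Indianapolis"},
    {"job_id": 42, "city": "Chicago"},
]

doc = flatten_job(job, company, locations)
print(doc)
```

The resulting document carries everything needed to answer a query about the job, so no join is required at search time.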

API

Application programming interface

Primary means of communicating with Solr is an
HTTP API
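
Writing to Solr goes through the same HTTP API. The sketch below builds, but does not send, a POST that adds one document as JSON. The host, core name and document fields are hypothetical, and the exact update path can differ between Solr versions, so treat the URL as an assumption.

```python
import json
from urllib.request import Request

# Hypothetical Solr location and core name; the path may vary by Solr version.
SOLR_UPDATE_URL = "http://localhost:8983/solr/jobs/update?commit=true"


def build_add_request(docs, url=SOLR_UPDATE_URL):
    """Build (but do not send) an HTTP POST that adds documents to Solr."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_add_request([{"id": "42", "title": "Search Engineer"}])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it, which requires a running Solr instance at that address.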

The Good Stuff:
Unix & Diagnostics
“This is the Unix philosophy: Write programs that
do one thing and do it well. Write programs to
work together. Write programs to handle text
streams, because that is a universal interface.”
Doug McIlroy


Examples of things beyond the scope of this talk:

cat

awk

grep

sed

cut

wc

sort

tail

head

Great read: http://matt.might.net/articles/sql-in-the-shell/

The Good Stuff:
Unix & Diagnostics

You cannot effectively troubleshoot without parsing logs

You cannot effectively parse logs without good text-parsing tools:

cat

awk

grep

sed

cut

wc

sort

tail

head

No *nix OS? PowerShell!

The Good Stuff:
Unix & Diagnostics

Example commands:

tail -f /var/log/celery/project.log

Output the Celery log to stdout, in real time

cat /ebs2/log/celery/project.log|grep -oE 'BUID:([0-9]{0,5})'|grep -oE '[0-9]{0,5}'|sort --unique

Parse the Celery log, printing a list of unique BUIDs

cat /ebs2/log/celery/project.log|grep -B 15 "DocumentInvalid"|grep -E 'Download complete for BUID ([0-9]{1,5})'|awk '{sub(/\[/, "");print $1 " " $2 " " $7 ":" $8}'

Parse the Celery log, printing each BUID whose feed file failed for some
reason
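
When the same extraction is needed inside application code rather than at a shell prompt, the unique-BUID command above can be approximated in Python. This is a sketch: the sample log lines are invented, and only the BUID:<digits> pattern comes from the command. Unlike the shell version, it requires at least one digit and sorts numerically rather than lexically.

```python
import re

# Same idea as grep -oE 'BUID:([0-9]{0,5})', but requiring at least one digit.
BUID_PATTERN = re.compile(r"BUID:([0-9]{1,5})")


def unique_buids(lines):
    """Return the distinct BUIDs found in the given log lines, sorted numerically."""
    found = set()
    for line in lines:
        found.update(BUID_PATTERN.findall(line))
    return sorted(found, key=int)


# Invented sample lines; a real Celery log will look different.
sample_log = [
    "[2013-01-01 10:00:00] Download complete for BUID:123",
    "[2013-01-01 10:00:05] Parsing feed file for BUID:45",
    "[2013-01-01 10:00:09] DocumentInvalid while parsing BUID:123",
]

print(unique_buids(sample_log))  # ['45', '123']
```

To run it over a real log, pass an open file object, since iterating a file yields its lines.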

Conclusion

RTFreakingM

http://wiki.apache.org/solr/SolrQuerySyntax

http://wiki.apache.org/solr/SolrCaching

http://wiki.apache.org/solr/SchemaXml

http://django-haystack.readthedocs.org/en/latest/

Experiment & tinker & reinvent the wheel

Get comfortable with the command line – you can't effectively administer Solr
(or any sufficiently complex system) with a web GUI

Read the logs

Connect Solr behavior to application operations
