http://www.dkd.de
Friday, June 10, 2011
dkd
development
kommunikation
design
Welcome
Olivier Dobberkau
CEO
dkd Internet Service GmbH
Frankfurt am Main, Germany
Agenda
What is search?
Search in TYPO3
Search expectations today
Apache Solr
Why and how?
Watch out!
About me
Olivier Dobberkau
Founder of dkd Internet Service GmbH
aka „the reverend never-end“
Met TYPO3 with Version 3.2 beta 3
Member of T3A BCC
43 years old
olivier.dobberkau@dkd.de
Twitter: @T3RevNeverEnd
What is Search?
Definition of Information Retrieval
Information retrieval (IR) is the area of study
concerned with searching for documents, for
information within documents, and for metadata
about documents, as well as that of searching
relational databases and the World Wide Web.
Wikipedia:
http://en.wikipedia.org/wiki/Information_retrieval
Factors in Information Retrieval
Recall
Precision
Fall-out
Scalability
Performance
Simplicity
Flexibility
Recall
Percentage of relevant documents that are returned
400 relevant documents
100 of them returned
25% recall
Precision
Percentage of returned documents that are relevant
500 returned, 100 relevant
20% precision
Best would be:
100% Recall with 100% Precision
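As a small worked example, the two ratios can be computed directly. This is only a sketch: the function names are made up for illustration, and it assumes the recall slide means that 100 of 400 relevant documents were returned.

```python
def recall(relevant_returned: int, relevant_total: int) -> float:
    """Share of all relevant documents that the search returned."""
    return relevant_returned / relevant_total


def precision(relevant_returned: int, returned_total: int) -> float:
    """Share of the returned documents that are relevant."""
    return relevant_returned / returned_total


# Numbers from the slides (assumed reading): 100 of 400 relevant
# documents are found, and 100 of 500 returned documents are relevant.
print(f"recall    = {recall(100, 400):.0%}")     # recall    = 25%
print(f"precision = {precision(100, 500):.0%}")  # precision = 20%
```

Returning every document in the index would push recall to 100% while precision collapses, which is why both numbers have to be watched together.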
Index
The purpose of storing an index is to optimize
speed and performance in finding relevant
documents for a search query.
[Figure: Index. Documents 1 to 5 are broken into single words (My, cat, is, cool, Baseball, Sport, San, Francisco, TYPO3, Extbase, T3CON, rocks, Fort, Mason, Ghetto, a) that feed the index.]
Posting File
Word      Documents
My        1, 2
cat       1
is        1, 2, 5
cool      1
Baseball  2
Sport     2
San       3
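The posting file above is an inverted index: each word points to the documents that contain it. A minimal sketch of how such a structure could be built follows. The texts of documents 1 and 2 are assumptions reconstructed from the word list, documents 3 to 5 are left out, and the helper names are hypothetical.

```python
from collections import defaultdict

# Assumed document texts; the slides only show the word-to-document mapping.
documents = {
    1: "My cat is cool",
    2: "Baseball is my Sport",
}


def build_postings(docs):
    """Map every lower-cased word to the sorted IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def search(postings, query):
    """Return the IDs of documents that contain every word of the query."""
    result = None
    for word in query.lower().split():
        ids = set(postings.get(word, []))
        result = ids if result is None else result & ids
    return sorted(result or [])


postings = build_postings(documents)
print(postings["my"])                # [1, 2]
print(postings["cat"])               # [1]
print(search(postings, "my sport"))  # [2]
```

A query only touches the short lists for its own words instead of scanning every document, which is where the speed of an index comes from.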
Search in TYPO3
Indexed Search
Indexed Search since TYPO3 Version 3.5
Indexing through the Frontend
Searches in Pages and in some File Types
Works with Languages and Access Rights
Indexed Search
Index in Database
Problems with large websites
Slow
no sorting
no Templating
OK for small websites
Search Expectations
Expectation vs. Experience
Users expect a „Google-like“ interface and
behaviour in search
No one navigates through an online shop
Up to 30% of users use search instead of
going through text or navigation
Search is mediocre on a lot of websites
Slow and incomplete
Lots of improvement possible
Apache Solr
Enterprise Search Server
Apache Solr
Apache Software Foundation
Enterprise Search Server
uses the Lucene Index
Lots of great Features
CNet, Netflix, Zappos.com and many more...
Solr Key Features
Synonyms
Stopwords
Boosting / Weighting
Faceting
Paid Content / Elevation
Spellchecking / Did you mean?
Speed
How does it work?
REST-like Interface
Indexing with POST
Search with GET
Results in XML, JSON, PHP and many more
Libraries for many programming languages
SolrPhpClient
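To make the GET/POST split concrete, here is a hedged sketch that only builds the two HTTP requests and sends nothing. The base URL, the helper names and the field names `id` and `title` are assumptions; the `/select` and `/update` handlers, the `q` and `wt` parameters and the `<add><doc>` XML format are the classic Solr conventions.

```python
from urllib.parse import urlencode
from urllib.request import Request
from xml.sax.saxutils import escape, quoteattr

SOLR_BASE = "http://localhost:8983/solr"  # assumed host, port and path


def build_search_url(query, fmt="json"):
    """Search is a plain GET; 'wt' selects the response format (xml, json, php)."""
    params = urlencode({"q": query, "wt": fmt})
    return f"{SOLR_BASE}/select?{params}"


def build_index_request(doc):
    """Indexing is a POST of an <add><doc> XML message to the update handler."""
    fields = "".join(
        f"<field name={quoteattr(name)}>{escape(str(value))}</field>"
        for name, value in doc.items()
    )
    body = f"<add><doc>{fields}</doc></add>".encode("utf-8")
    return Request(
        f"{SOLR_BASE}/update",
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


print(build_search_url("typo3 solr"))
# http://localhost:8983/solr/select?q=typo3+solr&wt=json

req = build_index_request({"id": "page-1", "title": "Hello Solr"})  # hypothetical fields
print(req.get_method(), req.full_url)
# POST http://localhost:8983/solr/update
```

Passing the request to `urllib.request.urlopen` would actually send it; newly added documents typically become searchable only after a commit.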
Why and how?
Scratching our Itch
Why?
Indexed Search was too slow
It misses a lot of today's requirements
History
Prototype in Summer 2008
Kick-off February 2009
„Acts like Indexed Search“
Early Access Program
T3CON September 2009 Version 1.0
Components
Indexing
Search
Flexible Templating
Analysis and Statistics
Administration
Challenges
Page Rendering in TYPO3
Access Rights
File Indexing
Easy Setup for Non-Java People
Integrating Solr in general
Solutions
Record Monitor and Indexing Queue
Solr Query Parser Plugin
Integration of Apache Tika
Fully Automated bash Install Script
SolrPhpClient
Features
Faceted Search
File Indexing
Multi-language Support
Did you mean
Features
Search Word Highlighting
Autocomplete / Suggestions
Access Rights Support
More to come
Watch out!
„I do not have any solution. I admire the problem.“
Ashleigh Brilliant, Cartoonist and Author.
Common Problems
Relevancy Perception Trap
Assumption: Search should display a certain
result like an Employee Name
Query: Mike Miller
Results: Mill 100% Relevancy
Miller 75% Relevancy
Possible Issue: Stemming on proper Names
Solution: Don't stem Fields with Names
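The effect can be illustrated with a toy example. The suffix stripper below is deliberately naive and hypothetical; it is not the stemmer Solr ships, but it shows how stemming a name field lets „Mill“ match a query for „Miller“, and how leaving the field unstemmed avoids that.

```python
def toy_stem(word: str) -> str:
    """Very naive suffix stripper; a stand-in for a real stemming filter."""
    word = word.lower()
    for suffix in ("ers", "er", "ing", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query: str, value: str, stem_field: bool) -> bool:
    """Compare a query term with an indexed value, with or without stemming."""
    if stem_field:
        return toy_stem(query) == toy_stem(value)
    return query.lower() == value.lower()


names = ["Mill", "Miller"]
print([n for n in names if matches("Miller", n, stem_field=True)])   # ['Mill', 'Miller']
print([n for n in names if matches("Miller", n, stem_field=False)])  # ['Miller']
```

With stemming switched on, both names reduce to the same token, so the ranking between them no longer reflects what the user typed.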
Common Problems
Finding Corpses in your Corpus
While Searching you find „interesting“ Results
You have forgotten to hide content
You have not set the „no search“ Flag
You have made copies of records and
forgotten them
Common Problems
Data updates without using TCEmain
You wonder: Why do my new records of table
XY not show up?
You have updated the tables with e.g.
phpMyAdmin
You might have forgotten to add the Language
ID in the records
Common Problems
Can't access the Solr Server
You cannot access the Solr Server on another
Machine
Possible Solution
Common Problems
Help, my Index gets deleted
Symptom: Your Index is empty
Possible Cause: Your Solr Server is not secured
Common Problems
My news are not being indexed
News that you have in a Sysfolder are not
showing up in your Results
The Folder is not in the rootline of the Website
Configure the PID of the Sysfolder correctly
Questions?
Thank you.


Searching does not mean finding Stuff - Apache Solr for TYPO3