Leveraging Publisher’s Search
 Engines to Deliver Relevant
      Results to Users
                 Presented by
        Abe Lederman, President and CTO
          Deep Web Technologies, LLC

   28th Annual Scholarly Publishing Meeting – Virginia – June 9, 2006
Abe’s Background
• Earned B.S. and M.S. Computer Science degrees, MIT
• 18 years of experience developing sophisticated
  information retrieval applications
• Cofounded Verity, 1988
• Consulted to LANL, 1994-2000
• Deployed first “federated search” portal in the Federal
  government, 1999
• Founded Deep Web Technologies (DWT), 2002
 DWT is a New Mexico-based company focused on providing
  state-of-the-art software solutions that search, retrieve,
 aggregate, and analyze content from web-based databases.
The Problem:

   Searching a
 large number of
sources can lead
   to a flood of
      results
Relevance
  ranking
 begins as
soon as the
user clicks
the Search
   button
Ranking Recipe
INGREDIENTS
 Source Selection
 Query Language
 Search Conductor
 Ranking Algorithms

MIX WELL AND SERVE UP
RELEVANT RESULTS
Source Selection Optimizer
[Diagram: Search Conductor (top), Source Selection Optimizer (middle), Source Descriptions and Previous Results (bottom)]
Powerful Query Language
• Takes advantage of search capabilities of
  each source
• Supports full Boolean operators where
  possible
• Supports fielded search
• Translates natural language questions into
  query syntax
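The first three bullets above can be illustrated with a minimal sketch. It assumes a hypothetical per-source capability record (`Source`, with `supports_boolean` and `field_prefixes`) and a hypothetical `translate` helper; none of these names come from the presentation, and natural language translation is not attempted here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Source:
    """Hypothetical description of one publisher's search capabilities."""
    name: str
    supports_boolean: bool
    # Maps generic field names to this source's own prefix, e.g. "title" -> "ti:"
    field_prefixes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Term:
    text: str
    field_name: Optional[str] = None  # generic field such as "title" or "author"


def translate(terms: List[Term], operator: str, source: Source) -> str:
    """Render one generic query in the syntax a given source accepts."""
    parts = []
    for term in terms:
        # Quote multi-word terms so they are treated as phrases
        text = f'"{term.text}"' if " " in term.text else term.text
        prefix = source.field_prefixes.get(term.field_name) if term.field_name else None
        # Fall back to an unfielded term when the source lacks that field
        parts.append(f"{prefix}{text}" if prefix else text)
    # Use the Boolean operator only where the source supports it
    joiner = f" {operator} " if source.supports_boolean else " "
    return joiner.join(parts)


rich = Source("rich", True, {"title": "ti:", "author": "au:"})
basic = Source("basic", False)
query = [Term("federated search", "title"), Term("Lederman", "author")]
print(translate(query, "AND", rich))
print(translate(query, "AND", basic))
```

The same generic query degrades gracefully: the fielded Boolean form goes to sources that accept it, and a plain keyword form goes to those that do not.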
Search Conductor
Select sources to search
Perform Search
Enough good results?
  YES: Deliver results to user
  NO: Can I get more results from “good” sources?
    YES: Get Next Results
    NO
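The flow above can be sketched as a loop. This is an illustration under stated assumptions, not DWT's implementation: each source is modeled as a hypothetical callable taking a query and a page number, a result is "good" when its hypothetical `score` meets a threshold, a source stays "good" while its latest page yields good results, and the final NO branch is assumed to deliver whatever has been collected.

```python
from typing import Callable, Dict, List

Result = dict  # hypothetical record with at least "id" and "score" keys
SearchFn = Callable[[str, int], List[Result]]  # (query, page) -> one page of results


def conduct_search(
    sources: Dict[str, SearchFn],  # the already selected sources
    query: str,
    wanted: int = 10,         # assumed meaning of "enough good results"
    good_score: float = 0.5,  # assumed threshold for a "good" result
    max_pages: int = 5,       # safety cap on pages fetched per source
) -> List[Result]:
    good: List[Result] = []
    next_page = {name: 0 for name in sources}
    active = set(sources)  # sources still considered "good"

    # Keep going while results are insufficient and a good source remains
    while active and len(good) < wanted:
        for name in sorted(active):
            page = sources[name](query, next_page[name])  # perform search / get next results
            next_page[name] += 1
            hits = [r for r in page if r["score"] >= good_score]
            good.extend(hits)
            # Stop asking a source once a page brings back nothing good
            if not hits or next_page[name] >= max_pages:
                active.discard(name)

    # Deliver results to the user, best first
    return sorted(good, key=lambda r: r["score"], reverse=True)
```

Sources that return poor results are queried once and dropped, while sources returning good results are asked for further pages until enough have been gathered.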
Challenges in Organizing and
      Ranking Results

       Multi-tier Relevance
             Ranking


       User-driven Ranking



       Clustering of Results
Multi-tier Relevance Ranking
• QuickRank – Ranks results based
  on occurrence of search terms in
  title, author, and snippet

• MetaRank – Ranks results utilizing
  custom algorithms applied to meta-data

• DeepRank – Downloads and indexes
  full-text documents
  (HEAVY LIFTING REQUIRED!)
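As an illustration of the first tier only, the sketch below scores a result by counting query-term occurrences in its title, author, and snippet. The field weights and the function names are assumptions made for this example; the slide names the fields but does not say how QuickRank weighs them.

```python
import re

# Hypothetical weights: the slide lists these fields but gives no weighting
FIELD_WEIGHTS = {"title": 3.0, "author": 2.0, "snippet": 1.0}


def quick_rank_score(result: dict, query: str) -> float:
    """Weighted count of query-term occurrences in title, author, and snippet."""
    terms = set(re.findall(r"\w+", query.lower()))
    score = 0.0
    for field_name, weight in FIELD_WEIGHTS.items():
        words = re.findall(r"\w+", result.get(field_name, "").lower())
        score += weight * sum(1 for word in words if word in terms)
    return score


def quick_rank(results: list, query: str) -> list:
    """Return results ordered from highest to lowest score."""
    return sorted(results, key=lambda r: quick_rank_score(r, query), reverse=True)


results = [
    {"title": "Deep web crawling", "author": "Smith",
     "snippet": "A study of crawling."},
    {"title": "Federated search ranking", "author": "Lederman",
     "snippet": "Ranking federated search results."},
]
for r in quick_rank(results, "federated search"):
    print(quick_rank_score(r, "federated search"), r["title"])
```

Because it needs only the fields already present in a result list, a ranking of this kind can run without fetching anything further from the source.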
User-driven Ranking

Credibility of source    Geographic proximity
Date range               Popularity of document
Document length          Reading level
Document type            Relevance

   Desired: Blending (weighting) of above criteria
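One way to picture such blending is a weighted average of per-criterion scores. The sketch assumes each criterion has already been normalized to a score between 0 and 1 and stored under a hypothetical `scores` key; the `blend` function and the criterion keys are illustrative, not part of the presentation.

```python
def blend(results: list, weights: dict) -> list:
    """Order results by a weighted average of their per-criterion scores.

    Assumes every score is already normalized to the range 0..1 and that
    a criterion missing from a result counts as 0.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one criterion needs a positive weight")

    def blended(result: dict) -> float:
        scores = result["scores"]
        return sum(w * scores.get(name, 0.0) for name, w in weights.items()) / total

    return sorted(results, key=blended, reverse=True)


docs = [
    {"id": "A", "scores": {"relevance": 0.9, "credibility": 0.2}},
    {"id": "B", "scores": {"relevance": 0.6, "credibility": 0.9}},
]
# A user who cares only about relevance versus one who favors credible sources
print([d["id"] for d in blend(docs, {"relevance": 1.0, "credibility": 0.0})])
print([d["id"] for d in blend(docs, {"relevance": 1.0, "credibility": 2.0})])
```

Changing the weights changes the order without re-running the search, which is what makes the ranking user-driven.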
Clustering
Attributes of Successful
          Federated Search
• Powerful query language that takes
  advantage of publisher search capabilities
• Source selection optimizer that reduces
  unnecessary searches
• Search conductor that gets more results from
  sources bringing back good results
• A tool that highlights best search results
• Caching of search results
Advice for Publishers
• Use good search engines with good
  relevance ranking
• Return 100 or more results at a time
• Return meta-data (author, journal, snippet)
  as part of result list
• Provide access to your content through
  XML Gateway or Web Services
• Speed up search time
Thank You!


Abe Lederman
301 N Guadalupe, Ste 201
Santa Fe, NM 87501
abe@deepwebtech.com
www.deepwebtech.com
