Microsoft Office SharePoint Server 2007 Search Workshop 游家德  Jade Yu 敦群數位科技股份有限公司
Microsoft Office SharePoint Server 2007 Enterprise Search: Enterprise Search Advanced Training – Building and Implementing Enterprise Search Solutions
Workshop Agenda Day 1 –  Search Overview Microsoft Search Landscape MOSS 2007 Walkthrough Architecture and Deployment Scenarios Crawl and Query Processes Search Object Model Day 2 – Customization and Management Search Object Model Business Data Catalog (BDC) Search Extensibility and Integration Administration Capacity Planning
Assumptions Some knowledge and experience with Search functionality Knowledge of the Business Data Catalog in general  (new in Office 2007 System) Office 2007 System Content Creation/Contribution experience Knowledge of Web site creation and management in general Knowledge of MS platform (Windows 2003 Server, ADS, IIS, SQL 2005 & Office Clients) Knowledge of ASP.NET 2.0 and XSLT
Workshop Objectives Explain how to use the Office 2007 Search functionality Interpret the Office 2007 System Search Terminology Describe the rich feature set of Office 2007 System Search - Servers and Clients Describe how to use the platform well enough to use its APIs to extend the products Explain how Office 2007 System Search will solve enterprise business requirements
Module 1 Enterprise Search Overview
Module Agenda Microsoft Enterprise Search Client-side Search Platform Client-side Comparison Server-side Search Platform Key Differences between WSS and MOSS MOSS 2007 for Search Key Features MOSS 2007 for Search and MOSS 2007 Comparison
Microsoft Enterprise Search Server-Side Search Platform Line-of-business systems and structured data sources Unstructured information  People, expertise External Web sites  E-mail messages, appointments,  and  instant messaging Client-Side Search Platform Documents, programs,  and media
Client-Side Search Platform Windows Desktop Search (WDS) for XP and Windows Server You must install an additional program for Search Vista – Integrated Desktop Search Integration in the Operating System Ability to search nearly anywhere Virtual Folders
Client-Side Comparison
Feature | Microsoft® Windows® Desktop Search | Microsoft® Windows® Vista
Rich, actionable interface | X | X
Integration with Microsoft Outlook | X | X
Polite indexing (pauses when computer is in use) | X | X
Live icons & document previews | X | X
Advanced Search integrated into the Operating System | | X
Save searches to search folders | | X
Instant Search | X (on taskbar) | X (from start menu)
Server-Side Search Platforms Windows SharePoint Services v3 “Basic” index / search capabilities to support WSS collaboration and document management Microsoft Office SharePoint Server (MOSS) 2007 Enterprise search and indexing features “unlocked” Several SKUs to support different scenarios and customer needs
Key Differences Between WSS and MOSS WSS v3 Microsoft Office SharePoint Server (MOSS) Can Index Local SharePoint content XSharePoint sites / collections, Exchange Public Folders, File Shares, Web Content, Lotus Notes, LOB Apps, and others . . . Rich, relevant results X Alerts, RSS, Did you mean, Duplicate collapsing X Scopes, Managed Properties X Best Bets, Result Removal, Query Reports X Search Center Tabs X BDC Search X API’s provided   Query Query + Admin
MOSS 2007 for Search A Search-only solution for intranets and public-facing Web (Internet) sites Two versions Standard Edition limited to 500,000 docs Enterprise Edition with unlimited docs Includes Out of the box search for file shares, Web sites, SharePoint sites, Exchange Public Folders, Lotus Notes databases Extensibility to 3rd party document repositories and file types
MOSS 2007 and MOSS FS Usage Scenarios
Product | Description | Scenario
MOSS 2007 | An information management solution that includes enterprise search integrated with portal, collaboration, web content management, ECM, forms, and BI functionalities | Customers who desire search as an integrated part of a broader information management solution
MOSS FS | A core search-only solution for intranet and public-facing web sites | Customers who require a core search-only product that can be integrated into their existing infrastructure; customers who require search functionality for their public-facing web (Internet) sites
MOSS 2007 for Search and MOSS 2007 Features Comparison Features MOSS 2007  for Search  (Standard Edition) MOSS 2007  for Search  (Enterprise Edition) MOSS 2007 (Standard CAL) MOSS 2007 (Standard plus Enterprise CAL) File shares X X X X Web sites X X X X SharePoint sites  X X X X Microsoft Exchange Server public folders X X X X Lotus Notes databases  X X X X Third party document repositories  1   X X X X Secure content access control  X X X X Enhanced Search Center user interface  X X Search for people and expertise X X Business Data Catalog (BDC)  X Search structured data sources  X Document limit 500,000 No Limit 2 No Limit 2 No Limit 2
Questions?
Module 2 Microsoft Office SharePoint Search 2007 –  Walkthrough
Module Agenda End-User Improvements Relevance People and Expertise Business Data Search Administration Improvements Design Goals Indexing Management Security  Customization Query Reporting Performance Improvements Demo MOSS 2007
End-User Improvements Relevance Dramatically improved relevance is the top goal of this release New ingredients added including: Anchor text Click distance URL depth  Missing metadata creation Result is noticeably more relevant search 100% better on all queries 500% better on common queries
End-User Improvements  People and Expertise Bring people into the Search experience Getting your job done means working with the right people Find subject-matter experts based on their knowledge and contacts Numerous improvements over SPS 2003 Index any LDAP V3 directory Dedicated tab for finding people Results grouped by “social distance” to you
End-User Improvements  Business Data Search Information in Line of Business (LOB) systems is often hard to access MOSS 2007 can bring that data to your users Data is accessed through the  Business Data Catalog Exposed to many features in SharePoint Search can easily index the data No need to write code Highly customizable results Integrated with scopes and Search center
Administration Improvements   Design Goals Address SPS 2003 administration user interface pain points Unify WSS and MOSS search Enable full programmability via the object model Even better scalability and performance
Administration Improvements   Indexing Management Streamlined experience and more control One index per shared service; no need to worry about managing discrete indexes Multiple start addresses per content source MOSS indexes can drive the WSS search experience Allow upgrade from WSS to MOSS
Administration Improvements   Security Query-time security trimming in SPS 2003 File shares, WSS/SPS 2003, Exchange, Lotus Notes (via mapping) Now supports pluggable authentication for content in WSS/MOSS sites Based on ASP.NET 2.0 model Minimum required crawler permission is now just Full Read, not Administrator Still provides the same security trimming functionality Ability to remove single items
Administration Improvements   Customization Search in every company is different Different metadata might matter: Documents: Title, Author, File location, Size Records: Patient, Doctor, Healthcare provider, SSN… How users meaningfully scope searches differs: “All finance documents” “All patient records” “All published documents” Customize results to “pop” metadata that matters Customization offered at many levels Web Parts, XSLT/CSS, full object model…
Administration Improvements   Query Reporting Best way to improve Search is to understand current usage New out-of-box usage reporting: Query volume trends, top queries,  click-through rates, queries with zero results, etc.  At both site and service provider levels Export data for extended reporting in Excel Respond to feedback with configuration changes or editorial results
Performance Improvements Key new features make the crawls faster so the content is fresher More efficient SharePoint crawling (Change Log Crawl) Continuous propagation Unified WSS and MOSS search Security Change Only Crawl Maximum scale is  10s of millions of documents per indexer
Demo – MOSS 2007 Goal of demo is a high-level overview with focus on: Search boxes and advanced search Search results experience Search Center Admin experience
Questions?
Module 3  Architecture and Deployment Scenarios
Agenda Key concepts  MS Search Architecture Deployment Building Blocks WSS v3 Search Topologies MOSS 2007 Search Topologies  Search Topology scenarios  Small  Medium  Large  Geographically distributed Solution scenarios  Collaboration sites  Enterprise portal  Internet facing portal
Microsoft Search Architecture Notes Query Engine Index Engine Protocol Handlers iFilters Content Index OOB Search UI/Custom Search Apps Query OM and Web Service Information … Exchange Folders Network Shares External Web Sites SharePoint Sites Business Data Stemmers WordBreakers Results Query Content Sources Crawl Log Scopes Schema Best Bets Keywords Ranking Search Configuration Data
SharePoint Search Topologies: Deployment Building Blocks Physical building blocks:  Web Front-End Servers Application servers (Query, Index, Excel Services, etc.) SQL Databases  Search functionality segmented into two roles:  Indexer  Query MOSS 2007 specific Shared Service Provider (SSP) Indexer Web Application(s) Site Collection(s) Content Database(s) Virtual Server(s) (IIS)
WSS v3 Search Topology Basics WSS uses both server roles on the same machine (“Search Server”) Indexing Query  Ability to index local content only Site Collection (content database(s)) Content is automatically indexed minimal search administration  Ability to query at a site and below it stsadm command exposes some admin operations Can Crawl Multiple content databases
Sample  WSS  v3  Topology
WSS v3 - Topology Considerations Scale out just like WSS Add content databases for content Add search servers for search Each search server can serve up to 100 content databases Could be lower depending on the data in the content database
MOSS 2007 Search Topology Basics Adds new functionality over base WSS Search Application server roles can be separated: Indexer Query server Propagation from indexer to query servers Crawl local + external content Enhanced administration experience Ability to search across site collections
MOSS 2007 Search Topology Basics (cont) Query role can be assigned to one or more servers Indexing role can only be assigned to a single server Multiple query servers not allowed IF server is providing both indexing and query services Only one index per SSP . . . although you can have multiple SSPs
Sample  MOSS  2007  Topology Query servers separated from indexer Indexer crawling local + external content
MOSS 2007 – Search Topology Considerations Indexing operations are CPU intensive Dedicated query servers *might* be better in a query heavy environment MOSS / WSS crawls do involve making HTTP requests against the WFE(s) Dual role, WFE / Query servers more efficient with security trimming All servers should be on same network segment
MOSS 2007 – Search Topology Considerations (cont) Each farm can index up to 50 million items Beyond this, add more farms Hardware is important
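A rough sizing sketch for the guideline above, assuming the 50 million items per farm figure is treated as a hard ceiling. The function name `farms_needed` is hypothetical and the result is only a first estimate, since hardware and content type also matter.

```python
import math

# Guideline quoted on the slide: each farm can index up to 50 million items.
ITEMS_PER_FARM = 50_000_000


def farms_needed(total_items):
    """Estimate how many farms a corpus needs under the per-farm guideline."""
    if total_items < 0:
        raise ValueError("total_items must not be negative")
    # Even an empty corpus needs one farm to host the search service.
    return max(1, math.ceil(total_items / ITEMS_PER_FARM))


if __name__ == "__main__":
    for items in (10_000_000, 50_000_000, 120_000_000):
        print(items, "items ->", farms_needed(items), "farm(s)")
```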
Shared Search Service Shared Service Provider (SSP) – grouped high-value, resource intensive services Shared services are consumed by web applications (and sites within them) “Always on” shared services – all sites in a web application use the same index Resource intensive operations controlled centrally Some admin experience is manageable at site level
Search Shared Service Search service People service … Shared Service  Provider (SSP) http://sales http://finance http://hr spsite spsite spsite spsite spsite spsite spweb spweb spweb spweb spweb spweb Virtual Servers   Content  Databases External content
Search Shared Service Search service People service … Shared Service  Provider http://sales http://finance http://hr spsite spsite spsite spsite spsite spsite spweb spweb spweb spweb spweb spweb Virtual Servers  Content Indexed Content  Databases External  content
Common Search Topologies Deployment scenarios  Small  Medium  Large  Geographically Distributed (MOSS only)
Small Search Deployment WSS Single Search Server with both roles Index Single Site Collection only! Single Set of Content Databases Query MOSS Single Server Dual Role Index SSP Based – Multiple Site Collections Multiple Set of Content Databases Query MOSS for Search Single Server / Dual Role (Index and Query)
Medium Search Deployment WSS Multiple Search Servers with the following limitations Single Index Server Single Site Collection Single Set of Content Databases Multiple Query Servers MOSS Three Servers One Index Server Two Query Servers running on two Web Front-End servers MOSS for Search Three Servers One Index Server Two Query Servers
Large Search Deployment WSS Multiple Search Servers with the following limitations Multiple Index Servers (64-bit) Each Indexing a Single Site Collection with their own Set of Content Databases Index Servers are not redundant from one another. Multiple Query Servers each associated with their own single Index Server running on the same machine (64-bit) Query servers are not redundant from one another MOSS One Index Server (64-bit) Many Separate Query servers (64-bit) MOSS for Search One Index Server (64-bit) Many Separate Query servers (64-bit)
Geographically Distributed Sites MOSS Search Deployment Other Locations Corp. Sites Search service  People service  --- Shared Service  Provider (SSP) Index Corp, EMEA, APAC and other locations http://sales http://finance http://hr spsite spsite spsite spsite spsite spsite spweb spweb spweb spweb spweb spweb Virtual Servers  External content Search service  People service  --- Shared Service  Provider (SSP) Index APAC only http://apacsales http://apacfinance http://apachr spsite spsite spsite spsite spsite spsite spweb spweb spweb spweb spweb spweb Virtual Servers  External content Search service  People service  --- Shared Service  Provider (SSP) Index EMEA only http://emeasales http://emeafinance http://emeahr spsite spsite spsite spsite spsite spsite spweb spweb spweb spweb spweb spweb Virtual Servers  External content
Deployment Scenarios Collaboration Environment (WSS v3) Enterprise Portal (MOSS 2007) Internet Facing Portal (MOSS 2007)
Collaboration Environment Scenario  WSS v3 iTech – startup software consulting firm Large number of disjoint teams working on projects of varying durations Team sites used for collaboration and communication No organizational needs across sites
Collaboration Environment Scenario  WSS v3 (cont) WSS farm with single IIS virtual server http://team  Scales to large number of team sites  Content indexed automatically  WSS v3 standalone topology  1 Search box (both roles)
Collaboration Environment Scenario  WSS v3 (cont) Search – core feature of WSS Contextual scopes – site and list No search across sites http://team team1 team2 spweb spweb Virtual Server team3 spweb spweb SPSites Content  Databases
Enterprise Portal Scenario MOSS 2007 iTech – growing company with growing needs iTech – needs a single point for information access for employees They now need to search over other repositories: Personnel records – People search Siebel sources – BDC search File Shares / Web sites – other external data
Enterprise Portal Scenario MOSS 2007 (cont) Upgrade from WSS to MOSS Search is a shared service through the SSP Central enterprise portal – http://itech Existing virtual server http://team associated with SSP – search box switches to use MOSS Base WSS search is not running – but search available to sites through shared search service Indexes – local and external content
Enterprise Portal Scenario MOSS 2007 (cont) Farm http://team team1 team2 spweb spweb Virtual Server team3 spweb spweb SPSites Content  Databases Search service People service … Shared Service  Provider External content http://itech HR Sales spweb spweb Virtual Server Finance spweb spweb SPSites Content  Databases
Enterprise Portal Scenario MOSS 2007 (cont) Topology with indexer and query servers Load balanced query servers Scale out and scale up – new SSP dimension Query Servers added for throughput Single indexer crawls logical SSP = local + external content
Internet Facing Portal Scenario - MOSS 2007 Internet facing site for customers – www.itech.com High traffic focused on content presentation Public access More publishing and less collaboration Controlled and tightly managed content
Internet Facing Portal Scenario - MOSS 2007 (cont) Two separate farms:  Production and test farms MOSS installation Controlled publishing of content to production farm from test farm Single shared service provider per farm Shared search service in each farm crawls content in each farm independently
Internet Facing Portal Scenario - MOSS 2007 (cont) www.itech.com Services Customers spweb spweb Virtual Server About  itech spweb spweb Content  Databases SPSites Search service People service --- SSP Production  farm http://itechtest Services Customers spweb spweb Virtual Server About  itech spweb spweb Content  Databases SPSites Search service People service --- SSP Test  Farm
Questions?
Module 4   Crawl and Query Processes
Agenda The Crawl Process Crawl Walkthrough Index Propagation The Query Process
Crawl Walkthrough When a crawl is requested . . . Indexer grabs the start address of content source Start address is prefixed with protocol associated with accessing the content Appropriate protocol handler invoked to traverse the content source During traversal, the handler will identify content nodes it needs to index
Crawl Walkthrough (cont) Protocol handler invokes IFilter associated with content node type IFilter identifies and extracts properties from content node Protocol handler supplements IFilter data with additional property information Data associated with content node is added to index Index “delta” propagates to search servers
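The crawl steps above can be sketched as a small pipeline. This is a toy model in Python, not the MOSS gatherer: the handler and filter functions, the sample addresses, and the dictionary used as an index are all hypothetical stand-ins for protocol handlers, IFilters, and the content index.

```python
import re
from urllib.parse import urlsplit


# Hypothetical protocol handlers: each traverses a start address and
# returns the content nodes (address, raw content) it found.
def file_handler(start_address):
    return [
        (start_address + "/plan.txt", "Project plan for search rollout"),
        (start_address + "/notes.txt", "Meeting notes on crawl schedule"),
    ]


def http_handler(start_address):
    return [(start_address + "/default.htm", "<p>Welcome to the portal</p>")]


PROTOCOL_HANDLERS = {"file": file_handler, "http": http_handler}


# Hypothetical filters keyed by content node type (file extension).
def text_filter(raw):
    return {"body": raw}


def html_filter(raw):
    # Crude tag stripping, good enough for the illustration.
    return {"body": re.sub(r"<[^>]+>", "", raw)}


FILTERS = {"txt": text_filter, "htm": html_filter}


def crawl(start_address, index):
    """Crawl one start address, add the results to index, return the delta."""
    # The protocol prefix of the start address selects the handler.
    handler = PROTOCOL_HANDLERS[urlsplit(start_address).scheme]
    delta = {}
    for address, raw in handler(start_address):
        node_type = address.rsplit(".", 1)[-1]
        properties = FILTERS[node_type](raw)   # filter extracts properties
        properties["path"] = address           # handler supplements them
        delta[address] = properties
    index.update(delta)                        # data is added to the index
    return delta                               # the "delta" to propagate


if __name__ == "__main__":
    index = {}
    print(crawl("file://fileserver/share", index))
    print(crawl("http://portal", index))
```

Returning the delta separately from the full index mirrors the last step on the slide, where only the changes are propagated to the search servers.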
Crawl Overview Diagram
Index Propagation Farm Sample Indexer Load Balancer Crawling Web  front  ends Index Propagation Query Servers User Requests
Index Propagation Propagation will occur only when the index and search components are on separate servers Continuous propagation Changes sent incrementally to all query servers associated with the index server. Merging of the index occurs on the query servers after propagation. Query servers continue serving queries while propagation is in progress
Index Propagation Index File Location Set in Office SharePoint Server Search Service settings Default location: C:\Program Files\Microsoft Office Servers\12.0\Data\Office Server\Applications Can be programmatically set using the stsadm command Index Server: "stsadm.exe -o editssp -indexlocation <index file path>" Query Server: "stsadm.exe -o osearch -propagationlocation <index file path>"
The Query Process Query Initiation and Results Presentation Query Execution Query Walkthrough
Query Initiation and Results Presentation Typically, provided by the WSS / MOSS WFE role, through OOB WebParts Could be an Office client or other custom application Responsible for constructing the “full” query and communicating with the query execution services
Query Execution Always provided by a server tagged with the Query role Consumes a query request Executes the request using the query index on the file system as well as the SSP search database (if MOSS) Handles OOB security trimming Returns requested properties of the result set to the caller
Query Walkthrough (cont) When a query is requested . . . Query terms collected Terms supplemented with contextual information Query formulated and issued through the Query OM or the Web Service Query is executed against the index and property store Query results returned Results are ordered according to their relevance to the query words Trimmed based on the user’s permissions.
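The query walkthrough above can be illustrated with a minimal sketch. It assumes a toy in-memory index and access control list, and it uses plain term frequency as a placeholder for relevance; the real ranking uses many more signals (anchor text, click distance, URL depth). The name `run_query` and the data shapes are hypothetical.

```python
def run_query(terms, index, acl, user):
    """Match all terms (implicit AND), rank the hits, then security-trim them.

    index maps a path to its body text; acl maps a path to the set of users
    allowed to read it. Both structures are illustrative only.
    """
    terms = [term.lower() for term in terms]
    hits = []
    for path, body in index.items():
        words = body.lower().split()
        if all(term in words for term in terms):
            # Placeholder relevance: total occurrences of the query terms.
            score = sum(words.count(term) for term in terms)
            hits.append((score, path))
    # Highest score first; ties are broken by path for a stable order.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    # Trim results the user has no permission to see.
    return [path for _, path in hits if user in acl.get(path, set())]


if __name__ == "__main__":
    index = {
        "a": "search search portal",
        "b": "search portal",
        "c": "search portal secret",
    }
    acl = {"a": {"ann", "bob"}, "b": {"ann"}, "c": {"bob"}}
    print(run_query(["search", "portal"], index, acl, "ann"))
    print(run_query(["search", "portal"], index, acl, "bob"))
```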
Questions?
Module 5 The Search End-User Experience
Module Agenda Introducing the Search End-User Experience Customizing Search People Search
Introducing the Search End-User Experience Complete Search experience Search is everywhere Tab-based user interface for easy navigation Easy to extend and customize
Introducing the End-User Search Experience Search Boxes Search Center Search Web Parts
Query Results Http: Get Http: Post Search Box XML Web Parts XSL Transformation Query OM  Advanced Search Hidden Object XML XML OOB Search UI/Custom Search Apps Query OM and Web Service
Search WebParts Nine  Standard Search Web Parts  Search Box Core Results High Confidence Statistics Pagination Action Links Matching Keywords and Best Bets Search Summary  (Did you mean?) Advanced Search
Result page infrastructure  Data shared through hidden object All Search Web Parts within the same page share the same hidden object Connection between Search Web Part is automatically done Need only to Drag and Drop (or select) a Search Web Part on the page Allows for rapid page design Hidden Object is internal and cannot be used by custom Web Parts All Search Web Parts derive from Data Form Web Part
Advanced Search Allows power searchers to exercise greater control on how they query A link from the search box Control what is displayed in the page by modifying the XML stored in the Web Part property “Properties” (e.g., can be used for displaying a new language check box) Not provided by WSS Search UI Implemented using the SQL syntax
Customizing the End User Experience Search in every company is different Different metadata might matter Documents: Title, Author, File location, size Records: Patient, Doctor, Healthcare provider, SSN… Multi- or single-languages How users meaningfully scope searches differs “All finance documents” “All patient records” “All published documents” Customize results to “pop” metadata that matters Customization offered at many levels Web Parts, XSLT/CSS, full Object Model…
Customization Choices Search Center Simple Site with few pages Default Page Result Page Advanced Search Page People Search Page Results Pages All Sites Results Page People Results Page Advanced Search Page and Web Part Show Scope Picker Scopes Property Picker Languages Search Web Parts
Customizing Search Adding Search Center Tabs Customizing Search Web Parts Customizing Search Results
People Search Bring people into the search experience Getting your job done means working with the right people Find subject matter experts based on their knowledge and contacts People list can come from AD, SQL, others Discovering Experts People are as important as data!
People Search People Results Customizing Results
Refine Your People Search Refine by Job Title Searches for the selected Job Title Refine by Department Searches for the selected Department “Show more options” link (6+) Listed in order of frequency
People Search Web Parts  Two OOB People Search Web Parts  People Search Box People Search Core Results Inherit from the Search Core Results Web Part Can be mixed on the same page with other Search Web Parts
People Results Search Web Parts Web Part properties such as: (similar to Core Search WP) Formatting (e.g., width of the search box) Number of Results per page Display “Alert Me”, “RSS” links Turn stemming on/off (default “off”) Remove Duplicate Results on/off (default “on”) Fixed keyword Query Select Columns Results formatting with XSL Social Distance (view)
Social Distance Colleagues  Suggested Colleague list members are mined from: Microsoft Windows Messenger (IM) Microsoft Office Outlook e-mail (Outlook Add-In)
Questions?
Module 6 Search Object Model
Workshop Agenda Scenarios for Extending Search Query Syntax Query Object Model Query Web Service
Topic:  Scenarios for Extending Search In this first section we will examine 2 scenarios for extending Search: Integrate with Search Center  Integrate Search into 3rd party sites and applications
Extending Search Integrate with MOSS Search Center Use cases: Use Search URL request parameters to add predefined saved searches Build custom search box Web parts for custom look and feel Build custom search core result Web parts for own look and feel and customized querying
Extending Search Integrate MOSS Search into 3rd Party Sites and Applications Build 3rd party user interface which leverages MOSS Search through Web Services Use cases Add MOSS Search features into existing Web sites Add MOSS Search into existing line of business or custom applications
Topic: Query Syntax In this section we will examine the three types of search syntax for building search queries supported by MOSS: Keyword URL SQL
Keyword Syntax Overview Used in standard Search Box New keyword syntax Simple and easy to use Consistent property:value syntax across Office, Windows and Live search Example: gallery hinges -brass site:http://supportdesk scope:Products
Keyword Syntax Include/Exclude Built-in support for using include and exclude terms Look for term bike, but not related to fitness: bike -fitness Look for phrase “SharePoint Services” but not the term v2: +"SharePoint Services" -v2 Include is implied when there is no (+/-) prefix
Keyword Syntax Boolean Search Narrowing results by default Searches using “AND” between query terms Does not recognize logical operators like “OR”, “NEAR” as keywords – it treats them all as search terms Does not support complex queries like (A AND B) OR (C AND D) Complex Boolean searches are supported by the engine and the SQL syntax
Keyword Syntax Property restrictions Supports property:value as part of the keyword string Can use any managed property Supports the use of phrases Can be used for exact matches when the property value includes spaces Without quotes, prefix matching is done. Supports word stemming
Keyword Syntax No wildcard support Search box does not do wildcard searching. The following is not recognized as a wildcard search: ShareP* Use Advanced Search property restrictions to look for parts of a word Requires new search results Web parts Wildcards are supported by the engine and the SQL query syntax
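To make the keyword rules concrete (implied include, `-` for exclude, quoted phrases, and property:value restrictions), here is a minimal tokenizer sketch in Python. It is an illustration of the syntax only, not the parser MOSS uses; `parse_keyword_query` and its result layout are hypothetical, and a sign in front of a property restriction is ignored for simplicity.

```python
import re

# One token: optional +/- sign, optional "property:" prefix, then either a
# quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'([+-]?)(?:(\w+):)?(?:"([^"]*)"|(\S+))')


def parse_keyword_query(query):
    """Split a keyword query into include terms, exclude terms, properties."""
    parsed = {"include": [], "exclude": [], "properties": {}}
    for sign, prop, phrase, word in TOKEN.findall(query):
        value = phrase if phrase else word
        if prop:
            parsed["properties"][prop] = value
        elif sign == "-":
            parsed["exclude"].append(value)
        else:
            # No prefix (or an explicit +) means the term is included.
            parsed["include"].append(value)
    return parsed


if __name__ == "__main__":
    print(parse_keyword_query("bike -fitness"))
    print(parse_keyword_query('+"SharePoint Services" -v2'))
    print(parse_keyword_query(
        "gallery hinges -brass site:http://supportdesk scope:Products"))
```

Note that `OR` or `NEAR` typed into such a query would simply land in the include list, which matches the slide's point that the keyword syntax treats them as ordinary search terms.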
URL Syntax Use Case Launching a URL in custom application Save Searches Custom search boxes Request Parameters Content: results.aspx?k=fish Scopes: results.aspx?k=fish&s=BBC Sort:  results.aspx?v=date  results.aspx?v=relevance Page: results.aspx?start=21
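As a small sketch of the URL syntax, the helper below assembles a results page URL from the request parameters listed on the slide (`k`, `s`, `v`, `start`). The function name `search_url` is hypothetical, and the relative page name `results.aspx` is taken from the slide's examples; a real deployment would prefix the Search Center address.

```python
from urllib.parse import urlencode


def search_url(keywords, scope=None, sort=None, start=None,
               page="results.aspx"):
    """Build a search results URL from the slide's request parameters."""
    params = {"k": keywords}          # k: the keyword query
    if scope is not None:
        params["s"] = scope           # s: search scope
    if sort is not None:
        params["v"] = sort            # v: "date" or "relevance"
    if start is not None:
        params["start"] = start       # start: index of the first result
    return page + "?" + urlencode(params)


if __name__ == "__main__":
    print(search_url("fish"))
    print(search_url("fish", scope="BBC"))
    print(search_url("fish", sort="date", start=21))
```

A saved search or a custom search box can then be as simple as a stored link produced this way.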
SQL Syntax Overview SQL Syntax offers: Consistent SQL across enterprise and desktop Complex queries and Boolean searches Comparison operators Arbitrary groupings for AND, OR, NOT Freetext() CONTAINS() LIKE ORDER BY ASC | DESC Custom SQL query statements Wildcard support
SQL Syntax Complex Boolean Searches Write complex Boolean searches using AND, OR, NOT
SQL Syntax FREETEXT predicate Returns documents for which the following is true: Document contains all the search terms in at least one of the columns specified One of the search terms must also be found in the Contents column Use only one FREETEXT predicate for optimal ranking The FREETEXT predicate also supports (+/-)
SQL Syntax Wildcard Support Get wildcard support using the CONTAINS predicate: Wildcard: Words or phrases with an asterisk (*) added to the end. WHERE CONTAINS('"compu*" NEAR "soft*"')
SQL Syntax Removed from SQL syntax Removed in MOSS 2007 Query property weights UNION ALL MATCHES SELECT * COALESCE TABLE
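For illustration, the helper below assembles a full SQL-syntax query string around the CONTAINS wildcard clause shown above. Only the `WHERE CONTAINS(...)` part comes from the slide; the `SELECT ... FROM SCOPE()` wrapper, the column names, and the function name `contains_query` are assumptions made for this sketch and should be checked against the product documentation.

```python
def contains_query(columns, first_prefix, second_prefix):
    """Build a query string with two prefix wildcards joined by NEAR."""
    for prefix in (first_prefix, second_prefix):
        # Keep the sketch safe: only plain word prefixes, no quotes.
        if not prefix.isalnum():
            raise ValueError("prefix must be alphanumeric: %r" % prefix)
    select_list = ", ".join(columns)
    # An asterisk at the end of a quoted word requests prefix matching.
    condition = '"{0}*" NEAR "{1}*"'.format(first_prefix, second_prefix)
    # FROM SCOPE() is an assumption about the surrounding statement.
    return "SELECT {0} FROM SCOPE() WHERE CONTAINS('{1}')".format(
        select_list, condition)


if __name__ == "__main__":
    print(contains_query(["Title", "Path"], "compu", "soft"))
```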
Topic:  Query Object Model In this section we will examine: The Query Object Model The Query Object Path The Query Web Service
Query Object Model New object model Use the query object model to: Build custom search user interface, like Web parts or ASPX applications Gain direct access to query and results properties Invoke custom queries 2 types of query syntaxes: Keyword SQL
Query Object Model Features Managed code API Single request – multiple results Result Types Relevant results High confidence results Special terms Definitions Optional parameters # of Sentences in Summary Implicit - AND/OR Number of results Ignore noise words Enable stemming Language
Query Object Path Query OM Input Output SQL Query Optional Parameters Query Engine ResultTableCollection ResultTable: IDataReader Relevant results High confidence Special terms Definitions Site UI Custom Client Local Remote Keyword Query Execute()
Query Web Service Use and Methods Use Case Leverage Search in remote sites or application  Office Research Pane Methods Query  QueryEx  GetSearchMetaData  Registration Status
Query Web Service Search Center Features Standard Search Center features not built into the Web service Hit highlighting Search usage reporting Search logging Search statistics Result type icons Using Query vs. QueryEx  Implementing hit highlighting
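Since hit highlighting is not built into the Web service, a client has to add it. The following is one possible client-side approach, sketched in Python: wrap whole-word matches of the query terms in `<b>` tags and HTML-escape everything else. The function name and the choice of `<b>` markup are assumptions, not the mechanism Search Center itself uses.

```python
import html
import re


def highlight(summary, terms):
    """Return HTML for summary with whole-word term matches in <b> tags."""
    terms = [term for term in terms if term]
    if not terms:
        return html.escape(summary)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    # With one capturing group, re.split alternates plain text (even
    # positions) and matched terms (odd positions).
    parts = pattern.split(summary)
    out = []
    for position, part in enumerate(parts):
        text = html.escape(part)
        out.append("<b>" + text + "</b>" if position % 2 else text)
    return "".join(out)


if __name__ == "__main__":
    print(highlight("SharePoint search indexes SharePoint sites",
                    ["sharepoint"]))
```

Escaping each piece after splitting, rather than escaping the whole summary first, avoids accidentally highlighting text inside an HTML entity.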
Questions?
Module 7 Administration
Module Agenda Administrative Architecture Farm Administration SSP Administration Site Collection Administration Site Administration Search Usage Reporting Administrative Tools Lab: Adding Content Sources Lab: Search Schema
Administrative Architecture Shared Services Business unit IT Service-level  configuration E.g. Create search content source, Search Scopes Central Administration IT Administrators Farm-level  Status Resource management One per farm E.g. Create new site Site Settings Business site owner Site specific  configuration and tasks e.g. Create new list Three Tier Administration Web-based Role- and Task-delineated Controlled Delegation Secure Isolation
Farm Management (IT Administrators)
SharePoint 3.0 Central Administration Common Tasks Manage Topology and Services Servers in Farm Services in Server Security Configuration Update Farm Administrator’s Group Backup and Restore Index Search Database Global Configuration Timer Job Definitions Timer Job Status Manage Search Service
Using Central Admin
Operations – Topology and Services Servers in Farm / Services on Server Query Server(s) Office SharePoint Server Search Service Stop / Start Office SharePoint Services Help Search Service Stop / Start Index Server(s) Office SharePoint Server Search Service Stop / Start
Operations – Backup and Restore Perform a backup Restore from backup
Operations – Global Configuration Timer Job Definitions SharePoint Services Search Refresh Disable / Enable  (Change and update WSS search configuration) Indexing Schedule Manager on MOSS Disable / Enable Timer Job Status Succeeded / Failed
Search Application Management Manage Search Service Farm-level Search settings Proxy Server settings Query and Index Servers Server Listing and their Search service Shared Service Providers with Search enabled SSP name listing Crawler Impact Rules
Crawler Impact Rules Configured through Central Administration Allows “throttling” of the indexer to reduce impact of a crawl on a particular server Supports wildcards Used in conjunction with crawl schedules
Crawler Impact Rules (cont)
Use . . . | To . . .
* as the site name | Apply the rule to all sites
*.* as the site name | Apply the rule to sites with a dot in their name
*.site_name.com as the site name | Apply the rule to all sites in the site_name.com domain
*.top-level_domain_name (such as *.com or *.net) as the site name | Apply the rule to all sites that end with a specific top-level domain name
? | Replace any single character in a rule
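The wildcard behavior described above can be tried out with Python's standard `fnmatch` module, whose `*` and `?` have the same meaning as in the rules. This is only a sketch of the matching semantics; the function names and sample site names are hypothetical, and case-insensitive comparison is an assumption.

```python
from fnmatch import fnmatchcase


def rule_applies(rule_pattern, site_name):
    """Return True if a crawler impact rule pattern covers the site name."""
    # Host names are compared case-insensitively (an assumption here).
    return fnmatchcase(site_name.lower(), rule_pattern.lower())


def matching_rules(rule_patterns, site_name):
    """List every rule pattern that covers the given site name."""
    return [p for p in rule_patterns if rule_applies(p, site_name)]


if __name__ == "__main__":
    rules = ["*", "*.*", "*.contoso.com", "*.net", "moss-0?"]
    for site in ("intranet", "www.contoso.com", "moss-01"):
        print(site, "->", matching_rules(rules, site))
```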
Shared Services Provider (SSP) Management (SSP Administrators) (Content Oriented Administration)
Common Tasks Configure Search Settings  Content Sources Crawl Settings Authoritative Pages Settings Scopes
Content Sources Represent an arbitrary container of information Require at least one start address, although multiple start addresses can be provided  Start address cannot be reused Requires a registered protocol handler Five out-of-box content source types are available, mapping to the five out-of-box protocol handlers
SharePoint Content Source Includes SPS 2003, MOSS 2007, WSS v2, and WSS v3 sites Can limit crawl to only sites specified in start address or all sites found below one or more provided hostnames Crawler will use target site’s APIs to include security information around content in the index For SPS 2003 content sources, crawler account requires “change” rights, which necessitates the crawler having administrator rights Examples: sps3://moss-01/ or http://moss-01/sitecollection/ Content sources decoupled from scopes
Web Site Content Source Any content source available over HTTP or HTTPS If a SharePoint URL is provided, the crawler will detect this and index it as though it were a SharePoint content source (this can be overridden with crawl rules) Page depth and server hops can be controlled
Web Site Content Source (cont) Security information around content is not included in index Dynamic personalization will result in the index being populated with what the crawler is presented with Example: http://website or http://www.somesite.com
File Shares Content Source Any content visible over a Windows server shared folder Some non-Windows shares *may* be crawled, if that share can be presented as a Windows share (for instance, Samba with Linux, Services for Unix) Start address can be the share root or subfolders beneath it Security information is picked up by the gatherer
Exchange Public Folders Content Source Allows the indexer to crawl a public folder that exists on Exchange Requires Outlook Web Access, as crawl is done over HTTP Includes messages, conversations, and other collaborative content URL presented in the search results will point to a deep link within OWA Example:  http://owa/public/folder
Business Data Content Source Allows the indexer to crawl metadata exposed through the Business Data Catalog Can elect to include all Business Data Applications or a selected number of them
Lotus Notes Content Source
Crawling Schedules Allow administrator to indicate the frequency at which a content source will be re-crawled (daily, weekly, monthly) Can indicate what time the content source should be crawled Schedule should be driven by: Anticipated change at the content source (is this static content or content that is constantly changing) Business expectations around when content changes should be reflected in the index Schedule can always be modified
Maximum File Size Default file size limit is 16MB To change the limit, add a new DWORD entry named MaxDownloadSize to the registry at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager Also increase the timeout value to avoid timeout exceptions Change the timeout value using the Manage Search Service page of Central Administration
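As an illustration of the registry change, the following Python sketch generates a .reg fragment for the MaxDownloadSize entry. It assumes the value is expressed in megabytes, matching the 16MB default quoted on the slide; verify the unit against the product documentation before importing anything. The function name is hypothetical.

```python
# Registry key named on the slide.
GATHERING_MANAGER_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server"
    r"\12.0\Search\Global\Gathering Manager"
)


def max_download_size_reg(limit_mb: int) -> str:
    """Build .reg file text that sets MaxDownloadSize to limit_mb.

    Assumption: the DWORD is interpreted as a size in megabytes.
    """
    if limit_mb <= 0:
        raise ValueError("limit_mb must be a positive integer")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{GATHERING_MANAGER_KEY}]",
        # .reg files store DWORD values as eight hexadecimal digits.
        f'"MaxDownloadSize"=dword:{limit_mb:08x}',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(max_download_size_reg(32))
```

Generating the fragment rather than writing to the registry directly keeps the change reviewable and lets it be applied on the index server by an administrator.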
Crawl Rules Define exceptions to the “typical” crawl process Addresses can be pattern matched for special treatment Support exclusion Support altering the authentication mechanism Examples of Crawl Rules Testing of Crawl Rules
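The idea of pattern-matched exceptions can be sketched in Python as follows. The rule ordering (first match wins), the default of including unmatched addresses, and the account name are assumptions made for the example; the slide does not specify them.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple


@dataclass
class CrawlRule:
    pattern: str                   # address pattern, '*' as wildcard
    include: bool                  # False means "exclude from the crawl"
    account: Optional[str] = None  # alternate crawl account, if any


def evaluate(url: str, rules: List[CrawlRule]) -> Tuple[bool, Optional[str]]:
    """Return (crawl?, account override) for a URL.

    Assumption: rules are checked in order and the first match decides.
    Unmatched URLs are crawled with the default content access account,
    represented here by None.
    """
    for rule in rules:
        if fnmatchcase(url.lower(), rule.pattern.lower()):
            return rule.include, (rule.account if rule.include else None)
    return True, None


if __name__ == "__main__":
    rules = [
        CrawlRule("http://moss-01/private/*", include=False),
        # Hypothetical account name, for illustration only.
        CrawlRule("http://moss-01/*", include=True, account="LITWARE\\crawler"),
    ]
    for url in ("http://moss-01/private/doc.aspx",
                "http://moss-01/sites/hr",
                "http://website/"):
        print(url, evaluate(url, rules))
```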
Search Result Removal (From Live Index) Typically used when someone discovers something in the index that shouldn’t be there Permits administrator to immediately remove that content from the index Crawl rule automatically created to prevent that content from being indexed in the future Restoring that content requires dropping the crawl rule and re-indexing
Default Content Access Account Account used by default when crawling content Can be overridden in the Crawl Rules Minimum crawler permission is “Full Read” (still provides the same security trimming functionality) Automatically configured for new sites To avoid crawling unpublished versions of documents, do not use an Administrator account
Metadata Property Mappings
Server Name Mapping Override how MOSS displays search results Hide file path Sample: “file://moss/HOL” to “http://moss.litwareinc.com”
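The effect of a server name mapping on a displayed result URL can be sketched as a simple prefix rewrite. This Python example only illustrates the idea using the sample mapping from the slide; the function name and the document path are hypothetical.

```python
from typing import Dict


def apply_server_name_mappings(url: str, mappings: Dict[str, str]) -> str:
    """Rewrite the crawled address prefix to the address shown in results."""
    for crawled_prefix, displayed_prefix in mappings.items():
        # Compare case-insensitively, but keep the rest of the URL untouched.
        if url.lower().startswith(crawled_prefix.lower()):
            return displayed_prefix + url[len(crawled_prefix):]
    return url  # no mapping applies, so show the crawled address


if __name__ == "__main__":
    mappings = {"file://moss/HOL": "http://moss.litwareinc.com"}
    print(apply_server_name_mappings("file://moss/HOL/specs/plan.docx", mappings))
```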
Search-based Alerts Can be Activated / Deactivated Deactivated after a reset of crawled content Users can subscribe to an alert on a search query Alert is triggered if there are new or changed items that satisfy the search query An item is considered changed if its content or metadata has changed Timer service is used to issue all alert notifications (See User Alerts in Site Settings) Frequency can be set to Daily / Weekly “Alert Me” and RSS links can be added/removed using their Web Part property
Reset Crawled Content Powerful action! Will delete the content index! Search Results will no longer be available on the farm until the index has been rebuilt! Search alerts are deactivated unless the administrator unchecks the check box. Alerts should be reactivated after a full crawl has been performed.
Specify Authoritative Pages Helps prioritize Search Results: results that are linked to the authoritative pages benefit from a boost in rank. Most authoritative Second-level authoritative Third-level authoritative Sites to demote
Scopes Scopes are filters applied to search results to narrow the results of a search query Types of Scopes Scope Rules and Behaviors Single-rule Scopes Multi-rule Scopes
Site Collection Management (Site Collection Administrators)  (Application Administrators)
Site Collection Administration Options Common Tasks Search Settings Search Scopes Search Keywords
Search Settings Two Options Use the Search Center and custom scopes in the dropdown The way to change standard Search Center URL for search boxes Do not use the Search Center – no custom scopes
Site Level Scopes Site Level Scopes display all scopes associated with a Site Collection Display Scopes are a site-level feature that is purely UI Administrator – Combine multiple scopes into one selectable item Visitors – UI Search dropdown box (or checked boxes for the Advanced Search page) populated with the scopes included in the display group
Keywords and Best Bets Prominently present editorially selected search results Keywords: Glossary of important terms within your organization Best Bets are associated with particular search keywords Not available across site collections
Search Settings for Fields - NoCrawl Set a NoCrawl  attribute on one or more columns within the site collection Column content will not be indexed!  Associated with Site Columns (Content Types)
Search Visibility Site level Allow or deny the site to appear in search results. If denied, the site will not be indexed. Control ASPX pages within the site for visibility. Will take into consideration item’s specific permissions. List Level Allow or deny the list to appear in search results. If denied, the list will not be indexed. Document Libraries and Folder Level Allow or deny the document library or folder to appear in search results. If denied, the Document Library (or folder) will not be indexed.
Search Usage Reports
Benefits of Search Queries and Results Reporting Allows Site and SSP Administrators to: Have a visual look at end-user queries through charts and graphs Quickly quantify the success or failure of the optimizations they can make to crawlers and indexes Export data to Microsoft Excel to further analyze and mine
To Improve the Overall Search Experience One Must… Best way to improve search is to  understand visitors’ current search usage! Understand what visitors are searching for Products, features, services, general Information about the company, etc. Understand if their search was successful Have they clicked on one of the results? Were there any results – does content exist? Were they offered suggestions specifically associated with their query? Have they misspelled the words within their query?
Reporting Tools Two sets of reports Search Query Reports Search Results Reports Two different levels of reports Shared Service Provider (SSP) Site Collection Enabled by default Enabled within the SSP Do not log queries from the Search Web Service and from the custom Web Parts administrators Note: Data Stored in the SSP database
Reporting Tools At the SSP level For enterprise content oriented administrators
Reporting Tools At the Site Collection level For Site Collection administrators
Search Query Reporting – SSP Tracks Queries that users issued for  all sites managed by this SSP Five Different Reports Queries Over Previous 30 Days Queries Over Previous 12 Months Top Query Origin Site Collection Over Previous 30 Days* Query for Scopes Over Previous 30 Days Top Queries Over Previous 30 Days Also has Tabular View for most reports * Specific to SSP
Search Query Reporting – Site Collection Tracks Queries issued  within this Site Collection Four Different Reports Queries Over Previous 30 Days Queries Over Previous 12 Months Top Queries Over Previous 30 Days Query for Scopes Over Previous 30 Days Also has Tabular View for most reports
Search Results Reporting – SSP Tracks Result Click Selections by users within the sites managed by this SSP Five Different Reports Search Results Top Destination Pages Queries with Zero Results Most Clicked Best Bets Queries With Zero Best Bets Queries With Low Click-through
Search Results Reporting – Site Collection Tracks Result Click Selections by users for this Site Collection Five Different Reports Search Results Top Destination Pages Queries with Zero Results Most Clicked Best Bets (Editorial Results) Queries With Zero Best Bets Queries With Low Click-through Same list reports as SSP but, for Site Collection
Exporting Results Export data for extended reporting in Excel and/or Excel Services
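Once the report data is exported, further analysis can be scripted. The Python sketch below finds queries with zero results and queries with low click-through in a CSV export. The column names (query, result_count, clicked), the sample rows, and the 25% threshold are all hypothetical; the actual export layout is not shown in the slides.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical export: one row per issued query.
SAMPLE_EXPORT = """query,result_count,clicked
vacation policy,12,1
vacation policy,12,0
expence report,0,0
org chart,40,0
org chart,40,0
"""


def summarize(rows: Iterable[Dict[str, str]],
              low_ctr: float = 0.25) -> Tuple[List[str], List[str]]:
    """Return (zero-result queries, low click-through queries)."""
    issued = defaultdict(int)
    clicked = defaultdict(int)
    zero_results = set()
    for row in rows:
        query = row["query"]
        issued[query] += 1
        clicked[query] += int(row["clicked"])
        if int(row["result_count"]) == 0:
            zero_results.add(query)
    # A query counts as low click-through when it returned results but
    # fewer than low_ctr of its executions led to a click.
    low = sorted(q for q in issued
                 if q not in zero_results and clicked[q] / issued[q] < low_ctr)
    return sorted(zero_results), low


if __name__ == "__main__":
    reader = csv.DictReader(io.StringIO(SAMPLE_EXPORT))
    zero, low = summarize(reader)
    print("Zero results:", zero)
    print("Low click-through:", low)
```

Zero-result queries often point to missing content or misspellings, while low click-through queries are candidates for keywords and Best Bets.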
Questions?
Module 8  Performance, Scalability, and  Capacity Planning
Module Agenda Introduction Search Capacity Planning in SPS 2003 MOSS 2007 Search Capacity Planning Topology  Querying Indexing Test Environment Real World Experiences  Microsoft Intranet Microsoft Technology Center Proof of Concept (PoC)
MOSS 2007 Search Capacity Planning Improvement highlights Topology restrictions removed Indexing limitations improved Continuous propagation
Topology Deployment options Collapse index and query services on the same server Enable index service on one server and query service on one or more different servers  For both options you can have only one index server  Scale up versus scaling out
Topology (cont) Topology restrictions from v2 removed Can mix indexer/search roles Service can be managed after initial setup or later on Use mixed x86 and x64 hardware architectures IFilter, Protocol Handler limitations Index server is very CPU intensive Plan for availability requirements
Topology (cont) Topology Scaling Recommendations (for Search): Query servers: 8 per farm Front end servers: 8 per farm Index servers: 4 per farm
MOSS 2007 Search Topology Indexer Load Balancer Propagation of indexes Content databases External content Web  front  ends Query servers User Requests Query servers separated from indexer
Querying Performance parameters Scaling factors
Querying – Performance Parameters Network performance always affects the query response time that end users experience: When querying the Index Catalog, a front-end server always goes to the SQL database to retrieve information on search results and to perform Security Trimming. When querying the Property Store, the Query server is not involved, because the Property Store now resides in the SQL Search database.
Querying – Performance Parameters
Querying – Performance Parameters Query server memory: The more memory is available, the less the Search service will have to access the hard disk to satisfy a given query. Ideally, enough memory should be installed on the query servers to accommodate the entire index. Query server disk speed: RAID 10 is recommended.
Querying – Scaling Factors Processor architecture Use 64-bit servers Planning for performance: separate query from front-end Dedicated processor time Plenty of available RAM for caching Planning for availability: add more than one query server in your farm This will require a dedicated machine for index, as described before Tested maximum of eight query servers
Indexing Planning Performance optimization Storage Limitations Scaling
Indexing Planning Customer environment Number of users Network and connectivity Dispersed locations Expected workloads Pilot Rollout plan Estimate indexing window
Indexing Planning (cont) Corpus definition: A corpus is defined as the sum of all content that is being indexed. This includes all valid content sources, like Web pages, items, documents, BDC, and any metadata and security information associated with this content.
Indexing Planning (cont) For each content source estimate: Number of items Storage used  Types of items Security Latency requirements Connectivity Estimate indexing window Expected yearly growth
Indexing -  Performance   Optimization Use dedicated front-end for best indexing performance No other services allowed on that server Adjust the  indexing performance level  Use Maximum for best performance Use Crawler Impact Rules Carefully test impact Continuous propagation Average time is 3 to 27 seconds WSS Change log for incremental crawls
Indexing - Performance Optimization Index server CPU: The more processors are available, the faster the crawl Index server memory: The greater the memory capacity, the more documents the crawler can process in parallel More available memory therefore improves crawl speed Index server disk speed: RAID 10 with 2 ms access time and greater than 150 MB/sec write throughput
Index Storage Planning index storage as ratio of corpus Sizing depends on content in corpus Type of content source Document formats Level of metadata and security information Plan for expected growth rates
Index Storage (cont) Index / Query Server disk space requirements: Index catalog size is normally in the range of 5% through 12% of corpus size Recommended initial disk space is a minimum of 2.5 times the index catalog size That means: recommended initial disk space is at least 30% of indexed corpus size
Index Storage (cont) Search database Contains metadata, ACLs, hit highlighting, crawl history, and usage reports Estimated 2K per crawled document Sizing depends on corpus content Requires more space than the index catalog Recommended initial disk space is a minimum of 4 times the index catalog size
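The sizing ratios quoted for the index catalog and the search database can be turned into a quick calculation. This Python sketch is a planning aid under the stated ratios only; the function name is hypothetical, and the 12% default is simply the upper end of the range given above.

```python
from typing import Dict


def estimate_search_storage_gb(corpus_gb: float,
                               catalog_ratio: float = 0.12) -> Dict[str, float]:
    """Estimate initial disk needs from the corpus size.

    catalog_ratio: index catalog size as a fraction of the corpus
                   (the slides quote 5% through 12%).
    """
    if corpus_gb <= 0 or catalog_ratio <= 0:
        raise ValueError("corpus_gb and catalog_ratio must be positive")
    catalog_gb = corpus_gb * catalog_ratio
    return {
        "index_catalog_gb": catalog_gb,
        # Index / query server disk: at least 2.5 times the catalog.
        "index_disk_gb": 2.5 * catalog_gb,
        # Search database disk: at least 4 times the catalog.
        "search_db_disk_gb": 4 * catalog_gb,
    }


if __name__ == "__main__":
    for name, value in estimate_search_storage_gb(1000).items():
        print(f"{name}: {value:.0f} GB")
```

With the 12% ratio, the index disk estimate works out to 30% of the corpus, which is where the "at least 30%" rule of thumb comes from.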
Index Capacity Limitations Supported limit for a single index server is 50 million documents In this scenario we recommend only one index server per farm One index server per SSP Multiple SSPs can use the same indexer All MOSS 2007 for Search editions are limited to one SSP per farm MOSS 2007 is limited to 20 SSPs per farm MOSS 2007 for Search Standard Edition is limited to 500,000 documents per farm
Index Scaling First scale up (recommended) Optimal ranking and user experience Best manageability Scale up system resources Use x64 architecture Add more CPUs to increase performance Plan for minimum 4GB of memory RAID 10 is recommended for optimal disk speeds
Index Scaling Scale out Add multiple SSPs each crawling unique parts of the corpus Complete isolation between SSPs Querying across multiple SSPs to get a single relevant results set is not possible Tested maximum of four index servers per farm Recommended limit per farm across all indexes is 50 million items For scenarios higher than 50 million items, add more farms
Test Environment Establish a starting point topology Use monitoring to establish actual performance and capacity data Use Performance Monitor to collect processor, memory, and disk information for each server Look for resource bottlenecks Scale up available resources Scale out server roles
Real World Experiences Microsoft Intranet Microsoft Technology Center PoC
Microsoft Intranet Environment Estimate of indexed content  Around 12 TB in SharePoint Content Databases (mix of 2003 / 2007), unknown size outside of this environment Total size of the index SSP search database ~282GB SSP profiles database ~51GB Index size on disk ~156GB Total number of objects  23 million objects 30 content sources, 6 with daily crawls Typical 'real world' query response time from this implementation  ~2 seconds, although the product group is looking into ways we can optimize this for our environment
Microsoft Technology Center PoC Objectives Indexing large numbers of secure files on file shares Verify MOSS 2007 search architecture Test and recommend capacity planning and scale
Topology Indexed corpus Search db Index catalog Propagated catalog 1TB 23GB 25GB
Results For the biggest test run, which included indexing 2.4 million secure files, here are the key metrics: Full first-time indexing of the entire corpus took 23.1 hours. Incremental crawls, where 4.7% of the corpus was updated, took 3.7 hours. The index was 2.4% of the corpus size, and the search database was 2.1%. Averaged over the full corpus crawl, the indexer processed 1642 files/minute.
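A measured throughput figure like the one above can feed a rough estimate of the indexing window for another corpus. The Python sketch below assumes throughput stays constant, which will not hold across different hardware, content types, or network conditions; treat the output as a starting point for testing only.

```python
def estimated_crawl_hours(item_count: int, items_per_minute: float) -> float:
    """Estimate full-crawl duration in hours at a constant throughput."""
    if item_count < 0 or items_per_minute <= 0:
        raise ValueError("item_count must be >= 0 and items_per_minute > 0")
    return item_count / items_per_minute / 60


if __name__ == "__main__":
    # 1642 files/minute is the average reported for the PoC full crawl.
    hours = estimated_crawl_hours(5_000_000, 1642)
    print(f"Estimated full crawl of 5 million items: {hours:.1f} hours")
```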
Results (cont)
Summary of Known Limits and Restrictions Tested recommendation of 50 million items per farm Hard limits: 1 indexer per SSP 20 indexes per MOSS 2007 farm 1 index per MOSS 2007 for Search farm 500 content sources per SSP 500 start addresses per content source 500,000 documents limit for MOSS 2007 for Search Standard Edition
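The limits summarized above lend themselves to a simple checklist when reviewing a planned deployment. This Python sketch encodes them as constants; the class, field names, and edition labels are hypothetical, and it treats the number of SSPs as the number of indexes because each SSP has one index.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchPlan:
    edition: str                     # "moss", "search_standard" or "search_enterprise"
    items_per_farm: int
    ssp_count: int                   # one index per SSP
    content_sources_per_ssp: int
    start_addresses_per_source: int


def check_limits(plan: SearchPlan) -> List[str]:
    """Return a list of limit violations; empty means the plan fits."""
    problems = []
    if plan.items_per_farm > 50_000_000:
        problems.append("more than the tested 50 million items per farm")
    max_indexes = 20 if plan.edition == "moss" else 1
    if plan.ssp_count > max_indexes:
        problems.append(f"more than {max_indexes} index(es) for this edition")
    if plan.content_sources_per_ssp > 500:
        problems.append("more than 500 content sources per SSP")
    if plan.start_addresses_per_source > 500:
        problems.append("more than 500 start addresses per content source")
    if plan.edition == "search_standard" and plan.items_per_farm > 500_000:
        problems.append("more than 500,000 documents on Standard Edition")
    return problems


if __name__ == "__main__":
    plan = SearchPlan("search_standard", 600_000, 2, 30, 10)
    for problem in check_limits(plan):
        print("Limit exceeded:", problem)
```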
Capacity Planning References Planning for performance and capacity: http://technet2.microsoft.com/Office/en-us/library/eb2493e8-e498-462a-ab5d-1b779529dc471033.mspx Plan for software boundaries: http://technet2.microsoft.com/Office/en-us/library/6a13cd9f-4b44-40d6-85aa-c70a8e5c34fe1033.mspx Estimate performance and capacity requirements for search environments: http://technet2.microsoft.com/Office/en-us/library/5465aa2b-aec3-4b87-bce0-8601ff20615e1033.mspx
Questions?
 

Microsoft Enterprise Seach using SharePoint

  • 1. Microsoft Office SharePoint Server 2007 Search Workshop 游家德 Jade Yu 敦群數位科技股份有限公司
  • 2. Microsoft Office SharePoint Server 2007 Enterprise Search Enterprise Search Advanced Training – Building and Implementing Enterprise Search Solutions
  • 3. Workshop Agenda Day 1 – Search Overview Microsoft Search Landscape MOSS 2007 Walkthrough Architecture and Deployment Scenarios Crawl and Query Processes Search Object Model Day 2 – Customization and Management Search Object Model Business Data Catalog (BDC) Search Extensibility and Integration Administration Capacity Planning
  • 4. Assumptions Some knowledge and experience with Search functionality Knowledge of the Business Data Catalog in general (new in Office 2007 System) Office 2007 System Content Creation/Contribution experience Knowledge of Web site creation and management in general Knowledge of MS platform (Windows 2003 Server, ADS, IIS, SQL 2005 & Office Clients) Knowledge of ASP.NET 2.0 and XSLT
  • 5. Workshop Objectives Explain how to use the Office 2007 Search functionality Interpret the Office 2007 System Search Terminology Describe the rich feature set of Office 2007 System Search - Servers and Clients Describe how to use the platform well enough to use its APIs to extend the products Explain how Office 2007 System Search will solve enterprise business requirements
  • 6. Module 1 Enterprise Search Overview
  • 7. Module Agenda Microsoft Enterprise Search Client-side Search Platform Client-side Comparison Server-side Search Platform Key Differences between WSS and MOSS MOSS 2007 for Search Key Features MOSS 2007 for Search and MOSS 2007 Comparison
  • 8. Microsoft Enterprise Search Server-Side Search Platform Line-of-business systems and structured data sources Unstructured information People, expertise External Web sites E-mail messages, appointments, and instant messaging Client-Side Search Platform Documents, programs, and media
  • 9. Client-Side Search Platform Windows Desktop Search (WDS) for XP and Windows Server You must install an additional program for Search Vista – Integrated Desktop Search Integration in the Operating System Ability to search nearly anywhere Virtual Folders
  • 10. Client-Side Comparison Microsoft ® Windows ® Desktop Search Microsoft ® Windows ® Vista Rich, actionable interface X X Integration with Microsoft Outlook X X Polite indexing (Pauses when computer is in use) X X Live icons & document previews X X Advanced Search integrated into the Operating System X Save searches to search folders X Instant Search X (on taskbar) X (from start menu)
  • 11. Server-Side Search Platforms Windows SharePoint Services v3 “ Basic” index / search capabilities to support WSS collaboration and document management Microsoft Office SharePoint Server (MOSS) 2007 Enterprise search and indexing features “unlocked” Several SKUs to support different scenarios and customer needs
  • 12. Key Differences Between WSS and MOSS WSS v3 Microsoft Office SharePoint Server (MOSS) Can Index Local SharePoint content XSharePoint sites / collections, Exchange Public Folders, File Shares, Web Content, Lotus Notes, LOB Apps, and others . . . Rich, relevant results X Alerts, RSS, Did you mean, Duplicate collapsing X Scopes, Managed Properties X Best Bets, Result Removal, Query Reports X Search Center Tabs X BDC Search X API’s provided   Query Query + Admin
  • 13. MOSS 2007 for Search A Search-only solution for intranets and public-facing Web (Internet) sites Two versions Standard Edition limited to 500,000 docs Enterprise Edition with unlimited docs Includes Out of the box search for file shares, Web sites, SharePoint sites, Exchange Public Folders, Lotus Notes databases Extensibility to 3rd party document repositories and file types
  • 14. MOSS 2007 and MOSS FS Usage Scenarios Description Scenario MOSS 2007 An information management solution that includes enterprise search integrated with portal, collaboration, web content management, ECM, forms, and BI functionalities Customers who desire search as an integrated part of a broader information management solution MOSS FS A core search-only solution for intranet and public-facing web sites Customers who require a core search-only product that can be integrated into their existing infrastructure Customers who require search functionality for their public-facing web (Internet) sites
  • 15. MOSS 2007 for Search and MOSS 2007 Features Comparison Features MOSS 2007 for Search (Standard Edition) MOSS 2007 for Search (Enterprise Edition) MOSS 2007 (Standard CAL) MOSS 2007 (Standard plus Enterprise CAL) File shares X X X X Web sites X X X X SharePoint sites X X X X Microsoft Exchange Server public folders X X X X Lotus Notes databases X X X X Third party document repositories 1 X X X X Secure content access control X X X X Enhanced Search Center user interface X X Search for people and expertise X X Business Data Catalog (BDC) X Search structured data sources X Document limit 500,000 No Limit 2 No Limit 2 No Limit 2
  • 17. Module 2 Microsoft Office SharePoint Search 2007 – Walkthrough
  • 18. Module Agenda End-User Improvements Relevance People and Expertise Business Data Search Administration Improvements Design Goals Indexing Management Security Customization Query Reporting Performance Improvements Demo MOSS 2007
  • 19. End-User Improvements Relevance Dramatically improved relevance is the top goal of this release New ingredients added including: Anchor text Click distance URL depth Missing metadata creation Result is noticeably more relevant search 100% better on all queries 500% better on common queries
  • 20. End-User Improvements People and Expertise Bring people into the Search experience Getting your job done means working with the right people Find subject-matter experts based on their knowledge and contacts Numerous improvements over SPS 2003 Index any LDAP V3 directory Dedicated tab for finding people Results grouped by “social distance” to you
  • 21. End-User Improvements Business Data Search Information in Line of Business (LOB) systems is often hard to access MOSS 2007 can bring that data to your users Data is accessed through the Business Data Catalog Exposed to many features in SharePoint Search can easily index the data No need to write code Highly customizable results Integrated with scopes and Search center
  • 22. Address SPS 2003 administration user interface pain points Unify WSS and MOSS search Enable full programmability via the object model Even better scalability and performance Administration Improvements Design Goals
  • 23. Streamlined experience and more control One index per shared service; no need to worry about managing discrete indexes Multiple start addresses per content source MOSS indexes can drive the WSS search experience Allow upgrade from WSS to MOSS Administration Improvements Indexing Management
  • 24. Administration Improvements Security Query-time security trimming in SPS 2003 File shares, WSS/SPS 2003, Exchange, Lotus Notes (via mapping) Now supports pluggable authentication for content in WSS/MOSS sites Based on ASP.NET 2.0 model Minimum required crawler permission is now just Full Read, not Administrator Still provides the same security trimming functionality Ability to remove single items
  • 25. Administration Improvements Customization Search in every company is different Different metadata might matter: Documents: Title, Author, File location, Size Records: Patient, Doctor, Healthcare provider, SSN… How users meaningfully scope searches differs: “ All finance documents” “ All patient records” “ All published documents” Customize results to “pop” metadata that matters Customization offered at many levels Web Parts, XSLT/CSS, full object model…
  • 26. Administration Improvements Query Reporting Best way to improve Search is to understand current usage New out-of-box usage reporting: Query volume trends, top queries, click-through rates, queries with zero results, etc. At both site and service provider levels Export data for extended reporting in Excel Respond to feedback with configuration changes or editorial results
  • 27. Performance Improvements Key new features make the crawls faster so the content is fresher More efficient SharePoint crawling (Change Log Crawl) Continuous propagation Unified WSS and MOSS search Security Change Only Crawl Maximum scale is 10s of millions of documents per indexer
  • 28. Demo – MOSS 2007 Goal of the demo is a high-level overview with a focus on: Search boxes and advanced search Search results experience Search Center Admin experience
  • 30. Module 3 Architecture and Deployment Scenarios
  • 31. Agenda Key concepts MS Search Architecture Deployment Building Blocks WSS v3 Search Topologies MOSS 2007 Search Topologies Search Topology scenarios Small Medium Large Geographically distributed Solution scenarios Collaboration sites Enterprise portal Internet facing portal
  • 32. Microsoft Search Architecture (diagram) Labels: content sources (Exchange Folders, Network Shares, External Web Sites, SharePoint Sites, Business Data); Protocol Handlers; iFilters; Index Engine; WordBreakers; Stemmers; Content Index; Query Engine; Query OM and Web Service; OOB Search UI/Custom Search Apps (Query, Results); Search Configuration Data (Content Sources, Crawl Log, Scopes, Schema, Best Bets, Keywords, Ranking)
  • 33. SharePoint Search Topologies: Deployment Building Blocks Physical building blocks: Web Front-End Servers Application servers (Query, Index, Excel Services, etc.) SQL Databases Search functionality segmented into two roles: Indexer Query MOSS 2007 specific Shared Service Provider (SSP) Indexer Web Application(s) Site Collection(s) Content Database(s) Virtual Server(s) (IIS)
  • 34. WSS v3 Search Topology Basics WSS uses both server roles (Indexing and Query) on the same machine (“Search Server”) Ability to index local content only: a Site Collection and its content database(s) Content is automatically indexed, with minimal search administration Ability to query at a site and below it The stsadm command exposes some admin operations Can crawl multiple content databases
  • 35. Sample WSS v3 Topology
  • 36. WSS v3 - Topology Considerations Scale out just like WSS Add content databases for content Add search servers for search Each search server can serve up to 100 content databases Could be lower depending on the data in the content database
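The sizing rule on this slide can be turned into a quick estimate. The following Python sketch is only an illustration: the function name is hypothetical, and the default of 100 content databases per search server is the upper bound quoted on the slide, which may be lower depending on the data.

```python
import math


def search_servers_needed(content_databases: int, dbs_per_server: int = 100) -> int:
    """Estimate how many WSS search servers a farm needs.

    dbs_per_server defaults to the slide's upper bound of 100; pass a
    smaller number when the content databases are large or busy.
    """
    if content_databases < 0 or dbs_per_server <= 0:
        raise ValueError("counts must be non-negative and the limit positive")
    # Round up: a partially filled server is still a whole server.
    return math.ceil(content_databases / dbs_per_server)
```

For example, 250 content databases at the default limit call for three search servers; with a more conservative limit of 60 per server the same farm needs five.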
  • 37. MOSS 2007 Search Topology Basics Adds new functionality over base WSS Search Application server roles can be separated: Indexer Query server Propagation from indexer to query servers Crawl local + external content Enhanced administration experience Ability to search across site collections
  • 38. MOSS 2007 Search Topology Basics (cont) Query role can be assigned to one or more servers Indexing role can only be assigned to a single server Multiple query servers are not allowed if a server is providing both indexing and query services Only one index per SSP, although you can have multiple SSPs
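The role-assignment rules above can be expressed as a small validation check. This is a minimal Python sketch, not part of any SharePoint API: the `Server` class and `validate_ssp_topology` function are hypothetical names, and the check covers only the rules stated on this slide for a single SSP.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Server:
    name: str
    indexer: bool = False  # server holds the Indexing role
    query: bool = False    # server holds the Query role


def validate_ssp_topology(servers: List[Server]) -> List[str]:
    """Return the rule violations for one SSP (empty list means valid)."""
    problems = []
    indexers = [s for s in servers if s.indexer]
    query_servers = [s for s in servers if s.query]
    # Rule: the indexing role can only be assigned to a single server.
    if len(indexers) != 1:
        problems.append(f"expected exactly one indexer, found {len(indexers)}")
    # Rule: the query role must be assigned to one or more servers.
    if not query_servers:
        problems.append("no server has the query role")
    # Rule: a dual-role (index + query) server rules out additional query servers.
    if any(s.indexer and s.query for s in servers) and len(query_servers) > 1:
        problems.append("multiple query servers with a dual-role index server")
    return problems
```

A dedicated indexer with two query servers passes; adding a second query server next to a dual-role machine is reported as a violation.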
  • 39. Sample MOSS 2007 Topology Query servers separated from indexer Indexer crawling local + external content
  • 40. MOSS 2007 – Search Topology Considerations Indexing operations are CPU intensive Dedicated query servers *might* be better in a query heavy environment MOSS / WSS crawls do involve making HTTP requests against the WFE(s) Dual role, WFE / Query servers more efficient with security trimming All servers should be on same network segment
  • 41. MOSS 2007 – Search Topology Considerations (cont) Each farm can index up to 50 million items Beyond this, add more farms Hardware is important
  • 42. Shared Search Service Shared Service Provider (SSP) – grouped high-value, resource intensive services Shared services are consumed by web applications (and sites within them) “ Always on” shared services – all sites in a web application use the same index Resource intensive operations controlled centrally Some admin experience is manageable at site level
  • 43. Search Shared Service (diagram) A Shared Service Provider (SSP) with the Search service, People service and other services; virtual servers http://sales, http://finance and http://hr, each containing site collections (spsite) and sites (spweb); content databases; external content
  • 44. Search Shared Service (diagram, content indexed) Same layout as the previous slide, marking the content indexed by the Shared Service Provider: the content databases behind http://sales, http://finance and http://hr, plus external content
  • 45. Common Search Topologies Deployment scenarios Small Medium Large Geographically Distributed (MOSS only)
  • 46. Small Search Deployment WSS: single search server with both roles (Index and Query); single site collection only; single set of content databases. MOSS: single server, dual role (Index and Query); SSP based, multiple site collections; multiple sets of content databases. MOSS for Search: single server, dual role (Index and Query).
  • 47. Medium Search Deployment WSS: multiple search servers with the following limitations: single index server, single site collection, single set of content databases, multiple query servers. MOSS: three servers; one index server and two query servers running on two Web front-end servers. MOSS for Search: three servers; one index server and two query servers.
  • 48. Large Search Deployment WSS: multiple search servers with the following limitations: multiple index servers (64-bit), each indexing a single site collection with its own set of content databases; index servers do not provide redundancy for one another; multiple query servers, each associated with its own single index server running on the same machine (64-bit); query servers do not provide redundancy for one another. MOSS: one index server (64-bit); many separate query servers (64-bit). MOSS for Search: one index server (64-bit); many separate query servers (64-bit).
  • 49. Geographically Distributed Sites MOSS Search Deployment (diagram) Three Shared Service Providers (SSP), each with a Search service and People service, virtual servers containing site collections (spsite) and sites (spweb), and external content. Corp. sites (http://sales, http://finance, http://hr): index covers Corp, EMEA, APAC and other locations. APAC (http://apacsales, http://apacfinance, http://apachr): index covers APAC only. EMEA (http://emeasales, http://emeafinance, http://emeahr): index covers EMEA only.
  • 50. Deployment Scenarios Collaboration Environment (WSS v3) Enterprise Portal (MOSS 2007) Internet Facing Portal (MOSS 2007)
  • 51. Collaboration Environment Scenario WSS v3 iTech – startup software consulting firm Large number of disjoint teams working on projects of varying durations Team sites used for collaboration and communication No organizational needs across sites
  • 52. Collaboration Environment Scenario WSS v3 (cont) WSS farm with single IIS virtual server http://team Scales to large number of team sites Content indexed automatically WSS v3 standalone topology 1 Search box (both roles)
  • 53. Collaboration Environment Scenario WSS v3 (cont) Search – core feature of WSS Contextual scopes – site and list No search across sites http://team team1 team2 spweb spweb Virtual Server team3 spweb spweb SPSites Content Databases
  • 54. Enterprise Portal Scenario MOSS 2007 iTech – growing company with growing needs iTech – needs a single point for information access for employees They now need to search over other repositories: Personnel records – People search Siebel sources – BDC search File Shares / Web sites – other external data
  • 55. Enterprise Portal Scenario MOSS 2007 (cont) Upgrade from WSS to MOSS Search is a shared service through the SSP Central enterprise portal – http://itech Existing virtual server http://team associated with SSP – search box switches to use MOSS Base WSS search is not running – but search available to sites through shared search service Indexes – local and external content
  • 56. Enterprise Portal Scenario MOSS 2007 (cont) (diagram) One farm with a Shared Service Provider (Search service, People service, …) and external content; virtual server http://team with site collections team1, team2 and team3; virtual server http://itech with site collections HR, Sales and Finance; each virtual server shown with its sites (spweb) and content databases
  • 57. Enterprise Portal Scenario MOSS 2007 (cont) Topology with indexer and query servers Load balanced query servers Scale out and scale up – new SSP dimension Query Servers added for throughput Single indexer crawls logical SSP = local + external content
  • 58. Internet Facing Portal Scenario - MOSS 2007 Internet facing site for customers – www.itech.com High traffic focused on content presentation Public access More publishing and less collaboration Controlled and tightly managed content
  • 59. Internet Facing Portal Scenario - MOSS 2007 (cont) Two separate farms: Production and test farms MOSS installation Controlled publishing of content to production farm from test farm Single shared service provider per farm Shared search service in each farm crawls content in each farm independently
  • 60. Internet Facing Portal Scenario - MOSS 2007 (cont) (diagram) Production farm: virtual server www.itech.com with site collections Services, Customers and About itech, their sites (spweb) and content databases, and an SSP (Search service, People service). Test farm: virtual server http://itechtest with the same site collections, sites and content databases, and its own SSP (Search service, People service).
  • 62. Module 4 Crawl and Query Processes
  • 63. Agenda The Crawl Process Crawl Walkthrough Index Propagation The Query Process
  • 64. Crawl Walkthrough When a crawl is requested . . . Indexer grabs the start address of content source Start address is prefixed with protocol associated with accessing the content Appropriate protocol handler invoked to traverse the content source During traversal, the handler will identify content nodes it needs to index
  • 65. Crawl Walkthrough (cont) Protocol handler invokes IFilter associated with content node type IFilter identifies and extracts properties from content node Protocol handler supplements IFilter data with additional property information Data associated with content node is added to index Index “delta” propagates to search servers
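The crawl steps in the two walkthrough slides can be sketched as a toy pipeline. The Python below is an illustration only and does not use any SharePoint API: the in-memory `FAKE_SHARE`, the handler and filter functions, and the registries keyed by protocol prefix and file extension are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical in-memory content source: address -> raw document text.
FAKE_SHARE = {
    "file://server/docs/plan.txt": "Project plan for search rollout",
    "file://server/docs/notes.txt": "Crawl notes",
    "file://server/docs/logo.bin": "\x00\x01",
}


def file_handler(start_address):
    """Toy protocol handler: traverses the source and yields content nodes."""
    for address, raw in FAKE_SHARE.items():
        if address.startswith(start_address):
            yield address, raw


def text_ifilter(raw):
    """Toy IFilter: extracts text and properties from a content node."""
    return {"body": raw, "word_count": len(raw.split())}


PROTOCOL_HANDLERS = {"file": file_handler}  # keyed by start-address protocol
IFILTERS = {".txt": text_ifilter}           # keyed by content node type


def crawl(start_address, index):
    """Crawl one start address, update the index, and return the index delta."""
    handler = PROTOCOL_HANDLERS[urlparse(start_address).scheme]
    delta = {}
    for address, raw in handler(start_address):
        ifilter = IFILTERS.get(address[address.rfind("."):])
        if ifilter is None:
            continue  # no filter registered for this node type
        properties = ifilter(raw)
        properties["path"] = address  # handler supplements the IFilter data
        index[address] = properties
        delta[address] = properties   # the part that would be propagated
    return delta
```

Calling `crawl("file://server/docs/", {})` indexes the two text files and skips the binary one, and the returned delta stands in for what would be propagated to the query servers.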
  • 67. Index Propagation Farm Sample Indexer Load Balancer Crawling Web front ends Index Propagation Query Servers User Requests
  • 68. Index Propagation Propagation will occur only when the index and search components are on separate servers Continuous propagation Changes are sent incrementally to all query servers associated with the index server Merging of the index occurs on the query servers after propagation Query servers continue serving queries while propagation is in progress
  • 69. Index Propagation Index File Location Set in Office SharePoint Server Search Service settings Default location: C:\Program Files\Microsoft Office Servers\12.0\Data\Office Server\Applications Can be programmatically set using the stsadm command Index Server: “stsadm.exe -o editssp -indexlocation <index file path>” Query Server: “stsadm.exe -o osearch -propagationlocation <index file path>”
  • 70. The Query Process Query Initiation and Results Presentation Query Execution Query Walkthrough
  • 71. Query Initiation and Results Presentation Typically, provided by the WSS / MOSS WFE role, through OOB WebParts Could be an Office client or other custom application Responsible for constructing the “full” query and communicating with the query execution services
  • 72. Query Execution Always provided by a server tagged with the Query role Consumes a query request Executes the request using the query index on the file system as well as the SSP search database (if MOSS) Handles OOB security trimming Returns requested properties of the result set to the caller
  • 73. Query Walkthrough (cont) When a query is requested . . . Query terms collected Terms supplemented with contextual information Query formulated and issued through the Query OM or the Web Service Query is executed against the index and property store Query results returned Results are ordered according to their relevance to the query words Trimmed based on the user’s permissions.
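The query steps listed above (collect terms, execute against the index, order by relevance, trim by permissions) can be sketched in a few lines. This Python example is a toy under stated assumptions: the index is a plain dictionary of path to text, relevance is a simple term-frequency count rather than the MOSS ranking algorithm, and the `acl` mapping is a hypothetical stand-in for security trimming.

```python
def run_query(query_text, index, acl, user):
    """Toy query pipeline: AND match, rank by term frequency, security trim.

    index maps path -> document text; acl maps path -> set of user names
    allowed to read it. Both structures are illustrative assumptions.
    """
    terms = query_text.lower().split()             # query terms collected
    hits = []
    for path, body in index.items():
        words = body.lower().split()
        if all(term in words for term in terms):   # implicit AND between terms
            score = sum(words.count(term) for term in terms)
            hits.append((score, path))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))   # most relevant first
    # Trim results the caller is not permitted to see.
    return [path for _, path in hits if user in acl.get(path, set())]
```

With three documents and two users, the same query returns different result lists per user because trimming happens after ranking.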
  • 75. Module 5 The Search End-User Experience
  • 76. Module Agenda Introducing the Search End-User Experience Customizing Search People Search
  • 77. Introducing the Search End-User Experience Complete Search experience Search is everywhere Tab-based user interface for easy navigation Easy to extend and customize
  • 78. Introducing the End-User Search Experience Search Boxes Search Center Search Web Parts
  • 79. Query and results flow (diagram) Labels: Search Box and Advanced Search (HTTP GET, HTTP POST); Query; Hidden Object; Query OM; Results as XML; Web Parts with XSL Transformation; OOB Search UI/Custom Search Apps; Query OM and Web Service
  • 80. Search WebParts Nine Standard Search Web Parts Search Box Core Results High Confidence Statistics Pagination Action Links Matching Keywords and Best Bets Search Summary (Did you mean?) Advanced Search
  • 81. Result page infrastructure Data shared through hidden object All Search Web Parts within the same page share the same hidden object Connection between Search Web Part is automatically done Need only to Drag and Drop (or select) a Search Web Part on the page Allows for rapid page design Hidden Object is internal and cannot be used by custom Web Parts All Search Web Parts derive from Data Form Web Part
  • 82. Advanced Search Allows power searchers to exercise greater control over how they query A link from the search box Control what is displayed on the page by modifying the XML stored in the Web Part property “Properties” (for example, to display a new language check box) Not provided by WSS Search UI Implemented using the SQL syntax
  • 83. Customizing the End User Experience Search in every company is different Different metadata might matter Documents: Title, Author, File location, Size Records: Patient, Doctor, Healthcare provider, SSN… Multi- or single-languages How users meaningfully scope searches differs “All finance documents” “All patient records” “All published documents” Customize results to “pop” metadata that matters Customization offered at many levels: Web Parts, XSLT/CSS, full Object Model…
  • 84. Customization Choices Search Center Simple Site with few pages Default Page Result Page Advanced Search Page People Search Page Results Pages All Sites Results Page People Results Page Advanced Search Page and Web Part Show Scope Picker Scopes Property Picker Languages Search Web Parts
  • 85. Customizing Search Adding Search Center Tabs Customizing Search Web Parts Customizing Search Results
  • 86. People Search Bring people into the search experience Getting your job done means working with the right people Find subject matter experts based on their knowledge and contacts People list can come from AD, SQL, others Discovering Experts People are as important as data!
  • 87. People Search People Results Customizing Results
  • 88. Refine Your People Search Refine by Job Title Searches for the selected Job Title Refine by Department Searches for the selected Department “ Show more options” link (6+) Listed in order of frequency
  • 89. People Search Web Parts Two OOB People Search Web Parts People Search Box People Search Core Results Inherit from the Search Core Results Web Part Can be mixed on the same page with other Search Web Parts
  • 90. People Results Search Web Parts Web Part properties such as: (similar to Core Search WP) Formatting (i.e. width of the search box) Number of Results per page Display “Alert Me”, “RSS” links Turn stemming on/off (default “off”) Remove Duplicate Results on/off (default “on”) Fixed keyword Query Select Columns Results formatting with XSL Social Distance (view)
  • 91. Social Distance Colleagues Suggested Colleague list members are mined from: Microsoft Windows Messenger (IM) Microsoft Office Outlook e-mail (Outlook Add-In)
  • 93. Module 6 Search Object Model
  • 94. Workshop Agenda Scenarios for Extending Search Query Syntax Query Object Model Query Web Service
  • 95. Topic: Scenarios for Extending Search In this first section we will examine 2 scenarios for extending Search: Integrate with Search Center Integrate Search into 3rd party sites and applications
  • 96. Extending Search Integrate with MOSS Search Center Use cases: Use Search URL request parameters to add predefined saved searches Build custom search box Web Parts for custom look and feel Build custom search core result Web Parts for own look and feel and customized querying
  • 97. Extending Search Integrate MOSS Search into 3rd Party Sites and Applications Build 3rd party user interface which leverages MOSS Search through Web Services Use cases: Add MOSS Search features into existing Web sites Add MOSS Search into existing line of business or custom applications
  • 98. Topic: Query Syntax In this section we will examine the three types of search syntax for building search queries supported by MOSS: Keyword URL SQL
  • 99. Keyword Syntax Overview Used in the standard Search Box New keyword syntax Simple and easy to use Consistent property:value syntax across Office, Windows and Live search Examples: gallery hinges -brass site:http://supportdesk scope:Products
  • 100. Keyword Syntax Include/Exclude Built-in support for using include and exclude terms Look for the term bike, but not related to fitness: bike -fitness Look for the phrase “SharePoint Services” but not the term v2: +“SharePoint Services” -v2 Include is implied when there is no (+/-) prefix
  • 101. Keyword Syntax Boolean Search Narrowing results by default Searches using “AND” between query terms Does not recognize logical operators like “OR”, “NEAR” as keywords – it treats them all as search terms Does not support complex queries like (A AND B) OR (C AND D) Complex Boolean searches are supported by the engine and the SQL syntax
  • 102. Keyword Syntax Property restrictions Supports property:value as part of the keyword string Can use any managed property Supports the use of phrases Can be used for exact matches when the property value includes spaces Without quotes, prefix matching is done Supports word stemming
  • 103. Keyword Syntax No wildcard support The Search box does not do wildcard searching; the following is not recognized as a wildcard search: ShareP* Use Advanced Search property restrictions to look for parts of a word Requires new search results Web Parts Wildcards are supported by the engine and the SQL query syntax
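To make the keyword syntax from the preceding slides concrete, here is a small Python sketch that splits a keyword string into include terms, exclude terms, phrases, and property:value restrictions. It is an illustrative parser written for this walkthrough, not the one MOSS uses; the `Term` type and `parse_keyword_query` name are hypothetical, and edge cases such as a bare URL used as a term are not handled.

```python
import re
from typing import List, NamedTuple, Optional


class Term(NamedTuple):
    text: str
    exclude: bool        # True when prefixed with "-"
    prop: Optional[str]  # managed property name for property:value, else None
    phrase: bool         # True when the value was quoted


# Optional +/- prefix, optional "property:" part, then a quoted phrase or a word.
_TOKEN = re.compile(r'([+-]?)(?:(\w+):)?(?:"([^"]*)"|(\S+))')


def parse_keyword_query(query: str) -> List[Term]:
    terms = []
    for match in _TOKEN.finditer(query):
        sign, prop, phrase, word = match.groups()
        is_phrase = phrase is not None
        terms.append(Term(
            text=phrase if is_phrase else word,
            exclude=(sign == "-"),  # include is implied without a prefix
            prop=prop,
            phrase=is_phrase,
        ))
    return terms
```

Parsing `hinges -brass site:http://supportdesk scope:Products` yields one include term, one exclude term, and two property restrictions, which a caller would then combine with an implicit AND as the Boolean Search slide describes.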
  • 104. URL Syntax Use Cases: launching a URL in a custom application, saved searches, custom search boxes Request Parameters Content: results.aspx?k=fish Scopes: results.aspx?k=fish&s=BBC Sort: results.aspx?v=date or results.aspx?v=relevance Page: results.aspx?start=21
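A saved search or custom search box only needs to assemble these request parameters. The following Python sketch builds such a URL with the standard library; the function name and argument names are hypothetical, while the parameter letters (k, s, v, start) are the ones listed on the slide.

```python
from urllib.parse import urlencode


def search_url(results_page, keywords=None, scope=None, sort=None, start=None):
    """Build a results-page URL from the request parameters on the slide."""
    params = []
    if keywords is not None:
        params.append(("k", keywords))  # query text
    if scope is not None:
        params.append(("s", scope))     # scope name
    if sort is not None:
        params.append(("v", sort))      # "date" or "relevance"
    if start is not None:
        params.append(("start", start)) # first result to show
    return f"{results_page}?{urlencode(params)}"
```

For instance, `search_url("results.aspx", keywords="fish", scope="BBC")` reproduces the scoped example from the slide, and `urlencode` takes care of escaping spaces and special characters in the keywords.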
  • 105. SQL Syntax Overview SQL Syntax offers: Consistent SQL across enterprise and desktop Complex queries and Boolean searches Comparison operators Arbitrary groupings for AND, OR, NOT Freetext() CONTAINS() LIKE ORDER BY ASC | DESC Custom SQL query statements Wildcard support
  • 106. SQL Syntax Complex Boolean Searches Write complex Boolean searches using AND, OR, NOT
  • 107. SQL Syntax FREETEXT predicate Returns documents for which the following is true: Document contains all the search terms in at least one of the columns specified One of the search terms must also be found in the Contents column Use only one FREETEXT predicate for optimal ranking The FREETEXT predicate also supports (+/-)
  • 108. SQL Syntax Wildcard Support Get wildcard support using the CONTAINS predicate: Wildcard: Words or phrases with an asterisk (*) added to the end. WHERE CONTAINS (' "compu*" NEAR "soft*" ')
  • 109. SQL Syntax Removed from SQL syntax Removed in MOSS 2007: query property weights, UNION ALL, MATCHES, SELECT *, COALESCE, TABLE
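As an illustration of how the CONTAINS predicate fits into a full statement, the Python sketch below assembles a SQL-syntax query string. The helper name is hypothetical, and the `FROM SCOPE()` clause and the `Title`, `Path` and `Rank` property names are assumptions recalled from the MOSS 2007 search SQL syntax rather than something shown on the slide, so they should be checked against the product documentation. The string would then be submitted through the Query OM or the Query Web Service.

```python
def build_contains_query(columns, contains_expr, order_by="Rank", descending=True):
    """Assemble a search SQL statement around a CONTAINS predicate.

    columns must be named explicitly, since SELECT * is listed as removed.
    FROM SCOPE() and the default ORDER BY Rank are assumptions (see lead-in).
    """
    if not columns:
        raise ValueError("at least one column is required")
    # Double any single quotes so the expression stays inside the SQL literal.
    escaped = contains_expr.replace("'", "''")
    direction = "DESC" if descending else "ASC"
    return (
        f"SELECT {', '.join(columns)} FROM SCOPE() "
        f"WHERE CONTAINS('{escaped}') "
        f"ORDER BY {order_by} {direction}"
    )
```

Passing the slide's expression `"compu*" NEAR "soft*"` produces a statement that uses trailing-asterisk wildcards, which the keyword syntax cannot express.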
  • 110. Topic: Query Object Model In this section we will examine: The Query Object Model The Query Object Path The Query Web Service
  • 111. Query Object Model New object model Use the query object model to: Build custom search user interface, like Web parts or ASPX applications Gain direct access to query and results properties Invoke custom queries 2 types of query syntaxes: Keyword SQL
  • 112. Query Object Model Features Managed code API Single request – multiple results Result Types Relevant results High confidence results Special terms Definitions Optional parameters # of Sentences in Summary Implicit - AND/OR Number of results Ignore noise words Enable stemming Language
  • 113. Query Object Path Query OM Input Output SQL Query Optional Parameters Query Engine ResultTableCollection ResultTable: IDataReader Relevant results High confidence Special terms Definitions Site UI Custom Client Local Remote Keyword Query Execute()
  • 114. Query Web Service Use and Methods Use Cases: leverage Search in remote sites or applications, Office Research Pane Methods: Query, QueryEx, GetSearchMetaData, Registration, Status
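A remote client calls Query or QueryEx with an XML request packet. The Python sketch below builds a minimal packet with the standard library. The element names, the `urn:Microsoft.Search.Query` namespace, and the `STRING` value for keyword queries are recalled from the Microsoft.Search.Query schema and are assumptions here, not taken from the slides; verify them against the service's WSDL and schema reference before relying on them. The function name is hypothetical, and sending the packet over SOAP is left out.

```python
import xml.etree.ElementTree as ET

# Assumed request namespace (see lead-in); verify against the schema reference.
QUERY_NS = "urn:Microsoft.Search.Query"


def build_query_packet(query_text, query_type="STRING", language="en-US"):
    """Return a minimal query packet as an XML string.

    query_type "STRING" is assumed to select the keyword syntax; the SQL
    syntax is selected by a different type value in the real schema.
    """
    ET.register_namespace("", QUERY_NS)
    packet = ET.Element(f"{{{QUERY_NS}}}QueryPacket")
    query = ET.SubElement(packet, f"{{{QUERY_NS}}}Query")
    context = ET.SubElement(query, f"{{{QUERY_NS}}}Context")
    text = ET.SubElement(
        context,
        f"{{{QUERY_NS}}}QueryText",
        {"type": query_type, "language": language},
    )
    text.text = query_text  # ElementTree escapes special characters on output
    return ET.tostring(packet, encoding="unicode")
```

The resulting string would be passed as the single string argument of the Query or QueryEx method.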
  • 115. Query Web Service Search Center Features Standard Search Center features not built into the Web service Hit highlighting Search usage reporting Search logging Search statistics Result type icons Using Query vs. QueryEx Implementing hit highlighting
  • 118. Module Agenda Administrative Architecture Farm Administration SSP Administration Site Collection Administration Site Administration Search Usage Reporting Administrative Tools Lab: Adding Content Sources Lab: Search Schema
  • 119. Administrative Architecture Three-Tier Administration: Web-based, role- and task-delineated, controlled delegation, secure isolation. Central Administration: IT administrators; farm-level status and resource management; one per farm; e.g. create new site. Shared Services: business unit IT; service-level configuration; e.g. create search content source, search scopes. Site Settings: business site owner; site-specific configuration and tasks; e.g. create new list.
  • 120. Farm Management (IT Administrators)
  • 121. SharePoint 3.0 Central Administration Common Tasks Manage Topology and Services Servers in Farm Services in Server Security Configuration Update Farm Administrator’s Group Backup and Restore Index Search Database Global Configuration Timer Job Definitions Timer Job Status Manage Search Service
  • 123. Operations – Topology and Services Servers in Farm / Services on Server Query Server(s) Office SharePoint Server Search Service Stop / Start Office SharePoint Services Help Search Service Stop / Start Index Server(s) Office SharePoint Server Search Service Stop / Start
  • 124. Operations – Backup and Restore Perform a backup Restore from backup
  • 125. Operations – Global Configuration Timer Job Definitions SharePoint Services Search Refresh Disable / Enable (Change and update WSS search configuration) Indexing Schedule Manager on MOSS Disable / Enable Timer Job Status Succeeded / Failed
  • 126. Search Application Management Manage Search Service Farm-level Search settings Proxy Server settings Query and Index Servers Server Listing and their Search service Shared Service Providers with Search enabled SSP name listing Crawler Impact Rules
  • 127. Crawler Impact Rules Configured through Central Administration Allows “throttling” of the indexer to reduce impact of a crawl on a particular server Supports wildcards Used in conjunction with crawl schedules
  • 128. Crawler Impact Rules (cont) Use * as the site name to apply the rule to all sites. Use *.* as the site name to apply the rule to sites with a dot in their name. Use *.site_name.com as the site name to apply the rule to all sites in the site_name.com domain. Use *.top-level_domain_name (such as *.com or *.net) as the site name to apply the rule to all sites that end with a specific top-level domain name. Use ? to replace any single character in a rule.
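The wildcard behavior described for crawler impact rules is close to shell-style globbing, so it can be approximated with Python's `fnmatch` module. This is only a sketch of the matching idea: the function name is hypothetical, case-insensitive comparison is an assumption, and `fnmatch` also gives special meaning to square brackets, which the slide does not mention.

```python
from fnmatch import fnmatchcase


def rule_applies(site_pattern: str, site_name: str) -> bool:
    """Approximate crawler impact rule matching with * and ? wildcards."""
    # Compare case-insensitively (an assumption; host names usually are).
    return fnmatchcase(site_name.lower(), site_pattern.lower())
```

For example, `rule_applies("*.*", "intranet")` is false because the name has no dot, while `rule_applies("*.contoso.com", "www.contoso.com")` is true.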
  • 129. Shared Services Provider (SSP) Management (SSP Administrators) (Content Oriented Administration)
  • 130. Common Tasks Configure Search Settings Content Sources Crawl Settings Authoritative Pages Settings Scopes
  • 131. Content Sources Represent an arbitrary container of information Require at least one start address, although multiple start addresses can be provided Start address cannot be reused Requires a registered protocol handler Five out-of-box content source types are available, mapping to the five out-of-box protocol handlers
  • 132. SharePoint Content Source Includes SPS 2003, MOSS 2007, WSS v2, and WSS v3 sites Can limit crawl to only sites specified in start address or all sites found below one or more provided hostnames Crawler will use target site’s APIs to include security information around content in the index For SPS 2003 content sources, crawler account requires “change” rights, which necessitates the crawler having administrator rights Examples: sps3://moss-01/ or http://moss-01/sitecollection/ Content sources decoupled from scopes
  • 133. Web Site Content Source Any content source available over HTTP or HTTPS If a SharePoint URL is provided, the crawler will detect this and index it as though it were a SharePoint content source (this can be overridden with crawl rules) Page depth and server hops can be controlled
  • 134. Web Site Content Source (cont) Security information around content is not included in index Dynamic personalization will result in the index being populated with what the crawler is presented with Example: http://website or http://www.somesite.com
  • 135. File Shares Content Source Any content visible over a Windows server shared folder Some non-Windows shares *may* be crawled, if that share can be presented as a Windows share (for instance, Samba with Linux, Services for Unix) Start address can be the share root or subfolders beneath it Security information is picked up by the gatherer
  • 136. Exchange Public Folders Content Source Allows the indexer to crawl a public folder that exists on Exchange Requires Outlook Web Access, as crawl is done over HTTP Includes messages, conversations, and other collaborative content URL presented in the search results will point to a deep link within OWA Example: http://owa/public/folder
  • 137. Business Data Content Source Allows the indexer to crawl metadata exposed through the Business Data Catalog Can elect to include all Business Data Applications or a selected number of them
  • 139. Crawling Schedules Allow administrator to indicate the frequency at which a content source will be re-crawled (daily, weekly, monthly) Can indicate what time the content source should be crawled Schedule should be driven by: Anticipated change at the content source (is this static content or content that is constantly changing) Business expectations around when content changes should be reflected in the index Schedule can always be modified
  • 140. Maximum File Size Default file size limit is 16MB To change the limit, you must add a new DWORD entry named MaxDownloadSize to the registry at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager Make sure to increase the timeout value to avoid timeout exceptions; change it using the Manage Search Service page of Central Administration
  • 141. Crawl Rules Define exceptions to the “typical” crawl process Addresses can be pattern matched for special treatment Support exclusion Support altering the authentication mechanism Examples of Crawl Rules Testing of Crawl Rules
  • 142. Search Result Removal (From Live Index) Typically used when someone discovers something in the index that shouldn’t be there Permits administrator to immediately remove that content from the index Crawl rule automatically created to prevent that content from being indexed in the future Restoring that content requires dropping the crawl rule and re-indexing
  • 143. Default Content Access Account Account used for crawling, by default Can be overridden in the Crawl Rules Set the default account to use when crawling content Minimum crawler permission is “Full Read” (still provides the same security trimming functionality) Automatically configured for new sites Do not use an Administrator account, so that unpublished versions of documents are not crawled
  • 145. Server Name Mapping Override how MOSS displays search results Hide file path Sample: “file://moss/HOL” to “http://moss.litwareinc.com”
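Conceptually, a server name mapping rewrites the address prefix of each result before it is displayed. The Python sketch below shows that idea with the sample mapping from the slide; the function name and the list-of-pairs representation are assumptions made for illustration, not how MOSS stores or applies mappings internally.

```python
def apply_server_name_mappings(url, mappings):
    """Rewrite a crawled address to its display address.

    mappings is a list of (crawled_prefix, display_prefix) pairs;
    the first matching prefix wins and matching ignores case.
    """
    for crawled_prefix, display_prefix in mappings:
        if url.lower().startswith(crawled_prefix.lower()):
            # Keep the rest of the path, swap only the prefix.
            return display_prefix + url[len(crawled_prefix):]
    return url  # no mapping applies, show the address as crawled
```

With the slide's sample mapping, a document crawled at `file://moss/HOL/specs/plan.docx` would be displayed as `http://moss.litwareinc.com/specs/plan.docx`.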
  • 146. Search-based Alerts Can be Activated / Deactivated Deactivated after a reset of crawled content Users can subscribe to an alert on a search query Alert is triggered if there are new or changed items that satisfy the search query An item is considered changed if its content or metadata has changed Timer service is used to issue all alert notifications (See User Alerts in Site Settings) Frequency can be set to Daily / Weekly “Alert Me” and RSS links can be added/removed using their Web Part property
  • 147. Reset Crawled Content Powerful action! Will delete the content index! Search results will no longer be available on the farm until the index has been rebuilt! Search alerts are deactivated unless the administrator unchecks the check box Alerts should be reactivated after a full crawl has been performed
  • 148. Specify Authoritative Pages Helps prioritize search results: a way to influence relevance Results that are linked to the authoritative pages will benefit from a boost in rank Levels: Most authoritative, Second-level authoritative, Third-level authoritative, Sites to demote
  • 149. Scopes Scopes are filters applied to search results to narrow the results of a search query Types of Scopes Scope Rules and Behaviors Single-rule Scopes Multi-rule Scopes
  • 150. Site Collection Management (Site Collection Administrators) (Application Administrators)
  • 151. Site Collection Administration Options Common Tasks Search Settings Search Scopes Search Keywords
  • 152. Search Settings Two Options Use the Search Center and custom scopes in the dropdown The way to change standard Search Center URL for search boxes Do not use the Search Center – no custom scopes
  • 153. Site Level Scopes Site Level Scopes display all scopes associated with a Site Collection Display Scopes are a site-level feature that is purely UI Administrator – Combine multiple scopes into one selectable item Visitors – UI Search dropdown box (or check boxes for the Advanced Search page) populated with the scopes included in the display group
  • 154. Keywords and Best Bets Prominently present editorially selected search results Keywords: Glossary of important terms within your organization Best Bets are associated with particular search keywords Not available across site collections
  • 155. Search Settings for Fields - NoCrawl Set a NoCrawl attribute on one or more columns within the site collection Column content will not be indexed! Associated with Site Columns (Content Types)
  • 156. Search Visibility Site level Allow or deny the site to appear in search results. If denied, the site will not be indexed. Control ASPX pages within the site for visibility. Will take into consideration item’s specific permissions. List Level Allow or deny the list to appear in search results. If denied, the list will not be indexed. Document Libraries and Folder Level Allow or deny the document library or folder to appear in search results. If denied, the Document Library (or folder) will not be indexed.
  • 158. Benefits of Search Queries and Results Reporting Allows Site and SSP Administrators to: Have a visual look at end-user queries through charts and graphs Quickly quantify the success or failure of the optimizations they can make to crawlers and indexes Export data to Microsoft Excel to further analyze and mine
  • 159. To Improve the Overall Search Experience One Must… Best way to improve search is to understand visitors’ current search usage! Understand what visitors are searching for Products, features, services, general Information about the company, etc. Understand if their search was successful Have they clicked on one of the results? Were there any results – does content exist? Were they offered suggestions specifically associated with their query? Have they misspelled the words within their query?
  • 160. Reporting Tools Two sets of reports Search Query Reports Search Results Reports Two different levels of reports, one per administrator audience Shared Services Provider (SSP) Site Collection Enabled by default Enabled within the SSP Queries from the Search Web Service and from custom Web Parts are not logged Note: Data is stored in the SSP database
  • 161. Reporting Tools At the SSP level For enterprise content oriented administrators
  • 162. Reporting Tools At the Site Collection level For Site Collection administrators
  • 163. Search Query Reporting – SSP Tracks Queries that users issued for all sites managed by this SSP Five Different Reports Queries Over Previous 30 Days Queries Over Previous 12 Months Top Query Origin Site Collection Over Previous 30 Days* Query for Scopes Over Previous 30 Days Top Queries Over Previous 30 Days Also has Tabular View for most reports * Specific to SSP
  • 164. Search Query Reporting – Site Collection Tracks Queries issued within this Site Collection Four Different Reports Queries Over Previous 30 Days Queries Over Previous 12 Months Top Queries Over Previous 30 Days Query for Scopes Over Previous 30 Days Also has Tabular View for most reports
  • 165. Search Results Reporting – SSP Tracks Result Click Selections by users within the sites managed by this SSP Five Different Reports Search Results Top Destination Pages Queries with Zero Results Most Clicked Best Bets Queries With Zero Best Bets Queries With Low Click-through
  • 166. Search Results Reporting – Site Collection Tracks Result Click Selections by users for this Site Collection Five Different Reports Search Results Top Destination Pages Queries with Zero Results Most Clicked Best Bets (Editorial Results) Queries With Zero Best Bets Queries With Low Click-through Same list of reports as for the SSP, but scoped to the Site Collection
  • 167. Exporting Results Export data for extended reporting in Excel and/or Excel Services
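Once report data has been exported, the "zero results" and "low click-through" questions from the reporting slides can also be asked of the raw rows. The following is a minimal sketch, assuming the export has been saved as CSV; the column names (`query`, `times_issued`, `result_count`, `clicks`) and the 10% click-through threshold are hypothetical and are not the actual MOSS 2007 export schema.

```python
import csv
import io

def summarize_queries(csv_text, low_ctr_threshold=0.1):
    """Split exported query rows into zero-result and low click-through lists.

    Column names are hypothetical; adjust them to match the real export.
    """
    zero_results = []
    low_click_through = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        issued = int(row["times_issued"])
        results = int(row["result_count"])
        clicks = int(row["clicks"])
        if results == 0:
            # No content matched: a candidate for a keyword or Best Bet
            zero_results.append(row["query"])
        elif issued > 0 and clicks / issued < low_ctr_threshold:
            # Results were shown but rarely clicked
            low_click_through.append(row["query"])
    return zero_results, low_click_through

# Made-up sample rows, for illustration only
SAMPLE = """query,times_issued,result_count,clicks
expense report,120,35,90
vacation polcy,40,0,0
org chart,200,12,8
"""

if __name__ == "__main__":
    zero, low = summarize_queries(SAMPLE)
    print("Zero results:", zero)
    print("Low click-through:", low)
```

Misspelled queries such as the second sample row tend to surface in the zero-results list, which matches the slide's question about whether visitors misspelled words in their query.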
  • 169. Module 8 Performance, Scalability, and Capacity Planning
  • 170. Module Agenda Introduction Search Capacity Planning in SPS 2003 MOSS 2007 Search Capacity Planning Topology Querying Indexing Test Environment Real World Experiences Microsoft Intranet Microsoft Technology Center Proof of Concept (PoC)
  • 171. MOSS 2007 Search Capacity Planning Improvement highlights Topology restrictions removed Indexing limitations improved Continuous propagation
  • 172. Topology Deployment options Collapse index and query services on the same server Enable index service on one server and query service on one or more different servers For both options you can have only one index server Scale up versus scaling out
  • 173. Topology (cont) Topology restrictions from v2 removed Can mix indexer/search roles Services can be managed during initial setup or later on Use mixed x86 and x64 hardware architectures IFilter, Protocol Handler limitations Index server is very CPU intensive Plan for availability requirements
  • 174. Topology (cont) Topology Scaling Recommendations (for Search): Query servers: 8 per farm Front end servers: 8 per farm Index servers: 4 per farm
  • 175. MOSS 2007 Search Topology Indexer Load Balancer Propagation of indexes Content databases External content Web front ends Query servers User Requests Query servers separated from indexer
  • 177. Querying – Performance Parameters The network always affects query performance as experienced by the end user: When querying the Index Catalog, a front-end server always hits the SQL database to get information on search results and for Security Trimming. When querying the Property Store, the Query server is not involved, since the Property Store now resides in the SQL Search database.
  • 179. Querying – Performance Parameters Query server memory: The more memory is available, the less the Search service will have to access the hard disk to satisfy a given query. Ideally, enough memory should be installed on the query servers to accommodate the entire index. Query server disk speed: RAID 10 is recommended.
  • 180. Querying – Scaling Factors Processor architecture Use 64-bit servers Planning for performance: separate query from front-end Dedicated processor time Plenty of available RAM for caching Planning for availability: add more than one query server in your farm This will require a dedicated machine for index, as described before Tested maximum of eight query servers
  • 181. Indexing Planning Performance optimization Storage Limitations Scaling
  • 182. Indexing Planning Customer environment Number of users Network and connectivity Dispersed locations Expected workloads Pilot Rollout plan Estimate indexing window
  • 183. Indexing Planning (cont) Corpus definition: A corpus is defined as the sum of all content that is being indexed. This includes all valid content sources, like Web pages, items, documents, BDC, and any metadata and security information associated with this content.
  • 184. Indexing Planning (cont) For each content source estimate: Number of items Storage used Types of items Security Latency requirements Connectivity Estimate indexing window Expected yearly growth
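The per-content-source estimates above can be rolled up into a projected corpus size. The following is a minimal sketch, assuming growth compounds yearly at a fixed rate per source; the slide only asks for "expected yearly growth", so the compounding model, the field names, and the sample figures are all illustrative assumptions.

```python
def project_corpus(sources, years):
    """Project total item count and storage (GB) after `years` of growth.

    Assumes each source grows by its own fixed yearly rate, compounded.
    """
    total_items = 0.0
    total_gb = 0.0
    for source in sources:
        factor = (1 + source["yearly_growth"]) ** years
        total_items += source["items"] * factor
        total_gb += source["storage_gb"] * factor
    return round(total_items), round(total_gb, 1)

# Hypothetical content sources, for illustration only
sources = [
    {"name": "Team sites", "items": 1_000_000, "storage_gb": 500, "yearly_growth": 0.20},
    {"name": "File shares", "items": 2_000_000, "storage_gb": 1000, "yearly_growth": 0.10},
]

if __name__ == "__main__":
    items, storage_gb = project_corpus(sources, years=2)
    print(f"Projected items: {items}, projected storage: {storage_gb} GB")
```

Projecting the corpus first makes the later index and database sizing steps depend on the planned size rather than on today's size.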
  • 185. Indexing - Performance Optimization Use dedicated front-end for best indexing performance No other services allowed on that server Adjust the indexing performance level Use Maximum for best performance Use Crawler Impact Rules Carefully test impact Continuous propagation Average time is 3 to 27 seconds WSS Change log for incremental crawls
  • 186. Indexing - Performance Optimization Index server CPU: The more processors are available, the higher the crawl speed Index server memory: The greater the memory capacity, the more documents the crawler can process in parallel More available memory improves crawl speed Index Server Disk Speed: RAID 10 with 2 ms access time and greater than 150 MB/sec write speed
  • 187. Index Storage Planning index storage as ratio of corpus Sizing depends on content in corpus Type of content source Document formats Level of metadata and security information Plan for expected growth rates
  • 188. Index Storage (cont) Index / Query Server disk space requirements: Index catalog size is normally in a range of 5% through 12% of corpus size Recommended initial disk space is a minimum of 2.5 times the index catalog size That means: recommended initial disk space is at least 30% of indexed corpus size
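The 30% figure follows from the upper end of the range: 12% of the corpus times 2.5 is 30%. A minimal sketch of that arithmetic, using only the ratios quoted on the slide (the function and parameter names are made up for illustration):

```python
def index_disk_estimate(corpus_gb, low_ratio=0.05, high_ratio=0.12, disk_factor=2.5):
    """Estimate index catalog size and recommended disk space, in GB.

    Ratios come from the slide: catalog is 5% to 12% of the corpus,
    and initial disk space is at least 2.5 times the catalog size.
    """
    catalog_low = corpus_gb * low_ratio
    catalog_high = corpus_gb * high_ratio
    return {
        "catalog_gb": (catalog_low, catalog_high),
        "disk_gb": (catalog_low * disk_factor, catalog_high * disk_factor),
    }

if __name__ == "__main__":
    estimate = index_disk_estimate(corpus_gb=1000)
    print("Index catalog (GB):", estimate["catalog_gb"])
    print("Recommended disk (GB):", estimate["disk_gb"])
```

Sizing to the upper bound is the conservative choice when the mix of document formats and metadata in the corpus is not yet known.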
  • 189. Index Storage (cont) Search database Contains metadata, ACLs, hit highlighting, crawl history, and usage reports Estimated 2 KB per crawled document Sizing depends on corpus content Requires more space than the index catalog Recommended initial disk space is a minimum of 4 times the index catalog size
  • 190. Index Capacity Limitations Supported limit for a single index server is 50 million documents In this scenario we recommend only one Index server per farm One index server per SSP More SSPs can use the same indexer All MOSS 2007 for Search Editions are limited to one SSP per farm MOSS 2007 is limited to 20 SSPs per farm MOSS 2007 for Search Standard Edition is limited to 500,000 documents per farm
  • 191. Index Scaling First scale up (recommended) Optimal ranking and user experience Best manageability Scale up system resources Use x64 architecture Add more CPUs to increase performance Plan for minimum 4GB of memory RAID 10 is recommended for optimal disk speeds
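The two search database figures on this slide can be computed side by side. This is a rough sketch, assuming "2K" means 2 KB and using binary units (1 GB = 1024 × 1024 KB); the function name and sample inputs are illustrative and not taken from the workshop.

```python
def search_db_estimate(index_catalog_gb, crawled_documents, kb_per_document=2, disk_factor=4):
    """Return (per-document estimate in GB, recommended initial disk in GB).

    The per-document estimate uses the slide's 2 KB per crawled document.
    The disk recommendation is 4 times the index catalog size.
    """
    per_document_gb = crawled_documents * kb_per_document / (1024 * 1024)
    recommended_disk_gb = index_catalog_gb * disk_factor
    return per_document_gb, recommended_disk_gb

if __name__ == "__main__":
    per_doc_gb, disk_gb = search_db_estimate(index_catalog_gb=25, crawled_documents=2_400_000)
    print(f"Per-document estimate: {per_doc_gb:.1f} GB")
    print(f"Recommended initial disk: {disk_gb} GB")
```

The two numbers answer different questions: the first approximates the per-document data, while the second reserves headroom for growth and for the other contents listed on the slide.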
  • 192. Index Scaling Scale out Add multiple SSPs each crawling unique parts of the corpus Complete isolation between SSPs Querying across multiple SSPs to get a single relevant results set is not possible Tested maximum of four index servers per farm Recommended limit per farm across all indexes is 50 million items For scenarios higher than 50 million items, add more farms
  • 193. Test Environment Establish a starting point topology Use monitoring to establish actual performance and capacity data Use Performance Monitor to collect processor, memory, and disk information for each server Look for resource bottlenecks Scale up available resources Scale out server roles
  • 194. Real World Experiences Microsoft Intranet Microsoft Technology Center PoC
  • 195. Microsoft Intranet Environment Estimate of indexed content  Around 12 TB in SharePoint Content Databases (mix of 2003 / 2007), unknown size outside of this environment Total size of the index SSP search database ~282GB SSP profiles database ~51GB Index size on disk ~156GB Total number of objects 23 million objects 30 content sources, 6 with daily crawls Typical 'real world' query response time from this implementation ~2 seconds, although the product group is looking into ways we can optimize this for our environment
  • 196. Microsoft Technology Center PoC Objectives Indexing large numbers of secure files on file shares Verify MOSS 2007 search architecture Test and recommend capacity planning and scale
  • 197. Topology Indexed corpus Search db Index catalog Propagated catalog 1TB 23GB 25GB
  • 198. Results For the biggest test run, which included indexing 2.4 million secure files, here are the key metrics: Full first-time indexing of entire corpus took 23.1 hours. Incremental crawls, where 4.7% of the corpus was updated, took 3.7 hours. Total size of index, relative to the corpus, was 2.4%, and for the search database, it was 2.1%. Average indexing rate over the full corpus crawl was 1642 files/minute.
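A measured crawl rate like this one can be turned into a first estimate of the indexing window called for in the planning slides. The following is a minimal sketch, assuming the rate stays constant; the default of 1642 files/minute is the PoC figure above and reflects that specific hardware and content, so a rate measured in your own test environment should replace it.

```python
def crawl_hours(item_count, items_per_minute=1642):
    """Estimate full-crawl duration in hours at a constant indexing rate."""
    if items_per_minute <= 0:
        raise ValueError("items_per_minute must be positive")
    return item_count / items_per_minute / 60

if __name__ == "__main__":
    for items in (500_000, 5_000_000, 50_000_000):
        print(f"{items} items: about {crawl_hours(items):.1f} hours")
```

A linear estimate ignores crawler impact rules, network latency, and content type differences, so treat it as a starting point for the pilot rather than a commitment.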
  • 200. Summary of Known Limits and Restrictions Tested recommendation of 50 million items per farm Hard limits: 1 indexer per SSP 20 indexes per MOSS 2007 farm 1 index per MOSS 2007 for Search farm 500 content sources per SSP 500 start addresses per content source 500,000 documents limit for MOSS 2007 for Search Standard Edition
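These limits lend themselves to a simple pre-deployment check. The sketch below encodes the figures from this slide for a full MOSS 2007 farm; the dictionary keys and the shape of the `plan` input are made up for illustration, and the check does not query a real farm.

```python
# Limits as stated on the slide, for a full MOSS 2007 farm
LIMITS = {
    "items_per_farm": 50_000_000,               # tested recommendation
    "indexes_per_farm": 20,                     # hard limit
    "content_sources_per_ssp": 500,             # hard limit
    "start_addresses_per_content_source": 500,  # hard limit
}

def check_plan(plan):
    """Return a message for each planned value that exceeds a known limit."""
    problems = []
    for name, limit in LIMITS.items():
        planned = plan.get(name, 0)
        if planned > limit:
            problems.append(f"{name}: {planned} exceeds {limit}")
    return problems

if __name__ == "__main__":
    # Hypothetical plan, for illustration only
    plan = {
        "items_per_farm": 62_000_000,
        "indexes_per_farm": 3,
        "content_sources_per_ssp": 30,
        "start_addresses_per_content_source": 120,
    }
    for problem in check_plan(plan) or ["Plan is within the known limits"]:
        print(problem)
```

For a plan that exceeds the 50 million item recommendation, the scale-out slide suggests adding more farms.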
  • 201. Capacity Planning References Planning for performance and capacity: http://technet2.microsoft.com/Office/en-us/library/eb2493e8-e498-462a-ab5d-1b779529dc471033.mspx Plan for software boundaries: http://technet2.microsoft.com/Office/en-us/library/6a13cd9f-4b44-40d6-85aa-c70a8e5c34fe1033.mspx Estimate performance and capacity requirements for search environments: http://technet2.microsoft.com/Office/en-us/library/5465aa2b-aec3-4b87-bce0-8601ff20615e1033.mspx