Getting Rid of Duplicate Content Issues Once and For All
PubCon, Las Vegas, November 13, 2008
Ben D’Angelo, Software Engineer
What are “duplicate content issues”?
Multiple disjoint situations!
  • Duplicate content within your site or sites
      - Multiple URLs pointing to the same page, similar pages
      - Different countries (same language)
  • Duplicate content across other sites
      - Syndicated content
      - Scraped content
Guiding principle
One URL for one piece of content. Why?
  • Users don’t like duplicates in results
  • Saves resources in our index: more room for other pages from your site!
  • Saves resources on your server
Sources of duplicates within your sites
  • Multiple URLs pointing to the same page
      - www vs non-www
      - Session ids, URL parameters
      - Printable versions of pages
      - CNAMEs
  • Similar content on different pages
      - Manufacturer’s databases
      - Different countries
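To make the "multiple URLs, same page" case concrete, here is a minimal Python sketch of how several of these variants can be collapsed onto one URL. The host `www.example.com` and the parameter names in `IGNORED_PARAMS` are illustrative assumptions, not taken from the talk; a real site would list its own session and tracking parameters.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names that do not change what the page shows.
IGNORED_PARAMS = {"sessionid", "sid", "utm_source", "ref"}


def canonical_url(url: str, preferred_host: str = "www.example.com") -> str:
    """Collapse common URL variants onto one preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()

    # Treat the bare domain and the www host as the same site.
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host

    # Drop parameters that only carry session or tracking data,
    # and sort the rest so parameter order does not create new URLs.
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme.lower(), host, parts.path or "/", urlencode(kept), ""))
```

With these assumptions, `http://example.com/widgets?sid=abc123&color=red` and `http://WWW.Example.com/widgets?color=red&sessionid=9` both map to `http://www.example.com/widgets?color=red`.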
How does Google handle this?
  • Many systems for de-duping URLs at various stages in our crawl/index pipeline
  • General idea: cluster pages, choose the “best” representative
  • Different filters are used for different types of duplicate content
  • Goal: serve one version of the content in search results
  • Generally just a filter: it will not destroy your site
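The "cluster pages, choose a representative" idea can be illustrated with a toy Python sketch. This is not Google's pipeline, which the slides do not describe in detail. It assumes pages are grouped by an exact hash of their whitespace- and case-normalized text, and that the shortest URL in each group wins; both rules are stand-ins chosen for the example, and near duplicates would need a fuzzier comparison.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the page text after collapsing case and whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pick_representatives(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each chosen representative URL to all URLs in its cluster."""
    clusters = defaultdict(list)
    for url, text in pages.items():
        clusters[fingerprint(text)].append(url)

    result = {}
    for urls in clusters.values():
        # Assumed rule of thumb: shortest URL first, then alphabetical order.
        best = min(urls, key=lambda u: (len(u), u))
        result[best] = sorted(urls)
    return result
```

Given a page at `/a` and a printable copy at `/a?print=1` with the same text, the sketch returns `/a` as the representative for both, which mirrors the "one version in search results" goal.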
What can you do about your site?
  • For exact dupes: 301
      - Tracking URLs
      - www vs non-www (also Google Webmaster Tools)
  • Near duplicates: noindex / robots.txt
      - Printable pages
      - Clones of other sites
  • Domains by country
      - Content in different languages is not duplicate content
      - Use unique content specific to the country
      - Use different TLDs (also Google Webmaster Tools) for geo-targeting
  • URL parameters
      - Put data which does not affect the substance of a page in a cookie
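For the robots.txt option, a rule set can be checked locally before it is deployed. The sketch below uses Python's standard `urllib.robotparser`; the `/print/` directory for printable pages and the example host are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: printable copies of pages live under /print/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules in ROBOTS_TXT allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

Under these rules `is_crawlable("http://www.example.com/print/widgets")` is false while the regular page at `/widgets` stays crawlable, so only one version remains available to the crawler.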
What can you do about your site?
  • Choose www or non-www as preferred
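Once a preferred host is chosen, requests for the other host can be answered with a 301. One way to sketch this in Python is a small WSGI middleware; `PREFERRED_HOST` and the function name are hypothetical, and many sites would configure the same redirect in the web server rather than in application code.

```python
PREFERRED_HOST = "www.example.com"  # hypothetical preferred host


def redirect_to_preferred_host(app):
    """WSGI middleware: answer requests for any other host with a 301."""

    def wrapper(environ, start_response):
        host = environ.get("HTTP_HOST", "")
        if host and host.lower() != PREFERRED_HOST:
            scheme = environ.get("wsgi.url_scheme", "http")
            location = f"{scheme}://{PREFERRED_HOST}{environ.get('PATH_INFO', '/')}"
            query = environ.get("QUERY_STRING", "")
            if query:
                # Keep the query string so the redirect lands on the same page.
                location += "?" + query
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]
        # Already on the preferred host: pass the request through unchanged.
        return app(environ, start_response)

    return wrapper
```

Wrapping an application as `redirect_to_preferred_host(app)` sends a request for `http://example.com/widgets?color=red` to `http://www.example.com/widgets?color=red` and leaves requests for the preferred host untouched.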
What can you do about your site?
What can you do about another site?
  • Include original absolute URL in syndicated content
  • Syndicate different content
  • If you use syndicated content, manage your expectations
  • Don’t worry about scrapers or proxies too much; they generally don’t affect your rankings
  • If you are concerned, file a DMCA request (http://www.google.com/dmca.html)
  • Spam report (https://www.google.com/webmasters/tools/spamreport)
Best practices for Google
  • Avoid duplicate URLs / sites
  • Generate unique, compelling content for users
  • Don’t be overly concerned with duplicate content
  • Let us know about any issues at the Webmaster Help Forum
Useful links
  • Webmaster Central: http://google.com/webmasters/
  • Webmaster Central Blog: http://googlewebmastercentral.blogspot.com/
  • Webmaster Help Center: http://www.google.com/support/webmasters/
  • Webmaster Discussion Group: http://groups.google.com/group/Google_Webmaster_Help
Thank You!
