What I learned from
analysing thousands
of robots.txt files
samgipson
# brightonSEO
2020
Of all things...why robots.txt?
2nd July 2019.
Google
Webmasters
blog post
Robots
Exclusion
Checker
How many top performing
sites still use unsupported
or incorrect rules?
What are the most
common mistakes within
robots.txt?
Robots.txt: The history
Based on Robots
Exclusion
Protocol (REP)
Millions of sites use a robots.txt file
Despite not being an
official internet
standard
Control the content
crawlers can and
can’t access
It’s hugely
powerful.
Mistakes can
cost you big.
Did you guess the year?
1994!
In 2019 Google submitted a
revised REP draft to try to make it
an official standard
FACT
Robots.txt: The basics
User-agent: *
Disallow: /checkout/
User-agent: *
Disallow: /checkout/
{field}
User-agent: googlebot
Disallow: /checkout/
User-agent: googlebot
Disallow: /checkout/
{value}
User-agent: *
Allow: /checkout/
User-agent: *
Disallow: /checkout/
User-agent: *
Disallow: /checkout/
{directive or rule}
User-agent: *
Disallow: /checkout/
{path}
User-agent: *
Disallow: /checkout/
{group}
User-agent: *
Disallow: /checkout/
User-agent: googlebot
Disallow: /checkout/
Disallow: /basket/
{group A}
{group B}
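The terms above (field, value, directive, path, group) can be made concrete with a small parser. This is a minimal sketch, not how any search engine actually parses the file: the function name parse_groups is hypothetical, and it assumes every rule line has the form field: value and that a User-agent line following a rule starts a new group.

```python
def parse_groups(text):
    """Split robots.txt text into groups of (user_agents, rules)."""
    groups = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()  # field names are case insensitive
        if field == "user-agent":
            if rules:  # a user-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            if agents:  # rules before any user-agent line belong to no group
                rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


example = """\
User-agent: *
Disallow: /checkout/

User-agent: googlebot
Disallow: /checkout/
Disallow: /basket/
"""

# Prints one line per group: the user-agents, then their (directive, path) pairs
for agents, rules in parse_groups(example):
    print(agents, rules)
```

Consecutive User-agent lines end up in the same group, which is why several crawlers can share one set of rules.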
robots.txt controls crawling
not indexation
FACT
Here’s where it got confusing...
Google used to support
unofficial directives
HTML <head>
<meta name="robots" content="noindex, nofollow">
HTTP header
X-Robots-Tag: googlebot: noindex, nofollow
robots.txt
User-agent: googlebot
Noindex: /checkout/
Nofollow: /checkout/
Webmasters / SEOs realised that
noindex: worked
Google
Webmasters
blog post
My analysis
STEP ONE
Identified top
traffic driving
sites across a
range of sectors
Automotive
Computing
Cooking/Recipes
Electronics
Fashion
Gambling
Hardware
Health/Medical
Insurance
Jobs
News
Real Estate
Telecoms
Travel
STEP TWO
Extracted the
robots.txt files
for 40,000
unique domains
STEP THREE
Noindex:
Nofollow:
Crawl-delay:
<field>
<value>
<directive>
<path>
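Step three could look something like the sketch below. It is an illustration only, not the script used for the analysis: the function name unsupported_rules and the sample file are hypothetical, and it only counts the three rules listed above.

```python
from collections import Counter

# The three rules the analysis looked for
UNSUPPORTED = {"noindex", "nofollow", "crawl-delay"}


def unsupported_rules(text):
    """Count how often each unsupported rule appears in a robots.txt file."""
    counts = Counter()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field in UNSUPPORTED:
            counts[field] += 1
    return counts


sample = """\
User-agent: googlebot
Noindex: /checkout/
Nofollow: /checkout/
Crawl-delay: 10
Disallow: /basket/
"""

print(unsupported_rules(sample))
```

Run across a list of downloaded files, the same counter gives a per-domain tally that can then be grouped by sector.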
Results: Unsupported rules
Out of the 40,000 sites analysed,
0.5% used unsupported rules
Nofollow:
1 Gambling
40,000 domains analysed
Crawl-delay:
2,600
40,000 domains analysed
Crawl-delay:
Real Estate
Hardware/DIY
Fashion
2,600
40,000 domains analysed
Noindex:
220
40,000 domains analysed
Noindex:
220
Retail
Finance
Jobs
Health
40,000 domains analysed
Brands using outdated rules
Results: Basic Mistakes
Issue 1
<field> name spelt
incorrectly
<field> name is case
insensitive
FACT
This is ok:
User-Agent
user-agent
USER-AGENT
UsEr-AgEnt
This ISN’T:
useragent
user agent
er-agent
ser-agent
user-agnet
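A check for misspelt field names might be sketched as follows. The function name check_field, the list of known fields (limited to the ones named in this talk) and the 0.8 similarity cutoff are all assumptions chosen for illustration. Python's difflib is used here only to suggest the closest valid name.

```python
import difflib

# Only the field names covered in this talk; a real checker would know more
KNOWN_FIELDS = ["user-agent", "allow", "disallow"]


def check_field(name):
    """Return ('ok', field), ('typo', suggestion) or ('unknown', None)."""
    lowered = name.strip().lower()  # field names are case insensitive
    if lowered in KNOWN_FIELDS:
        return ("ok", lowered)
    # Anything close to a known field is treated as a probable typo
    close = difflib.get_close_matches(lowered, KNOWN_FIELDS, n=1, cutoff=0.8)
    if close:
        return ("typo", close[0])
    return ("unknown", None)


for name in ["UsEr-AgEnt", "useragent", "user agent", "noindex"]:
    print(name, check_field(name))
```

Mixed case passes because the name is lowercased first. A missing or swapped character is reported as a typo rather than silently accepted.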
<field> name errors
Telecoms
30
40,000 domains analysed
Issue 2
Incorrect user-agent
<value>
User-agent <value> is
case insensitive
FACT
This is ok:
Googlebot
googlebot
GOOGLEBOT
Bingbot
bingbot
This is a grey area:
Googlebotrandomtext
Google bot
goglebot
Google
Issue 3
Incorrect directives
<directives> are case
insensitive
FACT
This is ok:
allow:
ALLOW:
Allow:
disallow:
DISALLOW:
Disallow:
This ISN’T:
dissalow:
dissallow:
disallo:
disalow:
allw:
<directive> errors
All
18
40,000 domains analysed
Issue 4
Invalid <path>
format
URL <path> should start
with a /
FACT
This is ok:
Disallow: /checkout/
Disallow: /*?delivery_type
Disallow: *?delivery_type
This ISN’T:
Disallow: .js
Disallow: .css
Disallow: WebResource.axd
Disallow: ScriptResource.axd
Disallow: js/
Disallow: http://site.com/page
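The two lists above can be turned into a simple path check. This is a rough sketch: the function name path_problem and the wording of the messages are hypothetical, and the rule applied is just the one shown here, that a path should begin with / or a * wildcard.

```python
def path_problem(path):
    """Describe what is wrong with a rule's path, or return None if it looks valid."""
    path = path.strip()
    if path == "":
        return None  # an empty Disallow value is valid and blocks nothing
    if "://" in path:
        return "full URL instead of a path"
    if not path.startswith(("/", "*")):
        return "does not start with / or *"
    return None


for candidate in ["/checkout/", "*?delivery_type", ".js", "js/", "http://site.com/page"]:
    print(candidate, "->", path_problem(candidate))
```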
Incorrect <path>
Equal spread
across sectors
231
40,000 domains analysed
Brands using incorrect <path>
URL <path> IS case
sensitive
FACT
Additional takeaways
A specific user-agent
overrules a catchall
FACT
User-agent: *
Disallow: /checkout/
Disallow: /*?delivery_type
User-agent: *
Disallow: /checkout/
Disallow: /*?delivery_type
User-agent: googlebot
Disallow: /another-folder/
User-agent: *
Disallow: /checkout/
Disallow: /*?delivery_type
User-agent: googlebot
Disallow: /checkout/
Disallow: /*?delivery_type
Disallow: /another-folder/
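Why the catch-all rules have to be repeated in the googlebot group can be shown with a short sketch. The function name rules_for and the data layout are hypothetical, and matching the crawler name exactly is a simplification of how real crawlers pick their group.

```python
def rules_for(crawler, groups):
    """Return the paths a crawler obeys: its own group if one exists, else the catch-all."""
    crawler = crawler.lower()  # user-agent values are case insensitive
    matched = [rules for agents, rules in groups if crawler in agents]
    if not matched:
        # No specific group, so fall back to the catch-all
        matched = [rules for agents, rules in groups if "*" in agents]
    return [rule for rules in matched for rule in rules]


groups = [
    (["*"], ["/checkout/", "/*?delivery_type"]),
    (["googlebot"], ["/another-folder/"]),
]

print(rules_for("googlebot", groups))  # only the googlebot group applies
print(rules_for("bingbot", groups))    # falls back to the catch-all group
```

With this file, googlebot is only kept out of /another-folder/. The checkout and delivery rules do not apply to it unless they are copied into its group.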
The order of
<directives> doesn’t
matter for most bots
FACT
Specificity (length) of
the matching rule wins
FACT
https://example.com/page
disallow: /
allow: /p
Conflict?
Least restrictive WINS
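The two facts above, longest matching rule wins and ties go to the least restrictive rule, can be sketched in a few lines. This is an approximation for illustration, not Google's matcher: the function names are hypothetical, only the * wildcard is handled, and rule length is measured as the number of characters in the path.

```python
import re


def pattern_to_regex(pattern):
    """Turn a robots.txt path into a regex where * matches any run of characters."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def is_allowed(url_path, rules):
    """rules is a list of (directive, path) pairs, directive being 'allow' or 'disallow'."""
    best_len, allowed = -1, True  # no matching rule means the path may be crawled
    for directive, pattern in rules:
        # Matching is case sensitive and anchored at the start of the path
        if not pattern or not pattern_to_regex(pattern).match(url_path):
            continue
        is_allow = directive.lower() == "allow"
        longer = len(pattern) > best_len
        tie_goes_to_allow = len(pattern) == best_len and is_allow
        if longer or tie_goes_to_allow:
            best_len, allowed = len(pattern), is_allow
    return allowed


rules = [("disallow", "/"), ("allow", "/p")]
print(is_allowed("/page", rules))   # True: "/p" is longer than "/"
print(is_allowed("/other", rules))  # False: only "/" matches
```

Because paths are compared as written, /Checkout/ is not caught by a rule for /checkout/.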
You can group
user-agents together
FACT
User-agent: googlebot
Disallow: /checkout/
Disallow: /*?delivery_type
User-agent: bingbot
Disallow: /checkout/
Disallow: /*?delivery_type
User-agent: googlebot
User-agent: bingbot
Disallow: /checkout/
Disallow: /*?delivery_type
Summary
Google are pushing for REP to
become an Internet standard
We should all be pushing for a
best practice robots.txt
Avoid Google having to make
allowances for inaccuracies
Who knows…
they may suddenly stop
Get the basics right. Many big
brands aren’t.
Test.
Dig deeper.
Nail it.
Further reading/resources
Tools
Chrome Extension: Robots Exclusion Checker
samgipson.com/robots/
Ayima: Robots.txt Parser
ayima.com/robots/
Google’s Webmaster Robots.txt Testing Tool
google.com/webmasters/tools/robots-testing-tool
Google’s C++ robots.txt parser and matcher
github.com/google/robotstxt
Articles
ContentKing: Robots.txt for SEO
contentkingapp.com/academy/robotstxt/
Builtvisible: An SEO Guide to Robots.txt
builtvisible.com/wildcards-in-robots-txt/
Original Robots.txt Draft (1996)
robotstxt.org/norobots-rfc.txt
Google’s Robot Exclusion Protocol Draft (2019)
ietf.org/archive/id/draft-rep-wg-topic-00.txt
Thank you.
@samgipson
samgipson.com
