Mastering Robots.txt: A Step-by-Step Guide for Beginners Who Want to Control What Google Sees

Ever searched for your site on Google and found internal search pages, PDF files, or weird sorting URLs in the results?

If you said yes, your robots.txt file might be missing or misconfigured.

The good news? You don’t need to be a developer to fix it.

Let’s walk through how robots.txt works, why it matters for your SEO, and how you can configure it even if you’re not tech-savvy.

For the official documentation, see Google’s guidelines on robots.txt.


What Is Robots.txt (And Why Should You Care)?

robots.txt is a small text file in your website’s root directory. It tells search engine bots what to crawl and what to skip. Strictly speaking, it controls crawling rather than indexing: a blocked URL can still show up in search results if other pages link to it, so use a noindex tag when a page must stay out of Google entirely.

Common Use Cases of Robots.txt:

  • Keep crawlers out of admin, login, or staging pages
  • Block duplicate pages like sorting/filtering URLs
  • Hide internal search result pages from Google
  • Save your crawl budget by excluding low-value pages
  • Keep crawlers away from PDFs, .zip files, or private folders
  • Direct search bots to your sitemap


Robots.txt: The Language of Search Engines

Search engine crawlers (like Googlebot or Bingbot) read this file first when visiting your site.

They follow specific rules you define in it. Think of robots.txt as your site’s instruction manual for bots.


Step 1: Understanding “User-Agent”

Each bot has a unique name called a User-Agent. You can target all bots using * or give special instructions to one.

User-agent: *         ← applies to all bots  
User-agent: Googlebot ← applies to Google only  
        

Step 2: Disallow What You Don’t Want Crawled

Use the Disallow rule to tell bots NOT to crawl certain folders or pages.

Disallow: /wp-admin/
Disallow: /private-folder/
Disallow: /cart/
        

Want to block everything? (Careful: this single line keeps bots away from your entire site.)

Disallow: /
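
Before you publish rules like these, you can sanity-check them with Python’s built-in urllib.robotparser, which reads a robots.txt the way a well-behaved bot would. A minimal sketch using the three folder rules from earlier in this step (the test paths are made-up examples):

from urllib.robotparser import RobotFileParser

# Parse the rules directly, without fetching anything.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /wp-admin/",
    "Disallow: /private-folder/",
    "Disallow: /cart/",
])

# can_fetch() answers: may this user-agent crawl this path?
print(rp.can_fetch("*", "/cart/checkout"))       # False -> blocked
print(rp.can_fetch("*", "/products/red-shoes"))  # True  -> still crawlable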
        

Step 3: Allow Specific Pages (Even If Folder Is Blocked)

Let’s say you blocked a folder but want one file inside it to stay crawlable. Use Allow.

Disallow: /images/
Allow: /images/logo.png
        

This tells bots: don’t crawl the rest of the images folder, but do crawl the logo. Google follows the most specific matching rule, so the longer Allow path wins over the shorter Disallow.
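
You can see this behavior for yourself with the same standard-library parser. One caveat: Python’s parser applies rules in file order (first match wins) rather than Google’s most-specific-match logic, so for this quick check the Allow line goes first. A sketch with placeholder file names:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /images/logo.png",   # listed first: the stdlib parser stops at the first matching rule
    "Disallow: /images/",
])

print(rp.can_fetch("*", "/images/logo.png"))    # True  -> the exception applies
print(rp.can_fetch("*", "/images/banner.jpg"))  # False -> the rest of the folder stays blocked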


Step 4: Set Crawl-Delay (If Needed)

Worried about bots slowing down your server? You can ask them to crawl slower.

Crawl-delay: 10
        

Note: Googlebot ignores Crawl-delay, but other crawlers such as Bing and Yandex have historically honored it.


Step 5: Add Your Sitemap

This helps search engines find your pages easily.

Sitemap: https://yourdomain.com/sitemap.xml
        

Add this at the bottom of your robots.txt file.
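
If you’re curious how a crawler actually reads the Crawl-delay and Sitemap lines, Python’s urllib.robotparser exposes both. A small sketch (the domain is a placeholder):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 10",
    "Disallow: /cart/",
    "Sitemap: https://yourdomain.com/sitemap.xml",
])

print(rp.crawl_delay("*"))  # 10 -> seconds between requests, for bots that honor it
print(rp.site_maps())       # ['https://yourdomain.com/sitemap.xml']  (Python 3.8+)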


Step 6: Understanding Robots.txt Syntax

Here’s a quick format refresher so you don’t get stuck:

  • User-agent: — which bot the rule is for
  • Disallow: — block access to a folder or file
  • Allow: — override disallow rules if needed
  • Crawl-delay: — slow down crawling (optional)
  • Sitemap: — point to your sitemap URL


Step 7: Place It in the Right Location

Your robots.txt file must live in the root of your domain.

Example:

https://www.example.com/robots.txt
        

If it’s placed elsewhere (like inside a folder), bots won’t see it.
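
A quick way to confirm the file really sits at the root is to request it and look for a 200 response. A sketch using Python’s standard library (swap in your own domain):

import urllib.error
import urllib.request

url = "https://www.example.com/robots.txt"  # placeholder domain

try:
    with urllib.request.urlopen(url, timeout=10) as response:
        print(response.status)                                        # 200 means crawlers can fetch it
        print(response.read(300).decode("utf-8", errors="replace"))   # first few lines of the file
except urllib.error.HTTPError as err:
    print(f"Not found at the root: HTTP {err.code}")
except urllib.error.URLError as err:
    print(f"Could not reach the site: {err.reason}")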


Step 8: Managing Robots.txt in WordPress (With Rank Math)

If you use WordPress + Rank Math, you can edit robots.txt from your dashboard.

Go to: Rank Math → General Settings → Edit robots.txt

No FTP access. No code editor. Just plain text, editable inside WordPress.


Step 9: Use Safe Default Rules

Here’s a clean and safe starting point for most websites:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://www.example.com/sitemap.xml
        

This blocks the admin area, keeps admin-ajax.php reachable (many themes and plugins call it on the front end), and points bots straight to your sitemap.


Step 10: Know the Structure of a Robots.txt File

Keep it organized and simple.

Example:

User-agent: *
Disallow: /private/
Disallow: /cart/
Allow: /public/image.png
Sitemap: https://yourdomain.com/sitemap.xml
        

Leave a blank line between different User-agent groups.


Step 11: Block Filter & Sorting Pages (Common in eCommerce)

Filtered or sorted URLs create duplicate content. Use wildcards to block them:

Disallow: /*?sort=
Disallow: /*?filter=
        

The * matches any run of characters, and the ? here is just the literal question mark that starts the query string, so every sorted or filtered version of a URL gets caught.
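
If it helps to see the matching logic spelled out, Google treats * as “any run of characters,” so /*?sort= behaves roughly like the regular expression in this sketch (just an illustration of the matching, not how any crawler is implemented):

import re

# /*?sort=  ->  "/" then anything, then a literal "?sort="
pattern = re.compile(r"/.*\?sort=")

print(bool(pattern.match("/shoes?sort=price_asc")))  # True  -> blocked
print(bool(pattern.match("/shoes?filter=red")))      # False -> not caught by this particular rule
print(bool(pattern.match("/shoes")))                 # False -> normal page stays crawlable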


Step 12: Block Internal Search Pages

These often have little SEO value and should be excluded.

Disallow: /?s=
        

This blocks URLs like: yourdomain.com/?s=seo
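
Here’s the same kind of standard-library spot-check, confirming that the rule catches internal search URLs but leaves normal pages alone (domain is a placeholder):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /?s="])

print(rp.can_fetch("*", "https://yourdomain.com/?s=seo"))  # False -> search results blocked
print(rp.can_fetch("*", "https://yourdomain.com/blog/"))   # True  -> normal pages unaffected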


Step 13: Block Specific Paths on Your Website

Have a folder you don’t want indexed?

Disallow: /confidential/
Disallow: /test-folder/
        

Step 14: Block Specific File Types

Want to hide .pdf or .zip files?

Disallow: /*.pdf$
Disallow: /*.zip$
        

The $ anchors the rule to the end of the URL, so only addresses that actually end in .pdf or .zip are blocked (a URL like /file.pdf?page=2 would slip through).
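
Continuing the regex analogy from Step 11, the $ acts like an end-of-string anchor. Again, this sketch only illustrates the matching rules:

import re

# /*.pdf$  ->  "/" then anything, then a literal ".pdf" at the very end of the URL
pattern = re.compile(r"/.*\.pdf$")

print(bool(pattern.match("/guides/manual.pdf")))         # True  -> blocked
print(bool(pattern.match("/guides/manual.pdf?page=2")))  # False -> query-string version slips through
print(bool(pattern.match("/guides/manual.html")))        # False -> other file types unaffected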


Step 15: Allow Only One Bot (Block Others)

Want just Google to crawl your site? Here’s how:

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
        

An empty Disallow line means "nothing is blocked", so this setup tells every bot to stay out except Googlebot.
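
Here’s how that pair of groups plays out when different bots ask for permission, checked once more with urllib.robotparser (the page path is a placeholder):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
    "",
    "User-agent: Googlebot",
    "Disallow:",
])

print(rp.can_fetch("Googlebot", "/any-page/"))  # True  -> Google may crawl everything
print(rp.can_fetch("Bingbot", "/any-page/"))    # False -> everyone else is shut out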


Step 16: Debug Robots.txt Like a Pro

If something isn’t working as expected, don’t panic.

Here’s how to debug (a scripted spot-check follows the list):

  • Use the robots.txt report in Google Search Console
  • Use Ahrefs, Screaming Frog, or SEO Site Checkup
  • Manually visit yourdomain.com/robots.txt
  • Double-check for typos, missing slashes, or accidental blocks
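
And if you’d rather script the spot-check, here’s a small sketch that fetches your live robots.txt and tests a handful of important URLs. The domain and URL list are placeholders, and Python’s parser is stricter about Allow/Disallow order than Google, so treat it as a sanity check rather than a perfect Googlebot simulation:

from urllib.robotparser import RobotFileParser

# Placeholder domain and URLs; swap in your own.
rp = RobotFileParser("https://www.example.com/robots.txt")
rp.read()  # fetches and parses the live file

important_urls = [
    "https://www.example.com/",
    "https://www.example.com/blog/",
    "https://www.example.com/wp-admin/",
    "https://www.example.com/?s=test",
]

for url in important_urls:
    verdict = "crawlable" if rp.can_fetch("Googlebot", url) else "BLOCKED"
    print(f"{verdict:9}  {url}")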


Final Thoughts

Your robots.txt file is small, but powerful.

✅ Done right, it protects your crawl budget, hides useless pages, and improves SEO. ❌ Done wrong, it can block your whole site from Google.

Start simple. Test changes. And when in doubt — ask someone who’s done it before.

#TechnicalSEO #RobotsTxt #WordPressSEO #SearchEngineOptimization #SEOTips #SiteAudit #NoCodeSEO


