Mastering Robots.txt: A Step-by-Step Guide for Beginners Who Want to Control What Google Sees

Ever searched for your site on Google and found internal search pages, PDF files, or weird sorting URLs in the results?

If you said yes, your robots.txt file might be missing or misconfigured.

The good news? You don’t need to be a developer to fix it.

Let’s walk through how robots.txt works, why it matters for your SEO, and how you can configure it even if you’re not tech-savvy.

For the official documentation, see Google’s guidelines on robots.txt.


What Is Robots.txt (And Why Should You Care)?

robots.txt is a small text file in your website’s root directory. It tells search engine bots what to crawl and what to skip. Strictly speaking, it controls crawling rather than indexing: a blocked URL can still show up in search results if other pages link to it, so use a noindex tag when a page must stay out of Google entirely.

Common Use Cases of Robots.txt:

  • Keep crawlers out of admin, login, or staging pages
  • Block duplicate pages like sorting/filtering URLs
  • Hide internal search result pages from Google
  • Save your crawl budget by excluding low-value pages
  • Keep crawlers away from PDFs, .zip files, or private folders
  • Direct search bots to your sitemap


Robots.txt: The Language of Search Engines

Search engine crawlers (like Googlebot or Bingbot) read this file first when visiting your site.

They follow specific rules you define in it. Think of robots.txt as your site’s instruction manual for bots.


Step 1: Understanding “User-Agent”

Each bot has a unique name called a User-Agent. You can target all bots using * or give special instructions to one.

User-agent: *         ← applies to all bots  
User-agent: Googlebot ← applies to Google only  
        

Step 2: Disallow What You Don’t Want Crawled

Use the Disallow rule to tell bots NOT to crawl certain folders or pages.

Disallow: /wp-admin/
Disallow: /private-folder/
Disallow: /cart/
        

Want to block everything? (Careful: this single line keeps bots away from your entire site.)

Disallow: /
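
Before you publish rules like these, you can sanity-check them with Python’s built-in urllib.robotparser, which reads a robots.txt the way a well-behaved bot would. A minimal sketch using the three folder rules from earlier in this step (the test paths are made-up examples):

from urllib.robotparser import RobotFileParser

# Parse the rules directly, without fetching anything.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /wp-admin/",
    "Disallow: /private-folder/",
    "Disallow: /cart/",
])

# can_fetch() answers: may this user-agent crawl this path?
print(rp.can_fetch("*", "/cart/checkout"))       # False -> blocked
print(rp.can_fetch("*", "/products/red-shoes"))  # True  -> still crawlable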
        

Step 3: Allow Specific Pages (Even If Folder Is Blocked)

Let’s say you blocked a folder but want one file inside it to stay crawlable. Use Allow.

Disallow: /images/
Allow: /images/logo.png
        

This tells bots: don’t crawl the rest of the images folder, but do crawl the logo. Google follows the most specific matching rule, so the longer Allow path wins over the shorter Disallow.
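
You can see this behavior for yourself with the same standard-library parser. One caveat: Python’s parser applies rules in file order (first match wins) rather than Google’s most-specific-match logic, so for this quick check the Allow line goes first. A sketch with placeholder file names:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /images/logo.png",   # listed first: the stdlib parser stops at the first matching rule
    "Disallow: /images/",
])

print(rp.can_fetch("*", "/images/logo.png"))    # True  -> the exception applies
print(rp.can_fetch("*", "/images/banner.jpg"))  # False -> the rest of the folder stays blocked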


Step 4: Set Crawl-Delay (If Needed)

Worried about bots slowing down your server? You can ask them to crawl slower.

Crawl-delay: 10
        

Note: Googlebot ignores Crawl-delay, but other crawlers such as Bing and Yandex have historically honored it.


Step 5: Add Your Sitemap

This helps search engines find your pages easily.

Sitemap: https://yourdomain.com/sitemap.xml
        

Add this at the bottom of your robots.txt file.
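
If you’re curious how a crawler actually reads the Crawl-delay and Sitemap lines, Python’s urllib.robotparser exposes both. A small sketch (the domain is a placeholder):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 10",
    "Disallow: /cart/",
    "Sitemap: https://yourdomain.com/sitemap.xml",
])

print(rp.crawl_delay("*"))  # 10 -> seconds between requests, for bots that honor it
print(rp.site_maps())       # ['https://yourdomain.com/sitemap.xml']  (Python 3.8+)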


Step 6: Understanding Robots.txt Syntax

Here’s a quick format refresher so you don’t get stuck:

  • User-agent: — which bot the rule is for
  • Disallow: — block access to a folder or file
  • Allow: — override disallow rules if needed
  • Crawl-delay: — slow down crawling (optional)
  • Sitemap: — point to your sitemap URL


Step 7: Place It in the Right Location

Your robots.txt file must live in the root of your domain.

Example:

https://www.example.com/robots.txt
        

If it’s placed elsewhere (like inside a folder), bots won’t see it.
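
A quick way to confirm the file really sits at the root is to request it and look for a 200 response. A sketch using Python’s standard library (swap in your own domain):

import urllib.error
import urllib.request

url = "https://www.example.com/robots.txt"  # placeholder domain

try:
    with urllib.request.urlopen(url, timeout=10) as response:
        print(response.status)                                        # 200 means crawlers can fetch it
        print(response.read(300).decode("utf-8", errors="replace"))   # first few lines of the file
except urllib.error.HTTPError as err:
    print(f"Not found at the root: HTTP {err.code}")
except urllib.error.URLError as err:
    print(f"Could not reach the site: {err.reason}")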


Step 8: Managing Robots.txt in WordPress (With Rank Math)

If you use WordPress + Rank Math, you can edit robots.txt from your dashboard.

Go to: Rank Math → General Settings → Edit robots.txt

No FTP access. No code editor. Just plain text, editable inside WordPress.


Step 9: Use Safe Default Rules

Here’s a clean and safe starting point for most websites:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://www.example.com/sitemap.xml
        

This blocks the admin area, keeps admin-ajax.php reachable (many themes and plugins call it on the front end), and points bots straight to your sitemap.


Step 10: Know the Structure of a Robots.txt File

Keep it organized and simple.

Example:

User-agent: *
Disallow: /private/
Disallow: /cart/
Allow: /public/image.png
Sitemap: https://yourdomain.com/sitemap.xml
        

Leave a blank line between different User-agent groups.


Step 11: Block Filter & Sorting Pages (Common in eCommerce)

Filtered or sorted URLs create duplicate content. Use wildcards to block them:

Disallow: /*?sort=
Disallow: /*?filter=
        

The * matches any run of characters, and the ? here is just the literal question mark that starts the query string, so every sorted or filtered version of a URL gets caught.
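
If it helps to see the matching logic spelled out, Google treats * as “any run of characters,” so /*?sort= behaves roughly like the regular expression in this sketch (just an illustration of the matching, not how any crawler is implemented):

import re

# /*?sort=  ->  "/" then anything, then a literal "?sort="
pattern = re.compile(r"/.*\?sort=")

print(bool(pattern.match("/shoes?sort=price_asc")))  # True  -> blocked
print(bool(pattern.match("/shoes?filter=red")))      # False -> not caught by this particular rule
print(bool(pattern.match("/shoes")))                 # False -> normal page stays crawlable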


Step 12: Block Internal Search Pages

These often have little SEO value and should be excluded.

Disallow: /?s=
        

This blocks URLs like: yourdomain.com/?s=seo
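
Here’s the same kind of standard-library spot-check, confirming that the rule catches internal search URLs but leaves normal pages alone (domain is a placeholder):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /?s="])

print(rp.can_fetch("*", "https://yourdomain.com/?s=seo"))  # False -> search results blocked
print(rp.can_fetch("*", "https://yourdomain.com/blog/"))   # True  -> normal pages unaffected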


Step 13: Block Specific Paths on Your Website

Have a folder you don’t want indexed?

Disallow: /confidential/
Disallow: /test-folder/
        

Step 14: Block Specific File Types

Want to hide .pdf or .zip files?

Disallow: /*.pdf$
Disallow: /*.zip$
        

The $ anchors the rule to the end of the URL, so only addresses that actually end in .pdf or .zip are blocked (a URL like /file.pdf?page=2 would slip through).
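
Continuing the regex analogy from Step 11, the $ acts like an end-of-string anchor. Again, this sketch only illustrates the matching rules:

import re

# /*.pdf$  ->  "/" then anything, then a literal ".pdf" at the very end of the URL
pattern = re.compile(r"/.*\.pdf$")

print(bool(pattern.match("/guides/manual.pdf")))         # True  -> blocked
print(bool(pattern.match("/guides/manual.pdf?page=2")))  # False -> query-string version slips through
print(bool(pattern.match("/guides/manual.html")))        # False -> other file types unaffected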


Step 15: Allow Only One Bot (Block Others)

Want just Google to crawl your site? Here’s how:

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
        

An empty Disallow line means "nothing is blocked", so this setup tells every bot to stay out except Googlebot.
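
Here’s how that pair of groups plays out when different bots ask for permission, checked once more with urllib.robotparser (the page path is a placeholder):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
    "",
    "User-agent: Googlebot",
    "Disallow:",
])

print(rp.can_fetch("Googlebot", "/any-page/"))  # True  -> Google may crawl everything
print(rp.can_fetch("Bingbot", "/any-page/"))    # False -> everyone else is shut out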


Step 16: Debug Robots.txt Like a Pro

If something isn’t working as expected, don’t panic.

Here’s how to debug (a scripted spot-check follows the list):

  • Use the robots.txt report in Google Search Console
  • Use Ahrefs, Screaming Frog, or SEO Site Checkup
  • Manually visit yourdomain.com/robots.txt
  • Double-check for typos, missing slashes, or accidental blocks
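
And if you’d rather script the spot-check, here’s a small sketch that fetches your live robots.txt and tests a handful of important URLs. The domain and URL list are placeholders, and Python’s parser is stricter about Allow/Disallow order than Google, so treat it as a sanity check rather than a perfect Googlebot simulation:

from urllib.robotparser import RobotFileParser

# Placeholder domain and URLs; swap in your own.
rp = RobotFileParser("https://www.example.com/robots.txt")
rp.read()  # fetches and parses the live file

important_urls = [
    "https://www.example.com/",
    "https://www.example.com/blog/",
    "https://www.example.com/wp-admin/",
    "https://www.example.com/?s=test",
]

for url in important_urls:
    verdict = "crawlable" if rp.can_fetch("Googlebot", url) else "BLOCKED"
    print(f"{verdict:9}  {url}")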


Final Thoughts

Your robots.txt file is small, but powerful.

✅ Done right, it protects your crawl budget, hides useless pages, and improves SEO. ❌ Done wrong, it can block your whole site from Google.

Start simple. Test changes. And when in doubt — ask someone who’s done it before.

#TechnicalSEO #RobotsTxt #WordPressSEO #SearchEngineOptimization #SEOTips #SiteAudit #NoCodeSEO


