Mastering Robots.txt: A Step-by-Step Guide for Beginners Who Want to Control What Google Sees
Ever searched your site on Google and seen things like internal search pages, PDF files, or weird sorting URLs indexed?
If you said yes, your robots.txt file might be missing or misconfigured.
The good news? You don’t need to be a developer to fix it.
Let’s walk through how robots.txt works, why it matters for your SEO, and how you can configure it even if you’re not tech-savvy.
What Is Robots.txt (And Why Should You Care)?
robots.txt is a small text file in your website’s root directory. It gives instructions to search engine bots — what to crawl, and what to skip.
Common Use Cases of Robots.txt:
– Keeping bots out of admin, cart, and other private areas
– Blocking internal search results and filtered or sorted URLs
– Stopping crawlers from fetching file types like PDFs
– Pointing search engines to your XML sitemap
– Protecting crawl budget on larger sites
Robots.txt: The Language of Search Engines
Search engine crawlers (like Googlebot or Bingbot) read this file first when visiting your site.
They follow specific rules you define in it. Think of robots.txt as your site’s instruction manual for bots.
Step 1: Understanding “User-Agent”
Each bot has a unique name called a User-Agent. You can target all bots using * or give special instructions to one.
User-agent: * ← applies to all bots
User-agent: Googlebot ← applies to Google only
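One thing that trips people up: when a bot finds a group written specifically for it, it follows only that group and ignores the * rules. A small sketch with made-up folder names:
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
Here every other bot stays out of /private/, while Googlebot skips the * group entirely and is only kept out of /drafts/.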
Step 2: Disallow What You Don’t Want Indexed
Use the Disallow rule to tell bots NOT to crawl certain folders or pages.
Disallow: /wp-admin/
Disallow: /private-folder/
Disallow: /cart/
Want to block everything?
Disallow: /
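Keep in mind that Disallow matches URL prefixes, so a rule blocks everything that starts with that path. A couple of hypothetical examples:
Disallow: /private-folder/ ← blocks /private-folder/ and everything inside it
Disallow: /checkout ← blocks /checkout, /checkout/, and even /checkout-old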
Step 3: Allow Specific Pages (Even If Folder Is Blocked)
Let’s say you blocked a folder but want one file to still be indexed. Use Allow.
Disallow: /images/
Allow: /images/logo.png
This tells bots: don’t crawl the rest of the images folder, but do crawl the logo.
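The same trick works for whole subfolders, not just single files. A hypothetical example:
Disallow: /downloads/
Allow: /downloads/free-guides/
Google resolves conflicts by using the most specific (longest) matching rule, so anything under /downloads/free-guides/ stays crawlable while the rest of /downloads/ is blocked.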
Step 4: Set Crawl-Delay (If Needed)
Worried about bots slowing down your server? You can ask them to crawl slower.
Crawl-delay: 10
Note: Google ignores this directive, but Bing and some other crawlers respect it.
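Crawl-delay goes inside a User-agent group, so you can slow down one bot without touching the others. For example, to ask Bingbot to wait roughly 10 seconds between requests:
User-agent: Bingbot
Crawl-delay: 10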
Step 5: Add Your Sitemap
This helps search engines find your pages easily.
Sitemap: https://yourdomain.com/sitemap.xml
Add this at the bottom of your robots.txt file.
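If your site has more than one sitemap, you can list them all, each with its full URL (hypothetical example):
Sitemap: https://yourdomain.com/sitemap.xml
Sitemap: https://yourdomain.com/post-sitemap.xml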
Step 6: Understanding Robots.txt Syntax
Here’s a quick format refresher so you don’t get stuck:
User-agent: ← which bot the rules below apply to
Disallow: ← a path that bot should not crawl
Allow: ← an exception to a Disallow rule
Sitemap: ← the full URL of your XML sitemap
Crawl-delay: ← seconds to wait between requests (not every bot supports it)
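Putting those pieces together, a minimal file (with made-up paths) could look like this. Lines starting with # are comments that bots ignore:
# Example only, adjust the paths to your own site
User-agent: *
Disallow: /private-folder/
Allow: /private-folder/brochure.pdf
Sitemap: https://www.example.com/sitemap.xml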
Step 7: Place It in the Right Location
Your robots.txt file must live in the root of your domain.
Example:
https://www.example.com/robots.txt
If it’s placed elsewhere (like inside a folder), bots won’t see it.
Step 8: Managing Robots.txt in WordPress (With Rank Math)
If you use WordPress + Rank Math, you can edit robots.txt from your dashboard.
Go to: Rank Math → General Settings → Edit robots.txt
No FTP access. No code editor. Just plain text, editable inside WordPress.
Step 9: Use Safe Default Rules
Here’s a clean and safe starting point for most websites:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://www.example.com/sitemap.xml
This blocks the WordPress admin area, keeps admin-ajax.php crawlable (many themes and plugins rely on it), and points bots to your sitemap.
Step 10: Know the Structure of a Robots.txt File
Keep it organized and simple.
Example:
User-agent: *
Disallow: /private/
Disallow: /cart/
Allow: /public/image.png
Sitemap: https://yourdomain.com/sitemap.xml
Leave a blank line between different User-agent groups.
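For example (hypothetical rules), two separate groups look like this:
User-agent: *
Disallow: /cart/

User-agent: Bingbot
Disallow: /cart/
Crawl-delay: 10
The blank line makes it clear where one group ends and the next begins.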
Step 11: Block Filter & Sorting Pages (Common in eCommerce)
Filtered or sorted URLs create duplicate content. Use wildcards to block them:
Disallow: /*?sort=
Disallow: /*?filter=
The * matches any string of characters, so these rules block every URL that contains ?sort= or ?filter=, no matter which page it’s attached to.
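With those rules in place, URLs like these (made-up examples) are blocked while the clean pages stay crawlable:
yourdomain.com/shop/?sort=price-asc ← blocked
yourdomain.com/shop/?filter=red ← blocked
yourdomain.com/shop/ ← still crawlable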
Step 12: Block Internal Search Pages
These often have little SEO value and should be excluded.
Disallow: /?s=
This blocks URLs like: yourdomain.com/?s=seo
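Note that /?s= only covers search URLs at the root of your site. If your theme also produces search URLs under other paths, a wildcard version (use it carefully so you don’t catch legitimate pages) covers those too:
Disallow: /*?s=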
Step 13: Block Specific Paths on Your Website
Have a folder you don’t want indexed?
Disallow: /confidential/
Disallow: /test-folder/
Step 14: Block Specific File Types
Want to hide .pdf or .zip files?
Disallow: /*.pdf$
Disallow: /*.zip$
The $ means “ends here”, so only URLs that end exactly in .pdf or .zip are blocked.
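To see what the $ changes, compare these hypothetical URLs:
yourdomain.com/guides/pricing.pdf ← blocked (ends in .pdf)
yourdomain.com/guides/pricing.pdf?v=2 ← not blocked (the URL ends in ?v=2, not .pdf)
yourdomain.com/pdf-downloads/ ← not blocked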
Step 15: Allow Only One Bot (Block Others)
Want just Google to crawl your site? Here’s how:
User-agent: *
Disallow: /
User-agent: Googlebot
Disallow:
This tells every bot to stay out, except Googlebot: its empty Disallow line means it can crawl everything. Be careful, though, because this also locks out Bing, DuckDuckGo, and every other legitimate crawler.
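The reverse is just as easy. To block a single crawler (say, a hypothetical bot called ExampleBot) while everyone else crawls normally:
User-agent: ExampleBot
Disallow: /
If this is the only group in the file, every other bot is unrestricted, because no matching rules means full access by default.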
Step 16: Debug Robots.txt Like a Pro
If something isn’t working as expected, don’t panic.
Here’s how to debug:
– Open yourdomain.com/robots.txt in a browser and make sure the file actually loads
– Check for typos: the directives are User-agent, Disallow, Allow and Sitemap, and paths start with /
– Use the URL Inspection tool and the robots.txt report in Google Search Console to see whether a specific URL is blocked
– Remember that Google caches robots.txt, so changes can take up to a day to be picked up
– Keep in mind robots.txt only stops crawling; a page that is already indexed needs a noindex tag or a removal request instead
Final Thoughts
Your robots.txt file is small, but powerful.
✅ Done right, it protects your crawl budget, keeps bots away from low-value pages, and supports your SEO.
❌ Done wrong, it can block your whole site from Google.
Start simple. Test changes. And when in doubt — ask someone who’s done it before.
#TechnicalSEO #RobotsTxt #WordPressSEO #SearchEngineOptimization #SEOTips #SiteAudit #NoCodeSEO