Optimizing Your WordPress Robots.txt File

Many WordPress site owners ignore the robots.txt file, often because they stick with the default settings. Yet this small file helps search engines crawl your site more efficiently: it preserves your crawl budget and keeps low-value pages from eating up crawler attention from search engines like Google and Bing.

In this guide, we’ll discuss what to include in a WordPress robots.txt file and what to avoid. This will help you keep your site lean, crawl-friendly, and optimized for search engines.

What Is the Robots.txt File?

The robots.txt file is a simple text file located in the root directory of your domain or subdomain. It tells search engine crawlers which pages or sections of your site they’re allowed to crawl and index.

WordPress automatically creates a virtual robots.txt file. You can find it at yoursite.com/robots.txt. This version doesn’t exist on your server. It’s dynamically created and includes only basic rules and instructions.

If you want more control, you can upload a custom file to your site’s root directory with an FTP tool, or use a plugin that provides a file editor (many SEO plugins do) to manage it directly from your WordPress dashboard.
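If you prefer to check the live file from a script rather than a browser, here is a quick sketch using Python’s standard library (substitute your own domain for example.com):

import urllib.request

# Fetch and print whatever robots.txt the site currently serves,
# whether it is WordPress's virtual file or a custom one you uploaded.
with urllib.request.urlopen("https://example.com/robots.txt") as response:
    print(response.read().decode("utf-8"))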

Why the Default Robots.txt File Is Not Enough.

The default file looks like this:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

This setup is safe: it keeps crawlers out of your admin backend while letting Ajax requests work properly. However, it doesn’t take advantage of optimization opportunities or reduce crawl waste.

If Google’s index contains too many low-value pages from your site, or your crawl stats show wasted requests, it’s time to upgrade your robots.txt.

Always Include XML Sitemaps in Your Robots.txt File.

Help search engines find and understand your content by referencing your XML sitemaps in your robots.txt file:

Sitemap: https://rankwiseseo.com/sitemaps.xml
Sitemap: https://rankwiseseo.com/post-sitemap1.xml
Sitemap: https://rankwiseseo.com/page-sitemap1.xml

Add all important sitemap files here. This is key if you have separate sitemaps for pages, posts, products, or categories.

Items You Should Never Block in Robots.txt.

Some outdated advice suggests blocking directories like /wp-includes/ or /wp-content/plugins/. Don’t do this. Blocking CSS, JavaScript, or media files can prevent Google from rendering your pages properly.

Avoid blocking the following URL paths:

  • /wp-includes/

  • /wp-content/uploads/

  • /wp-content/themes/

  • Any CSS or JavaScript file paths

Let crawlers access the resources they need. This helps them render your pages correctly.

Handling Staging Sites the Right Way.

Disallow crawling for staging and development environments entirely by adding a blanket rule to the robots.txt file of the staging or development site.

For example, if your staging site lives at https://staging.example.com, the following directives should be served at https://staging.example.com/robots.txt:

User-agent: *
Disallow: /

Also, use a noindex meta tag as a backup. If you’re using WordPress, check the box that says “Discourage search engines from indexing this site” in Settings > Reading.

Rankwise PRO Tip: When you go live, remove this block and verify that your production site is crawlable.

Clean Up Non-Essential Paths.

Certain paths have no SEO value and only waste crawl budget. Consider blocking the following:

Disallow: /trackback/

Disallow: /comments/feed/

Disallow: */embed/

Disallow: /cgi-bin/

Disallow: /wp-login.php

This removes junk URLs from your crawl reports and focuses Google’s attention on your actual content.

Control Query Parameters.

URL parameters from plugins or tracking links can flood your crawl stats. Block known offenders to reduce crawl waste:

Disallow: /*?*replytocom=

Disallow: /*?*print=

Disallow: /*?*utm_source=

Disallow: /*?*fbclid=

Google retired its URL Parameters tool, so lean on the Crawl Stats report in Search Console and your server logs to spot other parameters worth disallowing.
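Major search engines honor the * wildcard, but it’s easy to write a pattern that matches more (or less) than you intend. The rough Python sketch below approximates Google-style matching so you can sanity-check a rule before deploying it; the rule_matches helper and the sample URLs are illustrative assumptions, not part of any official library:

import re

def rule_matches(pattern: str, path: str) -> bool:
    # Translate a robots.txt path pattern into a regular expression:
    # '*' matches any run of characters, '$' anchors the end of the URL.
    regex = ""
    for char in pattern:
        if char == "*":
            regex += ".*"
        elif char == "$":
            regex += "$"
        else:
            regex += re.escape(char)
    return re.match(regex, path) is not None

# Hypothetical URLs to confirm the rules above behave as intended.
print(rule_matches("/*?*utm_source=", "/blog/post/?utm_source=newsletter"))  # True (blocked)
print(rule_matches("/*?*replytocom=", "/some-post/?replytocom=42"))          # True (blocked)
print(rule_matches("/*?*utm_source=", "/blog/post/"))                        # False (still crawlable)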

Trim Down Low-Value Pages

Many WordPress sites auto-generate archive and tag pages that don’t provide much value. Consider blocking these pages:

Disallow: /tag/

Disallow: /?s=

Disallow: /page/

However, if your strategy relies on tag pages for internal linking or discovery, leave them open, but manage them intentionally.
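Pulled together, the cleanup rules from the last few sections might look like this as a single file. This is only a sketch: the sitemap URL reuses this article’s example domain, and you should adapt every path to your own site (for instance, drop the /tag/ rule if tag pages are part of your strategy).

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /trackback/
Disallow: /comments/feed/
Disallow: */embed/
Disallow: /cgi-bin/
Disallow: /wp-login.php
Disallow: /*?*replytocom=
Disallow: /*?*print=
Disallow: /*?*utm_source=
Disallow: /*?*fbclid=
Disallow: /tag/
Disallow: /?s=

Sitemap: https://rankwiseseo.com/sitemaps.xml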

Use Rankwise for Robots.txt Optimization.

Most site owners aren’t sure what to block or allow, so that’s where our team at Rankwise comes in. Our SEO agency looks at your crawl data, site structure, and search performance. Then, we create a custom robots.txt file that fits your business goals.

We go beyond basic directives. We check your whole SEO setup. We ensure your robots.txt helps with faster indexing, improved rankings, and fewer crawl errors.

Need help? Talk to our technical SEO team about improving your crawl control today.

Monitor and Validate Robots.txt Performance.

After setting up your custom robots.txt file, track its impact:

  • Use Google Search Console > Settings > Crawl Stats.

  • Test blocked URLs with the URL Inspection Tool.

  • Check if important pages are missing from Google’s index.

Also, review your sitemap reports and ensure they reflect only index-worthy URLs.

Simulate Changes with Crawl Tools.

Tools like Screaming Frog SEO Spider help you test robots.txt changes locally before you deploy them. This allows you to validate your disallow rules and ensure they won’t block something critical.

You can also simulate user agents, crawl depths, and JavaScript rendering to mirror how bots crawl your site.
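If you’d rather script a quick check than run a full crawl, Python’s built-in robots.txt parser can test a draft file before you upload it. A minimal sketch follows; the robots-draft.txt filename and the test URLs are placeholders, and note that the standard-library parser uses simple prefix matching rather than Google-style * and $ wildcards:

from urllib.robotparser import RobotFileParser

# Parse a local draft so nothing has to be deployed yet.
parser = RobotFileParser()
with open("robots-draft.txt", encoding="utf-8") as draft:
    parser.parse(draft.read().splitlines())

# Spot-check a few URLs: critical pages should stay fetchable,
# junk paths should come back blocked.
print(parser.can_fetch("Googlebot", "https://example.com/"))              # expect True
print(parser.can_fetch("Googlebot", "https://example.com/sample-post/"))  # expect True
print(parser.can_fetch("Googlebot", "https://example.com/wp-login.php"))  # expect False
print(parser.can_fetch("Googlebot", "https://example.com/wp-admin/"))     # expect False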

Block AI Crawlers and Non-Compliant Bots.

Some crawlers harvest your content or cause load issues without offering value. Add these blocks to limit bad actors:

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

Keep this list updated as new crawlers emerge. Blocking resource-intensive bots helps preserve server resources, but remember that robots.txt is only honored by well-behaved crawlers; bots that ignore it must be blocked at the server or firewall level.

Don’t Forget: Update Robots.txt with Every Major Website Change.

Anytime you:

  • Launch a new content section.

  • Redesign your structure.

  • Migrate to a new domain.

…you should review and update your robots.txt. It’s not a one-and-done file.

Call on Rankwise for SEO Cleanup.

At Rankwise, we help many WordPress site owners fix crawl structures, reduce bloat, and boost indexing.

Need help with your crawl stats? Want to trim unneeded content from Google’s index? Or do you need to set up a robots.txt correctly? Our expert SEO team is ready.

Please book your free consult and let’s tighten up your SEO.

Add a Layer of Protection With HTTP Auth on Staging Sites.

Besides Disallow: /, consider using HTTP authentication for staging environments. This ensures that no bots (or people) can access the environment unless you want them to.

It’s another safety net if a staging site accidentally gets linked.
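On an Apache host, a minimal sketch of what that could look like in the staging site’s .htaccess (the AuthUserFile path and realm name below are placeholders; nginx and managed hosts have their own equivalents):

AuthType Basic
AuthName "Staging - authorized users only"
AuthUserFile /path/to/.htpasswd
Require valid-user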

Use Robots Meta Tags in Tandem with Robots.txt.

Sometimes, robots.txt isn’t enough. Supplement it with noindex, nofollow, and other robots meta directives placed directly in your pages’ HTML to control what gets indexed by search engines like Google and Bing.

Robots.txt is for crawling, not indexing. Meta tags handle what appears in the index.

If you have pages that you don’t want to appear in Google search results, add a noindex meta tag. This tells search engines like Google and Bing not to show the page in search results even if their crawlers find it. Keep in mind that the page must remain crawlable for the tag to be seen; if robots.txt blocks the URL, crawlers never read the noindex directive.
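For example, a page you want crawlable but excluded from search results would carry a tag like this in its HTML head; most SEO plugins can set it per page without template edits:

<meta name="robots" content="noindex">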

Rankwise Can Help Beyond Robots.txt.

If crawl control is just one piece of the puzzle, we also offer full-site audits, technical SEO fixes, and content pruning strategies.

Our goal? To make your site faster, smarter, and easier to understand for Google and more profitable for you.

Let’s talk. Contact Rankwise for a full website SEO review.

Final Thoughts

Your robots.txt file is more powerful than you think. With just a few lines, you can reduce index bloat, prioritize your most valuable pages, and make Google’s job easier.

But don’t treat it like a static asset. Review and refine it as your site evolves.

And when in doubt, don’t guess. Work with the SEO experts at Rankwise to ensure your robots.txt file and your entire SEO strategy stay ahead of the curve.

Author

Noah Adam

Noah is an experienced SEO consultant and the founder of Rankwise SEO, a premier SEO agency based in Orange County, California. With over a decade of hands-on experience in technical SEO, content strategy, and organic growth, Noah has helped law firms, tech startups, government agencies, and e-commerce brands boost their search visibility and dominate Google rankings. When he’s not optimizing websites, he’s testing new SEO tactics, decoding algorithm updates, or mentoring businesses on how to grow through smart, ethical SEO.