Robots.txt Validator

Validate your robots.txt file for syntax errors, warnings, and SEO issues. Paste content or fetch from any domain.


Test a URL

Enter a path and select a user-agent to check if it would be blocked or allowed.

How to use the Robots.txt Validator

robots.txt syntax has subtleties — wildcards, end-anchors, conflicting allow/disallow rules. The validator simulates a crawler request and tells you whether a specific URL would be allowed or blocked.
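For illustration, here is a small file (hypothetical paths) that uses all three features:

  User-agent: *
  Disallow: /tmp            # prefix match: blocks /tmp, /tmp/ and /tmp-old
  Disallow: /*.xls$         # * matches any characters, $ anchors the match to the end of the URL
  Disallow: /docs/
  Allow: /docs/public/      # conflicts with the rule above; the longer pattern wins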

1. Paste your robots.txt content

Either paste the full robots.txt text or fetch from a live URL. The validator parses it the way Google's crawler does.

2. Enter test URLs

Add the URLs you want to verify — admin paths you intend to block, public paths you intend to allow, and any edge cases (URLs with query strings, trailing slashes).
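For instance, under these hypothetical rules, small differences in the URL change the verdict:

  User-agent: *
  Disallow: /search        # prefix match: also blocks /search?q=shoes and /search/results
  Disallow: /private/      # blocks /private/ and anything deeper, but not /private itself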

3. Pick the user-agent

Test as Googlebot, Bingbot, AhrefsBot, GPTBot, etc. User-agent-specific rules override the wildcard * block, so per-bot testing reveals conflicts.
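For illustration (hypothetical paths), a bot that has its own group ignores the * group entirely:

  User-agent: *
  Disallow: /drafts/

  User-agent: Googlebot
  Disallow: /internal/
  # Googlebot follows only its own group: /internal/ is blocked for it, but /drafts/ is not.
  # Every other bot still matches the * group and is blocked from /drafts/.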

4. Review the verdict per URL

Each URL shows ALLOWED or BLOCKED, plus the specific rule that decided it. If the result differs from your intent, fix the rule before deploying.

Why validating robots.txt before deploy is non-negotiable

A misconfigured robots.txt can deindex your entire site in 48 hours. The fix is fast (one file edit) but the recovery — getting Google to re-crawl and re-index — takes weeks. Always validate before deploy.
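The classic failure mode is a site-wide block, often left over from a staging environment, shipping to production:

  User-agent: *
  Disallow: /
  # A single "/" blocks every URL on the site for every crawler that obeys robots.txt.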

Common robots.txt syntax mistakes

How Google interprets conflicts

When a URL matches both an Allow and a Disallow rule, Google applies the rule with the longer pattern, not the one that appears first in the file. Disallow: /products/ plus Allow: /products/featured/ means /products/featured/ is allowed, because /products/featured/ is the longer pattern. Test this case explicitly; it is easy to misjudge by hand.
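In robots.txt form, using the paths from the example above:

  User-agent: *
  Disallow: /products/
  Allow: /products/featured/
  # /products/featured/summer-sale  -> allowed  (the Allow pattern is longer)
  # /products/clearance-items       -> blocked  (only the Disallow pattern matches)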

Frequently asked questions

What does a robots.txt validator do?

It simulates a web crawler reading your robots.txt and tells you which URLs would be allowed or blocked under the rules. This catches misconfigurations before they affect Google's crawl — wildcard mistakes, path mismatches, allow/disallow conflicts.

Why is my page blocked when I didn't intend to?

The most common causes: a parent directory disallow (Disallow: /api/ blocks /api/public/), a wildcard rule matching unintentionally (Disallow: /*.json$ may block files you wanted indexed), or a User-agent-specific rule overriding the wildcard. The validator points to the exact rule that decided each verdict.
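All three causes in one hypothetical file:

  User-agent: *
  Disallow: /api/           # also blocks /api/public/ and everything deeper
  Disallow: /*.json$        # blocks every URL ending in .json

  User-agent: Googlebot
  Disallow: /beta/          # Googlebot uses only this group, so the * rules above stop applying to it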

How does Google handle conflicts in robots.txt?

When a URL matches both an Allow and a Disallow rule, Google picks the rule with the longer pattern. So Disallow: /admin/ with Allow: /admin/public/ means /admin/public/ is allowed (longer pattern). Order in the file doesn't matter.

Can I block specific user agents?

Yes. Add a separate User-agent: block per bot you want different rules for. User-agent: GPTBot followed by Disallow: / blocks ChatGPT's crawler. The wildcard User-agent: * applies to bots not explicitly listed.
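For example, to shut out GPTBot while leaving other crawlers unrestricted:

  User-agent: GPTBot
  Disallow: /

  User-agent: *
  Disallow:
  # An empty Disallow value blocks nothing; bots without their own group fall back to this * group.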

Does robots.txt prevent indexation?

No. robots.txt controls crawling, not indexing. A blocked page can still appear in search results if other sites link to it; Google just won't have a description for it. To keep a page out of the index, use a noindex meta tag or the X-Robots-Tag HTTP header. Important: Google can only read a noindex signal if the page is not blocked in robots.txt, because it has to crawl the page to see it.
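Both mechanisms look like this, and either one only takes effect on pages that remain crawlable. In the page's HTML head:

  <meta name="robots" content="noindex">

Or as an HTTP response header (useful for non-HTML files such as PDFs):

  X-Robots-Tag: noindex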

Want AI-generated blog content that ranks? Try Autorank free.

Get Started Free →

Robots.txt Best Practices

The robots.txt file tells search engine crawlers which pages or sections of your site they can or cannot request. It lives at the root of your domain (e.g. https://example.com/robots.txt) and follows a simple text-based protocol.
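A minimal, safe default looks like this (the sitemap URL is illustrative):

  User-agent: *
  Disallow:

  Sitemap: https://example.com/sitemap.xml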

Valid Directives

The directives crawlers act on are User-agent, Disallow, Allow, and Sitemap. Crawl-delay is honored by some crawlers (such as Bing) but ignored by Google.

Common Mistakes to Avoid

The costliest mistakes are the ones covered above: a site-wide Disallow: / left in place by accident, parent-path rules that block more than intended, wildcard patterns that overreach, and relying on robots.txt to keep pages out of the index instead of using noindex.

Wildcard Patterns

Google and Bing support * (match any sequence) and $ (end of URL) in path patterns. For example, Disallow: /*.pdf$ blocks all PDF files.
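A few more patterns, for illustration (hypothetical paths):

  User-agent: *
  Disallow: /*.pdf$         # any URL ending in .pdf
  Disallow: /*?sessionid=   # any URL containing ?sessionid=
  Disallow: /private*       # the trailing * is redundant; /private already matches by prefix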